In the health sector too, the quantity of data collected matters. But the data must also have all the characteristics needed to be usable and to produce reliable results.

In the age of the Internet, immense amounts of data concerning every aspect of our lives are collected every day: social relations, economic activities, health information, habits, consumption, tastes, and purchases are constantly stored and used for the most varied purposes. The availability of such Big Data has long opened new perspectives for the advancement of knowledge in the biomedical sector. Clearly, the availability of data alone is not enough: it must be coupled with the most advanced information technologies. Electronic health records, hospital medical records, biobanks, and biomedical image databases make it possible to conduct sophisticated and complex data analyses in support of prevention, diagnosis, and treatment, using Machine Learning and Artificial Intelligence approaches. Today we know how to develop software that can support a doctor’s oncological diagnosis on the basis of images, provide predictive models of how a given pathology will evolve, and highlight possible correlations between gene expression and pathologies.

It should be noted, however, that the effectiveness and reliability of these approaches depend on both the quantity and the quality of the available data. Quite apart from the issues of privacy and consent for sensitive data, therefore, the use of Big Data in the biomedical sector also requires particular attention for technical reasons. The artificial intelligence methods most used today, Deep Learning in particular, are largely based on supervised learning, in which the input data are paired with correctly labeled output values so that the system processing them can learn from the examples provided. The data must therefore represent as fully as possible the spectrum of cases that could arise. Furthermore, they must be distributed evenly among the categories to which they refer.
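
The requirement that examples be evenly distributed among categories can be checked before any training takes place. What follows is only a minimal sketch in Python, not a recommended procedure: the function names, the "benign" and "malignant" labels, and the threshold of 3.0 are all assumptions made for illustration and do not come from any real dataset or standard.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of examples that belongs to each class."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Size of the largest class divided by the smallest (1.0 means perfectly even)."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical labels for illustration only, not real clinical data.
labels = ["benign"] * 90 + ["malignant"] * 10

shares = class_shares(labels)
ratio = imbalance_ratio(labels)

# The threshold is an arbitrary assumption, not an established standard.
MAX_RATIO = 3.0
if ratio > MAX_RATIO:
    print(f"Unevenly distributed classes: shares={shares}, ratio={ratio:.1f}")
```

A high ratio does not by itself make a dataset unusable, but it signals that the rarer category may be under-represented among the examples the system learns from.
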
The system must be trained so that the reliability indices it provides are not conditioned by the particular case analyzed (the bias problem). Finally, the quantity of data must be adequate to the problem to be solved and to the analysis technique adopted. Big Data represents a huge opportunity for progress in the biomedical field. However, the databases used must have all the characteristics needed to produce reliable, correct, and replicable, and therefore applicable, results. This is why synergy between those who work at the clinical and research level in the biomedical field and those who work on acquiring and processing information is increasingly essential.