
Data integration and analysis for medical systems biology
Author(s) - van Beek Johannes H. G. M.
Publication year - 2004
Publication title - Comparative and Functional Genomics
Language(s) - English
Resource type - Journals
eISSN - 1532-6268
pISSN - 1531-6912
DOI - 10.1002/cfg.385
Subject(s) - computer science, data science, computational biology, systems biology, data mining, biology
It is like listening to a stewardess in a jet airliner who is explaining the safety measures: you have heard 1000 times before that the human genome has been sequenced and that a flood of data is coming over us. The question is how the massively parallel measurements of large numbers of genes, messenger RNAs, proteins and metabolites are going to help us in the prognosis and diagnosis of common human diseases. Is it a manageable problem to explain the behaviour of thousands of biomolecules from our knowledge of the molecular interactions in the cells of the human body? Can we infer from the large molecular datasets how the molecular pathways are organized and interact? It has been argued that the life sciences are developing into a discovery- and data-driven science, with less emphasis on the hypothesis-driven experimental cycle. However, reasoning from experimentally determined facts to a well-founded theory of the underlying system is problematic. In his book on the structure of scientific revolutions, Kuhn [2] wrote, ‘But though this sort of fact-collecting has been essential to the origin of many significant sciences, anyone . . . will discover that it produces a morass’. Is data mining in integrated experimental databases containing large quantities of genomic and systems biology data going to produce a morass, or is this approach useful for generating hypotheses and theories which, after corroboration, lead to valid knowledge?