SoFIA: a data integration framework for annotating high-throughput datasets
Author(s) -
Liam Childs,
Soulafa Mamlouk,
Jørgen Brandt,
Christine Sers,
Ulf Leser
Publication year - 2016
Publication title -
Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btw302
Subject(s) - workflow , computer science , annotation , task (project management) , identifier , data integration , flexibility (engineering) , set (abstract data type) , software , data mining , domain (mathematical analysis) , database , data science , information retrieval , programming language , artificial intelligence , mathematical analysis , statistics , mathematics , management , economics
Integrating heterogeneous datasets from several sources is a common bioinformatics task that often requires implementing a complex workflow intermixing database access, data filtering, format conversions, identifier mapping and various other operations. Data integration is especially important when annotating next-generation sequencing data, where a multitude of diverse tools and heterogeneous databases can provide a large variety of annotations for genomic locations, such as single nucleotide variants or genes. Each tool and data source is potentially useful for a given project, and often more than one is used in parallel for the same purpose. However, software that always produces all available data is difficult to maintain and quickly leads to an excess of data, creating an information overload rather than the desired goal-oriented, integrated result.