Biclustering Sparse Binary Genomic Data
Author(s) -
Miranda van Uitert,
Wouter Meuleman,
Lodewyk F.A. Wessels
Publication year - 2008
Publication title -
Journal of Computational Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.585
H-Index - 95
eISSN - 1557-8666
pISSN - 1066-5277
DOI - 10.1089/cmb.2008.0066
Subject(s) - biclustering , row , row and column spaces , binary number , set (abstract data type) , computer science , sparse matrix , binary data , logical matrix , expression (computer science) , column (typography) , feature (linguistics) , data mining , pattern recognition (psychology) , algorithm , artificial intelligence , mathematics , cluster analysis , database , cure data clustering algorithm , chemistry , arithmetic , quantum mechanics , gaussian , programming language , physics , correlation clustering , organic chemistry , group (periodic table) , philosophy , linguistics , telecommunications , frame (networking)
Genomic datasets often consist of large, sparse, binary data matrices. In such a dataset, one is often interested in finding contiguous blocks that contain mostly ones. This is a biclustering problem. While many biclustering algorithms have been proposed for gene expression data, only two have been proposed that specifically deal with binary matrices. None of the gene expression biclustering algorithms can handle the large number of zeros in sparse binary matrices, and the two proposed binary algorithms failed to produce meaningful results. In this article, we present a new algorithm that is able to extract biclusters from sparse, binary datasets. A powerful feature is that biclusters with different numbers of rows and columns can be detected, ranging from many rows and few columns to few rows and many columns. The algorithm also allows the user to guide the search towards biclusters of specific dimensions. When applying our algorithm to an input matrix derived from TRANSFAC, we find transcription factors with distinctly dissimilar binding motifs, but a clear set of common targets that are significantly enriched for GO categories.
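To make the problem statement concrete, the following is a minimal sketch of what "a block that mostly contains ones" can mean in a binary matrix. It is not the algorithm presented in the article, whose details are not given in this abstract. It is a generic alternating row/column refinement, and the names `bicluster_density`, `refine_bicluster`, `seed_cols`, and `min_fraction`, as well as the toy matrix, are assumptions introduced purely for illustration.

```python
def bicluster_density(matrix, rows, cols):
    """Return the fraction of ones in the submatrix given by rows and cols."""
    if not rows or not cols:
        return 0.0
    ones = sum(matrix[r][c] for r in rows for c in cols)
    return ones / (len(rows) * len(cols))


def refine_bicluster(matrix, seed_cols, min_fraction=0.75, max_iter=20):
    """Illustrative heuristic (an assumption, not the article's method).

    Starting from seed columns, alternately keep the rows that have ones in
    at least min_fraction of the current columns, then the columns that have
    ones in at least min_fraction of the current rows, until nothing changes.
    """
    cols = sorted(seed_cols)
    if not matrix or not cols:
        return [], []
    n_rows, n_cols = len(matrix), len(matrix[0])
    rows = []
    for _ in range(max_iter):
        new_rows = [r for r in range(n_rows)
                    if sum(matrix[r][c] for c in cols) >= min_fraction * len(cols)]
        if not new_rows:
            return [], []
        new_cols = [c for c in range(n_cols)
                    if sum(matrix[r][c] for r in new_rows) >= min_fraction * len(new_rows)]
        if not new_cols:
            return [], []
        if new_rows == rows and new_cols == cols:
            break  # selection is stable
        rows, cols = new_rows, new_cols
    return rows, cols


# Toy sparse binary matrix: a 3x3 block of mostly ones in the top-left corner.
example = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

rows, cols = refine_bicluster(example, seed_cols=[0], min_fraction=0.6)
print(rows, cols)                                        # [0, 1, 2] [0, 1, 2]
print(round(bicluster_density(example, rows, cols), 3))  # 0.889
```

In this sketch the threshold `min_fraction` controls how many zeros a block may contain. A simple heuristic of this kind offers no control over bicluster dimensions, which is the kind of user guidance the abstract attributes to the authors' algorithm.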