De Novo Annotation of Transposable Elements: Tackling the Fat Genome Issue
Author(s) -
Veronique Jamilloux,
Josquin Daron,
Frederic Choulet,
Hadi Quesneville
Publication year - 2017
Publication title -
Proceedings of the IEEE
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.383
H-Index - 287
eISSN - 1558-2256
pISSN - 0018-9219
DOI - 10.1109/jproc.2016.2590833
Subject(s) - general topics for engineers , engineering profession , aerospace , bioengineering , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , fields, waves and electromagnetics , geoscience , nuclear engineering , robotics and control systems , signal processing and analysis , transportation , power, energy and industry applications , communication, networking and broadcast technologies , photonics and electrooptics
Transposable elements (TEs) constitute the largest and most dynamic component of large plant genomes: for example, TEs may make up 80% to 90% of the maize and wheat genomes. De novo TE annotation is therefore a computational challenge, and we investigated new strategies to overcome the difficulties using current tools in the REPET package. We tested our methodological developments on the sequence of chromosome 3B of hexaploid wheat; at ~1 Gb, this chromosome is one of the “fattest” genomes ever sequenced. We successfully established various strategies for annotating TEs in such a complex dataset. Our analyses show that all of our strategies can overcome the current limitations of de novo TE discovery in large plant genomes. Relative to annotation based on a library of known TEs, our de novo approaches increased genome coverage from 84% to 90% and the number of full-length annotated copies from 14 830 to 15 905. We also developed two new metrics for assessing the quality of TE annotation: NTE50 is the number, and LTE50 the smallest size, of the annotations that cover 50% of the genome. NTE50 decreased from 124 868 to 93 633 and LTE50 increased from 1839 to 2659. This work shows how to obtain comprehensive, high-quality automatic TE annotation for a number of economically and agronomically important species.
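The two metrics can be illustrated with a minimal Python sketch. It is not the REPET implementation, and it rests on assumptions the abstract does not state: that annotations are ranked from longest to shortest (by analogy with the N50/L50 assembly statistics), that annotations do not overlap, so summed lengths equal covered bases, and that the threshold is 50% of the total genome length. The function name `nte50_lte50`, its parameters, and the toy data are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def nte50_lte50(annotation_lengths: Iterable[int],
                genome_length: int,
                fraction: float = 0.5) -> Optional[Tuple[int, int]]:
    """Return (NTE50, LTE50), or None if the annotations never reach the target.

    Assumption (not from the source): annotations are taken longest first
    and do not overlap, so their summed lengths equal the covered bases.
    """
    target = fraction * genome_length
    covered = 0
    ranked = sorted(annotation_lengths, reverse=True)
    for count, length in enumerate(ranked, start=1):
        covered += length
        if covered >= target:
            # count  -> number of annotations needed (NTE50)
            # length -> smallest annotation among them (LTE50)
            return count, length
    return None


# Toy, made-up data: six annotations on a 1200-base "genome".
toy_lengths = [400, 300, 100, 100, 50, 50]
toy_genome_length = 1200
print(nte50_lte50(toy_lengths, toy_genome_length))
```

Under these assumptions, a less fragmented annotation reaches the 50% mark with fewer and longer annotations, which matches the direction reported above: a lower NTE50 and a higher LTE50.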