Open Access

Study of the Viral and Microbial Communities Associated With Crohn's Disease: A Metagenomic Approach

Author(s) -

Vicente Pérez-Brocal,

Rodrigo García-López,

Jorge F. Vázquez-Castellanos,

Pilar Nos,

Belén Beltrán,

Amparo Latorre,

Andrés Moya

Publication year - 2013

Publication title -

Clinical and Translational Gastroenterology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.673

H-Index - 35

ISSN - 2155-384X

DOI - 10.1038/ctg.2013.9

Subject(s) - human virome , microbiome , metagenomics , biology , bacteriophage , disease , bacteria , feces , dysbiosis , phage therapy , immunology , microbiology and biotechnology , medicine , bioinformatics , genetics , escherichia coli , gene

The viral, bacterial, archaeal, and eukaryotic communities harbored in the human gastrointestinal tract greatly outnumber the human body cells.1 In spite of this, the known microbial biodiversity may represent only a small fraction of the actual diversity. These communities are crucial for maintaining homeostasis in such a complex ecosystem.2 In particular, although human viruses are usually pathogens associated with gastroenteritis and other acute disorders, resident intestinal bacteriophages have significant roles in host microbe mortality and genetic diversity in the gut ecosystem by predation on their bacterial hosts.3, 4 In addition, bacteriophages can hinder colonization by potential bacterial pathogens5 but can also eliminate some beneficial probiotic strains,6 or introduce new phenotypic traits, such as antibiotic resistance and the ability to produce exotoxins.7 Despite their relevance in human health, most investigations of the ecological role of viruses have focused on other environments, especially aquatic systems and sediments (see for example refs 8, 9, 10, 11, 12, 13, 14, 15, 16). In contrast, a relatively small number of studies on biological samples,17, 18, 19, 20, 21, 22 and particularly those of human origin (see for example refs 23, 24, 25), has been carried out. The maintenance and compositional changes of the gut microbiota are known to be closely linked to human physiology, nutrition, and the prevalence of disease. 
Disruptions to the interactions between the microorganisms and human cells may occur due to genetic and/or environmental factors, thus disrupting homeostasis.26, 27, 28 In several complex diseases of the respiratory tract, such as asthma, or the digestive tract, such as type 1 diabetes and inflammatory bowel disease, interactions between human genotype and viral infections have been linked to autoimmune and inflammatory diseases.29 Crohn’s disease (CD) is a major type of inflammatory bowel disease that affects as many as 1 in 500 individuals.30 It is a chronic disorder whose onset takes place mainly during young adulthood, with a secondary increase in older adults. Inflammation may occur in multiple discontinuous regions of the intestine, although it is more frequent in the distal ileum and colon, and may involve transmural inflammation of the intestinal wall.31 Patients typically experience episodic symptoms, including fever, abdominal pain, vomiting, diarrhea and weight loss, and may also suffer more serious gastrointestinal problems. Family and twin studies have demonstrated a strong heritable component to the disease.32, 33 Early studies based on culture methods34, 35, 36 and, more recently, molecular-based approaches37, 38 to screening for particular viruses as possible etiological agents of CD have proven negative or inconclusive. However, based on higher detection levels of bacteriophages in the mucosa of CD patients by microscopy, a role for these in CD has been postulated.39 Other studies also provide evidence for a possible role of viruses in CD.
For example, induction of intestinal pathologies in mice by the interaction between a specific virus infection and a mutation in the CD susceptibility gene Atg16L1 has been demonstrated.40 The authors provided an example of how a virus-plus-susceptibility gene interaction can, in combination with additional environmental factors and commensal bacteria, determine the phenotype of hosts carrying common risk alleles for inflammatory disease. More recently, Hubbard and Cadwell41 examined the three-way relationship between viruses, autophagy genes and CD, and discussed how host–pathogen interactions can mediate complex inflammatory disorders. They concluded that although the role of viruses in CD remains speculative, accumulating evidence indicates that this possibility requires serious consideration. Identifying and measuring the community dynamics of viruses in the environment is complicated because less than 1% of microbial hosts have been cultivated in vitro. Furthermore, as there is no single gene common to all viral genomes, total uncultured viral diversity cannot be monitored using approaches analogous to ribosomal DNA profiling, commonly used for bacteria and archaea. Alternative approaches are therefore required for the evaluation of viral consortia in environmental samples. The development of metagenomic approaches, such as high-throughput sequencing, has allowed the exploration of viral diversity in a new way and is revolutionizing our knowledge of uncultured viral communities in a wider range of environments, including the human gastrointestinal tract. 
On the other hand, human gut virome studies carried out so far have been focused mainly on fecal samples from healthy adult or infant volunteers42, 43, 44, 45, 46, 47 or from children with various acute disorders.48, 49, 50 In the present study, we used 454 pyrosequencing to analyze the viral and microbial communities in fecal samples from a control group of healthy volunteers and from patients affected by CD, including an additional tissue sample from a surgical biopsy of a Crohn’s patient. We have also compared the diversity and structure of some of these viral communities with that of bacterial communities from the same samples, which were determined by partially sequencing the 16S rRNA gene. To amplify the viral cDNA and genomic DNA for sequencing, a whole genome amplification strategy was carried out using the Illustra GenomiPhi V2 DNA Amplification Kit (GE Healthcare, Amersham, UK), incubating for 2 h at 30 °C followed by phi29 DNA polymerase inactivation for 10 min at 65 °C. The resulting DNA amplification was confirmed by fluorometric measurement using Picogreen®, and 1 μg of DNA per sample was taken. Pools of four to five samples were sequenced simultaneously using the 454 pyrosequencing Genome Sequencer FLX titanium plus on an eighth of a PicoTiterPlate device (Roche Diagnostics, Mannheim, Germany). Bacterial raw sequence reads were filtered by quality and size using MOTHUR v.1.22.2,51 discarding sequences shorter than 200 bp, and using a procedure similar to that used with the viral metagenomic sequences but with an additional step of chimera removal. Sequence reads were assigned to their original sample using the barcode-tagged primer sequences. 
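The read filtering and sample assignment described above can be illustrated with a minimal Python sketch. It is not the MOTHUR procedure used by the authors: the barcode sequences and sample labels are hypothetical, and applying the 200 bp cut-off after removing the barcode is an assumption made here for illustration.

```python
from collections import defaultdict

MIN_LENGTH = 200  # bp; the minimum read length stated in the text

# Hypothetical barcode-to-sample map; the real barcodes are not given in the text.
BARCODES = {"ACGT": "C1", "TGCA": "V1"}


def demultiplex(reads, barcodes=BARCODES, min_length=MIN_LENGTH):
    """Assign reads to samples by their leading barcode and drop short reads.

    Reads with no recognised barcode are discarded. The length threshold is
    applied to the sequence left after trimming the barcode (an assumption).
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                insert = read[len(barcode):]
                if len(insert) >= min_length:
                    by_sample[sample].append(insert)
                break  # a read carries at most one barcode
    return dict(by_sample)
```

Chimera removal, which the text lists as an additional step for the bacterial reads, is not covered by this sketch.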
The nomenclature we used for viruses and prophages was initially generated using the “Fetch taxonomic representation” tool implemented on the Galaxy platform.56 Next, we used in-house Perl scripts to convert the Galaxy output into a standardized abundance table containing four of the taxonomic levels accepted for viruses (order, family, genus, and species) by inheriting the higher or, if not possible, the lower adjacent taxonomic-level tag to fill in the missing taxonomic levels of each bin. In this way, a nonredundant taxonomy for all entries was generated. In addition, prophages adopted their bacterial–host taxonomy with addition of the tag “phage”. Finally, to identify a taxonomic biomarker with high stringency, we employed the linear discriminant analysis effect size (LEfSe) method,59 combining the Kruskal–Wallis and pairwise Wilcoxon tests for statistical significance with linear discriminant analysis for feature selection, to confirm the differential abundance of viral OTUs. We used default significance (alpha value=0.05) and linear discriminant analysis thresholds (2.0), at all taxonomic levels between the control group and CD patients. The composition of viral OTUs at the species level (see Table 2) reveals that, regardless of whether the reads are assembled or not, our own approach retrieves more viral hits for the same threshold (an E value of 10−3) than the MetaVir approach in all cases except for four samples (C10 unassembled, C9 assembled and V4 in both data sets). In fact, our approach retrieves, on average, 24.5 and 17.2 more viral taxa on non-assembled and assembled reads, respectively. The overall number of different viral OTUs we obtained with our approach is 958 different species, belonging to 379 genera, from 246 families, when non-assembled reads are considered. This figure is reduced to 766, 322, and 218 species, genera, and families, respectively, when assembled reads are considered instead.
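The gap-filling rule for the taxonomy table (inherit the adjacent higher-level tag or, failing that, a lower-level one) can be sketched in Python as follows. This is not the authors' Perl code: the function name, the dictionary input format, and the "unclassified" placeholder for entries with no tag at any level are all assumptions made for illustration.

```python
LEVELS = ("order", "family", "genus", "species")


def fill_taxonomy(lineage):
    """Return a lineage with every level in LEVELS filled.

    `lineage` maps level names to tags; missing or empty levels are filled
    from the adjacent higher level when one exists, otherwise from the
    nearest lower level that carries a tag.
    """
    filled = [lineage.get(level) for level in LEVELS]
    for i, tag in enumerate(filled):
        if tag:
            continue
        if i > 0:
            # Levels are processed top-down, so the higher neighbour is set by now.
            filled[i] = filled[i - 1]
        else:
            # Top level missing: borrow the nearest lower tag, if there is one.
            lower = next((t for t in filled[1:] if t), None)
            filled[i] = lower or "unclassified"  # placeholder is hypothetical
    return dict(zip(LEVELS, filled))
```

For example, an entry tagged only at family and species level, such as `{"family": "Retroviridae", "species": "X virus"}` (the species name is a made-up placeholder), would receive "Retroviridae" as both its order and its genus under this rule.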
The viral taxonomic composition and abundance at the family level of the 19 samples studied are summarized in Figure 1. Although the percentages and rank positions vary between non-assembled and assembled reads and among samples, some specific viral families remain among the most abundant. That is the case for the bacteriophages of the order Caudovirales (Siphoviridae, Myoviridae, Podoviridae, and others), for other bacteriophages such as the Inoviridae and certain unclassified phages, and for certain prophages (e.g., those from Clostridiaceae or Enterobacteriaceae). Finally, viruses infecting eukaryotes had a significant presence, including Retroviridae and, more restricted to non-assembled reads, Paramyxoviridae and Bunyaviridae, or Herpesviridae in assembled contigs. The distribution and comparison of the different bacteriophage families between the CD and control groups in both non-assembled and assembled reads are shown in Figure 2. In all cases, the families Siphoviridae, Myoviridae, and Podoviridae account for most of the viral hits. In addition, there is a notable presence of bacteriophages from the family Inoviridae. Most of the differences observed between CD and control groups before read assembly, such as in prophages, unclassified phages and the families Myoviridae, Podoviridae, or Inoviridae, diminish afterwards. Assembly therefore leads to reduced differences between groups, and even reverses the ratios in some cases, such as in the prophages or the Myoviridae. In the assembled reads, the LEfSe method shows 57 differentially abundant viral OTUs (P<0.05) between CD and control samples at any taxonomic level (see Supplementary Table S2B in the Supplementary Materials and Methods). Of these, 54 are overrepresented in control samples whereas only 3 are overrepresented in CD samples. The latter consist of two taxa: the family (and its unnamed order) Retroviridae and the species Synechococcus phage S-CBS1.
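As an illustration of the first, non-parametric stage of the comparison reported above, the following Python sketch computes a Kruskal–Wallis H statistic for one OTU across two groups. It is not the LEfSe implementation and omits the pairwise Wilcoxon and linear discriminant analysis stages; the function names are hypothetical, and the P value uses the chi-square approximation with one degree of freedom, which is only rough for small groups.

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties share their mean rank) and the tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_two_groups(a, b):
    """Kruskal-Wallis H (tie-corrected) and approximate P value for two groups."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks, ties = average_ranks(pooled)
    rank_sum_a = sum(ranks[:len(a)])
    rank_sum_b = sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (rank_sum_a ** 2 / len(a) + rank_sum_b ** 2 / len(b)) - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    if correction == 0:
        return 0.0, 1.0  # every value identical: no evidence of a difference
    h /= correction
    # Chi-square survival function with 1 degree of freedom.
    p = math.erfc(math.sqrt(max(h, 0.0) / 2))
    return h, p
```

An OTU would pass this stage at the default alpha of 0.05 mentioned in the text when the returned P value falls below that threshold.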
In contrast, the 54 OTUs whose relative abundance is significantly higher in the control samples include 43 different viruses and 11 prophage-like viruses. When looking at higher taxonomic levels (e.g., Order) in the classification of the metagenomic sequences of bacteria (Figure 3), other OTUs specifically abundant in IC1 are the Actinomycetales, with 11.4% of the reads, compared with 0.0–0.1% in the remaining samples; the Lactobacillales, Bacillales, and Gemellales, which represent almost 20.0% compared with 0.0–2.4% of the other samples; the Enterobacteriales with 51.1% compared with 0.0–4.9% (with the previously mentioned exception of sample C7), or two orders within the TM7-3 class: EW055 with 2.6% compared with 0.0–0.1%, and CW040 with 0.4% compared with <0.1% in the remaining samples. To assess the microbial diversity within each of the samples, the alpha-diversity metrics observed species, Chao1 estimator, Shannon’s diversity index, and Simpson’s diversity index were calculated. The first three are displayed in Figure 5. The CD samples are less diverse than the control samples when considered as a group, but they also show a much higher heterogeneity when compared with the more homogeneous control samples, as they encompass some of the samples with the highest diversity indices (such as C1 or C2) but also the ones with the lowest values (C6, C8, and C10). Low diversity is also identified for the tissue sample (IC1), which exhibits the lowest values for Shannon and Simpson diversity indices. The diversity among samples was assessed with a Principal Coordinate Analysis of the microbiota and viral communities, which is represented in the two-dimensional plot in Figure 6. For the microbiota, the plots show that the control group samples are less scattered than the CD ones, and the control samples are grouped separately from the CD samples, regardless of the dissimilarity/distance matrix used, suggesting the existence of two distinct clusters.
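The four alpha-diversity metrics named above can be computed from a vector of per-OTU read counts, as in the following minimal Python sketch. The text does not state which formula variants were used, so the choices here are assumptions: the natural logarithm for Shannon's index, the 1 − Σp² form for Simpson's index, and the classic Chao1 estimator with the bias-corrected form as a fallback when no doubletons are present.

```python
import math


def alpha_diversity(counts):
    """Return observed species, Chao1, Shannon and Simpson for one sample.

    `counts` holds the number of reads assigned to each OTU; zero counts
    are ignored.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    observed = len(counts)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)

    if doubletons > 0:
        chao1 = observed + singletons ** 2 / (2 * doubletons)
    else:
        # Bias-corrected form avoids division by zero when there are no doubletons.
        chao1 = observed + singletons * (singletons - 1) / 2

    proportions = [c / total for c in counts]
    shannon = -sum(p * math.log(p) for p in proportions)  # natural log (assumed)
    simpson = 1 - sum(p * p for p in proportions)          # 1 - dominance (assumed)

    return {"observed": observed, "chao1": chao1, "shannon": shannon, "simpson": simpson}
```

Under these definitions, a sample dominated by a few OTUs yields low Shannon and Simpson values, which is the pattern the text describes for the tissue sample IC1.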
This pattern is not as sharp for the viral communities because, although there appear to be two clusters with the CD group more scattered than the control group, the presence of exceptions involving samples from both groups blurs this distinction. For example, samples V5 and V7 (marked with arrows), which exhibit a much lower diversity than other control samples, appear to be more closely related to each other and separate from the remaining control samples. The intestinal sample microbiota, IC1 (circled), appears quite divergent from the other samples, suggesting a distinct distribution, which reinforces the results from composition and abundance analyses (see above, Figure 3). The same plots for viruses are less conclusive. All in all, the results suggest the existence of greater variation in the beta diversity in the case of the CD samples in both microbial and viral communities but lower diversity within samples, whereas in the control group there is more diversity within samples in general, but also more homogeneity between samples. Virus-based clustering results display lower values of statistical support for most nodes, due in part to the lower number of viral hits. The cluster tree shows that relationships among samples vary greatly depending on the assembly, the distance matrix and taxonomic level used, but overall the reconstructions differ from the bacterial-based clustering, displaying much lower resolution. Therefore, the CD and control samples can only partially be separated into two groups based on viral composition and abundance, and with low statistical support. When looking deeper into this taxonomic bacterial distribution by groups (see Figure 8b), some notable differences between the CD and control groups are observed in both cases.
For example, bacterial composition based on the 16S rDNA reveals a difference in the percentage of Enterobacteriales (+8.78%), Burkholderiales (+4.97%), Fusobacteriales (+4.90%), Bacteroidales (+3.85%), and Lactobacillales (+1.25%) in favor of the CD group, which shows a significant decrease mainly of Clostridiales (−23.47%) and RF39 (−1.20%) compared with the control group. However, when comparing the ratios, orders that are overall represented in low percentages may show greater differences between groups than orders that account for the majority of the bacteria. For example, some orders only appear in the control group, such as Aeromonadales, Bacillales, Fusobacteriales, Gemellales, Neisseriales, Oceanospirillales, and Sphingomonadales, or are significantly more represented in that group, such as Actinomycetales (246.98 fold), or Enterobacteriales (99.75 fold), compared with only a 1.07-fold increase of Bacteroidales in this group. No bacterial order is found exclusively in the CD group, but five orders are better represented in this group: Erysipelotrichales (1.97 fold), Clostridiales (2.15 fold), Verrucomicrobiales (3.63 fold), Pasteurellales (7.55 fold), and especially RF39 (80.97 fold). The comparison between groups based on the phage-hosting bacteria shows an uneven distribution when compared with their 16S rDNA data set counterparts. For example, bacteriophages whose bacterial hosts belong to the orders Bacteroidetes, Bacillales, Desulfovibrionales, Fusobacteriales, or Pseudomonadales are more abundant and Neisseriales, Oceanospirillales, or Pasteurellales less abundant in the CD group in both data sets. However, for other orders, comparing the two groups gives the opposite result between bacteria and bacterial hosts of the phages, for example, in the Actinomycetales, Burkholderiales, Clostridiales, Enterobacteriales, or Lactobacillales.
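The contrast drawn above between percentage-point differences and fold ratios can be made concrete with a short Python sketch. The function name and the example percentages are hypothetical and are not taken from the study's data; they only show why a rare order can have a large fold ratio while a dominant order has a small one.

```python
import math


def compare_groups(percent_a, percent_b):
    """Return (percentage-point difference, fold ratio) of group A over group B.

    A taxon absent from group B but present in group A gets an infinite
    fold ratio, which corresponds to a taxon found exclusively in one group.
    """
    difference = percent_a - percent_b
    if percent_b == 0:
        fold = math.inf if percent_a > 0 else float("nan")
    else:
        fold = percent_a / percent_b
    return difference, fold


# Hypothetical percentages, chosen only to illustrate the effect.
rare_order = compare_groups(0.5, 0.25)      # small difference, 2-fold ratio
dominant_order = compare_groups(50.0, 40.0)  # large difference, 1.25-fold ratio
```

In this made-up example the rare order differs by only 0.25 percentage points yet shows the larger fold ratio, which is the effect the text cautions about.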
Regarding the TIGRfam roles, those with greater prevalence among the samples, excluding the “not assigned” category, are proteins related to DNA metabolism (9.6%), followed by mobile and extrachromosomal element functions (5.9%), purines, pyrimidines, nucleosides, and nucleotides (5.1%), and transport and binding proteins (4.0%), whereas the less represented roles are signal transduction (0.2%), and fatty acid and phospholipid metabolism (0.3%). The two main roles, DNA metabolism and the mobile and extrachromosomal element functions, are typically associated with viral replication and structure, and are especially common in the control group (13.1% and 7.3%, respectively) compared with the CD group (7.1% and 4.9%, respectively). Our report is the first metagenomic study to investigate the viral communities associated with a multifactorial chronic intestinal disease, namely CD. It also takes into account the microbial community composition, abundance, and diversity so that a comparison of the two communities can be established between the two sample groups under study. Our analysis includes a study of the microbiota, already the subject of numerous investigations (reviewed for example by refs 2, 62, 63). These have previously reported the dysbiosis associated with CD, including some of the features that we have also found, such as a decrease in clostridia concentration (although not accompanied by a decrease in Bacteroidetes), as well as the relative abundance of members of the Enterobacteriaceae. However, unlike previous studies on inflammatory bowel disease, our efforts have been focused on the viral community associated with one particular form of this disease (CD), using a metagenomic approach combined with massive sequencing tools. We were able to retrieve more viral hits than previous approaches because of the extensive database we used, which comprises a comprehensive collection of four extant databases.
This is despite the exclusion of environmental samples, unless they had been taxonomically assigned and appear in one of the databases used. Similarly to bacteria, we observed a lower diversity in viral communities in CD samples compared with the control group. In addition, from our results we infer the existence of greater levels of variation within the CD group than within the control group, especially when analyzing the bacterial diversity, but also with viruses. We have also identified that more OTUs, in both viruses and bacteria, are underrepresented in the CD group samples compared with the control samples. However, the exceptions to this pattern, such as the case of viruses similar to members of the family Retroviridae, could be of interest for further investigation, particularly given the links between members of this family and immunodeficiency and the immune responses, key factors in CD. In our study, bacteriophages are directly inferred from the metagenomic samples. However, in analyzing the bacterial composition it must be noted that two different comparisons are carried out: one was inferred directly from bacterial 16S sequences, whereas the other used an indirect inference of the potential bacterial hosts from the bacteriophages detected in the samples, which does not necessarily correlate with, or even reflect, the actual bacterial composition and abundance in a particular environment, in the same way that the composition of predators does not necessarily allow the inference of the composition and abundance of their prey. In addition, the range of potential bacterial hosts for a bacteriophage can sometimes be very narrow, but other bacteriophages may predate a wider range of bacteria, which can further distort this picture. 
Furthermore, we have shown that bacterial inference based on the sampling of viruses cannot replace or even complement the analyses of the microbiome, due to the bias in the characterized bacteriophages that hinders any chance to consider them representative of the bacterial communities. For example, the databases have a bias toward bacteriophages of better-studied bacterial orders because of their health or economic importance, such as Enterobacteriales, Actinomycetales, or Lactobacillales. Conversely, bacteriophages from largely predominant gut bacteria, such as Bacteroidales and Clostridiales, are underrepresented in the virome samples because many of their hosts remain poorly characterized despite their abundance in the human gut, probably due to the fact that many of them are uncultivable bacteria. These discrepancies may also explain the disparate clustering of the samples based on viruses, which does not match the one based on bacteria. There is also less support for clustering when based on viruses, resulting in more ambiguous and variable results. A series of methodological considerations must be taken into account when analyzing the viral community composition and abundance, and the derived taxonomic and functional analyses. First, the assembly of the reads into contigs has an impact on the distribution of the viral hits, reducing the absolute number of OTUs in terms of composition. OTU abundance is also affected by a reduction of the relative number of viruses represented by higher numbers of reads, which are therefore more likely to be redundant, as they produce a reduced number of contigs after assembly. In contrast, viruses represented by a low number of individual reads are less prone to be assembled into contigs and so tend to increase their relative OTU frequency after assembly.
Analyses carried out in our group have demonstrated that read assembly significantly increases the performance of the functional analyses (data not shown), making it preferable to assemble into contigs for the functional analysis. One point of caution is that the identification of viral hits prior to their taxonomic assignment relies on the BLAST search, which allows identification of the “most similar viruses” in the database. This does not necessarily imply that the viruses present in the public databases are the actual ones present in the sample. Also, slight variations in the BLAST results can result in a different taxonomic assignment of the E value-based best hits, and therefore variations in the viral distribution. This can result from the assembly of reads into contigs, for example, which can change the best hits in the BLAST results. Another noteworthy issue is the heterogeneity in the number of reads obtained per sample, which makes comparison between samples more difficult. Thus, reaching the most homogeneous possible number of reads would be desirable. Finally, even though we are able to retrieve more viral hits than with the existing pipelines, such as MetaVir, most of the reads still remain unknown. So far, we can only state that we detect candidates related to those viruses available in the public databases, but we cannot rule out the possibility that other viruses, possibly more relevant to understanding the etiology and progression of CD, may be “hidden” within the uncharacterized reads. It is necessary to expand extant viral databases and other tools to identify viruses not only by homology search but also by means that are independent of sequence. Guarantor of the article: Andrés Moya, PhD. Specific author contributions: Pilar Nos, Belén Beltrán, and Vicente Pérez-Brocal collected the samples. Andrés Moya and Vicente Pérez-Brocal conceived and designed the experiments.
Vicente Pérez-Brocal, Rodrigo García-López, and Jorge Vázquez-Castellanos performed the experiments and analyzed the data. Andrés Moya and Amparo Latorre contributed reagents/materials/analysis tools. Vicente Pérez-Brocal wrote the paper. Financial support: This work was supported by grants SAF-2009-13032-C02-01 from the Spanish Ministry of Science and Innovation (MICINN) and SAF-2012-31187 from the Ministry of Economy and Competitiveness (MECO) to Andrés Moya. Vicente Pérez-Brocal has a research contract from the Instituto de Salud Carlos III (ISCIII). Potential competing interests: None. We thank Sébastien Varlez, Adriana Cordova, Pau Esparza, Sara Ferrando, and Inés Moret for assistance with various aspects of the work presented here. We are especially grateful to Dr C. Graham Clark for his helpful comments and language editing of this manuscript. Supplementary Information accompanies this paper on the Clinical and Translational Gastroenterology website.