Recentrifuge: Robust comparative analysis and contamination removal for metagenomics - April 2019 - Jose Manuel Marti

Sly Saint

Abstract
Metagenomic sequencing is becoming widespread in biomedical and environmental research, and the pace is increasing even more thanks to nanopore sequencing. With a rising number of samples and data per sample, the challenge of efficiently comparing results within a specimen and between specimens arises. Reagents, laboratory, and host related contaminants complicate such analysis. Contamination is particularly critical in low microbial biomass body sites and environments, where it can comprise most of a sample if not all. Recentrifuge implements a robust method for the removal of negative-control and crossover taxa from the rest of samples. With Recentrifuge, researchers can analyze results from taxonomic classifiers using interactive charts with emphasis on the confidence level of the classifications. In addition to contamination-subtracted samples, Recentrifuge provides shared and exclusive taxa per sample, thus enabling robust contamination removal and comparative analysis in environmental and clinical metagenomics.

Regarding the first area, Recentrifuge’s novel approach has already demonstrated its benefits showing that microbiomes of Arctic and Antarctic solar panels display similar taxonomic profiles. In the clinical field, to confirm Recentrifuge’s ability to analyze complex metagenomes, we challenged it with data coming from a metagenomic investigation of RNA in plasma that suffered from critical contamination to the point of preventing any positive conclusion. Recentrifuge provided results that yielded new biological insight into the problem, supporting the growing evidence of a blood microbiota even in healthy individuals, mostly translocated from the gut, the oral cavity, and the genitourinary tract. We also developed a synthetic dataset carefully designed to rate the robust contamination removal algorithm, which demonstrated a significant improvement in specificity while retaining a high sensitivity even in the presence of cross-contaminants. Recentrifuge’s official website is www.recentrifuge.org. The data and source code are anonymously and freely available on GitHub and PyPI. The computing code is licensed under the AGPLv3. The Recentrifuge Wiki is the most extensive and continually-updated source of documentation for Recentrifuge, covering installation, use cases, testing, and other useful topics.
To confirm Recentrifuge’s ability to analyze complex metagenomes and provide new biological insight, we considered an ambitious but severely contaminated shotgun metagenomic sequencing (SMS) study of RNA in plasma from individuals with Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS), alternatively diagnosed chronic Lyme syndrome (ADCLS), and systemic lupus erythematosus (SLE) [48]. This research suffered from large batch and contamination effects and was unable to find a positive association between the plasma microbial content of sick individuals, thus highlighting the relevance of technical controls in metagenomics.

To further illustrate how difficult the dataset from the SMS study of plasma in ME/CFS patients is with regard to contamination, here are just a couple of results. First, in sequencing batch one, Recentrifuge detected crossover contamination in the negative control samples, consisting of human metapneumovirus (hMPV) whose source was the positive control. Second, Recentrifuge reported considerably more distinct taxa in the negative controls than in the normal samples: 65% and 22% more on average for batches two and three, respectively. The presence of generalized crossover contamination complicates removing the contaminants from the samples by merely excluding the taxa present in the controls. This is where the robust contamination removal algorithm of Recentrifuge is of great help: it detects the crossover contaminants (hMPV and other taxa) and removes them from all the samples except for the inferred source. Therefore, the positive control is still positive for hMPV after the contamination removal, as expected (see S7 Fig).
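
For anyone who finds code easier to follow than prose, the idea in the paragraph above (drop taxa seen in the negative controls, but spare a crossover contaminant in the sample it most likely came from) can be sketched roughly as below. This is a deliberately simplified toy, not Recentrifuge's actual algorithm or API: the function name remove_contaminants, the crossover_ratio threshold, the rule for picking the source, and all read counts are assumptions made up purely for illustration.

```python
from typing import Dict

Counts = Dict[str, Dict[str, int]]  # sample name -> taxon -> read count


def remove_contaminants(samples: Counts, controls: Counts,
                        crossover_ratio: float = 10.0) -> Counts:
    """Toy control-based subtraction that spares the inferred crossover source.

    Hypothetical illustration only; not the method implemented in Recentrifuge.
    """
    # Highest read count observed for each taxon in any negative control.
    control_max: Dict[str, int] = {}
    for counts in controls.values():
        for taxon, n in counts.items():
            control_max[taxon] = max(control_max.get(taxon, 0), n)

    # Assumed rule: a control taxon is a crossover contaminant when some sample
    # carries far more of it than any control; that sample is the inferred source.
    sources: Dict[str, str] = {}
    for taxon, ctrl_n in control_max.items():
        best_name, best_n = None, 0
        for name, counts in samples.items():
            if counts.get(taxon, 0) > best_n:
                best_name, best_n = name, counts[taxon]
        if best_name is not None and best_n >= crossover_ratio * ctrl_n:
            sources[taxon] = best_name

    # Keep taxa never seen in controls, plus crossover taxa in their source only.
    cleaned: Counts = {}
    for name, counts in samples.items():
        cleaned[name] = {
            taxon: n for taxon, n in counts.items()
            if taxon not in control_max or sources.get(taxon) == name
        }
    return cleaned


# Made-up read counts, loosely modeled on the hMPV situation described above.
samples = {
    "positive_control": {"hMPV": 5000, "Ralstonia": 30},
    "patient_1": {"hMPV": 12, "Ralstonia": 25, "Prevotella": 40},
}
controls = {"negative_1": {"hMPV": 8, "Ralstonia": 50}}

cleaned = remove_contaminants(samples, controls)
print(cleaned)
```

With these invented numbers, hMPV is removed from patient_1 but kept in positive_control, while the taxon that is abundant in the negative control is dropped everywhere. The paper describes the real method as robust and score-oriented, so treat the fixed ratio here as a stand-in for whatever statistics the tool actually uses.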

In conclusion, thanks to the robust contamination removal and the score-oriented comparative analysis of multiple samples in metagenomics, Recentrifuge can play a key role, firstly, in the study of oligotrophic microbes in environmental samples, as it did by showing that microbiomes of Arctic and Antarctic solar panels display similar taxonomic profiles [60]; secondly, in the more reliable detection of minority organisms in clinical or forensic samples. The relevant organisms found with a high score in the SMS study of plasma in ME/CFS patients [48] after the robust contamination removal are good examples. Finally, the mock dataset confirmed the worthiness of the developed methods, which demonstrated a radical improvement in specificity while retaining high sensitivity rates even in the presence of cross-contaminants.

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006967

(I haven't a clue what any of this means but figured someone might find it of interest)
 