Preprint: Disease diagnostics using machine learning of immune receptors 2023 Zaslavsky et al

Andy

Retired committee member
Abstract

Clinical diagnoses rely on a wide variety of laboratory tests and imaging studies, interpreted alongside physical examination findings and the patient's history and symptoms. Currently, the tools of diagnosis make limited use of the immune system's internal record of specific disease exposures encoded by the antigen-specific receptors of memory B cells and T cells, and there has been little integration of the combined information from B cell and T cell receptor sequences.

Here, we analyze extensive receptor sequence datasets with three different machine learning representations of immune receptor repertoires to develop an interpretive framework, MAchine Learning for Immunological Diagnosis (Mal-ID), that screens for multiple illnesses simultaneously. This approach is effective in identifying a variety of disease states, including acute and chronic infections and autoimmune disorders. It is able to do so even when there are other differences present in the immune repertoires, such as between pediatric or adult patient groups. Importantly, many features of the model of immune receptor sequences are human-interpretable. They independently recapitulate known biology of the responses to infection by SARS-CoV-2 and HIV, provide evidence of receptor antigen specificity, and reveal common features of autoreactive immune receptor repertoires, indicating that machine learning on immune repertoires can yield new immunological knowledge. This framework could be useful in identifying immune responses to new infectious diseases as they emerge.

https://www.biorxiv.org/content/10.1101/2022.04.26.489314v2
 
Extended to further datasets and clinical cohorts at population scale, this immune repertoire analysis strategy offers a way to refine disease definitions and improve diagnosis, and to better understand immune response features, such as autoreactivity, that are shared across different pathologies. We anticipate extending this approach to other autoimmune conditions, immunological treatment complications like transplantation rejection, and less well understood conditions suspected to have an immunological basis, like chronic fatigue syndrome. This analysis technique may be able to predict which patients respond to immuno-oncology checkpoint blockade therapy and illuminate the basis for low response rates.
 
Have to give props to an acronym that almost sounds like maladie, the French word for illness.
 
"Disease diagnostics using machine learning of B cell and T cell receptor sequences" (Science):
Abstract said:
INTRODUCTION
Conventional clinical diagnosis relies on physical examination, patient history, laboratory testing, and imaging, but makes little use of the receptors on B cells and T cells that reflect current and past exposures and responses. Microbial pathogen detection underpins infectious disease diagnosis. Other conditions are more challenging: Autoimmune diseases can require a combination of imaging studies and testing for autoantibodies and other laboratory abnormalities in the blood that may not yield a definitive disease classification. This process can be lengthy and may be complicated by initial misdiagnoses and ambiguous or overlapping symptoms between conditions.

B cell receptors (BCRs) and T cell receptors (TCRs) allow these immune cells to recognize and respond to specific antigens on pathogens and sometimes the body’s own tissues. The genes encoding BCRs and TCRs are generated by random recombination of segments in the genome of individual cells during their development, and have potential as a diverse set of sequence biomarkers associated with immune system activity. BCR and TCR populations change after exposure to pathogens, after vaccination, and in response to autoantigens in autoimmune conditions, reflecting clonal expansion and selection of B cells and T cells during immune responses. Sequencing and interpreting BCR and TCR genes could provide a single diagnostic test for simultaneous assessment of many diseases.

RATIONALE

We designed experimental protocols and a data analysis framework for identifying human BCR heavy chain and TCR beta chain features characteristic of infectious and immunological disorders or elicited by therapeutic or prophylactic interventions such as vaccination. Our method, named MAchine Learning for Immunological Diagnosis (Mal-ID), combines traditional immunological analyses, such as shared sequence detection between individuals with the same condition, with more complex features derived from artificial intelligence (AI) models of protein sequences, called protein language models. Although AI systems can be difficult to interpret, we developed ways to understand how the model makes its diagnostic predictions.
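To illustrate the "shared sequence detection between individuals with the same condition" idea, here is a minimal Python sketch under simplifying assumptions. Each individual is reduced to a set of receptor CDR3 amino-acid strings, and a sequence counts as shared only on an exact match. The function name, the min_patients threshold and the toy sequences are all hypothetical. The abstract does not describe Mal-ID's actual procedure, which may group similar rather than only identical sequences and use a statistical test rather than a hard cutoff.

```python
from collections import defaultdict


def disease_enriched_sequences(samples, disease, min_patients=2):
    """Hypothetical helper: CDR3s seen in >= min_patients donors with `disease`
    and in no donor from any other class (exact string matching only).

    samples: list of (label, iterable_of_cdr3_strings), one entry per individual.
    """
    donors_carrying = defaultdict(int)  # sequence -> number of disease-class donors carrying it
    seen_in_other_classes = set()       # sequences observed in any non-disease donor
    for label, cdr3s in samples:
        unique = set(cdr3s)  # count each donor at most once per sequence
        if label == disease:
            for seq in unique:
                donors_carrying[seq] += 1
        else:
            seen_in_other_classes |= unique
    return {
        seq
        for seq, n in donors_carrying.items()
        if n >= min_patients and seq not in seen_in_other_classes
    }


# Toy, made-up sequences purely for illustration.
samples = [
    ("Covid19", ["CARDYW", "CASSLG"]),
    ("Covid19", ["CARDYW", "CAKGGF"]),
    ("Healthy", ["CASSLG"]),
]
print(disease_enriched_sequences(samples, "Covid19"))  # {'CARDYW'}
```

Requiring complete absence from every other class is deliberately strict here to keep the example short. An enrichment test comparing carrier counts between classes would tolerate the occasional control who happens to carry the sequence.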

RESULTS

We generated large datasets of both BCR heavy chain and TCR beta chain sequences from the same individuals, spanning six disease or immune response states, to train and evaluate the Mal-ID model. Mal-ID accurately identified immune status from blood samples of 542 individuals with COVID-19, HIV, lupus, type 1 diabetes, recent flu vaccination, and healthy controls, achieving a multiclass area under the receiver operating characteristic curve (AUROC) of 0.986 on data not used for training. Combining features from both B cell and T cell receptor data led to the highest classification performance, but even with only BCR sequences, we still achieved high classification performance (0.959 AUROC in an expanded cohort adding 51 individuals for whom only BCR data were available).
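The "multiclass AUROC" figure can be unpacked as follows. One common definition treats each class in turn as "positive" versus all the rest, computes an ordinary AUROC for that split, and averages across classes. That one-vs-rest, macro-averaged definition is an assumption here, since the abstract does not say which averaging the authors used. The minimal Python sketch below uses the rank interpretation of AUROC. The function names and toy probabilities are hypothetical.

```python
def binary_auroc(pos_scores, neg_scores):
    """AUROC as the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def multiclass_auroc_ovr(labels, probs, classes):
    """Macro-averaged one-vs-rest AUROC.

    labels: true class of each sample.
    probs: one dict per sample mapping class name -> predicted probability.
    """
    per_class = []
    for c in classes:
        # Class c is "positive"; every other class is pooled as "negative".
        pos = [p[c] for y, p in zip(labels, probs) if y == c]
        neg = [p[c] for y, p in zip(labels, probs) if y != c]
        per_class.append(binary_auroc(pos, neg))
    return sum(per_class) / len(per_class)


# Toy predictions for three hypothetical samples.
classes = ["HIV", "Lupus", "Healthy"]
labels = ["HIV", "Lupus", "Healthy"]
probs = [
    {"HIV": 0.8, "Lupus": 0.1, "Healthy": 0.1},
    {"HIV": 0.2, "Lupus": 0.7, "Healthy": 0.1},
    {"HIV": 0.1, "Lupus": 0.2, "Healthy": 0.7},
]
print(multiclass_auroc_ovr(labels, probs, classes))  # 1.0
```

The pairwise loop is quadratic in sample count and only serves to make the definition explicit. In practice, scikit-learn's roc_auc_score with multi_class="ovr" computes the one-vs-rest version.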

Despite the model being trained to classify multiple heterogeneous classes, it can be specialized for detecting a particular condition. When applied to specifically distinguish patients with lupus from other patients and healthy controls, the classifier achieved 93% sensitivity and 90% specificity. This performance relative to current tests indicates the potential for BCR and TCR sequence analysis to detect clinically relevant signals.
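Sensitivity is the fraction of true lupus patients the classifier flags, TP / (TP + FN). Specificity is the fraction of everyone else that it correctly leaves unflagged, TN / (TN + FP). The following is a minimal Python sketch, assuming the model's output has already been turned into one predicted label per person (for example, by thresholding a lupus score). The function name and labels are toy data, not results from the paper.

```python
def sensitivity_specificity(y_true, y_pred, positive="Lupus"):
    """One-vs-rest sensitivity and specificity: `positive` is the target
    condition and every other label (other diseases, healthy) is negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels: 3 lupus patients, 3 other individuals.
y_true = ["Lupus", "Lupus", "Lupus", "Healthy", "HIV", "T1D"]
y_pred = ["Lupus", "Lupus", "Healthy", "Healthy", "HIV", "T1D"]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(round(sens, 2), round(spec, 2))  # 0.67 1.0
```

Because all non-lupus individuals are pooled into the negative group, the specificity value depends on which other conditions and controls that group contains.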

We used the model to provide insights into the biologically relevant features that enable accurate disease classification. Examining which sequence categories contributed most to predictions, we confirmed that the patterns discovered from the data matched established immunological knowledge. Additionally, we assessed whether the model identified individual immune receptors known to be associated with disease. The model assigned higher COVID-19 association scores to sequences from an external database of SARS-CoV-2 binding BCRs, compared with sequences from healthy donors. We also verified that batch effects and demographic factors such as age, sex, and ancestry were not responsible for disease classification performance, and the model performed well when tested on external datasets from other laboratories.

CONCLUSION

This pilot study demonstrates that immune receptor sequencing data can distinguish a range of disease states and extract biological insights without prior knowledge of antigen-specific receptor patterns. With further validation and extension, Mal-ID could lead to clinical tools that harness the vast information contained in immune receptor populations for medical diagnosis.

https://www.science.org/doi/10.1126/science.adp2407
 