Research Project
Analysis of high-throughput antibody data for better understanding of immunogenetics and epidemiology of malaria
Publications
Revisiting IgG antibody reactivity to Epstein-Barr virus in myalgic encephalomyelitis/chronic fatigue syndrome and its potential application to disease diagnosis
Publication . Sepúlveda, Nuno; Malato, João; Sotzny, Franziska; Grabowska, Anna D.; Fonseca, André; Cordeiro, Clara; Graça, Luís; Biecek, Przemyslaw; Behrends, Uta; Mautner, Josef; Westermeier, Francisco; Lacerda, Eliana M.; Scheibenbogen, Carmen
Infections by the Epstein-Barr virus (EBV) are often reported at disease onset in patients suffering from Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS). However, serological analyses of these infections remain inconclusive when comparing patients with healthy controls (HCs). In particular, it is unclear whether certain EBV-derived antigens eliciting antibody responses have biomarker potential for disease diagnosis. To address this question, we re-analyzed previously published microarray data on the IgG antibody responses against 3,054 EBV-related antigens in 92 patients with ME/CFS and 50 HCs. This re-analysis consisted of constructing different regression models for binary outcomes with the ability to classify patients and HCs. In these models, we tested for a possible interaction of different antibodies with age and gender. When analyzing the whole data set, there were no antibody responses that could distinguish patients from HCs. A similar finding was obtained when comparing patients with a non-infectious or unknown disease trigger with HCs. However, when data analysis was restricted to the comparison between HCs and patients with a putative infection at their disease onset, we could identify stronger antibody responses against two candidate antigens (EBNA4_0529 and EBNA6_0070). Using antibody responses to these two antigens together with age and gender, the final classification model had an estimated sensitivity and specificity of 0.833 and 0.720, respectively. This reliable case-control discrimination suggested the use of the antibody levels related to these candidate viral epitopes as biomarkers for disease diagnosis in this subgroup of patients. To confirm this finding, a follow-up study will be conducted in a separate cohort of patients.
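As a worked illustration of the two performance figures quoted in this abstract, the sketch below computes sensitivity and specificity from true and predicted labels. The function name and the label vectors are hypothetical. The counts were chosen only so that the ratios reproduce 0.833 and 0.720, and they are not the study's actual confusion matrix.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); 1 = patient, 0 = healthy control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # patients flagged as patients
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # patients missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: 6 patients (5 detected) and 50 controls (36 cleared).
y_true = [1] * 6 + [0] * 50
y_pred = [1] * 5 + [0] * 1 + [0] * 36 + [1] * 14

sens, spec = sensitivity_specificity(y_true, y_pred)
print(round(sens, 3), round(spec, 3))
```

Sensitivity is the fraction of patients the classifier detects, and specificity is the fraction of healthy controls it correctly clears. Either one can be raised at the expense of the other by moving the classification threshold.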
Antibody selection strategies and their impact in predicting clinical malaria based on multi-sera data
Publication . Fonseca, André; Spytek, Mikolaj; Biecek, Przemysław; Cordeiro, Clara; Sepúlveda, Nuno
Nowadays, the chance of discovering the best antibody candidates for predicting clinical malaria has notably increased due to the availability of multi-sera data. The analysis of these data is typically divided into a feature selection phase followed by a predictive one where several models are constructed for predicting the outcome of interest. A key question in the analysis is to determine which antibodies should be included in the predictive stage and whether they should be included in the original or a transformed scale (i.e. binary/dichotomized).
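The choice between the original and a dichotomized scale can be sketched as follows. This is a minimal illustration, not the procedure used in the publication: the cutoff rule (mean plus three standard deviations of a negative-control group), the function names, and all numeric values are assumptions made for the example.

```python
from statistics import mean, stdev


def seropositivity_cutoff(negative_controls, n_sd=3.0):
    """Assumed cutoff rule: mean + n_sd standard deviations of negative controls."""
    return mean(negative_controls) + n_sd * stdev(negative_controls)


def dichotomize(values, cutoff):
    """Map each antibody level to 1 (above cutoff) or 0 (at or below cutoff)."""
    return [1 if v > cutoff else 0 for v in values]


# Hypothetical antibody levels (arbitrary units).
negative_controls = [0.10, 0.12, 0.08, 0.11, 0.09]
levels = [0.05, 0.20, 0.11, 0.95]

cutoff = seropositivity_cutoff(negative_controls)
binary = dichotomize(levels, cutoff)
print(round(cutoff, 4), binary)
```

The dichotomized scale discards the magnitude of the response, so a predictive model can be fitted on either `levels` or `binary` and the two versions compared.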
IgG antibody responses to Epstein-Barr virus in myalgic encephalomyelitis/chronic fatigue syndrome: their effective potential for disease diagnosis and pathological antigenic mimicry
Publication . Fonseca, André; Szysz, Mateusz; Ly, Hoang Thien; Cordeiro, Clara; Sepúlveda, Nuno
The diagnosis and pathology of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) remain under debate. However, there is a growing body of evidence for an autoimmune component in ME/CFS caused by the Epstein-Barr virus (EBV) and other viral infections. Materials and Methods: In this work, we analyzed a large public dataset on the IgG antibodies to 3054 EBV peptides to understand whether these immune responses could help diagnose patients and whether they could trigger pathological autoimmunity; we used healthy controls (HCs) as a comparator cohort. Subsequently, we aimed at predicting the disease status of the study participants using a super learner algorithm targeting an accuracy of 85% when splitting data into train and test datasets. Results: When we compared the data of all ME/CFS patients, or the data of the subgroup of patients with non-infectious or unknown disease triggers, to the data of the HCs, we could not find an antibody-based classifier that would meet the desired accuracy in the test dataset. However, we could identify a 26-antibody classifier that could distinguish ME/CFS patients with an infectious disease trigger from the HCs with 100% and 90% accuracies in the train and test sets, respectively. We finally performed a bioinformatic analysis of the EBV peptides associated with these 26 antibodies. We found no correlation between the importance metric of the selected antibodies in the classifier and the maximal sequence homology between human proteins and each EBV peptide recognized by these antibodies. Conclusions: These 26 antibodies against EBV have an effective potential for disease diagnosis in a subset of patients. However, the peptides associated with these antibodies are less likely to induce autoimmune B-cell responses that could explain the pathogenesis of ME/CFS.
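The evaluation protocol described in this abstract (split the data, fit on the training part, check test accuracy against an 85% target) can be sketched as below. This is not the super learner used in the study: a nearest-centroid classifier is swapped in as a simple stand-in, the data are simulated, and every function name, the 30% test fraction, and the seeds are assumptions made for illustration.

```python
import random


def train_test_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle the row indices once, then split them into train and test sets."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(test_fraction * len(idx)))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def fit_centroids(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, X):
    """Assign each row to the class with the closest centroid."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in X]


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Simulated data: 50 controls around (0, 0) and 50 patients around (4, 4).
rng = random.Random(1)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
     + [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(50)])
y = [0] * 50 + [1] * 50

X_tr, y_tr, X_te, y_te = train_test_split(X, y)
model = fit_centroids(X_tr, y_tr)            # fitted on training rows only
test_acc = accuracy(y_te, predict(model, X_te))
meets_target = test_acc >= 0.85              # 85% target taken from the abstract
print(test_acc, meets_target)
```

Keeping the test rows out of the fitting step is what makes the test accuracy an honest estimate. A train accuracy of 100% alongside a lower test accuracy, as reported in the abstract, is the usual sign that some of the training fit does not generalize.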
From data to discovery: designing new pipelines for the analysis of high-throughput antibody data in Malaria and Chronic Fatigue Syndrome
Publication . Fonseca, André Filipe Afonso de Sousa; Sepúlveda, Nuno; Cordeiro, Clara
Current serological studies, in which thousands of antibodies can now be screened simultaneously, have enhanced our understanding of the immune responses to various pathogens and supported the development of better diagnostic tools and treatment strategies. Nonetheless, the complexity of such data has brought new hurdles regarding the capability of traditional statistical methods to cope with them. Although Machine Learning (ML) techniques have offered enhanced capabilities to unravel antibody biomarkers, establishing the exact identity of antibody biomarkers for certain diseases remains a struggle. This challenge underscores the pressing need for innovative methodologies to enhance the accuracy of biomarker identification, facilitating more effective diagnostics and targeted therapeutics.
In this thesis, I developed analytical pipelines for the analysis of high-throughput antibody data. To illustrate the potential of these pipelines, I focused on antibody data on Malaria and Chronic Fatigue Syndrome/Myalgic Encephalomyelitis. In general, these pipelines were based on an initial variable selection step to identify the most relevant and informative variables, followed by a predictive step where distinct classifiers would be constructed using ML-based approaches. At first, distinct approaches were applied to data with a relatively low number of antibodies to test their suitability for the analysis of such data. I then proceeded to analyze data containing thousands of antibodies. This more challenging situation motivated me to fine-tune the initial pipelines to better cope with the high dimensionality of the data. Each pipeline leveraged different statistical assumptions and yielded benefits and drawbacks, providing predictive accuracies that ranged from close to 72% up to 90% when implemented on different datasets, surpassing previously published analyses of the same data.
In conclusion, these new pipelines achieved good predictive performance in the case studies evaluated. Given that they are based on general principles of data analysis, they have the potential to increase the robustness and reproducibility of the analysis of high-dimensional antibody data.
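The variable selection step that opens each pipeline can be sketched as a univariate ranking. This is a minimal sketch under stated assumptions, not the thesis's actual method: the scoring rule (absolute difference in group means divided by the overall standard deviation), the function names, and the toy data are all hypothetical.

```python
from statistics import mean, stdev


def univariate_score(values, labels):
    """Assumed criterion: |mean(cases) - mean(controls)| / overall SD."""
    cases = [v for v, t in zip(values, labels) if t == 1]
    controls = [v for v, t in zip(values, labels) if t == 0]
    spread = stdev(values)
    return abs(mean(cases) - mean(controls)) / spread if spread > 0 else 0.0


def select_top_k(X, y, k):
    """Return the column indices of the k highest-scoring antibodies."""
    scores = [univariate_score(col, y) for col in zip(*X)]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 6 participants x 3 antibodies.
# Only the antibody in column 1 differs clearly between the two groups.
X_train = [
    [0.5, 0.1, 0.3],
    [0.4, 0.2, 0.6],
    [0.6, 0.1, 0.4],
    [0.5, 0.9, 0.5],
    [0.4, 0.8, 0.3],
    [0.6, 1.0, 0.6],
]
y_train = [0, 0, 0, 1, 1, 1]

selected = select_top_k(X_train, y_train, k=1)
X_reduced = [[row[j] for j in selected] for row in X_train]
print(selected)
```

In a full pipeline, the selection would be run on the training data only and the resulting indices reused on the test data, so that the predictive step is not evaluated on information it has already seen.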
Funders
Funding agency
Fundação para a Ciência e a Tecnologia
Funding programme
OE
Funding Award Number
SFRH/BD/147629/2019