Research Project

Analysis of high-throughput antibody data for better understanding of immunogenetics and epidemiology of malaria

Authors

Publications

Revisiting IgG antibody reactivity to Epstein-Barr virus in myalgic encephalomyelitis/chronic fatigue syndrome and its potential application to disease diagnosis
Publication . Sepúlveda, Nuno; Malato, João; Sotzny, Franziska; Grabowska, Anna D.; Fonseca, André; Cordeiro, Clara; Graça, Luís; Biecek, Przemyslaw; Behrends, Uta; Mautner, Josef; Westermeier, Francisco; Lacerda, Eliana M.; Scheibenbogen, Carmen
Infections by the Epstein-Barr virus (EBV) are often at the disease onset of patients suffering from Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS). However, serological analyses of these infections remain inconclusive when comparing patients with healthy controls (HCs). In particular, it is unclear if certain EBV-derived antigens eliciting antibody responses have a biomarker potential for disease diagnosis. With this purpose, we re-analyzed previously published microarray data on the IgG antibody responses against 3,054 EBV-related antigens in 92 patients with ME/CFS and 50 HCs. This re-analysis consisted of constructing different regression models for binary outcomes with the ability to classify patients and HCs. In these models, we tested for a possible interaction of different antibodies with age and gender. When analyzing the whole data set, there were no antibody responses that could distinguish patients from healthy controls. A similar finding was obtained when comparing patients with non-infectious or unknown disease trigger with healthy controls. However, when data analysis was restricted to the comparison between HCs and patients with a putative infection at their disease onset, we could identify stronger antibody responses against two candidate antigens (EBNA4_0529 and EBNA6_0070). Using antibody responses to these two antigens together with age and gender, the final classification model had an estimated sensitivity and specificity of 0.833 and 0.720, respectively. This reliable case-control discrimination suggested the use of the antibody levels related to these candidate viral epitopes as biomarkers for disease diagnosis in this subgroup of patients. To confirm this finding, a follow-up study will be conducted in a separate cohort of patients.
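For readers unfamiliar with the two performance measures quoted in this abstract, the following minimal Python sketch shows how sensitivity and specificity are typically computed from true and predicted class labels. The function name and the toy labels are hypothetical and purely illustrative; they are not the study data and this is not the authors' code.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed coding (illustrative): 1 = patient, 0 = healthy control.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # patients correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # patients missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls wrongly flagged
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Sensitivity is the fraction of patients the classifier identifies, and specificity is the fraction of healthy controls it correctly clears.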
Antibody selection strategies and their impact in predicting clinical malaria based on multi-sera data
Publication . Fonseca, André; Spytek, Mikolaj; Biecek, Przemysław; Cordeiro, Clara; Sepúlveda, Nuno
Nowadays, the chance of discovering the best antibody candidates for predicting clinical malaria has notably increased due to the availability of multi-sera data. The analysis of these data is typically divided into a feature selection phase followed by a predictive one where several models are constructed for predicting the outcome of interest. A key question in the analysis is to determine which antibodies should be included in the predictive stage and whether they should be included in the original or a transformed scale (i.e. binary/dichotomized).
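As an illustration of what including an antibody on a transformed (binary/dichotomized) scale can mean, the sketch below converts continuous antibody levels into 0/1 indicators using a cutoff. The cutoff value, the function name, and the measurements are assumptions made for this example; the abstract does not state how cutoffs were chosen.

```python
def dichotomize(values, cutoff):
    """Map continuous antibody levels to 1 (above cutoff) or 0 (at or below it)."""
    return [1 if v > cutoff else 0 for v in values]


# Hypothetical antibody levels for five individuals (not real data).
antibody_levels = [0.12, 0.85, 0.40, 1.30, 0.05]
cutoff = 0.5  # assumed threshold, for illustration only

seropositive = dichotomize(antibody_levels, cutoff)
print(seropositive)
```

A predictive model could then take either `antibody_levels` or `seropositive` as an input feature, which is the choice the abstract refers to.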
IgG Antibody responses to Epstein-Barr Virus in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome: Their effective potential for disease diagnosis and pathological antigenic mimicry
Publication . Fonseca, André; Szysz, Mateusz; Ly, Hoang Thien; Cordeiro, Clara; Sepúlveda, Nuno
The diagnosis and pathology of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) remain under debate. However, there is a growing body of evidence for an autoimmune component in ME/CFS caused by the Epstein-Barr virus (EBV) and other viral infections. Materials and Methods: In this work, we analyzed a large public dataset on the IgG antibodies to 3054 EBV peptides to understand whether these immune responses could help diagnose patients and trigger pathological autoimmunity; we used healthy controls (HCs) as a comparator cohort. Subsequently, we aimed at predicting the disease status of the study participants using a super learner algorithm targeting an accuracy of 85% when splitting data into train and test datasets. Results: When we compared the data of all ME/CFS patients or the data of a subgroup of those patients with non-infectious or unknown disease triggers to the data of the HCs, we could not find an antibody-based classifier that would meet the desired accuracy in the test dataset. However, we could identify a 26-antibody classifier that could distinguish ME/CFS patients with an infectious disease trigger from the HCs with 100% and 90% accuracies in the train and test sets, respectively. We finally performed a bioinformatic analysis of the EBV peptides associated with these 26 antibodies. We found no correlation between the importance metric of the selected antibodies in the classifier and the maximal sequence homology between human proteins and each EBV peptide recognized by these antibodies. Conclusions: These 26 antibodies against EBV have an effective potential for disease diagnosis in a subset of patients. However, the peptides associated with these antibodies are less likely to induce autoimmune B-cell responses that could explain the pathogenesis of ME/CFS.
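The train/test evaluation mentioned in this abstract can be sketched generically as follows. This is only an outline of splitting data and scoring accuracy under assumed settings (a 30% test fraction and a fixed random seed, both hypothetical); it does not implement the super learner algorithm and is not the authors' code.

```python
import random


def split_train_test(samples, test_fraction, seed):
    """Shuffle a copy of the samples and split it into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical sample identifiers, for illustration only.
sample_ids = list(range(10))
train_ids, test_ids = split_train_test(sample_ids, test_fraction=0.3, seed=0)

# Hypothetical true and predicted labels for a four-sample test set.
test_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(len(train_ids), len(test_ids), test_accuracy)
```

Reporting accuracy separately on the train and test sets, as the abstract does, helps reveal overfitting: a large gap between the two suggests the classifier has memorized the training data.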
From data to discovery: designing new pipelines for the analysis of high-throughput antibody data in Malaria and Chronic Fatigue Syndrome
Publication . Fonseca, André Filipe Afonso de Sousa; Sepúlveda, Nuno; Cordeiro, Clara
Current serological studies, where thousands of antibodies can now be simultaneously screened, have enhanced our understanding of the immune responses to various pathogens and supported the development of better diagnostic tools and treatment strategies. Nonetheless, the complexity of such data has brought new hurdles regarding the capability of traditional statistical methods to cope with it. Although Machine Learning (ML) techniques have offered enhanced capabilities to unravel antibody biomarkers, identifying the exact antibody biomarkers for certain diseases remains a struggle. This challenge underscores the pressing need for innovative methodologies to enhance the accuracy of biomarker identification, facilitating more effective diagnostics and targeted therapeutics. In this thesis, I developed analytical pipelines for the analysis of high-throughput antibody data. To illustrate the potential of these pipelines, I focused on antibody data on Malaria and Chronic Fatigue Syndrome/Myalgic Encephalomyelitis. In general, these pipelines were based on an initial variable selection step to identify the most relevant and informative variables, followed by a predictive step where distinct classifiers were constructed using ML-based approaches. At first, distinct approaches were applied to data with a relatively low number of antibodies to test their suitability for the analysis of such data. I then proceeded to analyze data containing thousands of antibodies. This more challenging situation motivated me to fine-tune the initial pipelines to better cope with the high dimensionality of the data. Each pipeline leveraged different statistical assumptions and yielded benefits and drawbacks, providing predictive accuracies that ranged from close to 72% up to 90% when implemented on different datasets, surpassing previously published analyses of the same data.
In conclusion, these new pipelines generated a good predictive performance in the case studies evaluated. Given that they are based on general principles of data analysis, they have the potential to increase the robustness and reproducibility of the analysis of high-dimensional antibody data.

Organizational Units

Description

Keywords

Contributors

Funders

Funding agency

Fundação para a Ciência e a Tecnologia

Funding programme

OE

Funding Award Number

SFRH/BD/147629/2019
