AI/ML methods to provide interpretable explanations and new knowledge for rare diseases.

More than 7000 rare diseases are known to date, and for most of them relevant, high-quality data are lacking. One reason is that any particular rare disease has only a few diagnosed patients worldwide (small cohorts), and because these patients live all across the globe, it is difficult to carry out clinical observations and the clinical data collection that builds on them. At the same time, rapid progress in gene therapies has increased the interest of biotech and pharma companies in disease-specific data, but such data collection is very hard to conduct.

Data collection has nevertheless improved in recent years, with platforms now collecting rare-disease-specific data. These data are not collected in a clinical setting and are labelled real-world data (RWD), as they reflect real-life experience and are contributed by citizens. RWD include not only lifestyle data (diet, sleep monitoring, etc.) collected through fitness trackers and smartwatches, but also patient-reported outcomes (PROs), i.e. data reported by patients or caregivers. Rare disease platforms that collect PROs usually rely on already approved or validated questionnaires. Because patients and caregivers can answer questions online and at their own pace, these platforms are a convenient way to reach as many patients with a specific rare disease as possible worldwide, a reach that is very hard to achieve in classical in-person clinical settings. However, the collected data are not yet fully exploited, as platforms mainly focus on data collection rather than on data analytics. As a result, the full potential of PROs for rare diseases has yet to be realized. In addition, clinicians are not yet convinced that RWD PROs can be used for clinical research, and this is something we would like to change.

The main objective is to develop AI/ML methods to provide interpretable explanations and new knowledge for rare diseases. The focus will be on researching AI/ML methodologies on top of PROs, with the aim of showing what information the collected data contain and how to present it to clinicians in a structured, insightful, and helpful way. Our use case is the Genida registry (Genetic of Intellectual Disability and Autism Spectrum Disorders registry, managed by external partner IGBMC), which collects caregiver-reported data, as the rare disease patients it covers are children and/or adults with intellectual disabilities. Our specific focus is the Kleefstra syndrome cohort, with data for 200 Kleefstra syndrome patients from all continents. To date, these data represent the largest database of Kleefstra syndrome patients and their clinical features. Another important feature is that Genida collects data longitudinally, so correlations between symptoms across different time frames can be studied.

Kleefstra syndrome is relatively new: it was discovered in 2010 by clinical geneticist Prof. Tjitske Kleefstra from the Netherlands (external partner Erasmus MC). It belongs to the group of neurodevelopmental disorders (NDDs). Given the rapidly evolving field of genetics, and especially the technological advances in genome sequencing, it is no wonder that NDDs represent the majority of rare diseases. All of these diseases are immensely under-researched, so now is the time for AI/ML methodologies to deliver the new insights that are so badly needed.

For better UX, we will also build on human-computer interaction. The results will be shown to the user (e.g. a clinician), who will be able to query the system about the results and about how and why they were obtained. The system would show the features that help explain a result (e.g. which words were the most frequent in the cluster).
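
As a minimal sketch of the explanation idea mentioned above (showing which words were the most frequent in a cluster), the snippet below counts words per cluster for a handful of free-text answers. Everything in it is an assumption made for illustration: the function names `tokenize` and `top_words_per_cluster`, the small stop-word list, the example answers, and the cluster labels are invented and do not come from the Genida registry or from the project's actual methodology. Cluster labels are taken as given; how they would be produced is not specified here.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "has", "with", "at"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_words_per_cluster(texts, labels, k=3):
    """Return, for each cluster label, its k most frequent words with counts."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        counts[label].update(tokenize(text))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    return {
        label: sorted(counter.items(), key=lambda item: (-item[1], item[0]))[:k]
        for label, counter in counts.items()
    }


# Invented example answers and cluster labels, for illustration only.
responses = [
    "sleep problems at night",
    "trouble falling asleep, sleep is short",
    "speech delay and few words",
    "delay in speech development",
]
cluster_labels = [0, 0, 1, 1]

explanations = top_words_per_cluster(responses, cluster_labels, k=2)
for label, words in sorted(explanations.items()):
    print(f"cluster {label}: {words}")
```

Raw word counts are only one possible explanation feature; they are used here because they are easy for a non-technical user to read and to question.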

Output

This micro project will develop new AI/ML research methodologies that enable new insights into rare diseases. The Kleefstra syndrome cohort, with data for 200 Kleefstra syndrome patients from all over the world, will serve as our use case, and the research results will be presented as a good-practice example to clinicians, researchers, and rare disease patient advocacy organizations. With these results, we want to encourage further and wider participation of patients and caregivers in data collection, and the use of these data in the clinical and research work of clinicians and researchers. For better UX, we will also build on human-computer interaction ideas: the micro project results will be shown to the user (e.g. a clinician) through a user interface (UI), and the user will be able to ask the system why the results are as they are. The system would show the features that help explain a result (e.g. which words were the most frequent in the cluster).

Main results of the micro project: the developed research methodologies will enable new insights into rare diseases through data analysis and AI/ML, and will serve the whole rare disease community.

Tangible outputs:
– scientific publication
– a tangible result will be made available through the AI4EU (AI4Europe) platform

Project Partners

  • Jožef Stefan Institute, Erik Novak
  • Erasmus MC, Tjitske Kleefstra
  • IGBMC, Pauline Burger
  • IDefine Europe, Martin Draksler

Primary Contact

Tanja Zdolšek Draksler, Jožef Stefan Institute