SciNoBo: An AI system collaborating with Journalists in Science Communication (resubmission)

Science communication conveys scientific findings and informs the general public, policymakers and other non-expert groups about research developments, raising interest, trust in science and engagement on societal problems (e.g., the United Nations Sustainable Development Goals). In this context, evidence-based science communication isolates topics of interest from the scientific literature, frames the relevant evidence and disseminates it to targeted non-scholarly audiences through a wide range of communication channels and strategies.

The proposed microproject (MP) focusses on science journalism and public outreach on scientific topics in Health and Climate Change. The MP will bring together science communicators (e.g., science journalists, policy analysts, science advisors for policymakers, other actors) and an AI system, and enable interactions between them. The AI system will be capable of identifying statements about Health and Climate in mass media, grounding them in scientific evidence, and simplifying the language of the scientific discourse by reducing the complexity of the text while preserving its meaning and information.

Technologically, we plan to build on our previous MP work on neuro-symbolic Q&A (*) and to further exploit and advance recent developments in instruction fine-tuning of large language models, retrieval augmentation and natural language understanding, specifically the NLP areas of argumentation mining, claim verification and text (i.e., lexical and syntactic) simplification.

The proposed MP addresses the topic of “Collaborative AI” by developing an AI system equipped with innovative NLP tools that can collaborate with humans (i.e., science communicators, SCs) who communicate statements on Health & Climate Change topics. The system grounds these statements in scientific evidence (Interactive grounding) and provides explanations in simplified language, thus supporting SCs in science communication. The innovative AI solution will be tested in a real-world scenario in collaboration with OpenAIRE, employing OpenAIRE research graph (ORG) services on Open Science publications.

The proposed work is divided into two phases running in parallel. The main focus in phase I is the construction of the data collections and the adaptations and improvements needed in PDF processing tools. Phase II deals with the development of the two subsystems: claim analysis and text simplification as well as their evaluation.

Phase I
Two collections, one of News and one of scientific publications, will be compiled in the areas of Health and Climate. The News collection will be built based on an existing dataset of News stories and ARC's automated classification system in the areas of interest. The second collection, with publications, will be provided by the OpenAIRE ORG service and further processed, managed and properly indexed by the ARC SciNoBo toolkit. A small-scale annotation is foreseen by DFKI in support of the simplification subsystem.

Phase II
In phase II, we will develop/advance, fine-tune and evaluate the two subsystems. Concretely, the “claim analysis” subsystem encompasses (i) ARC's previous work on “claim identification”, (ii) a retrieval engine fetching relevant scientific publications (based on our previous MP), and (iii) an evidence-synthesis module indicating whether the publications fetched, and the scientists’ claims therein, support or refute the News claim under examination.
For the “text simplification” subsystem, DFKI will examine both lexical and syntax-based representations, exploring their contribution to text simplification and evaluating (neural) simplification models on the Eval dataset. Phase II work will be led by ARC in collaboration with DFKI and OpenAIRE.
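The three stages of the claim analysis subsystem described above (claim in, retrieval, evidence synthesis) can be sketched roughly as follows. This is only a toy illustration under strong assumptions, not the SciNoBo implementation: the names `Publication`, `retrieve` and `synthesize` are hypothetical, retrieval is a simple bag-of-words cosine similarity, and each publication carries a pre-assigned stance label standing in for whatever stance model the project will actually use.

```python
from collections import Counter
from dataclasses import dataclass
import math
import re


@dataclass
class Publication:
    pub_id: str
    text: str
    stance: str  # assumed placeholder label: "support", "refute" or "neutral"


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(claim, publications, top_k=2):
    """Return up to top_k publications that share vocabulary with the claim."""
    query = Counter(tokenize(claim))
    scored = [(cosine(query, Counter(tokenize(p.text))), p) for p in publications]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def synthesize(evidence):
    """Majority vote over the stance labels of the retrieved publications."""
    votes = Counter(p.stance for p in evidence if p.stance in ("support", "refute"))
    if not votes or votes["support"] == votes["refute"]:
        return "insufficient evidence"
    return "supported" if votes["support"] > votes["refute"] else "refuted"


# Made-up example data, for illustration only.
pubs = [
    Publication("p1", "Regular physical exercise lowers blood pressure in adults", "support"),
    Publication("p2", "Exercise programmes reduce blood pressure in adults", "support"),
    Publication("p3", "Sea level rise accelerates coastal erosion", "neutral"),
]
claim = "Exercise lowers blood pressure"
evidence = retrieve(claim, pubs)
verdict = synthesize(evidence)
print([p.pub_id for p in evidence], verdict)
```

In a real system each stage would be replaced by a learned component; the sketch only shows how the stages hand results to one another.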

Ethics: AI is used but without raising ethical concerns related to human rights and values.

(*): Combining symbolic and sub-symbolic approaches – Improving neural QA-Systems through Document Analysis for enhanced accuracy and efficiency in Human-AI interaction.


Paper(s) in Conferences:
We plan to submit at least two papers about the “claim analysis” and the “text simplification” subsystems.

Practical demonstrations, tools:
A full-fledged demonstrator showing the supported functionality will be available (expected in the last month of the project).

Project Partners

  • ILSP/ATHENA RC, Haris Papageorgiou
  • German Research Centre for Artificial Intelligence (DFKI), Julián Moreno Schneider
  • OpenAIRE, Natalia Manola

Primary Contact

Haris Papageorgiou, ILSP/ATHENA RC

AI/ML methods to provide interpretable explanations and new knowledge for rare diseases.

To date, more than 7000 rare diseases are known, and for the majority of them relevant, high-quality data are lacking. One reason is that only a few patients worldwide are diagnosed with any particular rare disease (small cohorts); because these patients live all across the globe, clinical observation, and the clinical data collection that depends on it, is difficult to carry out. At the same time, the rapid development of gene therapies has increased the interest of biotech and pharma companies in disease-specific data, but such data are very hard to collect. In recent years there has been a positive shift, with platforms that collect rare-disease-specific data. These data are not collected in a clinical setting and are labelled real-world data (RWD), as they represent real-life insights and are contributed by citizens. RWD include not only lifestyle data (diet, sleep monitoring, etc.) collected through fitness trackers and smartwatches, but also PROs (patient/caregiver-reported data). Rare disease platforms that collect PROs usually use already approved/validated questionnaires. Because patients and caregivers can answer questions online and at their own pace, these platforms are a convenient way to reach as many patients with a specific rare disease as possible worldwide, which is very hard to achieve in classical in-person clinical settings.
However, the collected data are not yet fully exploited, as the platforms focus mainly on data collection rather than on data analytics. As a result, the full potential of PROs for rare diseases has yet to be realized. In addition, clinicians are not yet convinced that RWD PROs can be used for clinical research, and this is something we would like to change. The main objective is to develop AI/ML methods to provide interpretable explanations and new knowledge for rare diseases.
The focus will be on researching AI/ML methodologies on top of PROs, with the aim of showing what information the collected data contain and how to present it to clinicians in a structured, insightful, and helpful way. Our use case is the Genida registry (Genetic of Intellectual Disability and Autism Spectrum Disorders registry, managed by external partner IGBMC), which collects caregiver-reported data, as the rare disease patients covered are children and/or adults with intellectual disabilities. Our specific focus is the Kleefstra syndrome cohort, comprising data for 200 Kleefstra syndrome patients from all continents. To date, this is the largest database of Kleefstra syndrome patients and their clinical features. Another important feature is that Genida collects data longitudinally, so correlations between symptoms across different time frames can be studied. For better UX, we will also build on human-computer interaction: the results will be shown to the user (e.g. a clinician), who can then ask the system how and why the results were produced. The system would show the features that help explain the result (e.g. which words were the most frequent in the cluster).
Kleefstra syndrome was discovered in 2010 by clinical geneticist Prof. Tjitske Kleefstra from the Netherlands (external partner Erasmus MC), so it is relatively new. It belongs to the group of neurodevelopmental disorders (NDDs). With the rapidly evolving field of genetics, especially the technological advancements in genome sequencing, it is no wonder that NDDs represent the majority of rare diseases. Now it is time for AI/ML methodologies to contribute the new insights that are so much needed, as all of these diseases are severely under-researched.
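The kind of explanation mentioned above, showing which words were most frequent in a cluster, could look roughly like the following sketch. It assumes that free-text caregiver answers have already been assigned cluster labels by some clustering step that is not specified here; the function names, the tiny stopword list and the example answers are all hypothetical and made up for illustration.

```python
from collections import Counter
import re

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "and", "of", "in", "is", "has", "with", "at", "to"}


def top_words(texts, n=3):
    """Count non-stopword tokens over all texts and return the n most frequent."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


def explain_clusters(answers, labels, n=3):
    """Group answers by cluster label and list each cluster's most frequent words."""
    clusters = {}
    for text, label in zip(answers, labels):
        clusters.setdefault(label, []).append(text)
    return {label: top_words(texts, n) for label, texts in clusters.items()}


# Synthetic placeholder answers and labels, not real registry data.
answers = [
    "sleep problems at night",
    "frequent sleep problems",
    "delayed speech",
    "speech delayed and limited",
]
labels = [0, 0, 1, 1]

explanation = explain_clusters(answers, labels, n=2)
for label, words in explanation.items():
    print(label, words)
```

A clinician-facing interface could display these per-cluster word lists next to each result as a first, easily interpretable answer to the question of why a group of answers was clustered together.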


This micro project will develop new AI/ML research methodologies enabling new insights into rare diseases. The Kleefstra syndrome cohort, comprising data for 200 Kleefstra syndrome patients from all over the world, will serve as our use case, and the research results will be presented as a good-practice example to clinicians, researchers, and rare disease patient advocacy organizations. With the results, we want to encourage further and wider participation of patients/caregivers in the data collection processes and the use of these data in the clinical and research work of clinicians and researchers. For better UX, we will also build on human-computer interaction ideas: the micro project results will be shown to the user (e.g. a clinician) through a user interface (UI), and the user will be able to ask the system why the results are as they are. The system would show the features that help explain the result (e.g. which words were the most frequent in the cluster).
Main results of the micro project: The developed research methodologies will enable new insights into rare diseases through data analysis and AI/ML, and will serve the whole rare disease community.
Tangible outputs:
– scientific publication
– a tangible result will be made available through the AI4EU (AI4Europe) platform

Project Partners

  • Jožef Stefan Institute, Erik Novak
  • Erasmus MC, Tjitske Kleefstra
  • IGBMC, Pauline Burger
  • IDefine Europe, Martin Draksler

Primary Contact

Tanja Zdolšek Draksler, Jožef Stefan Institute

We are going to build and evaluate a novel AI aviation assistant that supports (general) aviation pilots with key flight information that facilitates decision making, placing particular emphasis on its efficient and effective visualization in 3D space.

Pilots frequently need to react to unforeseen in-flight events. Making adequate decisions in such situations requires considering all available information and demands strong situational awareness. Modern on-board computers and technologies like GPS have radically improved pilots’ ability to take appropriate actions and have lowered their workload in recent years. Yet current technologies used in aviation cockpits generally still fail to adequately map and represent 3D airspace. In response, we aim to create an AI aviation assistant that considers all relevant aircraft operation data and focuses on providing tangible action recommendations and on visualizing them for efficient and effective interpretation in 3D space. In particular, we note that extended reality (XR) applications provide an opportunity to augment pilots’ perception through live 3D visualizations of key flight information, including airspace structure, traffic information, airport highlighting, and traffic patterns. While XR applications have been tested in aviation in the past, they are mostly limited to military aviation and the latest commercial aircraft. This ignores the majority of pilots, in particular those in general aviation, where such support could drastically increase situational awareness and lower pilot workload. General aviation is characterized as the non-commercial branch of aviation, often relating to single-engine and single-pilot operations.
To develop applications usable across aviation domains, we plan to create a Unity project for XR glasses. Based on this, we plan, as a first step, to systematically and iteratively explore suitable AI-based support, based on pilot feedback, in a virtual reality study in a flight simulator. Based on our findings, we will refine the Unity application and investigate opportunities to conduct a real test flight with our external partner ENAC, the French National School of Civil Aviation, who own a plane. Such a test flight would most likely use the latest Augmented Reality headsets, such as the HoloLens 2. Considering the immense safety requirements for such a real test flight, this part of the project is considered optional at this stage and depends on the findings from the preceding virtual reality evaluation.
The system development will particularly focus on the use of XR techniques to create more effective AI-supported traffic advisories and visualizations. With this, we want to advance the coordination and collaboration of AI with human partners, establishing a common ground as a basis for multimodal interaction with AI (WP3 motivated). Further, the MP relates closely to “Innovation projects (WP6&7 motivated)”, calling for solutions that address “real-world challenges and opportunities in various domains such as (…) transportation […]”.


– Requirements and a prototype implementation for an AI-based assistant that provides recommendations and shows selected flight information based on pilot workload and current flight parameters
– A Unity project that implements an extended reality support tool for (general) aviation and that is used for evaluation in simulators (Virtual Reality) and possibly for a real test flight at ENAC (Augmented Reality)
– Findings from the simulator study and design recommendations
– (Optional) Impressions from a real test flight at ENAC
– A research paper detailing the system and the findings

Project Partners

  • Ludwig-Maximilians-Universität München (LMU), Florian Müller
  • Ecole Nationale de l'Aviation Civile (ENAC), Anke Brock

Primary Contact

Florian Müller, Ludwig-Maximilians-Universität München (LMU)