SciNoBo: An AI system collaborating with Journalists in Science Communication (resubmission)

Science communication conveys scientific findings and informs the general public, policymakers and other non-expert groups about research developments, raising interest, trust in science and engagement on societal problems (e.g., the United Nations Sustainable Development Goals). In this context, evidence-based science communication isolates topics of interest from the scientific literature, frames the relevant evidence and disseminates it to targeted non-scholarly audiences through a wide range of communication channels and strategies.

The proposed microproject (MP) focusses on science journalism and public outreach on scientific topics in Health and Climate Change. The MP will bring together science communicators (e.g., science journalists, policy analysts, science advisors for policymakers and other actors) and enable them to interact with an AI system capable of identifying statements about Health and Climate in mass media, grounding them in scientific evidence and simplifying the language of scientific discourse by reducing the complexity of the text while preserving its meaning and information.

Technologically, we plan to build on our previous MP work on neuro-symbolic Q&A (*) and to exploit and advance recent developments in instruction fine-tuning of large language models, retrieval augmentation and natural language understanding, specifically the NLP areas of argumentation mining, claim verification and text (i.e., lexical and syntactic) simplification.

The proposed MP addresses the topic of “Collaborative AI” by developing an AI system equipped with innovative NLP tools that can collaborate with humans (i.e., science communicators, SCs) who communicate statements on Health and Climate Change topics, grounding those statements in scientific evidence (interactive grounding) and providing explanations in simplified language, thus supporting SCs in science communication. The AI solution will be tested in a real-world scenario in collaboration with OpenAIRE, using the OpenAIRE Research Graph (ORG) services on Open Science publications.

The proposed work is divided into two phases running in parallel. The main focus of Phase I is the construction of the data collections and the adaptations and improvements needed in PDF processing tools. Phase II deals with the development of the two subsystems, claim analysis and text simplification, as well as their evaluation.

Phase I
Two collections, one of news stories and one of scientific publications, will be compiled in the areas of Health and Climate. The news collection will be built from an existing dataset of news stories, using ARC's automated classification system to select stories in the areas of interest. The publications collection will be provided by the OpenAIRE ORG service and further processed, managed and indexed by the ARC SciNoBo toolkit. A small-scale annotation effort by DFKI is foreseen in support of the simplification subsystem.

Phase II
In Phase II, we will develop, advance, fine-tune and evaluate the two subsystems. Concretely, the “claim analysis” subsystem encompasses (i) ARC's previous work on “claim identification”, (ii) a retrieval engine fetching relevant scientific publications (based on our previous MP), and (iii) an evidence-synthesis module indicating whether the fetched publications, and the scientists’ claims therein, support or refute the news claim under examination.
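For illustration only, the three stages of the “claim analysis” subsystem can be sketched as a toy pipeline. The minimal Python sketch below assumes keyword cues and token overlap as stand-ins for the actual models; every name in it (`Publication`, `identify_claims`, `retrieve`, `synthesize`, the cue lists) is hypothetical and does not reflect the SciNoBo implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Publication:
    pub_id: str
    abstract: str


# Hypothetical cue lists, chosen only to keep the example small.
CLAIM_CUES = {"causes", "reduces", "increases", "prevents", "leads"}
NEGATION_CUES = {"no", "not", "without", "fails"}


def tokenize(text):
    """Return the set of lowercase alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def identify_claims(news_text):
    """Stage (i): keep sentences that contain a claim cue word."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", news_text) if s.strip()]
    return [s for s in sentences if tokenize(s) & CLAIM_CUES]


def retrieve(claim, publications, top_k=2):
    """Stage (ii): rank publications by token overlap with the claim."""
    claim_tokens = tokenize(claim)
    scored = [(len(claim_tokens & tokenize(p.abstract)), p) for p in publications]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def synthesize(claim, evidence):
    """Stage (iii): label each publication as supporting or refuting.

    A crude polarity check (assumption): a publication "refutes" the
    claim when exactly one of the two texts contains a negation cue.
    """
    claim_negated = bool(tokenize(claim) & NEGATION_CUES)
    labels = {}
    for pub in evidence:
        pub_negated = bool(tokenize(pub.abstract) & NEGATION_CUES)
        labels[pub.pub_id] = "refutes" if pub_negated != claim_negated else "supports"
    return labels


# Usage with made-up data.
news = "The minister spoke on Monday. Regular exercise reduces blood pressure."
pubs = [
    Publication("P1", "Exercise reduces blood pressure in adults."),
    Publication("P2", "Exercise does not reduce blood pressure in this cohort."),
    Publication("P3", "Sea level rise along coasts."),
]
for claim in identify_claims(news):
    print(claim, synthesize(claim, retrieve(claim, pubs)))
```

In a realistic setting each stage would be replaced by a trained component (for example a claim classifier, a dense or sparse retriever and a stance or entailment model); the sketch only shows how the outputs of one stage feed the next.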
For the “text simplification” subsystem, DFKI will examine both lexical and syntax-based representations, explore their contribution to text simplification and evaluate (neural) simplification models on the Eval dataset. Phase II work will be led by ARC in collaboration with DFKI and OpenAIRE.
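To make the lexical side of text simplification concrete, the following minimal Python sketch replaces specialist terms with plainer equivalents. It is an assumption-laden illustration, not the planned (neural) approach: the lexicon `SIMPLER` and the function `simplify_lexically` are hypothetical, and syntactic simplification is not covered.

```python
import re

# Hypothetical lexicon mapping specialist terms to plainer wording.
SIMPLER = {
    "hypertension": "high blood pressure",
    "myocardial infarction": "heart attack",
    "anthropogenic": "human-caused",
}


def simplify_lexically(text, lexicon=SIMPLER):
    """Replace each lexicon term found in the text with its plainer form."""

    def swap(match, replacement):
        # Keep a leading capital if the original term had one.
        if match.group(0)[0].isupper():
            return replacement[0].upper() + replacement[1:]
        return replacement

    # Longest terms first, so multi-word terms are matched before shorter ones.
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        text = pattern.sub(lambda m, r=lexicon[term]: swap(m, r), text)
    return text


print(simplify_lexically("Hypertension raises the risk of myocardial infarction."))
```

A dictionary lookup of this kind cannot judge whether the meaning is preserved in context, which is one reason an evaluation of simplification models on a dedicated dataset is needed.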

Ethics: The MP uses AI but raises no ethical concerns related to human rights and values.

(*): Combining symbolic and sub-symbolic approaches – Improving neural QA-Systems through Document Analysis for enhanced accuracy and efficiency in Human-AI interaction.


Paper(s) in Conferences:
We plan to submit at least two papers about the “claim analysis” and the “text simplification” subsystems.

Practical demonstrations, tools:
A full-fledged demonstrator showing the supported functionality will be available (expected in the last month of the project).

Project Partners

  • ILSP/ATHENA RC, Haris Papageorgiou
  • German Research Centre for Artificial Intelligence (DFKI), Julián Moreno Schneider
  • OpenAIRE, Natalia Manola

Primary Contact

Haris Papageorgiou, ILSP/ATHENA RC