Contact person: Haris Papageorgiou (haris@athenarc.gr)
Internal Partners:
- ATHENA RC, Haris Papageorgiou
- German Research Centre for Artificial Intelligence (DFKI), Julián Moreno Schneider
- OpenAIRE, Natalia Manola
SciNoBo is a microproject (MP) that aims to improve science communication, particularly on Health and Climate Change topics, by integrating AI systems with science journalism. It assists science communicators (SCs), such as journalists and policymakers, by using AI to identify, verify, and simplify complex scientific statements found in mass media. Grounding these statements in scientific evidence helps ensure that information reaches non-expert audiences accurately.
Technologically, we build on our previous MP work on neuro-symbolic Q&A (*) and further exploit and advance recent developments in instruction fine-tuning of large language models, retrieval augmentation, and natural language understanding, specifically the NLP areas of argumentation mining, claim verification, and text (i.e., lexical and syntactic) simplification.
The MP addresses the topic of “Collaborative AI” by developing an AI system equipped with NLP tools that collaborates with humans (i.e., SCs) who communicate statements on Health and Climate Change topics. The system grounds these statements in scientific evidence (interactive grounding) and provides explanations in simplified language, thereby supporting SCs in science communication. The solution will be tested in a real-world scenario in collaboration with OpenAIRE, using OpenAIRE Research Graph (ORG) services on Open Science publications.
Results Summary
The project is divided into two phases that ran in parallel. Phase I focuses on constructing the data collections and on the adaptations and improvements needed in PDF processing tools. Phase II covers the development and evaluation of the two subsystems: claim analysis and text simplification.
- Phase I: Two collections, one of news and one of scientific publications, will be compiled in the areas of Health and Climate. The news collection will be built from an existing dataset of news stories and ARC's automated classification system for the areas of interest. The publications collection will be provided by the OpenAIRE ORG service and further processed, managed, and indexed by the ARC SciNoBo toolkit. A small-scale annotation by DFKI is foreseen in support of the simplification subsystem.
- Phase II: We developed, fine-tuned, and evaluated the two subsystems. Concretely, the “claim analysis” subsystem comprises (i) ARC's previous work on “claim identification”, (ii) a retrieval engine that fetches relevant scientific publications (based on our previous microproject), and (iii) an evidence-synthesis module that indicates whether the fetched publications, and the scientists’ claims therein, support or refute the news claim under examination.
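The three-stage shape of the claim analysis subsystem can be illustrated with a minimal sketch. This is not the project's implementation: the actual system relies on fine-tuned language models and a dedicated retrieval engine, whereas here retrieval is replaced by simple Jaccard token overlap and evidence synthesis by a majority vote. All names (`Evidence`, `retrieve`, `synthesize`, `analyze`, `toy_stance`) and the sample corpus are hypothetical, and the claim is assumed to have already been identified by stage (i).

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evidence:
    doc_id: str
    stance: str  # "support" or "refute"; any other value counts as neutral


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(claim: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Toy stand-in for stage (ii): rank documents by Jaccard token overlap."""
    claim_tokens = tokenize(claim)

    def score(doc_id: str) -> float:
        doc_tokens = tokenize(corpus[doc_id])
        union = claim_tokens | doc_tokens
        return len(claim_tokens & doc_tokens) / len(union) if union else 0.0

    # sorted() is stable, so tied documents keep their corpus order.
    return sorted(corpus, key=score, reverse=True)[:k]


def synthesize(evidence: list[Evidence]) -> str:
    """Toy stand-in for stage (iii): majority vote over per-document stances."""
    counts = Counter(e.stance for e in evidence)
    if counts["support"] > counts["refute"]:
        return "supported"
    if counts["refute"] > counts["support"]:
        return "refuted"
    return "inconclusive"


def analyze(
    claim: str,
    corpus: dict[str, str],
    stance_fn: Callable[[str, str], str],
    k: int = 2,
) -> str:
    """Chain retrieval, per-document stance labelling, and synthesis."""
    doc_ids = retrieve(claim, corpus, k)
    evidence = [Evidence(d, stance_fn(claim, corpus[d])) for d in doc_ids]
    return synthesize(evidence)


if __name__ == "__main__":
    # Hypothetical three-document corpus, for illustration only.
    corpus = {
        "p1": "Vitamin D supplementation reduces flu risk in adults",
        "p2": "No evidence that vitamin D reduces flu risk",
        "p3": "Sea level rise accelerates coastal erosion",
    }

    def toy_stance(claim: str, passage: str) -> str:
        # Placeholder heuristic; a real system would use a trained model here.
        return "refute" if passage.lower().startswith("no evidence") else "support"

    print(analyze("Vitamin D reduces flu risk", corpus, toy_stance))
```

Passing the stance labeller in as `stance_fn` keeps the sketch independent of any particular model: a fine-tuned classifier could be substituted without touching the retrieval or synthesis steps.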
Tangible Outcomes
- Kotitsas, S., Kounoudis, P., Koutli, E., & Papageorgiou, H. (2024, March). Leveraging fine-tuned Large Language Models with LoRA for Effective Claim, Claimer, and Claim Object Detection. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 2540-2554). https://aclanthology.org/2024.eacl-long.156/
- HCN dataset: news articles in the domains of Health and Climate Change, annotated with the major claim, claimer(s), and claim object(s). https://github.com/iNoBo/news_claim_analysis
- Website demo: http://scinobo.ilsp.gr:1997/services
- Services for claim identification and the retrieval engine: http://scinobo.ilsp.gr:1997/live-demo?HFSpace=inobo-scinobo-claim-verification.hf.space
- Service for text simplification: http://scinobo.ilsp.gr:1997/text-simplification