The ambition of this micro-project is to investigate the adoption of a multi-turn dialog strategy and the insertion in prompts of appropriate conceptual knowledge (e.g., definitions of the concepts to extract) or of different types of examples (including negative ones), especially for the extraction of tasks and of the temporal flow relations between tasks.

Procedural documents are a source of temporal procedural knowledge of the utmost importance. These documents differ in format and scope, ranging from descriptions of administrative procedures to service manuals, medical guidelines, and surgical procedures. Extracting this complex and multidimensional knowledge, which includes a strong temporal dimension usually paired with further static dimensions concerning, for example, resources, tools, objects, and costs, would be extremely valuable for several tasks, ranging from information extraction to the validation and verification of the procedures themselves, up to the construction of AI-based systems that have to deal with these procedures (think, for instance, of an expert surgical assistant that may be involved in several different surgical procedures).
Knowledge graphs are a natural and expressive structure in which to represent such multidimensional knowledge, and indeed the insertion of temporal knowledge into knowledge graphs is one of the hot challenges in this area. Nonetheless, the automated construction of knowledge graphs from procedural documents is a challenging research area: the lack of annotated data, as well as of raw-text repositories describing real-world procedural documents, makes it extremely difficult to adopt deep learning approaches.

Pre-trained language models have shown promising results on tasks that extract knowledge from the models themselves. Although several works have explored this strategy to build knowledge graphs, the viability of knowledge base construction through prompt-based learning over such language models has not yet been investigated deeply. In this micro-project we investigate the use of a prompt-based, in-context learning strategy to extract, from natural language process descriptions, conceptual information that can be converted into equivalent knowledge graphs. In particular, we investigate the adoption of a multi-turn dialog strategy and the insertion in prompts of appropriate conceptual knowledge (e.g., definitions of the concepts to extract) or of different types of examples (including negative examples), especially for the extraction of tasks and of the temporal flow relations between tasks. As such, the work can contribute to the construction of structured narratives by machine learning models, hopefully enriched with the conceptual knowledge provided in input. Moreover, the adoption of a multi-turn dialog strategy could provide insight into how these models can complement the multi-turn dialog usually adopted by domain experts in traditional knowledge modeling pipelines.
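As a concrete illustration, the prompting strategy described above could be assembled as follows. This is a minimal sketch: the concept definition, the examples, and the two-turn structure are hypothetical placeholders, not the project's actual prompts.

```python
# Sketch of a multi-turn, in-context prompt for procedural knowledge
# extraction. All texts below are illustrative placeholders.

TASK_DEFINITION = ("A task is an atomic unit of work performed by an "
                   "actor within a procedure.")

POSITIVE_EXAMPLE = ('Text: "The clerk stamps the form." '
                    '-> Tasks: [stamp the form]')
NEGATIVE_EXAMPLE = ('Text: "The form is blue." '
                    '-> Tasks: [] (a property, not a task)')

def build_dialog(document: str) -> list[dict]:
    """Build a two-turn dialog: turn 1 extracts the tasks, turn 2 asks
    for the temporal flow relations between the extracted tasks."""
    return [
        {"role": "system",
         "content": "You extract procedural knowledge from text."},
        {"role": "user",
         "content": (f"Definition: {TASK_DEFINITION}\n"
                     f"Example: {POSITIVE_EXAMPLE}\n"
                     f"Counter-example: {NEGATIVE_EXAMPLE}\n"
                     f"List the tasks in:\n{document}")},
        # The model's answer would be appended here, followed by:
        {"role": "user",
         "content": ("Now list the temporal flow relations "
                     "(before/after) between the tasks you extracted.")},
    ]

dialog = build_dialog("The nurse prepares the instruments.")
```

The second turn reuses the model's own first answer as context, which is what makes the multi-turn setting different from a single monolithic prompt.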


– At least one paper published in a ranked conference or journal.
– An open dataset containing both raw texts and their equivalent structured representations, providing a basis for future research activities.

Project Partners

  • Fondazione Bruno Kessler, Mauro Dragoni
  • University of Verona, Marco Rospocher

Primary Contact

Mauro Dragoni, Fondazione Bruno Kessler

Build human-in-the-loop intelligent systems for the geolocation of social media images in natural disasters

Social media generate large amounts of near-real-time data which can prove extremely valuable in an emergency, especially for providing information within the first 72 hours after a disaster event. Although there are abundant state-of-the-art machine learning techniques for automatically classifying social media images, and some work on geolocating them, the operational problem in the event of a new disaster remains unsolved.
Currently, the state-of-the-art approach to this first-response mapping is to filter the images and then submit those to be geolocated to a crowd of volunteers [1], assigning the images to the volunteers at random.

The project is aimed at leveraging the power of crowdsourcing and artificial intelligence (AI) to assist emergency responders and disaster relief organizations in building a damage map from a zone recently hit by a disaster.

Specifically, the project will involve the development of a platform that can intelligently distribute geolocation tasks to a crowd of volunteers based on their skills. The platform will use machine learning to determine the skills of the volunteers based on previous geolocation experiences.

Thus, the project will concentrate on two different tasks:
• Profile Learning. Based on the previous geolocations of a set of volunteers, learn a profile of each volunteer which encodes their geolocation capabilities. These profiles should be understood as competency maps, representing the capability of the volunteer to provide an accurate geolocation for an image coming from a specific geographical area.
• Active Task Assignment. Use the volunteer profiles efficiently in order to maximize geolocation quality while maintaining a fair distribution of geolocation tasks among volunteers.
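A rough sketch of how these two tasks could fit together is given below. All names, the smoothed-accuracy estimate, and the fairness rule (a hard per-volunteer load cap) are illustrative assumptions, not the project's actual design.

```python
# Hypothetical sketch of profile learning and active task assignment.
from collections import defaultdict

def learn_profiles(history):
    """history: list of (volunteer, region, correct) records.
    Returns per-volunteer, per-region accuracy estimates, smoothed
    with a Beta(1, 1) prior so unseen regions are not over-trusted."""
    counts = defaultdict(lambda: [1, 2])  # (vol, region) -> [succ+1, trials+2]
    for vol, region, correct in history:
        s, n = counts[(vol, region)]
        counts[(vol, region)] = [s + int(correct), n + 1]
    profiles = defaultdict(dict)
    for (vol, region), (s, n) in counts.items():
        profiles[vol][region] = s / n
    return profiles

def assign(images, volunteers, profiles, max_load):
    """Greedy active assignment: give each image (with a known candidate
    region) to the most competent eligible volunteer, where eligibility
    enforces fairness via a per-volunteer load cap."""
    load = {v: 0 for v in volunteers}
    assignment = {}
    for img, region in images:
        eligible = [v for v in volunteers if load[v] < max_load]
        best = max(eligible,
                   key=lambda v: profiles.get(v, {}).get(region, 0.5))
        assignment[img] = best
        load[best] += 1
    return assignment
```

In a real system the load cap would likely be replaced by a softer fairness objective, and the profiles would be updated online as new geolocations arrive.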

In a first stage we envision an experimental framework with realistically generated artificial data, which will act as a feasibility study; this will be published as a paper in a major conference or journal. Simultaneously, we plan to integrate both the profile learning and the active task assignment into the crowdnalysis library, a software outcome of our previous micro-project. Furthermore, we plan to organize a geolocation workshop in Barcelona with participation from the JRC, the University of Geneva, the United Nations, and IIIA-CSIC.

In the near future, the system will generate reports and visualizations to help these organizations quickly understand the distribution of damages. The resulting platform could enable more efficient and effective responses to natural disasters, potentially saving lives and reducing the impact of these events on communities.
The micro-project will be developed by IIIA-CSIC and the University of Geneva. It is also of interest to the team led by Valerio Lorini at the Joint Research Centre of the European Commission in Ispra, Italy, who will most likely attend the geolocation workshop we will organize.

The project is in line with "Establishing Common Ground for Collaboration with AI Systems (WP 1-2)", because it is a microproject that "seeks to provide practical demonstrations, tools, or new theoretical models for AI systems that can collaborate with and empower individuals or groups of people to attain shared goals", as specifically mentioned in the Call for Microprojects.

The project is also in line with "Measuring, modeling, predicting the individual and collective effects of different forms of AI influence in socio-technical systems at scale (WP4)", since it comprises the design of human-centered AI architectures that balance individual and collective goals for the geolocation task.

[1] Fathi, Ramian, Dennis Thom, Steffen Koch, Thomas Ertl, and Frank Fiedrich. “VOST: A Case Study in Voluntary Digital Participation for Collaborative Emergency Management.” Information Processing & Management 57, no. 4 (July 1, 2020): 102174.


– Open source implementation of the volunteer profiling and consensus geolocation algorithms into the crowdnalysis library.
– A paper evaluating the different geolocation consensus and active assignment strategies.
– Organization of a one-day workshop with the United Nations, the JRC, the University of Geneva, and CSIC.

Project Partners

  • Consejo Superior de Investigaciones Científicas (CSIC), Jesus Cerquides
  • University of Geneva, Jose Luis Fernandez Marquez

Primary Contact

Jesus Cerquides, Consejo Superior de Investigaciones Científicas (CSIC)

Developing user-friendly software for narrative analysis of text data

In this project we continue the development of the Segram package for Python. The purpose of the package is to provide tools for automated narrative analysis of text data, focused on extracting information on the basic building blocks of narratives – agents (both active and passive), actions, events, and relations between agents and actions (e.g., determining the subjects and objects of actions) – as well as descriptions of actors, actions, and events. The development process is naturally paired with conceptual work on representations of narratives.

The package is designed as a graybox model: it is based on an opaque statistical language model providing linguistic annotations, which are subsequently used by transparent, deterministic algorithms for discovering narrative elements. The final output should therefore be easy for human users to interpret and validate whenever necessary. Moreover, by lifting the analysis from the purely linguistic level to the arguably more intuitive level of narratives, the provided tools should be significantly easier to use and understand for end users, including those without training in linguistics and/or computer science.
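The graybox idea can be illustrated with a toy example. This is a hypothetical sketch, not Segram's actual API: in reality the annotations would come from an opaque statistical parser, whereas here they are hard-coded, and a single transparent rule stands in for the deterministic narrative-discovery layer.

```python
# Minimal illustration of a graybox design: an opaque parser supplies
# dependency annotations; a transparent, deterministic rule turns them
# into actor-action-object triples that a human can easily verify.

def extract_triples(tokens):
    """tokens: list of (text, dependency_label, head_index) tuples.
    Rule: pair each 'nsubj' token with its verbal head and any 'obj'
    children of that head."""
    triples = []
    for text, dep, head in tokens:
        if dep == "nsubj":
            verb = tokens[head][0]
            for obj_text, obj_dep, obj_head in tokens:
                if obj_dep == "obj" and obj_head == head:
                    triples.append((text, verb, obj_text))
    return triples

# Annotations as an opaque parser might emit them for
# "The nurse prepares the instruments" (determiners omitted):
sent = [("nurse", "nsubj", 1), ("prepares", "ROOT", 1),
        ("instruments", "obj", 1)]
```

Because the rule layer is explicit, a user who doubts an extracted triple can trace it back to the exact annotations and rule that produced it, which is the validation property described above.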

The proposed framework is aimed at language understanding and information extraction, as opposed to language generation. Namely, the role of the package is to organize narrative information in convenient data structures that allow effective querying and the derivation of various statistical descriptions. Crucially, thanks to its semi-transparent nature, the produced output should be easy for human users to validate. This should facilitate the development of shared representations of narratives (corresponding to the WP1- and WP2-motivated goal "Establishing Common Ground for Collaboration with AI Systems"), understandable by both humans and machines, that are at the same time trustworthy (by being easy for humans to validate) – arguably a desirable feature, for instance in comparison to increasingly powerful but hard-to-trust large language models. In particular, the package should be useful for facilitating and informing human-driven analyses of text data.

An alpha version of the package, implementing the core functionalities related to grammatical and narrative analysis, is ready. The goal of the present micro-project is to improve the package and release a beta version. This will include implementing an easy-to-use interface (operating at the level of narrative concepts) that allows end users to effectively query and analyze the data produced by Segram, as well as developing comprehensive documentation. The planned release should thus be ready for broader adoption across a wide array of use cases and by users with different levels of linguistic/computational expertise.


1. The Segram package for Python, published officially on the Python Package Index (PyPI). It may also be published on conda-forge, but this is not yet guaranteed at this stage.

2. Comprehensive package documentation, available online on the Read the Docs platform.

Project Partners

  • University of Warsaw, Andrzej Nowak

Primary Contact

Szymon Talaga, University of Warsaw

Results Description

The aim of the project is to develop a software package (for Python) providing tools, easy to use and understand also for researchers not trained in computer science or linguistics, for extracting narrative information (active and passive actors, the actions they perform, and the descriptions of both actors and actions, which together define events) and organizing it in rich hierarchical data structures (the data model is implicitly graphical). From these structures, different sorts of descriptive statistics can subsequently be generated, depending on the particular research questions. Crucially, for this to be practically possible, a legible and efficient framework for querying the produced data is needed.

The above goal fits into a broader HumanE-AI objective of developing common ground concepts providing better representations shared by humans and machines alike. In particular, the contribution of the project to work on aligning machine analyses with human perspective through the notion of narratives is twofold. Firstly, narrative-oriented tools for automated text analyses can empower human analysts as, arguably, the narrative framework provides a more natural and meaningful context for people without formal training in linguistics and/or computer science for reasoning about textual data. Secondly, the development of the software for narrative analysis is naturally intertwined with conceptual work on the core terms and building blocks of narratives, which can inform subsequent work on more advanced approaches.

Importantly, the software is developed as a graybox model, in which core low-level NLP tasks, such as POS and dependency tagging, are performed by a blackbox statistical model, and the resulting annotations are then transformed into higher-order grammatical and narrative data by a set of transparent, deterministic rules. This ensures the high explainability of the approach, which is crucial for systems in which the machine is supposed to be a helper of a human analyst rather than an implicit leader.

Currently, the core modules of the package responsible for grammatical analysis are mostly ready (though several improvements are still planned); this includes a coreference resolution module. Moreover, the core part of the semantic module, which translates grammatical information into more semantic constructs focused on actors, actions, and descriptions, is also ready. What is still missing is an interface exposing methods that allow end users easy access to, and analysis of, the rich data produced by the package, as well as the principled and convenient query framework on which that interface should be based. This is the main focus of the ongoing and future work. The second missing part is the documentation, which is best finished once the interface is ready.
Thus, even though the package in its current state may seem a little rough from the perspective of an end user, its quality and usefulness will increase steadily as new updates are delivered.



Links to Tangible results