Contact person: Loris Bozzato

Internal Partners:

  1. Fondazione Bruno Kessler, Loris Bozzato
  2. Technical University of Vienna, Thomas Eiter

External Partners:

  1. BOSCH Deutschland, Stepanova Daria


The previous requirements fit very well with the capabilities of MR-CKR. On the one hand, we have different contexts in which the inputs need to be modified to suit a different diagnosis of failure of the model. On the other hand, we can exploit the different relations by having one relation that specifies that inputs are more modifiable in one context than in another, and another relation that describes whether one diagnosis is a special case of another. Additionally, MR-CKR allows us to incorporate global knowledge, so that we only modify inputs in such a manner that the result is still “realistic”, i.e., satisfies the axioms in the global knowledge.

In this work, we provide a prototype specialized in generating similar and problematic scenes in the domain of Autonomous Driving.

Results Summary

We show that the proposed approach provides high-quality semantic segmentation from the robot’s perspective, with accuracy comparable to the original one. In addition, we exploited the gained information and improved the recognition performance of the deep network for the lower viewpoints and showed that the small robot alone is capable of generating high-quality semantic maps for the human partner. The computations are close to real time, so the approach enables interactive applications.

Tangible Outcomes

  1. Prototype implementation: 

Contact person: Samuel Kaski

Internal Partners:

  1. Aalto University
  2. Delft University of Technology, Frans Oliehoek


In human-AI collaboration, one of the key difficulties is establishing a common ground for the interaction, especially in terms of goals and beliefs. In practice, the AI might not have access to this necessary information directly and must infer it during the interaction with the human. However, training a model to support this kind of inference would require massive collections of interaction data and is not feasible in most applications.

Modern cognitive models, on the other hand, can equip AI tools with the necessary prior knowledge to readily support inference, and hence, to quickly establish a common ground for collaboration with humans. However, utilizing these models in realistic applications is currently impractical due to their computational complexity and non-differentiable structure.

Contact person: Catholijn Jonker, Maria Tsfasman

Internal Partners:

  1. Technical University Delft, Catharine Oertel
  2. Eotvos Lorand University, Andras Lorincz


In this micro-project, we propose investigating human recollection of team meetings and how conversational AI could use this information to create better team cohesion in virtual settings. Specifically, we would like to investigate how a person’s emotion, personality, relationship to fellow teammates, goal and position in the meeting influence how they remember the meeting. We want to use this information to create memory-aware conversational AI that could leverage such data to increase team cohesion in future meetings. To achieve this goal, we will first record a multi-modal dataset of team meetings in a virtual setting. Second, we will administer questionnaires to participants at different time intervals after a session. Third, we will annotate the corpus. Fourth, we will carry out an initial corpus analysis to inform the design of memory-aware conversational AI. This micro-project will contribute to a longer-term effort in building a computational memory model for human-agent interaction.

Results Summary

The MEMO corpus was collected, which contains 45 group discussions around the topic of COVID-19. A total of 15 groups were formed, consisting of 3 to 6 participants who took part in 3 group discussions, with a 3-4 day gap between sessions. A total of 59 individuals with diverse backgrounds took part in the study. Before and after each session participants completed a series of questionnaires to determine which moments they recalled from their conversations, along with their personality traits, values and perceptions.

To capture conversational memory, we collected first-party free-recall reports of the most memorable moments from the discussion immediately after the interaction and again 3-4 days later. For the shorter-term memories, participants also mapped the moments to a particular interval in the video of their discussion, which were used for the ground-truth conversational memory annotations.

For each participant, personality and value profiles were recorded in the pre-screening survey, along with demographic information to identify their social group affected by COVID-19. Pre-session questionnaires also assessed participants’ mood before each session. The post-session questionnaire included questions about mutual understanding, personal attitude and perceived social distance. It also monitored the perception of the discussion and of the group as a whole, with variables such as Task and Group Cohesion, Entitativity, Perceived Interdependence, Perceived Situation Characteristics, Syncness, and Rapport.

The following automatic annotations were extracted from the corpus:

* Transcripts – Transcripts were generated with automatic speech recognition methods and were manually reviewed and corrected where needed. Transcript timestamps are available at the utterance level as well as word-level text grid files for each recording. Speaker diarization is also available.

* Eye gaze and head pose – Automatically annotated with EyeWare software. The annotations themselves will be provided, but the code uses a proprietary API. This includes gaze targets collected through screenshots of participants’ screen views.

* Prosody – The eGeMAPS feature set was extracted using the default eGeMAPS configuration in openSMILE.

* Body pose – Body pose (upper body only) and hand pose, when visible, were estimated with the models available in the MediaPipe software.

* Facial action units – Facial action units were estimated for participants using the OpenFace software.

A paper describing the corpus and the annotations in more detail is in preparation. Additionally, the collected annotations will be packaged for ease of use by future researchers.

Tangible Outcomes

  1. Tsfasman, M., Fenech, K., Tarvirdians, M., Lorincz, A., Jonker, C., & Oertel, C. (2022). Towards creating a conversational memory for long-term meeting support: predicting memorable moments in multi-party conversations through eye-gaze. In ICMI 2022 – Proceedings of the 2022 International Conference on Multimodal Interaction (pp. 94-104). (ACM International Conference Proceeding Series). Association for Computing Machinery (ACM). 
  2. summary

Contact person: James Crowley

Internal Partners:

  1. Eotvos Lorand University – ELTE, Andras Lorincz
  2. Univ Grenoble Alpes, Dominique Vaufreydaz, Fabien Ringeval
  3. Uni Paris Saclay, Camille Guinaudeau, Marc Evrard
  4. Jozef Stefan Institut-JSI, Marko Grobelnik
  5. Charles University, Pavel Pecina


Transformers and self-attention (Vaswani et al., 2017) have become the dominant approach for natural language processing (NLP), with systems such as BERT (Devlin et al., 2018) and GPT-3 (Brown et al., 2020) rapidly displacing more established RNN and CNN structures with an architecture composed of stacked encoder-decoder modules using self-attention. This micro-project provides tools and data sets for experiments and an initial demonstration of the potential of transformers for multimodal perception and multimodal interaction. We define research challenges, benchmark data sets and performance metrics for multimodal perception and interaction tasks such as (1) audio-visual narration of scenes, cooking actions and activities, (2) audio-video recordings of lectures and TV programs, (3) audio-visual deictic (pointing) gestures, and (4) perception and evocation of engagement, attention, and emotion.

1) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. and Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762

2) Devlin, J., Chang, M. W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

3) Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., and Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.

Results Summary

In this project, we explore the potential of Transformer-based models in two significant domains: unsupervised object discovery and multimodal emotion recognition using physiological signals. First, we demonstrate a novel approach for unsupervised object discovery by leveraging self-supervised learning with self-distillation loss (DINO). Our method utilizes visual tokens as nodes in a weighted graph, where edges reflect connectivity scores based on token similarity. By applying a normalized graph-cut and solving it through spectral clustering with generalized eigen-decomposition, we isolate foreground objects. This approach effectively segments self-similar regions, with the second smallest eigenvector of the decomposition providing the cutting solution that indicates token association with foreground objects. This technique not only simplifies the object discovery process but also achieves substantial performance improvements over current state-of-the-art methods such as LOST, outperforming it by 6.9%, 8.1%, and 8.1% on the VOC07, VOC12, and COCO20K benchmarks, respectively. Furthermore, integrating a second-stage class-agnostic detector (CAD) enhances these results, and our method’s adaptability is demonstrated in its application to unsupervised saliency detection and weakly supervised object detection, achieving notable IoU improvements on the ECSSD, DUTS, and DUT-OMRON datasets.
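The graph-cut step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ncut_bipartition`, the similarity threshold `tau`, the weak-edge weight `eps`, and the toy 2-D features are all assumptions made for illustration, and the step that decides which of the two groups is the foreground is left out. The sketch builds a weighted graph from token similarities, solves the generalized eigenproblem (D - W) y = λ D y through its equivalent symmetric form, and splits the tokens on the second smallest eigenvector.

```python
import numpy as np

def ncut_bipartition(features, tau=0.2, eps=1e-5):
    """Split tokens into two groups with a normalized cut (illustrative sketch)."""
    # Cosine similarity between L2-normalised token features.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    # Edge weights: strongly similar tokens get weight 1, all others a small eps
    # so that the graph stays connected (tau and eps are assumed values).
    W = np.where(sim > tau, 1.0, eps)
    d = W.sum(axis=1)
    # Solve (D - W) y = lambda * D y through the symmetric form
    # (I - D^{-1/2} W D^{-1/2}) z = lambda * z, with y = D^{-1/2} z.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues are returned in ascending order
    second = d_inv_sqrt * vecs[:, 1]  # second smallest eigenvector
    # Tokens above the mean form one group, the remaining tokens the other.
    return second > second.mean()

# Toy example: four tokens pointing one way, two pointing another way.
tokens = np.array([[1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [1.0, -0.1],
                   [0.0, 1.0], [-0.1, 1.0]])
mask = ncut_bipartition(tokens)
print(mask)
```

The sign of an eigenvector is arbitrary, so which group is marked `True` can differ between runs or platforms; only the partition itself is meaningful here.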

In parallel, we address the challenge of multimodal emotion recognition from physiological signals using Transformer-based models. Recognizing the advantages of attention mechanisms in Transformers for creating contextualized representations, we propose a model for processing electrocardiogram (ECG) data to predict emotions. This model highlights significant segments of the signal, ensuring that relevant information is given priority. Due to the limited size of datasets with emotional labels, we adopt a self-supervised learning approach. We pre-train our model using unlabelled ECG datasets to build robust representations and then fine-tune it on the AMIGOS dataset for emotion recognition. Our findings confirm that this approach achieves state-of-the-art results in emotion recognition tasks involving ECG signals. Additionally, the success of this strategy underscores the broader potential of Transformers and pre-training techniques for analyzing time-series data in emotion recognition tasks.

Overall, the outcomes of our project demonstrate that Transformer-based models, coupled with self-supervised learning, can significantly enhance the performance of both unsupervised object discovery and emotion recognition from physiological signals. These methods provide robust solutions for complex visual and temporal signal analysis tasks, marking a substantial step forward in computer vision and affective computing.

Tangible Outcomes

  1. Y. Wang, X. Shen, S. Hu, Y. Yuan, J. L. Crowley, D. Vaufreydaz, Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut. IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2022, pp. 14543-14553, New Orleans, Jun 2022.
  2. J. Vazquez-Rodriguez, G. Lefebvre, J. Cumin and J. L. Crowley, “Emotion Recognition with Pre-Trained Transformers Using Multimodal Signals”, 10th International Conference on Affective Computing and Intelligent Interaction (ACII), Oct 2022.
  3. J. Vazquez-Rodriguez, G. Lefebvre, J. Cumin, J. L. Crowley. Transformer-Based Self-Supervised Learning for Emotion Recognition. 26th International Conference on Pattern Recognition (ICPR 2022), Aug 2022, Montreal, Canada.
  4. A survey of tools and datasets for multimodal perception with transformers
  5. A tutorial on the use of transformers for multimodal perception.
  6. Report on challenges for the use of transformers for multimodal perception and interaction.

Contact person: Agnes Grünerbl

Internal Partners:

  1. DFKI, EI, Agnes Grünerbl  

External Partners:

  1. Health Department, University of Southampton, Eloise Monger


High-quality education and training of nurses is of utmost importance for maintaining high standards in medical care. Nevertheless, as the COVID-19 pandemic has shown quite impressively, too few healthcare professionals are available. The education of nursing students, and the further training of nurses, is therefore under pressure to accelerate so that enough nurses are available when they are required. Still, accelerated training often comes with reduced quality, which can easily lead to poor qualifications and, in the worst case, to a lethal outcome. Thus, a pressing question in nurse training is how to optimize, and thereby accelerate, training without a loss in quality.

One of the significant questions for teachers of nursing students is to understand the state of a student’s education. Are some students in need of more repetitions? Which students can proceed to the next level, and who is ready to come into contact with actual patients? In this regard, optimizing training means individualizing it: not only the training of students, but also the feedback and information a teacher gets about their way of teaching. We believe this to be a field where Artificial Intelligence (AI), and more specifically the application of foundational models (large language models, LLMs, paired with other methods of machine learning), can provide real support.

In the first part of this microproject, together with nurse teachers of the University of Southampton, we want to define and design an LWM (“large whatever model”: existing LLMs combined with additional features and data sources, as required) that fits the requirements of nurse training. For this, 2-3 nurse teachers from Southampton will visit DFKI to get a feeling for the systems that are available and for which applications are feasible. In turn, researchers of DFKI will visit the nurse training facilities in Southampton to get a better picture of how nurse training is conducted. At the end of this first phase, the LWM is defined. In the second phase, this LWM will be implemented and tested against videos of recorded training sessions. Specific focus will be set on:

  • How to understand the action of a particular person?
  • Are the actions taken by the trainee correct or incorrect? What would have been the correct action?
  • Which teaching efforts work and which work less well?
  • Which useful suggestions and feedback can be provided to the trainees and teachers?

Results Summary

Building models of medical procedures requires effort that goes beyond the scope and time frame of a micro-project. Therefore, this work is still ongoing and will proceed after the end of the Humane AI Net.

The project results at the time Humane AI Net ended are:

  • identifying scenarios with a potential for generative AI to benefit health training: training of cannulation and venipuncture
  • defining a procedure for introducing generative AI into the training of cannulation and venipuncture
  • planning a study towards developing the required LWMs
  • recording an extensive dataset in an actual medical training facility following actual training procedures
  • starting the long process of data processing and algorithm development (which is ongoing)

We collected a dataset consisting of 90 h of video (20 participants, each recording 4 sessions of a little over 20 min, from 3 different cameras), accompanied by the respective IMU data, GoPro user view, audio recordings and expert feedback on the process of cannulation and venipuncture.

Tangible Outcomes

  1. [arxiv] Stefan Fritsch and Matthias Tschoepe and Vitor Fortes Rey and Lars Krupp and Agnes Gruenerbl and Eloise Monger and Sarah Travenna, GenAI Assisting Medical Training, arXiv, mobiCHAI workshop in MobileHCI2024 
  2. presented at: mobiCHAI – 1st International Workshop on Mobile Cognition-Altering Technologies (CAT) using Human-Centered AI, at The ACM International Conference on Mobile Human-Computer Interaction Melbourne, Australia

Contact person: Frank van Harmelen, VU University

Internal Partners:

  1. Stichting VU
  2. FBK, Luciano Serafini


This project builds on earlier work by FBK in Trento on KENN [1] and by VUA in Amsterdam [2] and aims to combine the insights of both. The project has 3 aims:

  1. The current version of KENN uses the Gödel t-conorm. We will develop versions of KENN based on other t-conorms (like the product t-conorm and Łukasiewicz), whose properties have been investigated in the earlier work by VUA. This should improve the performance of KENN.
  2. We will try to extend the expressivity of the logical constraints in KENN from sets of clauses to implications, again using the earlier theoretical work by VUA. This should increase the reasoning capabilities of KENN.
  3. It should be possible to check the exact contribution of each clause to the final predictions of KENN. This will increase explainability of KENN.
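The three t-conorms named in the first aim are standard fuzzy-logic disjunctions, and a small sketch makes the difference in gradient behaviour concrete. This is plain Python for illustration only, not KENN code; the helper `partial_a` is a hypothetical finite-difference estimate of the derivative with respect to the first argument.

```python
def godel(a, b):
    """Gödel t-conorm: maximum of the two truth values."""
    return max(a, b)

def product(a, b):
    """Product t-conorm (probabilistic sum)."""
    return a + b - a * b

def lukasiewicz(a, b):
    """Łukasiewicz t-conorm: bounded sum."""
    return min(1.0, a + b)

def partial_a(tconorm, a, b, h=1e-6):
    """Forward-difference estimate of the derivative with respect to a (hypothetical helper)."""
    return (tconorm(a + h, b) - tconorm(a, b)) / h

for name, s in [("godel", godel), ("product", product), ("lukasiewicz", lukasiewicz)]:
    # Truth value and sensitivity to the first input at a = 0.3, b = 0.6.
    print(name, s(0.3, 0.6), round(partial_a(s, 0.3, 0.6), 3))
```

With the Gödel t-conorm only the larger input receives a gradient, and the Łukasiewicz t-conorm is flat once the sum of the inputs reaches 1, whereas the product t-conorm passes a gradient of 1 - b to the first input whenever b is below 1.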



Results Summary

KENN is a neuro-symbolic architecture developed in Trento. It allows the injection of a knowledge base when training a neural network. Theoretical work from Amsterdam has been used to improve KENN. By using background knowledge from the knowledge base, we can train the neural network with far fewer training examples. Since KENN is based on fuzzy logic, a major bottleneck was the choice of an appropriate configuration of the logic (choice of norms and co-norms), since earlier work from Amsterdam had shown that some of the classical fuzzy logic configurations would perform very poorly in a machine learning setting (with large areas of their value space having a 0 gradient, or a 0 gradient for one of their input values).

As a result of the collaboration (visits from Amsterdam staff to Trento and vice versa), we have developed so-called Fuzzy Refinement Functions. Such “refinement functions” are functions that change the truth value computed by a fuzzy logic operator in order to improve the gradient behaviour, while still maintaining the desired logical combinatorics. We have implemented such refinement functions in an algorithm called Iterative Local Refinement (ILR). Our experiments have shown that ILR finds refinements on complex SAT formulas in significantly fewer iterations and frequently finds solutions where gradient descent cannot. Finally, ILR produces competitive results in the MNIST addition task.

Tangible Outcomes

  1. Refining neural network predictions using background knowledge, Alessandro Daniele, Emile van Krieken, Luciano Serafini & Frank van Harmelen, Machine Learning (2023) 
  2. data: 
  3. code:

Contact person: Frank van Harmelen

Internal Partners:

  1. Stichting VU
  2. Universitat Pompeu Fabra (UPF)


Incremental Recruitment Language (IRL), developed by Luc Steels and collaborators, is a parsing technique that captures the semantics of a natural language expression as a network of logical constraints. Determining the meaning of a sentence then amounts to finding a consistent assignment of variables that satisfies these constraints. Typically, such meaning can only be determined (i.e., such constraints can only be resolved) by using the context (“narrative”) in which the sentence is to be interpreted. The central hypothesis of this project is that modern large-scale knowledge graphs are a promising source of such contextual information to help resolve the correct interpretation of a given sentence. We developed an interface between an existing IRL implementation and an existing knowledge-graph reasoning engine to test this hypothesis. Evaluation will be done on a corpus of sentences from social-historical scientific narratives against corresponding knowledge graphs with social-historical data.

Results Summary

This micro-project aims to build a bridge between a language processing system (Incremental Recruitment Language, IRL) and semantic memory (knowledge graphs), for building and parsing narratives. In IRL, a sentence is represented as a network of logical constraints. Resolving the interpretation of a sentence comes down to finding a consistent assignment of entities from the real world that satisfies these constraints. In this microproject, we have used knowledge graphs and other open data repositories as an external source of world knowledge that can be used to bind and disambiguate entities in context.

We have implemented a new library called Web-Services that interacts, through the use of APIs, with several open data knowledge repositories, and integrates their semantic facts into language models such as IRL. Using the Web-Services library, users can write IRL programs that send requests to different open data APIs, or convert SPARQL queries into RESTful APIs using GRLC.

Tangible Outcomes

  1. Web-Services library – Steels, Luc & Van Harmelen, Frank & Van Trijp, Remi & Galletti, Martina & Kozakosczak, Jakub & Stork, Lise & Tiddi, Ilaria
  2. Video presentation summarizing the project

Contact person: Silvia Tulli

Internal Partners:

  1. ISIR, Sorbonne University, Silvia Tulli 


Interactive Machine Learning (IML) has gained significant attention in recent years as a means for intelligent agents to learn from human feedback, demonstration, or instruction. However, many existing IML solutions primarily rely on sparse feedback, placing an unreasonable burden on the expert involved. This project aims to address this limitation by enabling the learner to leverage richer feedback from the expert, thereby accelerating the learning process. Additionally, we seek to incorporate a model of the expert to select more informative queries, further reducing the burden placed on the expert.

We have three objectives:

(1) Explore and develop methods for incorporating causal and contrastive feedback, as supported by evidence from psychology literature, into the learning process of IML.

(2) Design and implement a belief-based system that allows the learner to explicitly maintain beliefs about the possible expert objectives, influencing the selection of queries.

(3) Utilize the received feedback to generate a posterior that informs subsequent queries and enhances the learning process within the framework of Inverse Reinforcement Learning (IRL).
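Objectives (2) and (3) can be pictured with a generic Bayesian belief update over candidate expert objectives. The sketch below is an assumption-laden illustration, not the project's method: the two objectives ("speed" and "safety"), their reward values, the Boltzmann-rational preference likelihood, and the rationality parameter `beta` are all hypothetical.

```python
import math

def preference_likelihood(reward_a, reward_b, beta=1.0):
    """P(expert prefers A over B), assuming a Boltzmann-rational expert."""
    return 1.0 / (1.0 + math.exp(-beta * (reward_a - reward_b)))

def update_belief(prior, likelihoods):
    """Bayes rule: posterior is proportional to prior times likelihood."""
    unnormalised = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(unnormalised.values())
    return {h: p / z for h, p in unnormalised.items()}

# Hypothetical objectives, each scoring two candidate behaviours A and B.
objectives = {
    "speed": {"A": 1.0, "B": 0.0},
    "safety": {"A": 0.0, "B": 1.0},
}
prior = {"speed": 0.5, "safety": 0.5}

# Feedback received from the expert: "A is better than B".
likelihoods = {h: preference_likelihood(r["A"], r["B"]) for h, r in objectives.items()}
posterior = update_belief(prior, likelihoods)
print(posterior)
```

In this toy setting the learner shifts its belief towards the objective that explains the expert's answer; a query-selection step could then favour questions on which the remaining hypotheses disagree most.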

The project addresses several key aspects highlighted in the work package on Collaboration with AI Systems (W1-2). Firstly, it focuses on AI systems that can communicate and understand descriptions of situations, goals, intentions, or operational plans to establish shared understanding for collaboration. By explicitly maintaining beliefs about the expert’s objectives and integrating causal and contrastive feedback, the system aims to establish a common ground and improve collaboration. Furthermore, the project aligns with the objective of systems that can explain their internal models by providing additional information to justify statements and answer questions. By utilizing the received feedback to generate a posterior and enhance the learning process, the system aims to provide explanations, verify facts, and answer questions, contributing to a deeper understanding and shared representation between the AI system and the human expert. The project also demonstrates the ambition of enabling two-way interaction between AI systems and humans, constructing shared representations, and allowing for the adaptation of representations in response to information exchange. By providing tangible results, such as user-study evaluations and methods to exploit prior knowledge about the expert, the project aims to make measurable progress toward collaborative AI.

Results Summary

This project resulted in an exchange period during which our collaborator came to our lab and spent a month with us. This opportunity allowed us to conceptualize and write a paper that we plan to submit to the IJCAI conference in December 2024. The paper addresses the challenge of learning from individuals who have a different model of the task. Specifically, we focused on identifying human bottleneck states, determining the maximal achievable set of these states given the robot’s model of the task, and querying for the bottlenecks when they cannot be achieved due to the constraints of the robot model.

In addition, we have also begun working on a survey paper regarding human modeling in sequential decision-making, which has led to a workshop paper that we are currently extending for journal publication.

Tangible Outcomes

  1. [arxiv] Human-Modeling in Sequential Decision-Making: An Analysis through the Lens of Human-Aware AI by Silvia Tulli, Stylianos Loukas Vasileiou, Sarath Sreedharan 

Contact person: Joao Gama

Internal Partners:

  1. INESC TEC, Joao Gama, Bruno Veloso, and Sónia Teixeira
  2. Consiglio Nazionale delle Ricerche (CNR), Giuseppe Manco and Luciano Caroprese
  3. University of Leiden, Holger Hoos and Matthias König

External Partners:

  1. Portucalense University, Bruno Veloso
  2. University of British Columbia, Holger H. Hoos


This micro-project addresses online AutoML in environments where the working conditions change over time. The main goal is to study online optimization methods for hyper-parameter tuning. In dynamic environments, the “optimal” hyper-parameters might change over time. Online AutoML consists of an exploration phase followed by an exploitation phase. The exploration phase searches for the set of hyper-parameters suited to the current working condition. The exploitation phase continuously monitors the learning process to detect degradation in the performance of the system, which triggers a new exploration phase. We consider complex problems described by pipelines where each step in the pipeline has its own hyper-parameters. We consider problems with many hyper-parameters, some of which might be irrelevant. Among the relevant parameters, the complexity of the model architecture (with particular reference to deep networks) is especially important and is the focus of our study.
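The alternation between exploration and exploitation can be sketched on a toy stream. Everything in the example is an assumption chosen for brevity rather than the project's actual pipeline: the model is a simple exponential smoother with a single hyper-parameter `alpha`, exploration is a search over three candidate values, and degradation is flagged when the recent error exceeds `tolerance` times the error measured at the last exploration.

```python
from collections import deque

def evaluate(alpha, window):
    """Mean absolute one-step-ahead error of an exponential smoother."""
    level, total = window[0], 0.0
    for x in window[1:]:
        total += abs(x - level)                  # error of the prediction made before seeing x
        level = alpha * x + (1 - alpha) * level  # update the smoothed level
    return total / (len(window) - 1)

def explore(window, candidates):
    """Exploration: pick the candidate with the lowest error on recent data."""
    best = min(candidates, key=lambda a: evaluate(a, window))
    return best, evaluate(best, window)

def online_automl(stream, candidates=(0.1, 0.5, 0.9), window_size=20, tolerance=3.0):
    """Return the (time step, chosen alpha) pairs at which exploration ran."""
    window = deque(maxlen=window_size)
    alpha = baseline = None
    explorations = []
    for t, x in enumerate(stream):
        window.append(x)
        if len(window) < window_size:
            continue  # not enough data yet
        recent = list(window)
        # Exploitation: keep the current setting unless its error has degraded.
        degraded = alpha is not None and evaluate(alpha, recent) > tolerance * baseline
        if alpha is None or degraded:
            alpha, baseline = explore(recent, candidates)
            explorations.append((t, alpha))
    return explorations

# Toy stream whose amplitude changes abruptly at step 40.
stream = [float(i % 2) for i in range(40)] + [10.0 * (i % 2) for i in range(40)]
print(online_automl(stream))
```

On a stationary stream the loop explores once and then only monitors; after the change at step 40 the monitored error rises and exploration is triggered again.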

Results Summary

A Bayesian generative model is presented for recommending interesting items and trustworthy users to the targeted users in social rating networks with asymmetric and directed trust relationships. The proposed model is the first unified approach to the combination of the two recommendation tasks. Within the devised model, each user is associated with two latent-factor vectors, i.e., her susceptibility and expertise. Items are also associated with corresponding latent-factor vector representations. The probabilistic factorization of the rating data and trust relationships is exploited to infer user susceptibility and expertise. Statistical social-network modeling is instead used to constrain the trust relationships from a user to another to be governed by their respective susceptibility and expertise. The inherently ambiguous meaning of unobserved trust relationships between users is suitably disambiguated. An intensive comparative experimentation on real-world social rating networks with trust relationships demonstrates the superior predictive performance of the presented model in terms of RMSE and AUC.


Tangible Outcomes

  1. Hyper-Parameter Optimization for Latent Spaces in Dynamic Recommender Systems – Bruno Veloso, Luciano Caroprese, Matthias Konig, Sonia Teixeira, Giuseppe Manco, Holger H. Hoos, and Joao Gama in Machine Learning and Knowledge Discovery in Databases. Research Track – European Conference, ECML PKDD 2021, Bilbao, Spain, September 13-17, 2021
  2. Generator for preference data – Bruno Veloso, Luciano Caroprese, Matthias Konig, Sonia Teixeira, Giuseppe Manco, Holger H. Hoos, and Joao Gama
  3. Self Hyper-parameter tuning – Bruno Veloso, Luciano Caroprese, Matthias Konig, Sonia Teixeira, Giuseppe Manco, Holger H. Hoos, and Joao Gama

Contact person: Mauro Dragoni

Internal Partners:

  1. Fondazione Bruno Kessler, Mauro Dragoni  

External Partners:

  1. University of Verona, Marco Rospocher  


Procedural documents are a source of temporal procedural knowledge of the utmost importance. These documents differ in format and scope, ranging from descriptions of administrative procedures to service manuals, medical guidelines and surgical procedures. This knowledge is complex and multidimensional: it includes a strong temporal dimension, usually paired with further static dimensions concerning, for example, resources, tools, objects and costs. Extracting it would be highly valuable for several tasks, ranging from information extraction to validation and verification of the procedures themselves, up to the construction of AI-based systems that have to deal with these procedures (consider, for instance, an expert surgical system and assistant that may be involved in several different surgical procedures). Knowledge graphs are a natural and expressive structure in which to represent such multidimensional knowledge, and indeed the insertion of temporal knowledge within knowledge graphs is one of the hot challenges in this area. Nonetheless, the automated construction of knowledge graphs from procedural documents is a challenging research area. Here, the lack of annotated data, as well as of raw text repositories describing real-world procedural documents, makes it extremely difficult to adopt deep learning approaches. Pre-trained language models have shown promising results on knowledge extraction from the models themselves. Although several works have explored this strategy to build knowledge graphs, the viability of knowledge base construction using a prompt-based learning strategy with such language models has not yet been investigated deeply. In this micro-project, we would like to investigate the use of a prompt-based in-context learning strategy to extract, from natural language process descriptions, conceptual information that can be converted into equivalent knowledge graphs.

In particular, we would like to investigate the adoption of a multi-turn dialog strategy and the insertion into prompts of appropriate conceptual knowledge (e.g., definitions of the concepts to extract) or of different types of examples (including negative examples), especially for the extraction of tasks and of temporal flow relations between tasks. As such, the work can contribute to the construction of structured narratives using machine learning models and hopefully enrich the conceptual knowledge given as input. Moreover, the adoption of a multi-turn dialog strategy could provide insight into how these models can be used to complement the multi-turn dialog strategy usually adopted by domain experts in traditional knowledge modeling pipelines.

Contact person: Brian Ravenet

Internal Partners:

  1. CNRS, Brian Ravenet
  2. INESC-ID, Rui Prada


This project aims to investigate the construction of humor models to enrich conversational agents with the help of interactive reinforcement learning approaches. Our methodology consists of deploying an online platform where passersby can play a game of matching sentences with humorous comebacks against an agent. The data collected from these interactions helps to gradually build the humor models of the agent following state-of-the-art Interactive Reinforcement Learning techniques. Our work resulted in an implementation of the platform, a first model for a humor-enabled conversational agent, and a publication of the obtained results and evaluations.

Results Summary

The main result of this project is the creation of an intelligent agent capable of playing a game, Cards Against Humanity, that involves matching sentences with humorous comebacks. To achieve this, a dataset of 1712 jokes was collected, each rated on a scale of 1 to 9 in terms of joke level, originality, positivity, entertainment, whether it makes sense, and whether it is family-friendly, and an online game was developed to serve as the foundation of the reinforcement mechanism.

Contact person: Jesus Cerquides (

Internal Partners:

  1. Consejo Superior de Investigaciones Científicas (CSIC), Jesus Cerquides,

External Partners:

  1. University of Geneva, Jose Luis Fernandez Marquez  


Social media generates large amounts of near-real-time data, which can prove extremely valuable in an emergency situation, especially for providing information within the first 72 hours after a disaster event. Although there are abundant state-of-the-art machine learning techniques to automatically classify social media images, and some work on geolocating them, the operational problem in the event of a new disaster remains unsolved.

Currently, the state-of-the-art approach to this first-response mapping is to first filter the images and then submit those to be geolocated to a crowd of volunteers [1], assigning the images to the volunteers at random.

The project aims to leverage the power of crowdsourcing and artificial intelligence (AI) to assist emergency responders and disaster relief organizations in building a damage map of a zone recently hit by a disaster. Specifically, the project involves the development of a platform that can intelligently distribute geolocation tasks to a crowd of volunteers based on their skills. The platform uses machine learning to determine the skills of the volunteers based on their previous geolocation experiences.

Thus, the project concentrates on two different tasks:

  • Profile Learning. Based on the previous geolocations of a set of volunteers, learn a profile of each volunteer that encodes their geolocation capabilities. These profiles should be understood as competency maps of the volunteer, representing the capability of the volunteer to provide an accurate geolocation for an image coming from a specific geographical area.
  • Active Task Assignment. Use the volunteer profiles efficiently in order to maximize the geolocation quality while maintaining a fair distribution of geolocation tasks among volunteers.
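
The two tasks above can be sketched as follows. This is a minimal illustration under stated assumptions, not the models developed in the project: a profile is approximated here by the mean geolocation error per region, assignment is a greedy rule with a per-volunteer load cap standing in for fairness, and the names `learn_profiles`, `assign_tasks`, `max_load`, and `default_error` are hypothetical.

```python
from collections import defaultdict


def learn_profiles(history):
    """Build a competency map per volunteer.

    history: iterable of (volunteer, region, error_km) from past geolocations.
    Returns profiles[volunteer][region] = mean error in km (lower is better).
    """
    sums = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for volunteer, region, error_km in history:
        cell = sums[volunteer][region]
        cell[0] += error_km  # accumulated error
        cell[1] += 1         # number of observations
    return {
        volunteer: {region: total / n for region, (total, n) in regions.items()}
        for volunteer, regions in sums.items()
    }


def assign_tasks(images, profiles, max_load, default_error=100.0):
    """Greedily give each image to the most competent available volunteer.

    images: list of (image_id, region).
    max_load: cap on tasks per volunteer (a crude fairness constraint).
    default_error: assumed error for regions a volunteer has never seen.
    """
    load = {volunteer: 0 for volunteer in profiles}
    assignment = {}
    for image_id, region in images:
        available = [v for v in profiles if load[v] < max_load]
        if not available:
            break  # every volunteer has reached the cap
        best = min(
            available,
            key=lambda v: (profiles[v].get(region, default_error), load[v], v),
        )
        assignment[image_id] = best
        load[best] += 1
    return assignment


history = [
    ("ana", "north", 2.0), ("ana", "north", 4.0), ("ana", "south", 50.0),
    ("bo", "north", 30.0), ("bo", "south", 5.0),
]
profiles = learn_profiles(history)
images = [("img1", "north"), ("img2", "north"), ("img3", "south"), ("img4", "north")]
assignment = assign_tasks(images, profiles, max_load=2)
```

In this toy run the last northern image goes to the less competent volunteer because the better one has reached the cap, which shows the trade-off between geolocation quality and a fair distribution of tasks.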

In the first stage, we envision an experimental framework with realistically generated artificial data, which acts as a feasibility study. This will be published as a paper in a major conference or journal. Simultaneously, we plan to integrate both the profile learning and the active task assignment into the crowdnalysis library, a software outcome of our previous micro-project. Furthermore, we plan to organize a geolocation workshop in Barcelona with participation from the JRC, University of Geneva, United Nations, and IIIA-CSIC.

In the near future, the system will generate reports and visualizations to help these organizations quickly understand the distribution of damages. The resulting platform could enable more efficient and effective responses to natural disasters, potentially saving lives and reducing the impact of these events on communities.

[1] Fathi, Ramian, Dennis Thom, Steffen Koch, Thomas Ertl, and Frank Fiedrich. “VOST: A Case Study in Voluntary Digital Participation for Collaborative Emergency Management.” Information Processing and Management 57, no. 4 (July 1, 2020): 102174. 

Results Summary

The project focused on improving the accuracy and efficiency of geolocating social media images during emergencies by using crowdsourced volunteers. Key results include the development of two models: a profile-learning model to gauge volunteers’ geolocation abilities and a task assignment model that optimizes image distribution based on volunteer skills. These models outperform traditional random assignment approaches by reducing annotation requirements and improving the quality of geolocation consensus without sacrificing accuracy. This method holds promise for disaster response applications. We had 3 main outputs:

  1. An open-source implementation of the volunteer profiling and consensus geolocation algorithms, integrated into the crowdnalysis library.
  2. Papers evaluating the different geolocation consensus and active assignment strategies for geolocation.
  3. An online workshop to collect expert feedback on the topic.

Tangible Outcomes

  1. Ballester, Rocco, Yanis Labeyrie, Mehmet Oguz Mulayim, Jose Luis Fernandez-Marquez, and Jesus Cerquides. “Crowdsourced Geolocation: Detailed Exploration of Mathematical and Computational Modeling Approaches.” Cognitive Systems Research 88 (December 1, 2024): 101266.
  2. Ballester, R., Labeyrie, Y., Mulayim, M.O., Fernandez-Marquez, J.L. and Cerquides, J., 2023. Mathematical and Computational Models for Crowdsourced Geolocation. In Artificial Intelligence Research and Development (pp. 301-310). IOS Press.
  3. Firmansyah, H. B., Bono, C. A., Lorini, V., Cerquides, J., & Fernandez-Marquez, J. L. (2023). Improving Disaster Response by Combining Automated Text Information Extraction from Images and Text on Social Media. In Artificial Intelligence Research and Development (pp. 320-329). IOS Press.
  4. Cerquides J., Mülâyim M.O. Crowdnalysis: A software library to help analyze crowdsourcing results (2024), 10.5281/zenodo.5898579