Contact person: Jesus Cerquides (j.cerquides@csic.es)

Internal Partners:

  1. Consejo Superior de Investigaciones Científicas (CSIC), Jesus Cerquides, cerquide@iiia.csic.es

External Partners:

  1. University of Geneva, Jose Luis Fernandez Marquez  

 

Social media generates large amounts of almost real-time data which can prove extremely valuable in an emergency situation, especially for providing information within the first 72 hours after a disaster event. Despite an abundance of state-of-the-art machine learning techniques for automatically classifying social media images, and some work on geolocating them, the operational problem in the event of a new disaster remains unsolved.

Currently, the state-of-the-art approach to this first-response mapping is to filter the images and then submit those to be geolocated to a crowd of volunteers [1], assigning the images to volunteers at random.

The project is aimed at leveraging the power of crowdsourcing and artificial intelligence (AI) to assist emergency responders and disaster relief organizations in building a damage map from a zone recently hit by a disaster. Specifically, the project involves the development of a platform that can intelligently distribute geolocation tasks to a crowd of volunteers based on their skills. The platform uses machine learning to determine the skills of the volunteers based on previous geolocation experiences.

Thus, the project concentrates on two different tasks:

  • Profile Learning. Based on the previous geolocations of a set of volunteers, learn a profile of each volunteer encoding their geolocation capabilities. These profiles should be understood as competency maps, representing the capability of a volunteer to provide an accurate geolocation for an image coming from a specific geographical area.
  • Active Task Assignment. Use the volunteer profiles efficiently in order to maximize geolocation quality while maintaining a fair distribution of geolocation tasks among volunteers (a minimal sketch of how the two tasks fit together follows).
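As an illustration only, here is a minimal sketch of how profile learning and active task assignment could interact. The Beta-posterior profile model, the fairness rule, and all names are our assumptions for this sketch, not the project's published method.

```python
from collections import defaultdict

class VolunteerProfile:
    """Per-region competency map: Beta posterior over the probability
    that the volunteer geolocates an image from that region accurately."""
    def __init__(self):
        self.hits = defaultdict(lambda: 1)    # Beta(1, 1) prior pseudo-counts
        self.misses = defaultdict(lambda: 1)

    def update(self, region, accurate):
        (self.hits if accurate else self.misses)[region] += 1

    def skill(self, region):
        return self.hits[region] / (self.hits[region] + self.misses[region])

def assign(image_region, profiles, workload, fairness_slack=2):
    """Pick the most skilled volunteer among those whose workload is
    within `fairness_slack` tasks of the current minimum."""
    cap = min(workload.values()) + fairness_slack
    eligible = [v for v, w in workload.items() if w <= cap]
    best = max(eligible, key=lambda v: profiles[v].skill(image_region))
    workload[best] += 1
    return best

# Toy usage: "ana" has one accurate past geolocation in this region.
profiles = {"ana": VolunteerProfile(), "bo": VolunteerProfile()}
profiles["ana"].update("nepal", accurate=True)
workload = {"ana": 0, "bo": 0}
print(assign("nepal", profiles, workload))    # -> "ana"
```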

In the first stage, we envision an experimental framework with realistically generated artificial data, which acts as a feasibility study. This will be published as a paper in a major conference or journal. Simultaneously, we plan to integrate both the profile learning and the active task assignment into the crowdnalysis library, a software outcome of our previous micro-project. Furthermore, we plan to organize a geolocation workshop in Barcelona with participation from the JRC, the University of Geneva, the United Nations, and IIIA-CSIC.

In the near future, the system will generate reports and visualizations to help these organizations quickly understand the distribution of damages. The resulting platform could enable more efficient and effective responses to natural disasters, potentially saving lives and reducing the impact of these events on communities.

[1] Fathi, Ramian, Dennis Thom, Steffen Koch, Thomas Ertl, and Frank Fiedrich. “VOST: A Case Study in Voluntary Digital Participation for Collaborative Emergency Management.” Information Processing and Management 57, no. 4 (July 1, 2020): 102174. https://doi.org/10.1016/j.ipm.2019.102174 

Results Summary

The project focused on improving the accuracy and efficiency of geolocating social media images during emergencies by using crowdsourced volunteers. Key results include the development of two models: a profile-learning model to gauge volunteers’ geolocation abilities and a task assignment model that optimizes image distribution based on volunteer skills. These models outperform traditional random assignment approaches by reducing annotation requirements and improving the quality of geolocation consensus without sacrificing accuracy. This method holds promise for disaster response applications. We had 3 main outputs:

  1. An open-source implementation of the volunteer profiling and consensus geolocation algorithms in the crowdnalysis library.
  2. Papers evaluating the different geolocation consensus and active strategies for geolocation.
  3. An online workshop to collect expert feedback on the topic.

Tangible Outcomes

  1. Ballester, Rocco, Yanis Labeyrie, Mehmet Oguz Mulayim, Jose Luis Fernandez-Marquez, and Jesus Cerquides. “Crowdsourced Geolocation: Detailed Exploration of Mathematical and Computational Modeling Approaches.” Cognitive Systems Research 88 (December 1, 2024): 101266. https://doi.org/10.1016/j.cogsys.2024.101266 .
  2. Ballester, R., Labeyrie, Y., Mulayim, M.O., Fernandez-Marquez, J.L. and Cerquides, J., 2023. Mathematical and Computational Models for Crowdsourced Geolocation. In Artificial Intelligence Research and Development (pp. 301-310). IOS Press. https://ebooks.iospress.nl/doi/10.3233/FAIA230699
  3.  Firmansyah, H. B., Bono, C. A., Lorini, V., Cerquides, J., & Fernandez-Marquez, J. L. (2023). Improving Disaster Response by Combining Automated Text Information Extraction from Images and Text on Social Media. In Artificial Intelligence Research and Development (pp. 320-329). IOS Press. https://ebooks.iospress.nl/doi/10.3233/FAIA230701
  4. Cerquides, J., and Mülâyim, M.O. Crowdnalysis: A software library to help analyze crowdsourcing results (2024). https://doi.org/10.5281/zenodo.5898579 ; https://github.com/IIIA-ML/crowdsourced_geolocation

Contact person: Elisabetta Biondi (elisabetta.biondi@iit.cnr.it)

Internal Partners:

  1. Consiglio Nazionale delle Ricerche (CNR), Elisabetta Biondi, elisabetta.biondi@iit.cnr.it
  2. Central European University (CEU), Janos Kertesz, kerteszj@ceu.edu, Gerardo Iniguez, IniguezG@ceu.edu

 

The Friedkin-Johnsen model is a very popular model in opinion dynamics, validated on real groups and well investigated from the standpoint of opinion polarization. Previous research has focused almost exclusively on static networks, where links between nodes do not evolve over time. In this micro-project, we fill this gap by designing a variant of the Friedkin-Johnsen model that embeds the dynamicity of social networks. Furthermore, we designed a novel definition of global polarization that combines network features and opinion distribution, to capture the existence of clustered opinions. We analyzed the polarization effect of the new dynamic model and identified the impact of the network structure.

Results Summary

Human social networks are very complex systems and their structure has an essential impact on opinion dynamics. However, since our main goal is to study the impact of the opinion dynamics model per se, we decided to deal with two different social network topologies: an Erdős–Rényi (ER) graph and a stochastic block model (SBM).

— Design of the Friedkin-Johnsen (FJ) dynamic model. We implemented a rewiring policy that has been extensively studied in discrete opinion diffusion models: edges connecting nodes with different opinions are substituted with other edges. We adapted this scheme to the FJ model's opinions, which lie in the range [-1,1], in both the asynchronous and synchronous versions. The dynamics are governed by two parameters, θ (the disagreement threshold) and p_rew (the rewiring probability):

  • With probability 1-p_rew, the FJ update is applied.
  • With probability p_rew, if i and j disagree, i.e. |x_i-x_j| > θ, the edge (i,j) is replaced with an edge (i,k) where k agrees with i, i.e. |x_i-x_k| <= θ.

The above algorithm was specifically designed and implemented for the ER graph. In the case of the SBM, however, we limited the potential candidates for rewiring to nodes within a maximum of two hops, to prevent the block structure from becoming entirely irrelevant. The rationale behind this choice is the triadic closure mechanism, which suggests that individuals are more inclined to choose new acquaintances among the friends of their friends. A minimal code sketch of this policy follows.
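The sketch below shows the asynchronous variant described above, assuming networkx graphs, a fixed FJ susceptibility of 0.5 for every node, and uniform random choices; names and defaults are illustrative, not the simulator's actual code.

```python
import random
import networkx as nx

def fj_rewiring_step(G, x, s, theta=0.5, p_rew=0.1, susceptibility=0.5):
    """One asynchronous step of the FJ model with rewiring (sketch).

    G: networkx graph; x: dict of current opinions in [-1, 1];
    s: dict of innate opinions; theta: disagreement threshold;
    p_rew: rewiring probability.
    """
    i = random.choice(list(G.nodes))
    if random.random() < p_rew:
        # Rewiring: drop an edge to a disagreeing neighbour j
        # (|x_i - x_j| > theta) and connect to an agreeing node k
        # (|x_i - x_k| <= theta). For SBM graphs the candidates would
        # additionally be limited to nodes within two hops of i.
        disagreeing = [j for j in G.neighbors(i) if abs(x[i] - x[j]) > theta]
        agreeing = [k for k in G.nodes
                    if k != i and not G.has_edge(i, k)
                    and abs(x[i] - x[k]) <= theta]
        if disagreeing and agreeing:
            G.remove_edge(i, random.choice(disagreeing))
            G.add_edge(i, random.choice(agreeing))
    else:
        # Standard FJ update: convex mix of the innate opinion and the
        # average opinion of the current neighbours.
        nbrs = list(G.neighbors(i))
        if nbrs:
            social = sum(x[j] for j in nbrs) / len(nbrs)
            x[i] = susceptibility * s[i] + (1 - susceptibility) * social

# Toy run mirroring the 20-node ER experiment.
G = nx.erdos_renyi_graph(20, 0.3)
x = {i: random.uniform(-1, 1) for i in G.nodes}
s = dict(x)                        # innate opinions = initial opinions
for _ in range(1000):
    fj_rewiring_step(G, x, s)
```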

— Design of the polarization metric. We developed a definition for identifying highly polarized networks: a highly polarized network is one in which there are two distinct opinions clustered into two tightly connected communities. To capture this, we needed to consider both the network structure and the distribution of opinions, so we used two different metrics: bimodality for the opinion distribution and homogeneity for its correspondence with the network structure.

— Bimodality. The bimodality coefficient was used to measure the extent to which a distribution is bimodal. It is calculated using the skewness and kurtosis values and represents how similar the distribution is to one with two modes.
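For reference, a common choice (which we assume is the one meant here) is Sarle's bimodality coefficient b = (γ² + 1) / κ, where γ is the skewness and κ the kurtosis of the opinion distribution; values above 5/9 ≈ 0.556, the value attained by a uniform distribution, are conventionally read as evidence of bimodality.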

— Homogeneity. To measure the homogeneity of the opinion distribution with respect to the network structure, we examined the local distribution of nodes' opinions: whether each node's opinion was similar to those of its neighbors, which would indicate agreement with the overall opinion distribution over the network. The final homogeneity value is close to zero when the distribution of opinions is close to linear.

— Experimental evaluation. We developed a Python simulator that computes the dynamic FJ (rewiring included) and the polarization metrics over time, given the network and initial opinions. To test the model, we ran simulations on a small network of 20 nodes and compared the outcomes of the FJ with rewiring to those without rewiring. For the ER network, we used a vector of opinions uniformly distributed over [-1,1] as the initial opinions. For the SBM networks, we employed a different configuration, where the initial opinions were drawn uniformly from the intervals [-0.5,-0.1] and [0.1,0.5], depending on which block the nodes belonged to.

In conclusion, this micro-project delivered a dynamic version of the FJ model for the synchronous and asynchronous cases, together with a new definition of polarization that considers both the distribution of opinions and the network topology. To assess the model's effectiveness, we conducted simulations on two different network types: an ER network and an SBM network. Our findings indicate that the rewiring process has significant effects on polarization, but these effects depend on the initial network.

 

Tangible Outcomes

  1. Github link of the code of the simulator for the new dynamic model: https://github.com/elisabettabiondi/FJ_rewiring_basic.git 

 

Contact person: Patrizia Fattori 

Internal Partners:

  1. Università di Bologna (UNIBO), Patrizia Fattori
  2. German Research Centre for Artificial Intelligence (DFKI), Elsa Kirchner

 

Reaching movements towards targets located in 3-dimensional space are fast and accurate. Although they may seem simple and natural, these movements imply the integration of different sensory information processed in real time by the brain. We apply machine learning techniques to address the following questions: i) at which point of the movement does it become possible to accurately predict the final target, in static and dynamic conditions? ii) given the behavioural-level hypothesis that direction and depth do not rely on shared brain networks during movement execution but are processed separately, can targets located along the horizontal or sagittal dimension be predicted with the same accuracy? Finally, we frame our results in the context of improving user-agent interactions, moving from a description of human movement to a possible implementation in social/collaborative AI.

Results Summary

We measured the kinematics of reaching movements in 12 participants towards visual targets located in 3D space. The targets could remain static or be perturbed at movement onset. Experiment 1: using a supervised recurrent neural network, we tested at what point during the movement it was possible to accurately detect the reaching endpoints given the instantaneous x, y, z coordinates of the index finger and wrist. The classifier successfully predicted static and perturbed reaching endpoints with progressively increasing accuracy across movement execution (mean accuracy = 0.56 ± 0.19, chance level = 0.16). Experiment 2: using the same network architecture, we trained a regressor to predict the future x, y, z positions of the index finger and wrist given the actual x, y, z positions. The x, y and z components of the index finger and wrist showed an average R-squared higher than 0.9, suggesting an optimal reconstruction of the future trajectory given the actual one.
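The architecture is not specified beyond "supervised recurrent neural network"; below is a minimal PyTorch sketch of the Experiment 1 setup under our own assumptions: an LSTM over 6 features per time step (x, y, z of index finger and wrist) and six target classes, consistent with the reported chance level of 1/6 ≈ 0.16.

```python
import torch
import torch.nn as nn

class EndpointClassifier(nn.Module):
    """LSTM over per-timestep (x, y, z) coordinates of index finger and
    wrist (6 features), predicting one of 6 reaching targets (assumed)."""
    def __init__(self, n_features=6, hidden=64, n_targets=6):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_targets)

    def forward(self, traj):                # traj: (batch, time, 6)
        out, _ = self.lstm(traj)
        return self.head(out[:, -1])        # predict from the last state

# Truncating trajectories at increasing fractions of movement time and
# re-evaluating reproduces the "accuracy vs. movement progress" analysis;
# swapping the head for a 6-dim output with an MSE loss gives the
# Experiment 2 regressor over future coordinates.
model = EndpointClassifier()
logits = model(torch.randn(8, 50, 6))       # 8 trials, 50 time steps
```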

Tangible Outcomes

  1. Individual subject trajectories – Annalisa Bosco
    https://drive.google.com/drive/folders/1FdDXKjhCupDdyLlyvCdUwfxmyGfoRZDE
  2. Program/code: Recurrent neural network codes – Annalisa Bosco
    https://drive.google.com/drive/folders/1FdDXKjhCupDdyLlyvCdUwfxmyGfoRZDE
  3. Video presentation summarizing the project

 

 

Contact person: Jasmin Grosinger (jasmin.grosinger@oru.se)

Internal Partners:

  1. Örebro University, ORU, Jasmin Grosinger

External Partners:

  1. Technical University of Denmark, Thomas Bolander

 

Previously we investigated how an AI system can be proactive, that is, act in an anticipatory way and on its own initiative, by reasoning on current and future states, mentally simulating actions and their effects, and considering what is desirable. In this micro-project we extend our earlier work with epistemic reasoning: we reason on the knowledge and beliefs of the human and thereby inform the AI system about what kind of proactive announcement to make. As in our previous work, we consider which states are desirable and which are not, and we also take into account how the state will evolve into the future if the AI system does not act. Now we additionally consider the human's false beliefs.

It is not necessary, and in fact not desirable, to make announcements correcting each and every false belief that the human may have. For example, if the human is watching TV, she need not be informed that the salt is in the red container and the sugar is in the blue container while she believes it is the other way around. On the other hand, when the human starts cooking and is about to use the content of the blue container believing it is salt, then it is relevant for the AI system to announce what is actually the case, to avoid undesirable outcomes. The example shows that we need to research not only what to announce but also when to announce it.

The methods we use in this micro-project are knowledge-based; specifically, we employ Dynamic Epistemic Logic (DEL), a modal logic extending Epistemic Logic that allows modeling change in the knowledge and beliefs of an agent herself and of other agents. One week of visit is planned.
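One way to read the salt/sugar example is as a relevance test for announcements. The sketch below is our own abstraction, not the project's DEL formalization: announce only when the human's plan is expected (by her) to go well but would actually go badly.

```python
def announcement_relevant(actual, believed, plan, outcome, desirable):
    """Sketch of the salt/sugar relevance test (our abstraction).

    actual, believed: state descriptions (e.g., dicts of facts)
    plan:             the human's intended action
    outcome(plan, state) -> resulting state
    desirable:        predicate over resulting states
    """
    expected = outcome(plan, believed)    # what the human expects
    real = outcome(plan, actual)          # what would actually happen
    # The false belief matters only if it flips a good outcome to bad.
    return desirable(expected) and not desirable(real)

# While watching TV, no plan touches the containers, so no announcement;
# once the plan is "season the soup from the blue container", the test
# fires and the agent informs the human.
```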

Results Summary

[Ongoing project] The project is still ongoing. It turned out to be much bigger than, and well beyond the scope of, a micro-project, and there were interruptions. We are working on our DEL-based framework for proactive agents and expect a journal article submission in January next year. The project will continue at least until then, and is expected to extend the current status of the work.

Contact person: Holger Hoos (hh@cs.rwth-aachen.de)

Internal Partners:

  1. Leiden University, Holger Hoos
  2. INESC TEC, João Gama, jgama@fep.up.pt

 

Real-world data streams are rarely stationary but subject to concept drift, i.e., change in the distribution of the observations. Concept drift needs to be constantly monitored, so that when the trained model is no longer adequate, a new model can be trained to fit the most recent concept. Current methods of detecting concept drift typically monitor the performance and trigger a signal once it drops by a certain margin. The disadvantage is that this acts retroactively, i.e., only once the performance has already dropped.

The field of neural network verification asks whether a neural network is susceptible to an adversarial attack, i.e., whether a given input image can be perturbed within a given epsilon such that the output of the network changes. Such susceptibility indicates that the input is close to the decision boundary. When the distribution of images that are close to the decision boundary changes significantly, this indicates that concept drift is occurring, and we can proactively (before the performance drops) retrain the model. The short-term goal of this micro-project is to a) define ways to monitor the distribution of images close to the decision boundary, and b) design control systems that can act upon this notion.
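A minimal sketch of the control idea in b), under our own assumptions (a sliding window and a fixed tolerance); how an input gets flagged as near-boundary, e.g. by a verifier or a cheap adversarial probe, is left abstract:

```python
from collections import deque

class BoundaryDriftMonitor:
    """Sketch: track the fraction of recent inputs flagged as close to
    the decision boundary and signal drift proactively when that
    fraction departs from its reference level."""
    def __init__(self, window=500, reference=0.05, tolerance=0.05):
        self.flags = deque(maxlen=window)
        self.reference = reference    # fraction observed at training time
        self.tolerance = tolerance

    def update(self, near_boundary: bool) -> bool:
        """Record one observation; return True if drift is signalled."""
        self.flags.append(near_boundary)
        if len(self.flags) < self.flags.maxlen:
            return False              # not enough evidence yet
        frac = sum(self.flags) / len(self.flags)
        return abs(frac - self.reference) > self.tolerance

monitor = BoundaryDriftMonitor(window=200, reference=0.05)
# For each stream item: retrain = monitor.update(is_near_boundary(img))
```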

A disadvantage of this approach is that verifying neural networks requires significant computation time; many speed-ups will be needed before it can be used in high-throughput streams.

 

Contact person: Ondrej Dusek (odusek@ufal.mff.cuni.cz)

Internal Partners:

  1. Brno U, Petr Schwarz
  2. CUNI, Ondrej Dusek
  3. Eötvös Loránd University (ELTE), Andras Lorincz  

External Partners:

  1. IDIAP, Petr Motlicek  

 

This is a follow-up to a previous microproject. It aims to design grounded dialogue models based on observation of human-to-human dialogue examples, i.e., distilling dialogue patterns automatically and aligning them to external knowledge bases. Current state-of-the-art conversation models based on fine-tuned large language models are not grounded and mimic their training data, or their grounding is external and needs to be hand-designed. Furthermore, most commercially deployed dialogue models are entirely handcrafted.

Our goal is to produce grounding for these models (semi-)automatically using dialogue context embedded in vector spaces via large language models trained specifically on conversational data. If we represent dialogue states as vectors, the whole conversation can be seen as a trajectory in the vector space. By merging, pruning, and modeling the trajectories, we can get dialog skeleton models in the form of finite-state graphs or similar structures. These models could be used for data exploration and analysis, content visualization, topic detection, or clustering. This can bring faster and cheaper design of fully trustable conversation models. The approach will serve both to provide external model grounding and to analyze the progress in human-to-human dialogues, including negotiation around the participants’ common ground.

The microproject will investigate the optimal format of the dialogue context embeddings (such as temporal resolution) as well as the optimal ways of merging dialogue trajectories and distilling models. Here, Variational Recurrent Neural Networks with discrete embeddings (Shi et al., NAACL 2019) are a promising architecture, but alternatives will also be considered.
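To make the trajectory-merging idea concrete, here is a minimal sketch using plain k-means over turn embeddings in place of the discrete-embedding VRNN mentioned above; all names and the cluster count are illustrative, not the project's actual pipeline.

```python
# Sketch: distil a finite-state dialogue skeleton from embedded turns.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def dialogue_skeleton(dialogues, n_states=10):
    """dialogues: list of dialogues, each a list of turn-embedding vectors.
    Returns the state model and a weighted transition graph."""
    turns = np.vstack([t for d in dialogues for t in d])
    km = KMeans(n_clusters=n_states, n_init=10).fit(turns)
    transitions = Counter()
    for d in dialogues:
        states = km.predict(np.vstack(d))     # trajectory of state ids
        for a, b in zip(states, states[1:]):
            transitions[(a, b)] += 1          # merge shared edges
    return km, transitions

# Toy usage: three dialogues of random 16-d "turn embeddings".
rng = np.random.default_rng(0)
dialogues = [[rng.normal(size=16) for _ in range(5)] for _ in range(3)]
km, graph = dialogue_skeleton(dialogues, n_states=4)
```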

We plan to experiment with both textual and voice-based dialogues. We will use the MultiWOZ corpus (Budzianowski et al., EMNLP 2018) as well as the DIASER extension developed in a Humane AI microproject by CUNI+LIMSI (Hudecek et al., LREC 2022) for text-based experiments. For voice-based experiments, we will use MultiWOZ spoken data released for the DSTC11 Challenge and dialogue data currently developed in a Humane AI microproject by BUT+CUNI.

The work was done as a part of the JSALT workshop hosted by the University of Le Mans, France, and co-organized by Johns Hopkins University (JHU) and the Brno University of Technology. https://jsalt2023.univ-lemans.fr/en/index.html

The topic passed a scientific review by about 40 researchers in Baltimore, USA, in December 2022 and was selected as one of the four workshop topics. https://jsalt2023.univ-lemans.fr/en/automatic-design-of-conversational-models-from-observation-of-human-to-human-conversation.html 

Results Summary

We were part of the JSALT workshop in Le Mans (https://jsalt2023.univ-lemans.fr/en/index.html ), with full-time work on the topic of task-oriented dialogue structure extraction and realistic dialogue data and evaluation. The motivation is deploying AI agents instead of human agents in call centers, where calls follow similar patterns but these are not analyzed. The results include:

  • A private dataset of dialogues which have not been leaked into LLM training
  • Dialogue embeddings training toolkit
  • An analysis of multiple algorithms for extracting dialogue structure from data
  • Shared audio and text representation models – these make it possible to create a voice-based dialogue system implicitly, without a speech recognition module, as the audio representations are aligned to text embeddings
  • Long-context speech recognition (using dialogue context) based on LLMs with adaptors

While most of these works are not (yet) published, a detailed overview of the results is given in the final presentation video of the workshop.

Tangible Outcomes

  1. Burdisso et al. (EMNLP 2024 main conference): Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction. https://arxiv.org/abs/2410.18481 .
  2. A unified multi-domain dialogue dataset, introduced and released along with the Dialog2Flow paper above. HuggingFace page: https://huggingface.co/datasets/sergioburdisso/dialog2flow-dataset 
  3. Source code released along with the Dialog2Flow paper, including tool-like scripts to convert any collection of dialogs to a dialog flow automatically: https://github.com/idiap/dialog2flow (it will be available by Dec 2024)
  4. The code repository for long-context ASR is public: https://github.com/keya-dialog/LoCo-ASR/tree/main 
  5. a jupyter notebook tutorial for joint speech-text embeddings for spoken language understanding. https://github.com/keya-dialog/LoCo-ASR/tree/main 
  6. A summer school presentation on dialogue and a tutorial on QLoRA fine-tuning of an LLM for dialogue, produced for the workshop and available online.
    1. presentation slides:
      1. part 1 (on dialogue modelling): https://raw.githubusercontent.com/keya-dialog/jsalt-dialogue-lab/refs/heads/main/conv_ai_v6.pdf 
      2. part 2 (on LLMs): https://raw.githubusercontent.com/keya-dialog/jsalt-dialogue-lab/refs/heads/main/llms_v6.pdf 
  7. Tutorial code: https://github.com/keya-dialog/jsalt-dialogue-lab 
  8. Summer school presentations at JSALT. Final JSALT presentation video, featuring a detailed description of all results:
    1. Video: https://www.youtube.com/live/QS5zXkpXV3Q
    2. Slides: https://docs.google.com/presentation/d/10ozZoo8pEoFKjoYs3yJnidZVIydDD8Bv_BfG_X7pZD8/edit

Contact person: Haris Papageorgiou (haris@athenarc.gr)

Internal Partners:

  1. ATHENA RC, Haris Papageorgiou
  2. German Research Centre for Artificial Intelligence (DFKI), Julián Moreno Schneider
  3. OpenAIRE, Natalia Manola

 

SciNoBo is a microproject focused on enhancing science communication, particularly on health and climate change topics, by integrating AI systems with science journalism. The project aims to assist science communicators, such as journalists and policymakers, by using AI to identify, verify, and simplify complex scientific statements found in mass media. By grounding these statements in scientific evidence, the AI will help ensure accurate dissemination of information to non-expert audiences. Technologically, we build on our previous MP work on neuro-symbolic question answering and further exploit and advance recent developments in instruction fine-tuning of large language models, retrieval augmentation and natural language understanding, specifically the NLP areas of argumentation mining, claim verification and text (i.e., lexical and syntactic) simplification. The proposed MP addresses the topic of "Collaborative AI" by developing an AI system equipped with innovative NLP tools that can collaborate with humans (i.e., science communicators, SCs) communicating statements on Health & Climate Change topics, grounding them in scientific evidence (interactive grounding) and providing explanations in simplified language, thus facilitating SCs in science communication. The innovative AI solution will be tested in a real-world scenario in collaboration with OpenAIRE, employing OpenAIRE Research Graph (ORG) services on Open Science publications.

Results Summary

The project was divided into two phases that ran in parallel. The main focus of Phase I was the construction of the data collections and the adaptations and improvements needed in PDF processing tools. Phase II dealt with the development of the two subsystems, claim analysis and text simplification, as well as their evaluation.

  • Phase I: Two collections of news and scientific publications were compiled in the areas of Health and Climate. The news collection was built from an existing dataset of news stories and ARC's automated classification system in the areas of interest. The second collection, of publications, was provided by the OpenAIRE ORG service and further processed, managed and properly indexed by the ARC SciNoBo toolkit. A small-scale annotation by DFKI was foreseen in support of the simplification subsystem.
  • Phase II: We developed, fine-tuned and evaluated the two subsystems. Concretely, the "claim analysis" subsystem encompasses (i) ARC's previous work on claim identification, (ii) a retrieval engine fetching relevant scientific publications (based on our previous miniProject), and (iii) an evidence-synthesis module indicating whether the fetched publications, and the scientists' claims therein, support or refute the news claim under examination (a minimal sketch of such a module follows this list).
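As an illustration of what an evidence-synthesis step can look like, here is a minimal sketch that aggregates NLI stance predictions per evidence sentence; the model, label handling, and majority-vote rule are our assumptions, not SciNoBo's actual module.

```python
from transformers import pipeline

# Off-the-shelf NLI model as a stand-in stance classifier (assumption).
nli = pipeline("text-classification", model="facebook/bart-large-mnli")

def synthesize(claim, evidence_sentences):
    """Majority vote over per-sentence entailment decisions."""
    votes = {"supports": 0, "refutes": 0}
    for evidence in evidence_sentences:
        pred = nli({"text": evidence, "text_pair": claim})[0]
        label = pred["label"].lower()
        if label == "entailment":
            votes["supports"] += 1
        elif label == "contradiction":
            votes["refutes"] += 1
    if not any(votes.values()):
        return "not enough info"
    return max(votes, key=votes.get)

print(synthesize(
    "Vaccines reduce severe COVID-19 outcomes.",
    ["Vaccinated cohorts showed markedly fewer hospitalizations."],
))
```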

 

Tangible Outcomes

  1. Kotitsas, S., Kounoudis, P., Koutli, E., & Papageorgiou, H. (2024, March). Leveraging fine-tuned Large Language Models with LoRA for Effective Claim, Claimer, and Claim Object Detection. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 2540-2554).  https://aclanthology.org/2024.eacl-long.156/ 
  2. HCN dataset: news articles in the domain of Health and Climate Change. The dataset contains news articles, annotated with the major claim, claimer(s) and claim object(s). https://github.com/iNoBo/news_claim_analysis 
  3. Website demo: http://scinobo.ilsp.gr:1997/services 
  4. Services for claim identification and the retrieval engine http://scinobo.ilsp.gr:1997/live-demo?HFSpace=inobo-scinobo-claim-verification.hf.space 
  5. Service for the text simplification http://scinobo.ilsp.gr:1997/text-simplification 

Contact person: Szymon Talaga (stalaga@uw.edu.pl)

Internal Partners:

  1. Univ. Warsaw, Szymon Talaga, stalaga@uw.edu.pl

 

This project builds upon another finished microproject. In this project, we continue the development of the Segram package for Python. The purpose of the package is to provide tools for automated narrative analysis of text data focused on extracting information on basic building blocks of narratives – agents (both active and passive), actions, events, or relations between agents and actions (e.g. determining subjects and objects of actions), as well as descriptions of actors, actions and events. The development process is also naturally paired with conceptual work on representations of narratives.

The package is designed as a graybox model. It is based on an opaque statistical language model providing linguistic annotations, which are subsequently used by transparent deterministic algorithms for discovering narrative elements. Thus, the final output should be easy to interpret and validate by human users, whenever necessary. Moreover, by lifting the analysis from the purely linguistic level to the arguably more intuitive level of narratives, it is hoped that the provided tools will be significantly easier to use and understand for end users, including those without training in linguistics and/or computer science.
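To illustrate the graybox pattern (an opaque statistical model supplying linguistic annotations, transparent deterministic rules on top), here is a sketch using spaCy directly; it is not Segram's actual API, and all names are ours.

```python
# Illustrative graybox sketch (not Segram's API). Requires the spaCy
# model to be installed: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")   # blackbox linguistic annotations

def narrative_triples(text):
    """Transparent rules over parser output: (agent, action, patient)."""
    doc = nlp(text)
    for tok in doc:
        if tok.pos_ == "VERB":       # deterministic, inspectable rules
            agents = [c.text for c in tok.children if c.dep_ == "nsubj"]
            patients = [c.text for c in tok.children if c.dep_ == "dobj"]
            if agents:
                yield (agents[0], tok.lemma_,
                       patients[0] if patients else None)

print(list(narrative_triples("The mayor announced a new smart-city plan.")))
# -> [('mayor', 'announce', 'plan')]
```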

The proposed framework is aimed at language understanding and information extraction, as opposed to language generation. Namely, the role of the package is to organize narrative information in convenient data structures allowing effective querying and the derivation of various statistical descriptions. Crucially, thanks to its semi-transparent nature, the produced output should be easy for human users to validate. This should facilitate the development of shared representations of narratives (corresponding to the WP1- and WP2-motivated goal "Establishing Common Ground for Collaboration with AI Systems"), understandable by both humans and machines, that are at the same time trustworthy (by being easy for humans to validate), which is arguably a desirable feature, for instance in comparison to increasingly powerful but hard-to-trust large language models. In particular, the package should be useful for facilitating and informing human-driven analyses of text data.

An alpha version of the package implementing core functionalities related to grammatical and narrative analysis is ready. The goal of the present microproject was to improve the package and release a beta version. This included implementing an easy-to-use interface (operating at the level of narrative concepts) that allows end users to effectively query and analyze the data produced by Segram, as well as developing comprehensive documentation. The planned release should thus be ready for broader adoption across a wide array of use cases and users with different levels of linguistic/computational expertise.

Results Summary

The project delivered a software Python package for narrative analysis as per the project description. The package is distributed through Python Package Index (PyPI) under a permissive open-source license (MIT) and therefore is easily accessible and free-to-use. Moreover, it comes with a detailed documentation page facilitating adoption by third-parties. It is worth noting that the advent of latest-generation large language models (LLMs) has partially limited the relevance of the project results.

Tangible Outcomes

  1. Package page at Python Package Index: https://pypi.org/project/segram/ 
  2. tutorial page documenting how to use the package https://segram.readthedocs.io/en/latest/ 

Contact person: Eugenia Polizzi

Internal Partners:

  1. Consiglio Nazionale delle Ricerche (CNR), ISTC, Eugenia Polizzi
  2. Fondazione Bruno Kessler (FBK), Marco Pistore

 

The goal of the project is to investigate the role of social norms in misinformation in online communities. This knowledge can help identify new interventions that prevent the spread of misinformation in online communities. To accomplish this, the role of norms was explored by analyzing Twitter data gathered through the COVID19 Infodemics Observatory, an online platform developed to study the relationship between the evolution of the COVID-19 epidemic and information dynamics on social media. This study can inform a further set of microprojects addressing norms in AI systems through theoretical modelling and social simulations.

Results Summary

In this MP, we diagnosed and visualized a map of existing social norms underlying fake news related to COVID-19. Through the analysis of millions of geolocated tweets collected during the COVID-19 pandemic, we identified structural and functional network features supporting an "illusion of the majority" on Twitter. Our results suggest that the majority of fake (and other) content related to the pandemic is produced by a minority of users, and that there is a structural segmentation into a small "core" of very active users responsible for a large amount of fake news and a larger "periphery" that mainly retweets the content produced by the core. This discrepancy between the size and identity of the users involved in the production and diffusion of fake news suggests that a distorted perception of what users believe to be the majority opinion may pressure users (especially those in the periphery) to comply with the group norm and further contribute to the spread of misinformation in the network.

Tangible Outcomes

  1. The voice of few, the opinions of many: evidence of social biases in Twitter COVID-19 fake news sharing – Piergiorgio Castioni, Giulia Andrighetto, Riccardo Gallotti, Eugenia Polizzi, Manlio De Domenico   https://arxiv.org/abs/2112.01304
  2. Video presentation summarizing the project

 

Contact person: Szymon Talaga (stalaga@uw.edu.pl)

Internal Partners:

  1. Univ. Warsaw, Szymon Talaga, stalaga@uw.edu.pl
  2. Institut Polytechnique de Grenoble, James Crowley, james.crowley@univ-grenoblealpes.fr

 

This Micro-Project has laid the groundwork for developing a new approach to narrative analysis, providing a gray-box (at least partially explainable) NLP model tailored to facilitating the work of qualitative text/narrative analysts. This goal fits into the broader HumanE-AI objective of developing common-ground concepts providing better representations shared by humans and machines alike. In particular, the contribution of the project to aligning machine analyses with the human perspective through the notion of narratives is twofold. Firstly, narrative-oriented tools for automated text analysis can empower human analysts, as the narrative framework arguably provides a more natural and meaningful context for reasoning about textual data for people without formal training in linguistics and/or computer science. Secondly, the development of software for narrative analysis is naturally intertwined with conceptual work on the core terms and building blocks of narratives, which can inform subsequent work on more advanced approaches. We conducted a proof-of-concept study combining existing standard NLP methods (e.g. topic modeling, entity recognition) with qualitative analysis of narratives about smart cities and related technologies, and used this experience to conceptualize our approach to narrative analysis, in particular with respect to problems which are not easily solved with existing tools.

Results Summary

The aim of the project was to develop a Python software package providing easy-to-use and easy-to-understand tools (also for researchers not trained in computer science or linguistics) for extracting narrative information (active and passive actors, the actions they perform, and descriptions of both actors and actions, which together define events) and organizing it in rich hierarchical data structures (the data model is implicitly graphical), from which different sorts of descriptive statistics can subsequently be generated depending on particular research questions. Crucially, for this to be practically possible, a legible and efficient framework for querying the produced data is needed.

Importantly, the software is developed as a graybox model, in which core low-level NLP tasks, such as POS and dependency tagging, are performed by a blackbox statistical model, whose outputs are then transformed into higher-order grammatical and narrative data based on a set of transparent deterministic rules. This ensures high explainability of the approach, which is crucial for systems in which the machine part is supposed to be a helper of a human analyst rather than an implicit leader.

Currently, the core modules of the package responsible for grammatical analysis are mostly ready (though several improvements are still planned), including a coreference resolution module. The core part of the semantic module, which translates grammatical information into more semantic constructs focused on actors, actions and descriptions, is also ready. What is still missing is an interface exposing methods that allow end users easy access to and analysis of the rich data produced by the package, as well as the principled and convenient query framework on which that interface should be based.

This is the main focus of the ongoing and future work. The second missing part is the documentation, which is best finished once the interface is ready. Thus, even though the package in its current state may seem a little rough from the perspective of an end user, its quality and usefulness will increase steadily as new updates are delivered.

Tangible Outcomes

  1. python package providing grey box NLP model to assist qualitative analysts https://github.com/sztal/segram

Contact person: Rui Prada (rui.prada@tecnico.ulisboa.pt)

Internal Partners:

  1. Instituto Superior Técnico, Department of Computer Science
  2. Eötvös Loránd University, Department of Artificial Intelligence

External Partners:

  1. DFKI Lower Saxony, Interactive Machine Learning Lab
  2. Carnegie Mellon University, Robotics Institute  

 

The project addresses research on interactive grounding. It consists of the development of an Augmented Reality (AR) game, using HoloLens, that supports the interaction of a human player with an AI character in a mixed-reality setting using gestures as the main communicative act. The game integrates technology to perceive human gestures and poses. It brings about collaborative tasks that need coordination at the level of mutual understanding of the several elements of the required task. Players (human and AI) have different information about the tasks needed to advance in the game and must communicate that information to their partners through gestures.

The main grounding challenge is learning the mapping between gestures and the meaning of actions to perform in the game. There are two levels of gestures to ground: some are task-independent while others are task-dependent. In other words, besides the gestures that communicate explicit information about the game task, the players need to agree on the gestures used to coordinate the communication itself, for example, to signal agreement or doubt, to ask for more information, or to close the communication. These latter gesture types can be transferred from task to task within the game, and probably to other contexts as well. It will be possible to play the game with two humans and study their gesture communication in order to gather the gestures that emerge: a human-inspired gesture set will be collected and serve the creation of a gesture dictionary in the AI repertoire.

The game provides different tasks of increasing difficulty. The first ones ask the players to perform gestures or poses as mechanisms to open a door and progress to the next level. Later, in a more advanced version of the game, specific and constrained body poses, interaction with objects, and the need to communicate more abstract concepts (e.g., next to, under, to the right, the biggest one, …) are introduced. The game is built as a platform for performing studies, supporting diverse questions about the interactive grounding of gestures. For example, we can study the way people adapt to and ascribe meaning to the gestures performed by the AI agent; how different gesture profiles influence people's interpretation, facilitate grounding, and impact task performance; or different mechanisms for the AI to learn its gesture repertoire from humans (e.g., by imitation grounded in context).

Results Summary

We developed an AR game where players face a sequence of codebreaking challenges that require them to press buttons in a specific sequence; however, only one of the partners has access to the buttons, while the other has access to the solution code, and communication is possible only through gestures. The core gameplay is therefore centred on the communication between the two partners (human player and AI virtual agent). In addition to the AR game, we developed sample AI agents able to play with a human player. Gestures supported in the game are split into two distinct subtypes:

  1. Taskwork gestures: Used for conveying information about the game’s tasks and environment (e.g., an object’s colour).
  2. Teamwork gestures: Used for giving feedback regarding communication (e.g., affirming that a gesture was understood).

The gameplay loop requires shared performance, coordination, and communication.

In the current version, the virtual agent plays reactively in response to the player's gestures, based on a gesture knowledge base that assigns a meaning and an action to each gesture (a minimal sketch follows). A version using an LLM was also developed to provide reasoning for gesture recognition and performance by the AI virtual agent.
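The sketch below illustrates such a reactive gesture knowledge base; the entries, the taskwork/teamwork split (mirroring the subtypes above), and the fallback behaviour are our illustrative assumptions, not the game's actual data structures.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GestureEntry:
    meaning: str
    kind: str                  # "taskwork" or "teamwork"
    action: Callable[[], None]

# Hypothetical knowledge base mapping recognized gestures to actions.
knowledge_base = {
    "point_at_blue": GestureEntry("button colour: blue", "taskwork",
                                  lambda: print("agent presses blue")),
    "thumbs_up":     GestureEntry("gesture understood", "teamwork",
                                  lambda: print("agent proceeds")),
}

def react(gesture_label: str):
    entry = knowledge_base.get(gesture_label)
    if entry is None:
        # Grounding repair: unknown gesture, signal doubt / ask to repeat.
        print("agent signals doubt")
    else:
        entry.action()

react("point_at_blue")         # -> agent presses blue
react("wave")                  # -> agent signals doubt
```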

Tangible Outcomes

  1. The base game – https://github.com/badomate/EscapeHololens 
  2. The extended game – https://github.com/badomate/EscapeMain 
  3. A presentation summarizing the project: https://www.youtube.com/watch?v=WmuWaNdIpcQ
  4. A short demo for the system https://youtu.be/j_bAw8e0lNU?si=STi6sbLzbpknckGG

Contact person: Mohamed Chetouani (mohamed.chetouani@sorbonne-universite.fr)

Internal Partners:

  1. ISIR, Sorbonne University, Mohamed Chetouani, Silvia Tulli
  2. Vrije Universiteit Amsterdam, Kim Baraka  

 

Human-Interactive Robot Learning (HIRL) is an area of robotics that focuses on developing robots that can learn from and interact with humans. This educational module aims to cover the basic principles and techniques of Human-Interactive Robot Learning. This interdisciplinary module will encourage graduate students (Master/PhD level) to connect different bodies of knowledge within the broad field of Artificial Intelligence, with insights from Robotics, Machine Learning, Human Modelling, and Design and Ethics. The module is meant for Master's and PhD students in STEM, such as Computer Science, Artificial Intelligence, and Cognitive Science. This work will extend the tutorial presented at the International Conference on Algorithms, Computing, and Artificial Intelligence (ACAI 2021) and will be shared with the Artificial Intelligence Doctoral Academy (AIDA). Moreover, the proposed lectures and assignments will be used as teaching material at Sorbonne University and Vrije Universiteit Amsterdam.

We plan to design a collection of approximately 12 1.5-hour lectures, 5 assignments, and a list of recommended readings, organized along relevant topics surrounding HIRL. Each lecture will include an algorithmic part and a practical example of how to integrate such an algorithm into an interactive system. The assignments will encompass the replication of existing algorithms, with the possibility for students to develop their own alternative solutions. Proposed module contents (each lecture approx. 1.5 hours):

  1. Interactive Machine Learning vs Machine Learning – 1 lecture
  2. Interactive Machine Learning vs Interactive Robot Learning (embodied vs non-embodied agents) – 1 lecture
  3. Fundamentals of Reinforcement Learning – 2 lectures
  4. Learning strategies: observation, demonstration, instruction, or feedback – Imitation Learning, Learning from Demonstration – 2 lectures; Learning from Human Feedback: evaluative, descriptive, imperative, contrastive examples – 3 lectures
  5. Evaluation metrics and benchmarks – 1 lecture
  6. Application scenarios: hands-on session – 1 lecture
  7. Design and ethical considerations – 1 lecture