Contact person: Loris Bozzato (bozzato@fbk.eu)

Internal Partners:

  1. Fondazione Bruno Kessler, Loris Bozzato, ghidini@fbk.eu
  2. Technical University of Vienna, Thomas Eiter, eiter@kr.tuwien.ac.at 

External Partners:

  1. BOSCH Deutschland, Stepanova Daria

 

The previous requirements fit very well with the capabilities of MR-CKR: on the one hand, we have different contexts in which the inputs need to be modified to suit a different diagnosis of failure of the model. On the other hand, we can exploit different relations between contexts: one relation specifies that inputs are more modifiable in one context than in another, while a second describes whether one diagnosis is a special case of another. Additionally, MR-CKR allows us to incorporate global knowledge, so that inputs can only be modified in a way that keeps the result “realistic”, i.e., satisfying the axioms in the global knowledge.

In this work, we provide a prototype specialized in generating similar and problematic scenes in the domain of Autonomous Driving.

Results Summary

We show that the proposed approach provides high-quality semantic segmentation from the robot’s perspective, with accuracy comparable to the original one. In addition, we exploited the gained information and improved the recognition performance of the deep network for the lower viewpoints and showed that the small robot alone is capable of generating high-quality semantic maps for the human partner. The computations are close to real time, so the approach enables interactive applications.

Tangible Outcomes

  1. Prototype implementation: https://github.com/raki123/MR-CKR 

Contact person: Samuel Kaski (samuel.kaski@aalto.fi)

Internal Partners:

  1. Aalto University
  2. Delft University of Technology, Frans Oliehoek

 

In human-AI collaboration, one of the key difficulties is establishing a common ground for the interaction, especially in terms of goals and beliefs. In practice, the AI might not have access to this necessary information directly and must infer it during the interaction with the human. However, training a model to support this kind of inference would require massive collections of interaction data and is not feasible in most applications.

Modern cognitive models, on the other hand, can equip AI tools with the necessary prior knowledge to readily support inference, and hence, to quickly establish a common ground for collaboration with humans. However, utilizing these models in realistic applications is currently impractical due to their computational complexity and non-differentiable structure.

Contact person: Catholijn Jonker, Maria Tsfasman (c.r.m.m.oertel@tudelft.nl; m.tsfasman@tudelft.nl)

Internal Partners:

  1. Technical University Delft, Catharine Oertel, c.r.m.m.oertel@tudelft.nl
  2. Eotvos Lorand University, Andras Lorincz, lorincz@inf.elte.hu

 

In this micro-project, we propose investigating human recollection of team meetings and how conversational AI could use this information to create better team cohesion in virtual settings. Specifically, we would like to investigate how a person’s emotions, personality, relationship to fellow teammates, goals, and position in the meeting influence how they remember the meeting. We want to use this information to create memory-aware conversational AI that could leverage such data to increase team cohesion in future meetings. To achieve this goal, we first record a multimodal dataset of team meetings in a virtual setting. Second, we administer questionnaires to participants at different time intervals after each session. Third, we annotate the corpus. Fourth, we carry out an initial corpus analysis to inform the design of memory-aware conversational AI. This micro-project will contribute to a longer-term effort in building a computational memory model for human-agent interaction.

Results Summary

The MEMO corpus was collected, which contains 45 group discussions around the topic of COVID-19. A total of 15 groups were formed, consisting of 3 to 6 participants who took part in 3 group discussions, with a 3-4 day gap between sessions. A total of 59 individuals with diverse backgrounds took part in the study. Before and after each session participants completed a series of questionnaires to determine which moments they recalled from their conversations, along with their personality traits, values and perceptions.

To capture conversational memory, we collected first-party free-recall reports of the most memorable moments from the discussion immediately after the interaction and again 3-4 days later. For the shorter-term memories, participants also mapped the moments to a particular interval in the video of their discussion, which were used for the ground-truth conversational memory annotations.

For each participant, personality and value profiles were recorded in the pre-screening survey, along with demographic information to identify their social group affected by COVID-19. Pre-session questionnaires also assessed participants’ mood before each session. The post-session questionnaire included questions about mutual understanding, personal attitude, and perceived social distance. The perception of the discussion and the group as a whole was also monitored in the post-session questionnaire with variables such as Task and Group Cohesion, Entitativity, Perceived Interdependence, Perceived Situation Characteristics, Syncness, and Rapport.

The following automatic annotations were extracted on the corpus:

* Transcripts – Transcripts were generated with automatic speech recognition methods and were manually reviewed and corrected where needed. Transcript timestamps are available at the utterance level as well as word-level text grid files for each recording. Speaker diarization is also available.

* Eye gaze and head pose – automatically annotated with the EyeWare software; the annotations themselves will be provided, but the code relies on a proprietary API. This includes gaze targets collected through screenshots of participants’ screen views.

* Prosody – the eGeMAPS feature set was extracted using the default eGeMAPS configuration in openSMILE (a minimal extraction sketch is given after this list)

* Body pose – Body pose (upper body only) and hand pose when visible were estimated with the models available in the MediaPipe software

* Facial action units – Facial action units were estimated for participants using the OpenFace Software
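The prosody and body-pose annotations above were produced with standard toolkits. A minimal sketch of how such features can be extracted with the Python bindings of openSMILE and MediaPipe is shown below; the file names are placeholders and the exact configurations used for the corpus may differ.

```python
# Sketch: extracting eGeMAPS prosody features and body-pose landmarks for one recording.
# Requires `pip install opensmile mediapipe opencv-python`; paths are placeholders.
import opensmile
import cv2
import mediapipe as mp

# --- Prosody: eGeMAPS functionals with the default openSMILE configuration ---
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,       # default eGeMAPS feature set
    feature_level=opensmile.FeatureLevel.Functionals,
)
prosody = smile.process_file("participant_01_audio.wav")  # pandas DataFrame of features

# --- Body pose: MediaPipe Pose on the video frames (hand pose would use mp.solutions.hands) ---
pose = mp.solutions.pose.Pose(static_image_mode=False)
cap = cv2.VideoCapture("participant_01_video.mp4")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks:
        # 33 landmarks with (x, y, z, visibility); keep only the upper-body subset if needed
        landmarks = [(lm.x, lm.y, lm.z, lm.visibility)
                     for lm in results.pose_landmarks.landmark]
cap.release()
```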

A paper describing the corpus and the annotations in more detail is in preparation. Additionally, the collected annotations will be packaged appropriately for ease of use by future researchers.

Tangible Outcomes

  1. Tsfasman, M., Fenech, K., Tarvirdians, M., Lorincz, A., Jonker, C., & Oertel, C. (2022). Towards creating a conversational memory for long-term meeting support: predicting memorable moments in multi-party conversations through eye-gaze. In ICMI 2022 – Proceedings of the 2022 International Conference on Multimodal Interaction (pp. 94-104). (ACM International Conference Proceeding Series). Association for Computing Machinery (ACM). https://doi.org/10.1145/3536221.3556613 

Contact person: Eric Blaudez, (eric.blaudez@thalesgroup.com)

Internal Partners:

  1. Thales, Eric Blaudez, eric.blaudez@thalesgroup.com
  2. Unibo, Paolo Torroni, p.torroni@unibo.it
  3. CNRS

External Partners:

  1. LISN, Christophe Servan c.servan@qwant.com

 

The micro-project provides a demonstration of the hierarchical framework for collaboration described in the Humane-AI Net revised strategic work plan, by constructing a multimodal and multilingual conversational agent focused on search. The framework is based on hierarchical levels of abilities:

  • Reactive (sensori-motor) Interaction: Interaction is tightly coupled perception-action, where the actions of one agent are immediately sensed and interpreted as actions by the other. Examples include greetings, polite conversation, and emotional mirroring.
  • Situated (spatio-temporal) Interaction: Interactions are mediated by a shared model of objects and relations (states) and shared models for roles and interaction protocols.

In this micro-project, we focused on the first two levels (Reactive and Situated) and designed the global framework architecture to show a Proof of Concept (PoC).
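A minimal sketch of how these two levels might be organized is given below; the classes, method names, and behaviours are illustrative assumptions for a PoC-style agent, not the actual framework implementation.

```python
# Illustrative two-level interaction loop: a reactive layer responds directly to
# perceived user acts, while a situated layer keeps a shared state of objects,
# roles, and the interaction protocol. All names are hypothetical.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SituatedState:
    objects: dict = field(default_factory=dict)   # shared objects and relations
    roles: dict = field(default_factory=dict)     # e.g. {"user": "information_seeker"}
    protocol_step: str = "greeting"               # position in the interaction protocol

class ReactiveLayer:
    def react(self, user_act: str) -> Optional[str]:
        # tightly coupled perception-action: mirror greetings, politeness, emotion
        if user_act.lower() in {"hello", "hi"}:
            return "Hello! How can I help you?"
        return None

class SituatedLayer:
    def __init__(self) -> None:
        self.state = SituatedState()

    def handle(self, user_act: str) -> str:
        # mediate the interaction through the shared model of objects and roles
        self.state.protocol_step = "search"
        self.state.objects["last_query"] = user_act
        return f"Searching for: {user_act}"

def agent_turn(user_act: str, reactive: ReactiveLayer, situated: SituatedLayer) -> str:
    # reactive responses take precedence; otherwise fall back to the situated level
    return reactive.react(user_act) or situated.handle(user_act)
```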

Results Summary

We show that the proposed approach provides high-quality semantic segmentation from the robot’s perspective, with accuracy comparable to the original one. In addition, we exploited the gained information and improved the recognition performance of the deep network for the lower viewpoints and showed that the small robot alone is capable of generating high-quality semantic maps for the human partner. The computations are close to real time, so the approach enables interactive applications.

Tangible Outcomes

  1. T-KEIR: https://github.com/ThalesGroup/t-keir 
  2. erc-unibo-module: https://github.com/helemanc/erc-unibo-module 

Contact person: Jan Hajic, Charles Univ (jan.hajic@mff.cuni.cz)

Internal Partners:

  1. Charles Univ, Jan Hajic
  2. DFKI, Thierry Declerck

 

SynSemClass is a dataset created in a previous Humane AI Net micro-project called META-O-NLU. The objective of this micro-project was to convert SynSemClass into a Linguistic Linked Open Data (LLOD) dataset, connecting it to the huge amount of interlinked data already available. Linguistic Linked Open Data is a generic term for a set of mutually connected language resources, using ontological relations. The connections between concepts, and between concepts and their expression in natural language, make such resources suitable for both research and industrial applications in the area of content analysis, natural language understanding, inferencing, and other tasks.
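As an illustration of the kind of linking involved, a class entry can be exposed as RDF and connected to external LLOD resources roughly as sketched below; the namespace, identifiers, and choice of SKOS properties are assumptions for the sketch, not the published SynSemClass LLOD vocabulary.

```python
# Sketch: exposing a SynSemClass-style entry as Linked Data and linking it to
# already-interlinked resources. URIs and properties are illustrative placeholders.
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import RDF, RDFS, SKOS

SSC = Namespace("https://example.org/synsemclass/")    # placeholder namespace
g = Graph()
g.bind("skos", SKOS)

cls = SSC["vec00123"]                                  # a hypothetical class ID
g.add((cls, RDF.type, SKOS.Concept))
g.add((cls, RDFS.label, Literal("communicate", lang="en")))

# Link the class to an external resource (here a Wikidata item chosen for the
# example), which is what turns the dataset into Linguistic Linked Open Data.
g.add((cls, SKOS.exactMatch, URIRef("http://www.wikidata.org/entity/Q11024")))

print(g.serialize(format="turtle"))
```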

Results Summary

The concrete deliverable for this micro-project was an LLOD version of SynSemClass, connecting it to the huge amount of interlinked data already available. A partner is involved in the Prêt-à-LLOD H2020 project, making this project synergistic in nature and multiplicative in terms of results with respect to previous projects. Partners are also involved in the COST Action “European network for Web-centered linguistic data science” (NexusLinguarum).

Contact person: James Crowley (James@crowley-coutaz.fr)

Internal Partners:

  1. Eotvos Lorand University – ELTE, Andras Lorincz
  2. Univ Grenoble Alpes, Dominique Vaufreydaz, Fabien Ringeval
  3. Uni Paris Saclay, Camille Guinaudeau, Marc Evrard
  4. Jozef Stefan Institut-JSI, Marko Grobelnik
  5. Charles University, Pavel Pecina

 

Transformers and self-attention (Vaswani et al., 2017) have become the dominant approach for natural language processing (NLP), with systems such as BERT (Devlin et al., 2019) and GPT-3 (Brown et al., 2020) rapidly displacing more established RNN and CNN structures with an architecture composed of stacked encoder-decoder modules using self-attention. This micro-project provides tools and datasets for experiments and a first demonstration of the potential of transformers for multimodal perception and multimodal interaction. We define research challenges, benchmark datasets, and performance metrics for multimodal perception and interaction tasks such as (1) audio-visual narration of scenes, cooking actions and activities, (2) audio-video recordings of lectures and TV programs, (3) audio-visual deictic (pointing) gestures, and (4) perception and evocation of engagement, attention, and emotion.

1) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. and Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762

2) Devlin, J., Chang, M. W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

3) Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., and Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.

Results Summary

In this project, we explore the potential of Transformer-based models in two significant domains: unsupervised object discovery and multimodal emotion recognition using physiological signals. First, we demonstrate a novel approach for unsupervised object discovery by leveraging self-supervised learning with self-distillation loss (DINO). Our method utilizes visual tokens as nodes in a weighted graph, where edges reflect connectivity scores based on token similarity. By applying a normalized graph-cut and solving it through spectral clustering with generalized eigen-decomposition, we isolate foreground objects. This approach effectively segments self-similar regions, with the second smallest eigenvector of the decomposition providing the cutting solution that indicates token association with foreground objects. This technique not only simplifies the object discovery process but also achieves substantial performance improvements over current state-of-the-art methods such as LOST, outperforming it by 6.9%, 8.1%, and 8.1% on the VOC07, VOC12, and COCO20K benchmarks, respectively. Furthermore, integrating a second-stage class-agnostic detector (CAD) enhances these results, and our method’s adaptability is demonstrated in its application to unsupervised saliency detection and weakly supervised object detection, achieving notable IoU improvements on the ECSSD, DUTS, and DUT-OMRON datasets.
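For illustration, a minimal numpy/scipy sketch of the graph-cut step described above (patch-token features in, foreground token mask out) could look as follows; the DINO feature extraction, the exact similarity threshold, and the foreground-selection heuristic are simplified assumptions rather than the full published method.

```python
# Sketch of the normalized-cut step: DINO patch tokens -> similarity graph ->
# generalized eigen-decomposition -> second smallest eigenvector -> foreground mask.
# Feature extraction with a DINO backbone is assumed to have happened already.
import numpy as np
from scipy.linalg import eigh

def normalized_cut_foreground(tokens: np.ndarray, tau: float = 0.2) -> np.ndarray:
    """tokens: (N, D) patch features; returns a boolean foreground mask of length N."""
    feats = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    sim = feats @ feats.T                           # cosine similarity between tokens
    W = np.where(sim > tau, 1.0, 1e-5)              # binarized connectivity scores
    D = np.diag(W.sum(axis=1))                      # degree matrix
    L = D - W                                       # graph Laplacian
    # Solve the generalized eigenproblem L v = lambda D v; the eigenvector of the
    # second smallest eigenvalue (Fiedler vector) gives the normalized-cut solution.
    eigvals, eigvecs = eigh(L, D)
    fiedler = eigvecs[:, 1]
    partition = fiedler > fiedler.mean()            # bipartition of the token graph
    # Heuristic: take the side containing the token with the largest |fiedler| value
    seed = int(np.argmax(np.abs(fiedler)))
    return partition if partition[seed] else ~partition
```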

In parallel, we address the challenge of multimodal emotion recognition from physiological signals using Transformer-based models. Recognizing the advantages of attention mechanisms in Transformers for creating contextualized representations, we propose a model for processing electrocardiogram (ECG) data to predict emotions. This model highlights significant segments of the signal, ensuring that relevant information is given priority. Due to the limited size of datasets with emotional labels, we adopt a self-supervised learning approach. We pre-train our model using unlabelled ECG datasets to build robust representations and then fine-tune it on the AMIGOS dataset for emotion recognition. Our findings confirm that this approach achieves state-of-the-art results in emotion recognition tasks involving ECG signals. Additionally, the success of this strategy underscores the broader potential of Transformers and pre-training techniques for analyzing time-series data in emotion recognition tasks.
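A compact PyTorch sketch of the kind of model described here, a Transformer encoder over fixed-length ECG segments with a classification head, is given below; the layer sizes, patching scheme, and the omitted positional encoding and pre-training objective are illustrative placeholders, not the exact published configuration.

```python
# Sketch: Transformer encoder over ECG segments for emotion classification.
# Hyperparameters are illustrative; positional encoding and the self-supervised
# pre-training objective are omitted for brevity.
import torch
import torch.nn as nn

class ECGTransformer(nn.Module):
    def __init__(self, patch_len=64, d_model=128, n_heads=4, n_layers=4, n_classes=2):
        super().__init__()
        self.patch_embed = nn.Linear(patch_len, d_model)      # embed raw ECG segments
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)             # e.g. high/low arousal

    def forward(self, ecg: torch.Tensor) -> torch.Tensor:
        # ecg: (batch, n_segments, patch_len) -- the signal split into fixed-size segments
        x = self.patch_embed(ecg)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)
        x = self.encoder(x)                                   # self-attention over segments
        return self.head(x[:, 0])                             # classify from the CLS token

# Usage: pre-train the encoder on unlabelled ECG (e.g. masked-segment reconstruction),
# then fine-tune the whole model on AMIGOS emotion labels.
model = ECGTransformer()
logits = model(torch.randn(8, 30, 64))                        # batch of 8 ECG windows
```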

Overall, the outcomes of our project demonstrate that Transformer-based models, coupled with self-supervised learning, can significantly enhance the performance of both unsupervised object discovery and emotion recognition from physiological signals. These methods provide robust solutions for complex visual and temporal signal analysis tasks, marking a substantial step forward in computer vision and affective computing.

Tangible Outcomes

  1. Y. Wang, X. Shen, S. Hu, Y. Yuan, J. L. Crowley, D. Vaufreydaz, Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut. IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2022, pp14543-14553, New Orleans, Jun 2022.
    https://arxiv.org/abs/2202.11539 
  2. J. Vazquez-Rodriguez, G. Lefebvre, J. Cumin and J. L. Crowley, “Emotion Recognition with Pre-Trained Transformers Using Multimodal Signals”, 10th International Conference on Affective Computing and Intelligent Interaction (ACII), Oct 2022
    https://ieeexplore.ieee.org/document/9953852
  3. J. Vazquez-Rodriguez, G. Lefebvre, J. Cumin, J. L. Crowley. Transformer-Based Self-Supervised Learning for Emotion Recognition. 26th International Conference on Pattern Recognition (ICPR 2022), Aug 2022, Montreal, Canada.
    https://arxiv.org/abs/2204.05103 
  4. A survey of tools and datasets for a multimodal perception with transformers (http://crowley-coutaz.fr/jlc/HumanE-AI-Net/TransfomerMicroProject/TransformerTools.pdf )
  5. A tutorial on the use of transformers for multimodal perception. (http://crowley-coutaz.fr/jlc/Courses/ACAI2021/Multimodal-Transformer-Tutorial.html )
  6. Report on challenges for the use of transformers for multimodal perception and interaction. (http://crowley-coutaz.fr/jlc/HumanE-AI-Net/TransfomerMicroProject/ReseachChallengesDataSets.pdf )

Contact person: Shivesh Kumar, (shivesh.kumar@dfki.de )

Internal Partners:

  1. DFKI Bremen, Melya Boukheddimi, Shivesh Kumar, shivesh.kumar@dfki.de
  2. INRIA Paris, Justin Carpentier, justin.carpentier@inria.fr 

 

The objective of this micro-project was to create a software toolkit that makes it possible to achieve realistic, human-like motions that can lead to a feeling of trust and comfort towards a robot. The project was based on a generic formalization of robot dancing which makes it possible to use musical features for choreography generation. The DFKI team developed and evaluated this package using the open-source software Pinocchio, developed by the INRIA Paris team, with its recently introduced proximal formulation of the constrained dynamics.

Results Summary

In order to address the topic, two main contributions were introduced in this project. The first is a generic formalization of robot dancing which allows us to use musical features for choreography generation. Optimal dance trajectories were computed using direct optimal control. From this formalization we derive three different methods of dance generation that differ in the level of flexibility, human involvement, and automatization: imitated, improvised, and automatic choreography generation. The imitated and improvised choreographic methods are based on beat-timing extraction. The automatic choreography generation method uses the additional music features volume and vocal melody. The results are validated on 4 different music pieces in simulation using the dynamic simulator MuJoCo as well as in experiments on the real robot RH5 Manus. This work was published at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022 and was selected as a finalist for the best entertainment and amusement paper award.

The second contribution focuses on exploiting the full capabilities of a robot through motion generation, with the aim of achieving motions that are more human-like and that can lead to a certain feeling of trust and comfort of the human towards the robot acting in its environment. To this purpose, we carried out a first study on resolving all the loop-closure constraints of the series-parallel hybrid robot RH5 Manus within the trajectory optimization process. To this end, we use the open-source software Pinocchio developed by the INRIA Paris team with its recently introduced proximal formulation of the constrained dynamics. This approach allows us to converge to an optimal solution according to the least-squares principle, even in the context of singularities. Among the optimization methods available in the literature, the differential dynamic programming (DDP) approach was used to generate optimal trajectories with respect to the constrained dynamics. Results are presented in simulation as well as in experiments on the real robot. This work is significant for humanoid robots based on electric actuation, where one must seek to push the robot to its limits to achieve human-like agility. This work has been submitted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2023 and is under review.
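As an illustration of the beat-timing extraction underlying the imitated and improvised methods, a short librosa sketch is shown below; the file name is a placeholder, and the coupling of these features to the trajectory optimization in Pinocchio is not shown.

```python
# Sketch: extract beat times (and a rough volume envelope) from a music piece,
# to be used as timing/feature input for choreography generation. The actual
# pipeline feeds such features into direct optimal control, which is not shown.
import librosa
import numpy as np

y, sr = librosa.load("music_piece.wav")
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)       # seconds of each beat

# Volume feature (used by the automatic method): RMS energy per beat interval
rms = librosa.feature.rms(y=y)[0]
rms_times = librosa.times_like(rms, sr=sr)
volume_per_beat = [rms[(rms_times >= t0) & (rms_times < t1)].mean()
                   for t0, t1 in zip(beat_times[:-1], beat_times[1:])]

# Each beat becomes a candidate keyframe time for the dance trajectory optimization
keyframes = list(zip(beat_times[:-1], volume_per_beat))
```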

Tangible Outcomes

  1. Melya Boukheddimi, Daniel Harnack, Shivesh Kumar, Rohit Kumar, Shubham Vyas, Octavio Arriaga, Frank Kirchner, Robot Dance Generation with Music Based Trajectory Optimization, In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022, (IROS-2022), IEEE, Nov/2022.
  2. [under review] Melya Boukheddimi , Rohit Kumar , Shivesh Kumar , Justin Carpentier , and Frank Kirchner, Investigations into Exploiting the Full Capabilities of a Series-Parallel Hybrid Humanoid using Whole Body Trajectory Optimization, In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2023, (IROS-2022), IEEE, Nov/2023.
  3. Video: https://www.youtube.com/watch?v=aN_v39p17tg
  4. Video : https://www.youtube.com/watch?v=MA42YUg3e8E

Contact person: Agnes Grünerbl (agnes.gruenerbl@dfki.de)

Internal Partners:

  1. DFKI, EI, Agnes Grünerbl  

External Partners:

  1. Health Department, University of Southampton, Eloise Monger

 

High-quality education and training of nurses are of utmost importance to maintain high standards in medical care. Nevertheless, as the COVID pandemic has shown quite impressively, there are too few healthcare professionals available. Education and training of nursing students, or the retraining of nurses, is therefore under pressure to accelerate, so that nursing manpower is available when it is required. Still, accelerated training often comes with reduced quality, which can easily lead to poor qualifications and, in the worst case, to lethal outcomes. Thus, a pressing question in nurse training is how to optimize and thereby accelerate training without sacrificing quality. One of the significant questions for teachers training nursing students is understanding the state of a student’s education: Are some students in need of more repetitions? Which students can proceed to the next level, and who is ready to get in contact with actual patients? In this regard, optimization of training means individualization, not only of the training of students but also of the feedback and information a teacher gets about their way of teaching. We believe this to be a field where Artificial Intelligence (AI), and more specifically the application of foundational models (large language models, LLMs, paired with other methods of machine learning), can provide real support.

In the first part of this micro-project, together with nurse teachers of the University of Southampton, we want to define and design an LWM that fits the requirements of nurse training. For this, 2-3 nurse teachers from Southampton will visit DFKI in order to get a feeling for the systems that are available and the applications that are feasible. In turn, researchers from DFKI will visit the nurse training facilities in Southampton to get a better picture of how nurse training is conducted. At the end of this first phase of the micro-project, an LWM (large whatever model) is defined (existing LLMs combined with additional features and data sources, as required). In the second phase, this LWM will be implemented and tested against videos of recorded training sessions. Specific focus will be set on:

  • How to understand the actions of a particular person?
  • Are the actions taken by the trainee correct or false? What would have been the correct action?
  • Which teaching efforts work, and which do not work as well?
  • Which useful suggestions and feedback can be provided to the trainees and teachers?

Results Summary

Building models of medical procedures requires efforts that go beyond the scope and time frame of a micro-project. Therefore, this work is still ongoing and will proceed after the end of Humane AI Net.

The project results at the time Humane AI Net ended are:

  • identification of scenarios with a potential for generative AI to benefit health training – training of cannulation and venipuncture
  • defining a procedure for introducing generative AI in the training of cannulation and venipuncture
  • planning a study towards developing the required LWM models
  • recording an extensive dataset in an actual medical training facility following actual training procedures
  • starting the long process of data processing and algorithm development (which is ongoing)

We collected a dataset consisting of 90 hours of video (20 participants, each recording 4 sessions of about 20+ minutes, from 3 different cameras), accompanied by the respective IMU data, GoPro user view, audio recordings, and expert feedback on the process of cannulation and venipuncture.

Tangible Outcomes

  1. [arxiv] Stefan Fritsch, Matthias Tschoepe, Vitor Fortes Rey, Lars Krupp, Agnes Gruenerbl, Eloise Monger, Sarah Travenna, GenAI Assisting Medical Training, arXiv preprint, mobiCHAI workshop at MobileHCI 2024. https://arxiv.org/abs/2410.16164 
  2. presented at: mobiCHAI – 1st International Workshop on Mobile Cognition-Altering Technologies (CAT) using Human-Centered AI, at The ACM International Conference on Mobile Human-Computer Interaction Melbourne, Australia https://ai-enhanced-cognition.com/mobichai/

Contact person: Sencer Melih Deniz (sencer.deniz@tubitak.gov.tr )

Internal Partners:

  1. TUBITAK BILGEM, Sencer Melih Deniz, sencer.deniz@tubitak.gov.tr 
  2. DFKI Kaiserslautern, Hamraz Javaheri, Hamraz.Javaheri@dfki.de 

 

This micro-project investigated whether EEG (electroencephalography) signals can be used to detect motion as well as the variable weights a person is lifting. An experimental paradigm was designed and EEG data were acquired while participants performed biceps flexion-extension motions for different weight categories: lifting with no weight (empty), medium, and heavy. The outcomes of the project can be applied in industrial exoskeleton applications as well as in the physical rehabilitation of stroke patients.

Results Summary

Features in the EEG data that distinguish each lifted-weight category were investigated. EEG data were collected with two different EEG headsets from several participants while they lifted loads of different categories, namely empty, medium, and heavy. The data were then analyzed to determine whether the different weight categories produce measurable differences in the EEG signals, applying several deep learning methods together with classical machine learning methods. According to the obtained results, EEG signals can successfully be used to predict different loads during dynamic bicep curl motion. This finding could motivate further research into rehabilitation systems that are robust to dynamic changes in weight. Moreover, information regarding weight change could contribute to a better estimation of fatigue conditions for use in sports and training applications. Finally, the approach of predicting the lifted-weight category could be used for further optimizations in industrial applications, for example in the use of exoskeletons. The results of this micro-project, in which TUBITAK BILGEM and Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI) Kaiserslautern collaborated, were presented at the IEEE-EMBS International Conference on Biomedical and Health Informatics, jointly organized with the IEEE-EMBS International Conference on Wearable and Implantable Body Sensor Networks, held in Ioannina, Greece, 27-30 September 2022. The work was published under the title “Prediction of Lifted Weight Category Using EEG Equipped Headgear” in the 2022 IEEE-EMBS International Conference on Biomedical and Health Informatics conference proceedings.
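For illustration, a simple baseline in the spirit of the analysis described above, per-channel EEG band-power features fed to a standard classifier, could look as follows; the channel layout, window length, frequency bands, and classifier are assumptions rather than the published pipeline.

```python
# Sketch of a simple baseline: per-channel EEG band-power features + an SVM to
# classify the lifted-weight category (empty / medium / heavy).
import numpy as np
from scipy.signal import welch
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}  # illustrative bands

def band_power_features(window: np.ndarray, fs: float) -> np.ndarray:
    """window: (n_channels, n_samples) EEG segment -> flat band-power feature vector."""
    freqs, psd = welch(window, fs=fs, nperseg=min(window.shape[1], 256))
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[:, mask].mean(axis=1))     # mean power per channel per band
    return np.concatenate(feats)

def fit_baseline(X_windows, y, fs=250.0):
    """X_windows: list of (n_channels, n_samples) segments recorded during bicep curls,
    y: labels in {"empty", "medium", "heavy"} (placeholder shapes and sampling rate)."""
    X = np.stack([band_power_features(w, fs) for w in X_windows])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return clf.fit(X, y)
```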

Tangible Outcomes

  1. “Prediction of Lifted Weight Category Using EEG Equipped Headgear”, published in 2022 IEEE-EMBS International Conference on Biomedical and Health Informatics.
    https://ieeexplore.ieee.org/document/
  2. dataset: https://www.ai4europe.eu/research/research-bundles/neural-mechanism-human-brain-activity-during-weight-lifting?category=ai_assets 
  3. Presentation at the IEEE-EMBS International Conference on Biomedical and Health Informatics jointly organised with the IEEE-EMBS International conference on Wearable and Implantable Body Sensor Networks organized in Ioannina, Greece between 27-30 September 2022.

Contact person: Frank van Harmelen (Frank.van.Harmelen@vu.nl)

Internal Partners:

  1. Stichting VU, Frank.van.Harmelen@vu.nl
  2. Universitat Pompeu Fabra (UPF), luc.steels@upf.edu

 

IRL (Incremental Recruitment Language), developed by Luc Steels and collaborators, is a parsing technique that captures the semantics of a natural language expression as a network of logical constraints. Determining the meaning of a sentence then amounts to finding a consistent assignment of variables that satisfies these constraints. Typically, such meaning can only be determined (i.e., such constraints can only be resolved) by using the context (“narrative”) in which the sentence is to be interpreted. The central hypothesis of this project is that modern large-scale knowledge graphs are a promising source of such contextual information to help resolve the correct interpretation of a given sentence. We developed an interface between an existing IRL implementation and an existing knowledge-graph reasoning engine to test this hypothesis. Evaluation will be done on a corpus of sentences from social-historical scientific narratives against corresponding knowledge graphs with social-historical data.

Results Summary

This micro-project aims to build a bridge between a language processing system (Incremental Recruitment Language, IRL) and semantic memory (knowledge graphs) for building and parsing narratives. In IRL, a sentence is represented as a network of logical constraints. Resolving the interpretation of a sentence comes down to finding a consistent assignment of entities from the real world that satisfy these constraints. In this micro-project, we have used knowledge graphs and other open data repositories as an external source of world knowledge that can be used to bind and disambiguate entities in context.

We have implemented a new library called Web-Services that interacts, through the use of APIs, with several open data knowledge repositories, and integrates their semantic facts into language models such as IRL. Using the Web-Services library, users can write IRL programs that send requests to different open data APIs, or convert SPARQL queries into RESTful APIs using GRLC.
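For illustration, the kind of request that such a library wraps can be sketched with a plain call to a public SPARQL endpoint; the query, endpoint choice, and helper function below are generic examples and not the Web-Services code itself.

```python
# Sketch: retrieving candidate entities for a label mentioned in a sentence by
# querying an open knowledge graph (here Wikidata's public SPARQL endpoint).
import requests

def candidate_entities(label: str, limit: int = 5):
    query = f"""
    SELECT ?item ?itemLabel ?itemDescription WHERE {{
      ?item rdfs:label "{label}"@en .
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }} LIMIT {limit}
    """
    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": query, "format": "json"},
        headers={"User-Agent": "irl-kg-demo/0.1"},   # polite identification
    )
    resp.raise_for_status()
    return [(b["item"]["value"], b.get("itemDescription", {}).get("value", ""))
            for b in resp.json()["results"]["bindings"]]

# The IRL constraint network can then bind its open variables to one of these
# candidates, using the narrative context to pick the consistent interpretation.
print(candidate_entities("Napoleon"))
```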

Tangible Outcomes

  1. Web-Services library – Steels, Luc & Van Harmelen, Frank & Van Trijp, Remi & Galletti, Martina & Kozakosczak, Jakub & Stork, Lise & Tiddi, Ilaria https://github.com/SonyCSLParis/web-services
  2. Video presentation summarizing the project

Contact person: Silvia Tulli (tulli@isir.upmc.fr)

Internal Partners:

  1. ISIR, Sorbonne University, Silvia Tulli 

 

Interactive Machine Learning (IML) has gained significant attention in recent years as a means for intelligent agents to learn from human feedback, demonstration, or instruction. However, many existing IML solutions primarily rely on sparse feedback, placing an unreasonable burden on the expert involved. This project aims to address this limitation by enabling the learner to leverage richer feedback from the expert, thereby accelerating the learning process. Additionally, we seek to incorporate a model of the expert to select more informative queries, further reducing the burden placed on the expert.

We have three objectives:

(1) Explore and develop methods for incorporating causal and contrastive feedback, as supported by evidence from psychology literature, into the learning process of IML.

(2) Design and implement a belief-based system that allows the learner to explicitly maintain beliefs about the possible expert objectives, influencing the selection of queries.

(3) Utilize the received feedback to generate a posterior that informs subsequent queries and enhances the learning process within the framework of Inverse Reinforcement Learning (IRL).
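A minimal sketch of objectives (2) and (3), maintaining a belief over candidate expert objectives, updating it from feedback, and choosing informative queries, is shown below; the likelihood model, the reward hypotheses, and the query-selection heuristic are illustrative assumptions rather than the project's algorithm.

```python
# Sketch: belief over candidate expert objectives (reward hypotheses), updated
# from expert feedback, plus a simple rule that queries the state where the
# hypotheses disagree most (a cheap proxy for expected information gain).
import numpy as np

def update_belief(belief, likelihoods):
    """Bayes rule over reward hypotheses given the likelihood of observed feedback."""
    posterior = belief * likelihoods
    return posterior / posterior.sum()

def feedback_likelihood(feedback_value, predicted_value, beta=5.0):
    """Soft (Boltzmann-style) likelihood: feedback close to a hypothesis'
    prediction is more probable under that hypothesis."""
    return np.exp(-beta * abs(feedback_value - predicted_value))

def select_query(belief, predictions):
    """predictions: (n_hypotheses, n_states) value each hypothesis assigns to each
    candidate query state; pick the state with highest belief-weighted disagreement."""
    mean = belief @ predictions
    disagreement = belief @ (predictions - mean) ** 2
    return int(np.argmax(disagreement))

# Toy usage: 3 reward hypotheses, 4 candidate query states
belief = np.ones(3) / 3
predictions = np.array([[0.1, 0.9, 0.5, 0.2],
                        [0.1, 0.2, 0.5, 0.8],
                        [0.9, 0.2, 0.5, 0.2]])
state = select_query(belief, predictions)                 # most informative query
obs = 0.85                                                # expert's (rich) feedback for that state
lik = np.array([feedback_likelihood(obs, p[state]) for p in predictions])
belief = update_belief(belief, lik)                       # posterior over expert objectives
```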

The project addresses several key aspects highlighted in the work package on Collaboration with AI Systems (W1-2). Firstly, it focuses on AI systems that can communicate and understand descriptions of situations, goals, intentions, or operational plans to establish shared understanding for collaboration. By explicitly maintaining beliefs about the expert’s objectives and integrating causal and contrastive feedback, the system aims to establish a common ground and improve collaboration. Furthermore, the project aligns with the objective of systems that can explain their internal models by providing additional information to justify statements and answer questions. By utilizing the received feedback to generate a posterior and enhance the learning process, the system aims to provide explanations, verify facts, and answer questions, contributing to a deeper understanding and shared representation between the AI system and the human expert. The project also demonstrates the ambition of enabling two-way interaction between AI systems and humans, constructing shared representations, and allowing for the adaptation of representations in response to information exchange. By providing tangible results, such as user-study evaluations and methods to exploit prior knowledge about the expert, the project aims to make measurable progress toward collaborative AI.

Results Summary

This project resulted in an exchange period during which our collaborator came to our lab and spent a month with us. This opportunity allowed us to conceptualize and write a paper that we plan to submit to the IJCAI conference in December 2024. The paper addresses the challenge of learning from individuals who have a different model of the task. Specifically, we focused on identifying human bottleneck states, determining the maximal achievable set of these states given the robot’s model of the task, and querying for the bottlenecks when they cannot be achieved due to the constraints of the robot model.

In addition, we have also begun working on a survey paper regarding human modeling in sequential decision-making, which has led to a workshop paper that we are currently extending for journal publication.

Tangible Outcomes

  1. [arxiv] Human-Modeling in Sequential Decision-Making: An Analysis through the Lens of Human-Aware AI by Silvia Tulli, Stylianos Loukas Vasileiou, Sarath Sreedharan https://arxiv.org/abs/2405.07773 

Contact person: Mauro Dragoni (dragoni@fbk.eu)

Internal Partners:

  1. Fondazione Bruno Kessler, Mauro Dragoni  

External Partners:

  1. University of Verona, Marco Rospocher  

 

Procedural documents are a source of temporal procedural knowledge of the utmost importance. These documents differ in format and scope, ranging from descriptions of administrative procedures to service manuals, medical guidelines, and surgical procedures. The extraction of this complex and multidimensional knowledge, which includes a strong temporal dimension usually paired with further static dimensions concerning, for example, resources, tools, objects, and costs, would be extremely valuable for several tasks, ranging from information extraction to the validation and verification of the procedures themselves, up to the construction of AI-based systems that have to deal with these procedures (think, for instance, of an expert surgical system and assistant which may be involved in several different surgical procedures). Knowledge graphs are a natural and expressive knowledge structure in which to represent such multidimensional knowledge, and indeed the insertion of temporal knowledge within knowledge graphs is one of the hot challenges in this area.

Nonetheless, the automated construction of knowledge graphs from procedural documents is a challenging research area. Here, the lack of annotated data, as well as of raw text repositories describing real-world procedural documents, makes it extremely difficult to adopt deep learning approaches. Pre-trained language models have shown promising results concerning knowledge extraction tasks from the models themselves. Although several works have explored this strategy to build knowledge graphs, the viability of knowledge base construction using a prompt-based learning strategy with such language models has not yet been investigated deeply.

In this MP, we would like to investigate the usage of a prompt-based in-context learning strategy to extract, from natural language process descriptions, conceptual information that can be converted into equivalent knowledge graphs. In particular, we would like to investigate the adoption of a multi-turn dialog strategy and the insertion into prompts of appropriate conceptual knowledge (e.g., definitions of the concepts to extract) or different types of examples (including negative examples), especially for the extraction of tasks and temporal-flow relations between tasks. As such, the work can contribute to the construction of structured narratives using machine learning models and hopefully enrich the conceptual knowledge given as input. Moreover, the adoption of a multi-turn dialog strategy could provide insight into how these models can be used to complement the multi-turn dialog strategy usually adopted by domain experts in traditional knowledge modeling pipelines.
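A rough sketch of the intended multi-turn, prompt-based extraction loop is shown below; the call_llm helper, the prompt wording, and the output schema are hypothetical placeholders for whatever model and conceptual schema the MP adopts.

```python
# Sketch: multi-turn prompt-based extraction of tasks and temporal-flow relations
# from a procedural text, converted into knowledge-graph triples.
# `call_llm` is a hypothetical stand-in for any chat-completion client; the
# prompts and expected JSON output format are illustrative only.
import json

CONCEPT_DEFINITIONS = (
    "A Task is a single action the procedure asks the reader to perform. "
    "A 'precedes' relation holds between two tasks when one must be completed "
    "before the other can start."
)

def call_llm(messages):
    # Placeholder: plug in an actual LLM client here.
    raise NotImplementedError

def extract_procedure_graph(procedure_text: str):
    messages = [
        {"role": "system", "content": "You extract structured knowledge from procedural text."},
        # Turn 1: prime the model with conceptual knowledge (definitions, examples)
        {"role": "user", "content": f"Definitions: {CONCEPT_DEFINITIONS}"},
        # Turn 2: ask for the tasks only
        {"role": "user", "content": f"List the tasks in this procedure as a JSON array:\n{procedure_text}"},
    ]
    tasks = json.loads(call_llm(messages))
    # Turn 3: ask for temporal-flow relations between the tasks found so far
    messages.append({"role": "user",
                     "content": "Now list 'precedes' relations between those tasks "
                                "as JSON pairs [before, after]."})
    relations = json.loads(call_llm(messages))
    # Convert the dialog output into triples for the knowledge graph
    triples = [(t, "rdf:type", "ex:Task") for t in tasks]
    triples += [(a, "ex:precedes", b) for a, b in relations]
    return triples
```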