The project aims to develop a framework for multimodal and multilingual conversational agents. The framework is based on hierarchical levels of abilities:

– Reactive (sensori-motor) Interaction: Interaction is a tightly coupled perception-action loop in which the actions of one agent are immediately sensed and interpreted as actions by the other. Examples include greetings, polite conversation and emotional mirroring.

– Situated (spatio-temporal) Interaction: Interactions are mediated by a shared model of objects and relations (states) and by shared models of roles and interaction protocols.

– Operational Interaction: Collective performance of tasks.

– Praxical Interaction: Sharing of knowledge about entities, relations, actions and tasks.

– Creative Interaction: Collective construction of theories and models that predict and explain phenomena.

In this micro-project we focus on the first two levels (Reactive and Situated) and design the global framework architecture. The work performed in this project will be demonstrated in a proof of concept (PoC).

Output

OSS Framework (Level 1 and 2)

Project Partners:

  • Università di Bologna (UNIBO), Paolo Torroni

Primary Contact: Eric Blaudez, THALES

Research at the intersection of artificial intelligence (AI) and extended reality (XR) has produced a substantial body of literature over the past 20 years. Applications cover a broad spectrum, for example, visualising neural networks in virtual reality or interacting with conversational agents. However, a systematic overview is currently missing.

This micro-project addresses this gap with a scoping review covering two main objectives: first, to give an overview of the research conducted at the intersection of AI and XR; second, to reveal how XR can be used to improve interactive grounding in human-AI interaction. In summary, the review focuses on the following guiding questions: What are the typical AI methods used in XR research? What are the main use cases at the intersection of AI and XR? How can XR serve as a tool to enhance interactive grounding in human-AI interaction?

Output

Conference or journal paper co-authored by the proposers (possibly with other partners)

Dataset of the papers including codes

Project Partners:

  • Københavns Universitet (UCPH), Teresa Hirzle
  • Københavns Universitet (UCPH), Kasper Hornbæk
  • Ludwig-Maximilians-Universität München (LMU), Florian Müller

Primary Contact: Teresa Hirzle, University of Copenhagen, Department of Computer Science

This MP studies the problem of how to alert a human user to a potentially dangerous situation, for example for handovers in automated vehicles. The goal is to develop a trustworthy alerting technique with high accuracy and a minimum of false alerts. The challenge is to decide when to interrupt, because both false positives and false negatives will lower trust. Knowing when to interrupt is hard, because the system must take into account both the driving situation and the driver's ability to react to the alert; moreover, this inference must be made from impoverished sensor data. The key idea of this MP is to model the problem as a partially observable stochastic game (POSG), which allows approximate solutions to a setting with two adaptive agents (human and AI). The main outcome will be an open Python library called COOPIHC, which allows modeling different variants of this problem.
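The false-positive/false-negative trade-off described above can be sketched as a simple expected-utility decision rule. The function names and cost values below are illustrative assumptions for exposition only, not part of the actual COOPIHC API:

```python
# Hypothetical sketch: decide whether to alert, given a belief about danger.
# Costs reflect that false alerts erode trust and missed dangers are worse.

def expected_utility(belief_danger, alert, cost_false_alert=1.0,
                     cost_missed_danger=10.0, benefit_true_alert=5.0):
    """Expected utility of alerting (or not) under a belief about danger."""
    if alert:
        # A true alert helps the driver; a false alert lowers trust.
        return belief_danger * benefit_true_alert \
            - (1 - belief_danger) * cost_false_alert
    # Staying silent is only costly if the danger is real.
    return -belief_danger * cost_missed_danger

def should_alert(belief_danger, **costs):
    """Alert exactly when alerting has higher expected utility than silence."""
    return expected_utility(belief_danger, True, **costs) > \
           expected_utility(belief_danger, False, **costs)
```

In the full POSG formulation, the belief itself is inferred from noisy sensor data and the human is a second adaptive agent; this sketch only illustrates the threshold structure of the alerting decision.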

Output

COOPIHC library (Python)

Paper (e.g., IUI’23 or CHI’23)

Project Partners:

  • Aalto University, Antti Oulasvirta
  • Centre national de la recherche scientifique (CNRS), Julien Gori

Primary Contact: Antti Oulasvirta, Aalto University

The broad availability of 3D printing enables end-users to rapidly fabricate personalized objects. While the actual manufacturing process is largely automated, users still need expertise in complex design applications, not only to produce ready-made designs but also to adapt them to their needs or to design new objects from scratch.

In this project, we explore an AI-powered system that assists users in creating 3D objects for digital fabrication. For this, we propose to use natural language processing (NLP) to enable users to describe objects in natural language (e.g., "A green rectangular box."). In this micro-project, we conduct a Wizard-of-Oz study to elicit the requirements for such a system. The participants' task is to recreate a given object using a spoken description with iterative refinements. We expect this work to support the goal of making personal digital fabrication accessible to everyone.
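To make the idea concrete, a description such as "A green rectangular box" could be mapped to a rough design specification by extracting color and shape slots. The vocabulary and mapping below are invented for illustration; the actual requirements for the NLP model are exactly what the Wizard-of-Oz study is meant to elicit:

```python
import re

# Hypothetical toy vocabulary; a real system would use a trained NLP model.
COLORS = {"green", "red", "blue", "yellow"}
SHAPES = {"box": "cuboid", "sphere": "sphere", "cylinder": "cylinder"}
MODIFIERS = {"rectangular": {"shape": "cuboid"}, "round": {"shape": "sphere"}}

def parse_description(text):
    """Map a spoken object description to a rough 3D design specification."""
    tokens = re.findall(r"[a-z]+", text.lower())
    spec = {}
    for tok in tokens:
        if tok in COLORS:
            spec["color"] = tok
        elif tok in SHAPES:
            spec.setdefault("shape", SHAPES[tok])  # keep modifier-set shapes
        elif tok in MODIFIERS:
            spec.update(MODIFIERS[tok])
    return spec
```

Iterative refinements ("make it taller", "no, blue") would update such a specification turn by turn rather than rebuilding it from scratch.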

Output

Requirements for voice-based 3D design

dataset

Design specification for an NLP model to support voice-based 3D design

Project Partners:

  • Ludwig-Maximilians-Universität München (LMU), Florian Müller/Albrecht Schmidt

Primary Contact: Florian Müller, LMU Munich

Communication between patients and healthcare institutions is increasingly moving to digital applications. Information about the patient’s wellbeing is typically collected by means of a questionnaire, but filling one in is a tedious task for many patients, especially when it has to be done periodically, and may result in incomplete or imprecise input. Much can be gained by making the process more interactive: a conversational agent can not only ask the questions, but also ask follow-up questions and respond to clarification questions from the user. We propose to deploy and test such a system.

Our proposed research aligns well with the WP3 focus on human-AI communication, and will lead to re-usable conversation patterns for conducting questionnaires in healthcare. The work will benefit from existing experience with patient-provider communication within Philips and will build on the SUPPLE framework for dialog management and sequence expansion.
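The kind of conversation pattern envisioned here can be sketched as questionnaire items that carry their own clarification text and follow-up conditions. Everything below (item structure, field names, trigger rules) is an illustrative assumption and does not reproduce the SUPPLE framework:

```python
# Hypothetical questionnaire items with clarification and follow-up behavior.
QUESTIONS = [
    {"id": "pain",
     "text": "How would you rate your pain today (0-10)?",
     "clarify": "0 means no pain at all, 10 means the worst pain imaginable.",
     "follow_up": lambda ans: int(ans) >= 7,       # expand on severe pain
     "follow_up_text": "Where is the pain located?"},
    {"id": "sleep",
     "text": "How many hours did you sleep last night?",
     "clarify": "An estimate is fine.",
     "follow_up": lambda ans: False,
     "follow_up_text": None},
]

def next_utterance(question, answer):
    """Decide the agent's next move for one questionnaire item."""
    if answer.strip().endswith("?"):       # user asked for clarification
        return question["clarify"]
    if question["follow_up"](answer):      # sequence expansion
        return question["follow_up_text"]
    return None                            # move on to the next item
```

This captures the two behaviors named in the proposal, follow-up questions and responses to clarification requests, as data attached to each item rather than hard-coded dialogue flow.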

Output

A dataset on conversation(s) between a patient and a conversational AI

A dialog model derived from the dataset

Scientific publication

Project Partners:

  • Philips Electronics Nederland B.V., Aart van Halteren
  • Stichting VU, Koen Hindriks

Primary Contact: Aart van Halteren, Philips Research

We study proactive communicative behavior, where robots provide information to humans that may help them achieve desired outcomes or prevent possible undesired ones. Proactive behavior is an under-addressed area in AI and robotics, and proactive human-robot communication even more so. We will combine the past expertise of Sorbonne Univ. (intention recognition) and Örebro Univ. (proactive behavior) to define proactive behavior based on the understanding of the user’s intentions, and then extend it to consider communicative actions based on second-order perspective awareness.

We propose an architecture able to (1) estimate the human's intended goal, (2) infer the robot’s and the human’s knowledge about foreseeable outcomes of the intended goal, (3) detect opportunities for the robot to be proactive based on the desirability of those outcomes, and (4) select an action from the detected opportunities. The theoretical underpinning of this work will contribute to the study of theory of mind in HRI.
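The four stages can be sketched as a simple pipeline. All function names, data structures and the toy heuristics are illustrative assumptions, not the actual architecture:

```python
# Schematic sketch of the four-stage proactive communication pipeline.

def estimate_intention(observed_goals):
    """(1) Pick the most frequently evidenced goal as the human's intention."""
    counts = {}
    for goal in observed_goals:
        counts[goal] = counts.get(goal, 0) + 1
    return max(counts, key=counts.get)

def foresee_outcomes(goal, world_model):
    """(2) Look up foreseeable outcomes of the intended goal."""
    return world_model.get(goal, [])

def detect_opportunities(outcomes, human_knows):
    """(3) Opportunities: undesired outcomes the human is not aware of."""
    return [o for o in outcomes
            if not o["desirable"] and o["id"] not in human_knows]

def select_action(opportunities):
    """(4) Communicate the most severe unknown, undesired outcome."""
    if not opportunities:
        return None
    worst = max(opportunities, key=lambda o: o["severity"])
    return f"warn_about:{worst['id']}"
```

Stage (3) is where second-order perspective awareness enters: the robot reasons not only about the outcomes themselves, but about which of them the human already knows.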

Output

Jupyter Notebook / Google Colab presenting the code of the proposed architecture and providing plug-and-play interaction.

a manuscript describing the proposed architecture and initial findings of the experiment

Presentations

Project Partners:

  • Sorbonne Université, Mohamed CHETOUANI
  • Örebro University (ORU), Alessandro Saffiotti and Jasmin Grosinger

Primary Contact: Mohamed CHETOUANI, Sorbonne University

Main results of micro project:

The goal of this micro-project is to develop a cognitive architecture able to generate proactive communicative behaviors during human-robot interactions. The general idea is to provide information to humans that may help them achieve desired outcomes or prevent possible undesired ones. Our work proposes a framework that generates and selects among opportunities for acting, based on recognizing human intention, predicting environment changes, and reasoning about what is desirable in general. Our framework has two main modules to initiate proactive behavior: intention recognition and equilibrium maintenance.
The main achievements are:
– Integration of two systems: user intention recognition and equilibrium maintenance in a generic architecture
– Showing stability of the architecture to many users
– Reasoning mechanism and 2nd order perspective awareness
The next steps will aim to demonstrate knowledge repair, prevent undesired outcomes caused by lack of knowledge, and improve trustworthiness, transparency and legibility (user study).

Contribution to the objectives of HumaneAI-net WPs

– A playground system in which HumaneAI-net partners can define their own interactive scenarios to experiment with the robot’s proactivity.

– T3.3: Study of how to model human rationality in order to detect and use computationally defined human beliefs, goals and intentions, and how to use that model to make robots proactive. A human-in-the-loop system supports cooperative robot behavior in a shared environment by generating proactive communication.

– T3.1: Study relating robots that generate proactive communication to their possible effects on human cognition and interaction strategies.

Tangible outputs

Many industrial NLP applications emphasise the processing and detection of nouns, especially proper nouns (Named Entity Recognition, NER). However, the processing of verbs has been neglected in recent years, even though it is crucial for the development of full NLU systems, e.g., for the detection of intents in spoken language utterances or events in written language news articles. The META-O-NLU micro-project focuses on proving the feasibility of a multilingual event-type ontology based on classes of synonymous verb senses, complemented with semantic roles and links to existing semantic lexicons. Such an ontology shall be usable for content- and knowledge-based annotation, which in turn shall allow for developing NLU parsers/analyzers. The concrete goal is to extend the existing Czech-English SynSemClass lexicon (which displays all the necessary features, but only for two languages) with German and Polish, as a first step toward showing that it can be extended to other languages as well.
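The structure described above, classes of synonymous verb senses across languages, each with shared semantic roles and links to external lexicons, can be illustrated with a small data model. The field names below are assumptions for exposition and do not reproduce the actual SynSemClass schema:

```python
from dataclasses import dataclass, field

# Illustrative data model for a multilingual event-type class.

@dataclass
class ClassMember:
    lang: str     # language code, e.g. "cs", "en", "de", "pl"
    lemma: str    # verb lemma, e.g. "kaufen"
    links: dict = field(default_factory=dict)  # external lexicons, e.g. GermaNet

@dataclass
class EventClass:
    class_id: str
    roles: list                                # semantic roles shared by members
    members: list = field(default_factory=list)

    def languages(self):
        """Languages currently covered by this class."""
        return sorted({m.lang for m in self.members})
```

Extending the lexicon to a new language then amounts to adding members (with their external links) to existing classes, which is exactly the annotation task the micro-project undertakes for German and Polish.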

Output

Common paper co-authored by the proposers (possibly with other partners)

Extended version of SynSemClass (entries in additional languages)

Presentations

Project Partners:

  • Charles University Prague, Jan Hajič
  • German Research Centre for Artificial Intelligence (DFKI), Georg Rehm

Primary Contact: Jan Hajič, Univerzita Karlova (Charles University, CUNI)

Main results of micro project:

The main result of the META-O-NLU micro-project is the extension of the original SynSemClass dataset with German classes, or more precisely, the inclusion of German verbs and event descriptors in the existing SynSemClass classes. Together with the individual verbs, existing German lexical resources have been linked (GermaNet, E-VALBU and GUP). Adding a third language demonstrated that future extension to other languages is feasible, in terms of the annotation rules, the dataset itself, and a new web browser that can show all language entries alongside each other with all the external links. The data is freely available in the LINDAT/CLARIAH-CZ repository (and soon also through the European Language Grid), and a web browser on the resources is now also available.

Contribution to the objectives of HumaneAI-net WPs

Task 3.6 focuses on both spoken and written language-based interactions (dialogues, chats), in particular on questions of multilinguality that are essential to the European vision of human-centric AI. The results of this micro-project contribute especially to the multilinguality issue and are directed toward full NLU (Natural Language Understanding) by describing event types, for which no general ontology exists yet. The resulting resource will be used for both text and dialog annotation, to allow for evaluation and possibly also for training of NLU systems.

Tangible outputs

The aim of the project is to investigate the theoretical and empirical roles of agency in successful human-computer partnerships. For human-centred AI research, the understanding of agency is a key factor in achieving effective collaboration. Although recent advances in AI have enabled systems to successfully contribute to human-computer interaction, we are interested in extending this so that the interaction acts more like a ‘partnership’. This requires building systems with collaborative agency that users can manipulate in the process. Research questions include: 1) which parameters are relevant to describing system agency, 2) what impact these parameters have on perceived agency, and 3) how to modify them in order to achieve different roles of systems in a process.

Output

Theoretical: Literature review on agency / research paper / define parameters

Empirical: Demo (paper, video, interactive)

Project Partners:

  • Institut national de recherche en sciences et technologies du numérique (INRIA), Janin Koch
  • Ludwig-Maximilians-Universität München (LMU), Albrecht Schmidt
  • Københavns Universitet (UCPH), Kasper Hornbaek
  • Stichting VU, Koen Hindriks
  • Umeå University (UMU), Helena Lindgren

Primary Contact: Janin Koch, Inria

Attachments

Agency_MicroProject_Koch_Mackay_March17.mov

Exloring the Impact of Agency INRIA J Koch Agency_MP3_Berlin.mov

We propose to research how autobiographical recall can be detected in virtual reality (VR). In particular, we experimentally investigate what physiological parameters accompany interaction with autobiographical memories in VR. We consider VR as one important representation of Human-AI collaboration.

For this, we plan to (1) record an EEG data set of people’s reaction and responses when recalling an autobiographical memory, (2) label the data set, and (3) do an initial analysis of the dataset to inform the design of autobiographical VR experiences. We would try to automate data collection as much as possible to make it easy to add more data over time.

This will contribute to a longer-term effort in model and theory formation. The main contribution is to WP3: the work is set in Task 3.2 (Human-AI interaction/collaboration paradigms) and aims at better understanding user emotion in VR in order to model self-relevance in AI collaboration (Task 3.4).

Output

dataset on autobiographical recall in VR

a manuscript describing the data set and initial insights into autobiographical recall in VR

Presentations

Project Partners:

  • Ludwig-Maximilians-Universität München (LMU), Albrecht Schmidt
  • German Research Centre for Artificial Intelligence (DFKI), Paul Lukowicz and Patrick Gebhard

Primary Contact: Albrecht Schmidt, Ludwig-Maximilians-Universität München

Main results of micro project:

We have developed VR experiences for research on autobiographical recall in virtual reality (VR). This allows us to experimentally investigate what physiological parameters accompany self-relevant memories elicited by digital content. We have piloted the experiment and are currently recording more data on the recall of autobiographical memories. After data collection is complete, we will label the data set, and do an initial analysis of the dataset to inform the design of autobiographical VR experiences. We have also co-hosted a Workshop on AI and human memory.

Contribution to the objectives of HumaneAI-net WPs

The main contribution is to WP3: the work is set in Task 3.2 (Human-AI interaction/collaboration paradigms) and aims at better understanding user emotion in VR in order to model self-relevance in AI collaboration (Task 3.4). The VR experience is implemented in Unity and we are happy to share it in the context of a joint project.

Tangible outputs

In this micro-project, we propose investigating human recollection of team meetings and how conversational AI could use this information to create better team cohesion in virtual settings.

Specifically, we would like to investigate how a person's emotion, personality, relationship to fellow teammates, goals and position in the meeting influence how they remember the meeting. We want to use this information to create memory-aware conversational AI that could leverage such data to increase team cohesion in future meetings.

To achieve this goal, we plan first to record a multi-modal dataset of team meetings in a virtual setting; second, to administer questionnaires to participants at different time intervals after a session; third, to annotate the corpus; and fourth, to carry out an initial corpus analysis to inform the design of memory-aware conversational AI.

This micro-project will contribute to a longer-term effort in building a computational memory model for human-agent interaction.

Output

A corpus of repeated virtual team meetings (6 sessions, spaced 1 week apart)

manual annotations (people’s recollection of the team meeting etc.)

automatic annotations (e.g. eye-gaze, affect, body posture etc.)

A paper describing the corpus and insights gained on the design of memory-aware agents from initial analysis

Project Partners:

  • TU Delft, Catholijn Jonker
  • Eötvös Loránd University (ELTE), Andras Lorincz

Primary Contact: Catharine Oertel, TU Delft

Main results of micro project:

1) A corpus of repeated virtual team meetings (4 sessions, spaced 4 days apart)
2) Manual annotations (people's recollection of the team meeting etc.)
3) Automatic annotations (e.g. eye-gaze, affect, body posture etc.)
4) A preliminary paper describing the corpus and insights gained on the design of memory-aware agents from initial analysis

Contribution to the objectives of HumaneAI-net WPs

In this micro-project, we propose investigating human recollection of team meetings and how conversational AI could use this information to create better team cohesion in virtual settings.
Specifically, we would like to investigate how a person's emotion, personality, relationship to fellow teammates, goals and position in the meeting influence how they remember the meeting. We want to use this information to create memory-aware conversational AI that could leverage such data to increase team cohesion in future meetings.
To achieve this goal, we plan first to record a multi-modal dataset of team meetings in a virtual setting; second, to administer questionnaires to participants at different time intervals after a session; third, to annotate the corpus; and fourth, to carry out an initial corpus analysis to inform the design of memory-aware conversational AI.
This micro-project will contribute to a longer-term effort in building a computational memory model for human-agent interaction.

Tangible outputs

  • Dataset: MEMO – Catharine Oertel
  • Publication: MEMO dataset paper – Catharine Oertel
  • Program/code: Memo feature extraction code – Andras Lorincz

We propose to research a scalable human-machine collaboration system with the common goal of executing high-quality actions (e.g., in rehabilitation exercise). We combine video and speech for video-grounded, goal-oriented dialogue, building on our video and text database of exercises for rehabilitation following knee injuries. We evaluate high-performance body pose estimation tools and compare them to a real-time body pose estimation tool to be developed for smartphones via ‘knowledge distillation’ methods.
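In response-based knowledge distillation, the small on-device "student" is trained to match the outputs of the large "teacher" model in addition to the ground truth. A minimal sketch for keypoint heatmaps, with illustrative shapes and loss weighting, could look like this:

```python
import numpy as np

# Minimal sketch of response-based knowledge distillation for pose heatmaps.
# The student is supervised both by ground truth and by the teacher's output;
# alpha balances the two terms (an illustrative assumption, not a fixed recipe).

def distillation_loss(student_heatmaps, teacher_heatmaps, gt_heatmaps, alpha=0.5):
    """Blend supervision from ground truth and from the teacher's predictions."""
    mse = lambda a, b: float(np.mean((a - b) ** 2))
    return alpha * mse(student_heatmaps, gt_heatmaps) + \
           (1 - alpha) * mse(student_heatmaps, teacher_heatmaps)
```

The teacher term lets the compact smartphone model benefit from the richer predictions of the high-performance tool even where ground-truth annotation is sparse.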

The complementary part of the project deals with the texts that we have collected for these exercises and estimates the amount of text needed for dialogues that can guide and correct the quality of exercises. Potential topics/intents include pose relative to the camera, proper light conditions, audio-visual information about pain, notes about execution errors, errors discovered by the computer evaluations, requests for additional information from the patient, and reactions to other, unrelated queries.

Output

Dataset of the dialogues

Publication on the constraints and potentials of existing state-of-the-art methods

Performance evaluation methods and usability studies

Presentations

Project Partners:

  • Eötvös Loránd University (ELTE), András Lőrincz
  • Charles University Prague, Ondřej Dušek

Primary Contact: András Lőrincz, Eotvos Lorand University

Main results of micro project:

Machine-assisted physical rehabilitation is of special interest since (a) it is a relatively narrow field, (b) observation and interaction are multimodal, including natural language processing, video processing, and speech recognition and generation, and (c) it is a critical medical application. We considered rehabilitation after total knee replacement as a prototype scenario. We used 2D and RGBD cameras and dialogue systems at three levels:
(i) video-based feedback aiming both documentation and helping performance improvements,
(ii) additional rule-based dialogue with specific error detection, and
(iii) extensions with a data-driven dialogue system based on the DialoGPT language model.
We argue that the time is ripe to revitalize existing practices using recent advances in machine learning.
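Level (ii) above, rule-based dialogue with specific error detection, can be illustrated with a minimal sketch. The error codes and feedback messages below are invented for illustration and are not taken from the actual system:

```python
# Hypothetical mapping from detected execution errors to spoken feedback.
FEEDBACK_RULES = {
    "knee_not_bent_enough": "Try to bend your knee a little further.",
    "torso_leaning": "Keep your back straight during the exercise.",
    "too_fast": "Slow down; each repetition should take a few seconds.",
}

def feedback_for(detected_errors):
    """Turn the error codes from video analysis into feedback utterances."""
    if not detected_errors:
        return ["Well done, that repetition looked good."]
    return [FEEDBACK_RULES[e] for e in detected_errors if e in FEEDBACK_RULES]
```

Level (iii) then replaces or augments this fixed mapping with a data-driven generator (here, one based on DialoGPT), so that feedback can vary and respond to free-form patient utterances.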

Contribution to the objectives of HumaneAI-net WPs

Video-based dialogue systems meet the goals of the Foundations of Human-AI Interaction, whereas the rehabilitation scenario is a prototype for goal-oriented collaboration. The micro-project targeted specific topics, including
(i) body motion and pain, both
— in terms of language and potential dialogues, and
— in more than 400 video samples covering 50 exercises, with about 7 errors per motion type on average, to be detected alone or in combination,
and
(ii) dialogues
— from experts and
— from crowdsourcing-based dialogue enhancements

Tangible outputs

  • Publication: DeepRehab: Real Time Pose Estimation on the Edge for Knee Injury Rehabilitation – Bruno Carlos Dos Santos Melício, Gábor Baranyi, Zsófia Gaal, Sohil Zidan,and Andras Lőrincz
    https://e-nns.org/icann2021/
  • Publication: Multimodal technologies for machine-assisted physical rehabilitation – Ondrej Dusek, András Simonyi, Dániel Sindely, Levente Juhász, Gábor Baranyi, Tomas Nekvinda, Márton Véges, Kinga Faragó, András Lőrincz
    submitted

Transformers and self-attention (Vaswani et al., 2017) have become the dominant approach for natural language processing (NLP), with systems such as BERT (Devlin et al., 2019) and GPT-3 (Brown et al., 2020) rapidly displacing the more established RNN and CNN structures with an architecture composed of stacked encoder-decoder modules using self-attention.
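As background, the core self-attention operation can be sketched in a few lines of numpy. This is a single head with no learned projections (i.e., queries, keys and values are all the input itself), a deliberate simplification of the full architecture:

```python
import numpy as np

# Minimal sketch of scaled dot-product self-attention (Vaswani et al., 2017).

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """X: (seq_len, d_model). Each position attends over all positions."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)        # pairwise similarities, scaled
    weights = softmax(scores, axis=-1)   # each row is a distribution
    return weights @ X                   # weighted mix of token representations
```

For multimodal perception, the same mechanism is applied to embeddings of image patches, audio frames or gesture trajectories instead of word tokens, which is what makes transformers a candidate unifying architecture across modalities.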

This micro-project will provide tools and data sets for experiments and a first initial demonstration of the potential of transformers for multimodal perception and multimodal interactions. We will define research challenges, benchmark data sets and performance metrics for multimodal perception and interaction tasks such as (1) audio-visual narration of scenes, cooking actions and activities, (2) audio-video recordings of lectures and TV programs (3) audio-visual deictic (pointing) gestures, and (4) perception and evocation of engagement, attention, and emotion.

(The full description and bibliography cover 200 words and are available on request.)

Output

Benchmark data and performance targets for a phased set of research challenges of increasing difficulty.

Tools for experiments to explore use of embeddings, encoder-decoders, self-attention architectures and related problems associated with applying transformers to different modalities.

Concept demonstrations for simple examples of multimodal perception.

Presentations

Project Partners:

  • Institut national de recherche en sciences et technologies du numérique (INRIA), James Crowley
  • Eötvös Loránd University (ELTE), Andras Lorincz
  • Université Grenoble Alpes (UGA), Fabien Ringeval
  • Centre national de la recherche scientifique (CNRS), François Yvon
  • Institut “Jožef Stefan” (JSI), Marko Grobelnik

Primary Contact: James Crowley, INRIA

Main results of micro project:

This micro-project will survey tools and data sets for experiments for demonstrating the potential use of transformers for multimodal perception and multimodal interactions. We will define research challenges and performance metrics for multimodal perception and interaction tasks such as audio-visual narration of scenes, cooking actions and activities, audio-visual deictic (pointing) gestures, and perception and evocation of engagement, attention, and emotion. We will provide tutorials on the use of transformers for multimodal perception and interaction.

Contribution to the objectives of HumaneAI-net WPs

This micro-project will aid and encourage the use of transformers and self-attention for multimodal interaction by Humane AI Net researchers, by identifying relevant tools and benchmark datasets, by providing tutorials and training materials for education, and by identifying research challenges for multimodal perception and interaction with transformers.

Tangible outputs

  • Dataset: A survey of tools and datasets for multimodal perception with transformers – James Crowley
  • Other: A tutorial on the use of transformers for multimodal perception. – Francois Yvon
  • Other: Research challenges for the use of transformers for multimodal perception and interaction. – James Crowley