Contact person: Catholijn Jonker, Maria Tsfasman (c.r.m.m.oertel@tudelft.nl; m.tsfasman@tudelft.nl)

Internal Partners:

  1. Technical University Delft, Catharine Oertel, c.r.m.m.oertel@tudelft.nl
  2. Eötvös Loránd University, András Lőrincz, lorincz@inf.elte.hu

 

In this micro-project, we propose to investigate human recollection of team meetings and how conversational AI could use this information to create better team cohesion in virtual settings. Specifically, we would like to investigate how a person’s emotions, personality, relationships to fellow teammates, goals, and position in the meeting influence how they remember the meeting. We want to use this information to create memory-aware conversational AI that could leverage such data to increase team cohesion in future meetings. To achieve this goal, we first record a multi-modal dataset of team meetings in a virtual setting. Second, we administer questionnaires to participants at different time intervals after each session. Third, we annotate the corpus. Fourth, we carry out an initial corpus analysis to inform the design of memory-aware conversational AI. This micro-project contributes to a longer-term effort to build a computational memory model for human-agent interaction.

Results Summary

The MEMO corpus was collected; it contains 45 group discussions on the topic of COVID-19. Fifteen groups of 3 to 6 participants each were formed, and each group took part in 3 discussions with a 3-4 day gap between sessions. In total, 59 individuals with diverse backgrounds took part in the study. Before and after each session, participants completed a series of questionnaires to determine which moments they recalled from their conversations, along with their personality traits, values, and perceptions.

To capture conversational memory, we collected first-party free-recall reports of the most memorable moments of the discussion, both immediately after the interaction and again 3-4 days later. For the shorter-term memories, participants also mapped each reported moment to a specific interval in the video of their discussion; these intervals serve as the ground-truth conversational memory annotations.
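
As an illustration, such a ground-truth record can be thought of as a small per-moment structure linking a free-recall report to its video interval. The sketch below is hypothetical; the field names and types are illustrative assumptions, not the released corpus schema.

```python
from dataclasses import dataclass

@dataclass
class MemoryAnnotation:
    """One recalled moment reported by one participant (illustrative schema)."""
    participant_id: str   # who reported the memory
    session_id: str       # which of the three group discussions
    report_text: str      # free-recall description of the memorable moment
    recall_delay: str     # "immediate" or "delayed" (3-4 days later)
    video_start_s: float  # start of the video interval the moment was mapped to
    video_end_s: float    # end of that interval, in seconds
```

Aligning such intervals with the utterance-level transcript timestamps would then yield per-utterance memorability labels for each participant.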

For each participant, personality and value profiles were recorded in the pre-screening survey, along with demographic information used to identify the COVID-19-affected social group they belonged to. Pre-session questionnaires also assessed participants’ mood before each session. The post-session questionnaire included questions about mutual understanding, personal attitude, and perceived social distance. Perception of the discussion and of the group as a whole was also monitored in the post-session questionnaire through variables such as Task and Group Cohesion, Entitativity, Perceived Interdependence, Perceived Situation Characteristics, Syncness, and Rapport.

The following automatic annotations were extracted for the corpus:

* Transcripts – Transcripts were generated with automatic speech recognition and manually reviewed and corrected where needed. Timestamps are available at the utterance level, and word-level TextGrid files are provided for each recording. Speaker diarization is also available.

* Eye gaze and head pose – Automatically annotated with Eyeware software; the annotations themselves will be provided, but the extraction code relies on a proprietary API. This includes gaze targets collected through screenshots of participants’ screen views.

* Prosody – The eGeMAPS feature set was extracted using the default eGeMAPS configuration in openSMILE (a sketch of this extraction follows the list).

* Body pose – Upper-body pose and, when visible, hand pose were estimated with the models available in MediaPipe (a sketch of this extraction also follows the list).

* Facial action units – Facial action units were estimated for each participant using the OpenFace toolkit.
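
As a rough indication of how the prosody annotations above can be reproduced, the following minimal sketch uses the opensmile Python package with the default eGeMAPS configuration. The file name is a placeholder, and the original extraction may have used the openSMILE command-line tool or a different eGeMAPS version.

```python
import opensmile

# Default eGeMAPS configuration: 88 functionals computed per audio file.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# "participant_audio.wav" is a placeholder for one speaker's audio track.
features = smile.process_file("participant_audio.wav")
print(features.shape)  # (1, 88): one row of eGeMAPS functionals
```

Frame-level low-level descriptors can be obtained analogously with FeatureLevel.LowLevelDescriptors when time-aligned prosody is needed.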

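Similarly, a minimal sketch of the body- and hand-pose extraction with MediaPipe’s Python solutions API is given below; the video file name and detection settings are illustrative assumptions, not the exact configuration used for the corpus.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_hands = mp.solutions.hands

# "participant_view.mp4" is a placeholder for one participant's video recording.
cap = cv2.VideoCapture("participant_view.mp4")
with mp_pose.Pose(static_image_mode=False) as pose, \
     mp_hands.Hands(static_image_mode=False, max_num_hands=2) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        pose_result = pose.process(rgb)   # full-body landmark model; only the visible upper body is usable here
        hand_result = hands.process(rgb)  # hand landmarks when hands are in view
        if pose_result.pose_landmarks:
            pass  # store per-frame landmark coordinates here
cap.release()
```
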
A paper describing the corpus and the annotations in more detail is in preparation. Additionally, the collected annotations will be packaged for ease of use by future researchers.

Tangible Outcomes

  1. Tsfasman, M., Fenech, K., Tarvirdians, M., Lorincz, A., Jonker, C., & Oertel, C. (2022). Towards creating a conversational memory for long-term meeting support: predicting memorable moments in multi-party conversations through eye-gaze. In ICMI 2022 – Proceedings of the 2022 International Conference on Multimodal Interaction (pp. 94-104). (ACM International Conference Proceeding Series). Association for Computing Machinery (ACM). https://doi.org/10.1145/3536221.3556613 