Contact person: Rui Prada (rui.prada@tecnico.ulisboa.pt)
Internal Partners:
- Instituto Superior Técnico, Department of Computer Science
- Eötvös Loránd University, Department of Artificial Intelligence
External Partners:
- DFKI Lower Saxony, Interactive Machine Learning Lab
- Carnegie Mellon University, Robotics Institute
The project addresses research on interactive grounding. It consists of the development of an Augmented Reality (AR) game, using HoloLens, that supports the interaction of a human player with an AI character in a mixed-reality setting, using gestures as the main communicative act. The game will integrate technology to perceive human gestures and poses. The game will bring about collaborative tasks that need coordination at the level of mutual understanding of the several elements of the required task. Players (human and AI) will have different information about the tasks to advance in the game and will need to communicate that information to their partners through gestures.
The main grounding challenge will be based on learning the mapping between gestures and the meaning of the actions to perform in the game. There will be two levels of gestures to ground: some are task-independent, while others are task-dependent. In other words, besides the gestures that communicate explicit information about the game task, the players need to agree on the gestures used to coordinate the communication itself, for example, to signal agreement or doubt, to ask for more information, or to close the communication. These latter gesture types can be transferred from task to task within the game, and probably to other contexts as well. It will also be possible to play the game with two humans and study their gesture communication in order to gather the gestures that emerge: a human-inspired gesture set will be collected and will serve as the basis for a gesture dictionary in the AI's repertoire.
The game will provide different tasks of increasing difficulty. The first ones will ask the players to perform gestures or poses as mechanisms to open a door and progress to the next level. Later, in a more advanced version of the game, specific and constrained body poses, interaction with objects, and the need to communicate more abstract concepts (e.g., next to, under, to the right, the biggest one, …) will be introduced.
The game will be built as a platform to perform studies. It will support studying diverse questions about the interactive grounding of gestures. For example, we can study the way people adapt to and ascribe meaning to the gestures performed by the AI agent; we can study how different gesture profiles influence people's interpretation, facilitate grounding, and affect task performance; or we can study different mechanisms for the AI to learn its gesture repertoire from humans (e.g., by imitation grounded in the context).
Results Summary
An AR game in which players face a sequence of codebreaking challenges that require them to press buttons in a specific sequence; however, only one of the partners has access to the buttons, while the other has access to the solution code. The core gameplay is centred on the communication between the two partners (the human player and the AI virtual agent), which must be performed using gestures only. In addition to the development of the AR game, we developed sample AI agents that are able to play with a human player. A version using an LLM was also developed to provide some reasoning for gesture recognition and performance by the AI virtual agent.
Players face a sequence of codebreaking challenges that require them to press buttons in a specific sequence; however, only one of the partners has access to the buttons, while the other has access to the solution code. Furthermore, only gesture communication is possible. Therefore, the core gameplay is centred on the communication between the two partners (the human player and the AI virtual agent). The gestures supported in the game are split into two distinct subtypes (illustrated in the sketch after the list):
- Taskwork gestures: Used for conveying information about the game’s tasks and environment (e.g., an object’s colour).
- Teamwork gestures: Used for giving feedback regarding communication (e.g., affirming that a gesture was understood).
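To make this distinction concrete, the following is a minimal sketch of how a gesture repertoire could record the two subtypes. It is written in Python for illustration only, and all gesture names and meanings are hypothetical examples, not the dictionary used in the actual HoloLens game.

```python
# Illustrative sketch only: the gesture names, meanings, and structure below
# are hypothetical, not the repertoire used in the actual game.
from dataclasses import dataclass
from enum import Enum, auto


class GestureType(Enum):
    TASKWORK = auto()  # conveys task information (e.g., an object's colour)
    TEAMWORK = auto()  # coordinates the communication itself (e.g., "understood")


@dataclass(frozen=True)
class Gesture:
    name: str                  # label produced by the gesture recogniser
    gesture_type: GestureType
    meaning: str               # meaning the partners have grounded for it


# Hypothetical entries of a gesture dictionary / repertoire
GESTURE_REPERTOIRE = [
    Gesture("point_at_red_button", GestureType.TASKWORK, "the red button is part of the code"),
    Gesture("thumbs_up", GestureType.TEAMWORK, "your gesture was understood"),
    Gesture("shrug", GestureType.TEAMWORK, "please repeat, I did not understand"),
]
```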
The gameplay loop thus involves coordination of a shared performance and continuous communication between the partners.
In the current version, the virtual agent is able to play reactively in response to the player's gestures, based on a gesture knowledge base that assigns a meaning and an action to each gesture. A version using an LLM was also developed to provide some reasoning for gesture recognition and performance by the AI virtual agent.
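As a rough illustration of that reactive behaviour, the sketch below (Python, with hypothetical gesture labels and action names; not the project's actual code) maps a recognised gesture to a grounded meaning and an in-game action through a small knowledge base.

```python
# Minimal sketch of a reactive agent driven by a gesture knowledge base.
# All identifiers (gesture labels, actions) are hypothetical examples.
from typing import Callable, NamedTuple


class KBEntry(NamedTuple):
    meaning: str                # grounded meaning of the gesture
    action: Callable[[], None]  # in-game response the agent performs


def press_button(colour: str) -> Callable[[], None]:
    return lambda: print(f"agent presses the {colour} button")


def signal_understood() -> None:
    print("agent performs an acknowledgement gesture")


def signal_doubt() -> None:
    print("agent performs a 'please repeat' gesture")


# Gesture knowledge base: recognised gesture label -> (meaning, action)
GESTURE_KB: dict[str, KBEntry] = {
    "point_at_red_button": KBEntry("press the red button", press_button("red")),
    "thumbs_up": KBEntry("message understood", signal_understood),
}


def react_to(gesture_label: str) -> None:
    """One step of the reactive loop: respond to the player's recognised gesture."""
    entry = GESTURE_KB.get(gesture_label)
    if entry is None:
        signal_doubt()  # unknown gesture: ask the player to repeat or clarify
    else:
        entry.action()


react_to("point_at_red_button")  # -> agent presses the red button
react_to("wave")                 # -> agent performs a 'please repeat' gesture
```

In the LLM-based variant, the fixed lookup is replaced by querying a language model with a description of the observed gesture and the current game context to obtain a suggested interpretation and response; a concrete API call is omitted here because the integration details are specific to the project.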
Tangible Outcomes
- The base game – https://github.com/badomate/EscapeHololens
- The extended game – https://github.com/badomate/EscapeMain
- A presentation summarizing the project: https://www.youtube.com/watch?v=WmuWaNdIpcQ
- A short demo of the system: https://youtu.be/j_bAw8e0lNU?si=STi6sbLzbpknckGG