Contact person: Silvia Tulli (tulli@isir.upmc.fr)
Internal Partners:
- ISIR, Sorbonne University, Silvia Tulli
Interactive Machine Learning (IML) has gained significant attention in recent years as a means for intelligent agents to learn from human feedback, demonstration, or instruction. However, many existing IML solutions primarily rely on sparse feedback, placing an unreasonable burden on the expert involved. This project aims to address this limitation by enabling the learner to leverage richer feedback from the expert, thereby accelerating the learning process. Additionally, we seek to incorporate a model of the expert to select more informative queries, further reducing the burden placed on the expert.
We have three objectives:
(1) Explore and develop methods for incorporating causal and contrastive feedback, supported by evidence from the psychology literature, into the IML learning process.
(2) Design and implement a belief-based system that allows the learner to explicitly maintain beliefs about the possible expert objectives, influencing the selection of queries.
(3) Utilize the received feedback to generate a posterior that informs subsequent queries and enhances the learning process within the framework of Inverse Reinforcement Learning (IRL); a belief-update and query-selection sketch follows this list.
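To make objectives (2) and (3) concrete, below is a minimal sketch of one possible realization, not the project's actual implementation: a Bayesian belief over a small set of candidate expert objectives (reward weight vectors), updated from contrastive preference feedback under an assumed Boltzmann response model, with candidate queries ranked by expected posterior entropy. The candidate sets, the feedback model, and all function names are illustrative assumptions.

```python
# Hedged sketch: belief over candidate expert objectives + informative query
# selection. The Boltzmann preference model and all toy data are assumptions.
import numpy as np

# Candidate expert objectives, represented as reward weight vectors.
candidate_rewards = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, 0.5])]
belief = np.ones(len(candidate_rewards)) / len(candidate_rewards)  # uniform prior

# A query is a pair of trajectories, each summarized by feature counts;
# the expert's contrastive answer says which trajectory they prefer.
queries = [(np.array([2.0, 0.0]), np.array([0.0, 2.0])),
           (np.array([1.0, 1.0]), np.array([2.0, 0.0]))]

def pref_likelihood(w, fa, fb, beta=2.0):
    """P(expert prefers a over b | reward weights w), Boltzmann model."""
    ua, ub = beta * w @ fa, beta * w @ fb
    return 1.0 / (1.0 + np.exp(ub - ua))

def posterior(belief, fa, fb, prefers_a):
    """Bayes update of the belief after one contrastive answer."""
    like = np.array([pref_likelihood(w, fa, fb) for w in candidate_rewards])
    like = like if prefers_a else 1.0 - like
    post = belief * like
    return post / post.sum()

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def expected_entropy(belief, fa, fb):
    """Expected posterior entropy if this query were asked."""
    p_a = sum(b * pref_likelihood(w, fa, fb) for b, w in zip(belief, candidate_rewards))
    return (p_a * entropy(posterior(belief, fa, fb, True))
            + (1 - p_a) * entropy(posterior(belief, fa, fb, False)))

# Ask the query expected to shrink uncertainty the most, then update.
best = min(queries, key=lambda q: expected_entropy(belief, *q))
belief = posterior(belief, *best, prefers_a=True)  # simulated expert answer
print("updated belief:", belief)
```

In this toy setting, minimizing expected posterior entropy selects the most informative query, which is exactly the mechanism by which a learner can reduce the number of questions posed to the expert.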
The project addresses several key aspects highlighted in the work package on Collaboration with AI Systems (W1-2). First, it focuses on AI systems that can communicate and understand descriptions of situations, goals, intentions, or operational plans in order to establish the shared understanding needed for collaboration. By explicitly maintaining beliefs about the expert's objectives and integrating causal and contrastive feedback, the system aims to establish common ground and improve collaboration. The project also aligns with the objective of systems that can explain their internal models, providing additional information to justify statements and answer questions: by using the received feedback to generate a posterior, the system can offer explanations, verify facts, and answer questions, contributing to a deeper shared representation between the AI system and the human expert. Finally, the project pursues two-way interaction between AI systems and humans, constructing shared representations and adapting them in response to information exchange. With tangible results such as user-study evaluations and methods that exploit prior knowledge about the expert, the project aims to make measurable progress toward collaborative AI.
Results Summary
This project resulted in a one-month research exchange, during which our collaborator visited our lab. The visit allowed us to conceptualize and write a paper that we plan to submit to the IJCAI conference in December 2024. The paper addresses the challenge of learning from individuals who have a different model of the task: we identify the human's bottleneck states, determine the maximal subset of these states that is achievable under the robot's model of the task, and query the human about the bottlenecks that cannot be achieved given the constraints of the robot model (a hedged sketch of this loop follows below).
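As a rough illustration of this loop (under assumed data structures; the paper's actual formulation may differ), the sketch below encodes the robot's model as a transition graph, computes the set of states the robot can reach, intersects it with the human's bottleneck states, and flags the remainder as the bottlenecks to query the human about.

```python
# Hedged sketch of the bottleneck-achievability loop described above.
# The graph encoding, toy states, and function names are illustrative assumptions.
from collections import deque

def reachable(transitions, start):
    """States reachable from `start` under the robot's transition graph (BFS)."""
    seen, frontier = {start}, deque([start])
    while frontier:
        s = frontier.popleft()
        for nxt in transitions.get(s, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Toy robot model: state -> successor states.
robot_model = {"s0": ["s1", "s2"], "s1": ["s3"], "s2": [], "s3": []}
human_bottlenecks = {"s1", "s3", "s4"}  # s4 is unreachable for the robot

achievable = human_bottlenecks & reachable(robot_model, "s0")
to_query = human_bottlenecks - achievable
print("achievable bottlenecks:", achievable)  # {'s1', 's3'}
print("query the human about:", to_query)     # {'s4'}
```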
We have also begun work on a survey of human modeling in sequential decision-making, which has led to a workshop paper that we are currently extending for journal publication.
Tangible Outcomes
- [arXiv] Silvia Tulli, Stylianos Loukas Vasileiou, Sarath Sreedharan. "Human-Modeling in Sequential Decision-Making: An Analysis through the Lens of Human-Aware AI." https://arxiv.org/abs/2405.07773