Contact person: Mehdi Khamassi (mehdi.khamassi@sorbonne-universite.fr)
Internal Partners:
- Sorbonne University, Mehdi Khamassi, mehdi.khamassi@sorbonne-universite.fr
- ATHINA Research Center, Petros Maragos, maragos@cs.ntua.gr
This project addresses online behavioral adaptation of a robot during interactive learning with humans. Specifically, the robot shall adapt to each human subject’s specific way of giving feedback during the interaction. Feedback here includes reward, instruction, and demonstration, and can be grouped under the term “teaching signals”. For example, some human subjects prefer a proactive robot while others prefer the robot to wait for their instructions; some only tell the robot when it performs a wrong action, while others reward correct actions, etc. The main outcome is a new ensemble method for human-robot interaction which learns models of various human feedback strategies and uses them to tune reinforcement learning online, so that the robot can quickly learn an appropriate behavioral policy. We first derive an optimal solution to the problem and then compare the empirical performance of ensemble methods to this optimum through a set of numerical simulations.
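The core idea of weighting several candidate models of the human's feedback strategy by how well each predicts the observed feedback can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the two feedback profiles (a "rewarder" who rewards correct actions, and a "punisher" who only punishes errors) and the Bayesian reweighting rule are assumptions chosen to mirror the examples in the text.

```python
class FeedbackModel:
    """Hypothetical model of one human feedback profile: the probability
    of rewarding a correct action and of punishing a wrong one."""

    def __init__(self, p_reward_correct, p_punish_wrong):
        self.p_reward_correct = p_reward_correct
        self.p_punish_wrong = p_punish_wrong

    def likelihood(self, correct, feedback):
        """Probability of observing `feedback` (+1 reward, -1 punishment,
        0 silence) given whether the robot's action was correct."""
        if correct:
            p = self.p_reward_correct
            return p if feedback == 1 else (1 - p if feedback == 0 else 1e-3)
        p = self.p_punish_wrong
        return p if feedback == -1 else (1 - p if feedback == 0 else 1e-3)


class FeedbackEnsemble:
    """Keeps a weight per candidate model and reweights online by each
    model's likelihood for the latest (action correctness, feedback) pair."""

    def __init__(self, models):
        self.models = models
        self.weights = [1.0 / len(models)] * len(models)

    def update(self, correct, feedback):
        liks = [m.likelihood(correct, feedback) for m in self.models]
        total = sum(w * l for w, l in zip(self.weights, liks))
        self.weights = [w * l / total for w, l in zip(self.weights, liks)]


# Usage: a human who stays silent after correct actions quickly shifts
# the ensemble's weight toward the punish-only profile, so the robot can
# interpret silence as implicit approval rather than absence of reward.
rewarder = FeedbackModel(p_reward_correct=0.9, p_punish_wrong=0.1)
punisher = FeedbackModel(p_reward_correct=0.05, p_punish_wrong=0.9)
ensemble = FeedbackEnsemble([rewarder, punisher])
for _ in range(3):
    ensemble.update(correct=True, feedback=0)  # correct action, silence
```

The inferred weights can then modulate how raw feedback is converted into the reward signal fed to the reinforcement learner.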
Results Summary
We designed a new ensemble learning algorithm, combining model-based and model-free reinforcement learning, for on-the-fly robot adaptation during human-robot interaction. The algorithm includes a mechanism for the robot to autonomously detect changes in a human’s reward function from the human's observed behavior, and to reset the ensemble learning accordingly. We simulated a series of human-robot interaction scenarios to test the robustness of the algorithm. In scenario 1, the human rewards the robot with various feedback profiles: stochastic reward; non-monotonic reward; or punishment of errors without reward of correct responses. In scenario 2, the human teaches the robot through demonstrations, again with different degrees of stochasticity and levels of expertise. In scenario 3, we simulated a human-robot cooperation task in which a set of cubes must be put in the right box, with abrupt changes in the target box. Results show the generality of the algorithm.
Humans and robots are bound to cooperate more and more within society. This micro-project addresses a major AI challenge: enabling robots to adapt on the fly to different situations and to different, more-or-less naive human users. The solution consists of a robot learning algorithm which generalizes to a variety of simple human-robot interaction scenarios. Following the HumanE AI vision, interactive learning puts the human in the loop, prompting human-aware robot behavioral adaptation.
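The reset mechanism described above can be illustrated with a simple change detector. This is a hedged sketch, not the paper's actual mechanism: the sliding-window mean comparison, the drifting baseline, and the threshold value are all assumptions chosen to mimic scenario 3, where the target box (and hence the human's reward function) changes abruptly.

```python
from collections import deque


class RewardChangeDetector:
    """Hypothetical reset rule: compare the mean of recent feedback
    against a slowly drifting baseline; a large deviation suggests the
    human's reward function changed, so the caller should reset the
    ensemble weights and restart learning."""

    def __init__(self, window=10, threshold=0.5):
        self.window = window
        self.threshold = threshold
        self.recent = deque(maxlen=window)
        self.baseline = None

    def observe(self, reward):
        """Record one reward; return True if a change is detected."""
        self.recent.append(reward)
        if len(self.recent) < self.window:
            return False
        mean_recent = sum(self.recent) / len(self.recent)
        if self.baseline is None:
            self.baseline = mean_recent  # initialize on first full window
            return False
        if abs(mean_recent - self.baseline) > self.threshold:
            self.baseline = mean_recent  # adopt the new regime
            self.recent.clear()
            return True
        # slow drift so gradual reward noise is absorbed, not flagged
        self.baseline = 0.9 * self.baseline + 0.1 * mean_recent
        return False


# Usage: stable rewards raise no alarm; an abrupt drop (e.g. the target
# box changed) triggers a detection within a few steps.
detector = RewardChangeDetector(window=10, threshold=0.5)
fired_stable = any(detector.observe(1.0) for _ in range(20))
fired_change = any(detector.observe(0.0) for _ in range(10))
```

On detection, the caller would reset the ensemble weights to uniform and re-initialize the learners, letting the robot re-adapt to the new reward function instead of averaging over the old and new regimes.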
Tangible Outcomes
- Rémi Dromnelle, Erwan Renaudo, Benoît Girard, Petros Maragos, Mohamed Chetouani, Raja Chatila, Mehdi Khamassi (2022). Reducing computational cost during robot navigation and human-robot interaction with a human-inspired reinforcement learning architecture. International Journal of Social Robotics, doi: 10.1007/s12369-022-00942-6. Preprint available on HAL: https://hal.sorbonne-universite.fr/hal03829879
- Open source code: https://github.com/DromnHell/meta-control-decision-making-agent