This project addresses online robot behavioral adaptation during interactive learning with humans. Specifically, the robot shall adapt to each human subject's specific way of giving feedback during the interaction. Feedback here includes reward, instruction and demonstration, and can be grouped under the term “teaching signals”. For example, some human subjects prefer a proactive robot while others prefer the robot to wait for their instructions; some only tell the robot when it performs a wrong action, while others reward correct actions. The main outcome will be a new ensemble method for human-robot interaction that can learn models of various human feedback strategies and use them for online tuning of reinforcement learning, so that the robot can quickly learn an appropriate behavioral policy. We will first derive an optimal solution to the problem and then compare the empirical performance of ensemble methods to this optimum through a set of numerical simulations.
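To make the notion of heterogeneous teaching signals concrete, here is a minimal sketch of three of the feedback profiles mentioned above (reward-only, punish-only, stochastic). Function names and numeric values are illustrative assumptions, not the project's actual code.

```python
import random

# Illustrative models of human feedback profiles (hypothetical, for exposition).

def reward_only(correct: bool) -> float:
    """Teacher who rewards correct actions and stays silent on errors."""
    return 1.0 if correct else 0.0

def punish_only(correct: bool) -> float:
    """Teacher who only signals errors and never rewards success."""
    return 0.0 if correct else -1.0

def stochastic(correct: bool, reliability: float = 0.8) -> float:
    """Teacher who delivers feedback only with some probability."""
    if random.random() > reliability:
        return 0.0  # feedback omitted on this step
    return 1.0 if correct else -1.0

# Example: one interaction step under each profile
for teacher in (reward_only, punish_only):
    print(teacher.__name__, teacher(correct=True), teacher(correct=False))
print("stochastic", stochastic(correct=True))
```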

Output

Paper in IEEE RO-MAN, ACM/IEEE HRI, or ACM CHI


Presentations

Project Partners:

  • Sorbonne Université, Mohamed Chetouani
  • ATHINA, Petros Maragos

Primary Contact: Mehdi Khamassi, Sorbonne Université

Main results of micro project:

We designed a new ensemble learning algorithm, combining model-based and model-free reinforcement learning, for on-the-fly robot adaptation during human-robot interaction. The algorithm includes a mechanism enabling the robot to autonomously detect changes in the human's reward function from the feedback it observes, and to reset the ensemble learning accordingly. We simulated a series of human-robot interaction scenarios to test the robustness of the algorithm. In scenario 1, the human rewards the robot according to various feedback profiles: stochastic reward, non-monotonic reward, or punishing errors without rewarding correct responses. In scenario 2, the human teaches the robot through demonstrations, again with different degrees of stochasticity and levels of human expertise. In scenario 3, we simulated a human-robot cooperation task in which a set of cubes must be placed in the right box, with abrupt changes in the target box. Results show the generality of the algorithm across these scenarios.
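As an illustration of how such an ensemble can arbitrate between a model-based (MB) and a model-free (MF) expert, the sketch below uses a simple entropy-based criterion: the expert whose action distribution is most certain acts. This criterion, the softmax temperature, and all names are assumptions for exposition; the project's actual algorithm may combine entropy with other signals such as computational cost.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(q, beta=3.0):
    """Softmax action distribution over an expert's Q-values."""
    p = np.exp(beta * (q - q.max()))
    return p / p.sum()

def entropy(p):
    """Shannon entropy of an action distribution (the expert's uncertainty)."""
    return float(-np.sum(p * np.log2(p + 1e-12)))

def arbitrate(q_mb, q_mf, beta=3.0):
    """Pick the expert with the least uncertain action distribution
    (one simple arbitration criterion among several possible ones)."""
    p_mb, p_mf = softmax(q_mb, beta), softmax(q_mf, beta)
    return ("MB", p_mb) if entropy(p_mb) <= entropy(p_mf) else ("MF", p_mf)

# Usage: two experts' Q-values over 4 actions in some state (toy numbers)
q_mb = np.array([0.1, 0.9, 0.2, 0.1])   # confident model-based estimate
q_mf = np.array([0.4, 0.5, 0.45, 0.4])  # still-uncertain model-free estimate
name, p = arbitrate(q_mb, q_mf)
action = rng.choice(len(p), p=p)
print(f"selected expert: {name}, action: {action}")
```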

Contribution to the objectives of HumaneAI-net WPs

Humans and robots are bound to cooperate more and more within society. This micro-project addresses a major AI challenge: enabling robots to adapt on the fly to different situations and to human users with varying levels of expertise. The solution consists of designing a robot learning algorithm that generalizes to a variety of simple human-robot interaction scenarios. Following the HumanE AI vision, interactive learning puts the human in the loop, prompting human-aware robot behavioral adaptation.
The micro-project directly contributes to one of the objectives of WP1 (T1.3), enabling "continuous incremental learning in joint human/AI systems" by "exploiting rich human feedback". It also directly contributes to one of the objectives of WP3 (T3.3), enabling reflexivity and adaptation in human-AI collaboration.

Tangible outputs

  • Publication: Journal paper – Rémi Dromnelle, Erwan Renaudo, Benoît Girard, Petros Maragos, Mohamed Chetouani, Raja Chatila, Mehdi Khamassi
    in preparation
  • Program/code: Open source code to be uploaded on GitHub – Rémi Dromnelle
    in preparation
  • Publication: Preprint to be made open on HAL – Rémi Dromnelle, Erwan Renaudo, Benoît Girard, Petros Maragos, Mohamed Chetouani, Raja Chatila, Mehdi Khamassi
    in preparation

Results Description

This 6-month project addressed online robot behavioral adaptation during interactive learning with humans. Specifically, the robot shall adapt to each human subject's specific way of giving feedback during the interaction. Feedback here includes reward, instruction and demonstration, and can be grouped under the term “teaching signals”. For example, some human subjects prefer a proactive robot while others prefer the robot to wait for their instructions; some only tell the robot when it performs a wrong action, while others reward correct actions. The main expected outcome was a new ensemble method for human-robot interaction that can learn models of various human feedback strategies and use them for online tuning of reinforcement learning, so that the robot can quickly learn an appropriate behavioral policy.

We designed a new ensemble learning algorithm, combining model-based and model-free reinforcement learning, for on-the-fly robot adaptation during human-robot interaction. The algorithm includes a mechanism enabling the robot to autonomously detect changes in the human's reward function from the feedback it observes, and to reset the ensemble learning accordingly. We simulated a series of human-robot interaction scenarios to test the robustness of the algorithm. In scenario 1, the human rewards the robot according to various feedback profiles: stochastic reward, non-monotonic reward, or punishing errors without rewarding correct responses. In scenario 2, the human teaches the robot through demonstrations, again with different degrees of stochasticity and levels of human expertise. In scenario 3, we simulated a human-robot cooperation task in which a set of cubes must be placed in the right box, with abrupt changes in the target box. Results show the generality of the algorithm across these scenarios.
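The change-detection-and-reset mechanism can be sketched as follows: compare a short-term average of the human's feedback against a longer-term baseline and reset the ensemble when the two diverge abruptly. The window sizes, threshold, and class name below are hypothetical, not the project's actual implementation.

```python
from collections import deque

class RewardChangeDetector:
    """Illustrative sketch: flag an abrupt change in the human's reward
    function when recent average feedback drops well below a longer-term
    baseline; the agent would then reset its ensemble statistics."""
    def __init__(self, short=10, long=50, threshold=0.5):
        self.short = deque(maxlen=short)
        self.long = deque(maxlen=long)
        self.threshold = threshold

    def update(self, reward: float) -> bool:
        self.short.append(reward)
        self.long.append(reward)
        if len(self.long) < self.long.maxlen:
            return False  # not enough history yet
        recent = sum(self.short) / len(self.short)
        baseline = sum(self.long) / len(self.long)
        return (baseline - recent) > self.threshold  # sudden drop => change

# Toy usage: the reward function flips from rewarding to punishing at t = 60
rewards_stream = [1.0] * 60 + [-1.0] * 40
detector = RewardChangeDetector()
for t, r in enumerate(rewards_stream):
    if detector.update(r):
        print(f"change detected at step {t}; resetting ensemble")
        break
```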

Humans and robots are bound to cooperate more and more within society. This micro-project addresses a major AI challenge: enabling robots to adapt on the fly to different situations and to human users with varying levels of expertise. The solution consists of designing a robot learning algorithm that generalizes to a variety of simple human-robot interaction scenarios. Following the HumanE AI vision, interactive learning puts the human in the loop, prompting human-aware robot behavioral adaptation.
The micro-project directly contributes to one of the objectives of WP1 (T1.3), enabling "continuous incremental learning in joint human/AI systems" by "exploiting rich human feedback". It also directly contributes to one of the objectives of WP3 (T3.3), enabling reflexivity and adaptation in human-AI collaboration.

Publications

Rémi Dromnelle, Erwan Renaudo, Benoît Girard, Petros Maragos, Mohamed Chetouani, Raja Chatila, Mehdi Khamassi (2022). Reducing computational cost during robot navigation and human-robot interaction with a human-inspired reinforcement learning architecture. International Journal of Social Robotics, doi: 10.1007/s12369-022-00942-6, https://link.springer.com/article/10.1007/s12369-022-00942-6.

Links to Tangible results

Open source code: https://github.com/DromnHell/meta-control-decision-making-agent

Publication: Preprint made open on HAL – https://hal.sorbonne-universite.fr/hal-03829879