Human knowledge can be incorporated into a learning model in diverse ways, e.g., by learning from expert data in imitation learning or through reward engineering in deep reinforcement learning. In many applications, however, the expert data covers only part of the search space, typically “normal” behaviors and scenarios. In autonomous driving, learning a policy from such a limited dataset can leave the policy vulnerable to novel or out-of-distribution (OOD) inputs, so that it produces overconfident and dangerous actions. In this microproject, we aim to learn a policy from the expert training data while allowing the policy to go beyond the data by interacting with an environment dynamics model and by accounting for uncertainty in the state estimation with virtual sensors. To avoid a dramatic distribution shift, we propose to use the uncertainty of the environment dynamics model to penalize the policy for visiting states that differ from human behavior.
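The uncertainty penalty can be illustrated with a minimal sketch. It assumes, purely for illustration, that the dynamics uncertainty is measured as the disagreement within an ensemble of learned dynamics models and that the penalty is subtracted linearly from the reward; the project description does not specify either choice. All names (`ensemble_uncertainty`, `penalized_reward`, `make_toy_model`, `penalty_weight`) are hypothetical, and the toy models stand in for learned ones.

```python
import statistics
from typing import Callable, List, Sequence

State = Sequence[float]
Action = Sequence[float]
# A dynamics model maps (state, action) to a predicted next state.
DynamicsModel = Callable[[State, Action], List[float]]


def ensemble_uncertainty(
    models: Sequence[DynamicsModel], state: State, action: Action
) -> float:
    """Ensemble disagreement: largest per-dimension std of the predicted next states."""
    predictions = [model(state, action) for model in models]
    # zip(*predictions) groups the predicted values by state dimension.
    return max(statistics.pstdev(values) for values in zip(*predictions))


def penalized_reward(
    reward: float, uncertainty: float, penalty_weight: float = 1.0
) -> float:
    """Assumed penalty form: reward minus a weighted dynamics uncertainty."""
    return reward - penalty_weight * uncertainty


def make_toy_model(bias: float) -> DynamicsModel:
    """Hypothetical 1-D model; members differ only away from state 0,
    which plays the role of the region covered by expert data."""

    def model(state: State, action: Action) -> List[float]:
        return [state[0] + action[0] + bias * state[0] ** 2]

    return model


# Toy ensemble: the members agree at state 0 and disagree further away.
models = [make_toy_model(bias) for bias in (-0.1, 0.0, 0.1)]
for s in ([0.0], [3.0]):
    u = ensemble_uncertainty(models, s, [0.5])
    print(s, u, penalized_reward(1.0, u, penalty_weight=2.0))
```

In this toy setup the penalty vanishes where the ensemble members agree and grows with their disagreement, so a policy trained on the penalized reward is discouraged from drifting into states far from the expert data.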




Project Partners:

  • DFKI, Christian Müller
  • VW Data Lab, Andrii Kleshchonok

Primary Contact: Christian Müller, DFKI