Contact person: Christian Müller (cmueller@dfki.de)

Internal Partners:

  1. German Research Centre for Artificial Intelligence (DFKI), Christian Müller
  2. Volkswagen AG, Andrii Kleshchonok


Human knowledge can be incorporated into a learning model in diverse ways, e.g., by learning from expert data in imitation learning or by reward engineering in deep reinforcement learning. In many applications, however, the expert data covers only part of the search space, namely “normal” behaviors and scenarios. Learning a policy for autonomous driving from such a limited dataset can make the policy vulnerable to novel or out-of-distribution (OOD) inputs and thus produce overconfident and dangerous actions. In this microproject, we aim to learn a policy from expert training data while allowing the policy to go beyond that data by interacting with an environment dynamics model and accounting for uncertainty in the state estimation with virtual sensors. To avoid a dramatic distribution shift, we propose to use the uncertainty of the environment dynamics model to penalize the policy in states that deviate from human behavior.
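To make the penalty concrete, the sketch below shows one common way to realize this idea (it is an illustration, not the project's actual implementation): dynamics uncertainty is measured as the disagreement of an ensemble of learned dynamics models and subtracted from the reward. The ensemble, the disagreement measure, and the penalty weight lam are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: penalize the reward by the disagreement of an
# ensemble of learned dynamics models. High disagreement signals a
# state-action pair far from the expert data, discouraging the policy
# from drifting there. All names (ensemble predictions, lam) are
# assumptions, not the project's actual implementation.

def ensemble_disagreement(next_state_preds: np.ndarray) -> float:
    """Uncertainty proxy: largest L2 distance of any ensemble member's
    next-state prediction from the ensemble mean.
    next_state_preds has shape (n_models, state_dim)."""
    mean = next_state_preds.mean(axis=0)
    return float(np.max(np.linalg.norm(next_state_preds - mean, axis=1)))

def penalized_reward(reward: float,
                     next_state_preds: np.ndarray,
                     lam: float = 1.0) -> float:
    """Reward used for policy learning: the original reward minus the
    uncertainty penalty, scaled by lam."""
    return reward - lam * ensemble_disagreement(next_state_preds)

# Toy usage: three dynamics models predicting the next 4-dimensional state.
preds = np.array([[0.10,  0.0, 1.0, 0.5],
                  [0.12,  0.1, 1.1, 0.4],
                  [0.08, -0.1, 0.9, 0.6]])
print(penalized_reward(reward=1.0, next_state_preds=preds, lam=0.5))
```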

Results Summary

Our key idea in this project is to learn a representation with deep models in a way that incorporates rules (e.g., the physics equations governing the dynamics of the autonomous vehicle) or distributions that can be defined by humans in advance. The representations learned on the source domain (the domain whose samples follow the defined equations/distributions) are then transferred to a target domain with different distributions/rules, and the model adapts itself by including target-specific features that best explain the variations of target samples with respect to the underlying source rules/distributions. In this way, human knowledge is incorporated implicitly in the feature space.

We aim to develop a robust and generalized model that performs well on out-of-distribution or novel data, domains, and environments. The key idea is to learn the fundamental feature distribution shared between the source (training) and target (test) domains, while also learning target-specific features that account for how this shared distribution varies on the target domain; a minimal sketch of this shared-plus-specific decomposition follows.
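As an illustration only, the following PyTorch sketch shows one way such a decomposition could look: a frozen encoder pretrained on the rule-based source data provides the shared features, and a small target adapter learns residual, target-specific features on target samples. The module names, layer sizes, and the residual design are our assumptions, not the project's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the shared-plus-target-specific representation:
# a source-trained encoder captures the rule-governed (e.g., physics)
# features, while a small adapter learns residual features explaining how
# target samples deviate from the source rules.

class SharedEncoder(nn.Module):
    """Encoder pretrained on source-domain (rule/distribution-based) data."""
    def __init__(self, in_dim: int = 8, feat_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                 nn.Linear(32, feat_dim))

    def forward(self, x):
        return self.net(x)

class TargetAdapter(nn.Module):
    """Target-specific residual features added to the frozen shared ones."""
    def __init__(self, in_dim: int = 8, feat_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 16), nn.ReLU(),
                                 nn.Linear(16, feat_dim))

    def forward(self, x, shared_feats):
        return shared_feats + self.net(x)  # residual correction

encoder = SharedEncoder()          # assumed pretrained on source data
encoder.requires_grad_(False)      # freeze the shared (source) knowledge
adapter = TargetAdapter()          # trained on target samples only

x = torch.randn(4, 8)              # a batch of target-domain inputs
features = adapter(x, encoder(x))  # shared + target-specific representation
print(features.shape)              # torch.Size([4, 16])
```

Freezing the shared encoder and training only the adapter is one simple way to keep the source rules intact while letting the target domain contribute its own variations; other adaptation schemes would serve the same purpose.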