We propose to research a scalable human-machine collaboration system in which the human and the machine share the goal of executing high-quality actions (e.g., rehabilitation exercises). We combine video and speech to support video-grounded, goal-oriented dialogue. We build on our video and text database of rehabilitation exercises for knee injuries. We evaluate high-performance body pose estimation tools and compare them with a real-time body pose estimation tool to be developed for smartphones via ‘knowledge distillation’ methods.
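As an illustration of the ‘knowledge distillation’ idea mentioned above, the following is a minimal sketch, not the project's actual method. It assumes that both the large (teacher) model and the smartphone (student) model output 2D keypoint coordinates, and that the student is trained on a weighted mix of the ground-truth error and the error against the teacher's predictions. The function names and the `alpha` weighting are hypothetical and chosen only for this example.

```python
from typing import Sequence, Tuple

Keypoint = Tuple[float, float]  # (x, y) image coordinates of one body joint


def mean_squared_error(pred: Sequence[Keypoint], target: Sequence[Keypoint]) -> float:
    """Average squared Euclidean distance between matching keypoints."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    total = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(pred, target))
    return total / len(pred)


def distillation_loss(student: Sequence[Keypoint],
                      teacher: Sequence[Keypoint],
                      ground_truth: Sequence[Keypoint],
                      alpha: float = 0.5) -> float:
    """Hypothetical distillation objective for keypoint regression.

    alpha weights the supervised term (student vs. ground truth);
    (1 - alpha) weights the distillation term (student vs. teacher).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    supervised = mean_squared_error(student, ground_truth)
    distilled = mean_squared_error(student, teacher)
    return alpha * supervised + (1.0 - alpha) * distilled


if __name__ == "__main__":
    # Toy example with two joints; all coordinates are made up.
    student_pred = [(0.0, 0.0), (1.0, 1.0)]
    teacher_pred = [(0.0, 2.0), (1.0, 1.0)]
    labels = [(1.0, 0.0), (1.0, 1.0)]
    print(distillation_loss(student_pred, teacher_pred, labels, alpha=0.25))
```

In this sketch, a smaller `alpha` makes the student follow the teacher more closely, which can be useful when labelled exercise data is scarce. The right weighting for real data is an open design choice.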

The complementary part of the project deals with the texts we have collected for these exercises and estimates how much text is needed to build dialogues that can guide exercise execution and correct its quality. Potential topics/intents include the patient's pose relative to the camera, proper lighting conditions, audio-visual information about pain, notes about execution errors, errors discovered by the computer evaluation, requests for additional information from the patient, and reactions to other, unrelated queries.

Output:

  • Dataset of the dialogues
  • Publication on the constraints and potentials of existing state-of-the-art methods
  • Performance evaluation methods and usability studies

Project Partners:

  • Eötvös Loránd University (ELTE), András Lőrincz
  • Charles University Prague, Ondřej Dušek

Primary Contact: András Lőrincz, Eötvös Loránd University (ELTE)