We propose to research a scalable human-machine collaboration system whose common goal is the high-quality execution of actions (e.g., rehabilitation exercises). We combine video and speech for video-grounded, goal-oriented dialogue, building on our video and text database of rehabilitation exercises following knee injuries. We evaluate high-performance body pose estimation tools and compare them to a real-time body pose estimation tool, to be developed for smartphones via knowledge distillation methods.
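Knowledge distillation trains a small "student" network, suitable for a smartphone, to reproduce the outputs of a larger, slower "teacher" network. One common formulation for heatmap-based pose estimators is sketched below as an illustration only. The function name `distillation_loss`, the weighting `alpha`, and the (batch, keypoints, height, width) heatmap layout are assumptions, not the project's actual training setup.

```python
import numpy as np


def distillation_loss(student_heatmaps, teacher_heatmaps, gt_heatmaps, alpha=0.5):
    """Hypothetical distillation objective for keypoint heatmaps.

    Weighted sum of two mean-squared errors:
      alpha       -> how closely the student imitates the (frozen) teacher,
      1 - alpha   -> how closely the student matches ground-truth annotations.
    All arrays are assumed to share the shape (batch, keypoints, height, width).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    distill = np.mean((student_heatmaps - teacher_heatmaps) ** 2)
    supervised = np.mean((student_heatmaps - gt_heatmaps) ** 2)
    return alpha * distill + (1.0 - alpha) * supervised


# Usage example with random heatmaps for 17 keypoints at 64x48 resolution.
rng = np.random.default_rng(0)
shape = (2, 17, 64, 48)
student = rng.random(shape)
teacher = rng.random(shape)
ground_truth = rng.random(shape)
print(distillation_loss(student, teacher, ground_truth, alpha=0.7))
```

Setting `alpha` closer to 1 leans on the teacher's soft heatmaps, which can carry more information than sparse annotations. In practice the student would be a lightweight network whose parameters are optimized against this loss.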

The complementary part of the project deals with the texts we have collected for these exercises and estimates the amount of text needed for dialogues that can guide patients and correct the quality of their exercise execution. Potential topics/intents include pose relative to the camera, proper lighting conditions, audio-visual information about pain, notes about execution errors, errors discovered by computer evaluation, requests for additional information from the patient, and reactions to other, unrelated queries.


Dataset of the dialogues

Publication on the constraints and potentials of existing state-of-the-art methods

Performance evaluation methods and usability studies


Project Partners:

  • Eötvös Loránd University (ELTE), András Lőrincz
  • Charles University Prague, Ondřej Dušek


Primary Contact: András Lőrincz, Eötvös Loránd University (ELTE)

Main results of micro project:

Machine-assisted physical rehabilitation is of special interest because (a) it is a relatively narrow field, (b) observation and interaction are multimodal, involving natural language processing, video processing, and speech recognition and generation, and (c) it is a critical medical application. We considered rehabilitation after total knee replacement as a prototype scenario. We used 2D and RGBD cameras and dialogue systems at three levels:
(i) video-based feedback aimed at both documentation and performance improvement,
(ii) additional rule-based dialogue with specific error detection, and
(iii) extensions with a data-driven dialogue system based on the DialoGPT language model.
We argue that the time is ripe to revitalize existing practices using recent advances in machine learning.
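The rule-based level (ii) pairs pose estimates with hand-written checks for specific execution errors. A minimal sketch of one such rule follows, assuming 2D hip, knee, and ankle keypoints per video frame and a knee-extension exercise. The function names, the keypoint dictionary format, and the 170-degree target are hypothetical illustrations, not clinical values or the project's actual rules.

```python
import math


def joint_angle(a, b, c):
    """Angle ABC in degrees at vertex b, for 2D points given as (x, y)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate keypoints: two points coincide")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


def check_knee_extension(frames, min_extension_deg=170.0):
    """Hypothetical rule: the knee should nearly straighten at least once.

    frames: list of dicts with 'hip', 'knee', 'ankle' 2D keypoints.
    Returns a list of feedback messages (empty if no error was detected).
    """
    if not frames:
        return []
    angles = [joint_angle(f["hip"], f["knee"], f["ankle"]) for f in frames]
    peak = max(angles)
    if peak < min_extension_deg:
        return [
            f"Try to straighten your knee further "
            f"(reached {peak:.0f} degrees, target {min_extension_deg:.0f})."
        ]
    return []


# Usage example: a leg that only bends to a right angle triggers feedback.
bent = [{"hip": (0, 1), "knee": (0, 0), "ankle": (1, 0)}]
print(check_knee_extension(bent))
```

Keeping each error as a separate, explicit rule makes the resulting messages easy for clinicians to audit, which is one reason to combine such rules with a data-driven dialogue model rather than replace them.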

Contribution to the objectives of HumaneAI-net WPs

Video-based dialogue systems address the goals of the Foundations of Human-AI interactions work package, whereas the rehabilitation scenario is a prototype for goal-oriented collaboration. The microproject targeted specific topics, including:
(i) body motion and pain, both
  • in terms of language and potential dialogues, and
  • in more than 400 video samples covering 50 exercises, with about 7 errors per motion type on average, to be detected alone or in combination;
(ii) dialogues
  • from experts, and
  • from crowdsourcing-based dialogue enhancements.

Tangible outputs

  • Publication: DeepRehab: Real Time Pose Estimation on the Edge for Knee Injury Rehabilitation – Bruno Carlos Dos Santos Melício, Gábor Baranyi, Zsófia Gaal, Sohil Zidan, and András Lőrincz
  • Publication: Multimodal technologies for machine-assisted physical rehabilitation – Ondřej Dušek, András Simonyi, Dániel Sindely, Levente Juhász, Gábor Baranyi, Tomas Nekvinda, Márton Véges, Kinga Faragó, András Lőrincz