Contact person: András Lőrincz (lorincz@inf.elte.hu)
Internal Partners:
- Eötvös Loránd University (ELTE), András Lőrincz and Daniel Sindley
- Charles University Prague, Ondřej Dušek and Tomáš Nekvinda
We propose research on a scalable human-machine collaboration system whose goal is the high-quality execution of rehabilitation exercises. We combine video and speech for video-grounded, goal-oriented dialogue. We build on our video and text database, which covers exercises for rehabilitation following knee injuries. We evaluate high-performance body pose estimation tools and compare them to a real-time body pose estimation tool to be developed for smartphones via knowledge distillation. The complementary part of the project deals with the texts that we have collected for these exercises and estimates the amount of text needed for dialogues that can guide the exercises and correct their execution. Potential topics/intents include pose relative to the camera, proper lighting conditions, audio-visual information about pain, notes about execution errors, errors discovered by the computer evaluation, requests for additional information from the patient, and reactions to other, unrelated queries.
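As an illustration of the knowledge distillation idea mentioned above, the following minimal sketch shows one common way a small "student" pose estimator can be trained against both annotated keypoints and the predictions of a larger "teacher" model. The function names, the 2D keypoint representation, and the weighting parameter `alpha` are assumptions made for this example; they are not taken from the project's implementation.

```python
from typing import Sequence, Tuple

# One (x, y) pair per joint, e.g. in normalized image coordinates (assumed format).
Keypoints = Sequence[Tuple[float, float]]


def mse(a: Keypoints, b: Keypoints) -> float:
    """Mean squared distance between corresponding keypoints."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("keypoint lists must be non-empty and of equal length")
    total = sum((ax - bx) ** 2 + (ay - by) ** 2
                for (ax, ay), (bx, by) in zip(a, b))
    return total / len(a)


def distillation_loss(student: Keypoints,
                      teacher: Keypoints,
                      labels: Keypoints,
                      alpha: float = 0.5) -> float:
    """Weighted sum of the supervised loss and the teacher-matching loss.

    alpha = 1.0 uses only the ground-truth labels,
    alpha = 0.0 uses only the teacher's predictions (hypothetical weighting).
    """
    return alpha * mse(student, labels) + (1.0 - alpha) * mse(student, teacher)


# Toy example with two joints: the student matches the labels exactly
# but disagrees with the teacher on the first joint.
student = [(0.0, 0.0), (1.0, 1.0)]
teacher = [(0.0, 1.0), (1.0, 1.0)]
labels = [(0.0, 0.0), (1.0, 1.0)]
loss = distillation_loss(student, teacher, labels, alpha=0.5)
```

In practice such a loss would be computed on model outputs inside a training loop of a deep learning framework; the plain-Python version here only shows the structure of the objective.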
Results Summary
Human-machine collaboration will soon be ubiquitous, as machines can help in everyday life. However, spatial tasks are challenging because of real-time constraints. We want to optimize the interaction offline, before it happens in real time, to ensure high quality. We present the SPAtial TAsk (SPATA) framework. SPATA is modular, and here we address two connected components: body pose optimization and navigation. Our experiments show that 3D pose estimation using 2D cameras is accurate when the motion is captured from the right direction and distance. This limitation currently restricts us to simple forms of movement, such as those used in physical rehabilitation exercises. Accurate estimation requires (a) estimation of body size, (b) optimization of body and camera position, (c) navigation assistance to a location, and (d) activity capture and error estimation. For (a), an avatar model is used to estimate the body shape and a skeleton model is used to estimate the body pose. For (b), we use SLAM. For (c), we use a semantic map and optimize a minimal NLP system for human needs, which we test. Finally, for (d), we estimate the accuracy of the motion and propose a visual comparison between the planned and the executed motion pattern. The SPATA framework is useful for various tasks at home, in gyms, and in other spatial applications. Depending on the task, different components can be integrated.
The MP targeted specific topics, including
(i) body motion and pain, both
  - in terms of language and potential dialogues, and
  - in more than 400 video samples covering 50 exercises, with about 7 errors on average per motion type, to be detected alone or in combination,
and
(ii) dialogues
  - from experts, and
  - crowdsourcing-based dialogue enhancements.
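To make step (d), the comparison of planned and executed motion, more concrete, the sketch below computes a joint angle (for example the knee angle from hip, knee, and ankle positions) and the mean absolute angle difference between a planned and an executed sequence. This is only one plausible error measure, chosen for illustration; the function names and the choice of metric are assumptions and do not describe the measure used in the project.

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def joint_angle(a: Point3D, b: Point3D, c: Point3D) -> float:
    """Angle at joint b, in degrees, between the segments b->a and b->c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("joint positions must be distinct")
    cos_angle = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


def mean_angle_error(planned: Sequence[float],
                     executed: Sequence[float]) -> float:
    """Mean absolute difference (degrees) between two per-frame angle sequences."""
    if len(planned) != len(executed) or len(planned) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(p - e) for p, e in zip(planned, executed)) / len(planned)


# Toy example: hip above the knee, ankle in front of it (a right angle),
# followed by a comparison of two short angle sequences.
knee_angle = joint_angle((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
error = mean_angle_error([90.0, 120.0], [80.0, 130.0])
```

A per-frame comparison like this assumes the two sequences are already time-aligned; for exercises performed at different speeds, an alignment step would be needed first.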
Tangible Outcomes
- DeepRehab: Real Time Pose Estimation on the Edge for Knee Injury Rehabilitation – Bruno Carlos Dos Santos Melício, Gábor Baranyi, Zsófia Gaal, Sohil Zidan, and András Lőrincz. https://e-nns.org/icann2021/
- Video presentation summarizing the project