The project aims to explore multi-modal interaction concepts for collaborative creation of 3D objects in virtual reality with generative AI assistance.
Generative AI has the potential to greatly reduce the time and effort required to create virtual 3D objects, making designers and developers more efficient and effective. Yet, research still lacks an understanding of which interaction modalities are suitable and how common ground can be established in this field.
The objective of this research project is to explore and compare interaction modalities suited to collaboratively creating virtual 3D objects with a generative AI. To this end, the project investigates how different input modalities, such as voice, touch, and gestures, can be used to generate and alter a virtual 3D object, and how methods for establishing common ground between the AI and the users can be developed.
The project is split into two work packages. (1) We investigate and evaluate multi-modal input modalities for altering the shape and appearance of 3D objects in virtual reality (VR). (2) Based on our insights into promising multi-modal interaction concepts, we then develop a prototypical multi-modal VR interface that allows users to collaborate with a generative AI on the creation of 3D objects. This might include, but is not limited to, the AI assistant generating 3D models (e.g., using Text2Mesh, https://threedle.github.io/text2mesh, or Shap-E) or providing suggestions based on the users' queries.
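To illustrate the kind of multi-modal input handling work package (1) envisions, the sketch below fuses a voice command with a co-occurring gesture into a single mesh-editing command: the speech names the intended change while the gesture supplies the quantitative amount. All names here (`VoiceEvent`, `GestureEvent`, `fuse_command`) are hypothetical and for illustration only; they are not part of any existing framework or of the project's eventual software package.

```python
from dataclasses import dataclass

# Hypothetical input events from the two modalities (illustrative only).
@dataclass
class VoiceEvent:
    transcript: str      # e.g. "make it taller"
    timestamp: float     # seconds


@dataclass
class GestureEvent:
    kind: str            # e.g. "pinch_drag"
    magnitude: float     # normalized drag distance, 0..1
    timestamp: float     # seconds


# Map vague verbal commands to concrete mesh parameters; the gesture
# supplies the amount that the speech leaves unspecified.
VERB_TO_PARAM = {
    "taller": ("scale_y", +1.0),
    "wider": ("scale_x", +1.0),
    "smaller": ("scale_uniform", -1.0),
}


def fuse_command(voice: VoiceEvent, gesture: GestureEvent,
                 max_gap: float = 1.5):
    """Fuse a voice and a gesture event into one edit command,
    provided they occur within `max_gap` seconds of each other."""
    if abs(voice.timestamp - gesture.timestamp) > max_gap:
        return None  # events too far apart to express one intent
    for keyword, (param, sign) in VERB_TO_PARAM.items():
        if keyword in voice.transcript:
            return {"param": param, "delta": sign * gesture.magnitude}
    return None  # no known keyword in the transcript
```

For example, `fuse_command(VoiceEvent("make it taller", 0.2), GestureEvent("pinch_drag", 0.4, 0.9))` yields an edit of `scale_y` by `0.4`; the same utterance paired with a gesture seconds later yields no command, reflecting one simple temporal-grounding choice the project would need to evaluate empirically.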
The project will use a combination of experimental and observational methods to evaluate the effectiveness and efficiency of the concepts. This will involve conducting controlled experiments to test the effects of different modalities and AI assistance on the collaborative creation process, as well as observing and analyzing the users’ behavior.
# Expected outcomes
The research project is expected to produce several outcomes: a software package for prototyping multi-modal VR interfaces for the collaborative creation of 3D objects, insights into the effectiveness and efficiency of different modalities and of AI assistance in enhancing the collaborative process, and guidelines for the design of multi-modal interfaces and AI assistance for the collaborative creation of 3D objects. These outcomes may find application in fields such as architecture, engineering, and entertainment.
# Relation to call
This research project directly addresses the call for proposals, as it tackles the challenge of coordination and collaboration between AI and human partners in the creation of 3D objects. Its use of multi-modal interfaces and AI assistance aligns with the call's focus on speech-based and multimodal interaction with AI, and its investigation of co-adaptive processes in collaborative creation matches the call's focus on co-adaptive processes in grounding. The project's outcomes, such as guidelines for the design of multi-modal interfaces and AI assistance for collaborative creation, may further contribute to the broader theme of interactive grounding. Finally, its potential applications in architecture, engineering, and entertainment fit the call's special application areas.
1. VR co-creation software package: The project aims to develop a publicly available, open-source software package for quickly prototyping multi-modal VR interfaces for co-creating virtual 3D objects. It enables practitioners and VR application developers to create virtual 3D objects more easily, without requiring expert knowledge of computer-aided design.
2. Recorded dataset and derived design guidelines: The project aims to publish all recorded datasets and, derived from them, a set of guidelines for the design of efficient and effective multi-modal interfaces for generating and altering 3D objects with an AI assistant.
3. We aim to publish the results of this research as a paper at a leading XR or HCI venue, such as CHI, UIST, or ISMAR.
Københavns Universitet (UCPH), Teresa Hirzle
Ludwig-Maximilians-Universität München (LMU), Florian Müller/Julian Rasch
Saarland University, Martin Schmitz