This project targets the design of grounded dialogue models learned from observation of human-to-human conversation, typically from a set of recordings. It will yield trustworthy conversation models as well as a tool for analyzing dialogue behavior.

This microproject aims to design grounded dialogue models based on observation of human-to-human dialogue examples, i.e., distilling dialogue patterns automatically and aligning them with external knowledge bases. Current state-of-the-art conversation models based on finetuned large language models are either not grounded and merely mimic their training data, or their grounding is external and must be hand-designed. Furthermore, most commercially deployed dialogue models are entirely handcrafted.
Our goal is to produce grounding for these models (semi-)automatically, using dialogue context embedded in vector spaces via large language models trained specifically on conversational data. If we represent dialogue states as vectors, the whole conversation can be seen as a trajectory in the vector space. By merging, pruning, and modeling the trajectories, we can obtain dialogue skeleton models in the form of finite-state graphs or similar structures. These models could be used for data exploration and analysis, content visualization, topic detection, or clustering. This can enable faster and cheaper design of fully trustworthy conversation models. The approach will serve both to provide external model grounding and to analyze the progress of human-to-human dialogues, including negotiation around the participants’ common ground.
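As a minimal illustration of the idea (a sketch under assumptions, not the project's actual pipeline): if each dialogue is a trajectory of turn embeddings, one simple way to obtain a finite-state skeleton is to cluster the states and count transitions between clusters. The function name, the use of scikit-learn's KMeans, and the toy 2-D embeddings are all illustrative assumptions.

```python
# Hypothetical sketch: cluster dialogue-state embeddings and read off a
# finite-state skeleton as transition counts between clusters.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans


def skeleton_from_trajectories(dialogues, n_states=4, seed=0):
    """dialogues: list of (T_i, d) arrays of per-turn embeddings (assumed input)."""
    all_turns = np.vstack(dialogues)
    km = KMeans(n_clusters=n_states, n_init=10, random_state=seed).fit(all_turns)
    transitions = Counter()
    for traj in dialogues:
        labels = km.predict(traj)
        for a, b in zip(labels, labels[1:]):
            transitions[(a, b)] += 1  # weighted edge a -> b in the skeleton
    return km, transitions


# Toy usage: two synthetic 2-D "dialogues", 5 turns each
rng = np.random.default_rng(0)
d1 = rng.normal([0.0, 0.0], 0.1, (5, 2))
d2 = rng.normal([3.0, 3.0], 0.1, (5, 2))
km, edges = skeleton_from_trajectories([d1, d2], n_states=2)
```

In a realistic setting, the clustering step would be replaced by the trajectory merging and pruning investigated in the project, but the output shape is the same: states plus weighted edges, i.e., a finite-state graph.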
The microproject will investigate the optimal format of the dialogue context embeddings (such as temporal resolution) as well as the optimal ways of merging dialogue trajectories and distilling models. Here, Variational Recurrent Neural Networks with discrete embeddings (Shi et al., NAACL 2019) are a promising architecture, but alternatives will also be considered.
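To make the "discrete embeddings" notion concrete (a hedged illustration, not Shi et al.'s implementation): discrete latent dialogue states are commonly trained with the Gumbel-softmax trick, which relaxes a categorical state choice into a differentiable soft one-hot sample. The function below is a self-contained NumPy sketch of that single building block.

```python
# Illustrative assumption: Gumbel-softmax relaxation of a categorical
# dialogue-state choice, as used for discrete latent variables.
import numpy as np


def gumbel_softmax(logits, tau=1.0, rng=None):
    """Return a soft one-hot sample over discrete states; tau controls sharpness."""
    rng = rng if rng is not None else np.random.default_rng()
    u = rng.uniform(1e-9, 1.0, logits.shape)
    g = -np.log(-np.log(u))            # Gumbel(0, 1) noise
    y = (logits + g) / tau
    e = np.exp(y - y.max())            # numerically stable softmax
    return e / e.sum()


probs = gumbel_softmax(np.array([2.0, 0.5, 0.1]), tau=0.5,
                       rng=np.random.default_rng(0))
```

Lower temperatures `tau` push the sample toward a hard one-hot state, which is what makes the learned embedding space effectively discrete.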
We plan to experiment with both textual and voice-based dialogues. We will use the MultiWOZ corpus (Budzianowski et al., EMNLP 2018) as well as the DIASER extension developed in a Humane AI microproject by CUNI+LIMSI (Hudecek et al., LREC 2022) for text-based experiments. For voice-based experiments, we will use MultiWOZ spoken data released for the DSTC11 Challenge and dialogue data currently developed in a Humane AI microproject by BUT+CUNI.
The work will be done as a part of the JSALT workshop hosted by the University of Le Mans, France, and co-organized by Johns Hopkins University (JHU) and the Brno University of Technology.

The JSALT workshop topic leader is Petr Schwarz from BUT (MP partner). The topic passed a scientific review by about 40 researchers in Baltimore, USA, in December 2022 and was selected as one of four workshop topics.

Workshop Topic Proposal:
Workshop Topic Presentation:

Workshop Team:

The workshop and its attendees will be supported by several sources: JHU sponsors, the European Esperanto project, private companies, and the HumanE AI project. Ondrej Dusek from CUNI is responsible for the HumanE AI participants (as MP PI). The funding will mainly cover travel, accommodation, and per diem for participants to attend in Le Mans, plus some preparation work.

A shared venue for all four workshop topics, whose teams include world-class researchers, together with an initial summer school, gives participants an excellent opportunity for networking and personal growth, with high visibility and high impact for the results. We expect that this effort can start new long-term collaborations among the participating institutions.


– Software – code for dialogue embeddings & trajectory merging
– Trained embedding models
– Paper describing the dialogue skeleton models

Project Partners

  • Brno U, Petr Schwarz
  • CUNI, Ondrej Dusek
  • Eötvös Loránd University (ELTE), Andras Lorincz
  • IDIAP, Petr Motlicek

Primary Contact

Ondrej Dusek, CUNI