We aim to evaluate the usefulness of current dialogue dataset annotations and to propose annotation unification and automated enhancements that support better user modeling by training on larger amounts of data. Annotation in current datasets is often geared only toward teaching the dialogue system how to respond. For richer user representations (e.g. cognitive ones), however, the user representation should be explicit, consistent and as complete as possible. The project will start from existing annotated dialogue corpora such as bAbI++ and MultiWOZ. From these it will produce extended versions with more consistent annotation and with additional user-representation annotations generated automatically. We will explore unifying annotations across multiple datasets and evaluate the enhanced annotation with our own end-to-end dialogue models based on memory networks. The connection with T3.7 and T3.4 is straightforward, since task-oriented dialogue systems are the very definition of conversational, collaborative AI. T3.6 will be addressed through round-trip translation for data augmentation.
Output
- Extended and unified versions of publicly available dialogue corpora (bAbI++, MultiWOZ, etc.) with explicit user-modeling annotations
- A report and papers describing a unified user-modeling annotation scheme relative to existing dialogue annotation datasets, together with the results of baseline experiments on the annotated data produced by the project
- Presentations
Project Partners:
- Centre national de la recherche scientifique (CNRS), P. Paroubek
- Charles University Prague, O. Dušek
Primary Contact: Patrick Paroubek, LIMSI-CNRS
Main results of the micro-project:
- A corpus of 37,173 annotated dialogues with unified and enhanced annotations, built from existing open dialogue resources.
- Code and trained models (GPT-2, MarCo) for dialogue response generation on this corpus.
- One paper accepted at the TALN 2021 conference: Léon-Paul Schaub, Vojtech Hudecek, Daniel Stancl, Ondrej Dusek, Patrick Paroubek, "Defining and Detecting Inconsistent System Behavior in Task-oriented Dialogues", https://hal.archives-ouvertes.fr/TALN-RECITAL2021/hal-03265892
- One paper to be submitted to the "Dialogue and Discourse" journal.
- Ongoing collaboration between LISN (Paris-Saclay University) and the Faculty of Mathematics and Physics (Charles University, Prague).
Contribution to the objectives of HumaneAI-net WPs
By providing an open annotated dialogue resource with unified and enhanced annotations, DIASER offers the community linguistic material that can be used both for machine learning experiments and for testing properties of dialogue models related to dialogue history management, dialogue consistency checking, and user modeling.
The results of DIASER relate mainly to task T3.6 (Language Based and Multilingual Interaction), with potential links to T3.7 (Conversational, Collaborative AI), T3.4 (User Models and Interaction History), T3.2 (Human AI Interaction / Collaboration Paradigms), and T3.3 (Reflexivity and Adaptation in Human AI collaborations).
Tangible outputs
- Dataset: DIASER corpus (Ondrej Dusek), https://gitlab.com/ufal/dsg/diaser