We aim to evaluate the usefulness of current dialogue dataset annotation and to propose annotation unification and automated enhancements that improve user modeling by enabling training on larger amounts of data. Current dataset annotation is often geared only toward teaching the dialogue system how to answer, whereas the user representation should be explicit, consistent, and as complete as possible to support more complex (e.g. cognitive) user modeling. The project will start from existing annotated dialogue corpora such as bAbI++ and MultiWOZ and produce extended versions with improved annotation consistency and additional user representation annotations generated automatically. We will explore unifying annotations from multiple datasets and evaluate the enhanced annotation using our own end-to-end dialogue models based on memory networks. The connection with T3.7 and T3.4 is straightforward, since task-oriented dialogue systems are the very definition of conversational, collaborative AI. T3.6 will be addressed through round-trip translation for data augmentation.
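As an illustration of what unifying annotations from several corpora could look like, the following is a minimal sketch only. The record shapes for "corpus_a" and "corpus_b", the `UnifiedTurn` class, and both converter functions are hypothetical; they do not reproduce the actual bAbI++ or MultiWOZ formats or the annotation scheme the project will define.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedTurn:
    """One dialogue turn in a common, dataset-independent shape (hypothetical)."""
    dataset: str
    dialogue_id: str
    speaker: str                      # normalized to "user" or "system"
    text: str
    user_state: dict = field(default_factory=dict)  # explicit user annotation


def from_corpus_a(rec: dict) -> UnifiedTurn:
    # Assumed shape A: slots already stored as a dict, keys in mixed case.
    slots = {k.lower(): v for k, v in rec.get("slots", {}).items()}
    return UnifiedTurn("corpus_a", rec["id"], rec["speaker"], rec["text"], slots)


def from_corpus_b(rec: dict) -> UnifiedTurn:
    # Assumed shape B: abbreviated roles and a list of (name, value) pairs.
    role = {"usr": "user", "sys": "system"}[rec["role"]]
    slots = {name.lower(): value for name, value in rec.get("annotations", [])}
    return UnifiedTurn("corpus_b", rec["dialogue"], role, rec["utterance"], slots)


# Two invented records that express the same user preference differently.
turn_a = from_corpus_a({"id": "a-1", "speaker": "user",
                        "text": "A cheap hotel, please.",
                        "slots": {"Price": "cheap"}})
turn_b = from_corpus_b({"dialogue": "b-7", "role": "usr",
                        "utterance": "Something cheap.",
                        "annotations": [("PRICE", "cheap")]})
```

After conversion, both turns carry the same normalized speaker label and the same `user_state` keys, so a single model could in principle be trained on the merged data.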

Output

Extended and unified versions of publicly available dialogue corpora with explicit user modeling annotations (bAbI++, MultiWOZ, etc.)

A report and papers describing a unified user modeling annotation scheme with respect to existing dialogue annotation datasets, together with the results of baseline experiments using the annotated data produced by the project.

Project Partners:

  • LIMSI-CNRS, Patrick Paroubek
  • CUNI, O. Dušek

Primary Contact: Patrick Paroubek, LIMSI-CNRS