Owing to progress in the underlying NLP technologies (speech-to-text, text normalization and compression, machine translation), automatic captioning technologies (ACTs), both intra- and inter-lingual, are rapidly improving. ACTs are useful for many types of content and contexts: from talks and lectures to news, fiction and other entertainment content.

While historical systems are based on complex NLP pipelines, recent proposals rely on integrated (end-to-end) systems. This calls into question standard evaluation schemes, in which each module can be assessed independently of the others.

We focus on evaluating the quality of the output segmentation, where decisions regarding the length, layout and display duration of each caption need to be made, all of which have a direct impact on acceptability and readability. In particular, we will study ways to perform reference-free evaluations of automatic caption segmentation. We will also try to correlate these "technology-oriented" metrics with user-oriented evaluations in typical use cases: post-editing and direct broadcasting.


  • Survey of existing segmentation metrics
  • Design of a contrastive evaluation set
  • Comparison of metrics on multiple languages / tasks

Project Partners:

  • Centre national de la recherche scientifique (CNRS), François Yvon
  • Fondazione Bruno Kessler (FBK), Marco Turchi

Primary Contact: François Yvon, CNRS