[TMP-044] Evaluating segmentation in automatic captioning systems

Contact person: François Yvon (yvon@isir.upmc.fr)

Internal Partners:

Centre national de la recherche scientifique (CNRS), François Yvon
Fondazione Bruno Kessler (FBK), Marco Turchi

Owing to progress in the underlying NLP technologies (speech-to-text, text normalization and compression, machine translation), automatic captioning technologies (ACTs), both intra- and inter-lingual, are rapidly improving. ACTs are useful for many types of content and contexts, from talks and lectures to news, fiction and other entertainment. While historical systems are based on complex NLP pipelines, recent proposals are based on integrated (end-to-end) systems. This calls into question standard evaluation schemes, in which each module can be assessed independently of the others. We focus on evaluating the quality of the output segmentation, where decisions about the length, layout and display duration of each caption need to be made, all of which directly affect acceptability and readability. In particular, we will study ways to perform reference-free evaluation of automatic caption segmentation. We will also try to correlate these "technology-oriented" metrics with user-oriented evaluations in typical use cases: post-editing and direct broadcasting.

Results Summary

In this MP, we carried out three main tasks: 1) surveying existing segmentation metrics, 2) designing a contrastive evaluation set, and 3) implementing and comparing the metrics across multiple languages and tasks. We released the EvalSubtitle tool so that the community can use our results. EvalSubtitle is a tool for reference-based evaluation of subtitle segmentation. The repository contains the Subtitle Segmentation Score (Sigma), specifically tailored to evaluating segmentation in system outputs whose text is not identical to the reference (imperfect texts). EvalSubtitle also contains a collection of standard segmentation metrics (F1, WindowDiff, etc.) as well as subtitling evaluation metrics: BLEU on segmented (BLEU_br) and non-segmented (BLEU_nb) text, and TER_br. We disseminated and documented our results in a publication.
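
To make the standard segmentation metrics more concrete, here is a minimal Python sketch of boundary F1 and one common formulation of WindowDiff. It is an illustration only and not the EvalSubtitle implementation: the break markers `<eol>` and `<eob>`, the function names and the example sentences are assumptions, and the sketch requires the hypothesis and reference to contain exactly the same words. Handling imperfect texts is the purpose of Sigma, which is not reproduced here.

```python
from typing import List

# Assumed break markers (line break, block break), for illustration only.
BREAKS = {"<eol>", "<eob>"}


def boundaries(segmented: str) -> List[int]:
    """Return one flag per word: 1 if a break follows that word, else 0."""
    flags: List[int] = []
    for token in segmented.split():
        if token in BREAKS:
            if flags:  # ignore a break that precedes the first word
                flags[-1] = 1
        else:
            flags.append(0)
    return flags


def boundary_f1(ref: List[int], hyp: List[int]) -> float:
    """F1 over break positions; both sequences must cover the same words."""
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must have the same length")
    true_pos = sum(1 for r, h in zip(ref, hyp) if r == 1 and h == 1)
    if true_pos == 0:
        return 0.0
    precision = true_pos / sum(hyp)
    recall = true_pos / sum(ref)
    return 2 * precision * recall / (precision + recall)


def window_diff(ref: List[int], hyp: List[int], k: int) -> float:
    """Fraction of width-k windows whose break counts differ (lower is better)."""
    n = len(ref)
    if len(hyp) != n:
        raise ValueError("reference and hypothesis must have the same length")
    if not 0 < k < n:
        raise ValueError("window size k must satisfy 0 < k < number of words")
    errors = sum(
        1 for i in range(n - k) if sum(ref[i:i + k]) != sum(hyp[i:i + k])
    )
    return errors / (n - k)


# Hypothetical example: same words, one line break placed one word too early.
ref_flags = boundaries("the cat sat <eol> on the mat <eob>")
hyp_flags = boundaries("the cat <eol> sat on the mat <eob>")
print(boundary_f1(ref_flags, hyp_flags))       # 0.5
print(window_diff(ref_flags, hyp_flags, k=2))  # 0.5
```

In this toy case the misplaced break costs F1 both a false positive and a false negative, while WindowDiff only counts the windows in which the number of breaks differs. With larger windows WindowDiff therefore treats near misses more leniently.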

Tangible Outcomes

Karakanta, Alina, François Buet, Mauro Cettolo, and François Yvon. “Evaluating subtitle segmentation for end-to-end generation systems.” arXiv preprint arXiv:2205.09360 (2022). https://aclanthology.org/2022.lrec-1.328.pdf
EvalSubtitle: tool for reference-based evaluation of subtitle segmentation https://github.com/fyvo/EvalSubtitle
