Contact person: Szymon Talaga (stalaga@uw.edu.pl)

Internal Partners:

  1. Univ. Warsaw, Szymon Talaga, stalaga@uw.edu.pl

 

This project builds upon an earlier, completed microproject and continues the development of the Segram package for Python. The purpose of the package is to provide tools for automated narrative analysis of text data, focused on extracting information on the basic building blocks of narratives: agents (both active and passive), actions, events, and relations between agents and actions (e.g. determining subjects and objects of actions), as well as descriptions of actors, actions and events. The development process is also naturally paired with conceptual work on representations of narratives.

The package is designed as a graybox model. It is based on an opaque statistical language model providing linguistic annotations, which are subsequently used by transparent deterministic algorithms for discovering narrative elements. Thus, the final output should be easy for human users to interpret and validate whenever necessary. Moreover, by lifting the analysis from the purely linguistic level to the arguably more intuitive level of narratives, it is hoped that the provided tools will be significantly easier to use and understand for end users, including those without training in linguistics and/or computer science.
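The graybox idea can be illustrated with a minimal sketch. It is not Segram's actual implementation or API: the `Token` class, the `extract_triples` function and the dependency labels (`nsubj`, `dobj`) are assumptions chosen for illustration, and the annotations are written by hand where a statistical parser would normally supply them. The point is only that, once annotations exist, the narrative layer can be a small deterministic rule that a human can read and check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated token (hypothetical structure, not Segram's)."""
    index: int
    text: str
    dep: str   # dependency label, e.g. "nsubj", "dobj", "ROOT"
    head: int  # index of the syntactic head token


def extract_triples(tokens):
    """Transparent rule: pair each head with its subjects and direct objects."""
    triples = []
    for verb in tokens:
        subjects = [t.text for t in tokens
                    if t.head == verb.index and t.dep == "nsubj"]
        objects = [t.text for t in tokens
                   if t.head == verb.index and t.dep == "dobj"]
        for subj in subjects:
            # An action without a direct object yields None as its object.
            for obj in objects or [None]:
                triples.append((subj, verb.text, obj))
    return triples


# Hand-written annotations standing in for the opaque model's output
# for the sentence "Mary wrote a letter".
sentence = [
    Token(0, "Mary", "nsubj", 1),
    Token(1, "wrote", "ROOT", 1),
    Token(2, "a", "det", 3),
    Token(3, "letter", "dobj", 1),
]
triples = extract_triples(sentence)
print(triples)
```

Because the rule is explicit, any extracted triple can be traced back to the annotations that produced it, which is what makes the output straightforward to validate.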

The proposed framework is aimed at language understanding and information extraction, as opposed to language generation. Namely, the role of the package is to organize narrative information in convenient data structures that allow effective querying and the derivation of various statistical descriptions. Crucially, thanks to its semi-transparent nature, the produced output should be easy for human users to validate. This should facilitate the development of shared representations of narratives (corresponding to the goal motivated by WP1 and WP2: "Establishing Common Ground for Collaboration with AI Systems") that are understandable for both humans and machines and at the same time trustworthy, because they are easy for humans to validate. This is arguably a desirable feature, for instance in comparison with increasingly powerful but hard-to-trust large language models. In particular, the package should be useful for facilitating and informing human-driven analyses of text data.
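As a rough illustration of what querying and statistical description over narrative data could look like, the sketch below stores events as simple records and summarizes them with standard-library tools. The `Event` record, its field names and the `actions_of` helper are hypothetical and are not taken from Segram; the example data is invented for demonstration only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """A narrative event: who did what, optionally to whom or what."""
    agent: str                     # active agent
    action: str
    patient: Optional[str] = None  # passive agent or object, if any


# Invented example data.
events = [
    Event("Mary", "write", "letter"),
    Event("Mary", "send", "letter"),
    Event("John", "read", "letter"),
    Event("John", "sleep"),
]


def actions_of(agent, events):
    """Query: list the actions performed by a given agent, in order."""
    return [e.action for e in events if e.agent == agent]


# Simple statistical descriptions: how often each entity acts or is acted upon.
agent_counts = Counter(e.agent for e in events)
patient_counts = Counter(e.patient for e in events if e.patient is not None)

print(actions_of("Mary", events))
print(agent_counts.most_common())
print(patient_counts.most_common())
```

Keeping the records flat and explicit means every count can be checked against the underlying events by hand, in line with the validation goal described above.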

An alpha version of the package, implementing core functionalities related to grammatical and narrative analysis, is ready. The goal of the present microproject was to improve the package and release a beta version. This includes implementing an easy-to-use interface for end users (operating at the level of narrative concepts) that allows effective querying and analysis of the data produced by Segram, as well as developing comprehensive documentation. Thus, the planned release should be ready for broader adoption across a wide array of use cases and by users with different levels of linguistic or computational expertise.

Results Summary

The project delivered a Python software package for narrative analysis, as per the project description. The package is distributed through the Python Package Index (PyPI) under a permissive open-source license (MIT) and is therefore easily accessible and free to use. Moreover, it comes with a detailed documentation page facilitating adoption by third parties. It is worth noting that the advent of latest-generation large language models (LLMs) has partially limited the relevance of the project results.

Tangible Outcomes

  1. Package page at the Python Package Index: https://pypi.org/project/segram/
  2. Tutorial page documenting how to use the package: https://segram.readthedocs.io/en/latest/