Developing user-friendly software for narrative analysis of text data.
In this project we continue the development of the Segram package for Python. The package provides tools for automated narrative analysis of text data. It focuses on extracting the basic building blocks of narratives: agents (both active and passive), actions, events, and relations between agents and actions (e.g., determining the subjects and objects of actions), as well as descriptions of actors, actions, and events. The development process is naturally paired with conceptual work on the representation of narratives.
The package is designed as a graybox model. It relies on an opaque statistical language model to provide linguistic annotations, which are then processed by transparent, deterministic algorithms that discover narrative elements. Thus, the final output should be easy for human users to interpret and validate whenever necessary. Moreover, by lifting the analysis from the purely linguistic level to the arguably more intuitive level of narratives, we hope that the tools will be significantly easier to use and understand for end users, including those without training in linguistics or computer science.
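To make the graybox idea concrete, the following is a minimal sketch of the transparent, deterministic half of such a pipeline. It assumes the opaque language model has already produced part-of-speech tags and a dependency parse, with labels in the Universal Dependencies style (e.g. `nsubj`, `obj`). The `Token` and `Action` classes and the `extract_actions` function are hypothetical illustrations, not Segram's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token annotation, as produced by an opaque statistical parser."""
    i: int      # position of the token in the sentence
    text: str
    pos: str    # part-of-speech tag (Universal POS style, assumed)
    dep: str    # dependency label (Universal Dependencies style, assumed)
    head: int   # index of the syntactic head; the root points to itself


@dataclass
class Action:
    """Hypothetical narrative element: an action with its agents and patients."""
    verb: str
    subjects: list[str]
    objects: list[str]


def extract_actions(tokens: list[Token]) -> list[Action]:
    """Deterministic rule: every verb is an action; its `nsubj` children
    are treated as subjects and its `obj` children as objects."""
    actions = []
    for tok in tokens:
        if tok.pos != "VERB":
            continue
        # Direct syntactic children of the verb (excluding a self-pointing root).
        children = [t for t in tokens if t.head == tok.i and t.i != tok.i]
        subjects = [t.text for t in children if t.dep == "nsubj"]
        objects = [t.text for t in children if t.dep == "obj"]
        actions.append(Action(tok.text, subjects, objects))
    return actions


# Hand-written annotations for "The dog chased the cat." (illustrative only).
sentence = [
    Token(0, "The", "DET", "det", 1),
    Token(1, "dog", "NOUN", "nsubj", 2),
    Token(2, "chased", "VERB", "ROOT", 2),
    Token(3, "the", "DET", "det", 4),
    Token(4, "cat", "NOUN", "obj", 2),
    Token(5, ".", "PUNCT", "punct", 2),
]
actions = extract_actions(sentence)
print(actions)
```

Because the extraction rule is a few lines of explicit logic, a user can check exactly why a given word was labelled as the subject of an action, which is the kind of human validation the graybox design is meant to support.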
The proposed framework is aimed at language understanding and information extraction, as opposed to language generation. Namely, the role of the package is to organize narrative information in convenient data structures that allow effective querying and the derivation of various statistical descriptions. Crucially, thanks to its semi-transparent nature, the produced output should be easy for human users to validate. This should facilitate the development of shared representations of narratives that are understandable for both humans and machines and, at the same time, trustworthy because they are easy for humans to validate (corresponding to the goal motivated by WP1 and WP2: “Establishing Common Ground for Collaboration with AI Systems”). Trustworthiness is arguably a desirable feature, for instance in comparison to increasingly powerful but hard-to-trust large language models. In particular, the package should be useful for facilitating and informing human-driven analyses of text data.
An alpha version of the package, implementing the core functionalities related to grammatical and narrative analysis, is ready. The goal of the present microproject is to improve the package and release a beta version. This will include implementing an easy-to-use interface for end users, operating at the level of narrative concepts, that allows effective querying and analysis of the data produced by Segram, as well as developing comprehensive documentation. Thus, the planned release should be ready for broader adoption across a wide array of use cases and by users with different levels of linguistic and computational expertise.
1. Segram package for Python officially published on the Python Package Index (PyPI, https://pypi.org/). It may also be published on conda-forge, but this is not yet guaranteed at this stage.
2. Comprehensive package documentation available online on the Read the Docs platform (https://readthedocs.org/).
- University of Warsaw, Andrzej Nowak
Szymon Talaga, University of Warsaw