Contact person: Szymon Talaga (stalaga@uw.edu.pl)
Internal Partners:
- Univ. Warsaw, Szymon Talaga, stalaga@uw.edu.pl
- Institut Polytechnique de Grenoble, James Crowley, james.crowley@univ-grenoblealpes.fr
This Micro-Project laid the groundwork for a new approach to narrative analysis: a gray-box (at least partially explainable) NLP model tailored to support the work of qualitative text and narrative analysts. This goal fits the broader HumanE-AI objective of developing common-ground concepts that provide better representations shared by humans and machines alike. The project contributes to aligning machine analyses with the human perspective through the notion of narratives in two ways. First, narrative-oriented tools for automated text analysis can empower human analysts because, arguably, the narrative framework gives people without formal training in linguistics or computer science a more natural and meaningful context for reasoning about textual data. Second, developing software for narrative analysis is naturally intertwined with conceptual work on the core terms and building blocks of narratives, which can inform subsequent work on more advanced approaches.
We conducted a proof-of-concept study combining standard NLP methods (e.g. topic modeling, entity recognition) with qualitative analysis of narratives about smart cities and related technologies, and used this experience to conceptualize our approach to narrative analysis, in particular with respect to problems that are not easily solved with existing tools.
Results Summary
The aim of the project was to develop a Python software package providing tools that are easy to use and understand, including for researchers not trained in computer science or linguistics. The tools extract narrative information (active and passive actors, the actions they perform, and descriptions of both actors and actions, which together define events) and organize it in rich hierarchical data structures (the data model is implicitly a graph), from which different descriptive statistics can then be generated depending on the research question. Crucially, for this to be practical, a legible and efficient framework for querying the produced data is needed.
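To make the intended data model more concrete, the following is a minimal sketch of how events built from actors, actions and descriptions might be stored and summarized. The class names (`Phrase`, `Event`), the helper `actor_counts` and the example events are hypothetical and chosen only for illustration; they are not taken from the package.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Phrase:
    """An actor or an action with its descriptions (hypothetical structure)."""
    head: str
    descriptions: list[str] = field(default_factory=list)


@dataclass
class Event:
    """One narrative event: an action plus its active and passive actors."""
    action: Phrase
    active: list[Phrase] = field(default_factory=list)   # actors performing the action
    passive: list[Phrase] = field(default_factory=list)  # actors affected by the action


def actor_counts(events, role="active"):
    """Count how often each actor head appears in the given role."""
    return Counter(actor.head for event in events for actor in getattr(event, role))


# Hand-written example events (illustrative only, not real analysis output)
events = [
    Event(Phrase("collect", ["constantly"]), [Phrase("sensor", ["smart"])], [Phrase("data")]),
    Event(Phrase("analyze"), [Phrase("city")], [Phrase("data", ["traffic"])]),
    Event(Phrase("monitor"), [Phrase("sensor")], [Phrase("traffic")]),
]

active = actor_counts(events, "active")
passive = actor_counts(events, "passive")
print(active.most_common())
print(passive.most_common())
```

Simple counts like these are one example of the descriptive statistics mentioned above; which statistics are useful depends on the research question.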
Importantly, the software is developed as a gray-box model: core low-level NLP tasks, such as part-of-speech (POS) tagging and dependency parsing, are performed by a black-box statistical model, and their output is then transformed into higher-order grammatical and narrative data by a set of transparent deterministic rules. This ensures high explainability, which is crucial for systems in which the machine is meant to assist a human analyst rather than implicitly lead the analysis.
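The gray-box idea can be illustrated with a small sketch. The token list below is hand-written to stand in for the output of a black-box statistical parser, and the rules are deliberately simplistic. The names `Token`, `extract_events` and `DESCRIPTION_DEPS`, as well as the choice of dependency labels, are assumptions made for this example and do not reflect the package's actual rules or API.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One token as a statistical parser might emit it (hand-written here)."""
    i: int     # position in the sentence
    text: str
    pos: str   # part-of-speech tag
    dep: str   # dependency relation to the head
    head: int  # position of the head token


# Stand-in for black-box output for "The smart sensor collects traffic data."
TOKENS = [
    Token(0, "The", "DET", "det", 2),
    Token(1, "smart", "ADJ", "amod", 2),
    Token(2, "sensor", "NOUN", "nsubj", 3),
    Token(3, "collects", "VERB", "ROOT", 3),
    Token(4, "traffic", "NOUN", "compound", 5),
    Token(5, "data", "NOUN", "dobj", 3),
    Token(6, ".", "PUNCT", "punct", 3),
]

# Relations treated as descriptions of an actor or action (assumed rule set)
DESCRIPTION_DEPS = {"amod", "compound", "advmod"}


def children(tokens, head, deps):
    """Tokens attached to `head` by one of the relations in `deps`."""
    return [t for t in tokens if t.head == head.i and t.i != head.i and t.dep in deps]


def describe(tokens, head):
    """Pair a head word with the words that describe it."""
    words = [t.text for t in children(tokens, head, DESCRIPTION_DEPS)]
    return {"head": head.text, "descriptions": words}


def extract_events(tokens):
    """Transparent rule: each verb is an action, its subjects are active
    actors, and its direct objects or passive subjects are passive actors."""
    events = []
    for verb in (t for t in tokens if t.pos == "VERB"):
        events.append({
            "action": describe(tokens, verb),
            "active": [describe(tokens, t) for t in children(tokens, verb, {"nsubj"})],
            "passive": [describe(tokens, t) for t in children(tokens, verb, {"dobj", "nsubjpass"})],
        })
    return events


events = extract_events(TOKENS)
print(events)
```

Because every step after the parser is an explicit rule, an analyst can trace why a given word was labeled as an actor, an action or a description, which is the kind of explainability the gray-box design aims for.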
Currently, the core modules of the package responsible for grammatical analysis are mostly ready, although several improvements are still planned. This includes a coreference resolution module. The core part of the semantic module, which translates grammatical information into more semantic constructs focused on actors, actions and descriptions, is also ready. Still missing are an interface exposing methods that let end users easily access and analyze the rich data produced by the package, and a principled and convenient query framework on which that interface should be based. This is the main focus of ongoing and future work.
The second missing part is the documentation, which is best finished once the interface is ready. Thus, even though the package in its current state may seem a little rough from an end user's perspective, its quality and usefulness will increase steadily as new updates are delivered.
Tangible Outcomes
- Python package providing a gray-box NLP model to assist qualitative analysts: https://github.com/sztal/segram