Knowledge Extraction Through Prompting on Pre-trained Language Models

The ambition of the micro-project is to investigate the adoption of a multi-turn dialog strategy and the insertion in prompts of appropriate conceptual knowledge (e.g., definitions of the concepts to extract) or different types of examples (including negative examples) especially for the extraction of tasks and temporal flow relations between tasks.

Procedural documents are a source of temporal procedural knowledge of uttermost importance. These documents are different in format and scope, as they range from the description of administrative procedures to service manuals to medical guidelines and surgical procedures. The extraction of this complex and multidimensional knowledge, which includes a strong temporal dimension usually paired with further static dimensions concerning, for example, resources, tools, objects, costs, and so on, would be of the utmost importance for several tasks ranging from information extraction to validation and verification of the procedures themselves, up to the construction of AI-based systems that have to deal with these procedures (think for instance to an expert surgical system and assistant which may be involved in several different surgery procedures).
Knowledge graphs are a natural and expressive knowledge structure where to represent such multidimensional knowledge, and indeed the insertion of temporal knowledge within knowledge graphs is one of the hot challenges in this area. Nonetheless, the automated construction of knowledge graphs from procedural documents is a challenging research area. Here, the lack of annotated data, as well as raw text repositories describing real-world procedural documents, makes it extremely difficult to adopt deep learning approaches.

Pre-trained language models showed promising results concerning the knowledge extraction tasks from the models themselves. Although several works explored this strategy to build knowledge graphs, the viability of knowledge base construction by using a prompt-based learning strategy from such language models has not yet been investigated deeply. In this MP we would like to investigate the usage of prompt-based in-context learning strategy to extract, from natural language process descriptions, conceptual information that can be converted into their equivalent knowledge graphs. In particular, we would like to investigate the adoption of a multi-turn dialog strategy and the insertion of prompts of appropriate conceptual knowledge (e.g., definitions of the concepts to extract) or different types of examples (including negative examples), especially for the extraction of tasks and temporal flow relations between tasks. As such the work can contribute to the construction of structured narratives using machine learning models and hopefully enrich with conceptual knowledge in input. Moreover, the adoption of a multi-turn dialog strategy could provide insight into how these models can be used to complement the multi-turn dialog strategy usually adopted by domain experts in traditional knowledge modeling pipelines.

Output

– At least one paper published in a ranked conference or journal.
– An open dataset containing both raw text and their equivalent structured representation providing a ground for future research activities.

Project Partners

Fondazione Bruno Kessler, Mauro Dragoni
University of Verona, Marco Rospocher

Primary Contact

Mauro Dragoni, Fondazione Bruno Kessler

Knowledge Extraction Through Prompting on Pre-trained Language Models

Knowledge 4 All Foundation Ltd.

Humane AI on Social Media