Determining the trustworthiness of public information on ethical, legal, and societal issues caused by the migration of Ukrainians to European countries through AI-driven analysis of social media content

Currently, almost all government, commercial, and non-profit organizations actively use social media for the dissemination of information and communication with citizens. Social media should serve to enhance citizen engagement and trust, contribute to the improvement of government institutions' transparency, and guarantee freedom of speech and expression. However, the government needs to be aware of and mitigate the risks associated with the use of social media. One of the most significant is the risk of spreading misinformation that has become increasingly prevalent and easily accessible to the global social media audience.
In general, government and public officials' social media accounts are trustworthy and aim to disseminate high-quality, timely information to the public while ensuring its reliability, integrity, accessibility, and validity. However, non-compliance with the rules of effective and trusted two-way communication on public officials' accounts, untimely updating of the communication channel, and incomplete responses to user comments could lead citizens to search for (or check) the information in other social media sources. Such sources include traditional news outlets, professional or casual journalists, and ordinary users. In these sources, the risk of misinformation being disseminated could undermine citizens' trust in the government and threaten the security and privacy of both official government data and personal data. Moreover, the sharing of inaccurate and misleading information could lead to significant social consequences, such as the exacerbation of social inequalities, the creation of divisions between social groups, and the manipulation of their opinions.
In our microproject, we strive to develop an actionable AI-based approach to objectively assess information trustworthiness on social media, based on a combination of AI techniques such as unsupervised machine learning, text analytics, and event argument extraction. We plan to apply our approach to the analysis of textual information in Polish, Ukrainian, and English that is thematically related to the main ethical, legal, and societal (ELS) issues caused by the migration of Ukrainians to European countries as a result of the ongoing Russian invasion. We chose the migration crisis domain for assessing the reliability of social media information for the following reasons. First, as recent research demonstrates, migration issues are among the most hotly debated on social media and are especially exposed to various kinds of disinformation attempts. Second, the migration problem includes many associated issues: ethical issues (e.g., vulnerability to exploitation by employers, low wages, work in unsafe conditions, discrimination, and marginalization); legal issues (e.g., immigration laws and policies, visa regulations and border controls, limited access to justice); social security issues (e.g., social protection, mental support, integration, language barriers and cultural differences, social isolation); and the problem of communicating to the public the policies adopted to support Ukrainian migrants in their countries of destination and to address the root causes of migration in Ukraine itself. Both official and unofficial social media information covering all these issues will be considered in the microproject.
Therefore, the development of an actionable AI-based approach for determining information trustworthiness (i) will expand understanding of citizens' core information needs in communication with the government on migration issues over the last year, as well as the potential causes and nature of information untrustworthiness on social media; and (ii) can support the government in developing relevant guidelines to oversee social media and instruments to assess, analyze, and monitor implementation of and compliance with ELS principles on social media.


The main tangible results of the microproject for each task of its implementation are as follows:
Task 1. Corpus building.
To build the Corpus, we plan to rely on the following assumption: information is annotated as Trustworthy (TIC) if it is published on a government digital platform, on news channels operated by governments, or on the social media accounts (e.g., Twitter) of government officials. The trustworthiness of all other government-related information disseminated on social media is Uncertain (UIC) and needs additional verification. A dedicated list of keywords relevant to the ELS issues arising from the Ukrainian migration will be developed.
Expected tangible result:
(1) A dataset of social media content related to Ukrainian migration, annotated with trustworthiness labels (TIC or UIC).
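
The labelling assumption of Task 1 can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the project's implementation: the `Post` fields, the source-type names, and the sample keywords are hypothetical placeholders, since the actual keyword list is still to be developed.

```python
from dataclasses import dataclass

# Hypothetical source categories treated as trustworthy under the Task 1 assumption.
TRUSTED_SOURCE_TYPES = {"government_platform", "government_news_channel", "official_account"}

# Placeholder keywords only; the project plans to develop its own ELS keyword list.
ELS_KEYWORDS = {"visa", "asylum", "discrimination", "social protection"}


@dataclass
class Post:
    text: str
    source_type: str  # e.g. "official_account", "journalist", "ordinary_user"


def is_relevant(post: Post) -> bool:
    """Keep a post only if it mentions at least one ELS keyword."""
    lowered = post.text.lower()
    return any(keyword in lowered for keyword in ELS_KEYWORDS)


def label(post: Post) -> str:
    """Return "TIC" for trusted source types, otherwise "UIC"."""
    return "TIC" if post.source_type in TRUSTED_SOURCE_TYPES else "UIC"


def build_corpus(posts):
    """Filter posts by keyword and attach a TIC/UIC label to each kept post."""
    return [(post, label(post)) for post in posts if is_relevant(post)]
```

Note that under this rule a UIC label does not mean "false"; it only marks content whose trustworthiness still has to be verified in the later tasks.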

Task 2. Event argument extraction and text classification.
To automatically detect whether a given text is trustworthy, we use supervised ML classification with extended semantic features (participants, arguments, and arguments' roles). To extract these semantic features, we will use an NLP pipeline and an Open-domain Information Extraction approach enriched with semantic knowledge.
Expected tangible results:
(2) An AI-driven algorithm that can analyze social media content to determine its trustworthiness.
(3) Open linguistic resources, such as specialized dictionaries and event patterns, in particular for the low-resource Polish and Ukrainian languages and for the domain of ELS migration-related issues, which can be used in follow-up studies.
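
To make the idea of classifying on semantic features more concrete, the sketch below turns an extracted event into "role=value" features and trains a very small classifier on them. Everything here is an assumption for illustration: the role names (`predicate`, `agent`, `theme`), the toy events, and the choice of a multinomial Naive Bayes model are not taken from the project, which does not name its classifier.

```python
import math
from collections import Counter, defaultdict


def event_features(event):
    """Turn one extracted event (role -> value) into "role=value" strings."""
    return [f"{role}={value.lower()}" for role, value in event.items()]


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, feature_lists, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for features, label in zip(feature_lists, labels):
            self.feature_counts[label].update(features)
            self.vocabulary.update(features)
        return self

    def predict(self, features):
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            # Smoothed denominator: feature tokens seen for this label plus vocabulary size.
            denom = sum(self.feature_counts[label].values()) + len(self.vocabulary)
            score = math.log(count / total)
            for feature in features:
                score += math.log((self.feature_counts[label][feature] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy events; real features would come from the extraction pipeline.
train_events = [
    {"predicate": "announce", "agent": "ministry", "theme": "housing support"},
    {"predicate": "extend", "agent": "government", "theme": "temporary protection"},
    {"predicate": "cancel", "agent": "unnamed source", "theme": "all benefits"},
    {"predicate": "close", "agent": "unnamed source", "theme": "all borders"},
]
train_labels = ["trustworthy", "trustworthy", "untrustworthy", "untrustworthy"]

model = NaiveBayes().fit([event_features(e) for e in train_events], train_labels)
```

A new event is classified with `model.predict(event_features(event))`. In practice the same feature representation could be passed to any standard supervised learner.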

Task 3. Detection of public information trustworthiness.
We attempt to identify the degree and nature of the untrustworthiness of core topics in the context of the Ukrainian migration. The method for detecting the trustworthiness of public information is based on the Enriched Event semantic annotation approach and on classification rules for two classes: the Disinformation class, when the facts presented in posts/comments are distorted; and the Omission class, when critical information is excluded from posts/comments disseminated on social media.
Expected tangible results:
(4) A dataset of social media content related to Ukrainian migration, annotated with trustworthiness scores.
(5) An AI-driven algorithm that identifies the degree and nature of the untrustworthiness of social media information.
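
One possible reading of the two rule classes in Task 3 is sketched below. It assumes, purely for illustration, that each post and each trusted reference text has already been reduced to an event frame (a mapping from argument role to value) and that a list of critical roles is available; the function name, the role names, and the extra "Consistent" outcome are hypothetical and not part of the project description.

```python
def classify_event(post_event, reference_event, critical_roles):
    """Compare a post's event frame with a trusted reference frame.

    Returns "Disinformation" if a role present in both frames has a
    conflicting value, "Omission" if a critical role of the reference is
    absent from the post, and "Consistent" otherwise.
    """
    # Rule 1: a shared role whose value differs counts as a distorted fact.
    for role, value in post_event.items():
        if role in reference_event and reference_event[role] != value:
            return "Disinformation"
    # Rule 2: a critical role stated in the reference but missing from the post.
    missing = [
        role for role in critical_roles
        if role in reference_event and role not in post_event
    ]
    if missing:
        return "Omission"
    return "Consistent"


# Invented example frames, for illustration only.
reference = {
    "agent": "government",
    "action": "extend",
    "object": "temporary protection",
    "condition": "registered residents only",
}
distorted = {"agent": "government", "action": "cancel", "object": "temporary protection"}
incomplete = {"agent": "government", "action": "extend", "object": "temporary protection"}
```

Exact string comparison is the simplest possible matching rule; a real system would need semantic matching of argument values across Polish, Ukrainian, and English.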

Task 4. Exploratory Text Analysis.
At this stage, we aim to identify the main topics discussed on social media in the context of the Ukrainian migration and their sentiment (separately for the Trustworthy and Untrustworthy parts of the Corpus). Based on the degree of discussion activity of each topic (topic proportion) and the topic's negative effect (sentiment score), we intend to provide a ranking of the topics' importance.
Expected tangible results:
(6) A ranking of the importance of core topics discussed on social media across the three categories of ELS issues regarding the Ukrainian migration. These insights can be used by the government for policymaking toward Ukrainian migrants’ inclusion.
(7) An analytical framework for analysis of the degree and nature of trustworthiness of social media content, which can be used in future studies or applications related to assessment, analysis, and monitoring of the implementation of and compliance with ELS principles in the social media space.
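
The topic ranking described in Task 4 combines topic proportion and sentiment score, but the text does not state how. The sketch below assumes one simple option: importance is the topic proportion multiplied by the magnitude of negative sentiment, with sentiment scaled to [-1, 1]. The formula, the function name, and the example values are illustrative assumptions only.

```python
def rank_topics(topics):
    """Rank topics by proportion weighted by negative sentiment.

    `topics` maps a topic name to (proportion, sentiment), where proportion is
    the share of the corpus assigned to the topic and sentiment lies in [-1, 1].
    Returns (name, score) pairs sorted from most to least important.
    """
    scored = []
    for name, (proportion, sentiment) in topics.items():
        negativity = max(0.0, -sentiment)  # only negative sentiment contributes
        scored.append((name, proportion * negativity))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented example values, not project results.
example_topics = {
    "housing": (0.5, -0.2),
    "employment": (0.3, -0.8),
    "language courses": (0.2, 0.4),
}
ranking = rank_topics(example_topics)
```

With these invented values, a smaller but strongly negative topic outranks a larger, mildly negative one. Running the same function separately on the Trustworthy and Untrustworthy parts of the Corpus would yield two rankings that can be compared.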

Project Partners

  • Umea University, Nina Khairova
  • Gdansk University of Technology, Nina Rizun
  • University of the Aegean, Charalampos Alexopoulos

Primary Contact

Nina Khairova, Umea University