Contact person: Paolo Ferragina (paolo.ferragina@unipi.it)
Internal Partners:
- UNIPI, Paolo Ferragina, paolo.ferragina@unipi.it
- ISTI-CNR, Giulio Rossetti, giulio.rossetti@isti.cnr.it
External Partners:
- Scuola Normale Superiore
The micro-project aims to design a recommender system that fosters pluralistic viewpoints in news suggestions. The first step is to quantify the political bias of a news article. While this problem has been widely investigated for US media, to the best of our knowledge no comparable work exists for European media. To this end, we have already built a dataset of more than 8 million European news articles labeled by political leaning, popularity, and distribution area. Since no publicly available dataset of this size and annotation richness exists for the EU media landscape, we believe it has considerable potential value for subsequent academic studies. Additionally, we are currently leveraging AI-based NLP techniques to define a topic modeling algorithm and a multilingual classifier able to identify the main topics and the political leaning of each article.
Results Summary
The tangible objective of this micro-project was to develop two datasets of European news labeled by political leaning. These were needed for the next step of the project: building a bias-minimizing recommender system for European news.
The first dataset comprises millions of European news articles, enriched with metadata from Eurotopics.net. Each entry contains the main text, title, publication date, language, and news source, together with news source metadata: the political leaning of the source and its country.
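As a rough illustration of the entry structure described above, one record could be modeled as follows. This is only a sketch: the class names, field names, and example values are hypothetical placeholders and are not taken from the released files, whose actual schema may differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceMetadata:
    """Metadata attached to a news source (hypothetical field names)."""
    name: str
    country: str
    political_leaning: str


@dataclass(frozen=True)
class NewsArticle:
    """One dataset entry (hypothetical field names)."""
    title: str
    main_text: str
    publication_date: date
    language: str
    source: SourceMetadata


# Placeholder values, invented purely for illustration.
example = NewsArticle(
    title="Example headline",
    main_text="Example body text.",
    publication_date=date(2021, 1, 1),
    language="it",
    source=SourceMetadata(
        name="Example Daily",
        country="Italy",
        political_leaning="centre",
    ),
)
```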
We then built an article bias classifier to predict the political label of individual articles, using the labels obtained through distant supervision. Applying explainable AI techniques to the classifier, we concluded that it was effectively predicting the news source rather than the political leaning.
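A minimal sketch of distant supervision in this setting, assuming that each article simply inherits the political leaning of its source. The function name, dictionary keys, and sample values are hypothetical and do not come from the project repositories.

```python
def label_by_source(articles, source_leaning):
    """Give each article the leaning of its source (distant supervision).

    Articles whose source has no known leaning are dropped.
    """
    labeled = []
    for article in articles:
        leaning = source_leaning.get(article["source"])
        if leaning is not None:
            # Copy the article and attach the inherited label.
            labeled.append({**article, "label": leaning})
    return labeled


# Placeholder data, invented for illustration.
source_leaning = {"Example Daily": "left", "Sample Gazette": "right"}
articles = [
    {"title": "A", "source": "Example Daily"},
    {"title": "B", "source": "Sample Gazette"},
    {"title": "C", "source": "Unknown Post"},
]
labeled = label_by_source(articles, source_leaning)
```

Under this assumption, every article from the same source carries the same label, which is consistent with the finding above: a classifier can score well by recognizing the source rather than the leaning.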
To overcome this issue, we built a second dataset, which has the same features as the first one described above, with the addition of topics chosen from among 7 macro-topics.
The immediate plan is to perform political bias classification on the new dataset after filtering out all articles that carry no political bias, such as those dealing with sports or gossip.
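The planned filtering step could look roughly like the sketch below. It assumes each article carries a single topic label under a hypothetical "topic" key. The labels "sports" and "gossip" follow the examples in the text, while "politics" and the article titles are invented for illustration. The actual names of the 7 macro-topics are not assumed here.

```python
# Topics assumed to carry no political bias (illustrative set).
NON_POLITICAL_TOPICS = frozenset({"sports", "gossip"})


def keep_political(articles, excluded=NON_POLITICAL_TOPICS):
    """Return only the articles whose topic is not in the excluded set."""
    return [a for a in articles if a["topic"] not in excluded]


# Placeholder data, invented for illustration.
articles = [
    {"title": "Election results", "topic": "politics"},
    {"title": "Cup final", "topic": "sports"},
    {"title": "Celebrity wedding", "topic": "gossip"},
]
political = keep_political(articles)
```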
Tangible Outcomes
- Dataset without topics: https://drive.google.com/file/d/1Qq2khT7lM-5_oHSNJhbK_-EATNdOSY-n/view?usp=sharing
- Dataset with topics: https://drive.google.com/file/d/1KGy-FcLulACK_Fa3Abd9Xr4qaurBnu2S/view?usp=sharing
- Repo with the code used to build and study the datasets: https://github.com/LorenzoBellomo/EU-NewsDataset
- Repo with the political bias classifier: https://github.com/LorenzoBellomo/BiasClassification
- Report summarizing the detailed results: https://sobigdata.d4science.org/catalogue-sobigdata?path=/dataset/pluralistic_recommendation_in_news_-_report