Contact person: Andrea Galassi (a.galassi@unibo.it)
Internal Partners:
- University of Bologna, Andrea Galassi, a.galassi@unibo.it
External Partners:
- Uppsala University, Ana Tanevska, ana.tanevska@it.uu.se
As AI-powered devices, software solutions, and other products become prevalent in everyday life, there is an urgent need to prevent the creation or perpetuation of stereotypes and biases around gender, age, race, as well as other social characteristics at risk of discrimination.
There are well-documented limitations in our practices for collecting, maintaining, and distributing the datasets used in current ML models. Moreover, these AI/ML systems, their underlying datasets, and the stakeholders involved in their creation often do not reflect the diversity of human societies, further exacerbating structural and systemic biases. It is therefore critical for the AI community to address this lack of diversity, acknowledge its impact on technology development, and seek solutions that ensure diversity and inclusion.
Audio is a natural way of communicating for humans and allows the expression of a wide range of information. Its analysis through AI applications can provide insights regarding the emotions and inner state of the speaker, information that cannot be captured by simply analyzing text. The analysis of the speech component is valuable in any AI application designed for tasks requiring an understanding of human users behind their explicit textual expressions, such as the research area of affective computing.
Affective computing refers to the study and development of systems and devices that can recognize, interpret, and simulate human emotions and related affective phenomena. Most of the currently available speech datasets face significant limitations, such as a lack of diversity in the speaker population, which can affect the accuracy and inclusivity of speech recognition systems for speakers with different accents, dialects, or speech patterns.
Other limitations include the narrow context and small scale of recordings, data quality issues, limited representation, and limited availability of data. These issues must be carefully addressed when selecting and using speech datasets in an affective computing context, to ensure that speech recognition systems can effectively contribute to applications such as intelligent virtual assistants, mental health diagnosis, and emotion recognition in diverse populations.
In this MP, we aim to contribute towards the creation of future datasets and to facilitate a more aware use of existing ones. We propose to perform an extensive review of the literature on the topic, in particular existing speech datasets, with two main objectives.
First, we identify the key characteristics required in the creation of unbiased and inclusive speech datasets and how such characteristics should be pursued and validated.
Second, we perform a meta-analysis of the domain, focusing on the underlying limitations of existing datasets. We provide a critical evaluation not only of the datasets themselves, but also of the scientific articles in which they were presented. Such a fine-grained analysis will allow us to derive a more general, coarse-grained evaluation of the domain.
Results Summary
In this micro-project, we addressed the domain of speech datasets for mental health and neurological disorders. We created a set of 7 desiderata for building these datasets, distilled it into a checklist of 20 elements that can be used as a tool for analysis of existing works and as guidance for future works, and finally surveyed existing literature to analyze and discuss current practices. Our set of desiderata is the first to specifically address this domain and considers both aspects that are relevant in terms of ethics and societal impact, such as “Fairness, Bias, and Diversity”, but also aspects that are more technical and domain-specific, such as the details of the recording and the involvement of medical experts in the study.
In our survey of existing literature, we identified key areas for improvement in resource creation and use. For example, several of the examined papers do not report on informed consent and accountability. Our findings highlighted the importance of involving experts from several different disciplines (e.g., computer science, medicine, social science, and law) when conducting studies in such a critical domain. These results also confirm the importance of the dissemination of principles and best practices across different disciplines.
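The checklist-based analysis described above can be pictured as a simple scoring procedure over surveyed papers. The sketch below is purely illustrative: the item names and paper data are hypothetical placeholders, not the actual 20-element checklist or the surveyed works.

```python
# Illustrative sketch of checklist-based paper analysis.
# Item names and paper entries are hypothetical, not the real checklist.

CHECKLIST = [
    "informed_consent_reported",
    "medical_experts_involved",
    "speaker_demographics_described",
    "recording_conditions_detailed",
]

# Which checklist items each (hypothetical) paper explicitly reports.
papers = {
    "paper_A": {"informed_consent_reported", "speaker_demographics_described"},
    "paper_B": {"medical_experts_involved", "recording_conditions_detailed",
                "speaker_demographics_described"},
}

def coverage(reported):
    """Fraction of checklist items a paper explicitly reports."""
    return len(reported & set(CHECKLIST)) / len(CHECKLIST)

for name, reported in papers.items():
    print(f"{name}: {coverage(reported):.0%} of checklist items reported")

# Count how often each item goes unreported, to spot domain-wide gaps
# (e.g., missing informed-consent statements).
missing = {item: sum(item not in r for r in papers.values())
           for item in CHECKLIST}
```

Aggregating the `missing` counts across all surveyed works is what surfaces systematic gaps, such as the under-reporting of informed consent noted above.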
Tangible Outcomes
- [arXiv] [under review] A pre-print currently under review: Mancini, E., Tanevska, A., Galassi, A., Galatolo, A., Ruggeri, F., & Torroni, P. (2024). Promoting Fairness and Diversity in Speech Datasets for Mental Health and Neurological Disorders Research. arXiv preprint arXiv:2406.04116 (under review at JAIR, the Journal of Artificial Intelligence Research). https://arxiv.org/abs/2406.04116
- A GitHub repository with a detailed analysis of 36 existing datasets and papers according to our desiderata and checklist: https://github.com/nlp-unibo/ethical-survey-speech
- Invited talk “Towards an Ethical and Human-centric Artificial Intelligence: two case studies on fairness in Dialogue Systems and Speech Datasets”, at the “2nd Workshop on Inside the Ethics of AI Awareness”, November 11th, 2024, Uppsala, organized as part of the Horizon Europe project SymAware