Contact person: Joao Gama (jgama@fep.up.pt)

Internal Partners:

  1. INESC TEC, Joao Gama
  2. Università di Pisa (UNIPI), Dino Pedreschi
  3. Consiglio Nazionale delle Ricerche (CNR), Fosca Giannotti  

 

ML models are now widely used in real-world decision-making, where they learn a function that maps observed features to decision outcomes. However, these models usually do not convey causal information about the associations found in observational data. As a result, they are not easily understood by the average user, who can neither retrace a model's steps nor rely on its reasoning. It is therefore natural to investigate more explainable methodologies, such as causal discovery approaches, since they apply processes that mimic human reasoning. We propose using such methodologies to create more explainable models that replicate human thinking and are easier for the average user to understand. More specifically, we suggest applying them to methods such as decision trees and random forests, which are themselves highly explainable correlation-based methods.

Results Summary

In recent years, the study of causal relationships has become a crucial topic in the Artificial Intelligence community, as causality can be a key tool for overcoming some limitations of correlation-based Machine Learning systems. Causality research can generally be divided into two main branches: causal discovery and causal inference. The former focuses on obtaining causal knowledge directly from observational data. The latter aims to estimate the impact that a change in a certain variable has on an outcome of interest. The result of this project is a survey covering several methodologies that have been developed for both tasks. The survey does not focus only on theoretical aspects; it also provides a practical toolkit for interested researchers and practitioners, including software, datasets, and running examples. The published paper is organized as follows. Section 2 introduces some basic definitions and notation. Section 3 presents causal discovery techniques, tools, datasets, metrics, and examples, organized by data type (cross-sectional, time-series, longitudinal). Section 4 covers causal inference techniques for several causal effects, together with tools, datasets, and a running example. Section 5 offers some remarks on the intersection between ML and causality and highlights some of the current open issues. Finally, conclusions are drawn.
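The gap between correlation-based estimates and causal inference described above can be illustrated with a small simulation. The sketch below is not taken from the survey: the data-generating process, the variable names (`z` for a confounder, `t` for a treatment, `y` for the outcome), and the true effect of 2.0 are all assumptions chosen for illustration. It compares a naive difference in means with a simple stratified (backdoor) adjustment over the confounder.

```python
import random


def simulate(n, seed=0):
    """Generate synthetic (z, t, y) rows; the true effect of t on y is 2.0 (assumed)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0                    # binary confounder
        t = 1 if rng.random() < (0.8 if z else 0.2) else 0    # treatment depends on z
        y = 2.0 * t + 3.0 * z + rng.gauss(0.0, 1.0)           # outcome depends on t and z
        rows.append((z, t, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_effect(rows):
    """Difference in mean outcome between treated and untreated units (purely associational)."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def adjusted_effect(rows):
    """Average the within-stratum differences, weighted by the share of each z stratum."""
    total = len(rows)
    effect = 0.0
    for z_value in (0, 1):
        stratum = [r for r in rows if r[0] == z_value]
        treated = [y for _, t, y in stratum if t == 1]
        control = [y for _, t, y in stratum if t == 0]
        effect += (len(stratum) / total) * (mean(treated) - mean(control))
    return effect


rows = simulate(20000)
naive = naive_effect(rows)
adjusted = adjusted_effect(rows)
print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

Under these assumptions, the naive estimate is inflated because the confounder raises both the chance of treatment and the outcome, while the adjusted estimate should land close to the simulated effect of 2.0. The adjustment is only valid here because the confounder is known and observed, which is exactly the kind of causal knowledge that causal discovery methods try to recover from data.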

Tangible Outcomes

  1. Nogueira, Ana Rita, Andrea Pugnana, Salvatore Ruggieri, Dino Pedreschi, and João Gama. “Methods and tools for causal discovery and causal inference.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12, no. 2 (2022): e1449. https://wires.onlinelibrary.wiley.com/doi/10.1002/widm.1449
  2. GitHub repository of datasets and papers related to causal discovery and causal inference research: https://github.com/AnaRitaNogueira/Methods-and-Tools-for-Causal-Discovery-and-Causal-Inference
  3. GitHub repository of software related to causal discovery and causal inference research: https://github.com/AnaRitaNogueira/Methods-and-Tools-for-Causal-Discovery-and-Causal-Inference