Contact person: Lorenzo Valerio (lorenzo.valerio@iit.cnr.it)

Internal Partners:

  1. Consiglio Nazionale delle Ricerche (CNR), Lorenzo Valerio, lorenzo.valerio@iit.cnr.it
  2. Central European University, János Kertész, kerteszj@ceu.edu


This microproject set out to study the effect of simple social structures on the decentralized learning process in a human-AI ecosystem, and how the lack of coordination impacts the resulting learned model. The project considered the following learning policies: federated learning (FedAvg), average-based decentralized learning (DecAvg, an adaptation of FedAvg to the decentralized setting), difference-based decentralized learning (a novel strategy called DecDiff), and knowledge-distillation-based (KD-based) decentralized learning with a virtual teacher. For the decentralized strategies, we considered both homogeneous and heterogeneous initial conditions (e.g., common vs. independent model initialization, IID vs. non-IID data distribution across nodes). The common benchmark is centralized learning (i.e., we assume that all users upload their data to a central server). From the social-network standpoint, we initially focused on dyadic and triadic social networks, then moved on to richer topologies such as Erdős–Rényi graphs and stochastic block model (SBM) graphs. As the learning task, we considered a standard classification problem on the MNIST dataset. Additional, more challenging datasets are currently under investigation.
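To make the distinction between the averaging-based and difference-based policies concrete, the following minimal sketch contrasts a DecAvg-style step with a difference-based step in the spirit of DecDiff. It is illustrative only: the function names, the uniform neighbor weighting, the flattened parameter vectors, and the exact form of the difference-based update (local parameters moved along the averaged difference to the neighbors, scaled by a mixing rate) are our assumptions, not the project's implementation.

```python
import numpy as np

def dec_avg_step(local, neighbors):
    """DecAvg-style step: replace the local parameters with the plain
    average of the local model and its neighbors' models.
    `local` is a 1-D parameter vector; `neighbors` a list of such vectors."""
    return np.stack([local, *neighbors]).mean(axis=0)

def dec_diff_step(local, neighbors, mix=0.5):
    """Difference-based step (illustrative reading of DecDiff): move the
    local parameters along the average difference to the neighbors, so a
    badly initialized neighbor perturbs the local model less than full
    averaging does. `mix` controls how far the local model moves."""
    if not neighbors:
        return local
    avg_diff = np.mean([nb - local for nb in neighbors], axis=0)
    return local + mix * avg_diff

# Toy example: three nodes on a triangle with independent random inits.
rng = np.random.default_rng(0)
params = [rng.normal(size=4) for _ in range(3)]
updated = [dec_diff_step(p, [q for j, q in enumerate(params) if j != i])
           for i, p in enumerate(params)]
print(updated[0])
```

In an actual round, each node would first train locally on its own (possibly non-IID) data and then apply one such aggregation step with its neighbors on the given topology.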

Results Summary

We have observed the following:

  1. In small networks where data availability is not an issue, DecAvg in model-homogeneous settings (i.e., all users’ AI models share a common initialization) is as good as federated learning with FedAvg, despite the lack of a central controller. Without the common initialization (i.e., all AI models are initialized independently at random), the accuracy depends strongly on the initial conditions. DecDiff substantially mitigates this problem, yielding higher accuracy despite a slower transient phase. The virtual teacher clearly outperforms a basic centralized approach. In contrast, when data is a bottleneck, the learning strategy plays a limited role in the observed performance.
  2. In larger networks, DecDiff does not suffer from the initial disruption that the averaging process causes in DecAvg. At steady state, however, it is no better than the simpler DecAvg. When the data distribution is extremely uneven, DecDiff appears to provide more reliable performance. Interestingly, KD-based decentralized learning (sketched after this list) consistently performs well, surpassing standard federated learning.
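
As a reading aid for the KD-based strategy, the sketch below shows one common way to distill against a “virtual teacher”, i.e., a hand-crafted target distribution that puts most of the probability mass on the true class. The construction (correct-class probability, temperature, loss weighting) follows the standard teacher-free KD recipe and is an assumption about, not a transcription of, the loss evaluated in this microproject.

```python
import torch
import torch.nn.functional as F

def virtual_teacher_loss(logits, targets, num_classes,
                         correct_prob=0.9, temperature=2.0, kd_weight=0.5):
    """KD against a hand-crafted 'virtual teacher': a distribution that puts
    `correct_prob` mass on the true class and spreads the remaining mass
    uniformly over the other classes. Illustrative values only."""
    # Hard-label cross-entropy term.
    ce = F.cross_entropy(logits, targets)
    # Build the virtual teacher's soft targets.
    soft = torch.full((logits.size(0), num_classes),
                      (1.0 - correct_prob) / (num_classes - 1),
                      device=logits.device)
    soft.scatter_(1, targets.unsqueeze(1), correct_prob)
    # KL divergence between the student's tempered predictions and the
    # virtual teacher, rescaled by temperature^2 as is customary in KD.
    kd = F.kl_div(F.log_softmax(logits / temperature, dim=1),
                  soft, reduction="batchmean") * temperature ** 2
    return (1 - kd_weight) * ce + kd_weight * kd
```

For MNIST, each node would use this as its local objective, e.g. loss = virtual_teacher_loss(model(x), y, num_classes=10), without needing any actual teacher model to be exchanged over the network.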

Tangible Outcomes

  1. [arXiv] Lorenzo Valerio, Chiara Boldrini, Andrea Passarella, János Kertész, Márton Karsai, Gerardo Iñiguez, “Coordination-free Decentralised Federated Learning on Complex Networks: Overcoming Heterogeneity,” arXiv:2312.04504, https://arxiv.org/abs/2312.04504
  2. We implemented all the strategies in the SAI Simulator (SAISim). The repository is on Zenodo: https://zenodo.org/records/5780042#.Ybi2sX3MLPw