Contact person: Mireille Hildebrandt (m.hildebrandt@cs.ru.nl)

Internal Partners:

  1. University of Sussex (UOS)
  2. German Research Centre for Artificial Intelligence (DFKI)
  3. Vrije Universiteit Brussel (VUB)  

 

HumanE-AI research needs data to advance, and researchers often struggle to progress for lack of it. At the same time, collecting a rich and accurate dataset is no easy task. We therefore propose to share, through the AI4EU platform, the datasets already collected by different research groups. The datasets are curated to be ready to use for researchers. Possible extensions and variations of these datasets are also generated using artificial techniques and published on the platform. A performance baseline is provided for each dataset, in the form of a publication reference, a developed model, or written documentation. The relevant legal framework will be investigated with specific attention to privacy and data protection, so as to highlight limitations and challenges for the use and extension of existing datasets, as well as for future data collection, on the subject of multimodal data collection for perception modelling.

Results Summary

There were 2 main outputs:

— Datasets: The partners involved created, curated, and released datasets for Human Activity Recognition (HAR) tasks, in particular the extended OPPORTUNITY++ dataset and the Wearlab BeachVolleyball dataset. Participation in the microproject offered the chance to take a closer look at the practices, doubts, and difficulties emerging within the scientific community involved in the creation, curation, and dissemination of training datasets. Given that one of the goals of the HumanE-AI Net is to connect research with relevant use cases in European society and industry, participation in the microproject also offered the occasion to situate dataset collection, curation, and release within the broader context of the AI pipeline.

— A comprehensive report introducing the concept of “Legal Protection Debt”: the Report examines the potential issues that arise within current ML practices and provides an analysis of the relevant normative frameworks that govern such practices. By bridging the gap between practices and legal norms, the Report provides researchers with the tools to assess the risks to fundamental rights and freedoms that may arise from the implementation of AI research in real-world situations, and recommends a set of mitigating measures to reduce infringements and prevent violations.

The Report acknowledges that datasets constitute the backbone infrastructure underpinning the development of Machine Learning. The datasets that are created, curated and disseminated by ML practitioners provide the data to train ML models and the benchmarks to test the improvement of such models in performing the tasks for which they are intended.

However, until recently, the practices, processes, and interactions that take place further upstream in the ML pipeline, between the collection of data and the use of datasets for training ML models, have tended to fade into the background.

The report argues that the practices of dataset creation, curation, and dissemination play a crucial role in setting the level of legal protection afforded to all the legal subjects located downstream of ML pipelines. Where such practices lack appropriate legal safeguards, a “Legal Protection Debt” can mount up incrementally along the stages of ML pipelines.

In section 1.1., the Report provides a brief overview of how current data science practices depend on and perpetuate an ecosystem characterised by a lack of structural safeguards against the risks posed by data processing. This can lead to the accumulation of “technical debt”. Such debt, in turn, can become relevant from the perspective of compliance with legal requirements. Taking inspiration from the literature on technical and ethical debt, the Report introduces the concept of Legal Protection Debt. Because of this debt, data-driven systems implemented at the end of the ML pipeline may lack the safeguards necessary to avoid downstream harm to natural persons.

The Report argues that the emergence of Legal Protection Debt and its accumulation at the end of the ML pipeline can be countered through the adoption of a Legal Protection by Design approach. This implies overcoming a siloed understanding of legal liability that mirrors the modular character of ML pipelines. Addressing legal protection debt requires ML practitioners to adopt a forward-looking perspective, one that situates the development stage in which practitioners are involved in the context of the further stages that take place both upstream and downstream of the pipeline. The consideration of the downstream stages of the ML pipeline shall, as it were, back-propagate and inform the choices as to the technical and organisational measures to be taken upstream: upstream design decisions must be based on the anticipation of the downstream uses afforded by datasets and of the potential harms that the latter may cause. Translated into a legal perspective, this implies that the actors upstream in the pipeline should take into consideration the legal requirements that apply to the last stages of the pipeline.

The Report illustrates how data protection law lays down a set of legal requirements that overcome modularity and encompass the ML pipeline in its entirety, connecting the actors upstream with those downstream. The GDPR makes controllers responsible for the effects of the processing that they carry out. In section 2, the Report shows how the GDPR provides the tools to mitigate the problem of many hands in ML pipelines. The duties and obligations set by the GDPR require controllers to implement by design safeguards that combine the need to address downstream harms with the necessity to comply with the standards that govern scientific research. In this perspective, the Report shows that the obligations established by data protection law either instantiate or harden most of the requirements set by the Open Science and Open Data frameworks, as well as the best practices emerging within the ML community.

In section 2.1., the report illustrates the core structure of the liability regime to which controllers are subject under the GDPR. This regime hinges upon controllers’ duty to perform a context-dependent judgment, based on the consideration of the downstream harms posed by the processing, which must inform controllers’ decisions as to the measures to be adopted to ensure compliance with all the obligations established by the GDPR.

In essence, the duty to anticipate and address potential downstream harms requires controllers to adopt a forward-looking approach. In order to ensure compliance with the GDPR, controllers must engage in a dynamic, recursive practice that addresses the requirements of present processing in the light of potential future developments. At the same time, the planning effort required by the GDPR is strictly connected with compliance with obligations set by other normative frameworks. In this sense, compliance with the GDPR and compliance with obligations such as those imposed by the Open Science and Open Data frameworks go hand in hand: compliance with the GDPR is a pre-requisite for complying with the Open Science and Open Data frameworks. Simultaneously, the prospect of open access and re-usability of datasets affects the content of the obligations set by the GDPR.

As a result, the consideration of “what happens downstream” – i.e., the potential uses of datasets, the potential harms that the latter may cause, and the further requirements imposed by other normative frameworks – back-propagates, determining the requirements that apply upstream.

In section 2.2., we show how compliance with the documentation obligations set by the GDPR can counter the accumulation of a documentation debt and ensure controllers’ compliance with the obligations established by other normative frameworks, such as Open Data and Open Science. The overlap between the documentation requirements established by these different frameworks shows, first, that a serious approach to compliance with the GDPR can provide the safeguards necessary to counter the accumulation of a documentation debt. In this way, compliance with the documentation obligations set by the GDPR can prevent the accumulation of other forms of technical debt and, eventually, of legal protection debt. At the same time, the convergence between the requirements set by the GDPR and those established by the FAIR principles and the Horizon DMP template shows how the performance of the documentation obligations established by the GDPR can also facilitate compliance with requirements specific to data processing conducted in the context of scientific research.

A correct framing of the practices of dataset creation, curation, and release in the context of research requires an effort towards the integrity of the legal framework as a whole, taking into consideration the relations between Open Data, Open Science, and data protection law. First, it is important to stress that compliance with data protection law is a pre-requisite for achieving the goals of the Open Data and Open Science frameworks.

In section 2.3., the report analyses the requirements that govern the release and downstream (re)use of datasets. Compliance with the requirements set by the GDPR is essential to prevent dataset dissemination from giving rise to the accumulation of legal protection debt along ML pipelines. Based on the assessment of adequacy and effectiveness required for all forms of processing, controllers can consider adopting a range of measures to ensure that data transfers are compliant with the GDPR. Among such measures, the Report examines the use of licenses, the provision of adequate documentation for the released dataset, data access management, and traceability measures, including the use of unique identifiers.

The Report contains an Annex illustrating the provisions of the GDPR that establish a special regime for processing carried out for scientific research purposes. We highlight that most of the provisions contained in the GDPR are not subject to any derogation or exemption in view of the scientific research purpose of the processing. All in all, the research regime provided by the GDPR covers the application of a limited number of provisions (or parts of provisions). Processing that is unlawful because it does not comply with the general provisions set by the GDPR cannot enjoy the effects of the derogations provided by the research regime. The derogations allowed under the special research regime concern almost exclusively the GDPR provisions on the rights of data subjects, while no derogation is possible for the general obligations that delineate the responsibility of the controller. The derogations provided under the special research regime allow controllers to modulate their obligations towards data subjects where the processing of personal data is not likely to significantly affect the natural persons who are identified or identifiable through such data. As it were, the decrease in the level of potential harm makes possible a lessening of the safeguards required to ensure the protection of data subjects. Even in such cases, however, no derogation is allowed with respect to requirements other than those concerning the rights of the data subject. This makes manifest that the system established by the GDPR aims at providing a form of protection that goes beyond the natural persons whose personal data are being processed by controllers at a given time.

Contact person: Joao Gama (jgama@fep.up.pt)

Internal Partners:

  1. INESC TEC, Joao Gama
  2. Università di Pisa (UNIPI), Dino Pedreschi
  3. Consiglio Nazionale delle Ricerche (CNR), Fosca Giannotti  

 

Nowadays, ML models are used in decision-making processes for real-world problems by learning a function that maps observed features to decision outcomes. However, these models usually do not convey causal information about the associations in observational data; they are thus not easily understandable for the average user, and it is not possible to retrace a model’s steps or rely on its reasoning. Hence, it is natural to investigate more explainable methodologies, such as causal discovery approaches, since they apply processes that mimic human reasoning. For this reason, we propose using such methodologies to create more explainable models that replicate human thinking and are easier for the average user to understand. More specifically, we suggest applying them to methods such as decision trees and random forests, which are by themselves highly explainable correlation-based methods.

Results Summary

In recent years, the study of causal relationships has become a crucial part of the Artificial Intelligence community, as causality can be a key tool for overcoming some limitations of correlation-based Machine Learning systems. Causality research can generally be divided into two main branches: causal discovery and causal inference. The former focuses on obtaining causal knowledge directly from observational data; the latter aims to estimate the impact of a change in a certain variable on an outcome of interest. The result of this project is a survey covering several methodologies that have been developed for both tasks. The survey does not only focus on theoretical aspects but also provides a practical toolkit for interested researchers and practitioners, including software, datasets, and running examples. The published paper contains sections covering the following items. In Section 2, some basic definitions and notations are introduced. In Section 3, causal discovery techniques, tools, datasets, metrics, and examples are presented, organized by data type (cross-sectional, time-series, longitudinal). Section 4 covers causal inference techniques for several causal effects, tools, datasets, and a running example. Some remarks regarding the intersection between ML and causality are presented in Section 5, where some of the current open issues are also highlighted. Finally, conclusions are drawn.
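To make the two branches concrete, the following is a minimal sketch of the causal inference task: it contrasts a naive correlational estimate of a treatment effect with a regression-adjustment (S-learner) estimate on synthetic data with a known confounder. This is an illustration written for this summary, not code from the survey; the data-generating process and variable names are assumptions.

```python
# Hedged sketch: estimating an average treatment effect (ATE) by regression
# adjustment (S-learner) with a random forest, versus a naive group-mean
# difference. The synthetic data have a true ATE of 2.0 and a confounder Z
# that biases the naive estimate upward.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                              # confounder
t = (z + rng.normal(size=n) > 0).astype(float)      # treatment depends on Z
y = 2.0 * t + 3.0 * z + rng.normal(size=n)          # outcome: true ATE = 2.0

# Naive estimate: difference of group means (biased by Z).
naive = y[t == 1].mean() - y[t == 0].mean()

# S-learner: fit y ~ (t, z), then average the difference between predictions
# with the treatment forced to 1 and forced to 0 for every individual.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(np.column_stack([t, z]), y)
ate = (model.predict(np.column_stack([np.ones(n), z]))
       - model.predict(np.column_stack([np.zeros(n), z]))).mean()

print(f"naive difference:      {naive:.2f} (biased by the confounder)")
print(f"adjusted ATE estimate: {ate:.2f} (close to the true 2.0)")
```

The survey itself covers a much broader range of discovery algorithms and effect estimators; this sketch only illustrates why adjusting for confounders matters.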

Tangible Outcomes

  1. Nogueira, Ana Rita, Andrea Pugnana, Salvatore Ruggieri, Dino Pedreschi, and João Gama. “Methods and tools for causal discovery and causal inference.” Wiley interdisciplinary reviews: data mining and knowledge discovery 12, no. 2 (2022): e1449. https://wires.onlinelibrary.wiley.com/doi/10.1002/widm.1449
  2. GitHub repository of datasets, papers, and software related to causal discovery and causal inference research: https://github.com/AnaRitaNogueira/Methods-and-Tools-for-Causal-Discovery-and-Causal-Inference

Contact person: Hamraz Javaheri (hamraz.javaheri@dfki.de)

Internal Partners:

  1. DFKI  

External Partners:

  1. Hospital Saarbrücken “Der Winterberg”  

 

In this project, we successfully implemented and clinically evaluated an AR assistance system for pancreatic surgery, enhancing surgical navigation and achieving more precise perioperative outcomes. However, the system’s reliance on preoperative data posed challenges, particularly due to anatomical deformations occurring in the later stages of surgery. In future research, we aim to address this by integrating real-time data sources to further improve the system’s accuracy and adaptability during surgery.

Results Summary

Throughout our project, we developed and clinically evaluated ARAS, an augmented reality (AR) assistance system designed for pancreatic surgery. The system was clinically evaluated by field surgeons during pancreatic tumor resections involving 20 patients. In a matched-pair analysis with 60 patients who underwent surgery without ARAS, the ARAS group demonstrated a significantly shorter operation time than the control group. Although not statistically significant, the ARAS group also exhibited clinically noticeable lower rates of excessive intraoperative bleeding and a reduced need for intraoperative red blood cell (RBC) transfusions. Furthermore, ARAS enabled more precise tumor resections with tumor-free margins, and patients in this group had better postoperative outcomes, including significantly shorter hospital stays. In this project, we published two journal papers (one accepted and to be published soon), one conference paper, and one demo paper (Best Demo Paper Award); two more conference papers are currently under submission. The project also attracted international and local news and media attention, including coverage by the Deutsche Welle news channel (example links are provided below).

Tangible Outcomes

  1. Beyond the visible: preliminary evaluation of the first wearable augmented reality assistance system for pancreatic surgery, International Journal of Computer Assisted Radiology and Surgery (https://doi.org/10.1007/s11548-024-03131-0)
  2. Enhancing Perioperative Outcomes of Pancreatic Surgery with Wearable Augmented Reality Assistance System: A Matched-Pair Analysis, Annals of Surgery Open (https://doi.org/10.1097/AS9.0000000000000516)
  3. Design and Clinical Evaluation of ARAS: An Augmented Reality Assistance System for Open Pancreatic Surgery, IEEE ISMAR 2024 (https://www.researchgate.net/publication/385116946_Design_and_Clinical_Evaluation_of_ARAS_An_Augmented_Reality_Assistance_System_for_Open_Pancreatic_Surgery_Omid_Ghamarnejad)
  4. ARAS: LLM-Supported Augmented Reality Assistance System for Pancreatic Surgery, ISWC/UbiComp 2024 (https://doi.org/10.1145/3675094.3677543)
  5. Media coverage for the project:
    1. https://www.dw.com/en/artificial-intelligence-saving-lives-in-the-operating-room/video-68125878
    2. https://www.dw.com/de/k%C3%BCnstliche-intelligenz-im-op-saal-rettet-leben/video-68125903
    3. https://www.saarbruecker-zeitung.de/app/consent/?ref=https%3A%2F%2Fwww.saarbruecker-zeitung.de%2Fsaarland%2Fsaarbruecken%2Fsaarbruecken%2Fsaarbruecken-winterberg-klinik-international-im-tv-zu-sehen_aid-106311259
    4. https://www.saarbruecker-zeitung.de/app/consent/?ref=https%3A%2F%2Fwww.saarbruecker-zeitung.de%2Fsaarland%2Fsaarbruecken-mittels-ki-erfolgreiche-operation-an-82-jaehriger-v29_aid-104053203
    5. https://m.focus.de/gesundheit/gesundleben/da-gibt-es-keinen-raum-fuer-fehler-kuenstliche-intelligenz-im-op-saal-rettet-leben_id_259629806.html 

Contact person: Petr Schwarz, Brno University of Technology (schwarzp@fit.vutbr.cz)

Internal Partners:

  1. Brno University of Technology, Petr Schwarz, schwarzp@fit.vutbr.cz
  2. Charles University, Ondrej Dusek, odusek@ufal.mff.cuni.cz

 

This project brought us data, tools, and baselines that enable us to study and improve context exchange between dialog system components and dialog sides (AI agent and human) in voice dialog systems. Better context exchange allows us to build more accurate automatic speech transcription, better dialog flow modeling, more fluent speech synthesis, and more powerful AI agents. Context exchange can be seen as interactive grounding in two senses: among dialog sides (for example, automatic speech transcription rarely uses information from the other dialog side to adapt itself) and among dialog system components (speech synthesis rarely uses dialog context to produce more fluent or expressive speech). The individual project outputs are summarized below.

Results Summary

1) Audio data collection software based on the Twilio platform and WebRTC desktop/mobile device clients. The purpose is to collect audio data of communication between agents (e.g., a company or service provider, such as a travel info provider) and users. This software enables us to collect very realistic voice dialogs with high-quality audio (>= 16 kHz sampling frequency) on the agent side and lower, telephone-quality audio on the user side. The code is available here: https://github.com/oplatek/speechwoz

2) We have established relationships with Paweł Budzianowski (Poly.AI) and Izhak Shafran (Google). Paweł created the MultiWOZ database – an excellent dialog corpus (https://arxiv.org/abs/1810.00278) that we use for the text-based experiments. We decided to collect our audio data similarly. Izhak organized the DSTC11 Speech Aware Dialog System Technology Challenge (https://arxiv.org/abs/2212.08704) and created artificial audio data for MultiWOZ through speech synthesis, reading, and paraphrasing. Both provided us with the necessary advice for our data collection.

3) Speech dialog data – preparing the data collection platform and collecting the data are very time-consuming. Data collection is in progress, and the data will be released before June 26th, 2023.

4) Initial experiments with context exchange between dialog sides (user and agent) were performed. These experiments show a clear improvement on the automatic speech recognition side. The experiments will be re-run with the collected data and published when the collection is finished.

5) Initial experiments with training instance weighting for response generation, which brings context into dialog system response generation, were performed. The experiments were based on the AuGPT system, previously developed at CUNI. The code is available here: https://github.com/knalin55/augpt. Instance weighting increases the re-use of context compared to normal training, and can go even beyond natural occurrences in the data. Simple weighting (a threshold) seems better than designing a complex instance weight (in terms of automated metrics; limited manual evaluation is not conclusive). Cross-entropy loss works better than unlikelihood loss, with which dialogue success may be reduced. A sketch of the weighting idea is given after this list.

6) We organized a topic at the JSALT research summer workshop on “Automatic design of conversational models from observation of human-to-human conversation” (https://jsalt2023.univ-lemans.fr/en/automatic-design-of-conversational-models-fromobservation-of-human-to-human-conversation.html, https://www.clsp.jhu.edu/2023-jelinek-summer-workshop, https://jsalt2023.univ-lemans.fr/en/index.html). This is a prestigious workshop organized by Johns Hopkins University every year; this year it is supported and co-organized by the University of Le Mans. Our topic passed a scientific review by more than 40 world-class AI researchers in Baltimore, USA, in December 2022, and was selected for the workshop out of 15 proposals together with three others. The workshop topic builds on the outcome of this microproject and will reuse the collected data.
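As referenced in point 5, here is a minimal sketch of threshold-based instance weighting (an illustration written for this summary, not the AuGPT code; the tensor shapes, the scoring rule, and the threshold are assumptions): instances whose context-re-use score exceeds a threshold receive a higher weight in an otherwise standard cross-entropy loss.

```python
# Hedged sketch of threshold-based training-instance weighting for response
# generation (not the AuGPT implementation). Instances scoring above the
# threshold (e.g., dialogs exhibiting more context re-use) are up-weighted
# in a standard token-level cross-entropy loss.
import torch
import torch.nn.functional as F

def weighted_ce_loss(logits, targets, instance_scores,
                     threshold=0.5, up_weight=2.0):
    # logits: (batch, seq_len, vocab); targets: (batch, seq_len)
    batch, seq_len, vocab = logits.shape
    per_token = F.cross_entropy(
        logits.reshape(-1, vocab), targets.reshape(-1), reduction="none"
    ).reshape(batch, seq_len)
    per_instance = per_token.mean(dim=1)              # one loss per instance
    weights = torch.where(
        instance_scores > threshold,
        torch.full_like(instance_scores, up_weight),  # up-weight "good" instances
        torch.ones_like(instance_scores),
    )
    return (weights * per_instance).sum() / weights.sum()

# Toy usage with random tensors standing in for model outputs:
logits = torch.randn(4, 10, 100, requires_grad=True)
targets = torch.randint(0, 100, (4, 10))
scores = torch.rand(4)       # assumed per-instance context-re-use scores
loss = weighted_ce_loss(logits, targets, scores)
loss.backward()
```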

Tangible Outcomes

  1. Nalin Kumar and Ondrej Dusek. 2024. LEEETs-Dial: Linguistic Entrainment in End-to-End Task-oriented Dialogue systems. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 727–735, Mexico City, Mexico. Association for Computational Linguistics https://aclanthology.org/2024.findings-naacl.46/ 
  2. Code for audio data collection: https://github.com/oplatek/speechwoz 
  3. Code for end-to-end response generation: https://github.com/knalin55/augpt 
  4. Report for end-to-end response generation: https://docs.google.com/document/d/1iQB1YWr3wMO8aEB08BUYBqiLh0KreYjyO4EHnb395Bo/edit 
  5. “Automatic design of conversational models from observation of human-to-human conversation” workshop in the prestigious JSALT research summer workshops program https://jsalt2023.univ-lemans.fr/en/automatic-design-of-conversational-models-from-observation-of-human-to-human-conversation.html
  6. Workshop proposal: https://docs.google.com/document/d/19PAOkquQY6wnPx_wUXIx2EaInYchoCRn/edit 
  7. Presentations from the prestigious JSALT research summer workshop: https://youtu.be/QS5zXkpXV3Q 

Contact person: Uwe Köckemann (uwe.kockemann@oru.se; michele.lombardi2@unibo.it)

Internal Partners:

  1. Örebro University (ORU), Uwe Köckemann
  2. Università di Bologna (UNIBO), Michele Lombardi 

 

Methods for injecting constraints into Machine Learning (ML) can help bridge the gap between symbolic and sub-symbolic models and address fairness and safety issues in data-driven AI systems. The recently proposed “Moving Targets” approach achieves this via a decomposition, where a classical ML model deals with the data and a separate constraint solver deals with the constraints. Different applications call for different constraints, solvers, and ML models; this flexibility is a strength of the approach, but it also makes the approach difficult to set up and analyze. Therefore, this project relies on the AI Domain Definition Language (AIDDL) framework to obtain a flexible implementation of the approach, making it simpler to use and allowing the exploration of more case studies, different constraint solvers, and algorithmic variants. We used this implementation to investigate various new constraint types integrated with the Moving Targets approach (e.g., SMT, MINLP, CP).

Results Summary

The moving targets method integrates machine learning and constraint optimization to enforce constraints on a machine learning model. The AI Domain Definition Language (AIDDL) provides a modeling language and framework for integrative AI.

We have implemented the moving targets algorithm in the AIDDL framework for integrative AI. This has benefits for modeling, experimentation, and usability. On the modeling side, it enables us to express applications of “moving targets” as regular machine learning problems extended with constraints and a loss function. On the experimentation side, we can now easily switch the learning and constraint solvers used by the “moving targets” algorithm, and we have added support for multiple constraint types. Finally, we made the “moving targets” method easier to use, since it can now be controlled through a small model written in the AIDDL language.
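The decomposition at the heart of the method can be illustrated with a short, heavily simplified sketch (written for this summary under assumed interfaces; it is neither the AIDDL implementation nor the exact update rules of the published algorithm): a learner step fits a standard regressor to the current targets, and a master step moves the targets toward a compromise between the data and constraint satisfaction.

```python
# Hedged sketch of the Moving Targets decomposition (illustrative only).
# Learner step: fit an off-the-shelf ML model to the current targets.
# Master step: adjust the targets toward the constraint set, balancing
# fidelity to the labels against the model's current predictions. The toy
# constraint here (non-negative predictions) is enforced by clipping; a
# real setup would call a constraint solver (SMT, MINLP, CP) instead.
import numpy as np
from sklearn.linear_model import Ridge

def master_step(y, preds, beta=1.0):
    # Compromise between true labels and current predictions, projected
    # onto the toy constraint set {z : z >= 0}.
    z = (y + beta * preds) / (1.0 + beta)
    return np.maximum(z, 0.0)

def moving_targets(X, y, iterations=10):
    targets = y.copy()
    model = Ridge()
    for _ in range(iterations):
        model.fit(X, targets)             # learner step: a plain ML problem
        preds = model.predict(X)
        targets = master_step(y, preds)   # master step: "move" the targets
    return model

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
model = moving_targets(X, y)
print("fraction of negative predictions:", (model.predict(X) < 0).mean())
```

Swapping `Ridge` for another learner, or `master_step` for a call into an actual constraint solver, is exactly the kind of flexibility the AIDDL integration is meant to make easy.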

Tangible Outcomes

  1. Example Jupyter Notebooks (3 data sets) – Uwe Köckemann, Fabrizio Detassis, Michele Lombardi. https://gitsvn-nt.oru.se/uwe.kockemann/moving-targets 
  2. Experiments Jupyter Notebooks (3 data sets) – Uwe Köckemann, Fabrizio Detassis, Michele Lombardi. https://gitsvn-nt.oru.se/uwe.kockemann/moving-targets 
  3. Program/code: Python library: Moving targets via AIDDL – Uwe Köckemann, Fabrizio Detassis, Michele Lombardi. https://gitsvn-nt.oru.se/uwe.kockemann/moving-targets 
  4. Moving targets tutorial – Michele Lombardi. https://gitsvn-nt.oru.se/uwe.kockemann/moving-targets 
  5. presentation about the project. https://gitsvn-nt.oru.se/uwe.kockemann/moving-targets/-/blob/master/presentations/microproject_presentation_ORU-UBO.pptx 
  6. Video presentation summarizing the project

 

Contact person: Dino Pedreschi (dino.pedreschi@unipi.it)

Internal Partners:

  1. University of Pisa – Department of CS, Dino Pedreschi (dino.pedreschi@unipi.it)   

External Partners:

  1. University of Antwerp – Department of CS, Daphne Lenders (daphne.lenders@uantwerpen.be)
  2. Scuola Normale Superiore, Roberto Pellungrini (roberto.pellungrini@sns.it)   

 

Our project revolves around the topic of fair Artificial Intelligence (AI), a field that explores how decision-making algorithms used in high-stakes domains, such as hiring or loan allocation, can perpetuate discriminatory patterns in the data they are based on, unfairly affecting people of certain races, genders, or other demographics. Early attempts to address bias in AI systems focused on automated solutions, attempting to eliminate discrimination by establishing mathematical definitions of “fairness” and optimizing algorithms accordingly. However, these approaches have faced justified criticism for disregarding the contextual nuances in which algorithms operate and for neglecting the input of domain experts who understand and can tackle discriminatory patterns effectively. Consequently, policymakers have recognized the pitfalls of relying solely on these approaches and are now designing legal regulations mandating that high-risk AI systems can only be deployed when they allow for oversight and intervention by human experts. With our project, we investigate how to effectively achieve this human control by exploring the intersection between fair and explainable AI (xAI), where the latter is concerned with explaining the decision processes of otherwise opaque black-box algorithms. We developed a tool that provides humans with explanations about an algorithmic decision-making system. Based on the explanations, users can give feedback about the system’s fairness and choose between different strategies to mitigate its discriminatory patterns. By immediately getting feedback about the effects of their chosen strategy, users can engage in an iterative process, further refining and improving the algorithm. Since little prior work has been done on Human-AI collaboration in the context of bias mitigation, we took an exploratory approach to evaluating this system. We set up a think-aloud study where potential end-users could interact with the system and try out different mitigation strategies. We analysed their responses and thoughts to identify the tool’s strengths and weaknesses as well as users’ mental model of the tool. Additionally, we compared the system’s biases before and after human intervention, to see how biases were mitigated and how successful this mitigation was.

Results Summary

We developed an algorithm that can reject predictions based both on their uncertainty and on their unfairness. By rejecting possibly unfair predictions, our method reduces differences in error and positive decision rates across demographic groups on the non-rejected data. Since the unfairness-based rejections rely on an interpretable-by-design method, i.e., rule-based fairness checks and situation testing, we create a transparent process that can empower human decision-makers to review the unfair predictions and make more just decisions in their place. This explainable aspect is especially important in light of recent AI regulations, which mandate that any high-risk decision task be overseen by human experts to reduce discrimination risks. This methodology allows us to bridge the gap between classifiers with a reject option and interpretable-by-design methods, encouraging human intervention and comprehension. We produced functioning software, which is publicly available, and are working on a full publication with experiments on multiple datasets and multiple rejection strategies.
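To illustrate the two-sided reject option described above, here is a minimal sketch (written for this summary, not the IFAC code; the thresholds, the value of k, and the nearest-neighbour check are illustrative stand-ins for the rule-based fairness checks and situation testing used in the actual method):

```python
# Hedged sketch of a classifier that rejects predictions either because the
# model is uncertain or because a crude situation-testing-style check flags
# them as potentially unfair. All thresholds and the check itself are
# illustrative assumptions, not the IFAC implementation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

def situation_test_flags(X, pred, group, k=10, gap=0.5):
    # For each individual, compare the positive-decision rate among its k
    # nearest neighbours within its own group vs. the other group; a large
    # gap suggests the decision may hinge on group membership.
    flags = np.zeros(len(X), dtype=bool)
    for g in np.unique(group):
        own_idx, oth_idx = group == g, group != g
        own = NearestNeighbors(n_neighbors=k).fit(X[own_idx])
        oth = NearestNeighbors(n_neighbors=k).fit(X[oth_idx])
        rows = np.where(own_idx)[0]
        _, own_nn = own.kneighbors(X[rows])
        _, oth_nn = oth.kneighbors(X[rows])
        own_rate = pred[own_idx][own_nn].mean(axis=1)
        oth_rate = pred[oth_idx][oth_nn].mean(axis=1)
        flags[rows] = np.abs(own_rate - oth_rate) > gap
    return flags

def predict_with_reject(model, X, group, conf_threshold=0.7):
    proba = model.predict_proba(X)[:, 1]
    pred = (proba >= 0.5).astype(int)
    uncertain = np.maximum(proba, 1 - proba) < conf_threshold
    unfair = situation_test_flags(X, pred, group)
    decisions = pred.astype(object)
    decisions[uncertain | unfair] = "reject"   # deferred to a human expert
    return decisions

# Toy usage on synthetic data with a group-dependent label:
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
group = rng.integers(0, 2, size=500)
y = (X[:, 0] + 0.5 * group + rng.normal(scale=0.5, size=500) > 0).astype(int)
clf = RandomForestClassifier(random_state=0).fit(X, y)
decisions = predict_with_reject(clf, X, group)
print("rejected for human review:", (decisions == "reject").sum(), "of", len(decisions))
```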

Tangible Outcomes

  1. The full software: https://github.com/calathea21/IFAC 

Contact person: Joao Gama (INESC TEC) (jgama@fep.up.pt)

Internal Partners:

  1. INESC TEC, Joao Gama
  2. CNR, Giuseppe Manco
  3. ULEI, Holger Hoos 

 

The goal is to devise a data generation methodology that, given a data sample, can approximate the stochastic process that generated it. The methodology can be useful in many contexts where we need to share data while preserving user privacy. There is existing literature on data generation based on Bayesian neural networks and hidden Markov models, but it is restricted to static and propositional data. We focus on time-evolving data and preference data. We study essentially two aspects: (1) making the generator produce realistic data with the same properties as the original, and (2) investigating how to inject drift into the data generation process in a controlled manner. The idea is to model the stochastic process through a dependency graph among random variables, so that drift can be modeled simply by changing the structure of the underlying graph through a morphing process.
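As an illustration of the graph-morphing idea (a sketch written for this summary, not the project's implementation; the linear dependency structure and the drift window are assumptions), a stream generator can interpolate between two weighted dependency graphs over the course of the stream:

```python
# Hedged sketch: a stream generator whose samples follow a linear dependency
# graph among variables, with concept drift injected by gradually morphing
# the graph's edge weights from one structure to another (illustrative only).
import numpy as np

def generate_stream(n_steps, w_start, w_end, drift_start, drift_end, seed=0):
    rng = np.random.default_rng(seed)
    d = w_start.shape[0]
    for t in range(n_steps):
        # Morphing coefficient: 0 before the drift window, 1 after it.
        m = np.clip((t - drift_start) / max(drift_end - drift_start, 1), 0, 1)
        w = (1 - m) * w_start + m * w_end   # interpolated dependency graph
        x = np.zeros(d)
        for i in range(d):                  # variables in topological order
            x[i] = w[i, :i] @ x[:i] + rng.normal()
        yield t, x

# Two structures: the drift removes the x0 -> x2 edge and adds x1 -> x2.
w_start = np.array([[0.0, 0.0, 0.0], [0.8, 0.0, 0.0], [0.9, 0.0, 0.0]])
w_end   = np.array([[0.0, 0.0, 0.0], [0.8, 0.0, 0.0], [0.0, 0.9, 0.0]])
for t, x in generate_stream(1000, w_start, w_end, drift_start=400, drift_end=600):
    pass  # feed x into a drift detector or recommender under test
```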

Tangible Outcomes

  1. Implementation of the model presented in the paper “Modelling Concept Drift in Dynamic Data Streams for Recommender Systems”, available on GitHub: https://github.com/fsp22/mcd_dds4rs