Learning With LLMs: Supporting Complex Reasoning, Planning And Argumentation, Applied To Providing Educational Guidance
UCL, JSI, University of Bologna, Fortiss, OptimalAI, Tilde
Original Aims
Despite their impressive performance on a wide variety of tasks, Large Language Models (LLMs) fall short on tasks that involve complex reasoning, argumentation and planning. The aim of this macro-project is (i) to investigate how LLMs can assist humans in complex reasoning and argumentation (Task 1); (ii) to extend LLMs with a modular architecture of symbolic components to enable planning and reasoning (Task 2); and (iii) to validate this architecture in providing guidance on educational pathways, which requires reasoning and planning capabilities (Task 3).
Scientific Results/Insights Summary
As GenAI becomes more reliable, competent and trustworthy, many more apps will become available that are intended to augment human knowledge activities, such as decision-making, problem-solving, writing, designing, and reasoning. The question is how best to achieve this in a way that keeps humans engaged. This MP covers recent research into human-centred AI that has investigated how new AI tools can be designed to enhance human learning, creativity and work – where the AI works with us rather than replacing us. A focus is on the roles that the AI can play in helping us to reason and reflect more – by making us think differently. We propose that new models of AI-cognition are needed that can explain and predict how to truly empower people with new capabilities.
One of the most ambitious use cases of computer-assisted learning is personal recommender systems for lifelong learning which require sophisticated recommendation models accounting for a wide range of factors such as background knowledge of learners and novelty of the material while effectively maintaining knowledge states of learners for significant periods of time (Bulathwela et al., 2020). A foundational component in such systems is a model that captures and tracks learners’ knowledge states in order to assist them on their path of knowledge discovery and acquisition. One such model – TrueLearn – uses Bayesian algorithms and defines knowledge components in terms of Wikipedia topics to provide transparent estimates of learners’ knowledge states (Bulathwela et al., 2020; Qiu et al., 2023). While models such as TrueLearn could form the backbone of a lifelong learning recommendation system by tracking learners’ progress through engagement with open educational resources, many aspects and challenges associated with such a personal recommender are left unaddressed – not least the model “cold-start problem” which is relevant when onboarding a new learner (Bulathwela et al., 2021b).
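To make the idea of a transparent learner model concrete, the following is a minimal sketch in Python. It is not the actual TrueLearn algorithm (which uses TrueSkill-style Gaussian skill updates over Wikipedia topics); a Beta-Bernoulli posterior per topic is assumed here purely to illustrate the transparent, per-topic structure that such models share.

```python
from collections import defaultdict

class SimpleLearnerModel:
    """Toy transparent learner model: one Beta posterior per Wikipedia topic.

    A simplified stand-in for TrueLearn, which uses Gaussian (TrueSkill-style)
    skill updates; the Beta-Bernoulli version keeps the same transparent
    per-topic structure with far less machinery.
    """
    def __init__(self, prior_alpha=1.0, prior_beta=1.0):
        self.alpha = defaultdict(lambda: prior_alpha)  # engagements + prior
        self.beta = defaultdict(lambda: prior_beta)    # disengagements + prior

    def update(self, topics, engaged):
        """Update posteriors after the learner watches a resource covering `topics`."""
        for t in topics:
            if engaged:
                self.alpha[t] += 1.0
            else:
                self.beta[t] += 1.0

    def predict_engagement(self, topics):
        """Mean posterior engagement probability, averaged over a resource's topics."""
        probs = [self.alpha[t] / (self.alpha[t] + self.beta[t]) for t in topics]
        return sum(probs) / len(probs)

model = SimpleLearnerModel()
model.update(["Bayesian_inference", "Machine_learning"], engaged=True)
print(model.predict_engagement(["Bayesian_inference"]))  # -> 0.666...
```

Because each topic's posterior is an interpretable pair of counts, the estimated knowledge state can be inspected and explained to the learner directly, which is the property the TrueReason Assistant builds on.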
TrueReason Assistant attempts to build on transparent models such as TrueLearn while broadening the scope of the application to address issues which include:
Learner onboarding: When a new learner begins to use the personal recommender, a new model will need to be initialised to represent the learner’s knowledge state as accurately as possible to provide relevant recommendations early on. In this case it is useful to elicit learners’ background knowledge in a more direct way than observing engagement and use an alternative recommendation strategy until TrueLearn has a robust estimate from engagement events (Bulathwela et al., 2021b).
Learning goals: To assist the learner beyond recommending novel and engaging educational content, it is useful to base interactions on a basic situation model which not only keeps track of learning goals but also allows the learner to monitor and steer their own progress by providing relevant information and feedback about knowledge components and their own knowledge state. This augments learners’ agency and allows them to take control of their learning trajectory instead of being passive receivers of recommendations (Reicherts et al., 2022) – see Figure 1.
Content ontology: In contrast to traditional courses designed for shorter-term educational scenarios, lifelong learning supported by open educational resources has to contend with a more diverse and rapidly changing set of resources, which are likely not designed to compose in the way lectures do in a more formal setting. However, being able to inform learners about the structure of and relations between materials could assist them on their learning path (Ilkou et al., 2021).
Knowledge review: Appropriate question generation models (Bulathwela et al., 2023) could be used to allow learners to test their knowledge or serve as a source of information to initialise or verify the learner knowledge state estimated by TrueLearn (see the illustrative sketch after this list).
Knowledge gaps: Because the available set of educational resources is not necessarily designed to be comprehensive, and because a formal educator is absent, learners may find that knowledge gaps persist after engaging with materials or that they need clarity on unfamiliar concepts. These gaps could be addressed by world knowledge contained in a large language model or by methods for generating analogical explanations (Sourati et al., 2024) which rely on learners’ background knowledge.
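As a concrete illustration of the knowledge-review idea above, the sketch below generates a question from a highlighted answer span using an off-the-shelf text-to-text model. The checkpoint name and the highlight-based input format are assumptions (any T5-style question-generation model would serve); this is not the specific models of Bulathwela et al. (2023).

```python
from transformers import pipeline

# Checkpoint name is an assumption: any T5-style question-generation model
# fine-tuned with answer highlighting (<hl> ... <hl>) would work here.
qg = pipeline("text2text-generation", model="valhalla/t5-small-qg-hl")

passage = (
    "generate question: The <hl> mitochondrion <hl> is the organelle "
    "that produces most of the cell's supply of ATP."
)
question = qg(passage, max_new_tokens=32)[0]["generated_text"]
print(question)  # e.g. "What organelle produces most of the cell's ATP?"
```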
Figure 2 summarises the components and interactions of TrueReason Assistant.
Innovation and Industry Cooperation Potential
The macroproject saw significant industrial involvement, particularly through collaborations with key industrial partners Fortiss and Tilde. Both organizations played an instrumental role in contributing to the project’s objectives. Fortiss provided valuable insights and technological suggestions, enhancing the project’s technical framework, while Tilde brought its expertise in language technology solutions, helping to shape and refine the project’s deliverables. The active participation of these industrial partners not only ensured the project remained aligned with market needs but also facilitated the potential integration of cutting-edge technologies into the project’s workflow.

OptimalAI played a key role in assessing the business value of the project by conducting thorough market analysis and identifying potential customers who could benefit from the solutions being developed. Its efforts focused on understanding how the project’s outcomes could be positioned to meet specific industry needs and deliver tangible value. OptimalAI also explored possibilities for further development, including suggesting the creation of a web application to showcase the interface component and enhance accessibility and usability for end-users. This web app concept aimed to provide a seamless interface for interacting with the project’s solutions, making it easier for businesses to integrate and leverage the technology in their operations.
Tangible Outcomes
Publications
Wu, Z., Suraworachet, W., Uzun, Y., Hao, X., Cukurova, M., Pérez-Ortiz, M., & Bulathwela, S. (2024). Leveraging Artificial Intelligence to Increase Higher Education Stakeholders’ Awareness of Sustainable Development Goals. Sustainability (Under Review).
Li, Z., Cukurova, M., & Bulathwela, S. (2024). A Novel Approach to Scalable and Automatic Topic-Controlled Question Generation in Education. In Learning Analytics & Knowledge Conference. ACM (Under Review).
Fawzi, F., Balan, S., Cukurova, M., Yilmaz, E., & Bulathwela, S. (2024). Towards Human-Like Educational Question Generation with Small Language Models. In International Conference on Artificial Intelligence in Education (pp. 295-303). Cham: Springer Nature Switzerland.
Bulathwela, S., Pérez-Ortiz, M., Holloway, C., Cukurova, M., & Shawe-Taylor, J. (2024). Artificial Intelligence Alone Will Not Democratise Education: On Educational Inequality, Techno-Solutionism and Inclusive Tools. Sustainability, 16(2), 781.
Qiu, Y., Djemili, K., Elezi, D., Shalman, A., Pérez-Ortiz, M., Yilmaz, E., Shawe-Taylor, J. & Bulathwela, S. (2024). A Toolbox for Modelling Engagement with Educational Videos, In Proceedings of the AAAI Conference on Artificial Intelligence.
Fawzi, F., Amini, S., & Bulathwela, S. (2023). Small Generative Language Models for Educational Question Generation. In Proceedings of the NeurIPS Workshop on Generative Artificial Intelligence for Education (GAIEd), New Orleans, LA, USA (Vol. 15).
Wu, Z., Bulathwela, S., & Koshiyama, A. (2023). Towards Auditing Large Language Models: Improving Text-based Stereotype Detection. In Proceedings of the NeurIPS workshop on Socially Responsible Language Modelling Research.
Bulathwela, S., Muse, H., & Yilmaz, E. (2023). Scalable Educational Question Generation with Pre-trained Language Models. In International Conference on Artificial Intelligence in Education (pp. 327-339). Cham: Springer Nature Switzerland.
Shawe-Taylor, J., & Dignum, F. (2024). Human-centric AI and Education. Journal of Artificial Intelligence for Sustainable Development, 1(1), 8-11. DOI: 10.69828/4d4k91. https://projecteuclid.org/journalArticle/Download?urlId=10.69828%2F4d4k91
Presentation of the microproject at the HumaneAI-Net project meeting, April 9-10, 2024, Umeå, Sweden
Collaboration Outside the Consortium (in particular with AIoD etc)
During the course of the microproject, the team engaged in several discussions with Marco Hirsch (DFKI), aiming to explore potential collaboration opportunities. These conversations were focused on leveraging the AI-on-demand platform provided by the AI4Europe initiative (https://www.ai4europe.eu/). The team sought to understand how the platform could support their project goals and foster further collaboration, with the intention of integrating resources and expertise available through the portal to enhance the project’s outcomes. The main idea was to move forward and ingest educational materials from the AIoD portal.
Macro-project WP3 and WP4
Benchmarking and Analysis of Human-LWM Interaction
Aalto University, IST University of Lisbon, DFKI, fortiss, Eötvös Loránd University (ELTE), LMU Munich, Start2 Group, Örebro University, Umeå University, Fraunhofer IAIS
Topics
Development of a joint web benchmark for collaborative AI.
https://humane-ai.dice.aalto.fi/
Tasks
1) UI design, SW development (Aalto)
2) AI Builder integration (Fraunhofer)
3) Tasks (Örebro, ELTE, IST)
4) Metric definition (Umeå, Örebro)
5) Company outreach (Start2)
6) Academic outreach (LMU, DFKI)
Macro-project WP4 and WP5
Metrics for ethics – Evaluating and Integrating Ethics, Legal and Societal (ELS) values in advanced systems
INESC-TEC, BSC, ING, VUB, Pisa, UMU, Luleå University, Northeastern Uni, CNR, Kaiserslautern, Airbus, Université Paris-Saclay, TU Delft, California State University
This macro-project aims to address the integration of Ethics, Legal, and Societal
(ELS) values in the design, development, and utilisation of novel AI approaches,
including generative AI, Large Language Models (LLMs), and hybrid human-AI
systems. The project underscores the necessity of a multidisciplinary approach,
combining formal, empirical, and prototyping methods to reflect AI’s diverse and
evolving field. The focus is on creating a comprehensive framework encapsulating
ELS values in AI, emphasising the need for ongoing adaptation and refinement as AI
technologies evolve.
The macro-project is structured as a growing combination of building blocks, each
contributing towards a holistic understanding and practical application of ELS
principles in AI. The primary focus is on developing metrics for ethics that provide
tangible, actionable insights into the ethical dimensions of AI systems.
Topic 1: Methods and Tools for Evaluating AI Impact
This topic explores the development of methods and tools that evaluate the impact of
AI systems from a holistic and context-specific perspective. The goal is to measure
various ELS aspects, such as fairness, bias, privacy, robustness, and transparency,
in an integrated manner. A key deliverable is the design of an integrated prototype dashboard. This dashboard will feature an interface with real-time metrics, visualisations, and contextual information to monitor multiple dimensions of AI ethics. The dashboard will enable setting thresholds and conducting risk assessments, ensuring data security and scalability.
Topic 2: Critical Multidisciplinary Studies
This topic focuses on the critical and comprehensive understanding of the ELS
implications of evolving AI systems, especially Large Multimodal Models (LMMs) and hybrid human-AI models. It includes the evaluation of international governance
approaches from the perspective of suitable implementations and understanding the
environmental impact of AI infrastructure. Another significant aspect is the
exploration of approaches for developing sustainable AI practices, considering the
long-term implications of AI technologies on society and the environment. Last but
not least, a practical instantiation of these principles and methods on an AI-based
software product (Misinformation Detection) enables practitioners to maintain strong relevance and actionability of the recommendations.
This macro-project emphasises the importance of an adaptive, continually evolving
approach to integrating ELS values in AI. It acknowledges that as AI technologies
develop, so too must our methods and understanding of their ethical, legal, and
societal impacts. This project is poised to contribute significantly to the field of AI
ethics, offering practical tools and deep insights into the responsible development
and use of AI technologies.
The call for contributions for this macro-project closed at the end of November 2023 and was followed by several key events and milestones. First, an Integration Workshop for Topic 1 was held in April 2024 in Umeå. This workshop primarily
focused on developing a prototype dashboard integrating various metrics. The
culmination of the efforts in the macro-project was presented at a final workshop at
HHAI 2024, at the end of June at INESC TEC (Porto). During this meeting,
participants shared insights and challenges of their work on macro-project 4 –
“Metrics for Ethics”, and included a roundtable that brought together members from
the European project HumanE-AI and invited companies such as NOS and Feedzai.
The “Trustworthy Assessment for Companies” questionnaire, integrated into the
dashboard, was debated in the roundtable. The discussion yielded suggestions for improving the dashboard and the questionnaire, along with reflections for companies regarding access and development, the need for incentives to motivate better practices, and the consequences for a brand that does not adopt an ethical approach in its systems.
Dashboard For Ethical And Societal Principles
A main component of the WP5 macro-project is the dashboard for monitoring and
analysing ethical and societal principles relevant to practical applications. This
resulted in an integrated framework for evaluating the ethics of Artificial Intelligence
(AI) systems, with a specific focus on high-risk applications such as creditworthiness
assessment. The research addresses the critical ethical dimensions of AI, including
explainability, fairness, robustness, and transparency, which are essential for
ensuring that AI systems operate in a manner that is both effective and ethically
sound.
The ethical evaluation of AI systems is crucial, particularly in domains where
decisions can significantly impact individuals’ lives, such as finance. Explainability is
highlighted as a key aspect, referring to the ability of AI systems to make their
operations and outcomes understandable to humans. This is especially important in
financial applications, where AI-driven decisions can affect creditworthiness and
access to financial resources. The study also emphasises the importance of fairness
in AI, particularly in avoiding biases that could lead to discriminatory outcomes.
Another domain with potential individual and societal impact is misinformation, for which we address possible information bias and its consequences for the trustworthiness of information sources using the RUWA dataset.
Methodology
The research uses the German Credit Dataset (GCD) as a case study to evaluate the
ethical dimensions of AI models. This dataset includes sensitive attributes like age
and gender, which can introduce bias into AI decision-making processes. To address
this, the AI Fairness 360 (AIF360) library is employed to assess and mitigate bias in
the data and models.
The study employs logistic regression and decision tree models as the primary classifiers and evaluates them using various fairness metrics such as the Disparate Impact Ratio (DIR) and Smoothed Empirical Differential Fairness (SEDF). The findings reveal inherent biases in the dataset, particularly favouring older individuals, which highlights the need for careful bias management to ensure fair AI outcomes.
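A minimal sketch of this measurement step, assuming the AIF360 library and its built-in German Credit loader (which expects the raw german.data file to be present in AIF360's data directory):

```python
from aif360.datasets import GermanDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# By default 'sex' and 'age' are the protected attributes; AIF360 encodes
# the privileged group (applicants older than 25) as age = 1.
data = GermanDataset()
privileged = [{'age': 1}]
unprivileged = [{'age': 0}]

metric = BinaryLabelDatasetMetric(data,
                                  unprivileged_groups=unprivileged,
                                  privileged_groups=privileged)
print("Disparate Impact Ratio:", metric.disparate_impact())
print("SEDF:", metric.smoothed_empirical_differential_fairness())

# One possible mitigation step: reweigh training instances so the
# favourable outcome becomes statistically independent of the protected
# attribute before fitting the logistic regression / decision tree models.
rw = Reweighing(unprivileged_groups=unprivileged,
                privileged_groups=privileged)
data_transf = rw.fit_transform(data)
```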
In addition to addressing bias and fairness issues, the research also emphasises the importance of explainability, robustness, and transparency in AI systems. To explain credit rejection, a questionnaire was first used to identify the most effective method. The questionnaire involved 34 participants, who were presented with three cases of credit rejection analysed using LIME, SHAP, and Counterfactual methods. The participants were asked which method produced the most understandable results and which provided the most reliable explanation. Based on those results, clustering was then applied to all rejection cases to identify the feature groups that contribute most to the rejection.
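The sketch below illustrates the second step, pairing per-case SHAP attributions with clustering of the rejected cases. Synthetic data stands in for the preprocessed German Credit features, and the label encoding (class 0 = rejection) is an assumption made for illustration.

```python
import numpy as np
import shap
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed German Credit features; in the
# actual study these would come from the (reweighed) AIF360 dataset.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# SHAP attributions for every test case (exact for a linear model).
explainer = shap.Explainer(clf, X_train)
shap_values = explainer(X_test)

# Cluster the SHAP vectors of rejected cases to surface groups of features
# that jointly drive rejections (class 0 = rejection is an assumption).
rejected = clf.predict(X_test) == 0
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    shap_values.values[rejected])
print(np.bincount(clusters))  # sizes of the rejection clusters
```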
For robustness, a metric was developed to reward consistency between performance under adversarial attacks and performance with real data. Transparency, being a critical ethical requirement for trustworthy AI, has gained increased attention since the approval of the AI Act. The approach to transparency focuses on two key areas: datasets and AI systems, as outlined in the work of Hupont et al.
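The exact robustness formula is not given here; the following is one plausible formalisation of "rewarding consistency" between clean and adversarial performance, stated purely as an assumption:

```python
def robustness_score(acc_clean: float, acc_adv: float) -> float:
    """One plausible formalisation of the consistency idea: the score is 1
    when adversarial accuracy matches clean accuracy and decays towards 0
    as the gap grows. The report does not specify the exact formula, so
    this is an illustrative assumption.
    """
    if acc_clean == 0.0:
        return 0.0
    gap = max(acc_clean - acc_adv, 0.0)
    return 1.0 - gap / acc_clean

print(robustness_score(0.85, 0.80))  # -> ~0.941: small gap, high robustness
print(robustness_score(0.85, 0.40))  # -> ~0.471: large gap under attack
```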
Principles Considered In The Study
– Explainability: The dashboard includes tools that provide post-hoc explanations
for AI decisions and visualisation and interpretation aids. These features help
users understand AI-driven outcomes, enhancing transparency and trust in the
system. The choice of the post-hoc method for explanations was based on the
results of a questionnaire, as described in the methodology.
– Fairness: The dashboard integrates fairness metrics, such as Disparate Impact
Ratio (DIR) and Smoothed Empirical Differential Fairness (SEDF). These tools assess and mitigate biases within the AI system, ensuring that it does not
disproportionately disadvantage any group, particularly those based on sensitive
attributes like age and gender.
– Robustness: The dashboard offers frameworks and tools to simulate adversarial
attacks and assess the AI system’s robustness. It includes features for comparative analysis to evaluate how well different models perform under stress or manipulation, ensuring consistent and reliable performance.
– Transparency: The dashboard includes transparency self-assessment tools that
allow users to evaluate the transparency of their AI systems and datasets. These
mechanisms ensure that the inner workings of AI systems are accessible and
understandable to stakeholders, supporting accountability and ethical standards.
– Trustworthiness: The dashboard provides comprehensive assessments
combining accuracy, robustness, and fairness evaluations. It includes continuous
monitoring tools to maintain the trustworthiness of AI systems over time, ensuring
that they operate reliably and ethically.
– Legal Compliance: The dashboard incorporates features that align AI metrics
with the requirements of the EU AI Act and other relevant legal standards. It
includes tools for documenting and justifying design choices supporting legal
compliance in developing and deploying AI systems.
Dashboard
To support the practical application of these ethical considerations, the study
introduces the dashboard “Metrics for Responsible AI Principles.” This dashboard
integrates various methodologies and tools to assess the ethical dimensions of AI
systems. It serves as a practical tool for users to understand the potential and
limitations of using metrics to evaluate compliance with ethical and legal standards,
particularly in the context of the AI Act.
This comprehensive approach to evaluating AI ethics in high-risk applications
provides valuable insights into the challenges and solutions for developing AI
systems that are fair, transparent, and robust.
Figure 1 illustrates the first page of the dashboard for the German Credit dataset
study. On the left side, users can select options related to the model, sensitive
attributes, and bias metrics. Results for Bias, Fairness, Explainability, and
Robustness are displayed immediately. Additional information on each dimension,
including legal aspects where applicable, is also provided.
Conclusion
The study provides a detailed analysis of the performance and robustness of various
AI models. It is observed that there is often a trade-off between accuracy and
robustness, with some models losing significant accuracy when subjected to
adversarial conditions. The research highlights the need for a balanced approach
that ensures high performance and strong resilience against adversarial attacks.
Legal implications are also a significant focus of the study, particularly in the context
of the EU AI Act, which sets rigorous standards for high-risk AI systems.
The research explores how the metrics used in the evaluation framework can help ensure compliance with these legal requirements, including transparency, fairness, and robustness. While metrics alone do not guarantee legal compliance, they are
essential tools for measuring and demonstrating adherence to legal standards.
The study concludes by advocating for using multiple metrics to assess AI systems
from various ethical perspectives. The research emphasises that developing ethical
AI systems is a complex process that requires careful consideration of trade-offs
between different ethical principles. The findings contribute to the broader
understanding of how to design and implement AI systems that are effective and
aligned with ethical and legal standards.
Macro-project WP2 and WP3
Co-evolution of large-scale networked and collaborative human-AI ecosystems
Università di Pisa, University of Warsaw, CNR, Università di Trento, SNS, Victoria University Wellington, IIT-CNR
The objective of this macro-project is
to study the co-evolution of humans and models, especially LLMs/LGMs, in large-scale humane-AI ecosystems
to understand and improve task allocation/role distribution
to model emergent behavior and network effects in the interaction between humans and generative AI models
The macro-project includes 3 main topics:
1. HABA-MABA-inspired collaboration and task delegation in human-AI ecosystems.
This topic is divided into two subtopics:
a. LLMs and Cognition: From Representation Biases to a Theory of the Mind.
While LLM foundation models can handle language and imagery at unprecedented breadth, recent research has underlined that they may exhibit many cognitive biases humans possess. Therefore, widespread interaction with biased LLMs may reinforce harmful stereotypes that we should aim to eradicate. This risk is augmented by the fact that most present-day LLM users come from various backgrounds and are not trained to be critical consumers of LLM-produced content, in contrast to earlier human-automation interaction. Therefore, to facilitate responsible human-AI interaction that mitigates the risk of exacerbating harmful stereotypes, it is ever more critical to understand how cognitive biases emerge from the cognitive architecture of LLMs.
To this end, our work focused on two directions: i) defining a model to proxy LLM cognition patterns using cognitive network science theory [1], and ii) studying the emerging patterns that characterize LLM agents’ interactions while simulating an opinion dynamics process. Both analyses focused on comparing LLM cognitive model instances and opinion dynamics behaviors with those known to approximate human beings.
In [2], we focused on constructing and analyzing a free association network built on top of a psychological experiment involving heterogeneous LLM models. Word associations have been extensively used in psychology to study the rich structure of human conceptual knowledge. The absence of large-scale LLM-generated word association norms comparable with human-generated norms limits the comparative analyses that can be conducted. To overcome this, we create LLM-generated word association norms modeled after the Small World of Words (SWOW) human-generated word association norms consisting of over 12,000 cue words. We prompt the language models with the same cues and participant profiles as those in the SWOW human-generated norms [3], and we conduct comparative analyses between humans and LLMs that explore differences in response variability, biases, concreteness effects, and network properties. Our exploration provides insights into how LLM-generated word associations can be used to investigate similarities and differences in how humans and LLMs process information. Human-generated responses are much richer and more varied than LLM ones. We also observe stronger gender biases and weaker concreteness effects in the LLM-generated norms compared to the human-generated norms.
In [4], conversely, we focused on implementing an opinion dynamics simulation by exploiting networked LLM-enhanced agents to replicate human-like discussions. Assuming a Deffuant-like [5] bounded opinion model (a reference implementation is sketched after the reference list below), our analyses unveiled a tendency – transversal to heterogeneous LLM models – to converge toward the positive end of the opinion spectrum. Such a result – explicable by the assertiveness and guardrails that are known to characterize LLM models – marks a noteworthy difference from what is observed and often modeled in human-centered opinion diffusion phenomena. Moreover, an analysis of the texts generated by the LLM agents during the unfolding discussion – each aiming to convince the interlocutor to change its opinion – unveiled a relevant adoption of logical fallacies (particularly credibility, relevance, and generalization ones) and assessed their effectiveness in producing subtle shifts in agents’ opinions.
[1] Stella, M., Citraro, S., Rossetti, G., Marinazzo, D., Kenett, Y. N., & Vitevitch, M. S. (2022). Cognitive modeling with multilayer networks: Insights, advancements and future challenges. arXiv preprint arXiv:2210.00500.
[2] Abramski, K., Lavorati, C., Rossetti, G., & Stella, M. (2024). LLM-Generated Word Association Norms. In HHAI 2024: Hybrid Human AI Systems for the Social Good (pp. 3-12). IOS Press.
[3] De Deyne, S., Navarro, D. J., Perfors, A., Brysbaert, M., & Storms, G. (2019). The “Small World of Words” English word association norms for over 12,000 cue words. Behavior Research Methods, 51, 987-1006.
[4] Cau, E., Morini, V., Rossetti, G. (2024). LLM Opinion Dynamics: Polarization Trends, Models’ Agreeableness, and Logical Fallacies. Under submission.
[5] Deffuant, G., Neau, D., Amblard, F., & Weisbuch, G. (2000). Mixing beliefs among interacting agents. Advances in Complex Systems, 3(01n04), 87-98.
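For reference, a compact implementation of the bounded-confidence model of Deffuant et al. [5], which the LLM-agent simulation in [4] takes as its modelling baseline (there, the numeric opinion update is replaced by text-mediated persuasion between LLM agents):

```python
import random

def deffuant(n_agents=100, epsilon=0.3, mu=0.5, steps=20000, seed=0):
    """Classic Deffuant et al. (2000) dynamics: two random agents interact
    and move their opinions closer only if they differ by less than the
    confidence bound epsilon."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n_agents)]  # opinions in [0, 1]
    for _ in range(steps):
        i, j = rng.sample(range(n_agents), 2)
        if abs(x[i] - x[j]) < epsilon:
            shift = mu * (x[j] - x[i])  # computed from pre-update values
            x[i] += shift
            x[j] -= shift
    return x

opinions = deffuant()
print(min(opinions), max(opinions))  # opinion clusters form within epsilon
```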
b. Task allocation and role distribution in humane-AI ecosystems.
We examine the evolving landscape of human-AI ecosystems, focusing on task allocation and role distribution within Human-AI Teams (HATs). Theoretical foundations, empirical results, and computer simulation-based findings are presented. Building on recent empirical findings, it is highlighted that HATs have surpassed human-only teams in complex tasks such as crisis management resource allocation. The chapter explores the general tendencies in task allocation and role assignment, and the dynamics of task allocation.
Trust is the main factor affecting task allocation. Trust depends, among other factors, on the explainability of AI agents, suggesting that the ability to articulate reasoning, environmental understanding, and plans enhances trust in AI partners. Studies have shown that while likeability (warmth) influences the selection of team members, AI agents’ perceived competence plays a more significant role in their acceptance and integration into human teams. Shared mental models and effective communication strategies are vital for fostering receptivity and trust in AI teammates. Proactive communication by AI agents has been identified as particularly beneficial for team coordination and situation awareness. Task characteristics also influence whether tasks are allocated to humans or to AI agents.
Special attention is paid to the delegation of information processing in HATs. The dynamically adjusted trust provides the basis for dynamic task allocation. Effectively, HAT becomes a self-organizing, optimizing distributed information processing system.
The chapter ends with a discussion of synthesizing research findings to propose strategies for optimizing collaboration within HATs, underscoring the importance of competence, communication, shared understanding, and explainability in the design and implementation of AI agents in team-based settings.
Output: Segram, a Python package for narrative analysis
The purpose of the package is to provide tools for automated narrative analysis of text data focused on extracting information on basic building blocks of narratives – agents (both active and passive), actions, events, or relations between agents and actions (e.g. determining subjects and objects of actions), as well as descriptions of actors, actions and events.
The proposed framework is aimed at language understanding and information extraction, as opposed to language generation. Namely, the role of the package is to organize narrative information in convenient data structures allowing effective querying and deriving of various statistical descriptions. Crucially, thanks to its semi-transparent nature, the produced output should be easy to validate for human users. This should facilitate the development of shared representations (corresponding to the WP1 and WP2 motivated goal: “Establishing Common Ground for Collaboration with AI Systems”) of narratives, understandable for both humans and machines, that are at the same time trustworthy (by being easy to validate for humans). This is arguably a desirable feature, for instance, compared to increasingly powerful but hard-to-trust large language models. In particular, the package should be useful for facilitating and informing human-driven analyses of text data.
The package is available at Python Package Index (PyPI) and comes with a full-fledged documentation website at ReadTheDocs platform. The source code is distributed under a permissive MIT license and available at Github.
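The snippet below is not Segram’s actual API; it is a bare-bones illustration, using spaCy’s dependency parse directly, of the kind of agent/action/patient information the package organises into queryable narrative data structures:

```python
import spacy

# Minimal illustration of narrative-element extraction: for each verb,
# recover its agents (syntactic subjects) and patients (direct objects).
nlp = spacy.load("en_core_web_sm")
doc = nlp("The villagers rebuilt the bridge after the flood destroyed it.")

for token in doc:
    if token.pos_ == "VERB":
        agents = [c.text for c in token.children if c.dep_ == "nsubj"]
        patients = [c.text for c in token.children if c.dep_ in ("dobj", "obj")]
        print(token.lemma_, "| agents:", agents, "| patients:", patients)
# rebuild | agents: ['villagers'] | patients: ['bridge']
# destroy | agents: ['flood'] | patients: ['it']
```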
2. Medium- and long-term social impact of Large Language Models and Generative AI.
Large Generative Models (LGMs), like Large Language Models (LLMs), have gained significant influence across diverse domains. However, the growing awareness of potential biases and unfairness in their outcomes raises concerns about the risk of reducing content diversity. This chapter delves into recent research examining the repercussions of recurrent training on AI-generated data, focusing on image generation and LLMs. A specific concern explored is model collapse, a degenerative process affecting generations of learned generative models. This phenomenon results in generated data contaminating the training set for subsequent model generations. Additionally, we explore a proposed simulation framework aimed at investigating the impact of LGMs on critical aspects such as language standardization and diversity.
We developed a comprehensive simulation framework to investigate the dynamics of a self-consuming loop, a process in which a large language model (LLM) undergoes fine-tuning across multiple generations using content it has generated itself. Our framework is built upon Llama, an open-source LLM provided by Meta.
To evaluate the effects of this self-consuming loop, we conducted a series of experiments focusing on the generation of text related to Wikipedia articles. The experiments yielded two significant findings. First, we observed that over successive generations, LLMs subjected to the self-consuming loop exhibit a phenomenon known as “model collapse.” This collapse manifests as a reduction in the diversity of the generated text over time, a result that aligns with recent research findings in the field.
Second, our analysis revealed substantial changes in the linguistic structure of the generated text as the self-consuming loop progresses. Specifically, we noted significant alterations in the frequency of nouns, adjectives, and verbs, alongside shifts in the overall distribution of word frequencies when compared to authentic text. These changes suggest that the loop not only affects content diversity but also distorts the underlying linguistic patterns of the text.
These findings underscore the critical importance of carefully curating and selecting content for fine-tuning LLMs. Neglecting this aspect could lead to undesirable degradation in both the diversity and structural integrity of the generated text, ultimately impacting the model’s performance and reliability.
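Two simple proxies for the reported effects – lexical diversity and the part-of-speech frequency profile – can be computed per generation as sketched below. These are illustrative measures under our own assumptions, not the exact metrics used in the experiments:

```python
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def diversity_stats(text: str) -> dict:
    """Simple proxies for the two reported effects: lexical diversity
    (type-token ratio) and the POS frequency profile."""
    doc = nlp(text)
    tokens = [t.text.lower() for t in doc if t.is_alpha]
    pos = Counter(t.pos_ for t in doc if t.is_alpha)
    total = sum(pos.values())
    return {
        "type_token_ratio": len(set(tokens)) / max(len(tokens), 1),
        "pos_profile": {k: v / total for k, v in pos.items()},
    }

# In the experiments, generation g+1 is fine-tuned on text produced by
# generation g; comparing these statistics per generation against the
# authentic Wikipedia text exposes the collapse.
stats_real = diversity_stats("Authentic Wikipedia paragraph goes here ...")
stats_gen5 = diversity_stats("Text produced by the 5th-generation model ...")
print(stats_real["type_token_ratio"], stats_gen5["type_token_ratio"])
```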
3. Network effects of human-AI interactions in distributed collaborative learning.
Human-AI interactions are becoming increasingly prevalent in distributed settings characterized by a network of nodes, comprising both humans and AI agents. The ongoing activities in this area are focusing on understanding the implications of these interactions, particularly considering that AI agents can be generative agents (LGMs).
In the realm of decentralized learning, where nodes cooperate to learn a task without the help of a central server, significant progress has been made. Currently, some nodes generate their local data using generative AI (LGMs). This raises critical research questions:
How robust is a decentralized learning process to accidental or intentional malicious behavior?
Can the network structure protect the overall training process from such unwanted behavior? For instance, if the most central node learns from corrupted data—such as samples of the digit ‘9’ that look like ‘4’s—how does this affect the learning process? Preliminary findings indicate that decentralized learning is generally robust to single-point malicious data injection, unless the data distribution is pathologically skewed.
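An assumption-level toy version of this setting is sketched below: five nodes gossip-average logistic-regression weights on a ring while one node relabels every ‘9’ as a ‘4’. The topology, model, and dataset are illustrative choices, not the project’s actual setup.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

# Toy decentralized-learning sketch: 5 nodes on a ring average their model
# weights with neighbours; node 0 is malicious and relabels '9' as '4'.
X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)
shards = np.array_split(rng.permutation(len(X)), 5)

y_local = [y[s].copy() for s in shards]
y_local[0][y_local[0] == 9] = 4               # single-point data poisoning

nodes = [SGDClassifier(loss="log_loss", random_state=0) for _ in range(5)]
for m, s, yl in zip(nodes, shards, y_local):
    m.partial_fit(X[s], yl, classes=np.arange(10))  # initialise weights

for _ in range(20):                            # gossip rounds on the ring
    coefs = [m.coef_.copy() for m in nodes]
    intercepts = [m.intercept_.copy() for m in nodes]
    for i, m in enumerate(nodes):
        nbrs = [i, (i - 1) % 5, (i + 1) % 5]
        m.coef_ = np.mean([coefs[j] for j in nbrs], axis=0)
        m.intercept_ = np.mean([intercepts[j] for j in nbrs], axis=0)
    for m, s, yl in zip(nodes, shards, y_local):
        m.partial_fit(X[s], yl)                # one local pass per round

# In line with the preliminary finding above, honest nodes typically stay
# accurate on the true labels despite the poisoned neighbour.
print([round(m.score(X, y), 3) for m in nodes])
```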
In the domain of opinion dynamics, models have shown that opinion polarization can be driven by “negative” algorithmic bias. However, ongoing experimental interventional studies provide evidence that “positive” algorithmic bias may backfire. To explore this further, an experiment is underway to study the effects of positive, negative, and absent algorithmic bias (AB) on opinion polarization, as well as the impact of AI units (bots).
We evaluate the initial opinion distribution through a survey, have participants write texts supporting their views, and expose them to texts from like-minded individuals (negative AB), opposing individuals (positive AB), or random individuals (absent AB).
After incentivized interactions, a follow-up survey measures changes in opinion distribution and individual trajectories.
AI bots with predetermined opinions are introduced to assess their effectiveness in influencing humans and perturbing opinion distribution.
Initial results are promising and indicate nuanced interactions between algorithmic bias and human opinion formation.
Furthermore, the macro-project is actively investigating the generation of content to maximize propagation effects on social networks.
This involves studying user behavior models (e.g., BCM, FJ), large generative models (e.g., Llama, Gemma, Mistral), and discussion topics (e.g., vaccines, Obamacare) to understand their impacts on user behavior and opinion dynamics.
Early findings suggest that content generated to maximize propagation can significantly influence user behavior and opinion distribution, underscoring the power of generative models in shaping online discourse.
Code repository: the code cannot be shared at the moment due to the anonymity requirements of a submitted paper; it will be shared upon notification.
Macro-project WP3 Human AI Collaboration and Interaction
Collaborative AI Arena
Aalto University, IST University of Lisbon, DFKI, fortiss, Eötvös Loránd University (ELTE), LMU Munich, Start2 Group, Örebro University, Umeå University, Fraunhofer IAIS
Original Aims
Overview: This macroproject converged efforts in WPs 1, 2, and 3 in HumaneAI-Net for Y3. The high-level goal was to advance the theory and practice of interacting with LWMs (large whatever models, including language models, multimodal foundation models, etc.). The macroproject aimed to create a joint benchmark and first results to push research on the capabilities of LWMs in interaction with humans.
Background: Common ground between humans and AI systems has been identified as a key challenge for the next generation of human-AI collaboration. Different HumanE AI Net WPs have been running MPs on different aspects of this problem: from the HCI angle of interactive grounding in WP3, through the interplay of perception and common ground in WP2, to learning with and from narratives in WP1. However, these efforts have not yet converged into something that the rest of the field could build on and that would touch on efforts at developing LWMs. The challenge is not only technical: agreement must also be established on how to define the joint problem.
Opportunity: Given that LWMs are essentially models of linguistic and other media capturing reality, they promise a paradigm change in all three areas. LWMs also have the potential to be a unifying factor for the different streams of work within the project. They offer unprecedented abilities (e.g., in-context learning) for addressing issues related to grounding, shared perception, and coadaptation. Yet their abilities have been found to be subtly unhuman-like in tasks such as reasoning. The question stands as to what their limits are in collaborative tasks.
Objective: This macroproject aimed to create a common web-based benchmark to support multidisciplinary efforts at studying collaboration with LWMs. The benchmark was designed to involve tasks with two or more agents (human and AI) in a joint activity whose progress requires grounding, perception, and coadaptation. The macroproject was designed to contain:
A concrete definition of the problem implemented as an interactive game-like setup that allows running both AI and human agents
A software API that allows training AI agents in a standard way (a minimal illustration is sketched after this list)
Theories of interaction and empirical studies with human participants
Interaction techniques, assistive AI and other facilitatory techniques to help humans in this task
First results from state-of-the-art methods
A community-crossing effort to draw attention to the benchmark
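As an illustration of what such a standard API could look like (the names and shapes below are our assumptions, not the Arena’s actual interface), a reset/step convention accommodates both scripted and LWM-backed agents:

```python
from dataclasses import dataclass, field
from typing import Protocol

# Illustrative sketch only: the Arena's real API may differ. The familiar
# reset/step shape lets any agent implementation be trained or evaluated
# uniformly on a joint collaborative task.

@dataclass
class Observation:
    board: list                               # current playfield state
    chat: list = field(default_factory=list)  # message history so far
    your_turn: bool = False

class Agent(Protocol):
    def act(self, obs: Observation) -> dict: ...

class CollaborativeTask(Protocol):
    def reset(self) -> Observation: ...
    def step(self, action: dict) -> tuple[Observation, float, bool]: ...

def run_episode(task: CollaborativeTask, agents: list[Agent]) -> float:
    """Alternate turns between agents until the task signals completion."""
    obs, total, done, turn = task.reset(), 0.0, False, 0
    while not done:
        action = agents[turn % len(agents)].act(obs)
        obs, reward, done = task.step(action)
        total += reward
        turn += 1
    return total
```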
Scientific Results/Insights Summary
The macroproject produced the first version of the Collaborative AI Arena, available at https://rse-test1.cs.aalto.fi/.
One of the tasks in the arena was designed with the classic tangram puzzle as inspiration. The core concept was to develop a collaborative creation task that could progressively increase in challenge for both the AI and the human partners while keeping the same core interaction and problem-solving framework. The task consists of placing, in turns, a set of tangram pieces in a playfield with the goal of achieving a designated figure. The goal figure can be fixed, as in the classic tangram, turning the task into puzzle-like collaborative problem solving, or open, defined by a word, a sentence, or a set of restrictions, enhancing the creative nature of the task. Between and during turns, players (human and AI) can chat about the task, discussing the goal and the process to achieve it. The task can also be varied by giving different pieces to different players, and even different sub-goals, to add a mixed-motive and hidden-profile flavour to the collaboration. We believe this constitutes a good framing of challenges for LWMs, as it demands conversational, spatial-reasoning and problem-solving skills, together with reaching common ground and joint creativity.

In the MP we defined the first version of the framework in the form of a web game that engages two players (AI and human) in the collaborative task of creating a tangram figure defined by a word (e.g., house), whose form needs to be discussed, agreed upon, and built jointly during the interaction. We also developed the first version of the AI model, based on ChatGPT, that plays the task with a human. It is able to talk about the task and play towards the creation of a figure that is defined by the initial word and constrained by the pieces placed on the playfield. A user study is planned for the near future to assess the collaborative capabilities of the AI model and the acceptance and subjective experience of the users interacting with it.
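A hedged sketch of how one AI turn in such a web game could be implemented is shown below. The deployed model is described above only as “based on ChatGPT”, so the model name, prompt wording, and JSON move format here are illustrative assumptions:

```python
import json
from openai import OpenAI

# Sketch of a single AI turn in the tangram task; model name, prompt and
# move format are assumptions made for illustration.
client = OpenAI()

SYSTEM = (
    "You are collaborating with a human to build a tangram figure for the "
    "word '{word}'. Each turn, either chat or place one of your remaining "
    'pieces. Reply as JSON: {{"say": str, "place": {{"piece": str, '
    '"x": int, "y": int, "rotation": int}} or null}}.'
)

def ai_turn(word, board, chat_history, remaining_pieces):
    messages = [
        {"role": "system", "content": SYSTEM.format(word=word)},
        {"role": "user", "content": json.dumps({
            "board": board,
            "chat": chat_history,
            "your_pieces": remaining_pieces})},
    ]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                       # assumed model name
        messages=messages,
        response_format={"type": "json_object"})   # force a JSON move
    return json.loads(resp.choices[0].message.content)

move = ai_turn("house", board=[],
               chat_history=["Human: let's start with the roof"],
               remaining_pieces=["large_triangle", "square"])
print(move["say"], move.get("place"))
```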
Emerging, context-constrained non-verbal communication between human and machine (avatar, robot) partners in diverse environments is a largely unexplored field. Recent advances suggest that LWM techniques, combined with extensive prompt engineering and visual information, can be sufficient to support such communication, as demonstrated at HHAI 2024.
Innovation and Industry Cooperation Potential
Non-verbal communication has potential applications in both industrial and medical domains. It can complement verbal communication for disambiguation, making it history-dependent and pragmatic. It may be necessary in noisy environments, e.g., in firefighting and disaster response for controlling and collaborating with robots and drones. In the medical field it can be used for diagnostic and therapeutic purposes, e.g., in the case of autism and language impairments.
Informing Startups and SMEs about the Tangram Project (Start2 Group)
Examples of measures to disseminate and trigger involvement of industrial organizations included:
Outreach from Start2 Group to startups as well as SMEs such as Parloa, Lengoo GmbH, Anticipate
The current status of the tangram project was presented by Antti Oulasvirta. A discussion about industry relevance was held
Tangible Outcomes
Publications
Rui Prada, Astrid C Homan, Gerben A van Kleef: “Towards Sustainable Human-Agent Teams: A Framework for Understanding Human-Agent Team Dynamics” in the proceedings of AAMAS’2024 – the 23rd International Conference on Autonomous Agents and Multiagent Systems – Blue Sky Ideas Track, pp. 2696-2700, May 6–10, 2024, Auckland, New Zealand. IFAAMAS.
Passant Elagroudy, Jie Li, Kaisa Väänänen, Paul Lukowicz, Hiroshi Ishii, Wendy E Mackay, Elizabeth F Churchill, Anicia Peters, Antti Oulasvirta, Rui Prada, Alexandra Diening, Giulia Barbareschi, Agnes Gruenerbl, Midori Kawaguchi, Abdallah El Ali, Fiona Draxler, Robin Welsch, Albrecht Schmidt: “Transforming HCI Research Cycles using Generative AI and “Large Whatever Models”(LWMs)” in the proceedings of CHI’2024 – Conference on Human Factors in Computing Systems – Extended Abstracts, pp. 1-5, May 11-16, 2024, Honolulu, Hawaiʻi. ACM.
Inês Lobo, Janin Koch, Jennifer Renoux, Inês Batina, Rui Prada: “When Should I Lead or Follow: Understanding Initiative Levels in Human-AI Collaborative Gameplay” in proceedings of DIS’2024 – ACM Designing Interactive Systems Conference, pp 2037-2056, July 1-5, 2024, Copenhagen, Denmark, ACM.
Helena Lindgren, Vera C. Kaelin, Ann-Margreth Ljusbäck, Maitreyee Tewari, Michele Persiani, and Ingeborg Nilsson. 2024. To Adapt or Not to Adapt? Older Adults Enacting Agency in Dialogues with an Unknowledgeable Digital Agent. In Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization (UMAP ’24), July 1–4, 2024, Cagliari, Italy. ACM, New York, NY, USA https://doi.org/10.1145/3627043.3659562
Vera C. Kaelin, Maitreyee Tewari, Sara Benouar, and Helena Lindgren. Developing Teamwork: Transitioning between stages in human-agent collaboration. To appear in Frontiers in Computer Science
Handbook chapters
János Adrián Gulyás, Miklós Máté Badó, Kristian Fenech, András Lőrincz
Demo paper (3 pages)
Keep Gesturing: A Game for Pragmatic Communication, Gulyás et al.
On July 29-30, the “AI & I Hackathon” brought together developers, AI enthusiasts, researchers, and ambitious students for a 1.5-day deep dive into co-creativity between humans and Large Language Models (LLMs) such as OpenAI’s GPT and Google’s Gemini. Held in collaboration with fortiss, Ludwig Maximilians University, Start2 Group, and DFKI, the hackathon aimed to leverage LLMs as creative collaborators rather than just passive tools, challenging participants to build systems that support human-LLM co-creation.
The two-day event was held in the fortiss office in Munich.
With 28 participants, the event welcomed a diverse international presence, with attendees from countries including the Netherlands, Sweden, Portugal, Italy, Poland, and more. Travel grants helped cover costs for those attending from outside Munich. Participants formed teams of up to four people and competed for top prizes, with 1st, 2nd, and 3rd place awards of €800, €400, and €200, respectively, plus cloud credits from OVH Cloud. Experts were available throughout the hackathon for consultation.
Highlights
Collaborative Systems for Co-Creation: Teams worked to develop applications that used LLMs as interactive, idea-generating partners in content creation. Using prompts and feedback mechanisms, the projects sought to showcase how LLMs can actively contribute to generating more engaging and innovative texts.
Inspiration Through Expert-Led Keynotes: Expert-led keynotes provided thought-provoking perspectives to set the stage. Dr. Joongi Shin (Aalto University) presented “Can LLMs Make People More Creative?” and Prof. Dr. Albrecht Schmidt (LMU) discussed “Symbiotic Creativity: The Fusion of Human Ingenuity and Machine Intelligence.” The agenda continued with Thomas Pfau (Aalto University) providing an overview of the technology stack and goals, equipping participants with the tools to start hacking.
Event Communication
To promote the event, the organizing team utilized LinkedIn channels, newsletters, and a dedicated event website to reach a broader audience.
Hackathon Task: Developing a Prototype for Collaborative Creativity with LLMs
For this hackathon, participants were tasked with creating a prototype system designed to enhance and evaluate the collaborative creative potential between users and Large Language Models (LLMs). The objective was to build tools that facilitate seamless, innovative co-creation in text-based projects while exploring and benchmarking human-AI creativity.
Available Resources
Participants were provided with access to advanced AI models, including GPT-4, as well as various tools and endpoints for building their prototypes. They could also use a pre-designed poetry generation example to kickstart their project. Teams were encouraged to experiment with different levels of difficulty, from straightforward prompt reformulations to more complex features such as automated prompt engineering, API integrations, and even building entirely independent applications.
Final Deliverables
Each team presented a 10-minute live demo to a jury board, showcasing their prototype, followed by a Q&A. Additionally, a demo link was submitted to allow the jury to explore the prototype independently and assess its functionality and creativity-promoting capabilities.
Feedback from participants
A small survey was sent out after the hackathon to gather immediate feedback and identify areas for improvement. Even though only 8 participants took part in the survey, the results provide valuable feedback for the organizing team. The following summarizes the main findings. (Detailed survey results can be accessed through this link: https://forms.gle/9Exe6RUP4njeGxCf8.)
Overall Satisfaction: Most respondents (75%) were highly satisfied with the event’s organization and technical support.
Positive Aspects: Attendees enjoyed the engaging tasks, inspiring keynotes, the collaborative atmosphere, and the focus on pre-built components, which allowed more time for creativity.
Areas for Improvement: Participants highlighted the lack of feedback on project ranking and unclear judging criteria. Some felt that basic AI knowledge should have been a prerequisite, and a few noted discomfort with the provided workspace and scheduling.
Skills Gained: Participants acquired various skills, including front-end development (React), full-stack development, prompt engineering, and multi-agent LLM-based architectures, along with teamwork and project collaboration experience.
Additional Feedback: The participants appreciated the event organization and thanked the organizers for providing support for travel and accommodation, which enabled broader participation.
These insights reflect a well-received event with some areas for logistical and evaluative refinement.
Key Takeaways
Future of LLMs in Collaborative Creativity: The hackathon showcased a new approach to AI, positioning LLMs as creative partners that respond to user input and inspire fresh ideas.
Practical Applications for Co-Creative Projects: Through prompts and interactive discussions, participants explored innovative ways to engage LLMs in generating creative text in a collaborative way, driving new possibilities for human-LLM partnerships.
Conclusion
The AI and I Hackathon provided an invaluable experience for participants to explore, learn, and innovate within the rapidly evolving space of human-AI collaboration. By leveraging the expertise of the partners fortiss, LMU, Start2 Group, and DFKI, this hackathon enabled developers, AI enthusiasts, and students to push the boundaries of co-creativity, setting a new standard for how LLMs can transform creative industries.
LLM4SME Workshop on July 30th 2024 in Munich
Event Overview
On July 30, 2024, fortiss hosted the workshop titled “LLM4SME: LLMs and Their Influence on Internal Business Processes” at their office in Highlight Tower, Munich. This event was organized in collaboration with Start2 Group, LMU Munich, and Bayern Innovativ, aiming to familiarize small and medium-sized enterprises (SMEs) with the potential of Large Language Models (LLMs) to optimize their business processes and improve overall efficiency. The workshop attracted representatives from SMEs looking to integrate LLM-driven innovation in areas such as customer service, content generation, and data analysis. Sixteen people registered for the workshop.
Event Objectives
The “LLM4SME” workshop provided insights into the rapidly evolving field of LLMs and their applications for SMEs. Participants were introduced to real-world examples and practical approaches for using LLMs to automate language-based tasks and meet the growing demands for personalized customer interactions. The event encouraged SMEs to explore how LLMs could improve their efficiency, stimulate innovation, and provide a competitive edge.
Event Communication
To promote the event, the organizing team leveraged LinkedIn channels, newsletters, and a dedicated event website to attract a wider audience.
Inspiring Keynotes from Experts: Holger Pfeifer from fortiss and Dr. Sabine Wiesmüller, Director AI (Europe) at Start2, welcomed participants and set the stage for the workshop. Dr. Wiesmüller highlighted how LLMs can be leveraged to streamline tasks like data analysis and customer communication, sharing practical insights into existing AI tools that automate these processes. The workshop day concluded with an inspiring keynote by Prof. Dr. Paul Lukowicz from the German Research Center for Artificial Intelligence (DFKI), who discussed the potential of human-AI collaboration and its significance for SMEs.
Interactive Group Workshops:
The first workshop, “Process Optimization in Practice: Opportunities through LLMs for SMEs”, was led by Zhiwei Han from fortiss and provided hands-on examples of how SMEs could use LLMs to enhance their processes. Participants collaborated in small groups, exchanging ideas and exploring the practical implications of implementing LLMs for optimization.
The second workshop, “Effective Prompts for SMEs: Hands-On Techniques for Using LLMs”, was guided by Thomas Weber from LMU. The session focused on crafting effective prompts for LLMs, a key technique for maximizing the capabilities of these tools. Participants were encouraged to bring their laptops and actively experiment with prompt engineering techniques.
Key Outcomes
The workshop offered SMEs invaluable guidance on integrating LLMs into their operations. Interactive discussions and practical sessions provided participants with a solid foundation in LLM applications, showcasing how these AI models can drive efficiency, foster innovation, and maintain a competitive edge.
Survey: Large language Models (LLMs) for Small and Medium Enterprises (SMEs)
The study “LLMs and Their Influence on Internal Business Processes”, initiated on April 25th, 2024 and ongoing as of July 10, 2024, is a collaborative project among Ludwig-Maximilians-Universität München, fortiss GmbH, Start2 Group, and Bayern Innovativ. The survey aims to provide foundational insights for further research into AI adoption within SMEs. The survey results also informed the conceptual planning of the July 30, 2024, workshop titled “LLM4SME: LLMs and Their Influence on Internal Business Processes.” Hosted by fortiss at their Highlight Tower office in Munich, the event was organized in collaboration with Start2 Group, LMU Munich, and Bayern Innovativ (more details will be published in the Handbook, see D9.6).
The survey was distributed through SME networks such as Bayern Innovativ, fortiss, and KI-Hub, as well as through MUK.IT and Start2 Group.
Survey Concept
This survey focuses on understanding how companies perceive and utilize AI and LLM technologies, particularly in the context of their adoption, perceived challenges, and resource requirements.
Structure and Types of Questions:
The survey includes a mix of question types:
Multiple-choice questions – Often used for direct information gathering, such as company size, frequency of AI use, or specific departmental adoption of LLMs.
Likert scale questions – Frequently employed to gauge attitudes, such as the importance of AI technologies to the company, perceived adoption levels, and anticipated benefits and risks across different business areas.
Open-ended questions – These invite respondents to provide more nuanced insights on unique challenges faced, anticipated future use cases for AI, or desired areas of information to support decision-making.
Topics Covered:
The topics are organized to assess several critical areas:
Current and Potential AI Use: This includes questions on the importance of AI to the company’s objectives, the extent of its current implementation, and specific departments where AI tools, particularly LLMs, are used or could be imagined in the future.
Challenges and Barriers: The survey dives into the challenges companies face in adopting AI technologies, like skill shortages, high costs, data privacy concerns, technological complexity, and cultural resistance.
Resource Needs and Plans: Respondents are asked about missing resources necessary for further AI adoption, such as specialized talent or financial support, and their plans to allocate resources (e.g., hiring, training, or outsourcing).
Potential and Risks of LLMs: Several questions focus on evaluating LLMs’ potential benefits in areas like customer acquisition or business innovation, as well as risks such as data security or ethical concerns.
Context and Introductions:
Some questions are prefaced with short introductions, especially those involving technical terms or newer concepts (e.g., “LLMs” or “AI-driven business applications”). This approach provides clarity, helping respondents better understand the intent behind each question.
Additional Sections:
In addition to core questions, the survey concludes with consent inquiries, seeking permission to contact respondents for follow-up surveys, share results, and process personal data according to the company’s privacy policy. These elements ensure informed participation and compliance with data protection practices.
Overall, the survey is designed to capture a comprehensive view of AI integration in companies, focusing not only on current practices but also on anticipated needs, barriers, and strategic outlook. This survey template is highly adaptable and can be used across various ecosystems or geographic regions to evaluate perceptions of LLMs and AI technologies. By tailoring the questions to fit specific regional, industry, or sectoral contexts, stakeholders can gain nuanced insights into AI adoption, resource needs, and potential challenges unique to their environment. We are happy to offer consultation services to support the customization of this template, helping organizations effectively capture and analyze AI perceptions within their specific ecosystem or geographic area.
Main findings
The following presents main findings from the survey conducted among small and medium-sized enterprises (SMEs) regarding the use of large language models (LLMs) in internal business processes. A detailed analysis can be accessed here.
Importance of AI in SMEs: Among 54 respondents, 63% of SMEs consider AI important (36%) or very important (27%), though 75% rate their adoption level as low (44%) or average (31%).
Challenges in AI Adoption: Key challenges reported by SMEs (54 responses) include privacy concerns (11%), technology complexity (10%), and integration issues with existing systems (10%).
Usage of LLMs in Internal Processes: Of 53 responses, 41% of SMEs use LLMs frequently (26%) or very often (15%), with the main applications in marketing (22%), IT management (19%), and customer relations (11%).
Impact of LLMs on Efficiency: Efficiency gains from LLMs are particularly noted in Information and Technology Management, followed by marketing and CRM. 38% of SMEs (40 responses) are planning to allocate resources towards training existing staff rather than hiring additional staff (16%) in order to utilize LLMs more efficiently.
Resource Constraints for Frequent LLM Use: Privacy concerns (12%) and skilled worker shortages (12%) were the most reported limitations among 41 responses, followed closely by limited R&D resources (10%), integration challenges (10%) and poor data quality (10%).
Perception of LLM Potential and Risks: Among 33 respondents, a significant portion views LLMs as having high business potential, though 50% express concerns about data protection and competitive pressures.
Information Needs for Informed Decision-Making: A strong demand for information exists, especially regarding technical background, with 60% of SMEs (33 responses) seeking data protection guidance and 50% requesting practical business use cases.
Recommendations
Based on the survey results, here are targeted recommendations to support SMEs in effectively adopting and integrating LLMs into their internal processes:
Enhance Training and Skills Development: SMEs could benefit from dedicated training programs to improve employee skills in AI and LLMs. Given that 38% of companies plan to invest in training, tailored workshops and accessible online courses could help bridge skill gaps, making LLM technologies more approachable and enhancing internal capabilities.
Address Privacy and Data Security Concerns: With privacy concerns cited by 11% of respondents as a key obstacle, SMEs should implement robust data protection protocols and provide guidance on regulatory compliance. Partnerships with privacy-focused AI consultancies and training on data governance can equip SMEs to safeguard sensitive information while using LLMs.
Simplify Technology Integration: Considering that integration challenges were highlighted by 10% of respondents, SMEs would benefit from integration tools and resources to streamline the adoption of LLMs into existing workflows. Vendors could focus on providing user-friendly API interfaces and compatibility features to ease this transition.
Provide Industry-Specific Case Studies and Use Cases: With 50% of SMEs expressing a need for practical business use cases, industry-specific examples and case studies would help SMEs visualize and plan LLM applications effectively. By observing how similar businesses leverage LLMs, SMEs can better assess potential applications and implementation strategies.
Support SMEs with Funding for R&D and Resource Allocation: To address high costs and limited R&D resources, external support, such as grants or low-interest loans, could enable SMEs to experiment with LLMs at lower financial risk. Policymakers and industry associations could collaborate to provide such funding, fostering AI innovation within the SME sector.
Promote Ethical and Transparent AI Practices: As LLM transparency and ethical AI practices are growing concerns, establishing clear guidelines around model transparency, accountability, and ethical considerations will help SMEs make informed choices and build trust with stakeholders. Collaborative efforts between AI providers and SMEs can focus on defining and adhering to responsible AI use.
These recommendations aim to address the specific challenges SMEs face and provide actionable steps to foster successful, scalable, and secure LLM integration.
Collaboration Outside the Consortium (in particular with AIoD etc)
The macroproject collaborated with the AI-on-Demand platform, specifically by integrating some of the AI-Arena models into the AI Builder (Fig. A), which provides the infrastructure and framework to execute the arena tasks (Fig. B).