Original Aims

Overview: This macroproject converged efforts from WPs 1, 2, and 3 in HumaneAI-Net for Y3. The high-level goal was to advance the theory and practice of interacting with LWMs (large whatever models, including language models, multimodal foundation models, etc.). The macroproject aimed to create a joint benchmark and first results to push research on the capabilities of LWMs in interaction with humans.

Background: Common ground between humans and AI systems has been identified as a key challenge for the next generation of human-AI collaboration. Different HumanE AI Net WPs have been running MPs on different aspects of this problem: from the HCI angle of interactive grounding in WP3, through the interplay of perception and common ground in WP2, to learning with and from narratives in WP1. However, these efforts have not yet converged into something that the rest of the field could build on and that would touch on efforts at developing LWMs. The challenge is not only technical: agreement must also be established on how to define the joint problem.

Opportunity: Given that LWMs are essentially models of linguistic and other media captured from reality, they represent a paradigm change in all three areas. LWMs also have the potential to be a unifying factor for the different streams of work within the project. They offer unprecedented abilities (e.g., in-context learning) for addressing issues related to grounding, shared perception, and coadaptation. Yet their abilities have been found to be subtly non-human-like in tasks such as reasoning. The question remains what their limits are in collaborative tasks.

Objective: This macroproject aimed to create a common web-based benchmark to support multidisciplinary efforts at studying collaboration with LWMs. The benchmark was designed to involve tasks with two or more agents (human and AI) in a joint activity whose progress requires grounding, perception, and coadaptation. The macroproject was designed to contain:

  • A concrete definition of the problem implemented as an interactive game-like setup that allows running both AI and human agents
  • A software API that allows training AI agents in a standard way
  • Theories of interaction and empirical studies with human participants
  • Interaction techniques, assistive AI and other facilitatory techniques to help humans in this task
  • First results from state-of-the-art methods
  • A community-crossing effort to draw attention to the benchmark

Scientific Results/Insights Summary

The macroproject produced the first version of the Collaborative AI Arena, which is available at https://rse-test1.cs.aalto.fi/.

One of the tasks in the Arena was designed with the classic tangram puzzle as inspiration. The core concept was to develop a collaborative creation task that could progressively increase in challenge for both the AI and the human partners while keeping the same core interaction and problem-solving framework. The task consists of placing, in turns, a set of tangram pieces on a playfield with the goal of achieving a designated figure. The goal figure can be fixed, as in the classic tangram, turning the task into puzzle-like collaborative problem-solving, or open, defined by a word, a sentence, or a set of restrictions, enhancing the creative nature of the task. Between and during turns, the players (human and AI) can chat about the task, discussing the goal and the process to achieve it. The task can also be varied by giving different pieces to different players, and even different sub-goals, to add a mixed-motive and hidden-profile flavour to the collaboration. We believe this provides a good framing for challenging LWMs, as it requires conversational, spatial-reasoning, and problem-solving skills together with reaching common ground and joint creativity.

In the MP we defined the first version of the framework in the form of a web game that engages two players (AI and human) in the collaborative task of creating a tangram figure defined by a word (e.g., house), whose form needs to be discussed, agreed upon, and built jointly during the interaction. We also developed the first version of the AI model, based on ChatGPT, that plays the task with a human. It is able to talk about the task and play towards the creation of a figure that is defined by the initial word and constrained by the pieces already placed on the playfield. A user study is planned for the near future to assess the collaborative capabilities of the AI model and the acceptance and subjective experience of the users interacting with it.
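
To make the interaction loop concrete, the sketch below shows one way a single AI turn in such a tangram game could be driven by a language model. This is a minimal sketch rather than the Arena implementation: the GameState fields, the prompt wording, and the call_llm helper (a stand-in for whatever chat backend, e.g. ChatGPT, is used) are assumptions.

# Minimal sketch of one LLM-driven turn in a collaborative tangram game.
# Not the Collaborative AI Arena code; call_llm() is a hypothetical wrapper
# around the chat-completion backend in use.
import json
from dataclasses import dataclass, field

@dataclass
class GameState:
    goal_word: str                                 # e.g., "house"
    placed: list = field(default_factory=list)     # [{"piece": ..., "x": ..., "y": ..., "rotation": ...}]
    chat: list = field(default_factory=list)       # [{"role": "human"/"ai", "text": ...}]

def ai_turn(state: GameState, call_llm) -> dict:
    """Ask the LLM for a chat message plus one piece placement, as JSON."""
    prompt = (
        f"We are building a tangram figure of a '{state.goal_word}'.\n"
        f"Pieces already placed: {json.dumps(state.placed)}\n"
        f"Conversation so far: {json.dumps(state.chat)}\n"
        "Reply as JSON with keys 'message' (what you say to your partner) "
        "and 'placement' (an object with 'piece', 'x', 'y', 'rotation')."
    )
    reply = json.loads(call_llm(prompt))           # model output parsed as JSON
    state.chat.append({"role": "ai", "text": reply["message"]})
    state.placed.append(reply["placement"])
    return reply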

Emerging, context-constrained non-verbal communication between human and machine (avatar, robot) partners in diverse environments is a largely unexplored field. Recent advances in LWM techniques, combined with extensive prompt engineering and visual information, can be sufficient for this kind of gestural communication, as demonstrated at HHAI 2024.

Innovation and Industry Cooperation Potential

Non-verbal communication has potential applications in both industrial and medical domains. It can complement verbal communication for disambiguation, making it history-dependent and pragmatic. It may be necessary in noisy environments, e.g., in firefighting and disaster response, for controlling and collaborating with robots and drones. In the medical field it can be used for diagnostic and therapeutic purposes, e.g., in the case of autism and language impairments.

Informing Startups and SMEs about the Tangram Project (Start2 Group)

Examples of measures to disseminate and trigger involvement of industrial organizations included:

  • Outreach from Start2 Group to startups as well as SMEs such as Parloa, Lengoo GmbH, Anticipate
  • Antti Oulasvirta presented the current status of the tangram project, followed by a discussion of its industry relevance

Tangible Outcomes

Publications

Rui Prada, Astrid C Homan, Gerben A van Kleef: “Towards Sustainable Human-Agent Teams: A Framework for Understanding Human-Agent Team Dynamics” in the proceedings of AAMAS’2024 – the 23rd International Conference on Autonomous Agents and Multiagent Systems – Blue Sky Ideas Track, pp. 2696-2700, May 6–10, 2024, Auckland, New Zealand. IFAAMAS.

https://www.ifaamas.org/Proceedings/aamas2024/pdfs/p2696.pdf

Passant Elagroudy, Jie Li, Kaisa Väänänen, Paul Lukowicz, Hiroshi Ishii, Wendy E Mackay, Elizabeth F Churchill, Anicia Peters, Antti Oulasvirta, Rui Prada, Alexandra Diening, Giulia Barbareschi, Agnes Gruenerbl, Midori Kawaguchi, Abdallah El Ali, Fiona Draxler, Robin Welsch, Albrecht Schmidt: “Transforming HCI Research Cycles using Generative AI and “Large Whatever Models”(LWMs)” in the proceedings of CHI’2024 – Conference on Human Factors in Computing Systems – Extended Abstracts, pp. 1-5, May 11-16, 2024, Honolulu, Hawaiʻi. ACM.

https://abdoelali.com/pdfs/3613905.3643977.pdf

Inês Lobo, Janin Koch, Jennifer Renoux, Inês Batina, Rui Prada: “When Should I Lead or Follow: Understanding Initiative Levels in Human-AI Collaborative Gameplay” in proceedings of DIS’2024 – ACM Designing Interactive Systems Conference, pp 2037-2056, July 1-5, 2024, Copenhagen, Denmark, ACM.

https://dl.acm.org/doi/pdf/10.1145/3643834.3661583

Helena Lindgren, Vera C. Kaelin, Ann-Margreth Ljusbäck, Maitreyee Tewari, Michele Persiani, and Ingeborg Nilsson. 2024. To Adapt or Not to Adapt? Older Adults Enacting Agency in Dialogues with an Unknowledgeable Digital Agent. In Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization (UMAP ’24), July 1–4, 2024, Cagliari, Italy. ACM, New York, NY, USA. https://doi.org/10.1145/3627043.3659562

Vera C. Kaelin, Maitreyee Tewari, Sara Benouar, and Helena Lindgren. Developing Teamwork: Transitioning between stages in human-agent collaboration. To appear in Frontiers in Computer Science

Handbook chapters

János Adrián Gulyás, Miklós Máté Badó, Kristian Fenech, András Lőrincz: “Keep Gesturing: A Game for Pragmatic Communication” (3-page demo paper), HHAI 2024, pp. 463-465.

https://lucris.lub.lu.se/ws/portalfiles/portal/194811556/HHAI_2024-_HYBRID_HUMAN_AI_SYSTEMS_FOR_THE_SOCIAL_GOOD_Proceedings_of_the_Third_International_Conference_on_Hybrid_Human-Artificial_Intelligence.pdf

Submission to CHI 2025:

János Gulyás, Miklós Máté Badó, Kinga Faragó, Kristian Fenech, and András Lőrincz: “Large Language Mimes: Gesture Based Human-Machine Communication in a Bi-Directional Signaling Game”

Toolsets

Collaborative AI Arena: https://humane-ai.dice.aalto.fi/

Software:

Web-based gestural interaction in the Collaborative AI platform: https://github.com/badomate/CollaborativeAI/tree/working_demo_gesture. Access may require permission (see https://github.com/AaltoRSE/CollaborativeAI/pull/5). The default LWM is GPT-4o.

The specific developments of the Tangram collaborative task are available here: https://github.com/GAIPS/TangramCollaborativeAI

AI-Builder of the European AIoD Platform: the AI-Builder holds a catalog of AI models.

Videos or Demos

https://youtu.be/j_bAw8e0lNU?si=STi6sbLzbpknckGG

Tangram Task demo available at:

https://gaips.github.io/TangramCollaborativeAI/

Other

Student competition, 1st Prize at ELTE

https://www.youtube.com/watch?v=WmuWaNdIpcQ (Call https://shorturl.at/SN8eE)

Events Organized/participated in

Event visual: AI & I Hackathon

On July 29-30, the “AI & I Hackathon” brought together developers, AI enthusiasts, researchers, and ambitious students for a 1.5-day deep dive into co-creativity between humans and Large Language Models (LLMs) such as OpenAI’s GPT and Google’s Gemini. Held in collaboration with fortiss, Ludwig-Maximilians-Universität München (LMU), Start2 Group, and DFKI, the hackathon aimed to leverage LLMs as creative collaborators rather than just passive tools, challenging participants to build systems that support human-LLM co-creation.

The event was held at the fortiss office in Munich.

With 28 participants, the event welcomed a diverse international presence, with attendees from countries including the Netherlands, Sweden, Portugal, Italy, Poland, and others. Travel grants helped cover costs for those attending from outside Munich. Participants formed teams of up to four people and competed for top prizes, with 1st, 2nd, and 3rd place awards of €800, €400, and €200, respectively, plus cloud credits from OVH Cloud. Experts were available throughout the hackathon for consultation.

Highlights

  • Collaborative Systems for Co-Creation: Teams worked to develop applications that used LLMs as interactive, idea-generating partners in content creation. Using prompts and feedback mechanisms, the projects sought to showcase how LLMs can actively contribute to generating more engaging and innovative texts.
  • Inspiration Through Expert-Led Keynotes: Expert-led keynotes provided thought-provoking perspectives to set the stage. Dr. Joongi Shin (Aalto University) presented “Can LLMs Make People More Creative?” and Prof. Dr. Albrecht Schmidt (LMU) discussed “Symbiotic Creativity: The Fusion of Human Ingenuity and Machine Intelligence.” The agenda continued with Thomas Pfau (Aalto University) providing an overview of the technology stack and goals, equipping participants with the tools to start hacking.

Event Communication

To promote the event, the organizing team used LinkedIn channels, newsletters, and a dedicated event website to reach a broader audience. Here are a few examples:

https://www.linkedin.com/posts/fortiss_hackathon-ai-llm-activity-7214923383950458880-lO9m?utm_source=share&utm_medium=member_desktop

Hackathon Task: Developing a Prototype for Collaborative Creativity with LLMs

For this hackathon, participants were tasked with creating a prototype system designed to enhance and evaluate the collaborative creative potential between users and Large Language Models (LLMs). The objective was to build tools that facilitate seamless, innovative co-creation in text-based projects while exploring and benchmarking human-AI creativity.

AI&I Hackathon on July 29th – 30th, participating teams

Available Resources

Participants were provided with access to advanced AI models, including GPT-4, as well as various tools and endpoints for building their prototypes. They could also use a pre-designed poetry generation example to kickstart their project. Teams were encouraged to experiment with different levels of difficulty, from straightforward prompt reformulations to more complex features such as automated prompt engineering, API integrations, and even building entirely independent applications.

Final Deliverables

Each team presented a 10-minute live demo to a jury board, showcasing their prototype, followed by a Q&A. Additionally, a demo link was submitted to allow the jury to explore the prototype independently and assess its functionality and creativity-promoting capabilities.

Feedback from participants

A small survey was sent out after the hackathon to get immediate feedback and identify areas of improvement. Even though only 8 participants took part in the survey, the results provide valuable feedback for the organizing team. The following summarizes the main findings. (Detailed survey results can be accessed through this link: https://forms.gle/9Exe6RUP4njeGxCf8.)

  1. Overall Satisfaction: Most respondents (75%) were highly satisfied with the event’s organization and technical support.
  2. Positive Aspects: Attendees enjoyed the engaging tasks, inspiring keynotes, the collaborative atmosphere, and the focus on pre-built components, which allowed more time for creativity.
  3. Areas for Improvement: Participants highlighted the lack of feedback on project ranking and unclear judging criteria. Some felt that basic AI knowledge should have been a prerequisite, and a few noted discomfort with the provided workspace and scheduling.
  4. Skills Gained: Participants acquired various skills, including front-end development (React), full-stack development, prompt engineering, and multi-agent LLM-based architectures, along with teamwork and project collaboration experience.
  5. Additional Feedback: The participants appreciated the event organization and thanked the organizers for providing support for travel and accommodation, which enabled broader participation.

These insights reflect a well-received event with some areas for logistical and evaluative refinement.

Key Takeaways

  1. Future of LLMs in Collaborative Creativity: The hackathon showcased a new approach to AI, positioning LLMs as creative partners that respond to user input and inspire fresh ideas.
  2. Practical Applications for Co-Creative Projects: Through prompts and interactive discussions, participants explored innovative ways to engage LLMs in generating creative text in a collaborative way, driving new possibilities for human-LLM partnerships.

Conclusion

The AI & I Hackathon provided an invaluable experience for participants to explore, learn, and innovate within the rapidly evolving space of human-AI collaboration. By leveraging the expertise of the partners fortiss, LMU, Start2 Group, and DFKI, this hackathon enabled developers, AI enthusiasts, and students to push the boundaries of co-creativity, setting a new standard for how LLMs can transform creative industries.

LLM4SME Workshop on July 30th 2024 in Munich

Event Visual, LLM4SME Workshop

Event Overview

On July 30, 2024, fortiss hosted the workshop “LLM4SME: LLMs and Their Influence on Internal Business Processes” at its office in the Highlight Tower, Munich. The event was organized in collaboration with Start2 Group, LMU Munich, and Bayern Innovativ, aiming to familiarize small and medium-sized enterprises (SMEs) with the potential of Large Language Models (LLMs) to optimize their business processes and improve overall efficiency. The workshop attracted representatives from SMEs looking to integrate LLM-driven innovation in areas such as customer service, content generation, and data analysis. 16 people registered for the workshop.

Event Objectives

The “LLM4SME” workshop provided insights into the rapidly evolving field of LLMs and their applications for SMEs. Participants were introduced to real-world examples and practical approaches for using LLMs to automate language-based tasks and meet the growing demands for personalized customer interactions. The event encouraged SMEs to explore how LLMs could improve their efficiency, stimulate innovation, and provide a competitive edge.

Event Communication

To promote the event, the organizing team leveraged LinkedIn channels, newsletters, and a dedicated event website to attract a wider audience. A few examples follow:

 

Screenshot from Start2 Newsletter from June 4th, 2024

Even after the event, the communication did not end:

 

Highlights

  • Inspiring Keynotes from Experts: Holger Pfeifer from fortiss and Dr. Sabine Wiesmüller, Director AI (Europe) at Start2, welcomed participants and set the stage for the workshop. Dr. Wiesmüller highlighted how LLMs can be leveraged to streamline tasks like data analysis and customer communication, sharing practical insights into existing AI tools that automate these processes. The workshop day concluded with an inspiring keynote by Prof. Dr. Paul Lukowicz from the German Research Center for Artificial Intelligence (DFKI), who discussed the potential of human-AI collaboration and its significance for SMEs.

 

  • Interactive Group Workshops:

The first workshop, “Process Optimization in Practice: Opportunities through LLMs for SMEs”, was led by Zhiwei Han from fortiss and provided hands-on examples of how SMEs could use LLMs to enhance their processes. Participants collaborated in small groups, exchanging ideas and exploring the practical implications of implementing LLMs for optimization.

 

The second workshop, “Effective Prompts for SMEs: Hands-On Techniques for Using LLMs”, was guided by Thomas Weber from LMU. The session focused on crafting effective prompts for LLMs, a key technique for maximizing the capabilities of these tools. Participants were encouraged to bring their laptops and actively experiment with prompt engineering techniques.

 

Key Outcomes

The workshop offered SMEs invaluable guidance on integrating LLMs into their operations. Interactive discussions and practical sessions provided participants with a solid foundation in LLM applications, showcasing how these AI models can drive efficiency, foster innovation, and maintain a competitive edge.

LLM4SME Workshop on July 30th, participants

Survey: Large Language Models (LLMs) for Small and Medium Enterprises (SMEs)

The study “LLMs and Their Influence on Internal Business Processes”, initiated on April 25th, 2024 and ongoing as of July 10, 2024, is a collaborative project among Ludwig-Maximilians-Universität München, fortiss GmbH, Start2 Group, and Bayern Innovativ. The survey aims to provide foundational insights for further research into AI adoption within SMEs. The survey results also informed the conceptual planning of the July 30, 2024, workshop titled “LLM4SME: LLMs and Their Influence on Internal Business Processes.” Hosted by fortiss at their Highlight Tower office in Munich, the event was organized in collaboration with Start2 Group, LMU Munich, and Bayern Innovativ (more details will be published in the Handbook, see D9.6).

 

The survey was distributed through SME networks such as Bayern Innovativ, fortiss, and KI-Hub, as well as through MUK.IT and Start2 Group.

Survey Concept

This survey focuses on understanding how companies perceive and utilize AI and LLM technologies, particularly in the context of their adoption, perceived challenges, and resource requirements.

Structure and Types of Questions:

The survey includes a mix of question types:

  1. Multiple-choice questions – Often used for direct information gathering, such as company size, frequency of AI use, or specific departmental adoption of LLMs.
  2. Likert scale questions – Frequently employed to gauge attitudes, such as the importance of AI technologies to the company, perceived adoption levels, and anticipated benefits and risks across different business areas.
  3. Open-ended questions – These invite respondents to provide more nuanced insights on unique challenges faced, anticipated future use cases for AI, or desired areas of information to support decision-making.

 

Topics Covered:

The topics are organized to assess several critical areas:

  • Current and Potential AI Use: This includes questions on the importance of AI to the company’s objectives, the extent of its current implementation, and specific departments where AI tools, particularly LLMs, are used or could be imagined in the future.
  • Challenges and Barriers: The survey dives into the challenges companies face in adopting AI technologies, like skill shortages, high costs, data privacy concerns, technological complexity, and cultural resistance.
  • Resource Needs and Plans: Respondents are asked about missing resources necessary for further AI adoption, such as specialized talent or financial support, and their plans to allocate resources (e.g., hiring, training, or outsourcing).
  • Potential and Risks of LLMs: Several questions focus on evaluating LLMs’ potential benefits in areas like customer acquisition or business innovation, as well as risks such as data security or ethical concerns.

 

Context and Introductions:

Some questions are prefaced with short introductions, especially those involving technical terms or newer concepts (e.g., “LLMs” or “AI-driven business applications”). This approach provides clarity, helping respondents better understand the intent behind each question.

Additional Sections:

In addition to core questions, the survey concludes with consent inquiries, seeking permission to contact respondents for follow-up surveys, share results, and process personal data according to the company’s privacy policy. These elements ensure informed participation and compliance with data protection practices.

Overall, the survey is designed to capture a comprehensive view of AI integration in companies, focusing not only on current practices but also on anticipated needs, barriers, and strategic outlook. This survey template is highly adaptable and can be used across various ecosystems or geographic regions to evaluate perceptions of LLMs and AI technologies. By tailoring the questions to fit specific regional, industry, or sectoral contexts, stakeholders can gain nuanced insights into AI adoption, resource needs, and potential challenges unique to their environment. We are happy to offer consultation services to support the customization of this template, helping organizations effectively capture and analyze AI perceptions within their specific ecosystem or geographic area.

 

Main findings

The following presents main findings from the survey conducted among small and medium-sized enterprises (SMEs) regarding the use of large language models (LLMs) in internal business processes. A detailed analysis can be accessed here.

 

  • Importance of AI in SMEs: Among 54 respondents, 63% of SMEs consider AI important (36%) or very important (27%), though 75% rate their adoption level as low (44%) or average (31%).
  • Challenges in AI Adoption: Key challenges reported by SMEs (54 responses) include privacy concerns (11%), technology complexity (10%), and integration issues with existing systems (10%).
  • Usage of LLMs in Internal Processes: Of 53 responses, 41% of SMEs use LLMs frequently (26%) or very often (15%), with the main applications in marketing (22%), IT management (19%), and customer relations (11%).
  • Impact of LLMs on Efficiency: Efficiency gains from LLMs are particularly noted in information and technology management, followed by marketing and CRM. 38% of SMEs (40 responses) plan to allocate resources towards training existing staff rather than hiring additional staff (16%) in order to utilize LLMs more efficiently.
  • Resource Constraints for Frequent LLM Use: Privacy concerns (12%) and skilled worker shortages (12%) were the most reported limitations among 41 responses, followed closely by limited R&D resources (10%), integration challenges (10%) and poor data quality (10%).
  • Perception of LLM Potential and Risks: Among 33 respondents, a significant portion views LLMs as having high business potential, though 50% express concerns about data protection and competitive pressures.
  • Information Needs for Informed Decision-Making: A strong demand for information exists, especially regarding technical background, with 60% of SMEs (33 responses) seeking data protection guidance and 50% requesting practical business use cases.

 

Recommendations

Based on the survey results, here are targeted recommendations to support SMEs in effectively adopting and integrating LLMs into their internal processes:

  1. Enhance Training and Skills Development:
    SMEs could benefit from dedicated training programs to improve employee skills in AI and LLMs. Given that 38% of companies plan to invest in training, tailored workshops and accessible online courses could help bridge skill gaps, making LLM technologies more approachable and enhancing internal capabilities.
  2. Address Privacy and Data Security Concerns:
    With privacy concerns cited by 11% of respondents as a key obstacle, SMEs should implement robust data protection protocols and provide guidance on regulatory compliance. Partnerships with privacy-focused AI consultancies and training on data governance can equip SMEs to safeguard sensitive information while using LLMs.
  3. Simplify Technology Integration:
    Considering that integration challenges were highlighted by 10% of respondents, SMEs would benefit from integration tools and resources to streamline the adoption of LLMs into existing workflows. Vendors could focus on providing user-friendly API interfaces and compatibility features to ease this transition.
  4. Provide Industry-Specific Case Studies and Use Cases:
    With 50% of SMEs expressing a need for practical business use cases, industry-specific examples and case studies would help SMEs visualize and plan LLM applications effectively. By observing how similar businesses leverage LLMs, SMEs can better assess potential applications and implementation strategies.
  5. Support SMEs with Funding for R&D and Resource Allocation:
    To address high costs and limited R&D resources, external support, such as grants or low-interest loans, could enable SMEs to experiment with LLMs at lower financial risk. Policymakers and industry associations could collaborate to provide such funding, fostering AI innovation within the SME sector.
  6. Promote Ethical and Transparent AI Practices:
    As LLM transparency and ethical AI practices are growing concerns, establishing clear guidelines around model transparency, accountability, and ethical considerations will help SMEs make informed choices and build trust with stakeholders. Collaborative efforts between AI providers and SMEs can focus on defining and adhering to responsible AI use.

These recommendations aim to address the specific challenges SMEs face and provide actionable steps to foster successful, scalable, and secure LLM integration.

LLM4SME study landing page

 

Collaboration Outside the Consortium (in particular with AIoD etc)

The macroproject collaborated with the AI-on-Demand (AIoD) platform, specifically by integrating some of the AI-Arena models into the AI-Builder (Fig. A), which provides the infrastructure and framework to execute the Arena tasks (Fig. B).

The collaborative AI model in the AI-Builder Catalog


Contact person: Samuel Kaski (samuel.kaski@aalto.fi)

Internal Partners:

  1. Aalto University
  2. Delft University of Technology, Frans Oliehoek

 

In human-AI collaboration, one of the key difficulties is establishing a common ground for the interaction, especially in terms of goals and beliefs. In practice, the AI might not have access to this necessary information directly and must infer it during the interaction with the human. However, training a model to support this kind of inference would require massive collections of interaction data and is not feasible in most applications.

Modern cognitive models, on the other hand, can equip AI tools with the necessary prior knowledge to readily support inference, and hence, to quickly establish a common ground for collaboration with humans. However, utilizing these models in realistic applications is currently impractical due to their computational complexity and non-differentiable structure.

Contact person: Catholijn Jonker, Maria Tsfasman (c.r.m.m.oertel@tudelft.nl; m.tsfasman@tudelft.nl)

Internal Partners:

  1. Technical University Delft, Catharine Oertel, c.r.m.m.oertel@tudelft.nl
  2. Eotvos Lorand University, Andras Lorincz, lorincz@inf.elte.hu

 

In this micro-project, we propose investigating human recollection of team meetings and how conversational AI could use this information to create better team cohesion in virtual settings. Specifically, we would like to investigate how a person’s emotion, personality, relationship to fellow teammates, goal, and position in the meeting influence how they remember the meeting. We want to use this information to create memory-aware conversational AI that could leverage such data to increase team cohesion in future meetings. To achieve this goal, we first record a multi-modal dataset of team meetings in a virtual setting; second, administer questionnaires to participants at different time intervals after each session; third, annotate the corpus; and fourth, carry out an initial corpus analysis to inform the design of memory-aware conversational AI. This micro-project will contribute to a longer-term effort in building a computational memory model for human-agent interaction.

Results Summary

The MEMO corpus was collected, which contains 45 group discussions around the topic of COVID-19. A total of 15 groups were formed, consisting of 3 to 6 participants who took part in 3 group discussions, with a 3-4 day gap between sessions. A total of 59 individuals with diverse backgrounds took part in the study. Before and after each session participants completed a series of questionnaires to determine which moments they recalled from their conversations, along with their personality traits, values and perceptions.

To capture conversational memory, we collected first-party free-recall reports of the most memorable moments from the discussion immediately after the interaction and again 3-4 days later. For the shorter-term memories, participants also mapped the moments to a particular interval in the video of their discussion, which were used for the ground-truth conversational memory annotations.

For each participant, personality and value profiles were recorded in the pre-screening survey, along with demographic information to identify their social group affected by COVID-19. Pre-session questionnaires also assessed participants’ mood before each session. The post-session questionnaire included questions about mutual understanding, personal attitude, and perceived social distance. The perception of the discussion and the group as a whole was also monitored in the post-session questionnaire with variables such as Task and Group Cohesion, Entitativity, Perceived Interdependence, Perceived Situation Characteristics, Syncness, and Rapport.

The following automatic annotations were extracted on the corpus:

  • Transcripts – Transcripts were generated with automatic speech recognition methods and were manually reviewed and corrected where needed. Transcript timestamps are available at the utterance level, as well as word-level TextGrid files for each recording. Speaker diarization is also available.
  • Eye gaze and head pose – Automatically annotated with EyeWare software; the annotations themselves will be provided, but the code relies on a proprietary API. This includes gaze targets collected through screenshots of participants’ screen views.
  • Prosody – The eGeMAPS feature set was extracted using the default eGeMAPS configuration in openSMILE (a minimal extraction sketch is given after this list).
  • Body pose – Body pose (upper body only) and hand pose, when visible, were estimated with the models available in the MediaPipe software.
  • Facial action units – Facial action units were estimated for participants using the OpenFace software.
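
For reference, the sketch below shows how two of these annotation streams (prosody and body pose) could be reproduced with the openly available opensmile and mediapipe Python packages. It is a minimal sketch: the file names are placeholders and the exact configurations used for the MEMO corpus may differ.

# Minimal sketch of two annotation streams (prosody and body pose).
# File paths are placeholders; the MEMO pipeline's exact settings may differ.
import opensmile
import mediapipe as mp
import cv2

# Prosody: eGeMAPS functionals with openSMILE's default configuration
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
prosody = smile.process_file("participant_01.wav")    # pandas DataFrame of features

# Body pose: per-frame landmarks with MediaPipe Pose (MEMO kept upper body + hands)
pose = mp.solutions.pose.Pose(static_image_mode=False)
cap = cv2.VideoCapture("participant_01.mp4")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.pose_landmarks:
        landmarks = result.pose_landmarks.landmark     # 33 (x, y, z, visibility) points
cap.release()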

A paper describing the corpus and the annotations in more detail is in preparation. Additionally, the collected annotations will be packaged appropriately for ease of use by future researchers.

Tangible Outcomes

  1. Tsfasman, M., Fenech, K., Tarvirdians, M., Lorincz, A., Jonker, C., & Oertel, C. (2022). Towards creating a conversational memory for long-term meeting support: predicting memorable moments in multi-party conversations through eye-gaze. In ICMI 2022 – Proceedings of the 2022 International Conference on Multimodal Interaction (pp. 94-104). (ACM International Conference Proceeding Series). Association for Computing Machinery (ACM). https://doi.org/10.1145/3536221.3556613 
  2. summary

Contact person: Eric Blaudez, (eric.blaudez@thalesgroup.com)

Internal Partners:

  1. Thales, Eric Blaudez, eric.blaudez@thalesgroup.com
  2. Unibo, Paolo Torroni, p.torroni@unibo.it
  3. CNRS

External Partners:

  1. LISN, Christophe Servan, c.servan@qwant.com

 

The micro-project provides a demonstration of the hierarchical framework for collaboration described in the Humane-AI Net revised strategic work plan, by constructing a multimodal and multilingual conversational agent focused on search. The framework is based on hierarchical levels of abilities:

  • Reactive (sensori-motor) Interaction: Interaction is tightly coupled perception-action, where actions of one agent are immediately sensed and interpreted as actions of the other. Examples include greetings, polite conversation, and emotional mirroring.
  • Situated (spatio-temporal) Interaction: Interactions are mediated by a shared model of objects and relations (states) and shared models for roles and interaction protocols.

In this micro-project, we focused on the first two levels (Reactive and Situated) and designed the global framework architecture to show a Proof of Concept (PoC).

Results Summary

We show that the proposed approach provides high-quality semantic segmentation from the robot’s perspective, with accuracy comparable to the original one. In addition, we exploited the gained information and improved the recognition performance of the deep network for the lower viewpoints and showed that the small robot alone is capable of generating high-quality semantic maps for the human partner. The computations are close to real time, so the approach enables interactive applications.

Tangible Outcomes

  1. T-KEIR: https://github.com/ThalesGroup/t-keir 
  2. erc-unibo-module: https://github.com/helemanc/erc-unibo-module 

Contact person: Jan Hajic, Charles Univ, (jan.hajic@mff.cuni.cz)

Internal Partners:

  1. Charles Univ, Jan Hajic
  2. DFKI, Thierry Declerck

 

Many industrial NLP applications emphasize the processing and detection of nouns, especially proper nouns (Named Entity Recognition, NER). However, the processing of verbs has been neglected in recent years, even though it is crucial for the development of full NLU systems, e.g., for the detection of intents in spoken language utterances or events in written language news articles. The META-O-NLU microproject focuses on proving the feasibility of a multilingual event-type ontology based on classes of synonymous verb senses, complemented with semantic roles and links to existing semantic lexicons. Such an ontology shall be usable for content- and knowledge-based annotation, which in turn shall allow for developing NLU parsers/analyzers. The concrete goal is to extend the existing Czech-English SynSemClass lexicon (which displays all the necessary features, but only for two languages) by German and Polish, as a first step to show it can be extended to other languages as well.

Results Summary

An extended version of SynSemClass (entries in additional languages).

Tangible Outcomes

  1. SynSemClass 3.5 dataset (dataset),
    URL: http://hdl.handle.net/11234/1-3750
  2. SynSemClass 3.5 browser (other),
    URL: https://lindat.cz/services/SynSemClass35/

Contact person: James Crowley (James@crowley-coutaz.fr)

Internal Partners:

  1. Eotvos Lorand University – ELTE, Andras Lorincz
  2. Univ Grenoble Alpes, Dominique Vaufreydaz, Fabien Ringeval
  3. Uni Paris Saclay, Camille Guinaudeau, Marc Evrard
  4. Jozef Stefan Institut-JSI, Marko Grobelnik
  5. Charles University, Pavel Pecina

 

Transformers and self-attention (Vaswani et al., 2017) have become the dominant approach for natural language processing (NLP), with systems such as BERT (Devlin et al., 2019) and GPT-3 (Brown et al., 2020) rapidly displacing more established RNN and CNN structures with an architecture composed of stacked encoder-decoder modules using self-attention. This micro-project provides tools and data sets for experiments and a first initial demonstration of the potential of transformers for multimodal perception and multimodal interaction. We define research challenges, benchmark data sets, and performance metrics for multimodal perception and interaction tasks such as (1) audio-visual narration of scenes, cooking actions and activities, (2) audio-video recordings of lectures and TV programs, (3) audio-visual deictic (pointing) gestures, and (4) perception and evocation of engagement, attention, and emotion.

1) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. and Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762

2) Devlin, J., Chang, M. W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

3) Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., and Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.

Results Summary

In this project, we explore the potential of Transformer-based models in two significant domains: unsupervised object discovery and multimodal emotion recognition using physiological signals. First, we demonstrate a novel approach for unsupervised object discovery by leveraging self-supervised learning with self-distillation loss (DINO). Our method utilizes visual tokens as nodes in a weighted graph, where edges reflect connectivity scores based on token similarity. By applying a normalized graph-cut and solving it through spectral clustering with generalized eigen-decomposition, we isolate foreground objects. This approach effectively segments self-similar regions, with the second smallest eigenvector of the decomposition providing the cutting solution that indicates token association with foreground objects. This technique not only simplifies the object discovery process but also achieves substantial performance improvements over current state-of-the-art methods such as LOST, outperforming it by 6.9%, 8.1%, and 8.1% on the VOC07, VOC12, and COCO20K benchmarks, respectively. Furthermore, integrating a second-stage class-agnostic detector (CAD) enhances these results, and our method’s adaptability is demonstrated in its application to unsupervised saliency detection and weakly supervised object detection, achieving notable IoU improvements on the ECSSD, DUTS, and DUT-OMRON datasets.
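
As an illustration of the graph-cut step described above, the following minimal sketch applies a normalized cut to token features; it is not the project's code, and the tokens array, the affinity threshold, and the foreground heuristic are assumptions.

# Minimal sketch of the normalized-cut step on self-supervised patch tokens.
# Not the project's implementation; `tokens` is assumed to be an (N, D) array
# of DINO-style patch features for one image, and the threshold is illustrative.
import numpy as np
from scipy.linalg import eigh

def normalized_cut_foreground(tokens, tau=0.2):
    """Return a boolean mask over tokens marking the (assumed) foreground partition."""
    feats = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    W = feats @ feats.T                      # cosine similarity between tokens
    W = np.where(W > tau, 1.0, 1e-5)         # binarized affinity; epsilon keeps the graph connected
    D = np.diag(W.sum(axis=1))               # degree matrix
    # Generalized eigenproblem (D - W) y = lambda * D y; the eigenvector of the
    # second-smallest eigenvalue gives the normalized-cut bipartition.
    _, vecs = eigh(D - W, D, subset_by_index=[0, 1])
    fiedler = vecs[:, 1]
    mask = fiedler > fiedler.mean()          # split tokens by the second eigenvector
    if mask.sum() > (~mask).sum():           # heuristic: the smaller partition is the object
        mask = ~mask
    return mask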

In parallel, we address the challenge of multimodal emotion recognition from physiological signals using Transformer-based models. Recognizing the advantages of attention mechanisms in Transformers for creating contextualized representations, we propose a model for processing electrocardiogram (ECG) data to predict emotions. This model highlights significant segments of the signal, ensuring that relevant information is given priority. Due to the limited size of datasets with emotional labels, we adopt a self-supervised learning approach. We pre-train our model using unlabelled ECG datasets to build robust representations and then fine-tune it on the AMIGOS dataset for emotion recognition. Our findings confirm that this approach achieves state-of-the-art results in emotion recognition tasks involving ECG signals. Additionally, the success of this strategy underscores the broader potential of Transformers and pre-training techniques for analyzing time-series data in emotion recognition tasks.
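
To make the modelling choice concrete, the following is a minimal sketch of a Transformer encoder over ECG patches with a classification head, written in PyTorch. It is not the project's model; the patch length, embedding size, class count, and the masking strategy mentioned in the comments are illustrative assumptions.

# Minimal sketch of a Transformer encoder for ECG-based emotion recognition.
# Not the project's model; hyperparameters and the patching scheme are illustrative,
# and positional encodings are omitted for brevity.
import torch
import torch.nn as nn

class ECGEmotionTransformer(nn.Module):
    def __init__(self, patch_len=64, d_model=128, n_heads=4, n_layers=4, n_classes=2):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)            # project each ECG patch to a token
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))   # classification token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)             # e.g., low/high arousal

    def forward(self, ecg):                                   # ecg: (batch, signal_length)
        b, t = ecg.shape
        usable = t - t % self.patch_len
        patches = ecg[:, :usable].reshape(b, -1, self.patch_len)
        tokens = torch.cat([self.cls.expand(b, -1, -1), self.embed(patches)], dim=1)
        return self.head(self.encoder(tokens)[:, 0])          # predict from the [CLS] token

# For self-supervised pre-training, the classification head would be swapped for a
# reconstruction objective on masked patches before fine-tuning on labelled data (AMIGOS).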

Overall, the outcomes of our project demonstrate that Transformer-based models, coupled with self-supervised learning, can significantly enhance the performance of both unsupervised object discovery and emotion recognition from physiological signals. These methods provide robust solutions for complex visual and temporal signal analysis tasks, marking a substantial step forward in computer vision and affective computing.

Tangible Outcomes

  1. Y. Wang, X. Shen, S. Hu, Y. Yuan, J. L. Crowley, D. Vaufreydaz, Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut. IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 2022, pp14543-14553, New Orleans, Jun 2022.
    https://arxiv.org/abs/2202.11539 
  2. J. Vazquez-Rodriguez, G. Lefebvre, J. Cumin and J. L. Crowley, “Emotion Recognition with Pre-Trained Transformers Using Multimodal Signals”, 10th International Conference on Affective Computing and Intelligent Interaction (ACII), Oct 2022.
    https://ieeexplore.ieee.org/document/9953852
  3. J. Vazquez-Rodriguez, G. Lefebvre, J. Cumin, J. L. Crowley. Transformer-Based Self-Supervised Learning for Emotion Recognition. 26th International Conference on Pattern Recognition (ICPR 2022), Aug 2022, Montreal, Canada.
    https://arxiv.org/abs/2204.05103
  4. A survey of tools and datasets for a multimodal perception with transformers (http://crowley-coutaz.fr/jlc/HumanE-AI-Net/TransfomerMicroProject/TransformerTools.pdf )
  5. A tutorial on the use of transformers for multimodal perception. (http://crowley-coutaz.fr/jlc/Courses/ACAI2021/Multimodal-Transformer-Tutorial.html )
  6. Report on challenges for the use of transformers for multimodal perception and interaction. (http://crowley-coutaz.fr/jlc/HumanE-AI-Net/TransfomerMicroProject/ReseachChallengesDataSets.pdf )

Contact person: Florian Müller (florian.mueller@um.ifi.lmu.de )

Internal Partners:

  1. LMU Munich, Florian Müller, florian.mueller@um.ifi.lmu.de  

External Partners:

  1. University of Bari, Giuseppe Desolda, giuseppe.desolda@uniba.it 

 

Manufacturing tools like 3D printers have become accessible to the wider society, making the promise of digital fabrication for everyone seemingly reachable. While the actual manufacturing process is largely automated today, users still require knowledge of complex design applications to not only produce ready-designed objects, but also adapt them to their needs or design new objects from scratch. To lower the barrier for the design and customization of personalized 3D models, we imagine an AI-powered system that assists users in creating 3D objects for digital fabrication. Reaching this vision requires a common understanding – a common ground – between the users and the AI system. As a first step, in this micro project, we explored novices’ mental models in voice-based 3D design by conducting a high-fidelity Wizard of Oz study with 22 participants without skills in 3D design. We asked the participants to perform 14 tasks revolving around some basic concepts of 3D design for digital modeling, like the creation of objects, the manipulation of objects (e.g., scaling, rotating, and/or moving objects), and the creation of composite objects. We performed a thematic analysis of the collected data assessing how the mental model of novices translates into voice-based 3D design.

Results Summary

We found that future AI assistants supporting novice users in voice-based digital modeling must:

  • manage the corrections users make during and after commands to fix certain errors;
  • deal with vague and incomplete commands by automatically completing them with sensible defaults or by asking the users for clarification;
  • consider novices’ prior knowledge, for example, about the use of undo and redo functions;
  • provide only a simplified set of operations for creating simple and composite 3D objects;
  • design a workflow similar to what novices would do if they were building real objects, for example, providing wizard procedures that guide novices in designing composite 3D models starting from the bottom;
  • provide different commands to select 3D objects;
  • understand and execute chained commands;
  • understand commands that are relative to the users’ point of view;
  • grant multiple ways to refer to the axes, for example, by using their names, colors, and user direction;
  • favor explicit trigger words to avoid unintentional activation of the voice assistant;
  • embrace diversity in naming approaches, since novices often use other words to refer to 3D objects.

Contact person: András Lőrincz (lorincz@inf.elte.hu)

Internal Partners:

  1. Eötvös Loránd University (ELTE), András Lőrincz and Daniel Sindley
  2. Charles University Prague, Ondřej Dušek and Tomáš Nekvinda  

 

We propose research on a scalable human-machine collaboration system with the goal of executing high-quality actions for rehabilitation exercises. We combine video and speech for video-grounded, goal-oriented dialogue. We build on our video and text database. The database has exercises for rehabilitation following knee injuries. We evaluate high-performance body pose estimation tools and compare them to a real-time body pose estimation tool to be developed for smartphones via ‘knowledge distillation’ methods. The complementary part of the project deals with the texts that we have collected for these exercises and estimates the amount of text needed for dialogues that can lead and correct the quality of exercises. Potential topics/intents include pose relative to camera, proper light conditions, audio-visual information about pain, notes about execution errors, errors discovered by the computer evaluations, requests for additional information from the patient, and reactions to other, unrelated queries.

Results Summary

Human-machine collaboration will soon be ubiquitous, as machines can help in everyday life. However, spatial tasks are challenging because of real-time constraints. We want to optimize the interaction offline, before it happens in real time, to ensure high quality. We present the SPAtial TAsk (SPATA) framework. SPATA is modular, and here we address two connected components: body pose optimization and navigation. Our experiments show that 3D pose estimation using 2D cameras is accurate when the motion is captured from the right direction and distance. This limitation currently restricts us to simple forms of movement, such as those used in physical rehabilitation exercises. Accurate estimation requires (a) estimation of body size, (b) optimization of body and camera position, (c) navigation assistance to a location, and (d) activity capture and error estimation. An avatar model is used to estimate the shape and a skeleton model is used to estimate the body pose for (a). For (b), we use SLAM. For (c), we use a semantic map and optimize a minimal NLP system for human needs that we test. Finally, we estimate the accuracy of the motion and propose a visual comparison between the planned and the implemented motion pattern for (d). Our SPATA framework is useful for various tasks at home, in gyms, and in other spatial applications. Depending on the task, different components can be integrated. The MP targeted specific topics, including:

  • body motion and pain, both in terms of language and potential dialogues, and in more than 400 video samples that included 50 exercises with, on average, about 7 errors per motion type to be detected alone or in combination;
  • dialogues, both from experts and from crowdsourcing-based dialogue enhancements.

Tangible Outcomes

  1. DeepRehab: Real Time Pose Estimation on the Edge for Knee Injury Rehabilitation – Bruno Carlos Dos Santos Melício, Gábor Baranyi, Zsófia Gaal, Sohil Zidan, and Andras Lőrincz. https://e-nns.org/icann2021/
  2. Video presentation summarizing the project

Contact person: Gilles Bailly (Gilles.Bailly@sorbonne-universite.fr)

Internal Partners:

  1. Sorbonne Université, Gilles Bailly
  2. Aalto University, Kashyap Todi and Antti Oulasvirta  

External Partners:

  1. University of Luxembourg, Luis Leiva  

 

Adapting user interfaces (UIs) requires taking into account both positive and negative effects that changes may have on the user. A carelessly picked adaptation may impose high costs — for example, due to surprise or relearning effort. It is essential to consider differences between users as the effect of an adaptation depends on the user’s strategies, e.g. how each user searches for information in a UI. This microproject extends an earlier collaboration between partners on model-based reinforcement learning for adaptive UIs by developing methods to account for individual differences. Here, we first develop computational models to explain and predict users’ visual search and pointing strategies when searching within a UI. We apply this model to infer user strategies based on interaction history, and adapt UIs accordingly.

Results Summary

This micro-project reinforces the collaboration between Sorbonne Université, Aalto University, and the University of Luxembourg through weekly meetings. It aims at elaborating computational models of visual search in adaptive user interfaces. We defined different visual search strategies in adaptive menus as well as promising interactive mechanisms to revisit how to design menus. The elaboration of the model is in progress. Concretely, we achieved four things:

  1. Created a model of visual search and pointing in menus. The code is available on GitHub.
  2. Integrated the model into our platform for adaptive UIs. The code is available on GitHub.
  3. Produced a demo of the system.
  4. Published a paper at the ACM CHI conference.

Tangible Outcomes

  1. Adapting User Interfaces with Model-based Reinforcement Learning. Kashyap Todi, Gilles Bailly, Luis A. Leiva, Antti Oulasvirta. In CHI ’21: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. https://dl.acm.org/doi/fullHtml/10.1145/3411764.3445497
  2. summary of project’s key findings https://www.kashyaptodi.com/adaptive/
  3. paper presentation from CHI’21
  4. Interaction preview
  5. Video summarizing the project

 

 

Contact person: Frank Dignum (dignum@cs.umu.se)

Internal Partners:

  1. Umeå University (UMU), Frank Dignum
  2. Consiglio Nazionale delle Ricerche (CNR), Eugenia Polizzi, Giulia Andrighetto and Mario Paolucci
  3. Leiden University, Mark Dechesne

External Partners:

  1. UU and National Police in the Netherlands, Mijke van den Hurk

 

In this project we investigate whether normative behavior can be detected in Facebook groups. In a first step, we hypothesize about possible norms that could lead to a group becoming more extreme on social media, or whether groups that become more extreme will develop certain norms that distinguish them from other groups and that could be detected. An example of such a norm could be that a (self-proclaimed) leader of a group is massively supported by retweets, likes, or affirmative messages, along with evidence of verbal sanctioning toward counter-normative replies. Simulations and analyses of historical Facebook data (using manual detection in specific case studies and more broadly through NLP) will help reveal the existence of normative behavior and its potential change over time.

Results Summary

The project delivered detailed analyses of the tweets around the US elections and the subsequent riots. While we expected to discover patterns in the tweets indicating more extreme behavior, it appears that extremist expressions are quickly banned from Twitter and find a home on more niche social platforms (in this case Parler). Thus the main conclusion of this project is that we need to find the connections between users in different social media platforms in order to track extreme behavior.

In order to see how individuals might contribute to behavior that is not in the interest of society, we cannot analyze a single social media platform. More extremist expressions in particular quickly move from mainstream social media to niche platforms that can change rapidly over time. Thus the connection between individual and societal goals is difficult to observe by just analyzing data from a single social media platform. On the other hand, it is very difficult to link users between platforms. Our core contribution can be summarized in two points:

  1. Identification of radical behavior in Parler groups
  2. Characterizing the language use of radicalized communities detected on Parler

Tangible Outcomes

  1. Video presentation summarizing the project

Contact person: Antti Oulasvirta (antti.oulasvirta@aalto.fi)

Internal Partners:

  1. Aalto University, Antti Oulasvirta, antti.oulasvirta@aalto.fi
  2. CNRS and UPMC, Julien Gori, gori@isir.upmc.fr

 

This project studied how AI assistants could better alert humans by estimating their internal states from observations. It contributed to a computational theory called POSG, a multi-agent framework for human-AI interaction developed between CNRS and Aalto University. When is an opportune moment to alert a human partner? This question is hard because the beliefs and cognitive state of the human should be taken into account when choosing if and when to alert. Every alert is interruptive and bears a cost to the human. However, especially in safety-critical domains, the consequences of not alerting can be infinitely negative. In this work, we formulate the optimal alerting problem based on the theory of partially observable stochastic games.
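
For reference, the general form of the formalism referenced here can be written as a tuple; the notation below is standard textbook notation for a partially observable stochastic game, not the specific model developed in this work:

\[ \langle I,\ S,\ \{A_i\}_{i \in I},\ T,\ \{\Omega_i\}_{i \in I},\ O,\ \{R_i\}_{i \in I} \rangle \]

where I is the set of agents (here, the user and the assistant), S the set of states, A_i the action set of agent i, T(s' | s, a) the state-transition function, Omega_i the observation set of agent i, O the observation function, and R_i the reward function of agent i. In an alerting formulation of this kind, the interruption cost and the cost of a missed alert would naturally enter through the assistant's reward terms.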

Results Summary

The problem of the assistant and the problem of the user are formulated and solved as a single problem in a POSG. We presented first results using a gridworld environment, comparing different types of alerting agents, and a roadmap for future work using realistic driver simulators. These models can inform handover/takeover decisions in semi-automated vehicles. The results were integrated into CoopIHC, a multi-agent solver for interactive AI.

Tangible Outcomes

  1. Contributed to a computational theory called POSG, a multi-agent framework for human-AI interaction developed between CNRS and Aalto University: https://jgori-ouistiti.github.io/CoopIHC/ 

Contact person: Brian Ravenet (brian.ravenet@limsi.fr)

Internal Partners:

  1. CNRS, Brian Ravenet, brian.ravenet@limsi.fr
  2. INESC-ID, Rui Prada, rui.prada@tecnico.ulisboa.pt

 

This project aims at investigating the construction of humor models to enrich conversational agents through interactive reinforcement learning approaches. Our methodology consists of deploying an online platform where passersby can play a game of matching sentences with humorous comebacks against an agent. The data collected from these interactions helps to gradually build the humor models of the agent following state-of-the-art interactive reinforcement learning techniques. Our work resulted in an implementation of the platform, a first model for a humor-enabled conversational agent, and a publication of the obtained results and evaluations.

Results Summary

The main result of this project is the creation of an intelligent agent capable of playing a game – Cards Against Humanity – that involves matching sentences with humorous comebacks. In order to achieve this, a dataset of 1712 jokes, rated on a scale of 1 to 9 in terms of joke level, originality, positivity, entertainment, whether the joke makes sense, and whether it is family-friendly, was collected, and an online game was developed to serve as the foundation of the reinforcement mechanism.
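
To illustrate the interactive reinforcement learning loop described above, the following is a minimal sketch in which the agent's estimate of how funny each comeback is gets updated from player ratings. It is not the project's implementation; the epsilon-greedy choice, the update rule, and the example cards are illustrative assumptions.

# Minimal sketch of interactive RL for picking humorous comebacks.
# Not the project's implementation; the update rule and exploration scheme are illustrative.
import random
from collections import defaultdict

class HumorAgent:
    def __init__(self, epsilon=0.1, lr=0.1):
        self.q = defaultdict(float)   # estimated funniness of (prompt, comeback) pairs
        self.epsilon = epsilon        # exploration rate
        self.lr = lr                  # learning rate

    def choose(self, prompt, hand):
        """Pick a comeback card from the hand (epsilon-greedy)."""
        if random.random() < self.epsilon:
            return random.choice(hand)
        return max(hand, key=lambda card: self.q[(prompt, card)])

    def update(self, prompt, card, rating):
        """Move the value estimate toward the player's rating (e.g., 1-9)."""
        key = (prompt, card)
        self.q[key] += self.lr * (rating - self.q[key])

# Example round:
agent = HumorAgent()
prompt = "What never fails to liven up a party?"
hand = ["a spreadsheet", "an existential crisis", "interpretive dance"]
card = agent.choose(prompt, hand)
agent.update(prompt, card, rating=7)   # rating supplied by the human player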