Inventor(s)

N/A

Abstract

Artificial intelligence (AI) agents typically serve a single human as a client, but such a single-human focus can be suboptimal in contexts where the AI agent performs tasks for teams or other groups of humans. This disclosure describes techniques to develop and evaluate AI agents capable of robust operation in multi-human environments. In a data collection phase, group-chat or virtual-meeting experiments are organized in which participants role-play conversations to accomplish predefined tasks such as scheduling a meeting or planning an event. Communication transcripts from these experiments capture the complexities of group interaction. In a data augmentation phase, the collected data is expanded to cover a wide array of multi-human, multi-agent scenarios. In an evaluation phase, human and model-based raters assess agent performance against rubric dimensions such as effectiveness, factuality, efficiency, appropriateness, and coherence. Results from human and model-based evaluation are synthesized and visualized to arrive at a holistic understanding of the agent's capabilities and to identify areas for improvement. In this manner, datasets for agent creation and improvement in a multi-human context can be generated, and the performance of AI agents in multi-human environments can be evaluated.
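To make the evaluation phase concrete, the following is a minimal Python sketch of how a model-based rater might score a group-chat transcript against the rubric dimensions listed in the abstract. All names here (build_rater_prompt, call_rater_model, the 1-5 scale, and the prompt wording) are hypothetical illustrations rather than the disclosed method; call_rater_model is a stub standing in for a real rater-model call.

    """Minimal sketch of model-based rubric evaluation; all names are hypothetical."""
    import json
    import statistics

    # Rubric dimensions named in the abstract; the 1-5 scale is an assumption.
    RUBRIC = ["effectiveness", "factuality", "efficiency", "appropriateness", "coherence"]

    def build_rater_prompt(transcript, dimension):
        """Render a group-chat transcript and ask a rater model to score one dimension."""
        lines = "\n".join(f"{turn['speaker']}: {turn['text']}" for turn in transcript)
        return (
            "Below is a multi-human group chat in which an AI agent assists the group.\n"
            f"Rate the agent's {dimension} from 1 (poor) to 5 (excellent).\n"
            'Respond with JSON: {"score": <int>, "rationale": <string>}.\n\n' + lines
        )

    def call_rater_model(prompt):
        """Stub for the rater LLM call; returns a canned response for illustration."""
        return json.dumps({"score": 4, "rationale": "stub rating"})

    def rate_transcript(transcript):
        """Score one transcript on every rubric dimension and attach the mean."""
        scores = {
            d: json.loads(call_rater_model(build_rater_prompt(transcript, d)))["score"]
            for d in RUBRIC
        }
        scores["mean"] = statistics.mean(scores.values())
        return scores

    if __name__ == "__main__":
        chat = [
            {"speaker": "Alice", "text": "Can we all meet Thursday afternoon?"},
            {"speaker": "Bob", "text": "I'm free after 3 pm."},
            {"speaker": "Agent", "text": "Scheduled Thursday 3:30 pm; invites sent to both of you."},
        ]
        print(rate_transcript(chat))

In practice, the stub would be replaced by a call to an actual rater model, and per-dimension scores from many transcripts would be aggregated alongside human ratings for the synthesis and visualization step described above.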

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
