IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

Abstract

Large Language Models (LLMs) are transforming artificial intelligence, evolving into task-oriented systems capable of autonomous planning and execution. One of the primary applications of LLMs is conversational AI systems, which must navigate multi-turn dialogues, integrate domain-specific APIs, and adhere to strict policy constraints. However, evaluating these agents remains a significant challenge, as traditional methods fail to capture the complexity and variability of real-world interactions. We introduce IntellAgent, a scalable, open-source multi-agent framework designed to evaluate conversational AI systems comprehensively. IntellAgent automates the creation of diverse, synthetic benchmarks by combining policy-driven graph modeling, realistic event generation, and interactive user-agent simulations. This approach provides fine-grained diagnostics, addressing the limitations of static, manually curated benchmarks that report only coarse-grained metrics. IntellAgent represents a paradigm shift in evaluating conversational AI. By simulating realistic, multi-policy scenarios across varying levels of complexity, IntellAgent captures the nuanced interplay of agent capabilities and policy constraints. Unlike traditional methods, it employs a graph-based policy model to represent the relationships, likelihoods, and complexities of policy interactions, enabling highly detailed diagnostics. IntellAgent also identifies critical performance gaps, offering actionable insights for targeted optimization. Its modular, open-source design supports seamless integration of new domains, policies, and APIs, fostering reproducibility and community collaboration. Our findings demonstrate that IntellAgent serves as an effective framework for advancing conversational AI by helping to bridge the gap between research and deployment. The framework is available at https://github.com/plurai-ai/intellagent
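To make the idea of a graph-based policy model more concrete, the following is a minimal sketch, not IntellAgent's actual implementation. It assumes that each policy is a node with an integer complexity score, that edge weights stand for how likely two policies are to appear in the same conversation, and that a scenario is sampled by a weighted walk until a target complexity is reached. The class `PolicyGraph`, the method `sample_scenario`, the helper `build_example_graph`, and all policy names and weights are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PolicyGraph:
    """Hypothetical policy graph: nodes carry complexity, edges carry likelihood."""

    complexity: Dict[str, int] = field(default_factory=dict)
    edges: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def add_policy(self, name: str, complexity: int) -> None:
        self.complexity[name] = complexity
        self.edges.setdefault(name, {})

    def link(self, a: str, b: str, weight: float) -> None:
        # Undirected edge; the weight is an assumed co-occurrence likelihood.
        self.edges[a][b] = weight
        self.edges[b][a] = weight

    def sample_scenario(self, target: int, rng: random.Random) -> List[str]:
        """Walk the graph, favoring likely neighbors, until the summed
        complexity reaches `target` or no unvisited neighbor remains."""
        current = rng.choice(sorted(self.complexity))
        chosen = [current]
        total = self.complexity[current]
        while total < target:
            candidates = {
                name: weight
                for name, weight in self.edges[current].items()
                if name not in chosen and weight > 0
            }
            if not candidates:
                break
            names = sorted(candidates)
            weights = [candidates[name] for name in names]
            current = rng.choices(names, weights=weights, k=1)[0]
            chosen.append(current)
            total += self.complexity[current]
        return chosen


def build_example_graph() -> PolicyGraph:
    # Illustrative policies and weights only; none come from the paper.
    graph = PolicyGraph()
    graph.add_policy("verify_identity", 1)
    graph.add_policy("refund_limit", 2)
    graph.add_policy("no_cancel_after_departure", 3)
    graph.add_policy("escalate_to_human", 2)
    graph.link("verify_identity", "refund_limit", 0.8)
    graph.link("verify_identity", "no_cancel_after_departure", 0.5)
    graph.link("verify_identity", "escalate_to_human", 0.2)
    graph.link("refund_limit", "no_cancel_after_departure", 0.6)
    graph.link("refund_limit", "escalate_to_human", 0.4)
    graph.link("no_cancel_after_departure", "escalate_to_human", 0.3)
    return graph


if __name__ == "__main__":
    scenario = build_example_graph().sample_scenario(target=5, rng=random.Random(0))
    print(scenario)
```

Raising `target` yields scenarios that combine more policies. Under these assumptions, that is one way a benchmark could be varied across complexity levels and an agent's failures grouped by policy combination.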
