From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

Paper
Mohamed Amine Ferrag et al., March 6, 2026
Key Contribution

Unified taxonomy of ~60 benchmarks, agent framework comparison, collaboration protocols survey

Abstract

This comprehensive review consolidates fragmented efforts in evaluation benchmarks, frameworks, and collaboration protocols into a unified framework. It presents a side-by-side comparison of benchmarks developed between 2019 and 2025 and proposes a taxonomy of approximately 60 benchmarks covering general knowledge reasoning, mathematical problem-solving, code generation, and domain-specific evaluations. It also reviews agent frameworks from 2023 to 2025 and examines real-world applications across 11 sectors.

Key Contributions

  • Unified taxonomy of ~60 benchmarks categorized across 8 domains
  • Comparative analysis of benchmarks developed between 2019 and 2025
  • Review of AI-agent frameworks integrating LLMs with modular tools
  • Survey of agent-to-agent collaboration protocols: the Agent Communication Protocol (ACP), the Model Context Protocol (MCP), and the Agent-to-Agent Protocol (A2A)
  • Documentation of real-world applications across 11 sectors including materials science, biomedical research, healthcare, and finance

Methodology

The review systematically consolidates the literature, organizing fragmented evaluation efforts into a unified framework that addresses multi-domain assessment needs.

Results

The benchmarks covered span general reasoning, mathematics, code generation, factual grounding, multimodal tasks, and interactive assessments. The review identifies a critical gap between benchmark performance and robustness in real-world deployment.

Limitations

  • Failure modes and security vulnerabilities are identified as areas needing further research
  • Challenges in automated scientific discovery remain open
  • A gap persists between benchmark results and real-world performance

Source: From LLM Reasoning to Autonomous AI Agents by Ferrag et al.

Tags

llm-agents, benchmarks, evaluation, agent-frameworks, collaboration-protocols
