Agentic AI Security & Autonomous Red-Teaming

Paper
Ashok Kumar KanagalaIndependent Researcher, Boston, MAFebruary 7, 2026
Original Source
Key Contribution

Red-teaming framework for agentic AI: permission escalation, hallucination, orchestration flaws, memory manipulation, supply chain

Agentic AI Security & Autonomous Red-Teaming

Abstract

Threats in agentic AI environments are difficult to anticipate due to their dynamic execution contexts and unpredictable autonomous behaviors. This paper proposes embedding security mechanisms throughout AI development pipelines, advocating for continuous model verification, alignment assurance, and transparency tooling tailored to agentic systems. The framework emphasizes early, automated, and lifecycle-integrated security validation augmented by autonomous red-teaming to enable more resilient, adaptive, and accountable intelligent systems.

Key Contributions

  • Comprehensive framework for agentic AI security addressing the full development lifecycle
  • Taxonomy of threat vectors specific to agentic systems: permission escalation, hallucination-driven actions, orchestration flaws, memory manipulation, supply chain vulnerabilities
  • Proactive testing approach integrating autonomous red-teaming into CI/CD-style AI pipelines
  • Self-assessing security mechanisms enabling runtime threat detection and response
  • Analysis of how agentic autonomy creates novel attack surfaces absent in traditional ML systems

Threat Taxonomy

Permission Escalation

  • Agents acquiring capabilities beyond intended scope through tool chain exploitation
  • Lateral movement through interconnected tool APIs
  • Privilege inheritance in multi-agent delegation chains

Hallucination-Driven Actions

  • Confident but incorrect reasoning leading to harmful tool invocations
  • Fabricated context influencing downstream agent decisions
  • Compounding errors in multi-step agentic workflows

Orchestration Flaws

  • Coordination failures in multi-agent systems
  • Race conditions in shared resource access
  • Deadlocks in agent communication protocols

Memory Manipulation

  • Poisoning persistent memory stores to influence future behavior
  • Injecting false context through retrieval-augmented generation
  • Exploiting memory consolidation to create persistent backdoors

Supply Chain Attacks

  • Compromised tool APIs providing malicious responses
  • Poisoned training data in fine-tuning pipelines
  • Adversarial prompt injection through external data sources

Framework Principles

  1. Continuous verification — Runtime monitoring of agent behavior against safety constraints
  2. Alignment assurance — Ongoing validation that agent actions align with specified objectives
  3. Transparency tooling — Interpretable audit trails for agent decision-making processes
  4. Autonomous red-teaming — Automated adversarial testing discovering novel failure modes

Tags

agent-safetyred-teamingsecurityalignment
Agentic AI Security & Autonomous Red-Teaming | KB | MenFem