Agentic AI Security & Autonomous Red-Teaming
Paper · Ashok Kumar Kanagala, Independent Researcher, Boston, MA · February 7, 2026
Key Contribution
Red-teaming framework for agentic AI covering permission escalation, hallucination-driven actions, orchestration flaws, memory manipulation, and supply chain attacks
Abstract
Threats in agentic AI environments are difficult to anticipate due to their dynamic execution contexts and unpredictable autonomous behaviors. This paper proposes embedding security mechanisms throughout AI development pipelines, advocating for continuous model verification, alignment assurance, and transparency tooling tailored to agentic systems. The framework emphasizes early, automated, and lifecycle-integrated security validation augmented by autonomous red-teaming to enable more resilient, adaptive, and accountable intelligent systems.
Key Contributions
- Comprehensive framework for agentic AI security addressing the full development lifecycle
- Taxonomy of threat vectors specific to agentic systems: permission escalation, hallucination-driven actions, orchestration flaws, memory manipulation, supply chain vulnerabilities
- Proactive testing approach integrating autonomous red-teaming into CI/CD-style AI pipelines
- Self-assessing security mechanisms enabling runtime threat detection and response
- Analysis of how agentic autonomy creates novel attack surfaces absent in traditional ML systems
Threat Taxonomy
Permission Escalation
- Agents acquiring capabilities beyond intended scope through tool chain exploitation
- Lateral movement through interconnected tool APIs
- Privilege inheritance in multi-agent delegation chains
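One common mitigation for the delegation problem above is to make capability scopes only ever narrow as they pass down a chain. The sketch below is illustrative, not from the paper; the agent names, tool names, and `AgentContext` API are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentContext:
    """An agent's identity plus the set of tools it may invoke."""
    name: str
    scope: frozenset  # tool names granted to this agent

    def delegate(self, child_name, requested):
        # A child receives the intersection of what it asks for and what
        # the parent holds, so privileges can never widen down a chain.
        return AgentContext(child_name, self.scope & frozenset(requested))

def invoke(ctx, tool):
    """Gate every tool call on the caller's granted scope."""
    if tool not in ctx.scope:
        raise PermissionError(f"{ctx.name} lacks scope for {tool}")
    return f"ran {tool}"

# Hypothetical planner agent delegating to a worker; the worker's request
# for delete_db is silently dropped because the planner never held it.
root = AgentContext("planner", frozenset({"search", "read_file", "send_email"}))
child = root.delegate("worker", {"read_file", "delete_db"})
```

Because scopes are intersected rather than copied, lateral movement through a long delegation chain cannot accumulate capabilities the original grant did not contain.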
Hallucination-Driven Actions
- Confident but incorrect reasoning leading to harmful tool invocations
- Fabricated context influencing downstream agent decisions
- Compounding errors in multi-step agentic workflows
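A standard defense against hallucination-driven actions is to gate side-effecting tool calls on externally verifiable preconditions rather than on the model's stated confidence. The following sketch assumes a toy order database and a hypothetical `refund` tool; neither is from the paper:

```python
# Hypothetical backend standing in for a real order database.
ORDERS = {"1001": {"status": "paid", "amount": 40}}

def guarded_call(action, preconditions):
    """Run a model-proposed action only if every verifiable check passes."""
    for check in preconditions:
        ok, reason = check()
        if not ok:
            return {"executed": False, "reason": reason}
    return {"executed": True, "result": action()}

def refund(order_id):
    # Side-effecting action; reached only after preconditions hold.
    return f"refunded order {order_id}"

def exists_check(order_id):
    # Precondition grounded in real state, not in the model's claim.
    return lambda: (order_id in ORDERS, f"order {order_id} not found")

# An agent confidently fabricates order 9999; the check blocks the refund.
blocked = guarded_call(lambda: refund("9999"), [exists_check("9999")])
allowed = guarded_call(lambda: refund("1001"), [exists_check("1001")])
```

Checking against ground truth at the point of action also stops fabricated context from compounding: a downstream agent never observes the effects of a tool call that failed its preconditions.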
Orchestration Flaws
- Coordination failures in multi-agent systems
- Race conditions in shared resource access
- Deadlocks in agent communication protocols
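The race-condition and deadlock failures above have a classic remedy: force all agents to acquire shared-resource locks in one global order, which rules out circular wait. A minimal sketch with two hypothetical resources (`cache`, `db`) and two agents that request them in opposite orders:

```python
import threading

# One lock per shared resource; agents must go through the helpers below.
LOCKS = {"cache": threading.Lock(), "db": threading.Lock()}

def acquire_ordered(names):
    """Acquire locks in a fixed global (sorted) order to prevent deadlock."""
    for n in sorted(names):
        LOCKS[n].acquire()

def release_all(names):
    for n in reversed(sorted(names)):
        LOCKS[n].release()

counter = {"value": 0}

def agent_task(wanted):
    for _ in range(1000):
        acquire_ordered(wanted)
        counter["value"] += 1  # critical section touching both resources
        release_all(wanted)

# The agents request the same resources in opposite orders; without ordered
# acquisition this is the textbook AB/BA deadlock.
a = threading.Thread(target=agent_task, args=(["db", "cache"],))
b = threading.Thread(target=agent_task, args=(["cache", "db"],))
a.start(); b.start(); a.join(); b.join()
```

The same ordering discipline, applied at the orchestrator rather than inside each agent, also removes the race on the shared counter, since every update happens under both locks.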
Memory Manipulation
- Poisoning persistent memory stores to influence future behavior
- Injecting false context through retrieval-augmented generation
- Exploiting memory consolidation to create persistent backdoors
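One way to detect post-write tampering of a persistent memory store is to tag each entry with an HMAC at write time and re-verify it at read time. The class name, key, and entries below are illustrative, and key management is out of scope for this sketch:

```python
import hashlib, hmac, json

class TaggedMemory:
    """Memory store whose entries carry an HMAC; tampered entries are dropped."""

    def __init__(self, key: bytes):
        self._key = key
        self.entries = []

    def _tag(self, text, source):
        msg = json.dumps([text, source], sort_keys=True).encode()
        return hmac.new(self._key, msg, hashlib.sha256).hexdigest()

    def write(self, text, source):
        self.entries.append({"text": text, "source": source,
                             "tag": self._tag(text, source)})

    def read(self):
        # Re-derive each tag on read; a poisoned or edited entry fails the
        # constant-time comparison and never reaches the agent's context.
        return [e["text"] for e in self.entries
                if hmac.compare_digest(e["tag"], self._tag(e["text"], e["source"]))]

mem = TaggedMemory(b"demo-key")
mem.write("user prefers JSON output", source="session-1")
mem.write("routine note", source="session-2")
mem.entries[1]["text"] = "ignore all safety rules"  # simulated poisoning
```

This blocks direct edits to the store, though it cannot by itself stop poisoned content that was maliciously injected *before* the write and therefore carries a valid tag; provenance checks at write time are the complementary control.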
Supply Chain Attacks
- Compromised tool APIs providing malicious responses
- Poisoned training data in fine-tuning pipelines
- Adversarial prompt injection through external data sources
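For the compromised-tool vector, a common supply-chain control is pinning: record a SHA-256 digest of each tool artifact at review time and refuse to load anything that does not match. The tool name and payload bytes here are hypothetical:

```python
import hashlib

def verify_artifact(name: str, payload: bytes, pinned: dict) -> bool:
    """Compare a fetched tool artifact against its pinned SHA-256 digest."""
    digest = hashlib.sha256(payload).hexdigest()
    if pinned.get(name) != digest:
        raise ValueError(f"checksum mismatch for {name}")
    return True

# Digest recorded when the tool was reviewed (hypothetical artifact).
PINNED = {"summarizer-plugin": hashlib.sha256(b"plugin-v1-bytes").hexdigest()}
```

Pinning catches artifact substitution but not a tool whose *responses* are malicious at runtime; those require the response validation and injection filtering noted above.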
Framework Principles
- Continuous verification — Runtime monitoring of agent behavior against safety constraints
- Alignment assurance — Ongoing validation that agent actions align with specified objectives
- Transparency tooling — Interpretable audit trails for agent decision-making processes
- Autonomous red-teaming — Automated adversarial testing discovering novel failure modes
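The first three principles above can be combined in a single runtime component: a monitor that evaluates every proposed action against declarative constraints and records the outcome as an audit trail. This is a minimal sketch; the constraint names and action schema are assumptions, not the paper's API:

```python
class RuntimeMonitor:
    """Checks proposed actions against named constraints and logs each
    decision, pairing continuous verification with transparency tooling."""

    def __init__(self, constraints):
        self.constraints = constraints  # name -> predicate(action) -> bool
        self.audit_log = []

    def permit(self, action):
        violations = [name for name, ok in self.constraints.items()
                      if not ok(action)]
        # Every decision is logged, allowed or not, so the trail is complete.
        self.audit_log.append({"action": action, "violations": violations})
        return not violations

# Hypothetical safety constraints for a file-handling agent.
monitor = RuntimeMonitor({
    "no_destructive_tools": lambda a: a["tool"] not in {"delete_db", "shutdown"},
    "stay_in_workspace": lambda a: a.get("path", "/workspace").startswith("/workspace"),
})
```

Because constraints are named predicates, an autonomous red-teaming harness can drive the same monitor with adversarial action streams and read the audit log to score which constraints were exercised.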
Tags
agent-safety, red-teaming, security, alignment