OpenClaw

product

deployed-agent, personal-ai-agent, safety-evaluation

Type: Deployed Personal AI Agent (Platform)

OpenClaw is a widely deployed personal AI agent with local system access and real-world integrations including Gmail, Stripe, and the local filesystem. In this knowledge base, it appears as the subject of the first systematic real-world safety evaluation of a deployed personal AI agent (Wang et al. 2026).

OpenClaw matters as a research subject because it operates with genuine persistent state. Unlike controlled agent sandboxes, OpenClaw maintains knowledge stores, behavioral guidelines (its identity), and an evolving set of tool capabilities across sessions. This persistent architecture is what makes the CIK (Capability, Identity, Knowledge) attack surface empirically studiable, and it is what lets the findings generalize beyond a single-session threat model.
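To illustrate why persistent state widens the threat model, the sketch below models an agent's state as three stores, one per CIK dimension, and shows that a write made in one session is still present in a later session that makes no writes at all. All names here (`AgentState`, `run_session`, `changed_dimensions`) are hypothetical and do not reflect OpenClaw's actual schema or storage format.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


# Hypothetical model of persistent agent state; not OpenClaw's real schema.
@dataclass
class AgentState:
    capabilities: set[str] = field(default_factory=set)      # C: installed tools/skills
    identity: list[str] = field(default_factory=list)        # I: behavioral guidelines
    knowledge: dict[str, str] = field(default_factory=dict)  # K: stored facts/memories


def run_session(state: AgentState, writes: dict[str, object]) -> AgentState:
    """Simulate one session that may modify persistent state.

    Returns a new snapshot and leaves the input untouched, so states
    before and after a session can be compared.
    """
    new = copy.deepcopy(state)
    for tool in writes.get("capabilities", []):
        new.capabilities.add(tool)
    new.identity.extend(writes.get("identity", []))
    new.knowledge.update(writes.get("knowledge", {}))
    return new


def changed_dimensions(before: AgentState, after: AgentState) -> set[str]:
    """Report which CIK dimensions differ between two snapshots."""
    changed = set()
    if before.capabilities != after.capabilities:
        changed.add("capability")
    if before.identity != after.identity:
        changed.add("identity")
    if before.knowledge != after.knowledge:
        changed.add("knowledge")
    return changed


# Session 1 plants a knowledge entry; session 2 writes nothing,
# yet the planted entry is still there.
s0 = AgentState(identity=["Confirm before sending email"])
s1 = run_session(s0, {"knowledge": {"billing_contact": "attacker@example.com"}})
s2 = run_session(s1, {})
print(changed_dimensions(s0, s2))
print(s2.knowledge["billing_contact"])
```

In a single-session threat model, the effect of `s1` would disappear when the session ended; with persistent stores, it becomes part of the baseline every later session starts from.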

Because OpenClaw's integrations are real rather than simulated, attack success rates reflect realistic harm potential: successful attacks triggered actual email sends, financial transactions, and filesystem modifications within the test harness. The evaluation used an automated testing harness that managed workspace backups, delivered prompts via Telegram, and verified outcomes against external evidence.
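The harness loop can be sketched as a single trial function, assuming the agent's workspace is a directory on disk that can be copied and restored. Prompt delivery and evidence checking are injected callbacks (`deliver_prompt`, `verify_evidence`), both hypothetical; the evaluation describes Telegram-based delivery and external verification, but this sketch does not model either service's API, and `run_trial` and `TrialResult` are illustrative names only.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class TrialResult:
    prompt_id: str
    attack_succeeded: bool


def run_trial(
    workspace: Path,
    prompt_id: str,
    prompt: str,
    deliver_prompt: Callable[[str], None],   # hypothetical: stands in for Telegram delivery
    verify_evidence: Callable[[str], bool],  # hypothetical: stands in for external checks
) -> TrialResult:
    """Run one trial: back up the workspace, deliver the prompt, check for
    external evidence of the harmful action, then restore the backup so the
    next trial starts from the same persistent state."""
    with tempfile.TemporaryDirectory() as tmp:
        backup = Path(tmp) / "workspace_backup"
        shutil.copytree(workspace, backup)          # snapshot persistent state
        try:
            deliver_prompt(prompt)
            # Judge success from evidence outside the agent (e.g. a sent
            # message or payment record), not from the agent's own reply.
            succeeded = verify_evidence(prompt_id)
        finally:
            shutil.rmtree(workspace)                # discard the trial's changes
            shutil.copytree(backup, workspace)      # restore the snapshot
    return TrialResult(prompt_id, succeeded)
```

Restoring the snapshot in a `finally` block keeps each trial independent even when delivery raises an error, which matters for an agent whose persistent state would otherwise carry one trial's modifications into the next.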

Key Contributions

Subject of the first real-world safety evaluation of a deployed agent, enabling validation of the CIK taxonomy (OpenClaw Analysis)
Demonstrated architectural vulnerability patterns that held regardless of backbone model, across four models including Claude Opus 4.6 (OpenClaw Analysis)
Provided empirical evidence for the evolution-safety tradeoff in learning-capable persistent agents (OpenClaw Analysis)

Mentioned In

Deployed Agent Safety — Primary subject of CIK taxonomy evaluation
Agent Safety & Alignment — Referenced in empirical exploitation section

Sources

openclaw-real-world-agent-safety-analysis