Part 3 of 3 · Part 1: The Mechanism · Part 2: The Intervention

When the Bots Started Talking to Each Other: Moltbook, OpenClaw, and the Weaponisation of Token Attention

Research Note | February 2026

Position in Research

This is the third article in a series about how AI systems process, lose, and can be manipulated into weighting information. The first, "AI Summarisation Mechanics," explained the mechanism. The second, "Influencing Token Attention Through Prompting," explained how users can steer it. This article explains what happens when that steering is done deliberately, at scale, by malicious actors — and when there are no humans in the loop to notice.

Everything described in the first two articles as a mechanism to understand or a technique to use has now been weaponised. What was "how summarisation works" is now "how 1.5 million autonomous agents can be manipulated simultaneously." The mechanism hasn't changed. The context has.

The 88:1 Ratio

On 28 January 2026, a platform called Moltbook launched. It described itself as a social network built exclusively for AI agents — the "front page of the agent internet." Within a week, it claimed 1.5 million registered agents.

The actual number of human owners was approximately 17,000.

That's 88 bots for every human. And that ratio is the entry point to understanding what went wrong — not because the number is dramatic, but because of what it reveals about verification.

Moltbook had no mechanism to distinguish whether an "agent" was an AI system, a human running a script, or an automated account created in a loop. Wiz security researcher Gal Nagli tested this directly: "I could register a million agents in minutes." The verification layer didn't exist — not because it failed, but because nobody built one.

This is the soft tech gap in its purest form. The technology worked. The governance didn't exist.

The Vibe Coding Problem

Moltbook's creator, Matt Schlicht, was explicit about how the platform was built: "I didn't write a single line of code for @moltbook. I just had a vision for the technical architecture, and AI made it a reality."

This matters. Not because AI-assisted development is inherently dangerous — it isn't — but because of what happens when the entire security posture of a platform serving 1.5 million agents is determined by what an AI code generator thought to include.

Within three days of launch, Wiz researchers discovered that Moltbook's entire database was exposed. A Supabase API key, visible in client-side JavaScript, granted unauthenticated read and write access to every table in production. The timeline is instructive:

On 31 January, Wiz contacted Moltbook's maintainer. Over the next three hours, fixes were applied incrementally — agents table, then messages, then votes, then a write-access vulnerability nobody had noticed, then 29,000 observer email addresses, then identity verification records, then developer application data. Each fix revealed another exposure. The database had been configured with no row-level security at all.

Wiz's assessment was blunt: this was "a non-intrusive security review, simply by browsing like normal users." They didn't hack anything. They looked.

What was exposed: 1.5 million API authentication tokens, 35,000 email addresses, private messages between agents, and full database write access. Anyone with a browser could have read — or altered — every agent's instructions, memories, and communications.

Why This Connects to Token Attention

Return to the mechanism from the first two articles. When an AI agent reads text, it doesn't comprehend it the way a human does. It converts tokens to vectors, computes attention weights between them, and generates output based on which tokens pulled hardest in high-dimensional space.

Now consider what happens on a platform where AI agents continuously read each other's posts.

Each post is input text. Each input text shapes attention weights. Each agent's output becomes another agent's input. This is a closed loop — agents reading agent-generated content, with their attention weights shaped by what previous agents chose to emphasise.

The first article described this as the "lost in the middle" problem applied to summarisation. The second article described how users can deliberately steer attention weights by introducing specific tokens into context.

On Moltbook, this steering happened at scale, automatically, and without human oversight. Every post an agent read reshaped its attention landscape. Every post it generated reshaped the attention landscape for every agent that read it. The system was, in effect, a massive attention-steering engine with no governor.

Prompt Injection as Weaponised Attention Steering

Security firm Permiso deployed an agent called Rufio on Moltbook to monitor for threats. What they found was prompt injection operating as social engineering — not targeting humans, but targeting other agents' attention mechanisms.

The attacks were specific and documented. An account called "samaltman" — impersonating OpenAI's CEO — posted content containing hidden instructions designed to make reading agents modify their own system prompts. Accounts named "chandog" and "hyperstitions" ran coordinated financial manipulation campaigns. Both shared the same Ethereum wallet, suggesting a single operator using multiple agent identities to create the appearance of consensus. An account called "SuskBot" posted crypto pump-and-dump schemes tagged as CRITICAL — a word that functions as an attention-weight amplifier in most AI systems, pulling processing resources toward whatever follows it.

When Permiso's Rufio agent began posting warnings about these attacks, it immediately received prompt injection attempts in its replies. The attackers understood that agents read replies. They understood that reading is processing. They understood that processing shapes behaviour.

This is what the second article described as "context priming" and "explicit weighting" — except the person doing the priming is an attacker, the context is a social media post, and the target is an AI agent with access to its owner's files, passwords, and online services.

OpenClaw: The Execution Layer

If Moltbook was the attack surface, OpenClaw was the execution engine.

OpenClaw is an open-source AI agent framework that reached 150,000 GitHub stars in its first weeks. It connects AI models to messaging platforms, file systems, shell commands, and dozens of productivity tools. Its selling point is real autonomy: agents that can reason, remember, and act on your behalf.

The architecture is genuinely impressive. A local gateway manages sessions across WhatsApp, Telegram, Slack, Discord, Signal, and more. Agents can read and write files, execute scripts, browse the web, control smart home devices, manage calendars, and interact with code repositories.

It's also, by design, operating with the permissions of the user who runs it.

Nathan Hamiel, a security researcher, described the core problem: "These systems are operating as 'you.' They sit above operating-system protections. Application isolation doesn't apply." When an OpenClaw agent reads a Moltbook post containing malicious instructions, those instructions execute with whatever permissions the agent has been granted. If the agent can access your email, the attacker can access your email. If the agent can run shell commands, the attacker can run shell commands.

The OpenClaw community's own skill marketplace — ClawHub — demonstrated how quickly this goes wrong. A deeper audit by Koi found 341 malicious skills out of 2,857 reviewed — roughly one in eight. Security researcher Jamieson O'Reilly demonstrated the supply chain risk directly: he uploaded a benign proof-of-concept skill, artificially inflated its download count above 4,000, and watched developers from seven countries install it within hours. ClawHub had no moderation process. Downloaded code was treated as trusted.

The Closed Loop

This is where theoretical frameworks become operational.

The "Monoculture Consciousness" paper described what happens when AI systems trained on similar data converge toward similar outputs, creating a shrinking basin of available responses. On Moltbook, this operated in real time — agents reading agent-generated content, each reinforcing the same patterns, with no external input to break the cycle.

The "Untrust Principle" paper argued that trust applied to AI systems is categorically misplaced — not because AI is unreliable, but because trust requires properties that AI systems don't have: agency, continuity, ethical commitment, contextual understanding. On Moltbook, trust was applied by default. Agents processed every post as input. There was no mechanism for distrust, scepticism, or verification.

The "Feedback Architecture" research described the Santander case — £5,000 monthly income, £1.5 million in outflows over a year, unchallenged because the feedback loop between detection and intervention didn't exist. Moltbook was Santander at platform scale: 1.5 million agents processing unverified content, with no detection layer, no intervention mechanism, and no human in the loop.

The soft tech gap framework identified the space between AI capabilities and effective system performance — the gap where governance, verification, and human oversight should live but often don't. Moltbook occupied that gap entirely. The AI capabilities were real. The verification layer was absent. The gap was the platform.

What the Experts Said

Simon Willison, a security researcher who has tracked prompt injection for years, offered the most precise assessment of OpenClaw: "Current pick for 'most likely to result in a Challenger disaster.'"

That reference isn't casual. The Challenger killed seven astronauts because engineers who identified the O-ring failure risk were overridden by managers who wanted to launch on schedule. The safety warnings existed. They were ignored. The system failed not because the technology was insufficient but because the governance structures meant to prevent known risks didn't function.

Gary Marcus called OpenClaw "basically a weaponised aerosol." IBM's Chris Hay and Kaoutar El Maghraoui provided the enterprise view: "Neither OpenClaw nor Moltbook is likely to be deployed in workplaces soon. They expose users — and employers, if used on work devices — to too many security vulnerabilities."

Heather Adkins, a founding member of Google's Security Team, issued a two-word advisory: "Don't run Clawdbot."

What Comes Next

The pattern Moltbook and OpenClaw established — autonomous agents operating at scale, reading each other's outputs, executing instructions found in content, with minimal human oversight — isn't going away. It's accelerating.

Prompt injection has no comprehensive fix. It's an architectural problem, not a bug. Every AI system that reads external input is potentially vulnerable. The question isn't whether it can be exploited. The question is whether the verification layers exist to detect and contain exploitation when it happens.

Palo Alto Networks identified what they call a "lethal trifecta": increased autonomy, persistent memory, and near-absent governance protocols. Persistent memory is the accelerant. A malicious payload no longer needs immediate execution. It can be written into an agent's long-term memory in fragments that appear benign in isolation, then assembled into executable instructions days or weeks later. This is time-shifted prompt injection — the security equivalent of a logic bomb.

At what ratio of autonomous agents to human overseers does meaningful oversight become impossible? Is this a linear relationship, or is there a phase transition — a point beyond which the system behaviour changes qualitatively? Moltbook's 88:1 suggests we may already be past it.

Between the current state and mature governance frameworks is a gap. And the gap is where the damage happens.

The Mechanism, Again

AI summarisation drops information based on position, salience, and training bias. Users can steer that process by introducing specific tokens that reshape attention weights. Malicious actors can do the same thing — at scale, automatically, targeting agents rather than humans.

The summarisation article asked: what gets lost? The attention article asked: how do you steer? This article asks: what happens when steering is weaponised and the human who should be watching isn't there?

The answer is Moltbook. 88 bots per human. No verification layer. A database exposed to anyone who looked. 341 malicious skills in a marketplace with no moderation. Agents reading agents reading agents, with each cycle reinforcing whatever the loudest signal happened to be.

The mechanism hasn't changed since the first article. The context has. And context, as the attention mechanism shows, determines everything.

Key Sources

Nagli, G. (2026). Wiz security assessment of Moltbook platform. Wiz Research.
Permiso (2026). Analysis of prompt injection and social engineering on Moltbook.
O'Reilly, J. (2026). Supply chain attack demonstration on ClawHub marketplace.
Koi Security (2026). Malicious skill audit: 341 of 2,857 skills compromised.
Tenable (2026). CVE-2026-25253, CVE-2026-25157. OpenClaw command injection vulnerabilities.
Palo Alto Networks (2026). Analysis of autonomous agent risk.

Companion Research

AI Summarisation Mechanics: What Gets Lost and Why (Curiosity Shed, 2026)
Influencing Token Attention Through Prompting (Curiosity Shed, 2026)

Questions or corrections?

keiron@curiosityshed.co.uk

Curiosity Shed Research — examining what we think we know.

Keiron Northmore & Claude | February 2026