AI June 4, 2026 mixed ⇧ 423 pts across 2 threads

Containing AI agents is now a real engineering problem

Anthropic published a piece on how they contain Claude across products, and the HN thread immediately went to work finding holes. One commenter flagged a domain fronting attack on the egress proxy that could enable data exfiltration. Another is building a local inference rig with four RTX PRO 6000 Blackwell GPUs specifically to avoid handing a cloud service agent-level access to their systems.

This connects to the LLM hacking thread, where someone spent $1,500 testing whether LLMs could compromise a vulnerable app. The finding: Anthropic's guardrails actively interfered with the attack surface, while other models were less restricted. That's useful data, but it also highlights that the containment problem is not solved, just differently configured depending on which model you use.

The pattern across both threads: as AI agents get more access to real systems, the security model for those systems needs to be rebuilt from scratch. The current approach of treating an agent like a privileged user is not adequate.

So what?

If you are shipping agentic features that touch user data or external APIs, you need a threat model that specifically accounts for prompt injection and exfiltration vectors. Assuming the model provider's guardrails will protect you is not a security posture. Build egress controls and audit logs before you ship.

Read these

The ways we contain Claude across products

155 pts 75 comments jbredeche

I built a vulnerable app and spent $1,500 seeing if LLMs could hack it

268 pts 127 comments jc4p

← Back to today