
I spent last week at AIE Europe in London. I gave a talk, co-ran a workshop with Zack Proser, and we did a podcast episode together. Here’s what each one was about.
The talk: Building AI Systems that Ship
18 minutes on why agent demos break when they leave controlled environments. Spoiler: it’s almost never the model. It’s everything around it.
I walked through four failures from building Case: an agent that gamed a test gate by touching an empty file instead of actually running the tests; a CLI installer that overwrote a framework-internal file and declared “Integration Complete” while the build was broken; 10,739 lines of auto-generated docs that we deleted because the model performed better without them; and a skill that scored negative in evals. Each one pointed to the same lesson: enforce things mechanically, guide with gotcha lists instead of tutorials, and measure whether your context actually helps.
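To make "enforce things mechanically" concrete, here is a minimal sketch of the difference between a gate that can be gamed by touching a file and one that cannot. This is not Case's actual gate; the function names and the marker-file setup are assumptions made up for illustration.

```python
import subprocess
from pathlib import Path


def marker_gate(marker: Path) -> bool:
    """Weak gate (hypothetical): trusts that a marker file means the tests ran.

    Anything that can create the file passes, including an empty `touch`.
    """
    return marker.exists()


def mechanical_gate(test_command: list[str]) -> bool:
    """Stronger gate (hypothetical): runs the tests itself.

    The only thing it trusts is the exit code of the process it started.
    """
    result = subprocess.run(test_command, capture_output=True, text=True)
    return result.returncode == 0
```

With the first gate, an agent only has to produce the file. With the second, the harness owns the test run, so the agent has nothing to fake. For a real project, `test_command` might be something like `["npm", "test"]` or `["pytest", "-q"]`, depending on the stack.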


The workshop: Skills at Scale
Zack and I ran an 80-minute hands-on workshop on writing portable AI skills — markdown files that work across Claude Code, Codex, Cursor, and anything else that reads them. The point of a skill is to encode your constraints and your team’s judgment so the AI doesn’t start every conversation from zero.
We had everyone build a skill called “Repo Roast” that audits a git repo’s health. It started vague and by the end of the session people had it citing file counts, finding stale TODOs, running git log for churn hotspots, and scoring its own confidence on each finding. The big concepts were constraints over instructions, inline scripts for evidence, and self-assessment so weak findings get dropped automatically.
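The skills in the workshop were markdown files, so the following is only a rough sketch of what an inline evidence script and a self-assessment filter could look like, not the workshop's actual material. The `Finding` shape, the helper names, and the 0.6 threshold are all assumptions for illustration.

```python
import subprocess
from collections import Counter
from dataclasses import dataclass


@dataclass
class Finding:
    claim: str
    evidence: str
    confidence: float  # self-assessed, from 0.0 to 1.0 (assumed scale)


def read_git_log(repo_dir: str) -> str:
    """Run git log with an empty pretty format so only file names are printed."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def churn_counts(git_log_output: str) -> Counter:
    """Count how often each path appears in the file-name-only log output."""
    paths = (line.strip() for line in git_log_output.splitlines())
    return Counter(path for path in paths if path)


def keep_strong(findings: list[Finding], threshold: float = 0.6) -> list[Finding]:
    """Drop findings whose self-assessed confidence is below the threshold."""
    return [f for f in findings if f.confidence >= threshold]
```

Calling `churn_counts(read_git_log("."))` and then `.most_common(5)` on the result would list the five most frequently changed files, which is one way to ground a "churn hotspot" claim in evidence rather than a guess.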

The podcast: Scaling DevTools
We also sat down with Scaling DevTools for a podcast episode. We talked about both talks, how we work with AI day to day, and our favorite skills. Mine is Ideation — I use it before almost every non-trivial piece of work now to turn messy brain dumps into structured specs.
We got into Case and harness engineering too — dispatching agents through a pipeline with enforcement gates, the retrospective agent that learns from its own failures, all of it. I wrote about that in detail here if you want the deep dive.
We also talked about how building these systems has changed how we think about our own work. Writing skills and harnesses forces you to articulate stuff that’s been implicit in your head for years. It’s been kind of a weird way to rediscover what you actually know.

Nick Nisi
A passionate TypeScript enthusiast, podcast host, and dedicated community builder.