AI Reading Notes #2

It’s been about 3 months since I posted “AI Articles and reading”, and I think I need to do it more regularly. Here’s a bunch of quotes from the best things I’ve been reading / watching / listening to that are shaping how my thinking about AI and the software industry is developing.

On harness engineering

An introduction to “Harness Engineering”

Birgitta Böckeler has written a great intro to “Harness Engineering” (a follow-up from an earlier post I linked to). Some great new terminology in this piece:

  • “Guides” / “feedforward”: context you supply up front to nudge the agent in the right direction. For example: AGENTS.md, package documentation, specifications and plans, examples to copy, skills.
  • “Sensors” / “feedback”: tools or scripts the agent can run to get feedback and self-correct. For example: linters, unit tests, browser dev tools.
  • “Computational” / “inferential”: terms to differentiate between deterministic code, which is fast and always returns the same result, and code relying on LLMs, which is slow and non-deterministic.
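To make these terms concrete, here’s a minimal sketch of a harness loop of my own devising (the function names and the `agent` callable are hypothetical, not from Böckeler’s post): guides are loaded up front as feedforward, the inferential agent call sits inside a computational outer loop, and sensors feed failures back in for self-correction.

```python
from pathlib import Path
import subprocess

def load_guides(paths):
    """Feedforward: collect up-front context (AGENTS.md, specs, examples)."""
    return "\n\n".join(Path(p).read_text() for p in paths if Path(p).exists())

def run_sensors(commands):
    """Feedback: run linters/tests; return output from any that fail."""
    feedback = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            feedback.append(f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}")
    return feedback

def harness_loop(task, agent, guides, sensors, max_turns=5):
    """Computational outer loop around an inferential agent call."""
    prompt = f"{load_guides(guides)}\n\nTask: {task}"
    for _ in range(max_turns):
        agent(prompt)                    # inferential: slow, non-deterministic
        feedback = run_sensors(sensors)  # computational: fast, deterministic
        if not feedback:
            return True                  # all sensors pass
        prompt = f"Task: {task}\n\nFix these failures:\n" + "\n".join(feedback)
    return False
```

The point of the shape is that iterating on the harness means editing `guides` and `sensors`, not the loop itself.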

She describes how the engineer’s job changes in this world too – you spend more time tweaking the system that writes the code than tweaking the code itself:

The human’s job in this is to steer the agent by iterating on the harness. Whenever an issue happens multiple times, the feedforward and feedback controls should be improved to make the issue less probable to occur in the future, or even prevent it.

Birgitta Böckeler in Harness Engineering on the Martin Fowler blog

I like this point:

A good harness should not necessarily aim to fully eliminate human input, but to direct it to where our input is most important.

Birgitta Böckeler in Harness Engineering on the Martin Fowler blog

“Every layer of review makes you 10x slower”

Avery Pennarun (CEO at Tailscale) on PR reviews being the bottleneck, and the kind of quality culture you need to adopt if you’re going to start removing or reducing them:

But, the call of AI coding is strong. That first, fast step in the pipeline is so fast! It really does feel like having super powers. I want more super powers. What are we going to do about it?

Maybe we finally have a compelling enough excuse to fix the 20 years of problems hidden by code review culture, and replace it with a real culture of quality.

I think the optimists have half of the right idea. Reducing review stages, even to an uncomfortable degree, is going to be needed. But you can’t just reduce review stages without something to replace them. That way lies the Ford Pinto or any recent Boeing aircraft.

The complete package, the table flip, was what Deming brought to manufacturing. You can’t half-adopt a “total quality” system. You need to eliminate the reviews and obsolete them, in one step.

Every layer of review makes you 10x slower – Avery Pennarun

He names code review as the new bottleneck – that’s my experience too. I love that he then looks at other industrial processes and thinks about how we could build quality into every step, rather than trying to catch everything in review (or worse, missing things if we skimp on reviews).

The spec-driven development triangle

Drew Breunig (who I’ve been following since his excellent How Long Contexts Fail post) has been experimenting with spec-driven development, including publishing an open source “library” that is just a spec, leaving it up to an agent to implement. In this post he reflects on what he learned:

This isn’t a one-way equation. It’s a feedback loop. The act of writing code improves the spec, and it improves the tests. Just like software doesn’t really work until it meets the real world, a spec doesn’t really work until it’s implemented.

So instead of an equation, I propose a triangle. The spec defines what tests need to be written, and what code needs to be written. Tests validate the code. That’s essentially what we had before, just in a different shape.

But the act of implementing code generates new decisions. Those decisions inform the spec. And when the spec updates, new tests need to be written. And sometimes it’s not new decisions — it’s just dependencies or subtle choices. New code surfaces new behaviors that need to be tested.

I call this: the Spec-Driven Development Triangle.

As each node moves forward, our job — and our tooling’s job — is to keep those nodes in sync. That’s the job. If we improve the code, we must improve the spec.

Learnings from a No-Code Library: Keeping the Spec Driven Development Triangle in Sync – Drew Breunig
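One way to make “keep those nodes in sync” mechanical – my own sketch, not Breunig’s actual tooling – is a staleness check: record a fingerprint of each corner of the triangle at the last deliberate sync, and flag any corner that stayed still while its partners moved.

```python
import hashlib
import json
from pathlib import Path

NODES = ("spec", "tests", "code")  # the three corners of the triangle

def fingerprint(path):
    """Content hash of one node (a file read as bytes)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def record_sync(paths, state_file="triangle.json"):
    """Call after a deliberate sync: remember each node's fingerprint."""
    state = {node: fingerprint(p) for node, p in zip(NODES, paths)}
    Path(state_file).write_text(json.dumps(state))

def stale_nodes(paths, state_file="triangle.json"):
    """Return the nodes that stayed still while another node changed."""
    state = json.loads(Path(state_file).read_text())
    changed = {node for node, p in zip(NODES, paths)
               if fingerprint(p) != state[node]}
    # If any node moved, every unchanged node may now be out of sync.
    return set(NODES) - changed if changed else set()
```

Run as a CI gate, a check like this would fail the build when the code changed but the spec and tests didn’t – which is exactly the drift the triangle framing warns about.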

Trying to compress the remaining 80%

“There’s a lot of noise around how ‘AI generates buggy software’. The truth is writing the code was always 20% of the time. Testing and refining was always 80%. But now as that 20% (coding) compresses 10x, we want to compress the 80% (testing and validation) as well. That is the mistake… I find that actually AI is more thoughtful in the initial 20% than most (great) engineers are. But discovering real edge cases doesn’t happen at the initial code writing time.”

Balint Orosz (founder of Craft Docs). Quoted in Are AI agents actually slowing us down? on The Pragmatic Engineer

On company culture in the AI transition

“AI will not make up for that”

This video from Laura Tacho (CTO at DX) points out that a bunch of the main problems software companies have aren’t writing code… but perhaps we could use AI to tackle some of those problems too:

AI time savings is not going to make up for bad meeting culture and lots of interruptions and developers who are constantly being pulled out of their work: unplanned work, interruptions, outages, those kinds of things. AI will not make up for that. We can use AI to help solve that problem, but AI in and of itself is not going to make up for it.

Then when we look in the bottom half: build and test wait time, toil, and dev environment – and we put all that together, we realise that just the time savings from coding task speedup isn’t going to get us very far. But what will get us far is when we can take AI and point it at those problems. Can we use AI to help reduce meeting frequency? Can we use AI to improve CI wait time? Can we use AI to reduce dev environment toil? That is what winning organisations are doing right now. They’re putting Dev Ex at the centre of their universe and using AI as a tool to fix systems level problems.

Data vs Hype: How Orgs Actually Win with AI – The Pragmatic Summit
Laura Tacho, CTO at DX.

Main quests and side quests

It’s so tempting with agents to spend time re-writing SAAS products that don’t quite fit the workflow we want. This comment from Karri Saarinen (CEO of Linear) hit home:

Internally, we always talked about main quests and side quests. Everyone should focus on the main quest, and moderately – or not at all – on side quests. Both quest lines feel productive, but only one of them advances the main mission of the company.

Karri Saarinen. Quoted in Are AI agents actually slowing us down? on The Pragmatic Engineer

And this follow up comment from Gergely Orosz:

I love this framing from Karri: it speaks to why I see more engineers do “side quests” like rebuilding a SaaS vendor using AI and bragging about it. Yes, a new ticketing system is impressive, but it won’t help the company generate more revenue, and probably won’t save costs as it lacks features that a mature solution has, and maintaining it will take up precious focus and time.

Gergely Orosz, Are AI agents actually slowing us down? on The Pragmatic Engineer

On the industry as a whole

When everyone can code

I loved this simple parallel from Steve Yegge:

Everyone’s going to be forking. I think that’s a natural consequence of everybody writing code.
Just like everyone can take a picture now. That didn’t used to be true.

Steve Yegge – From The Pragmatic Engineer Podcast: From IDEs to AI Agents with Steve Yegge

Things definitely changed with digital photography, and then modern smartphone photography. And I appreciate that there’s still a healthy industry of professional photographers even though everyone can “do it themselves”.

“Education is solved because we now have computers”

The Pragmatic Engineer interview with Mario Zechner (creator of Pi) and Armin Ronacher (creator of Flask) was excellent. I liked that the conversation came from people who are very AI-forward while still being relatively low on hype and high on caring about code quality.

This exchange, on the hubris of software engineers thinking every person’s job will now be replaced by AI and software, felt grounding compared to the hype:

Mario: I think one thing we software engineers or IT people underestimate is just how freaking complex the world is. And how much human squishiness is in each little nook and cranny and corner, right?

Now we can automate everything, like every bit of knowledge work. But we as software engineers are so bad at becoming domain experts that we don’t see all the non-machine parts that go into a workflow. And we are running into the same fallacy here again.

We are seeing models doing incredible things. I’m not disputing that. This is for me like whoa, basically all my research in the 2000s is null and void because transformers can do all the things.

But we are overextending that to everything, like we always do in software, like we did in ed-tech. Yeah, we have tablets in classrooms now. I’m sure now it’s solved. Education is solved because we now have computers.

Gergely: Well, in fact, I’ve heard – I don’t know which country it was, but I think it was Sweden – they’re now rolling it back. They’re taking the tablets out of the classroom.

Mario: It turns out if you do some scientific investigations into the tactics and effects on pupils, if you do just throw a bunch of tablets into a classroom, close it and hope for the best, turns out the best is terrible. So yeah, for me, I think the biggest takeaway in the past two to three years is the hype is terrible because it dehumanizes everything. And I want to not be part of that circus.

Mario Zechner and Armin Ronacher in Building Pi, and what makes self-modifying software so fascinating (The Pragmatic Engineer podcast)

There’s still room for expertise. And humanity.

I was working in schools when the hype about iPads was at fever-pitch, and he’s right that the human parts of the problem are far far bigger than the hype acknowledged.

“Maybe you’ll notice another historical pattern”

Tanya Verma naming similarities between AI and colonisation:

There is something special about training a model on all of humanity’s data and then locking it up for the benefit of a few well-connected organizations that you have relationships with. Maybe you’ll notice another historical pattern here. Extract value from a population that can’t meaningfully consent, concentrate the returns within a small inner circle, and then offer some version of charity to the people you extracted from as moral cover for the arrangement. The pattern repeats itself with labs promising post-AGI UBI or encouraging EA philanthropy while continuing to concentrate frontier capability. Not saying the intent is malicious, I think many are trying to do the best they can, I’m simply noticing.

Closing of the Frontier by Tanya Verma

Work slop and the loss of trust

This blog post painted a relatable picture of both “workslop” (vibe-creating something with AI in a way that is fast for you but wastes your co-workers’ time) and what happens when you can no longer confidently assert that you know the thing works. You lose trust, and the trust is what people are paying for:

For firms, the competitive advantage of a firm whose work can be trusted has not disappeared; it has, if anything, appreciated, because so many of the firm’s competitors are quietly converting themselves into content-generation pipelines and counting on the client not to notice.

This is already coming to a head. Deloitte has already refunded part of a $440,000 fee over an AI-hallucinated government report. It could be a production system built on a hallucinated specification, or a senior engineer who realizes they have spent the last year nominally reviewing work they could no longer competently review. The reckoning will not be subtle. The firms still doing the work properly will be in a position to charge for it. The firms that have hollowed themselves out will discover that what they hollowed out was the thing the client was paying for.

Appearing Productive in The Workplace on the blog No One’s Happy

Related:

Five months in, I think I’ve decided that I don’t want to vibecode — I want professionally managed software companies to use AI coding assistance to make more/better/cheaper software products that they sell to me for money.

Matthew Yglesias, seen via Simon Willison’s blog
