Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

4.1 Development Practices for Working with Agents

4.1 Development Practices for Working with Agents

The central idea of this section is simple: if code generation becomes cheap, then specification and verification become more important. You need a way to state what should happen before the agent starts generating files, and you need an equally clear way to check whether the generated result is correct.

In this chapter we use two practices for that purpose:

They are related, but they are not the same thing. A spec tells the system what problem is being solved and what counts as a valid result. A test checks whether the current implementation satisfies one slice of that result. When you work with an AI agent, both are useful because they attack different failure modes.

Spec-Driven Development

Software projects have always had a specification problem. Teams know they need to describe the desired behavior of a system, yet those descriptions are often too vague, too stale, or too disconnected from implementation work to be trusted. The result is familiar: stakeholders think they asked for one thing, developers build another, and nobody notices the mismatch until late in the process.

Specification-driven development is an attempt to reverse that pattern. Rather than treating the specification as documentation written after the fact, SDD treats it as a first-class development artifact. You state what the system should do, what constraints matter, what cases are in scope, and what is out of scope before implementation begins.

That idea long predates modern LLMs. The term appears explicitly in the agile literature in the early 2000s, but the underlying problem is older. What has changed is that AI coding tools make the cost of implementation much lower. A developer can now ask an agent for a feature and receive a large, coherent patch in seconds. That speed is valuable, but it also makes under-specified work more dangerous. If the goal is vague, the agent can now go wrong quickly and at scale.

This is why SDD has returned to the center of the conversation. Recent tooling such as (GitHub Spec-Kit)[https://github.com/github/spec-kit] and other “living spec” workflows all converge on the same lesson: if you want reliable automation, you must externalize intent.

What a useful spec contains

A useful spec for AI-assisted work is usually shorter and more operational than a traditional requirements document. It should answer questions the agent would otherwise guess about:

To work with a concrete example, let’s say that we want to create a very simple Python app that checks whether a GitHub repository exists.

# Spec 001: GitHub Repository existence checker 

## Behavior
- The app should only accept a valid GitHub repository URL as input.
- The app should return "not found" if the repository does not exist.
- The app should return "found" if the repository exists.
- The app should return "invalid URL" if the input is not a valid GitHub repository URL.

## Constraints
- The app should be as simple as possible while meeting the behavior requirements.
- The app should not use the GitHub API and leverage HTTP status codes instead.
- The app should be implemented in Python.
- The app should be designed for maintainability and extensibility.
- The app should use uv as project and dependency manager.
- The app should use pytest for testing, if tests are needed.

## Out of scope
- The app does not need to support private repositories or authentication.

If you pass this spec to an agent like GitHub Copilot, you will see that it will be able to implement with high probability a working solution that meets the behavior requirements and constraints.

Specs help because agents are eager guessers

Without a spec, the agent fills gaps using statistical plausibility. That often looks competent, but the decisions are still guesses. The model may decide which edge cases matter, invent integration details, or silently broaden scope.

Test-Driven Development

Kent Beck popularized Test-Driven Development as a discipline in which tests are written before the production code they exercise. TDD reframed tests as a design tool rather than a final inspection step. The test captures the next small behavior you need, and the production code is then written to satisfy it.

This matters for AI-assisted programming because tests are executable specifications. They do not merely describe behavior; they verify it. When the agent proposes an implementation, the tests provide a fast and concrete answer to the question, “does this actually work?”

The classic TDD cycle is still the right mental model:

In human-only development, that loop encourages incremental design. In AI-assisted development, it does something extra: it constrains the agent’s freedom. The model can still propose code, but it must do so against a concrete behavioral target.

TDD with an agent is not “just ask it to use TDD”

A weak instruction such as “use TDD” often fails in practice. The agent may:

Those are not small mistakes. They remove the very benefits TDD is supposed to provide.

For that reason, it is better to ask the agent to implement the tests for the behavior described in the spec.

Red phase: ask the agent to write tests for the behavior described in the spec

I will implement a function that checks whether a URL is a valid GitHub repository URL. It should be a http or https URL that points to github.com and has the format "https://github.com/{owner}/{repo}".
Please write just the tests for this function using pytest.

This will generate a test suite that is focused on the behavior we care about allowing us to think deeply about the behavior before we ask the agent to generate the implementation. We can then run the tests and see that they fail. We are now in the “red” phase of TDD.

To move to the “green” phase, we can ask the agent to implement the function with the constraint that it must make the tests pass.

Green phase: ask the agent to implement the function with the constraint that it must make the tests pass

Create the implementation of the function that checks whether a URL is a valid GitHub repository URL. The implementation must make all the tests pass.

Why these two practices fit together

Spec-Driven Development and Test-Driven Development solve different parts of the same control problem.

If you have only tests, the agent may still solve the wrong problem very well. If you have only specs, you may still miss regressions because nothing runs automatically. When used together, the spec tells the agent what matters and the tests tell you whether the current code actually does it.

The next section adds the final layer: repository artifacts and editor features that make these practices persistent across sessions, collaborators, and tools.

References