4.1 Development Practices for Working with Agents¶

The central idea of this section is simple: if code generation becomes cheap, then specification and verification become more important. You need a way to state what should happen before the agent starts generating files, and you need an equally clear way to check whether the generated result is correct.

In this chapter we use two practices for that purpose:

Spec-Driven Development (SDD) to describe the behavior and constraints.
Test-Driven Development (TDD) to turn that behavior into executable checks.

They are related, but they are not the same thing. A spec tells the system what problem is being solved and what counts as a valid result. A test checks whether the current implementation satisfies one slice of that result. When you work with an AI agent, both are useful because they attack different failure modes.

Spec-Driven Development¶

Software projects have always had a specification problem. Teams know they need to describe the desired behavior of a system, yet those descriptions are often too vague, too stale, or too disconnected from implementation work to be trusted. The result is familiar: stakeholders think they asked for one thing, developers build another, and nobody notices the mismatch until late in the process.

Specification-driven development is an attempt to reverse that pattern. Rather than treating the specification as documentation written after the fact, SDD treats it as a first-class development artifact. You state what the system should do, what constraints matter, what cases are in scope, and what is out of scope before implementation begins.

That idea long predates modern LLMs. The term appears explicitly in the agile literature in the early 2000s, but the underlying problem is older. What has changed is that AI coding tools make the cost of implementation much lower. A developer can now ask an agent for a feature and receive a large, coherent patch in seconds. That speed is valuable, but it also makes under-specified work more dangerous. If the goal is vague, the agent can now go wrong quickly and at scale.

This is why SDD has returned to the center of the conversation. Recent tooling such as [GitHub Spec Kit](https://github.com/github/spec-kit) and other “living spec” workflows all converge on the same lesson: if you want reliable automation, you must externalize intent.

What a useful spec contains¶

A useful spec for AI-assisted work is usually shorter and more operational than a traditional requirements document. It should answer questions the agent would otherwise guess about:

What behavior are we adding or changing?
What inputs and outputs matter?
What constraints are non-negotiable?
What cases must pass before we consider the work correct?
What is explicitly out of scope for this change?

To work with a concrete example, let’s say that we want to create a very simple Python app that checks whether a GitHub repository exists.

# Spec 001: GitHub Repository existence checker 

## Behavior
- The app should only accept a valid GitHub repository URL as input.
- The app should return "not found" if the repository does not exist.
- The app should return "found" if the repository exists.
- The app should return "invalid URL" if the input is not a valid GitHub repository URL.

## Constraints
- The app should be as simple as possible while meeting the behavior requirements.
- The app should not use the GitHub API; it should rely on HTTP status codes instead.
- The app should be implemented in Python.
- The app should be designed for maintainability and extensibility.
- The app should use uv as project and dependency manager.
- The app should use pytest for testing, if tests are needed.

## Out of scope
- The app does not need to support private repositories or authentication.

If you pass this spec to an agent such as GitHub Copilot, it will very likely produce a working solution that meets both the behavior requirements and the constraints.
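To make that concrete, here is a minimal sketch of the kind of solution an agent might produce from Spec 001. It is one plausible reading of the spec, not the only one. The names `fetch_status` and `check_repository`, the URL pattern, the choice of a HEAD request, and the rule that any status other than 200 means "not found" are all assumptions made for illustration.

```python
"""Sketch of the repository existence checker described in Spec 001."""
import re
import urllib.error
import urllib.request
from typing import Callable

# Assumed URL shape: http(s)://github.com/{owner}/{repo}, optional trailing slash.
_REPO_URL = re.compile(r"https?://github\.com/[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+/?")


def fetch_status(url: str) -> int:
    """Return the HTTP status code of a HEAD request to `url`.

    Network failures (urllib.error.URLError) are not handled here and
    propagate to the caller.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the status code is what we need.
        return error.code


def check_repository(
    url: str, get_status: Callable[[str], int] = fetch_status
) -> str:
    """Map a URL to "found", "not found", or "invalid URL"."""
    if not _REPO_URL.fullmatch(url):
        return "invalid URL"
    # Assumption: only a 200 response counts as an existing public repository.
    return "found" if get_status(url) == 200 else "not found"
```

Passing `get_status` as a parameter is a design choice in this sketch: it lets tests substitute a fake status source, so the three outcomes in the spec can be checked without touching the network.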

Specs help because agents are eager guessers¶

Without a spec, the agent fills gaps using statistical plausibility. That often looks competent, but the decisions are still guesses. The model may decide which edge cases matter, invent integration details, or silently broaden scope.

Test-Driven Development¶

Kent Beck popularized Test-Driven Development as a discipline in which tests are written before the production code they exercise. TDD reframed tests as a design tool rather than a final inspection step. The test captures the next small behavior you need, and the production code is then written to satisfy it.

This matters for AI-assisted programming because tests are executable specifications. They do not merely describe behavior; they verify it. When the agent proposes an implementation, the tests provide a fast and concrete answer to the question, “does this actually work?”

The classic TDD cycle is still the right mental model:

Red. Write a small failing test for the next slice of behavior.
Green. Implement the smallest change that makes the test pass.
Refactor. Improve the code while keeping the test suite green.

In human-only development, that loop encourages incremental design. In AI-assisted development, it does something extra: it constrains the agent’s freedom. The model can still propose code, but it must do so against a concrete behavioral target.

TDD with an agent is not “just ask it to use TDD”¶

A weak instruction such as “use TDD” often fails in practice. The agent may:

mock away the behavior that matters,
or produce a very broad test suite that is hard to interpret when it fails.

Those are not small mistakes. They remove the very benefits TDD is supposed to provide.
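The first failure mode is easier to recognize with an example. The sketch below contrasts a test that mocks away the behavior under test with one that exercises it. The functions `is_github_host` and `describe` are hypothetical and exist only for this illustration.

```python
from unittest.mock import Mock
from urllib.parse import urlparse


def is_github_host(url: str) -> bool:
    """Hypothetical validator: does the URL point at github.com?"""
    return urlparse(url).hostname == "github.com"


def describe(url: str, validate=is_github_host) -> str:
    """Hypothetical caller that depends on the validator."""
    return "ok" if validate(url) else "invalid URL"


def test_mocked_away():
    # Anti-pattern: the validator is replaced by a mock that always says yes.
    # The test passes even though the URL is not a GitHub URL, so it tells us
    # nothing about whether validation works.
    always_valid = Mock(return_value=True)
    assert describe("https://example.com/owner/repo", validate=always_valid) == "ok"


def test_real_behavior():
    # Better: run the real validator against concrete inputs and outcomes.
    assert describe("https://github.com/owner/repo") == "ok"
    assert describe("https://example.com/owner/repo") == "invalid URL"
```

Both tests pass, which is exactly the problem: a green result from the first one carries no information about the validation logic.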

For that reason, it is better to ask the agent explicitly to write the tests for the behavior described in the spec, and only the tests, before it writes any implementation.

Red phase: ask the agent to write tests for the behavior described in the spec¶

I will implement a function that checks whether a URL is a valid GitHub repository URL. It should be an http or https URL that points to github.com and has the format "https://github.com/{owner}/{repo}".
Please write just the tests for this function using pytest.

This prompt produces a test suite focused on the behavior we care about, and reviewing it forces us to think carefully about that behavior before we ask the agent for an implementation. When we run the tests, they fail because the function does not exist yet. We are now in the “red” phase of TDD.

To move to the “green” phase, we can ask the agent to implement the function with the constraint that it must make the tests pass.

Green phase: ask the agent to implement the function with the constraint that it must make the tests pass¶

Create the implementation of the function that checks whether a URL is a valid GitHub repository URL. The implementation must make all the tests pass.
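A minimal sketch of an implementation that would satisfy tests of this kind is shown below. The function name `is_valid_github_repo_url` is an assumption, as are a few decisions the prompt leaves open: a trailing slash is accepted, query strings are ignored, and hosts such as www.github.com are rejected.

```python
from urllib.parse import urlparse


def is_valid_github_repo_url(url: str) -> bool:
    """Return True for http(s)://github.com/{owner}/{repo}, else False."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse rejects some malformed inputs, such as bad IPv6 literals.
        return False
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.hostname != "github.com":
        return False
    # Require exactly two non-empty path segments: owner and repo.
    segments = parsed.path.strip("/").split("/")
    return len(segments) == 2 and all(segments)
```

Once the suite is green, the refactor step can tighten the rules (for example, restricting the characters allowed in owner and repository names) by first adding a failing test for each new rule.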

Why these two practices fit together¶

Spec-Driven Development and Test-Driven Development solve different parts of the same control problem.

The spec protects intent.
The test protects behavior.

If you have only tests, the agent may still solve the wrong problem very well. If you have only specs, you may still miss regressions because nothing runs automatically. When used together, the spec tells the agent what matters and the tests tell you whether the current code actually does it.

The next section adds the final layer: repository artifacts and editor features that make these practices persistent across sessions, collaborators, and tools.

References¶

Specification-Driven Development
Ostroff, J. S., Makalsky, D., and Paige, R. F. (2004). “Agile Specification-Driven Development.” In Extreme Programming and Agile Processes in Software Engineering (XP 2004), LNCS 3092. Springer.
XP 2004 paper PDF
Spec-Driven Development with AI: Get started with a new open source toolkit
GitHub Spec Kit
TDAD: TDD Anti-Patterns in Agentic Development
Building Effective Agents
GitHub for Beginners: Test-driven development (TDD) with GitHub Copilot