4.2 Artifacts, Tools, and Guardrails

In the previous section we focused on practices: write specs before code, and use tests to make behavior executable. Those practices work best when they are supported by artifacts that survive beyond a single chat session.

This section is about those artifacts. In an AI-assisted workflow, the editor is not just a text editor. It is a workspace where the model reads files, follows instructions, calls tools, and sometimes performs actions on your behalf. That means part of the software design now lives in repository metadata and editor configuration.

For Chapter 4, four artifacts matter most:

repository instructions in AGENTS.md,
reusable prompts and skills,
tool connections such as MCP servers,
and approval settings that enforce least privilege.

`AGENTS.md`: persistent repository memory

One of the recurring failure modes in AI coding is decision amnesia. You tell an agent about naming conventions, testing expectations, or workflow constraints in chat, and then the model forgets them in the next session. The result is that the same guidance has to be repeated over and over.

An AGENTS.md file addresses that problem by storing stable repository-level instructions in a place the agent can consult as part of its working context. You can think of it as a lightweight operating manual for collaborators that are part human and part machine.

A useful AGENTS.md file typically includes:

the project purpose,
key architectural conventions,
how to run tests and validation commands,
scope boundaries,
and review rules such as “do not modify generated files” or “prefer small patches.”

For example, a small Python project might have an AGENTS.md that says:

# AGENTS.md

- Use `uv` for dependency management.
- Use `pytest` for testing.
- Prefer behavioral tests over implementation-detail tests.

The file does not need to be long. Its value comes from precision and stability, not volume.
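The "persistent memory" idea can be sketched in a few lines of Python. This is only an illustration of the mechanism, not how any particular agent is implemented: the function name `build_context` and the two section labels are assumptions made up for this example. The point is that the repository instructions are read from disk at the start of every session, so they do not depend on what was said in an earlier chat.

```python
from pathlib import Path


def build_context(repo_root: str, user_message: str) -> str:
    """Prepend repository instructions (if any) to the user's request.

    Hypothetical helper: real agents assemble context in their own ways.
    """
    instructions_path = Path(repo_root) / "AGENTS.md"
    parts = []
    if instructions_path.is_file():
        # Stable, repository-level guidance goes first in every session.
        instructions = instructions_path.read_text(encoding="utf-8").strip()
        parts.append("## Repository instructions\n" + instructions)
    # The task-specific request always comes last.
    parts.append("## Current request\n" + user_message.strip())
    return "\n\n".join(parts)
```

Because the file is re-read each time, editing AGENTS.md changes the guidance for every later session without anyone having to repeat it in chat.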

Skills

If AGENTS.md is the repository’s standing policy, a skill is a reusable procedure. In GitHub Copilot and VS Code, a skill is a directory centered on a SKILL.md file, with optional scripts, templates, examples, and other files that support a specialized workflow. The key idea is that skills are loaded on demand rather than applied all the time.

That makes them different from AGENTS.md.

AGENTS.md tells the agent how to behave in this repository in general.
A skill teaches the agent how to perform a particular kind of task.

In practice, AGENTS.md is where you put stable project guidance such as test commands, coding conventions, forbidden directories, and review expectations. A skill is where you put an operational recipe: how to triage a flaky test, how to review a pull request, or how to inspect a web app with the browser tool.

This distinction matters because the failure modes are different. If you put everything in AGENTS.md, the agent carries too much always-on instruction and important rules get buried in noise. If you put repository policy into a skill, the agent may miss critical constraints because the skill is only loaded when it is invoked or judged relevant. The right split is simple: policy in AGENTS.md, procedure in skills.

It is also worth separating skills from custom agents in VS Code. A custom agent stored in an .agent.md file defines a role, tool set, and operating mode such as “planner” or “security reviewer.” A skill does not define a persona. It packages a capability that an agent can use. Put differently: an agent answers “who is doing the work and with which tools?” A skill answers “what reusable workflow should that agent follow?”

What skills are good for

Skills are most useful when a task has enough structure to deserve reuse but is too detailed to restate in every chat. Good examples include:

grill-me, a compact productivity skill that interviews the user relentlessly about a plan or design until the agent and the user reach a shared understanding,
tdd, an engineering skill that enforces a red-green-refactor loop while the agent builds one vertical slice at a time,
debugging playbooks,
release or deployment procedures,
and data-cleaning or grading workflows.

The grill-me and tdd examples above were created by Matt Pocock and are available in the mattpocock/skills repository.

Skills also travel well across tools. Agent Skills is an open standard that VS Code implements, so the same skill can be used by GitHub Copilot in VS Code and, in many cases, by other tools that understand the same format. That is useful for teams because the workflow description can outlive any one editor integration.

How to use skills with GitHub Copilot and VS Code

In VS Code, project skills normally live in a directory such as .github/skills/, and each skill has its own folder:

.github/
	skills/
		grill-me/
			SKILL.md

The SKILL.md file contains YAML frontmatter followed by the instructions. In the grill-me skill, the frontmatter holds a name and a description, and the body below it is the prompt itself:

---
name: grill-me
description: Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when the user wants to stress-test a plan, get grilled on a design, or explicitly says "grill me".
---

Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.

Ask the questions one at a time.

If a question can be answered by exploring the codebase, explore the codebase instead.

The tdd skill uses the same file format, but with a larger set of instructions and supporting documents. Its frontmatter description is a concise summary of the workflow: test-driven development with a red-green-refactor loop, used when the user wants to build a feature or fix a bug test-first. The body then spells out the loop in more detail: plan the interface changes, write tests around behavior rather than implementation details, make one test pass at a time, and refactor between cycles.
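To make the file format concrete, here is a minimal Python sketch that splits a SKILL.md document into its frontmatter and its body. It assumes the frontmatter contains only flat `key: value` lines, as in the grill-me example; real YAML allows much more, and a real tool would use a YAML parser. The function name `parse_skill` is hypothetical, and the description in the sample is shortened for the example.

```python
def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md document into (frontmatter, body).

    Simplified sketch: handles only flat 'key: value' frontmatter lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    try:
        # Find the line that closes the frontmatter block.
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter is not closed with '---'") from None

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


example = """---
name: grill-me
description: Interview the user about a plan until reaching shared understanding.
---

Ask the questions one at a time.
"""

meta, body = parse_skill(example)
print(meta["name"])  # grill-me
print(body)          # Ask the questions one at a time.
```

The split mirrors how the two parts are used: the description is short metadata that helps decide whether the skill is relevant, while the body is the procedure that is loaded once the skill is chosen.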

From there, GitHub Copilot in VS Code can use the skill in two ways:

automatically, when the skill description matches the current task,
or explicitly, when you invoke it as a slash command in chat.

The explicit path is usually the easiest one to teach. In the chat box, type / and choose the skill, or use /skills to open the skills configuration UI. A prompt such as /grill-me help me think through the design of an autograder or /tdd add regression coverage for the autograder tells Copilot which reusable workflow to run and adds task-specific context after the command.
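The shape of an explicit invocation, a command name followed by free-form context, can be illustrated with a toy parser. This is a sketch for teaching purposes only; it does not describe how Copilot processes chat input, and `parse_slash_command` is a name invented for this example.

```python
from typing import Optional


def parse_slash_command(message: str) -> tuple[Optional[str], str]:
    """Return (skill_name, task_context).

    skill_name is None when the message does not start with a command.
    """
    stripped = message.strip()
    if not stripped.startswith("/"):
        return None, stripped
    # Everything up to the first space is the command; the rest is context.
    command, _, context = stripped[1:].partition(" ")
    return (command or None), context.strip()


skill, context = parse_slash_command(
    "/grill-me help me think through the design of an autograder"
)
print(skill)    # grill-me
print(context)  # help me think through the design of an autograder
```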

Tools and MCP servers

By default, a model can generate suggestions, but it cannot safely inspect your project, run commands, or fetch outside information unless those actions are available as tools.

A tool is a controlled capability the agent can call during a task, such as searching files, running tests, reading a web page, or executing a terminal command.

We can group tools into three categories:

Built-in tools: capabilities provided by VS Code itself (for example, reading files in the workspace or running terminal tasks).
Extension tools: capabilities added by extensions you install (for example, tools from a testing, cloud, or issue-tracking extension).
MCP tools: capabilities exposed by external servers through the Model Context Protocol (MCP).

MCP is easiest to understand as a standard connector between the agent and external systems. Instead of hard-coding support for every API, the agent talks to an MCP server that exposes specific actions in a structured, predictable way.

Why this matters: it makes the agent more reliable and more auditable. Suppose the agent needs the exact syntax of a CLI flag or the current behavior of a library API. Without tools, it may rely on imperfect memory. With an MCP fetch tool, it can retrieve the source documentation directly and ground its answer in that source.

The practical result is fewer “best guess” responses and more verifiable work. The agent can show where information came from, and you can review both the action it took and the data it used.
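The idea of a tool as a controlled, reviewable capability can be sketched without any MCP library. The classes below are hypothetical and are not part of VS Code, Copilot, or the MCP specification; they only show the two properties described above: the agent calls a named capability with structured arguments, and every call leaves a record that a human can inspect.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Tool:
    """A named capability the agent is allowed to call."""
    name: str
    description: str
    run: Callable[..., Any]


@dataclass
class ToolRegistry:
    """Holds the available tools and logs every call for later review."""
    tools: dict[str, Tool] = field(default_factory=dict)
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def call(self, name: str, **arguments: Any) -> Any:
        if name not in self.tools:
            # The agent can only use capabilities that were registered.
            raise KeyError(f"unknown tool: {name}")
        result = self.tools[name].run(**arguments)
        # Record the action taken and the data it returned.
        self.audit_log.append(
            {"tool": name, "arguments": arguments, "result": result}
        )
        return result


def count_words(text: str) -> int:
    return len(text.split())


registry = ToolRegistry()
registry.register(Tool("count_words", "Count whitespace-separated words.", count_words))
print(registry.call("count_words", text="specs before code"))  # 3
print(registry.audit_log[0]["tool"])                           # count_words
```

A protocol such as MCP standardizes this kind of interface across process boundaries, so the tool can live in an external server rather than in the same program.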

Approvals and least privilege

Tools make agents more capable, but also more dangerous. An agent with shell access, network access, and file-system write access can do useful work quickly. It can also make damaging decisions quickly.

For that reason, approval settings are not a nuisance. They are part of the design of the system.

The principle to follow is the same one used elsewhere in security engineering: least privilege. Give the agent the smallest set of permissions that still allows useful progress.

That usually means:

read-only access by default,
explicit approval for destructive or irreversible actions,
narrow tool scopes where possible,
and human review before changes that affect production systems, secrets, or history.
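The list above can be read as a small decision procedure. The following Python sketch is one way to express it, under the assumption that every action is classified into one of three risk levels; the names `Risk`, `Decision`, and `decide` are invented for this example and do not correspond to actual VS Code settings.

```python
from enum import Enum


class Risk(Enum):
    READ = "read"
    WRITE = "write"
    DESTRUCTIVE = "destructive"


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"    # pause and wait for explicit human approval
    DENY = "deny"


def decide(risk: Risk, granted: set[Risk]) -> Decision:
    """Apply a least-privilege policy to one proposed action."""
    if risk is Risk.READ:
        # Read-only access is the default.
        return Decision.ALLOW
    if risk not in granted:
        # Anything beyond reading must be granted explicitly.
        return Decision.DENY
    if risk is Risk.DESTRUCTIVE:
        # Even when granted, destructive actions need a human in the loop.
        return Decision.ASK
    return Decision.ALLOW


print(decide(Risk.READ, set()))                            # Decision.ALLOW
print(decide(Risk.WRITE, set()))                           # Decision.DENY
print(decide(Risk.DESTRUCTIVE, {Risk.WRITE, Risk.DESTRUCTIVE}))  # Decision.ASK
```

The design choice worth noticing is that the safe answer is the fallback: an action that nobody thought to grant is denied rather than allowed.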

Putting it all together

When you put all these pieces together, you get a workflow that looks like this:

Specs reduce ambiguity.
Tests catch regressions.
AGENTS.md preserves local conventions.
Skills and reusable prompts standardize recurring workflows.
Tools retrieve facts and execute checks.
Approvals keep automation bounded.

This is what controlled AI-assisted development looks like in practice. The goal is not to remove the human from the loop. The goal is to make the loop explicit, repeatable, and cheap to audit.
