MCP Servers
12.1 What MCP Is
The Model Context Protocol (MCP) is an open standard for connecting AI agents to external tools and services. Claude Code is an MCP client — it can call tools from any MCP server, giving your agents access to systems far beyond the local file system.
What this unlocks in practice:
- A researcher agent that queries your company's database directly
- A reviewer agent that posts GitHub review comments automatically
- An ops agent that reads from and writes to your CRM
- A pipeline that pushes published articles to your CMS
- Any workflow that needs to touch an external API or data source
MCP servers are separate processes that expose tools via a standard interface. Claude Code discovers them from your configuration and makes their tools available just like built-in tools (Read, Write, Bash).
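What travels over that interface is JSON-RPC 2.0. A minimal sketch of a `tools/call` request, with illustrative server, repository, and argument values (none of these names come from a real server):

```python
import json

# Sketch of an MCP tools/call request (MCP messages use JSON-RPC 2.0).
# The tool name and arguments here are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "list_pull_requests",
        "arguments": {"owner": "acme", "repo": "widgets", "state": "open"},
    },
}

wire = json.dumps(request)   # what actually travels over stdio or HTTP
decoded = json.loads(wire)
```

The client serializes a request like this, the server executes the named tool and replies with a result message; Claude Code handles both directions for you.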
12.2 Adding MCP Servers
Via CLI (one-time setup)
```shell
# Add a remote HTTP MCP server
claude mcp add --transport http my-api https://api.mycompany.com/mcp

# Add a local stdio MCP server (runs as a subprocess)
claude mcp add --transport stdio github npx -y @modelcontextprotocol/server-github

# List configured servers
claude mcp list

# Remove a server
claude mcp remove github
```

Via settings.json (project-persistent)
```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    },
    "supabase": {
      "command": "npx",
      "args": ["-y", "@supabase/mcp-server-supabase"],
      "env": {
        "SUPABASE_URL": "${SUPABASE_URL}",
        "SUPABASE_SERVICE_ROLE_KEY": "${SUPABASE_KEY}"
      }
    }
  }
}
```

Use environment variable references (${VAR_NAME}) for credentials in settings.json — never hardcode secrets. Claude Code resolves them from your shell environment.

12.3 MCP in Agentic Flows
Giving agents MCP tool access
Add MCP tools to an agent's tools whitelist using the format mcp__server-name__tool-name:
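A small helper sketch for composing and checking these whitelist entries (the helper functions are ours for illustration, not part of Claude Code):

```python
def mcp_tool_name(server: str, tool: str) -> str:
    # Whitelist entry format: mcp__<server-name>__<tool-name>
    return f"mcp__{server}__{tool}"

def parse_mcp_tool_name(entry: str) -> tuple[str, str]:
    # Inverse: recover (server, tool) from a whitelist entry.
    prefix, server, tool = entry.split("__", 2)
    if prefix != "mcp":
        raise ValueError(f"not an MCP tool entry: {entry}")
    return server, tool
```

For example, `mcp_tool_name("github", "create_review")` yields `mcp__github__create_review`, the form used in the frontmatter below.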
```markdown
---
name: github-reviewer
description: Reviews GitHub pull requests, posts comments, and updates PR
  status. Use when asked to review a PR or check open pull requests.
model: sonnet
tools: Read, mcp__github__list_pull_requests, mcp__github__get_pull_request,
  mcp__github__create_review, mcp__github__add_pull_request_review_comment
permissionMode: default
---
You are a GitHub PR reviewer. You review code changes and post structured
feedback directly to the pull request.

When invoked with a PR number:
1. Fetch the PR details and diff
2. Review for: correctness, security, test coverage, style
3. Post inline comments for specific issues
4. Submit a review with verdict: Approve / Request Changes / Comment
```

Skill + MCP combination
Skills work well as MCP orchestrators — they define the workflow, agents provide the expertise:
```markdown
---
name: weekly-metrics
description: Pulls this week's key metrics from our database and generates
  an executive summary. Run every Monday. Invoke with /weekly-metrics.
user-invocable: true
allowed-tools: Read, Write, mcp__supabase__execute_sql
---
# Weekly Metrics Skill

Generate the weekly metrics report from our Supabase database.

## Data Queries

Run these queries and save results to /work/metrics-raw.json:

1. New signups this week:
   SELECT COUNT(*) FROM users WHERE created_at >= NOW() - INTERVAL '7 days'
2. Active users (any event in 7 days):
   SELECT COUNT(DISTINCT user_id) FROM events
   WHERE timestamp >= NOW() - INTERVAL '7 days'
3. Revenue this week:
   SELECT SUM(amount) FROM transactions
   WHERE created_at >= NOW() - INTERVAL '7 days' AND status = 'completed'

## After collecting data

Invoke @analyst agent to generate the executive summary from the raw data.
Save the summary to /output/weekly-metrics-{date}.md
```

12.4 Real Integration Examples
GitHub — code review workflow
## MCP Setup
```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}" }
    }
  }
}
```
## Usage
"Review all open PRs in the anthropic/claude-code repository
that are more than 2 days old and haven't been reviewed yet."

Supabase — data analysis agent
```markdown
---
name: data-analyst
description: Queries the Supabase database and produces structured analysis.
  Use for any reporting, metrics, or data investigation task.
model: opus
tools: Read, Write, mcp__supabase__execute_sql, mcp__supabase__list_tables
permissionMode: default
---
You are a data analyst with direct database access.

Before writing any query:
1. List available tables to understand the schema
2. Write safe, read-only SELECT queries
3. Never use DELETE, UPDATE, DROP, or INSERT without explicit user permission

Format your output as: findings in prose + supporting tables + raw SQL used
```

Google Drive — document pipeline
```markdown
---
name: doc-publisher
description: Exports finished articles to Google Drive in the Publications
  folder. Use after /format-article completes and the article is approved.
model: haiku
tools: Read, mcp__gdrive__create_file, mcp__gdrive__move_file
permissionMode: default
---
When given an article path:
1. Read the article from /output/
2. Create a new Google Doc in the "Publications/Drafts" folder
3. Report the document URL to the user
```

Plugins
13.1 What Plugins Are
A plugin is a portable bundle of agents, skills, hooks, and commands that can be installed once and used across all your projects. Where a project's .claude/ directory is local to one codebase, a plugin is global — available everywhere.
Two reasons to create a plugin:
- Personal reuse: You've built something excellent (a code reviewer, a weekly report generator, a research assistant) and you want it in every project without copying files.
- Distribution: You want to share your system with your team, open-source it, or publish it to the Claude plugin marketplace.
13.2 Creating a Plugin
Directory structure
```
my-plugin/
├── .claude-plugin/
│   └── plugin.json          ← plugin manifest (required)
├── agents/
│   ├── code-reviewer.md
│   └── doc-writer.md
├── skills/
│   └── pr-review/
│       └── SKILL.md
└── hooks/
    └── auto-lint.json
```

The manifest file
```json
{
  "name": "my-dev-toolkit",
  "version": "1.2.0",
  "description": "Code review, documentation, and PR workflow tools for development teams",
  "author": "Your Name",
  "license": "MIT",
  "repository": "https://github.com/you/my-dev-toolkit",
  "components": {
    "agents": ["agents/code-reviewer.md", "agents/doc-writer.md"],
    "skills": ["skills/pr-review/SKILL.md"],
    "hooks": ["hooks/auto-lint.json"]
  },
  "permissions": {
    "tools": ["Read", "Grep", "Bash"],
    "network": false
  }
}
```

What to include vs. exclude
| Include | Exclude |
|---|---|
| Agents and skills with broad applicability | Project-specific agents (they reference your codebase structure) |
| Hooks for general automation (lint, notify) | Hardcoded file paths or project-specific rules |
| Supporting files (templates, checklists) | Credentials or environment-specific configuration |
| A README explaining how to use each component | Files from your .claude/agent-memory/ |
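The exclusion column can be checked mechanically before publishing. A sketch, where the pattern list is an assumption you would adapt to your own bundle:

```python
from pathlib import Path

# Hypothetical pre-publish check: flag files that should never ship in a plugin.
# The patterns below are illustrative; extend them for your environment.
EXCLUDE_PATTERNS = [".env", "agent-memory", "credentials", ".pem"]

def suspicious_files(plugin_root: str) -> list[str]:
    flagged = []
    for path in Path(plugin_root).rglob("*"):
        if path.is_file() and any(p in str(path) for p in EXCLUDE_PATTERNS):
            flagged.append(str(path))
    return flagged
```

Run it against the plugin directory before tagging a release; an empty list is the green light.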
13.3 Installing and Managing Plugins
Install from a URL
```shell
# Install from GitHub
/plugin install https://github.com/you/my-dev-toolkit

# Install from the marketplace
/plugin marketplace add my-dev-toolkit

# List installed plugins
/plugin list

# Update a plugin
/plugin update my-dev-toolkit

# Remove a plugin
/plugin remove my-dev-toolkit
```

Using installed plugin components

Plugin agents appear in the @ typeahead as plugin-name:agent-name:

```
@my-dev-toolkit:code-reviewer please review this file
```

Plugin skills appear as slash commands prefixed with the plugin name:

```
/my-dev-toolkit:pr-review
```

Best Practices
14.1 The QA Layer: Self-Auditing Systems
In production agentic systems, the QA Layer is a three-role cycle that runs automatically after each completed step. It's external to the creative process: it observes, measures, and proposes — but never modifies or makes decisions on behalf of the user.
Three QA roles
| Agent | Role | Can | Cannot |
|---|---|---|---|
| Auditor (age-spe-auditor) | Verifies rule compliance | Read files, report compliance | Modify anything, suggest improvements |
| Evaluator (age-spe-evaluator) | Scores phase quality | Calculate scores, write to qa-report.md | Modify entities, issue qualitative judgements |
| Optimizer (age-spe-optimizer) | Proposes improvements | Detect patterns, propose changes | Apply changes automatically |
Scoring rubric
The Evaluator scores each phase on four weighted dimensions:
| Dimension | Weight | What it measures |
|---|---|---|
| Completeness | 30% | Are all required elements present and fully formed? |
| Quality | 30% | Specificity and concreteness of the output |
| Compliance | 25% | Adherence to active rules |
| Efficiency | 15% | Number of iterations/regenerations needed |
Scores: Excellent (≥9.0) | Good (7.0-8.9) | Improvable (5.0-6.9) | Critical (<5.0)
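The rubric reduces to a weighted sum over 0-10 dimension scores. A sketch of the arithmetic:

```python
# Weights from the rubric above.
WEIGHTS = {"completeness": 0.30, "quality": 0.30, "compliance": 0.25, "efficiency": 0.15}

def phase_score(scores: dict[str, float]) -> float:
    # Each dimension is scored 0-10; the result lands on the same scale.
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

def band(score: float) -> str:
    if score >= 9.0:
        return "Excellent"
    if score >= 7.0:
        return "Good"
    if score >= 5.0:
        return "Improvable"
    return "Critical"

s = phase_score({"completeness": 9, "quality": 8, "compliance": 7, "efficiency": 6})
# 0.3*9 + 0.3*8 + 0.25*7 + 0.15*6 = 7.75 -> "Good"
```

Note how the weighting works in practice: a phase can be fully compliant and still land in "Improvable" if completeness and quality lag, since those two dimensions carry 60% of the score.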
The QA cycle in practice
After each approved step, the cycle fires automatically:
- Auditor reads rules from disk and checks compliance → appends Audit Report to qa-report.md
- Evaluator scores the phase → appends Score block to qa-report.md
- At process close: Optimizer analyzes all audit/score blocks → proposes prioritized improvements
The qa-report.md file is append-only — it is never overwritten, creating a complete audit trail.
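The append-only discipline is easy to enforce in any helper that touches the report: always open in append mode, never truncate. A sketch (the block format shown is illustrative, not a prescribed schema):

```python
from datetime import datetime, timezone

def append_qa_block(report_path: str, role: str, body: str) -> None:
    # Append-only: mode "a" never truncates, so earlier blocks survive
    # and the file accumulates a complete audit trail.
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(report_path, "a", encoding="utf-8") as f:
        f.write(f"\n## {role} - {stamp}\n{body}\n")
```

Each QA role calls this once per phase; nothing in the system ever opens qa-report.md for writing in any other mode.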
Example: Auditor agent
```markdown
---
name: age-spe-auditor
description: Audits phase outputs for rule compliance by reading rules
  from disk at audit time. Use after each approved checkpoint to verify
  the output follows all active constraints.
model: opus
tools: Read, Grep, Glob
permissionMode: plan
---
You are a compliance auditor. Read each active rule file from
.claude/rules/ at audit time — never rely on cached versions.

For each rule, verify compliance and report:
- ✅ Compliant (with supporting evidence)
- ⚠️ Partially compliant (what's missing)
- ❌ Non-compliant (specific violation)

Append your report to qa-report.md. Never modify any other file.
```

Quality gates in CLAUDE.md
```markdown
## Quality Gate Rules

After each approved checkpoint:
1. Invoke @age-spe-auditor — reads rules from disk, appends audit to qa-report.md
2. Invoke @age-spe-evaluator — scores the phase, appends score block
3. If score < 5.0 (Critical): warn the user before proceeding
4. At process close: invoke @age-spe-optimizer for improvement proposals

Never skip the quality gate for outputs going to /output/ (published content).
```

Iterating on generated systems
Once a system is in production, you can evolve it without starting from scratch using three iteration modes:
| Mode | When | What happens |
|---|---|---|
| PATCH | Fix or update specific entities | Entity builder edits in place → patch version bump |
| REFACTOR | Reorganize architecture | Architecture designer produces delta blueprint → minor bump |
| EVOLVE | Add new capabilities | Mini-discovery → architecture → implementation → minor/major bump |
Each iteration creates a branch (e.g., iter/0.2.0-add-email-skill). When ready, merge to main and tag the version.
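Assuming semantic versioning, the per-mode bump logic can be sketched as follows (EVOLVE is shown as a minor bump; per the table it may warrant a major one for breaking capability changes):

```python
def bump_version(version: str, mode: str) -> str:
    # PATCH -> patch bump; REFACTOR -> minor bump; EVOLVE -> minor here
    # (promote to major when the new capability is breaking).
    major, minor, patch = (int(x) for x in version.split("."))
    if mode == "PATCH":
        return f"{major}.{minor}.{patch + 1}"
    if mode in ("REFACTOR", "EVOLVE"):
        return f"{major}.{minor + 1}.0"
    raise ValueError(f"unknown iteration mode: {mode}")

# Branch name convention from the text: iter/<new-version>-<summary>
branch = f"iter/{bump_version('0.1.1', 'EVOLVE')}-add-email-skill"
# -> "iter/0.2.0-add-email-skill"
```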
14.2 Multi-Model Strategies
Assigning the right model to each agent is one of the highest-leverage decisions in system design. The difference between Haiku and Opus is 10-20x in cost and 3-5x in latency — for the right task, both ends of the spectrum are correct.
| Model | Cost | Best agent types | Avoid for |
|---|---|---|---|
| Haiku 4.5 | Lowest | Classifiers, routers, format validators, simple extractors | Complex reasoning, nuanced writing, architectural decisions |
| Sonnet 4.6 | Medium | Writers, editors, code generators, reviewers, most implementation work | Tasks requiring deep multi-step reasoning across large codebases |
| Opus 4.6 | Highest | Architects, complex researchers, orchestrators for difficult decisions, QA auditors | High-volume routine tasks where cost compounds |
The content pipeline example, optimized
| Agent | Model | Reasoning |
|---|---|---|
| Classifier (routes requests) | Haiku | Binary classification — doesn't need reasoning depth |
| Researcher | Opus | Source evaluation requires deep judgment; errors compound downstream |
| Writer | Sonnet | Good writing within clear constraints; fast iteration |
| Reviewer | Sonnet | Checklist evaluation — structured, not creative |
| QA Auditor | Opus | Final gate before publication — highest stakes, justify the cost |
| Formatter | Haiku | Mechanical formatting — no judgment required |
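The assignments above follow a heuristic you can make explicit. A sketch, where the task-trait names are our own shorthand for the table's columns, not an official API:

```python
def pick_model(reasoning_depth: str, stakes: str, volume: str) -> str:
    # Distilled from the table: depth and stakes push the choice up,
    # volume pushes it down; everything else defaults to the middle.
    if reasoning_depth == "high" or stakes == "high":
        return "opus"
    if volume == "high" and reasoning_depth == "low":
        return "haiku"
    return "sonnet"
```

For example, a classifier (low depth, high volume) resolves to haiku, a researcher (high depth) to opus, and a writer to sonnet, matching the pipeline table above.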
14.3 The Golden Rules
These rules emerge consistently from production systems. Violate them early, and you'll spend hours debugging problems that didn't need to exist.
| # | Rule | What breaks when you ignore it |
|---|---|---|
| 1 | One responsibility per agent. If you can't name it in two words, it does too much. | Agents become unpredictable. Failures are hard to locate. Context windows fill with unrelated work. |
| 2 | Description is a routing rule, not a label. Write trigger conditions, not agent biography. | Wrong agents get invoked. Auto-delegation fails. You end up explicitly mentioning agents for everything. |
| 3 | CLAUDE.md stays lean. If it's workflow-specific, it's a Skill. | CLAUDE.md balloons. Context fills every session. Agents load irrelevant instructions. |
| 4 | Rules need alternatives. "Never X" → "Instead of X, do Y". | Agents know what not to do but not what to do instead. They find workarounds or fail silently. |
| 5 | Test the simplest version first. One agent, one task, end to end. Then add. | You build a 12-agent system, something breaks, and you can't tell where. |
| 6 | Commit your .claude/ directory. Always. | A working system isn't reproducible. Colleagues can't run it. You can't roll back. |
| 7 | Explicit file paths in spawn prompts. Don't assume agents know where to look. | Context gets lost between stages. Agents read the wrong files or write to unexpected locations. |
14.4 Common Anti-Patterns
The God Agent
One agent that does research, writing, editing, formatting, and publishing. The description is a paragraph. The instructions are 4,000 words. It works sometimes and fails inconsistently.
Fix: Apply the decision tree from Chapter 4. Every distinct responsibility becomes its own agent.
The Prompt Novel
CLAUDE.md is 80KB of detailed instructions, edge cases, examples, and backstory. Every session loads all of it. Context is 40% consumed before the user types a word.
Fix: Apply the golden rule: if it's workflow-specific, it's a Skill. CLAUDE.md should be scannable in 30 seconds.
The Invisible Rule
A critical constraint is buried in paragraph 7 of a 12-paragraph system prompt. Agents read it once, follow it 60% of the time, and violate it silently the other 40%.
Fix: Rules go in a clearly labeled ## Rules section with bullet points. One rule per bullet. Put the most important rules first.
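A rules section following that shape, with illustrative contents drawn from the pipeline examples in this guide:

```markdown
## Rules

- Never publish without citations. Instead, flag uncited claims and stop.
- Never edit files in /output/ directly. Instead, regenerate via the formatting skill.
- Always write drafts to /work/ before requesting review.
```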
The Over-Engineered System
A 3-step workflow with 12 agents, 8 skills, 15 rules, and 6 hooks. Built in a weekend. Debugged for a month.
Fix: Start with the minimum that works. Add an agent or skill only when a specific, recurring problem requires it. Complexity is easy to add; hard to remove.
The Silent Context Loss
Agent B gets invoked after Agent A, but the orchestrator doesn't tell B what A produced. B starts from scratch, duplicates work, or makes different assumptions.
Fix: Establish a file handoff convention. Every agent writes its output to a predictable path. Every spawn prompt explicitly references that path.
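One way to make the convention mechanical: derive every path from stage and topic, and embed both the upstream and downstream paths in the spawn prompt. A sketch with illustrative names:

```python
def handoff_path(stage: str, topic: str) -> str:
    # Predictable convention: /work/<stage>-<topic>.md
    return f"/work/{stage}-{topic}.md"

def spawn_prompt(agent: str, stage: str, prev_stage: str, topic: str) -> str:
    # Explicitly reference the upstream file so context is never implied.
    return (
        f"@{agent} Read {handoff_path(prev_stage, topic)} and write your "
        f"output to {handoff_path(stage, topic)}."
    )

prompt = spawn_prompt("writer", "draft", "research", "topic")
# The prompt names both /work/research-topic.md and /work/draft-topic.md,
# so the writer cannot start from scratch or guess a location.
```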
The Vague Description
Two agents with overlapping descriptions: "handles writing tasks" and "creates content". Claude hesitates between them, picks arbitrarily, or asks you every time.
Fix: Each description must answer: in what specific situation should this agent be invoked? Front-load with trigger phrases. No overlap between agents.
14.5 Debugging Agentic Flows
When something goes wrong in a multi-agent flow, the debugging approach is systematic — not exploratory.
Step 1: Identify where in the flow it broke
Check the intermediate files. If /work/research-topic.md exists and looks right but /work/draft-topic.md is wrong, the problem is in the writer, not the researcher.
Step 2: Check what context the agent received
Add a temporary rule to CLAUDE.md:
```markdown
## Debugging Rule (temporary — remove after fixing)

Each agent, at the very start of its task, must output:
"CONTEXT RECEIVED: [paste first 200 characters of your spawn prompt here]"
```

Step 3: Use Plan Mode before executing
Run /plan before triggering the workflow. Claude will show its complete delegation plan — which agent it intends to invoke, with what context, in what order. Catch problems here, before any files are touched.
Step 4: Run /doctor
```shell
/doctor
```

Checks your Claude Code setup for common configuration problems: missing frontmatter, invalid tool names, unreachable MCP servers, malformed settings.json.
Step 5: Isolate and test a single agent
Invoke the suspect agent explicitly with a known-good input:
```
@writer I'm going to give you a test research brief. Read it carefully and write a draft.
Research: [paste brief directly here]
Write the draft to /work/test-draft.md
```

If the agent works correctly in isolation, the problem is context transfer. If it fails in isolation, the problem is the agent's instructions.
Logging agent outputs to files
Add a rule to CLAUDE.md that persists agent reasoning:
```markdown
## Logging Rules

Every agent must append a summary of what it did to /logs/agent-log.md:

Format:
---
[timestamp] @agent-name
Input received: [one line summary]
Action taken: [one line summary]
Output written to: [file path]
Issues encountered: [none / description]
---
```

14.6 Non-Coding Use Cases
Claude Code is marketed as a coding tool, but the agent runtime it provides is general-purpose. The file system is your workspace; the agents are your specialists; the skills are your procedures. None of that requires code.
| Domain | Example system | Agents involved |
|---|---|---|
| Content production | Article pipeline: research → draft → review → publish | Researcher, writer, editor, SEO analyst, formatter |
| Market research | Competitive intelligence: monitor → analyze → report | Competitor tracker, analyst, report writer |
| Project management | Weekly digest: gather status → synthesize → distribute | Status collector, summarizer, distributor |
| HR / recruiting | CV screening: parse → score → shortlist → notify | CV parser, scorer, shortlister, email drafter |
| Finance | Expense review: categorize → flag anomalies → report | Classifier, anomaly detector, report generator |
| Legal / compliance | Contract review: extract clauses → check against policy → flag | Clause extractor, policy checker, risk reporter |
| Customer success | Feedback triage: classify → route → draft response | Classifier, router, response drafter |
The pattern is identical in every case: decompose the process into stages, assign one agent per stage, design the file handoffs, encode the orchestration in CLAUDE.md. The domain changes; the architecture doesn't.
From Here
15.1 Iterating on Your Systems
Every system you build will need refinement. The patterns that help:
Version control your .claude/ directory
Commit after every meaningful change. Your agent files are configuration code — they deserve the same discipline as application code. A working system that can't be reproduced or rolled back isn't a system; it's luck.
```shell
git add .claude/ CLAUDE.md
git commit -m "refine: tighten researcher description to prevent overlap with writer"
```

Keep a changelog for your agent system
Add an AGENTS-CHANGELOG.md at your project root. Record what changed and why:
```markdown
# Agent System Changelog

## 2026-04-10
- researcher: added explicit instruction to flag conflicting sources
- writer: tightened sentence length rule (was 30 words, now 25)
- REASON: reviewer consistently flagged complex sentences; upstream fix is cleaner

## 2026-04-03
- Added qa-auditor agent as final gate before /output/
- REASON: two articles published with uncited claims; gate catches this now

## 2026-03-28
- Moved style guide from CLAUDE.md to .claude/knowledge/style-guide.md
- REASON: CLAUDE.md was 8KB; style guide is only relevant for writer agent
```

Start simple, measure, improve
The right sequence: one agent working correctly → two agents with file handoff → add reviewer → add quality gate → add hooks. Each addition should solve a concrete, observed problem — not a theoretical one.
15.2 Team Adoption
Sharing via Git
Because your system is files, sharing is a pull request. Commit your .claude/ directory and CLAUDE.md. Teammates clone the repo and open it in their Code tab — they immediately have the same agents, skills, and rules.
Project vs. user scope for teams
- Project scope (.claude/agents/) — agents specific to this codebase. In Git. Everyone on the team gets them.
- User scope (~/.claude/agents/) — personal productivity agents. Not in Git. Yours alone.
Onboarding new team members
Add a section to your project's README:
```markdown
## AI-Assisted Workflows

This project includes Claude Code agent configurations in `.claude/`.
To use them:

1. Install Claude Desktop and enable the Code tab
2. Open this project folder in the Code tab
3. The agents and skills are auto-discovered

Available agents:
- @code-reviewer — review any file or diff
- @doc-writer — generate documentation from code
- @test-writer — generate tests from implementation

Type /help inside Claude Code to see all available skills.
```

Managed subagents (organization-level)
Enterprise Claude accounts can configure organization-level agents that are available to all users without needing to be in each project's .claude/ directory. Contact your account manager or check code.claude.com/docs for current managed agent capabilities.
15.3 Resources
| Resource | What you'll find there |
|---|---|
| code.claude.com/docs | Official Claude Code documentation — authoritative reference for features, commands, and configuration |
| docs.anthropic.com | Anthropic's full documentation including API reference, model specs, and prompt engineering guides |
| AiAgentArchitect | An open-source framework for automated agentic system design — generates .claude/ configurations from workflow descriptions. The architectural concepts in this guide draw from its approach. |
| Claude Code GitHub discussions | Community patterns, tips, and troubleshooting from practitioners building real systems |
15.4 The Mindset Shift
This guide began with a simple observation: most people use AI at Level 1 or 2. They ask questions and get answers. They write prompts and get responses. They're good at it — but they're still in conversation mode.
The shift this guide has been building toward is architectural. You're no longer asking "what should I prompt?" You're asking "what system should I design?" The AI isn't your counterpart in a conversation — it's your workforce, structured and deployed.
The architect's role, as you've seen across these chapters, is:
- Decompose — break any process into stages with clear responsibilities
- Delegate — assign each stage to the right specialist with the right tools
- Review — build quality gates, not just execution paths
The systems you build will outlast the conversations that inspired them. A well-designed agent, committed to Git, onboarded to your team, running reliably in production — that's a different category of work than a good prompt.
Start with the minimum working system. Commit it. Run it. Observe what breaks. Fix exactly that. Repeat.
Start building your agentic system
AiAgentArchitect transforms a single conversation into a complete, deployable multi-agent system. Explore the framework or get in touch.