The Claude Certified Architect Exam: A Full Breakdown of All 5 Domains
This is my first article on Substack, so I wanted to start by introducing myself.
My name is Austin Hawkins, and I’m an AI engineering professional focused on artificial intelligence, agentic AI, and data science. I’ve spent the last 10 years working in the industry in Atlanta, Georgia, and for the past two years I’ve been building and running my own AI consulting and education firm, EchelonAiQ.
Outside of work, I enjoy staying active and curious. I like working out, traveling, training BJJ, spending time with friends and family, and building engineering projects just for the challenge and fun of learning.
It’s been exciting to watch the Data and AI space evolve over the years. I’ve seen it grow from a time when the conversation centered more on traditional data science, machine learning pipelines, and early deep learning breakthroughs, to the wave sparked by Attention Is All You Need, which helped reshape the industry and accelerate the rise of modern generative AI. Now, we’re moving even further into a world of copilots, autonomous systems, and agentic AI applications that are changing how people work and build.
Being able to witness that shift firsthand, and be part of it, has been one of the most rewarding parts of my career.
Through this Substack, I’ll be sharing my thoughts, lessons, and experiences from working in AI, building in this space, and helping others understand where the technology is heading.
I’m glad you’re here. Here’s one of my favorite quotes:
“The greatest glory in living lies not in never falling, but in rising every time we fall.” (Nelson Mandela)
Lately, I have been very busy. I’ve paused client work while I build a FREE community Introduction to AI course and study for the Claude Certified Architect exam.
It’s the first of its kind, and it will help prove proficiency in designing multi-step, autonomous agent workflows rather than just the conversational chatbots that are so common these days.
It will also help build trust with students and clients, since my goal is to help people interested in or transitioning into the space become working AI professionals.
And frankly, I like Anthropic’s mission, and I just enjoy learning.
Even though I’ve worked with Claude in the past, I am approaching this like a student.
Funnily enough, Claude itself explains the domain structure best, so below is a description of the exam I’ll be taking. Future articles will cover my own journey and the techniques I use to learn this material, and hopefully I can build a course that makes it easier for you.
Let’s start with the structure. The exam is built around five domains. Each carries a different weight. You need 720 out of 1,000 points to pass. And the questions aren’t trivia — they’re scenario-based. You’re making real architectural decisions, not recalling definitions.
The 5 Domains and Their Weights
D1 - Agentic Architecture & Orchestration - 27%
D2 - Claude Code Configuration - 20%
D3 - Tool Design & MCP Integration - 18%
D4 - Prompt Engineering & Structured Output - 20%
D5 - Context Management & Reliability - 15%
Domain 1 · 27% of the Exam
Agentic Architecture & Orchestration
The Heaviest Domain
Agentic Loops, Multi-Agent Orchestration, Hooks & Guardrails, Session Management, Task Decomposition, Hub-and-Spoke
Domain 1 is the biggest slice of the exam for a reason: it’s where production systems break the most. 27% of your score comes from demonstrating that you understand how to design, build, and reason about agentic loops — the core execution pattern that powers every Claude-based autonomous system.
“Most production failures? Start here. The agentic loop is where complexity accumulates and where architectural decisions compound over time.”
What an Agentic Loop Actually Is
At its core, an agentic loop is a cycle: Claude receives a task, reasons about it, decides on a tool call, receives the result, and then reasons about the next step. The loop continues until Claude reaches stop_reason: "end_turn". Understanding when to continue versus when to terminate is fundamental — and it’s exactly the kind of decision the exam tests.
The key rule: continue the loop when stop_reason is "tool_use". Terminate when it’s "end_turn". Every tool result must be appended to conversation history between iterations so Claude can reason about subsequent steps. If you don’t do this, the model loses context and the loop degrades.
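To make that concrete, here’s a minimal sketch of the loop in Python with the Anthropic SDK. The model name is a placeholder and `execute_tool` is a hypothetical dispatcher you’d write yourself; the stop_reason handling is the part the exam cares about.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_agent(task: str, tools: list[dict]) -> str:
    """Minimal agentic loop: continue on tool_use, terminate on end_turn."""
    messages = [{"role": "user", "content": task}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-5",  # placeholder: use the model you target
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        # Append the assistant turn so the next iteration sees full history.
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # Claude has finished reasoning; return its final text.
            return "".join(b.text for b in response.content if b.type == "text")

        if response.stop_reason == "tool_use":
            # Execute every requested tool and append the results to history.
            results = []
            for block in response.content:
                if block.type == "tool_use":
                    output = execute_tool(block.name, block.input)  # hypothetical dispatcher
                    results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": output,
                    })
            messages.append({"role": "user", "content": results})
        else:
            # Anything else (e.g. max_tokens) needs explicit handling, not a silent spin.
            raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")
```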
Multi-Agent Orchestration
The exam also covers multi-agent systems — specifically the hub-and-spoke pattern. In this architecture, a central coordinator Claude instance receives high-level tasks and delegates sub-tasks to specialized subagents. Each subagent handles a narrow domain: one for data retrieval, one for synthesis, one for output formatting.
The critical insight here is that subagents aren’t just Claude instances running in parallel — they’re bounded contexts. The coordinator shouldn’t delegate everything; it should maintain oversight and make the architectural decisions. Knowing what to centralize versus what to delegate is a core exam question pattern.
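A skeletal version of hub-and-spoke, just to fix the idea (roles, prompts, and model name are all my own illustration): each spoke gets a narrow system prompt, and the coordinator keeps the sequencing decisions.

```python
import anthropic

client = anthropic.Anthropic()

# Each spoke is a bounded context: one narrow system prompt, nothing more.
SUBAGENTS = {
    "retrieval": "You retrieve raw data. Do not analyze or format it.",
    "synthesis": "You synthesize retrieved data into findings. Do not fetch data.",
    "formatting": "You format findings for the end user. Do not add new claims.",
}

def call_subagent(role: str, task: str) -> str:
    """Delegate one sub-task to a specialized Claude instance."""
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model name
        max_tokens=1024,
        system=SUBAGENTS[role],
        messages=[{"role": "user", "content": task}],
    )
    return "".join(b.text for b in response.content if b.type == "text")

# The coordinator owns the architecture: what runs, in what order, with what inputs.
data = call_subagent("retrieval", "Pull the Q3 figures from the attached report: ...")
findings = call_subagent("synthesis", f"Identify the key trends in: {data}")
report = call_subagent("formatting", f"Format these findings as a short brief: {findings}")
```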
Hooks and Guardrails
Hooks are programmatic enforcement points that fire at specific moments in the agentic lifecycle — before a tool call, after a result, on error. They’re how you add guardrails without baking them into the model’s system prompt. The exam distinguishes between hooks (code-level enforcement) and prompt-level instructions (guidance that the model may or may not follow).
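In Claude Code, hooks live in your settings file. The shape below follows my reading of the hooks documentation (a PreToolUse hook that screens Bash commands before they run); verify the field names against the current docs before relying on it.

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "python3 .claude/hooks/block_dangerous_commands.py"
          }
        ]
      }
    ]
  }
}
```

The point the exam cares about: this is code-level enforcement. The script runs every time, whereas a system-prompt instruction is guidance the model can drift from.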
Key Concept — Session Management
Agentic sessions accumulate state across many turns. You need to understand how to preserve relevant context, summarize completed sub-tasks, and prevent context degradation over long-running workflows. The exam will test your ability to distinguish between state that needs to persist (task progress, constraints, decisions made) and state that can be safely discarded (verbose tool outputs, intermediate reasoning).
⚠ Common Exam Trap
The exam will present a scenario where a subagent fails mid-task and ask how the coordinator should respond. The wrong answer is to silently retry. The correct pattern is structured error propagation — the coordinator needs to know what failed, why, and whether to escalate, retry with different parameters, or terminate the workflow entirely.
Domain 2 · 20% of the Exam
Claude Code Configuration
The Configuration Domain
CLAUDE.md Hierarchy, Custom Commands, Plan Mode, Iterative Refinement, CI/CD Integration, Batch Processing
Domain 2 is the most configuration-heavy part of the exam, and I mean that in the most literal sense. Either you know where the files go, or you don’t. There’s not a lot of middle ground here. If you’ve been using Claude Code seriously, a lot of this will feel familiar. If you’ve only used it casually, this domain will expose the gaps.
“You either know where the files go — or you don’t. The CLAUDE.md hierarchy is not optional context. It’s the foundation.”
The CLAUDE.md Hierarchy
CLAUDE.md files are how you give Claude Code persistent, contextual instructions. But they’re not a single file — they form a hierarchy. A global CLAUDE.md applies everywhere. A project-level CLAUDE.md applies to a specific repository. A directory-level CLAUDE.md applies to a specific path within a project.
The exam tests whether you understand how these layers interact. When there’s a conflict between global and project-level instructions, the more specific one wins. When there’s a conflict between project-level and directory-level, the directory wins. Understanding the inheritance model is essential for questions about configuration debugging.
Key Concept — Path-Specific Rules with Glob Patterns
CLAUDE.md supports glob patterns for path-specific rules. You can tell Claude to behave differently when working inside /src/api/ versus /tests/. The exam will test your ability to write and reason about these rules — including edge cases where multiple patterns match the same file.
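Here’s the layering at a glance (the repo layout is my own example; the lookup locations are the conventional ones):

```text
~/.claude/CLAUDE.md         # global: your personal defaults, applies everywhere
my-repo/CLAUDE.md           # project: repo conventions, wins over global on conflict
my-repo/src/api/CLAUDE.md   # directory: path-specific rules, wins over project
```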
Plan Mode vs. Direct Execution
Plan mode is a distinct execution state in Claude Code where the model reasons about what it’s about to do before doing it. It generates a plan, presents it for review, and waits for confirmation before executing. The exam distinguishes this from direct execution — where Claude acts immediately — and tests your judgment about when each is appropriate.
High-stakes, irreversible operations (deleting files, modifying production configs, running migrations) warrant plan mode. Routine, reversible operations don’t. The cost of plan mode is latency; the benefit is auditability and human oversight.
CI/CD Integration and the -p Flag
The -p flag is what makes Claude Code work in non-interactive environments — CI pipelines, automated workflows, batch jobs. It passes a prompt directly without requiring terminal interaction. Understanding when and how to use it is a practical requirement for any production Claude Code deployment, and the exam reflects that.
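A minimal sketch of what that looks like in a CI step. Only the -p behavior is the point here; redirecting the output and anything else about the wiring is my assumption.

```bash
# Non-interactive: pass the prompt directly; no terminal session required.
claude -p "Run the test suite and summarize any failures" > test-summary.txt
```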
⚠ Common Exam Trap
The exam will ask about a scenario where Claude Code behaves unexpectedly in a CI environment. The most common wrong answer is to modify the system prompt. The correct answer is almost always a CLAUDE.md configuration issue — either missing, misplaced, or conflicting rules that don’t apply correctly in the non-interactive context.
Domain 3 · 18% of the Exam
Tool Design & MCP Integration
The Most Overlooked Domain
Tool Descriptions, Structured Error Responses, Tool Routing, MCP Server Config, Built-in Tools, Tool Splitting
18% is not a small number. But I’ve watched engineers blow past Domain 3 in their study plans because “it’s just tools.” It’s not just tools. This domain contains one of the most important insights in the entire exam — and most people arrive at exam day not knowing it.
“Your tool description is the routing logic. Write it wrong, and your agent calls the wrong tool every single time. This isn’t a bug — it’s an architecture decision you got backwards.”
The Description Is the Primary Selection Mechanism
Claude selects tools based on their descriptions. Not their names. Not the order they appear in the list. The description. This means that if two tools have vague or overlapping descriptions, Claude will misroute — and it will do so consistently and silently.
The exam presents scenarios exactly like this: an agent that routes “check order #12345 status” to a get_customer tool instead of a lookup_order tool, because both descriptions say “retrieves entity information.” The fix isn’t to rename the tools. It’s to rewrite the descriptions to be specific enough that there’s no ambiguity about which tool handles which class of request.
Key Concept — What Good Tool Descriptions Contain
A strong tool description includes: what the tool does (specific action, not category), what inputs it requires and why, what it returns and in what format, and critically — what it does not do. Negative constraints are as important as positive descriptions, because they prevent Claude from routing ambiguous requests to the wrong tool.
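To make the contrast concrete, here’s a sketch in the Anthropic tool-definition format. The tool itself is hypothetical; notice the negative constraint in the second description.

```python
# Vague: overlaps with any other tool that "retrieves entity information."
bad_tool = {
    "name": "lookup_order",
    "description": "Retrieves entity information.",
    "input_schema": {"type": "object", "properties": {"id": {"type": "string"}}},
}

# Specific: action, inputs, return format, and an explicit "does NOT" clause.
good_tool = {
    "name": "lookup_order",
    "description": (
        "Looks up the current status of a single order by its numeric order ID "
        "(e.g. 12345). Returns JSON with status, carrier, and ETA. Does NOT "
        "return customer profile data; use get_customer for that."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Numeric order ID, digits only.",
            }
        },
        "required": ["order_id"],
    },
}
```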
Structured Error Responses
When a tool fails, the error response is part of the agentic loop. A vague error message (“something went wrong”) leaves Claude unable to reason about what happened or how to recover. A structured error response tells Claude what failed, whether it’s retryable, and what alternative paths exist.
The exam distinguishes between different failure types: a tool returning an empty valid result (success, nothing found) versus an access failure (the tool couldn’t execute) versus a genuine error (the tool executed but the result is invalid). Each requires different handling, and confusing them leads to incorrect recovery behavior.
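One way to encode those three outcomes in a tool result. The field names are my own convention, not a prescribed schema; the point is that each case gives Claude something different to reason about.

```python
# Three outcomes that a bare "something went wrong" would collapse into one.
empty_success = {
    "status": "ok",
    "data": [],                      # valid result: the query matched nothing
    "retryable": False,
}
access_failure = {
    "status": "error",
    "error_type": "access_failure",  # the tool never executed
    "message": "Orders API returned 503 Service Unavailable",
    "retryable": True,               # safe to retry after a backoff
}
invalid_result = {
    "status": "error",
    "error_type": "invalid_result",  # executed, but the output failed checks
    "message": "Order record is missing the required 'status' field",
    "retryable": False,              # retrying will not fix bad source data
}
```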
MCP Server Configuration
The Model Context Protocol allows Claude to connect to external servers that expose tools, resources, and prompts. The exam covers MCP server configuration — how to register servers, how tool availability flows through to Claude, and how to debug misconfigurations. A common failure pattern is a tool that’s registered but not surfaced correctly, leading Claude to behave as if it doesn’t exist.
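For Claude Code projects, registration typically lives in a `.mcp.json` at the repo root. This sketch follows the shape I’ve seen in the docs (the server name and package are invented); when a registered tool seems not to exist, this file is the first place to look.

```json
{
  "mcpServers": {
    "orders-db": {
      "command": "npx",
      "args": ["-y", "@example/orders-mcp-server"],
      "env": { "ORDERS_API_KEY": "${ORDERS_API_KEY}" }
    }
  }
}
```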
⚠ Common Exam Trap
The exam will present a scenario with a tool selection failure and ask for the root cause. The tempting answer is a problem with the system prompt or the model’s reasoning. Almost always, the correct answer is the tool description — it’s either too vague, too broad, or overlapping with another tool in a way that creates ambiguity the model can’t resolve.
Domain 4 · 20% of the Exam
Prompt Engineering & Structured Output
Where the Exam Tricks You
Explicit Criteria, Few-Shot Prompting, tool_use for Output, JSON Schema Design, Validation-Retry Loops, Multi-Pass Review, Batch vs. Sync
Domain 4 is where I see the most confident engineers lose points. Not because the material is obscure — it’s because the wrong answers are written to sound exactly like good engineering. They’re plausible. They use the right vocabulary. And they’re wrong in ways that only become obvious when you understand the deeper principle.
“This is where the exam tricks you. Wrong answers sound like good engineering. The question isn’t whether the answer makes sense — it’s whether it’s the right way to do it with Claude specifically.”
Structured Output: Use tool_use, Not Raw Text
This is the single most important principle in Domain 4: when you need structured output from Claude, use the tool_use mechanism with a defined JSON schema. Do not ask Claude to return structured JSON as raw text and then parse it yourself. The tool_use approach guarantees schema adherence, makes validation straightforward, and gives you a structured result that your code can consume directly.
The exam will present both approaches and ask which is appropriate. The raw-text approach will often be framed as “flexible” or “simpler.” It is neither — it’s fragile. Use tool_use.
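A sketch of the pattern: define the schema as a tool, then force the call with tool_choice so the reply is always schema-shaped, never prose. The invoice tool and model name are my placeholders.

```python
import anthropic

client = anthropic.Anthropic()

extraction_tool = {
    "name": "record_invoice",  # hypothetical extraction schema
    "description": "Record the structured fields extracted from an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
            "currency": {"type": "string"},
        },
        "required": ["vendor", "total", "currency"],
    },
}

invoice_text = "ACME Corp invoice #42: total due $1,250.00 USD"

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model name
    max_tokens=1024,
    tools=[extraction_tool],
    # Force the tool call: the output must conform to input_schema.
    tool_choice={"type": "tool", "name": "record_invoice"},
    messages=[{"role": "user", "content": f"Extract the fields from: {invoice_text}"}],
)

# The structured result arrives as the tool_use block's input, already parsed.
invoice = next(b.input for b in response.content if b.type == "tool_use")
```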
Few-Shot Prompting Construction
Few-shot examples are the fastest way to shift Claude’s output toward a target format or style. But the exam tests your ability to construct them correctly — not just include them. Good few-shot examples demonstrate the exact reasoning pattern you want, not just the correct output. They should cover edge cases, not just the happy path. And they should be ordered from simplest to most complex.
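A quick sketch of that construction, with invented content. Note the ordering from trivial to compound, the reasoning shown inside each example, and the deliberate edge case at the end.

```python
FEW_SHOT_PROMPT = """Classify the support ticket and explain your reasoning.

<example>
Ticket: "I can't log in."
Reasoning: A login failure is an authentication problem regardless of cause.
Category: auth
</example>

<example>
Ticket: "I was charged twice and now I'm locked out."
Reasoning: Two issues appear; billing has financial impact, so it takes priority.
Category: billing
</example>

<example>
Ticket: "asdfgh"
Reasoning: No discernible issue is described; do not guess a category.
Category: unclassifiable
</example>

Ticket: "{ticket}"
"""
```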
Key Concept — JSON Schema Design for Nullable Fields
The exam has specific questions about JSON schema design — particularly around required versus optional fields and how to handle nullable values correctly. A field that might not always be present should be typed as string | null, not just omitted. Fields that are always required should be in the required array. The distinction matters because Claude will generate output that matches your schema — so a poorly designed schema produces structurally valid but semantically broken output.
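Sketched as a schema fragment (field names invented):

```json
{
  "type": "object",
  "properties": {
    "order_id": { "type": "string" },
    "tracking_number": { "type": ["string", "null"] }
  },
  "required": ["order_id", "tracking_number"]
}
```

Keeping tracking_number in the required array but typed as ["string", "null"] means the key is always present, with an explicit null marking “no value.” That’s far easier to validate downstream than a key that sometimes just isn’t there.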
Validation-Retry Loops
Even with good prompts and schemas, outputs sometimes fail validation. The correct pattern is a structured retry loop: validate the output, and if it fails, return the validation error to Claude with context about what went wrong, then request a corrected response. The exam tests the architecture of this loop — specifically, how many retries are appropriate, what context to include in the retry prompt, and when to surface the failure to a human rather than retrying indefinitely.
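A sketch of that loop. `get_structured_output` is a hypothetical wrapper around the tool_use pattern above, and the retry bound is arbitrary; the structure is what the exam tests.

```python
import jsonschema  # any validator works; this is a common choice

MAX_RETRIES = 2  # keep the bound small; beyond it, a human should look

def get_validated_output(task: str, schema: dict) -> dict:
    """Validate Claude's output; on failure, return the error with context and re-ask."""
    messages = [{"role": "user", "content": task}]
    for _ in range(MAX_RETRIES + 1):
        data = get_structured_output(messages)  # hypothetical tool_use wrapper
        try:
            jsonschema.validate(data, schema)  # plus any domain-specific checks
            return data
        except jsonschema.ValidationError as err:
            # Feed back the specific failure so the retry is a correction, not a re-roll.
            messages.append({
                "role": "user",
                "content": f"The previous output failed validation: {err.message}. "
                           "Return a corrected response.",
            })
    raise RuntimeError("Still invalid after retries; surface to human review.")
```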
Batch vs. Synchronous Processing
The exam covers the decision between synchronous Claude calls (real-time, one at a time) and batch processing (parallel, deferred). Batch is appropriate when you have many independent tasks that don’t need immediate responses. Synchronous is appropriate when the result feeds directly into a user-facing interaction or a subsequent step in a live workflow. Choosing the wrong mode is an architectural error that shows up under load.
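The batch side uses the Message Batches API. I’m sketching the request shape from memory here, so check the current SDK docs for exact parameter names.

```python
import anthropic

client = anthropic.Anthropic()

# Batch: many independent tasks where nobody is waiting on the response.
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"doc-{i}",  # your key for matching results later
            "params": {
                "model": "claude-sonnet-4-5",  # placeholder model name
                "max_tokens": 512,
                "messages": [{"role": "user", "content": f"Summarize document {i}."}],
            },
        }
        for i in range(100)
    ]
)
# Poll batch status and fetch results later; nothing here blocks a live user.
```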
⚠ Common Exam Trap
The exam will describe a multi-step review process and ask how to improve output quality. The most common wrong answer is to make the prompt longer and more detailed. The correct answer is almost always a multi-pass review strategy: one Claude call generates the output, a second call reviews it against explicit criteria. Trying to do both in one call produces worse results than splitting them.
Domain 5 · 15% of the Exam
Context Management & Reliability
Smallest Weight. Biggest Cascade Risk.
Long-Context Preservation, Escalation Triggers, Error Propagation, Human Review, Handoff Patterns, Provenance Tracking, Confidence Calibration
Fifteen percent. It’s the smallest domain, and it’s tempting to study it last, or lightly. Don’t. Domain 5 is where I’ve seen the most cascading failures in production Claude systems — not because it’s the most complex, but because the problems it covers are invisible until they’re catastrophic. A context management failure doesn’t announce itself. It quietly corrupts the reasoning of every subsequent step in the workflow.
“Only 15% — but skip it and it cascades through everything else. Context failures don’t appear in logs. They appear in outputs — quietly, consistently, and always at the worst moment.”
Long-Context Preservation
The “lost in the middle” effect is real and well-documented: when you give Claude a very long context, information in the middle of that context is weighted less heavily in the model’s reasoning than information at the beginning or end. For production systems that pass large amounts of context — case history, document collections, long conversation threads — this is a genuine architectural concern.
The exam tests specific mitigation strategies: progressive summarization (compressing earlier turns while preserving critical facts), persistent case fact blocks (structured sections at the top of context that contain essential information regardless of length), and tool result trimming (reducing verbose tool outputs to the semantically relevant portions before appending to context).
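Tool result trimming is the easiest of the three to show. This is a naive sketch (a real system might summarize with a cheap model call instead of keyword matching), but it illustrates the move: shrink the output before it ever enters context.

```python
MAX_TOOL_RESULT_CHARS = 2_000  # arbitrary budget, for illustration

def trim_tool_result(raw: str, query: str) -> str:
    """Reduce a verbose tool output to the relevant portion before appending it."""
    terms = {t.lower() for t in query.split()}
    relevant = [line for line in raw.splitlines()
                if terms & set(line.lower().split())]
    trimmed = "\n".join(relevant) or raw  # fall back to raw if nothing matched
    return trimmed[:MAX_TOOL_RESULT_CHARS]
```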
The Three Valid Escalation Triggers
This is a specific and testable piece of content, and the exam leans on it hard. There are three conditions that validly trigger escalation from an autonomous agent to a human reviewer:
First, explicit ambiguity — the task cannot be completed because the instructions are genuinely unclear and making an assumption would produce a materially different outcome. Second, confidence below threshold — the agent’s internal assessment of its answer quality falls below a defined minimum for the task type. Third, scope violation — the required action falls outside the boundaries defined for autonomous execution.
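Those three conditions reduce to a small predicate. Thresholds and field names here are illustrative; note what is deliberately absent (more on that next).

```python
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.7  # illustrative; tune per task type

@dataclass
class AgentState:
    ambiguous: bool        # instructions genuinely unclear, assumption changes outcome
    confidence: float      # agent's self-assessed answer quality
    action_in_scope: bool  # required action within the autonomous boundary

def should_escalate(state: AgentState) -> bool:
    """Only the three valid triggers; sentiment and duration are intentionally missing."""
    return (
        state.ambiguous
        or state.confidence < CONFIDENCE_FLOOR
        or not state.action_in_scope
    )
```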
Key Concept — The Two Unreliable Triggers (What the Exam Tests Against)
The exam explicitly tests that you can distinguish valid escalation triggers from unreliable ones. User frustration (detected sentiment) and task duration (the operation has been running for a long time) are not reliable escalation triggers. A frustrated user may not need human intervention — they may need a better answer. A long-running task is not evidence of a problem. Escalating on these signals wastes human attention and trains users to see escalation as random.
Error Propagation Across Agents
In multi-agent systems, errors in one agent’s output become inputs to the next agent. An error that’s not caught at the source doesn’t disappear — it propagates, gets processed, and produces a downstream output that looks valid but contains corrupted reasoning. By the time the error surfaces, it may be several steps removed from its origin.
The solution is structured error context: every agent passes not just its output but a provenance record — what it was asked to do, what sources it used, what confidence it had in the result, and what errors or uncertainties it encountered. This record allows downstream agents and human reviewers to trace failures back to their source.
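One minimal shape for that record (the names are mine, not a standard):

```python
from dataclasses import dataclass, field

@dataclass
class ProvenanceRecord:
    """Travels with every agent output so failures can be traced to their source."""
    task: str                    # what the agent was asked to do
    sources: list[str]           # where its inputs came from
    confidence: float            # self-assessed reliability of the result
    errors: list[str] = field(default_factory=list)  # errors or uncertainties hit

@dataclass
class AgentOutput:
    content: str
    provenance: ProvenanceRecord
```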
⚠ Common Exam Trap
The exam will describe a scenario where an agent produces a plausible but incorrect synthesis from multiple sources. The question asks what went wrong. The wrong answer is a prompting issue. The correct answer is a provenance tracking failure — the agent didn’t have structured information about the reliability of its input sources, so it weighted them equally when it shouldn’t have.

What This Exam Is Really Testing
After studying all five domains, the most important thing to understand is what the exam is actually measuring. It’s not testing whether you can recall definitions. It’s testing whether you think like an architect — which means thinking about failure modes, not just happy paths.
Every domain, if you look at it right, is asking the same underlying question: what breaks, and why, and how do you design around that?
Agentic loops break when you don’t handle termination conditions. Claude Code breaks when your configuration hierarchy is inconsistent. Tools break when descriptions are ambiguous. Prompts break when you optimize for one scenario and don’t account for edge cases. Context breaks when you treat the context window as infinite.
The engineers who pass this exam aren’t necessarily the ones who know the most syntax. They’re the ones who’ve internalized production thinking — who, when they see a system design, immediately start asking “where does this fail?”
That’s the mindset the exam rewards. And honestly, it’s the mindset that makes you a better engineer regardless of whether you ever sit for the certification.
If you found this useful, subscribe to my Substack to follow the full journey toward this certification and to see how I can help you become an AI professional. Link in bio.

