Built for Ready Solutions AI

Built and operated in-house at Ready Solutions AI. The companion white paper is linked at the bottom.


Claude Code Case Study: A Five-Layer AI Writing Pipeline

May 8, 2026 · Content Engineering · Architecture stable since v.2026.05; thirty-eight posts published through it to date

Custom Skill Development · Agentic Workflow/Automation Development · Claude Code Infrastructure Setup

Built with: Claude Code · Custom Skills · Subagents · Hooks · MCP · Cloudflare Pages

Why AI-assisted writing needed an architecture, not a prompt

AI-assisted writing has a small number of well-known failure modes. A single drafted post can ship any of them silently. The longer a corpus runs, the harder they get to catch.

Seven recur. Fabricated or stale statistics, where a model paraphrases a number it half-remembers and the citation looks fine. AI-written tells: em-dashes everywhere, perfect tricolons, hedge-then-assert openers. Temporally impossible claims, written eight days after a tool shipped: "I have been using this tool for weeks." Voice drift across the corpus, where post one reads like the author and post thirty reads like the model. Inconsistent claims across posts, where the same metric reports two different values in two different posts. Stale sources in published work, where a 2024 statistic ages into a 2026 lie. Hooks the body never delivers, where the opener says "three patterns emerge" and the body covers two.

A single occurrence of any of these is recoverable. A corpus that lets each one accrete silently for sixty posts is not. The cost of detection scales with the debt.

The cheapest alternative is a better prompt: tell the model to check for fabricated stats, banned tells, and voice drift before it commits the draft. That works for a post or two. It collapses around post twenty, when the failure modes start compounding and a single prompt is being asked to be the whole quality system. Editorial discipline (ask the author to be more careful) has the same shape and the same ceiling. Prose-only quality control was not going to hold at the volume this pipeline was built for.

The five-layer pipeline

The package composes five layers. Each one solves a problem the layer below cannot. None are replaceable by the others.

Layer 1: long-lived state

A knowledge base sits at the bottom of the stack. Corpus index, metric registry, pain-point catalog, competitive log, audit log. Read at the start of every run. Written at the end of every publish. The KB is what makes the corpus more than a folder of posts. When post twenty-eight wants to cite a metric that post five already published with a different value, the KB is what flags the inconsistency.
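A minimal sketch of the metric-registry check, assuming the registry can be loaded as a mapping from a metric key to its canonical value, unit, and the post that first published it. `MetricEntry`, `check_metric`, and the example metric are hypothetical; the source does not describe the KB's file format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MetricEntry:
    """Canonical value for one metric, as recorded in a (hypothetical) registry file."""
    value: float
    unit: str
    first_post: str  # slug of the post that first published this value


def check_metric(registry: dict[str, MetricEntry], key: str, value: float,
                 unit: str, tolerance: float = 0.0) -> str | None:
    """Return a warning when a draft's metric disagrees with the registry, else None.

    Metrics the registry has never seen return None; they are new and would be
    registered when the post publishes.
    """
    entry = registry.get(key)
    if entry is None:
        return None
    if entry.unit != unit or abs(entry.value - value) > tolerance:
        return (f"{key}: draft says {value} {unit}, "
                f"but {entry.first_post} published {entry.value} {entry.unit}")
    return None


registry = {"example-metric": MetricEntry(value=340.0, unit="ms", first_post="post-05")}
print(check_metric(registry, "example-metric", 310.0, "ms"))
```

The check is deliberately dumb: it compares values, and the decision about which value is right stays with the author at the quality gate.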

Layer 2: deterministic validators

Above the KB, a layer of deterministic scripts handles work the model cannot do reliably. Prose validation against grammar and readability rules. Claim canonicalization. Cross-source contradiction detection. Numeric reproduction (does the number in the post match what the source reports?). Quote integrity. No model in the loop. Reproducible. The same input always produces the same output.
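A minimal sketch of the numeric-reproduction idea: pull every number out of the claim as written in the post and confirm each one also appears in the cited source passage. The regex and function names are assumptions for illustration; a production validator would also need to normalize units, percentages, and rounding.

```python
from __future__ import annotations

import re

# Integers and decimals, allowing thousands separators such as 1,200.
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def extract_numbers(text: str) -> set[float]:
    """Return every number in the text as a float, with separators stripped."""
    return {float(match.replace(",", "")) for match in NUMBER.findall(text)}


def unreproduced_numbers(post_claim: str, source_passage: str) -> set[float]:
    """Numbers the post states that the source passage does not contain.

    No model in the loop: the same claim and passage always give the same result.
    """
    return extract_numbers(post_claim) - extract_numbers(source_passage)


claim = "Adoption grew 42% across 1,200 teams."
source = "Across 1,200 surveyed teams, adoption grew 24% year over year."
print(unreproduced_numbers(claim, source))  # {42.0}
```

A transposed digit like 42 versus 24 is exactly the kind of error a model paraphrasing from memory produces and a set difference catches.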

Layer 3: pre-write hooks

The third layer is the synchronous veto. A pre-write hook fires before any write commits and checks for banned patterns: em-dashes, banned AI-tell words, voice violations. The author cannot ship a draft that violates them, even by accident. Prose instructions are advisory, so enforcement that has to hold every time, with no exceptions, lives here instead. The rule-routing decision tree post covers when to choose a hook over a markdown directive.
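A minimal sketch of such a hook, assuming Claude Code's `PreToolUse` contract: the pending tool call arrives as JSON on stdin with a `tool_input` object, and exiting with code 2 blocks the call and shows stderr to the model. Check the hooks reference for current field names. The banned-word list is hypothetical; the pipeline's real list is not published.

```python
from __future__ import annotations

import json
import re
import sys

# Hypothetical banned patterns; the em-dash is written as an escape on purpose.
BANNED = {
    "em-dash": re.compile("\u2014"),
    "AI-tell word 'delve'": re.compile(r"\bdelve\b", re.IGNORECASE),
    "AI-tell word 'tapestry'": re.compile(r"\btapestry\b", re.IGNORECASE),
}


def find_violations(text: str) -> list[str]:
    """Return the label of every banned pattern that appears in the text."""
    return [label for label, pattern in BANNED.items() if pattern.search(text)]


def main() -> int:
    if sys.stdin is None or sys.stdin.isatty():
        return 0  # run by hand with no piped event: nothing to check
    raw = sys.stdin.read()
    if not raw.strip():
        return 0
    tool_input = json.loads(raw).get("tool_input", {})
    # Write carries the whole file in "content"; Edit carries the new text in "new_string".
    text = tool_input.get("content") or tool_input.get("new_string") or ""
    violations = find_violations(text)
    if violations:
        # Exit code 2 blocks the write; stderr tells the model why.
        print("Blocked by pre-write hook: " + ", ".join(violations), file=sys.stderr)
        return 2
    return 0


if __name__ == "__main__":
    status = main()
    if status:
        sys.exit(status)
```

Registered under `PreToolUse` with a matcher along the lines of `Write|Edit`, a script like this turns a style rule into something the model cannot talk its way past.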

Layer 4: specialist subagents

Twelve specialist subagents handle the judgment work. Topic research with two-pass depth discovery. Counter-evidence research that runs the inverse of the thesis. Competitive research. Internal cross-reference. SEO landscape. Fact-checker that walks every cited claim. Technical authority for vendor docs. AI-detection analyst. Brand-prose reviewer. Coherence reviewer. Skeptic reader. Supplemental research. Each has a bounded scope, its own tool allowlist, and read-only access to the source tree.
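Read-only access is easy to verify mechanically. A minimal sketch, assuming each subagent's allowlist has been parsed into a set of tool names (in Claude Code these come from the `tools` field of the subagent's frontmatter). The agent keys are drawn from the list above; their tool sets are hypothetical.

```python
from __future__ import annotations

# Tools that can modify the source tree. Bash is included because a shell can write files.
WRITE_TOOLS = {"Write", "Edit", "MultiEdit", "NotebookEdit", "Bash"}

# Hypothetical allowlists for three of the twelve subagents.
ALLOWLISTS: dict[str, set[str]] = {
    "fact-checker": {"Read", "Grep", "Glob", "WebFetch", "WebSearch"},
    "coherence-reviewer": {"Read", "Grep"},
    "seo-landscape": {"WebSearch", "WebFetch", "Read"},
}


def privilege_violations(allowlists: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each subagent to any write-capable tools it was granted."""
    found: dict[str, set[str]] = {}
    for agent, tools in allowlists.items():
        granted = tools & WRITE_TOOLS
        if granted:
            found[agent] = granted
    return found


print(privilege_violations(ALLOWLISTS))  # {}
```

Run in CI, a check like this keeps "tool absence" from eroding one frontmatter edit at a time.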

Layer 5: the orchestrator

The parent skill at the top of the stack routes phases, presents human-in-the-loop quality gates, and owns every disk write. Subagents return structured findings. The orchestrator decides what ships. Authority does not propagate down the spokes; it terminates at each node.
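A minimal sketch of that contract, assuming findings are plain records with a severity. The `Finding` fields and severity names are hypothetical. What it illustrates: subagents and validators only return data, and the one function that decides whether a draft advances belongs to the orchestrator.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal

Severity = Literal["block", "warn", "note"]
SEVERITY_ORDER = {"block": 0, "warn": 1, "note": 2}


@dataclass(frozen=True)
class Finding:
    """One structured result returned to the orchestrator by a subagent or validator."""
    source: str
    severity: Severity
    message: str


def gate(findings: list[Finding]) -> tuple[bool, list[Finding]]:
    """Orchestrator-side decision: the draft advances only if nothing blocks.

    Any block stops the draft, whether a model or a deterministic script raised it;
    clean reviews from other lanes cannot outvote it. Also returns the findings to
    show the author at the quality gate, most severe first.
    """
    shown = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
    advance = all(f.severity != "block" for f in findings)
    return advance, shown


findings = [
    Finding("brand-prose-reviewer", "warn", "Opener leans on a tricolon."),
    Finding("numeric-reproduction", "block", "42 not found in the cited source."),
]
advance, shown = gate(findings)
print(advance, [f.source for f in shown])
```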

Each layer catches a different failure class. Determinism backs judgment; the reverse never holds. The seven-to-five mapping looks like this:

Failure mode	Caught at	How it gets caught
Fabricated or stale statistics	Layer 4 (fact-checker) + Layer 2 (numeric-reproduction script)	Subagent walks every cited claim against its source; deterministic script confirms the number in the post matches the number in the source.
AI-written tells	Layer 4 (AI-detection analyst, brand-prose reviewer) + Layer 3 (pre-write hook)	Subagents flag tells in the draft; the pre-write hook blocks banned patterns at the file system before any write commits.
Temporally impossible claims	Layer 4 (fact-checker, experiential audit step) + Layer 1 (tool first-use ledger in the KB)	Fact-checker compares duration claims against ledger entries that record when the author first used each tool.
Voice drift across the corpus	Layer 4 (brand-prose reviewer) + Layer 2 (corpus voice monitor)	Reviewer checks the draft against canonical voice; corpus monitor samples a rolling window of recent posts and flags drift programmatically.
Inconsistent claims across posts	Layer 1 (metric registry in the KB) + Layer 2 (inter-post consistency audit)	Registry holds canonical metric values; audit flags posts that report different values for the same metric.
Stale sources in published work	Layer 2 (link-rot and capability-change monitors) + Layer 4 (technical-authority subagent)	Monitors scan for moved sources and changed vendor capabilities; subagent re-verifies hard claims against current docs.
Hooks the body never delivers	Layer 4 (coherence reviewer)	Reviewer reads the opener and the close as a contract and flags promises the body never delivers.

Most failure modes get caught at two layers, not one. That is the point: deterministic backing means a single missed flag at the AI-judgment layer does not ship the draft.

Hub and spoke, not a chain of prompts

The orchestration shape is hub and spoke. The orchestrator is the hub. Twelve specialist subagents are the spokes. Subagents do the work; the orchestrator owns the result.

Parallel where the work is independent. Phase 2 fans out five research lanes (topic, counter-evidence, competitive, internal cross-reference, SEO landscape) at the same time. Phase 4 fans out five quality lanes (fact-checker, AI-detection analyst, brand-prose reviewer, skeptic-reader, coherence-reviewer) at the same time. Serial only where the contract requires order: a coherence pass cannot run before the rewrites that might have broken coherence.
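A minimal sketch of parallel fan-out with explicit ordering, written with Python's `asyncio` rather than Claude Code's own dispatch. `run_lane` is a hypothetical stand-in for invoking a subagent; the lane names come from the phases above. The final serial coherence pass is an assumption about where the ordering constraint applies.

```python
from __future__ import annotations

import asyncio

RESEARCH_LANES = ["topic", "counter-evidence", "competitive",
                  "internal-cross-reference", "seo-landscape"]
QUALITY_LANES = ["fact-checker", "ai-detection", "brand-prose",
                 "skeptic-reader", "coherence-reviewer"]


async def run_lane(lane: str, draft: str) -> dict:
    """Hypothetical stand-in for dispatching one subagent and awaiting its findings."""
    await asyncio.sleep(0)  # placeholder for the real subagent call
    return {"lane": lane, "findings": []}


async def fan_out(lanes: list[str], draft: str) -> list[dict]:
    """Run independent lanes concurrently; results come back in lane order."""
    return await asyncio.gather(*(run_lane(lane, draft) for lane in lanes))


async def pipeline(draft: str) -> list[dict]:
    research = await fan_out(RESEARCH_LANES, draft)  # Phase 2: five lanes at once
    quality = await fan_out(QUALITY_LANES, draft)    # Phase 4: five lanes at once
    revised = draft  # rewrites driven by `quality` would be applied here, serially
    # Serial by contract: coherence is re-checked only after the rewrites land.
    final = await run_lane("coherence-reviewer", revised)
    return research + quality + [final]


results = asyncio.run(pipeline("draft text"))
print(len(results))  # 11
```

The `await` boundaries are the contract: nothing in Phase 4 starts until every Phase 2 lane has returned, which a verbal "do this in parallel" never guarantees.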

The pattern itself is common. What is unusual is the discipline of using it consistently. Subagents do not write to the source tree, ever. The orchestrator does, always.

Three principles behind the architecture

Format calibration over format gating. Every quality check runs in every mode. Where a check varies between formats, its parameters scale; the check itself never disappears. A "quick" post and a "deep dive" both carry the same failure-mode risk profile. What changes is how much depth, breadth, and threshold each check applies. A check that does not run for one format is a topology decision, and topology decisions need explicit justification.
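A minimal sketch of calibration over gating, assuming two formats named after the text above and a hypothetical set of checks and parameters. Every check exists in every format; only its parameters change, and a missing entry raises an error instead of silently skipping the check.

```python
from __future__ import annotations

CHECKS = ("fact-check", "ai-detection", "voice-review", "coherence")

# Hypothetical parameters: depth and thresholds scale with format; no check is absent.
CALIBRATION: dict[str, dict[str, dict[str, float]]] = {
    "quick": {
        "fact-check": {"sources_per_claim": 1},
        "ai-detection": {"max_tells_per_1k_words": 0.5},
        "voice-review": {"posts_compared": 3},
        "coherence": {"passes": 1},
    },
    "deep-dive": {
        "fact-check": {"sources_per_claim": 2},
        "ai-detection": {"max_tells_per_1k_words": 0.3},
        "voice-review": {"posts_compared": 10},
        "coherence": {"passes": 2},
    },
}


def params_for(fmt: str, check: str) -> dict:
    """Look up a check's parameters; a missing entry is an error, never a skip."""
    try:
        return CALIBRATION[fmt][check]
    except KeyError:
        raise ValueError(f"{check!r} has no calibration for {fmt!r}; "
                         "dropping a check is a topology decision") from None


def missing_checks(calibration: dict = CALIBRATION) -> list[tuple[str, str]]:
    """List (format, check) pairs with no parameters, i.e. silent gating."""
    return [(fmt, check) for fmt, table in calibration.items()
            for check in CHECKS if check not in table]


print(params_for("deep-dive", "voice-review"))
```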

Source credibility as a structural concern. Every research and fact-check finding gets a tier. Tier 1 is primary and authoritative: vendor docs, changelogs, official blogs, peer-reviewed research, standards bodies, and first-party data with methodology. Tier 2 covers editorial outlets and recognized practitioners: major tech publications with editorial oversight, industry analyst reports, and recognized practitioner voices. Tier 3 is individual and community: individual dev blogs, community platforms, podcasts, video, social posts, and affiliate-driven comparison pages. Tier 4 is anonymous, generated, or competitor content: anonymous forums, AI content farms, competitor marketing, unidentifiable authors, and generic encyclopedia entries. Tier 4 is a hard block and is never used.

Hard claims sourced from Tier 3 trigger an automatic primary-source cascade, which the fact-checker walks. If the primary confirms, the citation upgrades to the primary. If the primary disagrees, the delta is flagged at the next quality gate so the author revises the prose before it ships. The trigger extends to Tier 2 as well: hard claims from editorial sources still get a primary cross-check whenever a Tier 1 primary is reachable. A wrong number ships once and lives forever; a redundant check costs seconds.
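A minimal sketch of the tier rules as stated above: Tier 4 is rejected outright, and hard claims from Tier 2 or Tier 3 trigger a primary-source cross-check. The enum names, function names, and outcome strings are hypothetical; the source describes the policy, not its code.

```python
from __future__ import annotations

from enum import IntEnum


class Tier(IntEnum):
    PRIMARY = 1    # vendor docs, changelogs, peer-reviewed research, standards bodies
    EDITORIAL = 2  # edited tech publications, analyst reports, recognized practitioners
    COMMUNITY = 3  # individual blogs, community platforms, podcasts, video, social posts
    BLOCKED = 4    # anonymous, generated, or competitor content


def needs_primary_check(tier: Tier, hard_claim: bool) -> bool:
    """True when a hard claim must be cross-checked against a Tier 1 primary."""
    if tier == Tier.BLOCKED:
        raise ValueError("Tier 4 sources are a hard block and are never used")
    return hard_claim and tier >= Tier.EDITORIAL


def resolve_cascade(primary_agrees: bool) -> str:
    """Outcome once the fact-checker has compared the claim with the primary."""
    return "upgrade-citation-to-primary" if primary_agrees else "flag-delta-at-next-gate"


print(needs_primary_check(Tier.COMMUNITY, hard_claim=True))  # True
```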

Deterministic validators behind AI judgment. AI subagents handle subjective work: depth grades, steel-man engagement, memorability, voice review. A separate deterministic layer handles canonicalization, pattern matching, contradiction detection, and claim integrity. Determinism backs judgment. The pre-write hook is the final synchronous boundary; what cannot be enforced in prose lives there.

What this Claude Code case study generalizes

The architecture is the artifact. The blog post output is incidental. Six lessons port to any agentic workflow that depends on AI doing production work.

Tool absence beats prompt instruction. Cross-model verification reduces shared-architecture hallucination (two instances of the same model checking each other do not meaningfully reduce shared errors). Parallel dispatch needs explicit contractual semantics; verbal "do this in parallel" is not enough. Format calibration over format gating preserves quality uniformity as a system grows. Deterministic validators backing AI subagents catch what neither catches alone. The orchestrator-writes pattern handles tool-frontmatter propagation and keeps subagent privilege from creeping.

The lessons surfaced from a content pipeline, but the constraints they answer (prose instructions are advisory, judgment work is non-deterministic, write-side governance has to live in one place) sit inside any agentic system you would build for a team. The agentic development starter guide and the engineering manager's guide to governing agentic development cover the broader pattern this pipeline lives inside. The AI Diligence Operating System applies the same schema-first architecture decision to diligence workflows in boutique private-equity, commercial diligence, and strategy firms.

Scope honesty: the corpus it serves is small (a few dozen posts) and single-author. At enterprise content scale (thousands of posts, dozens of authors, parallel publishing) the coordination model would need revision around per-author KB sharding, distributed merge protocols, and run-cost amortization. The architecture is the artifact for the volume it was designed for. The lessons port; the specific dispatch shape is sized to one team.

By the numbers

5 architectural layers in the pipeline (knowledge base, deterministic validators, pre-write hooks, specialist subagents, orchestrating skill)
7 recurring failure modes of AI-assisted writing, each mapped to the layers that catch it
12 specialist subagents in the dispatch graph (research, counter-evidence, competitive, internal cross-reference, SEO, fact-checker, technical authority, supplemental, AI-detection, brand-prose, coherence, skeptic-reader)
16 deterministic validators, audits, and monitors backing the AI judgment layer
4 source-credibility tiers driving cross-verification cascades on hard claims
9 knowledge-base files maintaining corpus state across runs
6 generalizable lessons documented for transfer to other agentic workflows
38 posts published through the pipeline at the time of writing
14 pages of accompanying technical white paper, edition v.2026.05

What this means for the engagements I run

If a content pipeline is what your team needs configured, Custom Skill Development and Agentic Workflow/Automation Development are the engagements. When the same pattern needs to live in your codebase rather than your blog (test review, code review, CI/CD policy enforcement, knowledge-base maintenance), Claude Code Infrastructure Setup is the wrapper. The AI persona profiler case study covers another build with the same hub-and-spoke shape: custom skills, hook chains, and multi-agent orchestration shipped as a working system. The Opus compatibility scanner case study covers a narrower scope: a single-skill audit tool that demonstrates what custom skill development looks like when the deliverable is a scanner rather than a pipeline.

If a pipeline like this is on your team's roadmap and you want to talk about scope before committing, book a free 15-minute intro call. No pitch, no obligation.

Frequently Asked Questions

How is this different from a single-prompt approach to AI-assisted writing?
A single prompt collapses every quality concern (research, voice, fact-checking, coherence, source credibility) into one generation pass. A pipeline gives each concern a separate layer, with deterministic checks backing AI judgment, and surfaces failures at the layer that owns them. The same content load spreads across five layers instead of one prompt, and the failure modes that compound at corpus scale (voice drift, contradictions across posts, stale sources) get a layer that watches the corpus, not just the post.
Why hub and spoke instead of a chain of prompts?
Chains pass state forward and accumulate it. A failure midway through a chain corrupts everything downstream. Hub and spoke terminates authority at each spoke: subagents do bounded judgment work and return structured findings to a parent that owns the disk write. Parallel where the work is independent, serial only where the contract requires order. The pattern is documented in the [sub-agent orchestration patterns post](/blog/2026-04-18-sub-agent-orchestration-patterns-claude/).
Why does the orchestrator, not the subagents, own every write?
When a subagent holds a write tool and its prior behavior pattern is to use it, prose instructions cannot reliably override that prior. The fix is to remove the tool, not strengthen the prose. Centralizing writes also puts atomic-write, no-clobber recovery, and schema-bump logic in one place. No subagent privilege creep, no scattered write-side governance.
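A minimal sketch of what centralized atomic, no-clobber writing can look like, assuming "no-clobber" means refusing to overwrite a file that changed since the orchestrator last read it (detected here by a content hash). The function names and that interpretation are assumptions; schema-bump handling is left out.

```python
from __future__ import annotations

import hashlib
import os
import tempfile
from pathlib import Path


def digest(path: Path) -> str | None:
    """SHA-256 of the file's bytes, or None when the file does not exist yet."""
    if not path.exists():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def atomic_write(path: Path, text: str, expected_digest: str | None) -> None:
    """Replace path with text atomically, refusing to clobber a concurrent change.

    expected_digest is the hash seen when the file was read (None if it did not
    exist). The check and the rename are separate steps, so a small race window
    remains; a lock file would close it.
    """
    if digest(path) != expected_digest:
        raise FileExistsError(f"{path} changed since it was read; refusing to overwrite")
    # Stage the new content in a temp file in the same directory...
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        # ...then rename it over the target (atomic on POSIX filesystems).
        os.replace(tmp, path)
    except BaseException:
        Path(tmp).unlink(missing_ok=True)  # never leave a stray temp file behind
        raise
```

Because only the orchestrator calls a function like this, there is exactly one place to audit when a write goes wrong.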

Want an authoring pipeline like this engineered around your team's content or codebase? Book a free 15-minute intro call.


Download the white paper