The agentic Claude Code pipeline that researches, writes, validates, and ships every page on this site, described at the level of what it guarantees, not how it is implemented.
AI-assisted writing has a small number of well-known failure modes that are recoverable as one-offs and structurally catastrophic at corpus volume. A single drafted post can ship any of them silently; the longer the corpus runs, the harder they get to catch. Prose-only quality control does not hold past roughly twenty posts. The fix is architectural, not editorial.
What follows describes a pipeline at the level of what each layer guarantees, never the recipe that implements it. I will not ask you to treat this page passing its own checks as evidence the system works. The evidence sits elsewhere and is checkable without me; §Verify says where.
99 blog posts
16 read-only subagents
37 deterministic validators
11 glossary terms
6 cornerstone guides
10 knowledge-base docs
1,600 lines / eng / day delivery
First-party measure from inside one engagement, not an externally audited benchmark. The honest version of a proof point names its own limits.
The Pipeline
Each layer answers a problem the layer beside it cannot. The order below mirrors lifetime and write authority: long-lived state at the base, deterministic checks above it, the synchronous boundary above those, judgment subagents above the boundary, the single writer at the top, and provenance recorded after the write. The pipeline narrows: many readers, one writer.
◆
Knowledge base
The pipeline's long-lived memory: corpus index, metric registry, source-credibility scheme, audit log.
Continuity. A number, claim, or source that one page established is visible to the next, so the corpus stays internally consistent instead of drifting page by page.
Prevents: Per-page drift; contradictions across the corpus.
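A minimal sketch of the continuity guarantee, written in Python for illustration only; it is not the site's implementation. The `MetricRegistry` class, its method name, and the sample metric key are hypothetical. It shows the shape of the idea: the first page to state a number registers it, and a later page that states a different value for the same key is flagged instead of silently accepted.

```python
class MetricRegistry:
    """Hypothetical registry: one canonical value per metric key."""

    def __init__(self):
        self._values = {}  # key -> (value, page that established it)

    def check(self, key, value, page):
        """Register a first-seen metric, or return a contradiction message."""
        if key not in self._values:
            self._values[key] = (value, page)
            return None
        known_value, known_page = self._values[key]
        if known_value != value:
            return f"{page}: {key}={value} contradicts {known_page} ({known_value})"
        return None


registry = MetricRegistry()
first = registry.check("blog_post_count", 99, "page-a")   # establishes the value
same = registry.check("blog_post_count", 99, "page-b")    # agrees, no finding
drift = registry.check("blog_post_count", 97, "page-c")   # disagrees, flagged
```

Here `first` and `same` are `None`, while `drift` carries a message naming both pages, which is the per-page drift this layer exists to surface.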
✓
Deterministic validators
Scripts that re-check what a model cannot be trusted to get right every time: prose rules, claim canonicalization, numeric reproduction, cross-source contradiction, quote integrity.
Reproducibility. The same input yields the same verdict with no model in the loop, and the count is self-auditing: a build step tallies the inventory under a documented inclusion rule and fails the build if the number printed on this page diverges from the files it counts.
A synchronous veto that fires before any write reaches disk.
A forbidden pattern cannot land even by accident. This is the layer where prose instruction stops being trustworthy. A rule that has to hold every time, with no exception, is enforced at the boundary rather than left to the author to remember.
Prevents: Em-dashes, AI-tell tokens, banned phrases landing in committed prose.
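The veto can be pictured as a check that sits between the author and the file system. The sketch below is an illustration in Python under stated assumptions, not the site's hook: `FORBIDDEN`, `VetoError`, and `guarded_write` are hypothetical names, and the banned phrase is a placeholder. What it shows is ordering: the check runs first and raises, so a rejected draft never touches disk.

```python
import os
import re
import tempfile

# Hypothetical rule set: the em-dash (written as an escape) plus a placeholder phrase.
FORBIDDEN = [re.compile("\u2014"), re.compile(r"\bdelve\b", re.IGNORECASE)]


class VetoError(Exception):
    """Raised when a draft contains a forbidden pattern."""


def guarded_write(path, text):
    """Write text to path only if no forbidden pattern matches."""
    for pattern in FORBIDDEN:
        match = pattern.search(text)
        if match:
            raise VetoError(f"forbidden pattern {match.group(0)!r} in draft for {path}")
    # Reached only when every pattern passed, so the write is the last step.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


workdir = tempfile.mkdtemp()
clean_path = os.path.join(workdir, "clean.md")
blocked_path = os.path.join(workdir, "blocked.md")

guarded_write(clean_path, "A plain sentence.")
try:
    guarded_write(blocked_path, "A sentence \u2014 with a dash.")
    vetoed = False
except VetoError:
    vetoed = True
```

Because the check is synchronous and precedes `open`, the blocked file is never created; an advisory instruction could not give that guarantee.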
⚙
Read-only specialist subagents
Bounded workers, each scoped to one kind of judgment: topic research, counter-evidence, fact-checking, cross-reference, brand voice, coherence.
Containment. A subagent investigates and returns structured findings, but it cannot change the artifact, so authority never propagates out to a spoke.
Prevents: Subagent overreach; uncontained errors from one judgment lane affecting another.
◉
Orchestrating skill
The parent that routes the phases, presents the human review gates, and owns every disk write.
Single writer. There is exactly one writer, so no read-only subagent can mutate the artifact, and write-side concerns live in one place rather than scattered across the spokes.
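One way to picture the single-writer guarantee, sketched in Python under stated assumptions: subagents are plain functions that receive the draft and return findings, and only the orchestrator holds a write path. `Finding`, `fact_check`, `voice_check`, and `Orchestrator` are hypothetical names, the checks inside the lanes are toy stand-ins, and blocking the write on any finding is a simplification of the human review gates described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    lane: str
    message: str


def fact_check(draft: str) -> list[Finding]:
    """Toy read-only lane: flags an unsourced marker."""
    return [Finding("fact-check", "unsourced claim")] if "[citation needed]" in draft else []


def voice_check(draft: str) -> list[Finding]:
    """Toy read-only lane: flags an all-caps draft."""
    return [Finding("voice", "all-caps draft")] if draft.isupper() else []


class Orchestrator:
    """Hypothetical parent: the only object with a write site."""

    def __init__(self, lanes):
        self.lanes = lanes
        self.written = {}  # stands in for disk

    def run(self, path: str, draft: str) -> list[Finding]:
        # Strings are immutable, so a lane cannot alter the draft it is handed.
        findings = [f for lane in self.lanes for f in lane(draft)]
        if not findings:
            self.written[path] = draft  # the single write site
        return findings


orchestrator = Orchestrator([fact_check, voice_check])
ok = orchestrator.run("post-a", "A sourced sentence.")
blocked = orchestrator.run("post-b", "A claim [citation needed].")
```

The lanes return data and nothing else, so every write-side concern lives in `Orchestrator.run`.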
A machine-readable record stamped on each artifact: model version, skill version, knowledge-base SHA, validator pass log, visual-QA sub-block.
Replay. A set of deny-by-default gates where a claim must trace to a source, the disclosure stays at shape level, and the provenance block is present, or the artifact does not publish.
Recording where an artifact came from is a settled idea in other domains: the W3C PROV family standardizes provenance generically, the NIST AI 600-1 Generative AI Profile addresses content provenance for generative systems, and PROV-AGENT formalizes provenance for agentic workflows specifically.
The point of the stamp is agentic work gated by deterministic validators rather than ad-hoc review, narrowed to billable and production agentic delivery, where the AI authoring trust chain relies on deny-by-default gates rather than advisory review.
Specialist subagents (see also subagent-orchestration) provide independent coverage per judgment lane; InfoQ's editorial coverage of Claude Code subagents confirms independent practitioners adopt the same read-only pattern.
The pre-write hook is the synchronous veto that enforces prose rules at the boundary, not in an advisory prompt that a confident-looking context can override.
How a change gets shipped
Figure: The plan-then-execute timeline. A left-to-right flow: brainstorm, then spec, then writing-plans, then subagent-driven development, then finishing the branch.
Brainstorm. Intent + 2-3 approaches explored before any code is written.
Spec. Approved design, committed to docs/, before tasks are decomposed.
Writing-plans. Bite-sized tasks with explicit files-to-touch + tests, sized for a subagent with zero context.
Subagent-driven dev. Fresh implementer subagent per task, followed by spec-compliance review and code-quality review.
Finish branch. PR open, CI gate green, squash-merge, branch cleanup.
Each phase loads a domain procedure as an on-demand skill (Anthropic docs) only when a task calls for it, instead of holding every procedure in one always-on prompt.
On-demand discipline keeps the active context narrow and the procedure library wide.
Session-end retros write structured improvement entries to a backlog. Recurring failure modes earn a registered key and accumulate evidence under one entry; a key with enough recurrence promotes to a structural fix. The loop is the practice that keeps the practice improving.
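The promotion loop can be sketched as a counter keyed by failure mode. This is an illustration in Python, not the site's backlog: the `Backlog` class and the key names are hypothetical, and `PROMOTION_THRESHOLD = 3` is an assumed value, since the text above says only "enough recurrence".

```python
from collections import defaultdict

PROMOTION_THRESHOLD = 3  # assumed value; the source does not state the real one


class Backlog:
    """Hypothetical backlog: evidence accumulates under one registered key."""

    def __init__(self):
        self.evidence = defaultdict(list)  # registered key -> session notes

    def record(self, key, note):
        """Append evidence under a key; return True once it qualifies for promotion."""
        self.evidence[key].append(note)
        return len(self.evidence[key]) >= PROMOTION_THRESHOLD


backlog = Backlog()
results = [backlog.record("stale-count", f"session {n}") for n in (1, 2, 3)]
other = backlog.record("broken-link", "session 3")
```

The third recurrence of `stale-count` crosses the assumed threshold, while a one-off such as `broken-link` stays an ordinary backlog entry.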
See also: claude-code-hook (the term covers the capability; this section claims the decision between capabilities).
The Catch-Net
Most failure modes are caught at two or more layers rather than one, so a single missed flag at the judgment layer does not ship the page.
Fabricated or stale statistics: caught by Validators, Subagents, Monitor.
AI-written tells: caught by Validators, Hook, Subagents, Monitor.
Temporally impossible claims: caught by Validators, Subagents, Monitor.
Voice drift across the corpus: caught by Validators, Subagents, Monitor.
Contradictions between posts: caught by KB, Validators.
Stale sources in published work: caught by Validators, Subagents, Monitor.
Hooks the body never delivers: caught by Validators, Subagents.
Affiliation or embargo overclaim: caught by Validators.
JSON-LD @graph integrity drift: caught by Validators.
Recipe leak on authority surfaces: caught by Validators, Subagents.
Self-referential-edge regression: caught by Monitor.
Eleven failure modes by catch layer. Most modes are caught at two or more layers.
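The caption's claim can be checked by tallying the matrix directly. The snippet below is a small Python sketch, not part of the pipeline; it encodes the catch layers exactly as listed in this section and counts how many modes have two or more.

```python
# Failure mode -> catch layers, transcribed from the list in this section.
CATCH_NET = {
    "Fabricated or stale statistics": ["Validators", "Subagents", "Monitor"],
    "AI-written tells": ["Validators", "Hook", "Subagents", "Monitor"],
    "Temporally impossible claims": ["Validators", "Subagents", "Monitor"],
    "Voice drift across the corpus": ["Validators", "Subagents", "Monitor"],
    "Contradictions between posts": ["KB", "Validators"],
    "Stale sources in published work": ["Validators", "Subagents", "Monitor"],
    "Hooks the body never delivers": ["Validators", "Subagents"],
    "Affiliation or embargo overclaim": ["Validators"],
    "JSON-LD @graph integrity drift": ["Validators"],
    "Recipe leak on authority surfaces": ["Validators", "Subagents"],
    "Self-referential-edge regression": ["Monitor"],
}

multi_layer = [mode for mode, layers in CATCH_NET.items() if len(layers) >= 2]
single_layer = [mode for mode, layers in CATCH_NET.items() if len(layers) == 1]
```

Eight of the eleven modes land in `multi_layer`; the three in `single_layer` (affiliation or embargo overclaim, JSON-LD @graph integrity drift, self-referential-edge regression) are the ones with no second catch.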
37 validators in scripts/, generated from data/validator-inventory.json at every build.
The inclusion rule, quoted from the inventory file: scripts/validate-*.mjs + scripts/audit-*.mjs + scripts/monitor-*.mjs, excluding *.test.mjs.
The figure of record is the one regenerated from the codebase inventory at every build, not a number typed into prose, which is why a stale count can't quietly persist here.
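The inclusion rule is simple enough to sketch. The Python below is an illustration under stated assumptions, not the site's build step: the function names and the sample file listing are hypothetical (the real file names are not public), and only the three include globs and the one exclude glob come from the rule quoted above.

```python
from fnmatch import fnmatch

INCLUDE = ["validate-*.mjs", "audit-*.mjs", "monitor-*.mjs"]
EXCLUDE = ["*.test.mjs"]


def count_validators(filenames):
    """Count script names matching an include glob and no exclude glob."""
    return sum(
        1
        for name in filenames
        if any(fnmatch(name, pat) for pat in INCLUDE)
        and not any(fnmatch(name, pat) for pat in EXCLUDE)
    )


def assert_count_matches(filenames, printed_count):
    """Fail loudly, as a build step would, when the page and the files disagree."""
    actual = count_validators(filenames)
    if actual != printed_count:
        raise RuntimeError(
            f"validator count drift: page says {printed_count}, files say {actual}"
        )
    return actual


# Hypothetical listing of scripts/, for illustration only.
sample = [
    "validate-prose.mjs",
    "validate-prose.test.mjs",  # excluded by *.test.mjs
    "audit-links.mjs",
    "monitor-graph.mjs",
    "build-site.mjs",           # matches no include glob
]
counted = count_validators(sample)
```

With this sample, `counted` is 3, and calling `assert_count_matches(sample, 37)` raises, which is the behavior that stops a stale number from persisting in prose.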
Authority Engine
I monitor my own self-referential-edge ratio: live R = 16.10 versus threshold 40. The entity graph is self-asserted, and naming that bounds the trust claim; §Verify says where the boundary sits.
Entity graph (hub-and-spoke): Glossary @ids at center; guides and engineering point at them via about; Organization and Person via knowsAbout.
URL hierarchy (flat): Home links to three peer collections: /glossary/, /guides/, /engineering/.
Lateral cross-links (discovery): Glossary terms and guides cross-link via chips and appearsIn; guides reference engineering laterally; engineering references glossary via about.
Conflating these axes creates the breadcrumb bug. The entity graph is hub-and-spoke; the URL hierarchy is flat; lateral cross-links carry discovery, never parental authority.
GEO/SEO citability
Every glossary term in the rebuild track carries a load-bearing OPINION in its dek and a trade-off claim in its whyItMatters: the canonical Anthropic / W3C / NIST docs leave the trade-off niche open, and the rebuild claims it explicitly. An S2-6 probe confirms a retrieval LLM extracts the rebuild's framing as distinct from the upstream doc.
Of 11 terms, 4 have been rebuilt to this shape.
Entity @ids follow the Model Context Protocol architecture and Schema.org conventions; the graph is self-asserted but machine-readable.
The Proof: the blog-post engine
This section names what is in production today: the structural shape, the counts, the gates, the provenance receipt. The full design rationale, the principles that drove the design, and the lessons that travel beyond this corpus live in the case study.
The architecture is the artifact: the blog output is incidental; what travels are the lessons. From the case study at /case-studies/2026-05-08-ai-authoring-pipeline/.
Phases describe temporal flow; layers (per the case study) describe the architectural stack. They are orthogonal axes of the same system.
Figure: The five-phase blog-post pipeline. Five research lanes feed QG1, which approves the outline; then drafting, then Phase 3b visual QA, then five validation lanes, then QG2, then publish with provenance.
Phase 1
Research
Every H2 has bound Tier-1/2 sources before drafting.
Catches: Thesis built on confirming evidence only (counter-evidence as a first-class lane).
Phase 2
Outline
Angle, thesis, audience tier, and source binding are locked before a sentence is drafted.
Catches: Unsourced H2 gaps surface at QG1.
Phase 3
Draft + Phase 3b visual
Claims trace to bound sources; visuals reviewed before publish.
Catches: Zero-em-dash + banned tells caught at the pre-write hook synchronously.
Phase 4
Validate
Every hard claim is verified at two independent layers.
Every post carries its replayable validation receipt.
Catches: Drift between what was validated and what shipped.
QG1 / QG2 + report-and-proceed
QG1 is the research sufficiency check between Phase 1 and Phase 2: flags unsourced outline gaps, undersourced H2s, missing experiential anchors. QG2 is the validation reconciliation between Phase 4 and Phase 5: synthesizes all five validator outputs into a Severity-2 Disposition Table where every Sev-2 routes to addressed-by-rewrite, fixed-directly, or deferred. Silence is a defect. In autonomous mode the gates are report-and-proceed: BLOCKING findings always surface; non-blocking deferred entries land in the QG2 report for review after publish.
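The "silence is a defect" rule at QG2 can be sketched as a check over findings. This is a Python illustration, not the site's gate: the `reconcile` function, the finding ids, and the dictionary shape are hypothetical, and the severity-3 entry is an assumed example of a finding outside the Sev-2 table. Only the three disposition names come from the text above.

```python
ALLOWED = {"addressed-by-rewrite", "fixed-directly", "deferred"}


def reconcile(findings):
    """Return ids of Sev-2 findings that carry no valid disposition."""
    return [
        f["id"]
        for f in findings
        if f["severity"] == 2 and f.get("disposition") not in ALLOWED
    ]


findings = [
    {"id": "F1", "severity": 2, "disposition": "fixed-directly"},
    {"id": "F2", "severity": 2, "disposition": "deferred"},
    {"id": "F3", "severity": 2},  # silent: no disposition recorded
    {"id": "F4", "severity": 3},  # assumed lower severity, outside this table
]
undisposed = reconcile(findings)
```

Only `F3` is returned: a Sev-2 finding with no disposition counts as a defect in its own right, whatever the finding itself says.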
The provenance receipt
provenance:
  model_version: <Which model and revision wrote the post.>
  skill_version: <Git SHA of the blog-post skill at publish time.>
  kb_sha: <Git SHA of the knowledge base at publish time.>
  validation_pass_log: <Path to the per-post coverage telemetry JSON.>
  published_at: <ISO-8601 UTC publish timestamp.>
  visual_qa: <Sub-block with status, reviewed, unresolved_defects from the Phase 3b loop.>
Replay is structural, not narrative. Future-you can reproduce, rerun, or invalidate any post deterministically.
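A deny-by-default publish gate over the receipt might look like the following Python sketch. It is an assumption-laden illustration, not the site's gate: `may_publish` and the dictionary shape are hypothetical, and only the six field names are taken from the receipt above.

```python
REQUIRED_FIELDS = (
    "model_version",
    "skill_version",
    "kb_sha",
    "validation_pass_log",
    "published_at",
    "visual_qa",
)


def may_publish(front_matter):
    """Deny by default: allow only when every provenance field is present and non-empty."""
    provenance = front_matter.get("provenance")
    if not isinstance(provenance, dict):
        return False, ["provenance"]
    missing = [name for name in REQUIRED_FIELDS if not provenance.get(name)]
    return (not missing), missing


complete = {"provenance": {name: "placeholder" for name in REQUIRED_FIELDS}}
partial = {"provenance": {"model_version": "placeholder"}}
```

The function returns a verdict plus the list of missing fields, so a refusal is itself auditable; an artifact with no provenance block at all is refused before any field is inspected.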
The pipeline runs five research lanes plus five validation lanes per post: a 5+5 shape.
The blog corpus is 99 posts today, all gate-green per the required CI check.
For the trade-offs of stage-gated production agentic delivery at scale, see the AWS Prescriptive Guidance on operationalizing agentic AI.
See also: subagent-orchestration (the term covers context isolation per worker; this section covers how a specific composition produces a publishable post).
Verify
Authority on a page like this goes only as far as an outsider can check it.
Outsider-checkable:
The case study at /case-studies/2026-05-08-ai-authoring-pipeline/
The engineering trust paper at /downloads/ready-solutions-ai-engineering-trust.pdf
The structured data (JSON-LD) embedded in this page source
The D2 and Plot SVG source rendered on this page
Every cited Tier 1/2 source in §Sources
Internal discipline:
Validator counts (auto-audited from inventory at build, but you take the inventory itself on faith)
Per-post provenance stamps (in source, but generated by internal tooling)
Knowledge-base contents (private repository)
Subagent system prompts (private repository)
The challenge then validate loop on load-bearing prompts (private workflow)
The verifiability boundary. The two columns are mutually exclusive, not overlapping: an outsider can check the left; the right rests on internal discipline.
Regenerating those numbers at every build is what keeps them honest on my end; from the outside, with the repository private, you still take the count and the stamp on faith.
One thing this section does not claim is that this page passing its own validation proves the pipeline works. That would be circular. The pipeline runs against this page the same way it runs against every other; the standing proof is the documented architecture and the published outcomes, not a green checkmark this page awards itself.
Publishing the shape is a transparency choice, not a claim that the method is secret or that being open changes what the pipeline does. It changes what you can see, and what you can check.
Common questions
Does AI write the content on this site?
Yes, end to end, with a human review gate. Read-only specialist subagents do the judgment work, and one orchestrating skill performs every write, so no subagent changes a file on its own.
How is AI-authored content kept accurate?
Layered defense: canonical facts live in a knowledge base, deterministic validators re-check them, a fact-checking pass walks every cited claim back to its source, and a provenance stamp records the run. A claim that cannot be substantiated does not ship.
Can I see the pipeline source code?
No. The architecture is public; the recipe is not. Validator logic, agent instructions, and knowledge-base contents stay private. What is public is enough to confirm the design is real and not enough to clone it.
What can an outsider verify independently?
The case study, the engineering trust paper, and the structured data embedded in this page source. The validator counts and provenance stamps are internal, since the repository is private, so those you take on faith.
Is Ready Solutions AI affiliated with Anthropic?
No. Ready Solutions AI is an independent consultancy, not partnered with, affiliated with, or endorsed by Anthropic. It works in Claude, Claude Code, and the Model Context Protocol as an independent practitioner.