Every guide to running Claude Code in GitHub Actions tells you how to give the agent access. Store the API key as a secret. Grant the workflow write permission. Wire up the @claude trigger and watch it push a branch and tee up a pull request while you sleep. Almost none of them tell you what to take away.

That omission is the whole problem. In April 2026, researchers published a proof of concept they called "Comment and Control": a crafted pull request title and issue body that drove Claude Code, Gemini CLI, and GitHub Copilot's agent, all running inside GitHub Actions, to read their own environment and exfiltrate ANTHROPIC_API_KEY and GITHUB_TOKEN. The vendors acknowledged it. No model was jailbroken in the usual sense. The agents did exactly what an agent does: they read the untrusted text in front of them and acted on it, using the permissions someone had already granted.

So here is the thesis, and it is not the one the setup guides imply. Running a Claude Code agent in CI safely is a containment problem, and least-privilege permissions are only its first move. You cannot configure your way to safety by tuning scopes and then trusting the agent, because injection turns the agent's granted permissions against you, and because human review of agent-authored pull requests is unreliable in practice. What contains the blast radius is an external gate the agent cannot weaken, plus a named human the gate backstops.

Three numbers frame the case, and none of them is about the model being wrong.

23,000 repos exposed by one poisoned GitHub Action (CVE-2025-30066, March 2025)
17% false-negative rate disclosed for an AI permission classifier (auto mode)
0 long-lived secrets a contained CI agent should hold

Treat the CI agent as an untrusted contributor

Start with the mental model, because it decides everything downstream. A Claude Code agent running in a GitHub Actions job reads untrusted input (the issue, the PR diff, the review comments), holds a shell, and can reach whatever secrets and scopes the workflow handed it. Strip away the word "AI" and you have described an external contributor with commit access whom you have never met. You would not give that contributor your production deploy key and merge rights on trust. The agent should not get them either.

This is the part the setup tutorials skip. They frame the work as "what do we let the agent do," which quietly assumes the agent is on your side and just needs enough access to be useful. The containment frame asks a different question: what do we make impossible? That shift is the substance of agentic AI governance, which is governing tool-using agents through permission scope, action provenance, and deterministic gates rather than through instructions you hope the agent follows.

The obvious objection is that a well-prompted, capable agent does not need to be treated like an adversary. I would believe that if a prompt were a boundary. It is not. The Comment and Control work landed because the agent could not tell the difference between a maintainer's instruction and an attacker's instruction pasted into a PR description; both are just text in the context window. A capable agent makes the problem worse, not better, because it is more effective at carrying out whatever instruction it decides to follow. Capability is not the variable you are managing here. Authority is.

Figure: The containment pipeline for a CI agent. A vertical flow: untrusted input reaches a least-authority Claude Code agent that runs inside an ephemeral runner. The runner proposes a change to a deterministic gate of required checks. Only if the gate is green does a named human review, and only the human merges to main.

Untrusted input (issue, PR, comment) → reaches → Claude Code agent (least authority) → runs inside → Ephemeral runner (scoped surfaces) → proposes → Deterministic gate (required checks) → if green → Named human (CODEOWNERS) → merges → main
The agent authors. The gate and the human own the merge.

Hold that picture. Everything below is a way of moving one more capability out of the agent's reach and behind the gate.

Secrets and scope: least privilege is the first move, not the whole game

The fastest credential mistake in a CI agent is a static personal access token dropped into a secret. Anthropic's own action documentation is blunt about it: do not use personal access tokens, because static tokens do not rotate and can be recovered through prompt injection. The correct credential for repository operations is the job-scoped GITHUB_TOKEN that GitHub mints for each run and expires when the job ends. A short life caps how long a leak lasts, not what it can do while it lasts. Inside that window the token can still push, open or mutate PRs, and touch check statuses, so treat expiry as a limit on exposure, not a substitute for scoping the token down.

Scope it down further. Without an explicit permissions: block, the GITHUB_TOKEN inherits the repository default, which can be read-write across the repo. GitHub's security guidance is to default the token to read-only for contents and raise it per job only for the scopes a given job needs, such as pull-requests: write for the job that opens the PR. For cloud credentials, do not store long-lived keys at all. GitHub's OpenID Connect lets a job mint short-lived cloud credentials at runtime by presenting a signed token, with an identity provider trust policy bound to your specific repository. One caveat the Cloud Security Alliance flagged in its May 2026 research note: if that trust policy omits the repository subject constraint, any GitHub Actions workflow anywhere can assume your role. Bind the sub claim, the part of the signed token that names which repository is asking. Leave it open and the trust policy effectively answers "any repo on GitHub."
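
The sub-claim check is mechanical enough to automate. Here is a minimal sketch, assuming an AWS-style IAM trust policy already loaded as a Python dict; the function name, the example repository, and the choice of AWS as the cloud are illustrative assumptions, not something the research note prescribes.

```python
# Claim key GitHub's OIDC provider uses for the subject in AWS-style conditions.
SUB_KEY = "token.actions.githubusercontent.com:sub"


def sub_is_bound(trust_policy: dict, repo: str) -> bool:
    """Return True only if every Allow statement pins `sub` to `repo`.

    `repo` is "owner/name". Hypothetical helper for illustration.
    """
    prefix = f"repo:{repo}:"
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    allow = [s for s in statements if s.get("Effect") == "Allow"]
    if not allow:
        return False
    for stmt in allow:
        cond = stmt.get("Condition", {})
        subs = []
        for op in ("StringEquals", "StringLike"):
            value = cond.get(op, {}).get(SUB_KEY)
            if value is not None:
                subs.extend([value] if isinstance(value, str) else value)
        # No sub condition, or one that does not start with our repository
        # (for example "repo:*"), leaves the role open to other repositories.
        if not subs or not all(v.startswith(prefix) for v in subs):
            return False
    return True
```

A policy whose only condition is on the audience claim fails this check, which is the open case the caveat describes. The sketch ignores statements meant for non-GitHub principals, so treat a failure as a prompt to read the policy, not as a verdict.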

There is a fine-grained-token discipline here that carries straight over from connecting Claude to other systems. When I wrote about connecting Claude to Jira, GitHub, and Confluence through MCP, the same rule applied: a token scoped to only what the workflow needs. Same instinct, higher stakes, because a CI runner executes code.

Now the honest part. All of this shrinks the blast radius. None of it stops injection. A read-only agent can still be talked into reading a secret it has access to and writing it somewhere an attacker can see, which is exactly what the CSA note demonstrated. Least privilege is necessary. It is not sufficient. That is why this section ends by pointing at the gate, not at a longer allow list.

Isolation and supply chain: ephemeral runner, scoped surfaces, pinned actions

Containment has four surfaces, and it helps to name them so none gets forgotten.

| Surface | What the agent can reach | Default posture in CI |
|---|---|---|
| Filesystem | Directories it can read and write | Fresh checkout; no runner home, no sibling repos |
| Shell | Commands it can execute | Allow list; read-only Bash under dontAsk |
| Network | Hosts it can call out to | Egress-scoped; deny by default |
| External tools | MCP servers, web fetch | Only what the task needs; none by default |

Run each job on an ephemeral runner with a fresh checkout, so nothing the agent touched in one run survives into the next. Do not let it reuse a build cache that an earlier, possibly poisoned, run could have written to. GitHub's own engineering write-up on agentic workflows goes further, giving the agent process zero access to secrets through container-level isolation and an authenticated API proxy. You can borrow the shape even without their platform: the agent proposes, a separate job that holds the deploy credentials disposes, and only after the gate.

Two traps deserve their own callout.

Warning

pull_request_target is the dangerous trigger. It runs with the base branch's full privileges, including secrets and write access, even for code coming from a fork. Pair it with an agent that checks out and executes the fork's code and you have handed an outside contributor your secrets. If you must process untrusted PR code, keep it at a separate, non-executed path and grant the smallest possible scope. The OWASP CI/CD risk catalog files this under CICD-SEC-4 for a reason.
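
The risky pairing is easy to search for. A rough sketch of a text heuristic follows; it assumes the workflow file is passed in as a string and only recognizes the common `ref: ${{ github.event.pull_request.head.sha }}` checkout form, so it will miss other ways of fetching fork code. The function name is hypothetical.

```python
import re

# A checkout `ref:` that points at the pull request head, i.e. the fork's code.
HEAD_REF = re.compile(
    r"ref:\s*\$\{\{\s*github\.event\.pull_request\.head\.(?:sha|ref)\s*\}\}"
)
TRIGGER = re.compile(r"\bpull_request_target\b")


def risky_pull_request_target(workflow_text: str) -> bool:
    """True if the workflow uses pull_request_target AND checks out the PR head."""
    # Crude comment stripping so a commented-out trigger does not count.
    body = "\n".join(line.split("#", 1)[0] for line in workflow_text.splitlines())
    return bool(TRIGGER.search(body)) and bool(HEAD_REF.search(body))
```

A hit is not proof of a vulnerability, and a miss is not proof of safety. It is a cheap way to decide which workflows deserve a human read before an agent is attached to them.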

The second trap is the supply chain, and it is not hypothetical. In March 2025, the tj-actions/changed-files action had its version tags retroactively rewritten to dump runner secrets into public build logs. Roughly 23,000 repositories were affected before it was caught, and CISA issued an advisory. A year later, a second campaign poisoned 76 of 77 version tags of a popular scanning action. The lesson is mechanical: a @v1 or @main reference trusts whoever can move that tag. Pin every action, including claude-code-action itself, to a full-length commit SHA, which GitHub describes as the only way to use an action as an immutable release. Per-worker scope and pinned dependencies are the same discipline I leaned on when writing about subagent orchestration in production: a worker that does something unexpected can only reach what its scope allowed.
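
Pinning is simple to lint. The sketch below scans workflow text for `uses:` references that are not pinned to a 40-character commit SHA. It is a line-based approximation, not a YAML parser, and the function name and the example action in the usage note are hypothetical.

```python
import re

# Captures the reference after `uses:`, stopping at whitespace, quotes, or a comment.
USES = re.compile(r"^\s*-?\s*uses:\s*['\"]?([^'\"\s#]+)")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return every `uses:` reference that is not pinned to a full commit SHA."""
    found = []
    for line in workflow_text.splitlines():
        match = USES.match(line)
        if not match:
            continue
        ref = match.group(1)
        # Local actions (./path) live in the repo itself; docker:// references
        # are out of scope for this sketch.
        if ref.startswith("./") or ref.startswith("docker://"):
            continue
        _, _, version = ref.partition("@")
        if not FULL_SHA.match(version):
            found.append(ref)
    return found
```

Run it over every file under `.github/workflows/` and fail the check when the list is non-empty. A reference such as `example-org/example-action@v1` is reported; the same action pinned to a full SHA, with the version kept in a trailing comment, passes.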

Permissions in practice: the bypassPermissions trap

Here is where the productivity pressure shows up. A CI job is non-interactive, so the agent cannot stop and ask you to approve a command. The tempting fix is bypassPermissions, the mode behind the --dangerously-skip-permissions flag, which makes every tool call run immediately. It is the wrong default, and Anthropic's documentation says so in plain language: bypass mode offers no protection against prompt injection or unintended actions, and it is meant for isolated, throwaway containers only.

There is a non-interactive mode built for exactly this, and it is not the dangerous one.

| Mode | What it does | Prompt-injection protection | Use it for |
|---|---|---|---|
| dontAsk | Auto-denies any tool call that would otherwise prompt; only actions on your permissions.allow list and read-only Bash run | Deny-by-default, not injection-proof: a broad allow list reopens the surface | Locked-down CI and scripts |
| bypassPermissions | Disables prompt and safety checks; tool calls execute immediately | None, per the official docs | Isolated throwaway containers only |
| auto | A classifier model approves or denies each action at runtime | Partial: a disclosed 17% false-negative rate, and in headless mode the session just ends after repeated denials | Assisting a human, not gating a merge |

dontAsk is the mode the documentation points at for locked-down CI. It is non-interactive precisely because it auto-denies anything that would have prompted, so the agent runs to completion inside the fence you drew with the allow list. Permission rules evaluate in a fixed precedence: deny beats ask beats allow, by category, no matter where each rule sits in the file. A deny rule is the strongest lever you have. One caveat to state plainly, because it is easy to oversell: dontAsk is deny-by-default, not injection-proof. It stops the agent from reaching tools you never approved, but anything on the allow list, a broad Bash permission or an outbound network call, still runs even when the instruction to use it was injected. Keep the allow list short, or the fence has a gate in it.
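
The precedence rule is easier to reason about as a small model. This is a toy sketch of the behavior described above, not Claude Code's actual matcher: glob matching stands in for the real rule syntax, the rule strings are made up, and the `read_only` flag is something the caller supplies.

```python
from fnmatch import fnmatchcase


def decide(tool_call: str, rules: dict[str, list[str]], read_only: bool = False) -> str:
    """Resolve one tool call to 'allow' or 'deny' as a dontAsk-style run would.

    `rules` maps 'deny' / 'ask' / 'allow' to glob patterns. Precedence is by
    category, so the order of rules inside the file never matters.
    """
    def hit(category: str) -> bool:
        return any(fnmatchcase(tool_call, p) for p in rules.get(category, []))

    if hit("deny"):
        return "deny"   # deny beats ask and allow
    if hit("ask"):
        return "deny"   # nobody to ask in CI, so a prompt becomes a denial
    if hit("allow") or read_only:
        return "allow"
    return "deny"       # anything unlisted would have prompted
```

With `"Bash(curl *)"` on both the allow and deny lists, a curl call resolves to deny. The model also shows the caveat: a call that matches a broad allow pattern returns allow regardless of who wrote the instruction that triggered it.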

What about auto mode, the newer middle path where a classifier judges each action in context? It is genuinely useful for reducing per-action approvals, and a fair reading of the counter-argument is that a runtime classifier adapts better than a static allow list you have to maintain by hand. I take that seriously. I also read Anthropic's own engineering write-up, which discloses a 17% false-negative rate on overeager actions and states plainly that auto mode is not a drop-in replacement for careful human review on high-stakes infrastructure. In non-interactive runs there is no human to escalate to; the session simply terminates after repeated denials. So auto mode can lower the cost of approvals. It does not become the gate. The gate is still downstream, and it is deterministic.

The deeper point is about where a rule lives. An instruction in a prompt or a CLAUDE.md is advisory; the agent can reason its way around it. A rule that must hold every time belongs in a Claude Code hook that returns a deny decision the agent cannot override, even in a mode that skips the usual prompts. That is the same routing question I worked through in where each kind of rule belongs: if it has to hold, give it teeth, do not give it a paragraph.
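
To make "teeth" concrete, here is a minimal sketch of a PreToolUse hook that denies edits to gate-defining files. The input fields (`tool_name`, `tool_input.file_path`) and the `hookSpecificOutput` shape follow my reading of the Claude Code hooks documentation, so verify them against the current docs before relying on this. The protected paths and the `--hook` flag are assumptions of this sketch.

```python
import json
import sys
from pathlib import PurePosixPath
from typing import Optional

EDIT_TOOLS = {"Edit", "Write", "MultiEdit"}


def gate_decision(event: dict) -> Optional[dict]:
    """Return a deny decision for edits to gate-defining files, else None."""
    if event.get("tool_name") not in EDIT_TOOLS:
        return None
    path = str(event.get("tool_input", {}).get("file_path", ""))
    parts = PurePosixPath(path).parts
    # Hypothetical protected set: anything under .github/ and any CODEOWNERS file.
    if ".github" in parts or (parts and parts[-1] == "CODEOWNERS"):
        return {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "deny",
                "permissionDecisionReason": "Gate-defining files need human review.",
            }
        }
    return None


def main(stdin=sys.stdin, stdout=sys.stdout) -> int:
    """Read one hook event as JSON and print a decision only when denying."""
    decision = gate_decision(json.load(stdin))
    if decision is not None:
        json.dump(decision, stdout)
    return 0


# Only act as a hook when invoked with --hook, e.g. `python guard.py --hook`.
if __name__ == "__main__" and sys.argv[1:] == ["--hook"]:
    sys.exit(main())
```

The decision lives in a pure function so it can be tested without a running agent. The hook stays silent for everything it does not care about, which leaves the normal permission rules in charge of those calls.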

The merge gate the agent does not own

This is the load-bearing section, so I will state it directly. The control that makes a CI agent safe to run unattended is not a smarter permission set. It is a deterministic validator on the path to main that the agent cannot weaken, talk past, or mark green on its own authority.

GitHub already provides the machinery: branch protection with required status checks that must pass before collaborators can merge; "require review from Code Owners" to put a named person on the hook; and stale-review dismissal so that a new commit resets an approval the agent might otherwise inherit. Turn on "Do not allow bypassing the above settings," or an admin (or an app with admin) can merge straight past every check you just configured. By default, claude-code-action does not even merge. It commits to a branch and links a PR for a human to open. That default is structural containment, and you should keep it.
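
Those settings can be expressed as data rather than clicked through. The sketch below builds a request body for GitHub's branch protection REST endpoint (`PUT /repos/{owner}/{repo}/branches/{branch}/protection`). The field names are the ones I understand that endpoint to accept, and I am treating `enforce_admins` as the API counterpart of the no-bypass setting; check both against the current REST reference. The function name and the check names are placeholders.

```python
import json


def branch_protection_payload(required_checks: list[str]) -> dict:
    """Build a branch protection body that keeps the merge decision off the agent."""
    return {
        # Checks that must be green, evaluated against an up-to-date branch.
        "required_status_checks": {"strict": True, "contexts": list(required_checks)},
        # Apply the rules to administrators too.
        "enforce_admins": True,
        "required_pull_request_reviews": {
            "dismiss_stale_reviews": True,        # a new commit resets approvals
            "require_code_owner_reviews": True,   # a named human via CODEOWNERS
            "required_approving_review_count": 1,
        },
        # No push restrictions in this sketch; the API expects the key regardless.
        "restrictions": None,
    }


if __name__ == "__main__":
    # Placeholder check names; substitute the names your workflows report.
    print(json.dumps(branch_protection_payload(["unit-tests", "lint"]), indent=2))
```

Apply the payload with a credential the agent never sees. If the agent's own token could call this endpoint, the gate would be one API request away from disappearing.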

A gate only holds if the agent cannot edit what defines it. The required checks, the validator scripts they run, the CODEOWNERS file, and the branch rules are all just files in the repo, and an agent with write access can change a file. Put .github/ behind CODEOWNERS review, so weakening the gate needs the same human sign-off as any other change. A gate the agent can rewrite is not a gate.

Not a gate

The agent's self-report

  • "All tests passed" in the run log
  • A green check the agent's own token can set
  • An approval the same automation can dismiss
  • Trust that scales with the prompt, not the risk

The merge owner

A gate the agent cannot weaken

  • Required status checks on the protected branch
  • A status the agent's token has no permission to set
  • A named human via CODEOWNERS as the backstop
  • Stale-review dismissal on every new commit

The agent authors the change. Something it cannot influence decides whether the change ships.

Lean on the human less than your instinct wants. A May 2026 study of AI-generated pull requests found that most receive no human review at all, and when they are reviewed, the review is frequently mediated by another agent rather than a person reading the diff. If your safety story is "a human will catch it," the data says the human often will not. That is the argument for making the deterministic check load-bearing and the human the backstop, not the reverse. The check runs the same way every time; the human is the judgment call you escalate the hard cases to.

I reached the same conclusion coming at agentic work from other directions. The verification loop, the deterministic check between the change and main, is the part that matters, which I argued in who owns the verification loop. And the review bar itself needs an owner once an agent can re-green a PR while you are asleep. CI is just the most literal version of that question, because in CI there is no IDE, no human glancing at the output, nothing between the agent and main except the gate you built.

I will anchor it in my own setup, because I would not ask you to run something I do not. Every change to the repository behind this site has to clear a local pre-commit gate of 17 checks and a separate CI gate before it can merge. When I let Claude Code open changes against that repo, the gate is what decides whether they ship, not the model's report that the work is done. The most useful thing I have built around it is not a cleverer prompt. It is a pre-commit hook that returns a hard deny the agent cannot talk its way past, sitting in front of a required check whose status its token has no permission to set. That is governance you can run a thousand times without watching.

What containment does not fix

One honest limit, so the checklist does not oversell itself. Containment addresses the blast radius of the agent's authority. It does not make a wrong agent right. A January 2026 taxonomy of failed agentic pull requests found that the large majority of rejections, more than nine in ten, came from coordination and competence problems: reviewers abandoning the PR, duplicate work, tests failing, features nobody asked for. Authority and security failures were a small slice. Everything in this post limits what a hijacked or over-permissioned agent can damage. None of it makes the agent's code good. That is a different problem, addressed by the verification loop's tests and by the broader discipline I cover in running Claude Code as a production engineering practice and keeping agents reliable in production. Contain the authority; verify the work. Two jobs, both required.

Before the first unattended run, the short version:

  • Run the agent in dontAsk, never bypassPermissions, with a deliberately short allow list.
  • Skip static PATs. Use the job-scoped GITHUB_TOKEN, and set persist-credentials: false on checkout so the token does not linger in .git/config where the agent can read it back.
  • Declare permissions: explicitly, read-only by default, raised per job only where a job genuinely needs it.
  • SHA-pin every action, including claude-code-action itself.
  • Require status checks and CODEOWNERS review on the protected branch, turn off bypassing, and put .github/ behind CODEOWNERS too.

None of these is the whole answer on its own. Together they move the merge decision off the agent and onto something it cannot quietly change.

If you want help designing that containment layer, the hooks, the permission scoping, and the CI integration, that is what Claude Code infrastructure setup covers. Or take fifteen minutes and we can look at where your current pipeline trusts the agent more than it should: book a short call.

FAQ

Can Claude Code run inside GitHub Actions?

Yes. The official claude-code-action runs the agent inside a GitHub Actions job, triggered by an @claude mention, a label, or a schedule. By default it commits to a new branch and links a pull request for a human to open and merge; it does not merge on its own.

Is it safe to give a Claude Code agent CI access?

Safe is the wrong frame. Treat the agent as an untrusted contributor: give it the least authority that still lets it work, isolate the run, and put a deterministic gate plus a named human in front of the merge. The risk is the authority you grant, not the agent's competence.

Should a Claude Code agent use bypassPermissions or --dangerously-skip-permissions in CI?

No. bypassPermissions disables safety checks and, per Anthropic's docs, offers no protection against prompt injection. Use dontAsk mode instead: it auto-denies any tool call that would prompt and only runs actions on your allow list plus read-only Bash, which makes it non-interactive for CI without removing the guardrails.

Who reviews code an AI agent writes in CI?

A named human, assigned through CODEOWNERS, and backstopped by required status checks on the protected branch that the agent's token cannot mark green. The agent's own report that tests passed is not a gate.