Anthropic recently shipped Code Review for Claude Code: a native, multi-agent PR reviewer that dispatches parallel agents to hunt bugs, verify findings, and post inline comments. It’s impressive. It also costs $15-25 per review.
Months before that launched, I’d already built something similar for my org using claude-code-action. Different tradeoffs: no multi-agent orchestration, but full control over prompts, triggers, and auth, at a fraction of the cost. Here’s the architecture I landed on: 13 lines of YAML per repo, everything else centralized.
## Table of contents
- The landscape: native vs. DIY
- The three-tier architecture
- The workflows, from repo to review
- The prompt and review types
- Authentication and provider choice
- Keeping costs under control
- Lessons learned
## The landscape: native vs. DIY
There are three ways to get AI-powered PR reviews on GitHub today.
Anthropic’s native Code Review is a multi-agent system. Parallel review agents cross-check findings for false positives and post inline comments. Anthropic reports that 84% of large PRs get actionable findings, with a false-positive rate under 1%. It requires a Team or Enterprise plan, and at org scale, $15-25 per review gets expensive quickly.
claude-code-action@v1 is the official GitHub Action. One Claude Code session per PR event. You control the prompt, the model, the auth. Much cheaper (you pay only for API tokens), but you build the orchestration yourself.
The DIY centralized approach (this post) wraps claude-code-action in reusable workflows. One config governs the whole org. Smart triggers route PRs to different review types. You get centralized prompt iteration and cost controls without touching individual repos.
AI PR reviews are essential guardrails at this point. The only question is how much control you want over them, and at what cost.
## The three-tier architecture
The design separates concerns into three layers:
```mermaid
graph TB
    subgraph "Repository Level"
        PR[Pull Request Event]
        Comment[PR Comment]
        Trigger[Trigger Workflow]
    end
    subgraph "Organization Level"
        Central[Central Workflow]
        GHA[GitHub App]
        Auth[Auth Provider]
    end
    subgraph "AI Provider"
        API[Claude API]
        Claude[Claude Model]
    end
    subgraph "Response"
        Review[PR Review]
        Inline[Inline Comments]
        Status[Status Check]
    end

    PR --> Trigger
    Comment --> Trigger
    Trigger -->|"Calls"| Central
    Central -->|"Authenticates"| GHA
    Central -->|"Credentials"| Auth
    Auth -->|"API Call"| API
    API -->|"Invokes"| Claude
    Claude -->|"Returns Analysis"| API
    API -->|"Response"| Central
    Central -->|"Posts"| Review
    Central -->|"Adds"| Inline
    Central -->|"Updates"| Status

    style PR fill:#e1f5fe
    style Comment fill:#e1f5fe
    style Central fill:#fff3e0
    style Auth fill:#e8f5e9
    style Claude fill:#f3e5f5
    style Review fill:#e8f5e9
```
Tier 1 is a thin YAML stub in each repo. It defines which events trigger a review and delegates everything else.
Tier 2 is a reusable workflow in your org’s .github repo. It decides whether to review and what kind of review to run, based on event type, comment content, and PR size.
Tier 3 is the central review workflow: auth, prompt assembly, claude-code-action invocation, result posting.
If you’ve worked on a platform team, this pattern is familiar: a thin per-repo shim that calls centralized pipeline logic. Want to change the review prompt? Update one file. Add a new review type? One file. The repos never know.
## The workflows, from repo to review

### Tier 1: The repo stub
This is the only file you add to each repository:
```yaml
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize, ready_for_review]
  issue_comment:
    types: [created]

jobs:
  claude-review:
    uses: your-org/.github/.github/workflows/claude-review-trigger.yml@main
    secrets: inherit
```
Thirteen lines. That’s the whole thing. Two event triggers and a `uses:` call with `secrets: inherit`.
### Tier 2: The trigger router
The router lives in your org’s .github repo. It receives the event from Tier 1, decides if a review should happen, and picks the review type.
The gate condition filters out noise early:
```yaml
if: |
  (github.event_name == 'pull_request' &&
   github.event.pull_request.draft == false) ||
  (github.event_name == 'issue_comment' &&
   github.event.issue.pull_request &&
   contains(github.event.comment.body, '@claude'))
```
Draft PRs? Skipped. Comments on issues (not PRs)? Skipped. Only `@claude` mentions on pull requests pass through.
For automatic PR events, size determines the review type:
```bash
ADDITIONS="${{ github.event.pull_request.additions }}"
DELETIONS="${{ github.event.pull_request.deletions }}"
TOTAL_CHANGES=$((ADDITIONS + DELETIONS))

REVIEW_TYPE="comprehensive"  # default for mid-sized PRs
if [ "$TOTAL_CHANGES" -lt 50 ]; then
  REVIEW_TYPE="quick"
elif [ "$TOTAL_CHANGES" -gt 1000 ]; then
  REVIEW_TYPE="comprehensive"
fi
```
For comment-triggered reviews, the type is parsed from the comment:
```bash
COMMENT_LOWER=$(echo "$COMMENT_BODY" | tr '[:upper:]' '[:lower:]')
if [[ "$COMMENT_LOWER" == *"security"* ]]; then
  REVIEW_TYPE="security"
elif [[ "$COMMENT_LOWER" == *"performance"* ]]; then
  REVIEW_TYPE="performance"
elif [[ "$COMMENT_LOWER" == *"quick"* ]]; then
  REVIEW_TYPE="quick"
else
  REVIEW_TYPE="comprehensive"
fi
```
`@claude security` gets a security-focused review. `@claude quick` gets a fast pass. A bare `@claude` defaults to comprehensive.
Once resolved, the router calls Tier 3 with structured inputs:
```yaml
trigger-review:
  needs: check-trigger
  if: needs.check-trigger.outputs.should_review == 'true'
  uses: your-org/.github/.github/workflows/claude-review.yml@main
  with:
    pr_number: ${{ needs.check-trigger.outputs.pr_number }}
    repository: ${{ github.repository }}
    trigger_comment: ${{ needs.check-trigger.outputs.trigger_comment }}
    review_type: ${{ needs.check-trigger.outputs.review_type }}
  secrets:
    GCP_WORKLOAD_IDENTITY_PROVIDER: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
    GCP_SERVICE_ACCOUNT: ${{ secrets.GCP_SERVICE_ACCOUNT }}
    APP_ID: ${{ secrets.APP_ID }}
    APP_PRIVATE_KEY: ${{ secrets.APP_PRIVATE_KEY }}
```
### Tier 3: The central review
This is where claude-code-action actually runs. The workflow checks out the PR branch, authenticates, and invokes the action with a dynamic prompt:
```yaml
- name: Claude Code Review
  uses: anthropics/claude-code-action@v1
  with:
    github_token: ${{ steps.app-token.outputs.token }}
    prompt: |
      You are an expert code reviewer. Review PR #${{ inputs.pr_number }}.

      ## Review Context
      - **Review Type**: ${{ inputs.review_type }}
      - **Focus**: ${{ steps.review-context.outputs.FOCUS_AREAS }}

      ${{ steps.review-context.outputs.REVIEW_DEPTH }}

      Structure your review as:
      ### Summary
      ### Issues Found
      ### Suggestions
      ### Review Metrics
      - **Risk Level**: Low/Medium/High
      - **Code Quality**: 1-10
      - **Recommendation**: Approve / Request Changes / Needs Discussion
    claude_args: |
      --model claude-sonnet-4-5@20250929
      --allowedTools "Bash(gh pr comment:*),Bash(gh pr diff:*),Bash(gh pr view:*),Bash(git log:*),Bash(git diff:*),Read,Grep,Glob"
```
The `allowedTools` list is deliberate. Claude can read code, run git commands, and post comments, but it can’t modify files, run arbitrary commands, or hit the network. Read-only reviewer by design.
The workflow also posts a failure notice to the PR with common troubleshooting steps when something breaks, so developers aren’t left guessing.
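A minimal sketch of such a failure step, assuming the `gh` CLI is available on the runner and reusing the App token from earlier (the step name and message text are illustrative, not the exact config):

```yaml
- name: Post failure notice
  if: failure()
  env:
    GH_TOKEN: ${{ steps.app-token.outputs.token }}
  run: |
    gh pr comment "${{ inputs.pr_number }}" \
      --repo "${{ inputs.repository }}" \
      --body "AI review failed. Common causes: expired cloud credentials, missing GitHub App permissions, or provider rate limits. Re-trigger with \`@claude review\`."
```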
## The prompt and review types
The prompt is where you actually tune what the reviewer does. Everything else is plumbing.
A Prepare Review Context step maps the review type to focus areas and depth instructions before the action runs. The routing logic:
```mermaid
graph TD
    Start[PR Event / Comment] --> Check{Trigger Type?}
    Check -->|Comment| Parse[Parse Comment]
    Check -->|PR Event| Size[Check PR Size]

    Parse --> Security{Contains 'security'?}
    Security -->|Yes| SecReview[Security Review]
    Security -->|No| Perf{Contains 'performance'?}
    Perf -->|Yes| PerfReview[Performance Review]
    Perf -->|No| Quick{Contains 'quick'?}
    Quick -->|Yes| QuickReview[Quick Review]
    Quick -->|No| CompReview[Comprehensive Review]

    Size --> Small{"< 50 lines?"}
    Small -->|Yes| QuickReview
    Small -->|No| Large{"> 1000 lines?"}
    Large -->|Yes| CompReview
    Large -->|No| CompReview

    style Start fill:#e1f5fe
    style SecReview fill:#ffebee
    style PerfReview fill:#fff3e0
    style QuickReview fill:#e8f5e9
    style CompReview fill:#f3e5f5
```
| Review type | Trigger | Focus areas |
|---|---|---|
| Quick | Auto (<50 lines) or `@claude quick` | Obvious bugs, basic code quality, critical problems |
| Comprehensive | Auto (default) or `@claude review` | Code quality, security, performance, testing, docs, architecture |
| Security | `@claude security` | Vulnerabilities, auth issues, input validation, data exposure, dependencies |
| Performance | `@claude performance` | Bottlenecks, algorithm efficiency, query optimization, caching, memory |
The mapping in code:
```bash
case "${{ inputs.review_type }}" in
  security)
    FOCUS_AREAS="security vulnerabilities, authentication issues, input validation, \
SQL injection, XSS, CSRF, sensitive data exposure, and dependency vulnerabilities"
    REVIEW_DEPTH="Perform a thorough security audit."
    ;;
  performance)
    FOCUS_AREAS="performance bottlenecks, algorithm efficiency, database query \
optimization, caching opportunities, memory usage, and scalability concerns"
    REVIEW_DEPTH="Focus on performance implications."
    ;;
  quick)
    FOCUS_AREAS="obvious bugs, basic code quality issues, and critical problems"
    REVIEW_DEPTH="Provide a quick, high-level review."
    ;;
  *)
    FOCUS_AREAS="code quality, best practices, security vulnerabilities, performance, \
testing, documentation, and architecture"
    REVIEW_DEPTH="Perform a comprehensive review."
    ;;
esac
```
The structured output format (summary, issues, suggestions, metrics) keeps reviews scannable. Risk level and recommendation fields make triage fast without reading the full review.
## Authentication and provider choice
Two things need auth: the AI provider and GitHub.
### Claude provider
claude-code-action supports three providers. Pick one:
- **Direct API key**: set `ANTHROPIC_API_KEY` as a secret. Simplest way to get started.
- **Vertex AI via OIDC**: keyless, GCP-native. Workload Identity Federation means your workflow never holds a long-lived key.
- **Bedrock**: AWS-native. Use IAM roles.
For Vertex AI with keyless auth, the workflow uses Google’s OIDC integration:
```yaml
- name: Authenticate to Google Cloud
  uses: google-github-actions/auth@v2
  with:
    workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
    service_account: ${{ secrets.GCP_SERVICE_ACCOUNT }}

- name: Claude Code Review
  uses: anthropics/claude-code-action@v1
  env:
    ANTHROPIC_VERTEX_PROJECT_ID: your-gcp-project
    CLOUD_ML_REGION: global
  with:
    use_vertex: "true"
```
No API keys stored anywhere. GitHub’s OIDC token gets exchanged for short-lived GCP credentials at runtime. The GitHub OIDC docs walk through the Workload Identity Federation setup.
### GitHub App
Regardless of provider, you need a GitHub App for PR write permissions. The built-in `GITHUB_TOKEN` can’t reliably post comments on PRs from reusable workflows across repos. A GitHub App gives you a stable identity and fine-grained permissions, and it works across the org.
```yaml
- name: Generate GitHub App token
  uses: actions/create-github-app-token@v2
  with:
    app-id: ${{ secrets.APP_ID }}
    private-key: ${{ secrets.APP_PRIVATE_KEY }}
    repositories: ${{ github.event.repository.name }}
```
## Keeping costs under control
This is the real reason to build it yourself.
A single Claude session per review costs roughly $1-3 depending on PR size and review type. Compare that to $15-25 for the native multi-agent system. At 50 PRs/week across an org, that’s $50-150/week vs $750-1,250/week. A 10x difference.
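The arithmetic behind that comparison, as a quick sanity check (using the per-review estimates above):

```shell
# Weekly cost at 50 PRs/week, using the per-review estimates above.
PRS_PER_WEEK=50
DIY_LOW=$(( PRS_PER_WEEK * 1 ))
DIY_HIGH=$(( PRS_PER_WEEK * 3 ))
NATIVE_LOW=$(( PRS_PER_WEEK * 15 ))
NATIVE_HIGH=$(( PRS_PER_WEEK * 25 ))
echo "DIY:    \$${DIY_LOW}-\$${DIY_HIGH}/week"        # $50-$150
echo "Native: \$${NATIVE_LOW}-\$${NATIVE_HIGH}/week"  # $750-$1250
```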
Here’s how to keep it predictable:
**Skip conditions.** The Tier 2 router already skips draft PRs. I also skip bot-authored PRs (`github.actor` checks) and WIP branches (title prefix matching). No point burning tokens on Dependabot bumps.
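A sketch of how those extra skips might extend the Tier 2 gate expression (the actor names and title prefixes are examples, not the exact config):

```yaml
if: |
  github.actor != 'dependabot[bot]' &&
  github.actor != 'renovate[bot]' &&
  !startsWith(github.event.pull_request.title, 'WIP') &&
  !startsWith(github.event.pull_request.title, '[WIP]')
```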
**Size-based routing.** Small PRs (<50 lines) get a quick review: shorter prompt, fewer tokens, faster turnaround. Only large or explicitly requested PRs get the full treatment.
**Debounce.** Reviews fire on `ready_for_review`, not on every `synchronize` during development. This alone cut my review volume in half.
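One way to get that behavior, assuming you control the repo stub, is to drop `synchronize` from its trigger types:

```yaml
on:
  pull_request:
    types: [opened, ready_for_review]  # no `synchronize`: review once, when ready
```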
**Model selection.** Sonnet for routine reviews. Opus for security reviews or anything over 1000 lines where deeper reasoning matters. A one-line change in the central workflow.
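A sketch of that routing in the central workflow’s context step (the Opus model ID is illustrative; check your provider’s model catalog, and assume `REVIEW_TYPE` and `TOTAL_CHANGES` were set by the earlier routing steps):

```shell
# Route heavier reviews to a stronger model; everything else stays on Sonnet.
REVIEW_TYPE="${REVIEW_TYPE:-comprehensive}"
TOTAL_CHANGES="${TOTAL_CHANGES:-0}"

MODEL="claude-sonnet-4-5@20250929"      # default: Sonnet for routine reviews
if [ "$REVIEW_TYPE" = "security" ] || [ "$TOTAL_CHANGES" -gt 1000 ]; then
  MODEL="claude-opus-4-1@20250805"      # deeper reasoning for risky or large PRs
fi
echo "Selected model: $MODEL"
```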
You can also log token usage as workflow artifacts for budget tracking, but honestly, at $1-3 per review, I haven’t needed to watch it closely.
## Lessons learned
- **Start with quick reviews.** Get auth, permissions, and routing working before tuning prompts. A quick review that posts “LGTM, no major issues” proves the whole pipeline end-to-end. I spent two days debugging OIDC before writing a single line of prompt.
- **Make reviews advisory, never blocking.** Don’t gate merges on AI reviews. Teams adopt advisory tools faster, and you avoid the politics of “the bot blocked my PR.”
- **Iterate prompts centrally.** This is the whole point of the architecture. When reviews miss a class of issues, you update the prompt in one place. Every repo benefits immediately.
- **The 13-line stub is a feature.** Repo owners don’t need to understand the review system. They drop in a file and get reviews. When the central team ships an improvement, every repo gets it for free.
- **Watch for hallucinated line numbers.** Claude sometimes references lines that don’t exist in the diff. The `allowedTools` constraint helps (Claude can run `gh pr diff` to see the actual changes), but you’ll still see occasional misses. Inline comments via `claude-code-action`’s built-in GitHub MCP tools are more reliable than asking the model to format line references manually.
## References
- Anthropic: Code Review for Claude Code
- anthropics/claude-code-action on GitHub
- GitHub: Reusing Workflows
- GitHub: OIDC with Google Cloud Platform
- GitHub: Creating a GitHub App
I’m still iterating on prompt quality and thinking about adding model routing based on file types (Opus for infra changes, Sonnet for everything else). If you’ve built something similar, I’d like to hear what worked; find me on GitHub.