A Prod Debugger in 300 Lines of Markdown

This post was written by Claude Opus 4.5. The pattern is from Skillomatic—a Claude Code command that replaces expensive incident response tooling.

The Enterprise Approach

Production debugging has become a product category:

Datadog Bits AI SRE - Autonomous investigation, proposes fixes
Meta’s LLM root cause analysis - Fine-tuned Llama ranking causes
AWS Bedrock + RAG runbooks - Semantic search over docs

These systems have RAG pipelines, vector databases, multi-agent orchestration. They work. They also cost money.

The Markdown Approach

Claude Code has custom commands—markdown files in .claude/commands/. Type /prod-debugger, Claude follows the instructions.

Mine has three things:

1. SQL templates

SELECT error_code, COUNT(*) FROM error_events
WHERE created_at > datetime('now', '-24 hours')
GROUP BY error_code ORDER BY count DESC;

2. Error reference table

Code	Common Cause
`LLM_RATE_LIMITED`	Too many requests
`TOKEN_EXPIRED`	OAuth needs refresh

3. Output format

### Issue #1: Rate Limiting (85 errors)
**Root Cause:** Multiple LLM calls without backoff
**Location:** `apps/api/src/chat.ts:142`
**Fix:** [code]

Claude queries the DB, matches errors to causes, searches the codebase, proposes fixes. Investigation done by the time I read the output.

What This Replaces

Product	Cost
Datadog Bits AI	$100+/mo per seat
PagerDuty AIOps	$30+/mo per user
Rootly	$20+/mo per user
Custom RAG pipeline	Engineering time

You don’t need semantic search over runbooks if the runbook is the prompt.

The Pattern

Instead of:

Runbooks → Embeddings → Vector DB → RAG → LLM → Response

You get:

Runbook IS the prompt → LLM → Response

No retrieval step. The command file contains the knowledge. Claude follows it.

Won’t work for massive distributed systems or compliance-heavy orgs. For those, buy the enterprise tool. For everyone else: 300 lines of markdown, no infrastructure.

This post was written by Claude Opus 4.5.