Structured Output: Getting Claude to Return Data You Can Parse Directly

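"Just have Claude return JSON" sounds simple, but in production Claude often adds explanatory text before or after the JSON, or uses inconsistent field names from one response to the next. The right approach is to use structured output rather than writing a prompt and hoping for the best.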

Approach 1: Tool calling (most reliable)

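Define the data structure you want as a "tool" and have Claude call it. This is the most reliable approach: the tool's input_schema spells out exactly which fields to fill in, forcing the call with tool_choice means the reply is a tool call rather than free-form prose, and the arguments come back already parsed as a dict in a tool_use block, so there is no surrounding text to strip. Conformance to the schema is very likely but not formally guaranteed, so it is still worth validating the result.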

import anthropic
import json

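client = anthropic.Anthropic()  # reads the API key from the ANTHROPIC_API_KEY environment variable

# Example input (replace with the code you actually want reviewed)
code = "def add(a, b):\n    return a - b\n"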

# Define the data structure as a tool
code_review_tool = {
    "name": "submit_code_review",
    "description": "Submit the code review result",
    "input_schema": {
        "type": "object",
        "properties": {
            "severity": {
                "type": "string",
                "enum": ["critical", "warning", "suggestion"],
            },
            "issues": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "line": {"type": "integer"},
                        "description": {"type": "string"},
                        "fix": {"type": "string"},
                    },
                    "required": ["line", "description"],
                }
            },
            "summary": {"type": "string"},
        },
        "required": ["severity", "issues", "summary"],
    }
}

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=2048,
    tools=[code_review_tool],
    tool_choice={"type": "tool", "name": "submit_code_review"},  # force Claude to call this tool
    messages=[{"role": "user", "content": f"Review this code:\n{code}"}],
)

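# The structured data comes back directly: find the tool_use block by type
# rather than assuming it is always content[0]
tool_use = next(block for block in response.content if block.type == "tool_use")
result = tool_use.input  # already a dict, no json.loads needed
print(result["severity"], result["issues"])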
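Because tool input is very likely, but not guaranteed, to match the schema, a cheap check before the data reaches downstream code is useful. The following is a minimal sketch assuming you want no third-party dependency: schema_errors and REVIEW_SCHEMA are hypothetical names, and the checker only understands the few JSON Schema keywords this tool uses (type, enum, properties, required, items). For full JSON Schema support, a library such as jsonschema is the usual choice.

```python
from typing import Any

# Same shape as code_review_tool["input_schema"] (copied so this runs standalone)
REVIEW_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "severity": {"type": "string", "enum": ["critical", "warning", "suggestion"]},
        "issues": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "line": {"type": "integer"},
                    "description": {"type": "string"},
                    "fix": {"type": "string"},
                },
                "required": ["line", "description"],
            },
        },
        "summary": {"type": "string"},
    },
    "required": ["severity", "issues", "summary"],
}

# Python types for the JSON Schema "type" keywords used above
_TYPES = {"object": dict, "array": list, "string": str, "integer": int}


def schema_errors(value: Any, schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return a list of violations; an empty list means the value conforms.
    Only handles type / enum / properties / required / items."""
    errors: list[str] = []
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, but true/false is not a JSON integer
        if not isinstance(value, _TYPES[expected]) or (
            expected == "integer" and isinstance(value, bool)
        ):
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in {schema['enum']}")
    if expected == "object":
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required field {key!r}")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(schema_errors(value[key], sub_schema, f"{path}.{key}"))
    if expected == "array" and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(schema_errors(item, schema["items"], f"{path}[{i}]"))
    return errors


good = {
    "severity": "warning",
    "issues": [{"line": 2, "description": "subtracts instead of adding"}],
    "summary": "one bug",
}
print(schema_errors(good, REVIEW_SCHEMA))  # → []
```

In the pipeline you would call schema_errors(result, REVIEW_SCHEMA) and treat a non-empty list as a failed response (log it, retry, or drop it).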

Approach 2: JSON via the system prompt

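If you'd rather not define a tool, you can demand JSON output in the system prompt. This relies purely on instructions, so the constraints need to be stricter and the parsing needs to be defensive: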

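response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,  # required by the Messages API
    system="""Output only JSON and nothing else.
No markdown code blocks, no explanations: output the JSON object directly.

Output format:
{
  "category": "bug|improvement|style",
  "confidence": <number between 0.0 and 1.0>,
  "description": "description of the issue"
}""",
    messages=[...],
)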

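import re

# Parse defensively
text = response.content[0].text
try:
    data = json.loads(text)
except json.JSONDecodeError:
    # Claude occasionally outputs something that is not pure JSON:
    # fall back to the span from the first "{" to the last "}"
    data = None
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group())
        except json.JSONDecodeError:
            pass  # still not valid JSON; data stays None
if data is None:
    raise ValueError(f"Could not parse JSON from response: {text[:200]!r}")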
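The greedy regex fallback captures everything from the first "{" to the last "}", so it breaks when the trailing commentary itself contains a brace. A different technique, sketched here under the assumption that you want the first complete JSON object in the reply, is to try json.JSONDecoder.raw_decode at each "{" in turn; raw_decode parses one JSON value starting at a given index and ignores whatever follows it. The function name extract_first_json_object is hypothetical.

```python
import json
from typing import Any

_decoder = json.JSONDecoder()


def extract_first_json_object(text: str) -> Any:
    """Return the first JSON object in text, tolerating leading prose and
    trailing commentary. Raises ValueError if no object parses."""
    start = text.find("{")
    while start != -1:
        try:
            # Unlike json.loads, raw_decode does not reject trailing text
            obj, _end = _decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This "{" did not start a valid object; try the next one
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


reply = (
    "Sure! Here is the JSON:\n"
    '{"category": "bug", "confidence": 0.9, "description": "off-by-one"}\n'
    "Note: {this} is extra."
)
print(extract_first_json_object(reply)["category"])  # → bug
```

On this reply the greedy regex would capture through "{this}" and json.loads would fail, whereas scanning stops at the end of the first valid object.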

Pydantic integration (Python):

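from pydantic import BaseModel
from typing import Literal

class ReviewIssue(BaseModel):
    line: int
    description: str
    fix: str | None = None  # the "X | None" syntax needs Python 3.10+

class CodeReview(BaseModel):
    severity: Literal["critical", "warning", "suggestion"]
    issues: list[ReviewIssue]
    summary: str

# model_json_schema() returns a JSON Schema dict (nested models go under "$defs");
# wrap it in a tool definition so the schema has a single source of truth
code_review_tool = {
    "name": "submit_code_review",
    "description": "Submit the code review result",
    "input_schema": CodeReview.model_json_schema(),
}

# After the call, validate the tool input with the same model
# (raises pydantic.ValidationError if it does not match)
review = CodeReview.model_validate(result)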

Keeping output consistent in batch processing:

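When you process large volumes of data, Claude's output format can vary slightly from one request to the next. Best practices:
- Use tool calling to enforce the structure
- Validate every response against the schema
- Log and monitor the parse failure rate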

In production, a parse failure rate above 0.1% is a signal that the prompt design needs investigating.
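Logging and monitoring the failure rate against that 0.1% threshold can be as simple as a counter per batch job. This is a minimal, stdlib-only sketch: ParseStats, record, and needs_prompt_review are hypothetical names, and record only checks that each response is a JSON object; in a real pipeline you would plug in your schema check (for example Pydantic's model_validate) at the same spot.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParseStats:
    """Running parse counters for one batch job (hypothetical helper)."""
    total: int = 0
    failures: int = 0
    failed_samples: list[str] = field(default_factory=list)

    def record(self, raw_text: str) -> Optional[dict]:
        """Try to parse one response; count failures and keep a few samples."""
        self.total += 1
        try:
            data = json.loads(raw_text)
            if not isinstance(data, dict):
                raise ValueError("top-level JSON value is not an object")
            return data
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            self.failures += 1
            if len(self.failed_samples) < 20:  # cap how many raw outputs we keep
                self.failed_samples.append(raw_text[:500])
            return None

    @property
    def failure_rate(self) -> float:
        return self.failures / self.total if self.total else 0.0

    def needs_prompt_review(self, threshold: float = 0.001) -> bool:
        # 0.001 is the 0.1% rule of thumb from the text
        return self.failure_rate > threshold


stats = ParseStats()
for raw in ['{"category": "bug"}', "Sure! Here you go:", '{"category": "style"}']:
    stats.record(raw)
print(f"{stats.failure_rate:.1%}")  # → 33.3%
print(stats.needs_prompt_review())  # → True
```

Keeping a capped sample of the raw failures matters as much as the rate itself: the samples are what tell you whether the prompt, the schema, or the parser needs to change.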