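Tool Use, Function Calling, and MCP

A language model on its own is a sophisticated text transformer: it reads text and produces text. Everything we call an "agent" (a system that can read files, query databases, call APIs, run code, and interact with the world) is a language model equipped with tools. Function calling (Anthropic calls it "tool use", OpenAI calls it "function calling"; they are the same concept) is the mechanism that bridges a model that can describe an action and a model that can actually take it.

A deep understanding of function calling (how schemas work, how the execution loop is structured, what parallel calls look like, and how it relates to structured output) is foundational for anyone building LLM-powered applications. On top of it sits a newer layer: the Model Context Protocol (MCP), an open standard introduced by Anthropic that standardizes how agents connect to tools across a diverse ecosystem. MCP is to agents what REST is to web services: a common interface that lets components interoperate without custom integration. This article covers both in depth.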

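⚡ Key points at a glance

Function calling is the execution layer. The model does not run code: it emits a structured tool call (name + arguments), and your controller executes the real function and feeds the result back. The model sees only inputs and outputs, never the implementation.
JSON Schema defines the contract. Each tool is described by a name, a description the model uses to decide when to call it, and a JSON Schema that constrains its parameters. Good descriptions are critical, because the model chooses tools based on them.
The tool-use loop is inherently multi-turn. A single task may involve dozens of tool calls, each one observing the result of the last. The loop ends when the model emits a text response instead of a tool call.
Parallel tool calls cut latency. The model can emit several tool calls in one turn; you fan them out concurrently and return all results at once, which shortens wall-clock time for independent reads.
MCP standardizes the tool surface. Instead of writing a custom integration for every tool in every agent, an MCP server exposes tools over a standard protocol, and any MCP-compatible host (Claude Code, Cursor, and others) can connect to any MCP server without glue code.
MCP and RAG solve different problems. RAG retrieves read-only context; MCP provides stateful, executable tools. They are complementary, not substitutes.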

tldr

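Function calling is the mechanism: the model outputs a tool call → your code executes it → the result is passed back → the model continues. MCP is the standardization layer: a protocol that makes tools (packaged as MCP servers) reusable across any compliant agent host. You need to master both to build agents that are powerful and maintainable.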

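How function calling works: the mechanism

The key mental model: the model does not execute functions. It never touches your database, file system, or any external API. What the model does is produce a structured output that says "I want to call function X with arguments Y". Your application code receives that output, validates it, runs the real function, and returns the result to the model as a new message. The model then decides what to do next.

This design is deliberate. Execution happens in your code, under your control, with your permissions and your error handling. The model is confined to producing intent; taking action is your responsibility. This separation makes function-calling systems auditable, testable, and easier to secure: you can mock tool responses in tests, log every call, rate-limit expensive tools, and require confirmation before operations with side effects, all without touching the model.

The three-phase interaction

The model produces a tool call. When the model decides it needs a tool, it emits a structured tool-use block instead of (or in addition to) text. The block contains the tool name and a JSON object of arguments that conforms to the tool's schema.
The controller executes the tool. Your code deserializes the tool-use block, validates the arguments, calls the real function, and collects the result (success or error).
The result is passed back. You append the tool result as a new message (on Anthropic, a tool_result content block inside a role: user message, keyed by tool_use_id; on OpenAI, a role: tool message carrying tool_call_id) and call the API again. The model sees the result and decides the next step: either another tool call or a final text response.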
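To make the third phase concrete, the sketch below builds the follow-up message in both wire formats. The helper names (anthropic_tool_result, openai_tool_result) and the sample IDs are hypothetical; only the field names come from the two providers' message formats.

```python
def anthropic_tool_result(tool_use_id: str, output: str, is_error: bool = False) -> dict:
    """Anthropic: results go back as tool_result blocks inside a user message."""
    block = {"type": "tool_result", "tool_use_id": tool_use_id, "content": output}
    if is_error:
        block["is_error"] = True  # signals to the model that the call failed
    return {"role": "user", "content": [block]}


def openai_tool_result(tool_call_id: str, output: str) -> dict:
    """OpenAI Chat Completions: results go back as a dedicated tool-role message."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": output}


# Sample IDs are made up for illustration.
print(anthropic_tool_result("toolu_123", "2 passed, 1 failed"))
print(openai_tool_result("call_123", "2 passed, 1 failed"))
```

In both cases the ID ties the result back to the specific call the model made, which is what lets several calls from one turn be answered together.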

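Defining tools with JSON Schema

Each tool is defined by three things: a name the model uses to identify it, a description the model reads to decide when to use it, and an input_schema (JSON Schema) that defines which arguments are valid. All three matter.

python — tool definitions with JSON Schema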

tools = [
    {
        "name": "search_codebase",
        "description": """Search the repository for files or code matching a query.
Use this to find relevant files before reading them. Returns a list of
file paths and matching line snippets. Prefer this over read_file when
you don't know which file contains what you need.""",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query — function name, variable, error message, or concept.",
                },
                "file_pattern": {
                    "type": "string",
                    "description": "Optional glob pattern to restrict search, e.g. '*.py' or 'src/**/*.ts'.",
                },
                "max_results": {
                    "type": "integer",
                    "description": "Maximum number of results to return. Default 10.",
                    "default": 10,
                },
            },
            "required": ["query"],
        },
    },
    {
        "name": "run_tests",
        "description": """Run the test suite and return results. Use after making code changes
to verify correctness. Returns pass/fail status and any error output.""",
        "input_schema": {
            "type": "object",
            "properties": {
                "test_path": {
                    "type": "string",
                    "description": "Specific test file or directory. Omit to run all tests.",
                },
                "flags": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Additional pytest flags, e.g. ['-x', '--tb=short'].",
                },
            },
            "required": [],
        },
    },
]
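
Because a schema is plain data, the controller can also use it to check arguments before executing anything. The sketch below is a deliberately small, standard-library-only validator that covers just required fields and top-level type checks. validate_args, _JSON_TYPES, and search_schema are hypothetical names, and a production system would more likely rely on a full JSON Schema validator.

```python
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def validate_args(schema: dict, args: dict) -> list:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        if name not in properties:
            # Stricter than JSON Schema's default, which allows extra properties.
            problems.append(f"unexpected argument: {name}")
            continue
        expected = properties[name].get("type")
        py_type = _JSON_TYPES.get(expected)
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = expected in ("integer", "number") and isinstance(value, bool)
        if py_type is not None and (not isinstance(value, py_type) or is_bool_as_number):
            problems.append(f"{name}: expected {expected}, got {type(value).__name__}")
    return problems


# A trimmed copy of the search_codebase schema, for illustration only.
search_schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "max_results": {"type": "integer"},
    },
    "required": ["query"],
}

print(validate_args(search_schema, {"query": "login", "max_results": 5}))  # → []
print(validate_args(search_schema, {"max_results": "five"}))
```

Returning the problems as text, rather than raising, makes it easy to send them back to the model as an error result so it can retry with corrected arguments.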

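Write descriptions the model can act on

The model relies primarily on your description, rather than the schema, to decide which tool to call. A description is prompt text, and prompt quality matters here as much as anywhere else. Weak descriptions lead to wrong tool choices:

Too vague: "Search for things." The model cannot tell whether to use this tool, read_file, or grep.
Too implementation-focused: "Runs ripgrep with -r flag." The model cannot infer from this when searching is the right action.
Good: describe what the tool does, what input it expects conceptually, when to prefer it over alternatives, and what it returns. Write the description as if it were instructions for a junior engineer.

Property descriptions in the schema matter for the same reason. A bare "query" with no description is underspecified; a description that spells out what a query can be (a function name, variable, error message, or concept) tells the model how to phrase its input.

The tool-use loop in detail

Let's trace a complete multi-tool interaction to make the mechanism concrete. The task: "Find and fix the bug causing test_auth to fail."

python — a complete tool-use loop with result routing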

import subprocess

import anthropic

client = anthropic.Anthropic()

def execute_tool(name: str, args: dict) -> str:
    # Dispatch to the real implementation. read_file and write_file are assumed
    # to be declared in `tools` alongside the definitions shown earlier.
    if name == "run_tests":
        cmd = ["python", "-m", "pytest"] + args.get("flags", [])
        if "test_path" in args:
            cmd.append(args["test_path"])
        result = subprocess.run(cmd, capture_output=True, text=True)
        return result.stdout + result.stderr
    elif name == "read_file":
        with open(args["path"]) as f:
            return f.read()
    elif name == "write_file":
        with open(args["path"], "w") as f:
            f.write(args["content"])
        return f"Written {len(args['content'])} chars to {args['path']}"
    return f"Unknown tool: {name}"

messages = [{"role": "user", "content": "Find and fix the bug causing test_auth to fail."}]

while True:
    response = client.messages.create(
        model="claude-opus-4-5", max_tokens=4096,
        tools=tools, messages=messages,
    )

    # Append the assistant turn to the conversation
    messages.append({"role": "assistant", "content": response.content})

    if response.stop_reason != "tool_use":
        # No tool was requested (end_turn, max_tokens, ...): print any text and stop
        print("".join(b.text for b in response.content if b.type == "text"))
        break

    # Collect every tool call in this turn (there may be several)
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result,
            })

    # Feed the results back to the model
    messages.append({"role": "user", "content": tool_results})

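Notice that this loop is completely generic. It knows nothing about the specific task; that knowledge lives in the model. The controller only knows how to (1) call the API, (2) check whether the model is done, (3) execute whatever tool calls it finds, and (4) return the results. This is the standard pattern for every function-calling application.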
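
Because the loop is generic, it can be exercised without a network call by passing the model in as a function. The sketch below is one way to do that. run_agent, call_model, max_turns, and the scripted fake model are all hypothetical, and the response objects only imitate the stop_reason / content shape used above. It also adds a turn budget and reports tool failures back to the model, two things the basic loop leaves out.

```python
from types import SimpleNamespace as NS


def run_agent(call_model, execute_tool, messages, max_turns=10):
    """Drive the tool-use loop until the model stops asking for tools."""
    for _ in range(max_turns):
        response = call_model(messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            try:
                output, failed = execute_tool(block.name, block.input), False
            except Exception as exc:  # report the failure instead of crashing the loop
                output, failed = f"{type(exc).__name__}: {exc}", True
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": output, "is_error": failed})
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"no final answer after {max_turns} turns")


# A scripted stand-in for the model: one tool call, then a final answer.
script = iter([
    NS(stop_reason="tool_use",
       content=[NS(type="tool_use", id="t1", name="read_file", input={"path": "auth.py"})]),
    NS(stop_reason="end_turn",
       content=[NS(type="text", text="Fixed the off-by-one bug.")]),
])
history = [{"role": "user", "content": "Find and fix the bug."}]
answer = run_agent(lambda msgs: next(script),
                   lambda name, args: f"<{args['path']}>",
                   history)
print(answer)  # → Fixed the off-by-one bug.
```

The same harness doubles as a unit test for the controller: swap in different scripts to check how it handles errors, multiple calls per turn, or a model that never finishes.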

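Parallel tool calls: eliminating unnecessary latency

If the model reads three files by requesting them one per turn, it takes three sequential round trips: request file A, get A, request file B, get B, request file C, get C. That is three model round trips where one would do. And even when the model batches the requests into a single turn, the basic loop above executes them one after another. Modern models can emit multiple tool calls in one turn, and your controller should fan them out concurrently.

python — parallel tool execution with asyncio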

import asyncio

import aiofiles  # third-party: pip install aiofiles

async def execute_tool_async(name: str, args: dict) -> str:
    # Async versions of the tool implementations
    if name == "read_file":
        async with aiofiles.open(args["path"]) as f:
            return await f.read()
    # ... other tools
    return f"Unknown tool: {name}"

async def handle_tool_calls(tool_use_blocks):
    # Fan out every tool call from this turn concurrently
    calls = [block for block in tool_use_blocks if block.type == "tool_use"]
    results = await asyncio.gather(
        *(execute_tool_async(block.name, block.input) for block in calls)
    )  # gather preserves input order, so results line up with calls

    return [
        {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": result,
        }
        for block, result in zip(calls, results)
    ]
# If the model requests read_file("auth.py"), read_file("utils.py"), read_file("tests/test_auth.py")
# → the three reads run concurrently; latency = max(read_time) rather than sum(read_time)

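The speedup is real: for I/O-bound tools, reading five files in parallel takes roughly as long as reading the slowest one. For agents that do a lot of exploration (reading many files to understand a codebase), parallel tool calls can cut exploration time substantially, on the order of 3–5x when the model batches that many independent calls per turn.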
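
The max-versus-sum effect is easy to observe with simulated latency. In the sketch below, fake_read is a hypothetical stand-in for an I/O-bound tool and the 0.1-second delay is arbitrary; real numbers depend on the tools and on how many calls the model batches.

```python
import asyncio
import time


async def fake_read(path: str, delay: float = 0.1) -> str:
    """Stand-in for an I/O-bound tool; the sleep simulates disk or network latency."""
    await asyncio.sleep(delay)
    return f"contents of {path}"


async def sequential(paths):
    # One call at a time: total latency is the sum of the delays.
    return [await fake_read(p) for p in paths]


async def parallel(paths):
    # All calls at once: total latency is roughly the slowest single delay.
    return list(await asyncio.gather(*(fake_read(p) for p in paths)))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


paths = ["auth.py", "utils.py", "tests/test_auth.py"]
seq_result, seq_time = timed(sequential(paths))
par_result, par_time = timed(parallel(paths))
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Both versions return the same results in the same order; only the wall-clock time differs.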

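When parallel calls are not safe

Fan-out is only correct for independent operations. If tool call B depends on the result of tool call A (for example, "search for a file, then read the result"), they must run sequentially, because B needs information from A's result. Models usually handle this correctly by issuing dependent calls in successive turns. Your controller's job is to execute whatever the model emits in a single turn concurrently, without imposing an ordering of its own on independent calls.

Tool use as structured output

Tool use is not only for agents. It is also one of the most reliable ways to extract structured output from a model. When you want the model to return a specific data shape (for downstream code to parse), defining that shape as a tool schema and setting tool_choice to force that tool is far more reliable than asking the model to "return JSON".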

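Method	Syntactic reliability	Schema reliability	When to use
Prompt: "return JSON"	~80–95%	Low	Quick experiments only
JSON mode	~100% (barring truncation)	Low (valid JSON, but possibly the wrong shape)	When any valid JSON is acceptable
Forced tool call	~100%	High (schema-guided; guaranteed only with a strict mode)	Production extraction pipelines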

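Forced tool calls work because models are trained to produce arguments that conform to the schema. Conformance is not guaranteed by default, though: unless you enable a provider's strict or structured-output mode (where one is available), the API does not validate arguments against your schema, so a required field can occasionally be missing or mistyped. Keep a lightweight validation step before treating the result as a typed dict.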
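
A minimal sketch of the forced-tool pattern against the Anthropic Messages API follows. The record_invoice tool, its fields, and the helper names are hypothetical; the tool_choice shape is the one the API uses to force a specific tool. The response is faked so the example runs offline; in real use you would pass build_request(...) to client.messages.create.

```python
from types import SimpleNamespace as NS

# Hypothetical extraction schema, expressed as a tool definition.
extract_tool = {
    "name": "record_invoice",
    "description": "Record the fields extracted from an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
        },
        "required": ["vendor", "total"],
    },
}


def build_request(document: str) -> dict:
    """Keyword arguments for client.messages.create that force the tool."""
    return {
        "model": "claude-opus-4-5",
        "max_tokens": 1024,
        "tools": [extract_tool],
        "tool_choice": {"type": "tool", "name": extract_tool["name"]},
        "messages": [{"role": "user", "content": document}],
    }


def parse_extraction(response) -> dict:
    """Pull the forced tool call's arguments out and check required fields."""
    for block in response.content:
        if block.type == "tool_use" and block.name == extract_tool["name"]:
            required = extract_tool["input_schema"]["required"]
            missing = [key for key in required if key not in block.input]
            if missing:
                raise ValueError(f"missing fields: {missing}")
            return dict(block.input)
    raise ValueError("response contained no call to the extraction tool")


# Offline stand-in for an API response, so the sketch runs without an API key.
fake = NS(content=[NS(type="tool_use", name="record_invoice",
                      input={"vendor": "Acme", "total": 41.5})])
print(parse_extraction(fake))  # → {'vendor': 'Acme', 'total': 41.5}
```

The tool is never actually executed here; its schema exists only to shape the model's output.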

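What is MCP?

Function calling as described above requires you to write the tool definitions (schemas), implement the execution logic, and wire everything together inside your application. Every agent application reinvents this wheel independently. If you want your agent to talk to GitHub, you write a GitHub tool; if you want it to talk to Postgres, you write a Postgres tool; when someone else builds another agent, they write their own GitHub tool. Nothing is reused.

The Model Context Protocol (MCP), released by Anthropic in late 2024, addresses this directly. MCP is an open standard that defines a common protocol for exposing tools (and other capabilities) so that any compliant host can consume them without custom integration work. It is to agents what USB is to peripherals: a standard interface that lets components interoperate.

The core idea in one sentence

MCP separates the tool implementation (an MCP server) from the tool consumer (an MCP host) with a standard protocol in between, so that any server can work with any host.

MCP architecture: Host, Client, Server

MCP defines three roles:

MCP Host: the application that runs the LLM and manages the agentic loop. Claude Code, Cursor, a custom agent: any application that wants to use tools can be an MCP host. The host contains one MCP client for each server it connects to.
MCP Client: a thin layer inside the host that maintains a persistent connection to one MCP server and handles the protocol plumbing (connection, capability negotiation, requests and responses).
MCP Server: a process that exposes a set of tools (and optionally resources and prompts) over the MCP protocol. The real functionality lives in the server: it might wrap a GitHub API client, a database connection, a file system layer, a web browser, or anything else.

text — MCP architecture diagram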

┌─────────────────────────────────────────────┐
│                  MCP HOST                   │
│  (Claude Code, Cursor, custom agent app)    │
│                                             │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐ │
│  │ LLM      │   │ MCP      │   │ MCP      │ │
│  │ (Claude  │◄──│ Client   │   │ Client   │ │
│  │  /GPT)   │   │    A     │   │    B     │ │
│  └──────────┘   └────┬─────┘   └────┬─────┘ │
└──────────────────────┼──────────────┼───────┘
                       │              │
                 MCP Protocol   MCP Protocol
                (JSON-RPC 2.0) (JSON-RPC 2.0)
                       │              │
         ┌─────────────▼──┐   ┌───────▼────────┐
         │  MCP Server A  │   │  MCP Server B  │
         │  (GitHub API)  │   │  (PostgreSQL)  │
         │                │   │                │
         │  tools:        │   │  tools:        │
         │  - list_repos  │   │  - run_query   │
         │  - create_pr   │   │  - list_tables │
         │  - get_issues  │   │  - describe_db │
         └────────────────┘   └────────────────┘

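Messages are encoded as JSON-RPC 2.0 and typically travel over stdio (for local servers) or HTTP (for remote servers; early versions of the spec used HTTP with Server-Sent Events, and later revisions define a Streamable HTTP transport). The framing is standardized; what varies is the set of tools each server exposes and their schemas, which the client discovers once the connection-time handshake has negotiated capabilities.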
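
To give a feel for what travels over the wire, the sketch below builds the two requests a client most often sends (tools/list and tools/call) plus the shape of a successful call result. The rpc_request helper and the sql argument name are hypothetical; the method names and the name / arguments / content / isError fields follow the MCP specification.

```python
import json
from typing import Optional


def rpc_request(request_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request as a single line (stdio messages are newline-delimited)."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Ask the server which tools it offers.
list_tools = rpc_request(1, "tools/list")

# Invoke one of them; the "sql" argument name is an assumption for this example.
call_tool = rpc_request(2, "tools/call",
                        {"name": "run_query", "arguments": {"sql": "SELECT 1"}})

# Shape of a successful tools/call response as a server would send it back.
call_result = json.dumps({
    "jsonrpc": "2.0",
    "id": 2,
    "result": {"content": [{"type": "text", "text": "1"}], "isError": False},
})

print(list_tools)
print(call_tool)
print(call_result)
```

The id field pairs each response with its request, so a client can have several calls in flight on one connection.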

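The three primitives an MCP server exposes

Primitive	What it is	Analogy
Tools	Functions the model can call; they take arguments, may have side effects, and return a result	REST API endpoints (POST/PUT/DELETE)
Resources	Read-only data the host can load into context; identified by URI	REST API endpoints (GET) or files
Prompts	Prewritten prompt templates the server exposes for common tasks	Preset SQL queries / saved searches

Tools are the most important primitive and the one most MCP servers focus on. Resources give a server a lightweight way to surface data (such as a database schema or a documentation index) without the model having to call a tool. Prompts let a server package task-specific prompt templates that the host can inject into the conversation.

Writing a minimal MCP server

python — a minimal MCP server with the mcp SDK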

from mcp.server.fastmcp import FastMCP
import subprocess, pathlib

mcp = FastMCP("dev-tools")

@mcp.tool()
def run_tests(test_path: str = "", flags: list[str] | None = None) -> str:
    """Run the project test suite. Pass test_path to run a specific file.
    Returns combined stdout+stderr from pytest."""
    cmd = ["python", "-m", "pytest"] + (flags or [])  # avoid a shared mutable default
    if test_path:
        cmd.append(test_path)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr

@mcp.tool()
def list_changed_files() -> str:
    """Return files changed since the last git commit. Useful for scoping reviews."""
    result = subprocess.run(["git", "diff", "--name-only", "HEAD"],
                            capture_output=True, text=True)
    return result.stdout or "No changed files."

@mcp.resource("file://{path}")
def read_project_file(path: str) -> str:
    """Expose project files as resources for context loading."""
    return pathlib.Path(path).read_text()

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default

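This server exposes two tools (run_tests and list_changed_files) and one resource template (file://). Any MCP host (Claude Code, a custom agent, Cursor) can connect to it over stdio and discover these capabilities automatically through the protocol handshake. No custom integration code is needed on the host side.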
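
On the host side, discovery ends with a translation step: the tools a server advertises have to be handed to the model in its own tool format. The sketch below shows that mapping for the Anthropic format used earlier. mcp_tools_to_anthropic is a hypothetical helper, and the discovered listing is hand-written to resemble a tools/list result rather than captured from a real server.

```python
def mcp_tools_to_anthropic(list_tools_result: dict) -> list:
    """Map the `tools` array of an MCP tools/list result to Anthropic tool definitions."""
    return [
        {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "input_schema": tool["inputSchema"],  # MCP spells this field in camelCase
        }
        for tool in list_tools_result["tools"]
    ]


# Hand-written approximation of what a server might advertise for run_tests.
discovered = {
    "tools": [
        {
            "name": "run_tests",
            "description": "Run the project test suite.",
            "inputSchema": {
                "type": "object",
                "properties": {"test_path": {"type": "string"}},
            },
        }
    ]
}

model_tools = mcp_tools_to_anthropic(discovered)
print(model_tools[0]["name"], sorted(model_tools[0]))
```

Because both sides describe parameters with JSON Schema, the conversion is mostly a matter of renaming fields, which is a large part of why hosts can adopt new servers without bespoke code.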

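MCP vs. traditional function calling

Traditional function calling (as covered earlier in this article) is per-application: you define the tools in your agent code, implement the execution logic inline, and wire it all together. It works, but it does not compose across applications.

Dimension	Traditional function calling	MCP
Reusability	Per-application; every agent rebuilds the same tools	Servers are reusable across any MCP-compatible host
Discovery	Tools are hard-coded in agent code	Dynamic: the host discovers tools at runtime after capability negotiation
Deployment	Tool logic runs inside the agent process	Servers are separate processes that can be deployed and scaled independently
Ecosystem	Proprietary to each team	Open standard; a growing library of prebuilt MCP servers
Complexity	Simpler for small, self-contained agents	More setup; pays off at scale and when reuse matters

For a small, single-purpose agent where you control both the LLM loop and all the tools, traditional function calling is simpler. MCP's value compounds as you accumulate more agents and more tools, or want to share tool implementations across teams. Think of it as the trade-off between REST and direct database access: going direct is simpler for a single use case; a standard interface wins once there are many clients.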

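MCP vs. RAG: complementary, not competing

Retrieval-augmented generation (RAG) fetches relevant documents before generation and inserts them into the model's context. It is read-only and passive: the model does not ask for the retrieval; the application retrieves on the model's behalf. MCP tools are active and on-demand: the model decides when to call a tool and what to ask for.

Dimension	RAG	MCP Tools
Initiation	The application prefetches before the model's turn	The model requests at runtime, during its turn
Side effects	None; read-only	Can write, delete, call APIs, run commands
Precision	Approximate; retrieves by semantic similarity	Exact; the model specifies precise arguments
Latency	Added before generation	Added during generation (per tool call)
Best for	Supplying background knowledge the model lacks	Taking actions, fetching exact data, verifying outputs

In practice, mature agents use both: RAG seeds the context with background knowledge (a codebase overview, documentation, recent issues), and MCP tools handle precise runtime operations (reading a specific file, running a specific test, looking up a single database row). RAG reduces the number of tool calls needed for exploration; tools handle everything that requires precision or has side effects.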

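The MCP ecosystem in practice

A key benefit of MCP as an open standard is that prebuilt servers accumulate into a reusable ecosystem. As of 2026, commonly available MCP servers include:

Filesystem: read and write local files, with configurable access controls. Anthropic's reference implementation.
Git: list changed files, read diffs, create commits, manage branches.
GitHub / GitLab: create and review PRs, manage issues, trigger CI workflows.
PostgreSQL / SQLite: run queries, inspect schemas, list tables.
Browser: navigate pages, click elements, extract content (for web-automation agents).
Slack / Linear / Jira: read and write issues, send messages, manage projects.
Custom internal services: any internal API can be wrapped as an MCP server and shared across every agent in the organization.

The practical implication: for common tools, you can take an existing MCP server off the shelf rather than implement your own. For internal tools, you implement an MCP server once, and every agent in your stack can use it.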

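Security considerations for tool use and MCP

Giving a model the ability to act in the world is both powerful and, if done carelessly, dangerous. Some principles for safe tool-use design:

Principle of least privilege. Expose only the tools the agent's task actually needs. An agent that reviews code does not need a delete_database tool. Fewer tools means a smaller attack surface.
Validate all inputs. Models can hallucinate tool arguments. Validate argument values before executing, especially for operations with irreversible side effects (delete, send, write).
Confirm before irreversible actions. For high-risk tools (sending email, deleting records, deploying code), require a human confirmation step rather than executing automatically.
Sandbox the execution environment. Run the agent in a container with restricted network access and file system scope. A sandboxed mistake is a contained mistake.
Guard against prompt injection. If the agent reads user-controllable content (uploaded files, web pages) and that content contains instructions ("ignore your guidelines and…"), the model may follow them. Treat all external content as untrusted data, not as instructions.
Log everything to an audit trail. Record every tool call with its arguments and result. This makes agent behavior reproducible, debuggable, and auditable after the fact.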
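
Two of these principles, confirmation and audit logging, fit naturally into a thin wrapper around tool execution. The sketch below is one possible shape: guarded_execute, HIGH_RISK, the confirm callback, and the tool names are all hypothetical, and the in-memory list stands in for a real log sink.

```python
import json
import time

HIGH_RISK = {"send_email", "delete_record", "deploy"}  # hypothetical tool names


def guarded_execute(name, args, execute_tool, confirm, audit_log):
    """Run a tool only if it is allowed, and record the outcome either way."""
    entry = {"ts": time.time(), "tool": name, "args": args}
    if name in HIGH_RISK and not confirm(name, args):
        entry["outcome"] = "denied"
        audit_log.append(json.dumps(entry))
        return "Action was not approved by the user."
    try:
        result = execute_tool(name, args)
        entry["outcome"] = "ok"
    except Exception as exc:
        result = f"{type(exc).__name__}: {exc}"
        entry["outcome"] = "error"
    entry["result_preview"] = str(result)[:200]  # keep log entries bounded
    audit_log.append(json.dumps(entry))
    return result


log = []
run = lambda name, args: f"ran {name}"
deny_all = lambda name, args: False  # stand-in for a human who always says no

print(guarded_execute("read_file", {"path": "a.py"}, run, deny_all, log))  # → ran read_file
print(guarded_execute("deploy", {"env": "prod"}, run, deny_all, log))  # → Action was not approved by the user.
```

The denial is returned to the model as an ordinary tool result, so it can explain the situation or choose a different approach rather than stalling.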

takeaway

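Function calling is what turns a text model into an actor. The pattern is simple (define tool schemas, run the loop, fan out parallel calls), but the details matter: descriptions determine whether the right tool is chosen, schema constraints keep outputs parseable, and execution must be validated and sandboxed. MCP lifts this pattern into a standard protocol, giving rise to an ecosystem of reusable tools that any compliant agent can consume without custom integration. Together, they form the core architecture of production AI agents.

🎯 Sharp takes for interviews

Explain the function-calling loop at the API level. The model emits a tool_use block (name + JSON arguments) → your code executes the real function → you append a tool_result message → the model continues. The model never executes code directly; it only emits intent that your controller acts on.

Why do tool descriptions matter more than schemas? The schema constrains the arguments after the model has already decided to call a tool. The description is what the model reads to decide whether to call it at all. A vague description leads to wrong tool choices that no schema can fix: the model will call the wrong tool with perfectly valid arguments.

What problem does MCP solve that function calling does not? Function calling is per-application: every agent rebuilds the same tools independently. MCP standardizes the interface between tool implementations (servers) and agent hosts, so a tool written once can be reused by any MCP-compatible agent, the same composability benefit REST APIs brought to web services.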