A language model, by itself, is a sophisticated text transformer: it reads text and produces text. Everything we call an "agent" — a system that can read files, query databases, call APIs, run code, and interact with the world — is a language model equipped with tools. Function calling (Anthropic calls it "tool use"; OpenAI calls it "function calling"; they are the same concept) is the mechanism that bridges the gap between a model that can describe an action and one that can actually take it.
Understanding function calling deeply — how schemas work, how the execution loop is structured, what parallel calling looks like, and how it relates to structured output — is foundational for anyone building LLM-powered applications. Then there is a newer layer on top: the Model Context Protocol (MCP), an open standard from Anthropic that standardizes how agents connect to tools across a diverse ecosystem. MCP is to agents what REST is to web services: a common interface that makes components interoperable without bespoke integrations. This article covers both in full depth.
- Function calling is the execution layer. The model doesn't run code — it emits a structured tool call (name + arguments); your controller executes the actual function and feeds the result back. The model only sees inputs and outputs, never the implementation.
- JSON Schema defines the contract. Every tool is described by a name, a description the model uses to decide when to call it, and a JSON Schema that constrains the arguments. Good descriptions are critical — the model decides which tool to call based on them.
- The tool-use loop is multi-turn by design. A single task may involve dozens of tool calls, each observing the previous result. The loop ends when the model emits a text response instead of a tool call.
- Parallel tool calls cut latency. Models can emit multiple tool calls in a single turn; you fan them out concurrently and return all results at once, reducing wall-clock time for independent reads.
- MCP standardizes the tool surface. Instead of writing a custom integration for every tool in every agent, MCP servers expose tools via a standard protocol — any MCP-compatible host (Claude Code, Cursor, etc.) can connect to any MCP server without custom glue code.
- MCP and RAG solve different problems. RAG retrieves read-only context; MCP provides stateful, actionable tools. They are complementary, not alternatives.
Function calling is the mechanism: model outputs a tool call → your code executes it → result goes back → model continues. MCP is the standardization layer: a protocol that makes tools (as MCP servers) reusable across any compliant agent host. Master both to build agents that are both powerful and maintainable.
How Function Calling Works: The Mechanism
The key mental model: the model does not execute functions. It never touches your database, filesystem, or any external API. What the model does is produce a structured output that says "I would like to call function X with arguments Y." Your application code receives that output, validates it, runs the actual function, and returns the result to the model as a new message. The model then decides what to do next.
This design is deliberate. Execution happens in your code, under your control, with your permissions and your error handling. The model is sandboxed to producing intents; you are responsible for acting on them. This separation makes function-calling systems auditable, testable, and safe — you can mock tool responses in tests, log every call, rate-limit expensive tools, and require confirmation before side-effecting operations, all without touching the model at all.
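One way to picture this separation is a small controller-side registry. The sketch below is illustrative only: `ToolRegistry`, `requires_confirmation`, and the `confirm` callback are hypothetical names, not part of any SDK, and the tools are stand-ins.

```python
from typing import Callable


class ToolRegistry:
    """Hypothetical controller-side registry: the model only ever names a tool."""

    def __init__(self, confirm: Callable[[str, dict], bool]):
        self._tools: dict = {}
        self._confirm = confirm          # asked before side-effecting tools run
        self.call_log: list = []         # every call is recorded for auditing

    def register(self, name: str, fn: Callable[..., str],
                 requires_confirmation: bool = False) -> None:
        self._tools[name] = (fn, requires_confirmation)

    def execute(self, name: str, args: dict) -> str:
        if name not in self._tools:
            result = f"Unknown tool: {name}"
        else:
            fn, needs_ok = self._tools[name]
            if needs_ok and not self._confirm(name, args):
                result = "Denied: confirmation was not given."
            else:
                result = fn(**args)
        self.call_log.append({"tool": name, "args": args, "result": result})
        return result


# In tests, real tools can be swapped for mocks without touching the model.
registry = ToolRegistry(confirm=lambda name, args: False)  # deny everything
registry.register("read_file", lambda path: f"<contents of {path}>")
registry.register("delete_file", lambda path: "deleted", requires_confirmation=True)

print(registry.execute("read_file", {"path": "auth.py"}))
print(registry.execute("delete_file", {"path": "auth.py"}))
```

Because the model only names a tool and supplies arguments, everything in `execute` (logging, confirmation, mocking) can change without any change to the prompt or the model.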
The Three-Phase Interaction
- Model produces a tool call. When the model decides it needs a tool, it emits a structured tool-use block instead of (or in addition to) text. The block contains the tool name and a JSON object of arguments that conform to the tool's schema.
- Controller executes the tool. Your code deserializes the tool-use block, validates the arguments, calls the actual function, and collects the result (success or error).
- Result is fed back. You append the tool result to the conversation (on Anthropic, a tool_result content block inside a user message, keyed by tool_use_id; on OpenAI, a message with role tool and a tool_call_id) and call the API again. The model sees the result and decides what to do next: either make another tool call or produce a final text response.
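The third phase differs slightly between providers, so it may help to see the two result shapes side by side. This is a sketch of the message dictionaries only; the helper names are hypothetical and the IDs are invented for illustration.

```python
import json


def anthropic_tool_result(tool_use_id: str, output: str, is_error: bool = False) -> dict:
    """Anthropic Messages API shape: a tool_result block inside a user message."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": output,
            "is_error": is_error,
        }],
    }


def openai_tool_result(tool_call_id: str, output: str) -> dict:
    """OpenAI Chat Completions shape: a message with role "tool"."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": output}


# The IDs below are made up; real IDs come from the model's tool call.
print(json.dumps(anthropic_tool_result("toolu_example", "2 passed"), indent=2))
print(json.dumps(openai_tool_result("call_example", "2 passed"), indent=2))
```

In both cases the ID is what ties a result back to the specific call the model made, which matters once several calls appear in one turn.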
Defining Tools with JSON Schema
Every tool is defined by three things: a name the model uses to identify it, a description the model reads to decide when to use it, and an input_schema (JSON Schema) that defines what arguments are valid. All three matter.
tools = [
    {
        "name": "search_codebase",
        "description": """Search the repository for files or code matching a query.
            Use this to find relevant files before reading them. Returns a list of
            file paths and matching line snippets. Prefer this over read_file when
            you don't know which file contains what you need.""",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query: function name, variable, error message, or concept.",
                },
                "file_pattern": {
                    "type": "string",
                    "description": "Optional glob pattern to restrict search, e.g. '*.py' or 'src/**/*.ts'.",
                },
                "max_results": {
                    "type": "integer",
                    "description": "Maximum number of results to return. Default 10.",
                    "default": 10,
                },
            },
            "required": ["query"],
        },
    },
    {
        "name": "run_tests",
        "description": """Run the test suite and return results. Use after making code changes
            to verify correctness. Returns pass/fail status and any error output.""",
        "input_schema": {
            "type": "object",
            "properties": {
                "test_path": {
                    "type": "string",
                    "description": "Specific test file or directory. Omit to run all tests.",
                },
                "flags": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Additional pytest flags, e.g. ['-x', '--tb=short'].",
                },
            },
            "required": [],
        },
    },
]
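The schema tells the model what to send, but the controller should still check what actually arrives. The following is a minimal sketch of that check, assuming a hand-rolled validator that covers only `required`, `type`, and `default`; `validate_args` is a hypothetical helper, and a full JSON Schema library would be the better choice in real code.

```python
# Maps the JSON Schema type names used in the tool definitions to Python types.
_JSON_TYPES = {"string": str, "integer": int, "array": list, "object": dict, "boolean": bool}


def validate_args(schema: dict, args: dict) -> dict:
    """Check args against a small subset of JSON Schema and fill in defaults.

    Only "required", "type", and "default" are handled; this is not a full validator.
    """
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            raise ValueError(f"missing required argument: {name}")
    checked = {}
    for name, value in args.items():
        if name not in props:
            raise ValueError(f"unexpected argument: {name}")
        expected = _JSON_TYPES[props[name]["type"]]
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ValueError(f"{name} should be of type {props[name]['type']}")
        checked[name] = value
    # The model may omit optional arguments; apply schema defaults on our side.
    for name, spec in props.items():
        if name not in checked and "default" in spec:
            checked[name] = spec["default"]
    return checked


search_schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "file_pattern": {"type": "string"},
        "max_results": {"type": "integer", "default": 10},
    },
    "required": ["query"],
}

print(validate_args(search_schema, {"query": "login_user"}))
```

Note that a `default` in the schema is only documentation for the model; the controller has to apply it, as the last loop does.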
Writing Descriptions the Model Will Use Correctly
The model uses your description — not the schema — to decide which tool to call. Descriptions are prompt text, and prompt quality matters here as much as anywhere. Weak descriptions lead to wrong tool choices:
- Too vague: "Search for things." The model does not know when to use this versus read_file or grep.
- Too implementation-focused: "Runs ripgrep with -r flag." The model cannot infer from this when a search is the right move.
- Good: Describe what the tool does, what inputs it expects conceptually, when to prefer it over alternatives, and what it returns. Treat the description as instruction to a junior engineer.
Similarly, property descriptions in the schema matter. "query" with no description is underspecified; "The search query — function name, variable, error message, or concept" tells the model how to formulate the input.
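A rough way to catch the weakest definitions before they reach the model is a small lint pass. The sketch below is only a heuristic: `lint_tool` is a hypothetical helper, and the 40-character threshold is an arbitrary assumption rather than a recommended value.

```python
def lint_tool(tool: dict, min_description_chars: int = 40) -> list:
    """Return warnings for a tool definition; the length threshold is an arbitrary heuristic."""
    warnings = []
    if len(tool.get("description", "").strip()) < min_description_chars:
        warnings.append(f"{tool['name']}: description is too short to guide tool choice")
    for prop, spec in tool["input_schema"].get("properties", {}).items():
        if not spec.get("description"):
            warnings.append(f"{tool['name']}.{prop}: missing property description")
    return warnings


weak = {
    "name": "search",
    "description": "Search for things.",
    "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}},
}
better = {
    "name": "search_codebase",
    "description": (
        "Search the repository for files or code matching a query. "
        "Prefer this over read_file when you don't know which file to open."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Function name, variable, error message, or concept.",
            }
        },
    },
}

for tool in (weak, better):
    print(tool["name"], lint_tool(tool))
```

A check like this cannot tell whether a description is good, only whether it is obviously missing or too thin; the real test is whether the model picks the right tool on representative tasks.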
The Tool-Use Loop in Detail
Let us trace a complete multi-tool interaction to make the mechanics concrete. The task: "Find and fix the bug causing test_auth to fail."
import pathlib
import subprocess

import anthropic

client = anthropic.Anthropic()

def execute_tool(name: str, args: dict) -> str:
    # Dispatch to actual implementations. read_file and write_file are assumed
    # to be declared in `tools` alongside the definitions shown earlier.
    if name == "run_tests":
        cmd = ["python", "-m", "pytest"] + args.get("flags", [])
        if "test_path" in args:
            cmd.append(args["test_path"])
        result = subprocess.run(cmd, capture_output=True, text=True)
        return result.stdout + result.stderr
    elif name == "read_file":
        return pathlib.Path(args["path"]).read_text()
    elif name == "write_file":
        pathlib.Path(args["path"]).write_text(args["content"])
        return f"Wrote {len(args['content'])} chars to {args['path']}"
    return f"Unknown tool: {name}"

messages = [{"role": "user", "content": "Find and fix the bug causing test_auth to fail."}]

while True:
    response = client.messages.create(
        model="claude-opus-4-5", max_tokens=4096,
        tools=tools, messages=messages,
    )
    # Append assistant turn to conversation
    messages.append({"role": "assistant", "content": response.content})

    if response.stop_reason != "tool_use":
        # No tool call requested (end_turn, max_tokens, ...): print any text and stop
        print("".join(b.text for b in response.content if b.type == "text"))
        break

    # Collect all tool calls in this turn (may be multiple)
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            try:
                result, is_error = execute_tool(block.name, block.input), False
            except Exception as exc:
                # Report failures to the model instead of crashing the loop
                result, is_error = f"{type(exc).__name__}: {exc}", True
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result,
                "is_error": is_error,
            })

    # Feed results back to model
    messages.append({"role": "user", "content": tool_results})
Notice that the loop is completely generic. It does not know anything about the specific task — that knowledge lives in the model. The controller only knows how to: (1) call the API, (2) check whether the model is done, (3) execute any tool calls it finds, and (4) return the results. This is the standard pattern for every function-calling application.
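That the loop is generic can be checked by swapping the API call for a scripted stand-in. The sketch below assumes simplified dictionary responses that only mimic the shape of a real API response; `run_loop`, `call_model`, and `max_turns` are hypothetical names, and the turn cap is an added safeguard rather than something the pattern requires.

```python
from typing import Callable


def run_loop(call_model: Callable[[list], dict], execute_tool: Callable[[str, dict], str],
             messages: list, max_turns: int = 10) -> str:
    """Generic tool-use loop with a turn cap so a confused model cannot spin forever."""
    for _ in range(max_turns):
        response = call_model(messages)
        messages.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] != "tool_use":
            # Final answer: join whatever text blocks the model produced.
            return "".join(b["text"] for b in response["content"] if b["type"] == "text")
        results = [
            {"type": "tool_result", "tool_use_id": b["id"],
             "content": execute_tool(b["name"], b["input"])}
            for b in response["content"] if b["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"no final answer after {max_turns} turns")


# Scripted stand-in for the model: one tool call, then a final answer.
script = iter([
    {"stop_reason": "tool_use", "content": [
        {"type": "tool_use", "id": "t1", "name": "run_tests", "input": {}}]},
    {"stop_reason": "end_turn", "content": [
        {"type": "text", "text": "test_auth fails on an expired-token check."}]},
])

history = [{"role": "user", "content": "Why does test_auth fail?"}]
answer = run_loop(lambda msgs: next(script), lambda name, args: "1 failed", history)
print(answer)
```

The same function can drive a real client by passing a `call_model` that wraps the API call, which also makes the loop easy to unit-test.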
Parallel Tool Calls: Eliminating Unnecessary Latency
In the basic loop above, if the model decides to read three files, it does so in three sequential turns: request file A, get A, request file B, get B, request file C, get C. That is 3× the round-trip time needed. Modern models support emitting multiple tool calls in a single turn, which your controller should fan out concurrently.
import asyncio

import aiofiles  # third-party: pip install aiofiles

async def execute_tool_async(name: str, args: dict) -> str:
    # Async versions of tool implementations
    if name == "read_file":
        async with aiofiles.open(args["path"]) as f:
            return await f.read()
    # ... other tools
    return f"Unknown tool: {name}"

async def handle_tool_calls(content_blocks):
    # Keep only the tool_use blocks; a turn may also contain text blocks
    calls = [block for block in content_blocks if block.type == "tool_use"]
    # Fan out ALL tool calls in this turn concurrently
    results = await asyncio.gather(
        *(execute_tool_async(call.name, call.input) for call in calls)
    )
    # gather preserves order, so results line up with calls
    return [
        {"type": "tool_result", "tool_use_id": call.id, "content": result}
        for call, result in zip(calls, results)
    ]

# If model requests read_file("auth.py"), read_file("utils.py"), read_file("tests/test_auth.py")
# → all three reads happen concurrently; latency ≈ max(read_time), not sum(read_time)
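The speedup comes from two places: the reads overlap, so reading 5 files concurrently takes roughly as long as the slowest single read, and the whole batch costs one model round trip instead of five. For agents that do extensive exploration (reading many files to understand a codebase), parallel tool calls can cut exploration time substantially, on the order of 3–5× when most of the reads are independent.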
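The effect is easy to observe without any real I/O. The sketch below uses `asyncio.sleep` as a stand-in for tool latency; `fake_read` and the 0.2 second delay are assumptions made purely for the demonstration.

```python
import asyncio
import time


async def fake_read(path: str, delay: float = 0.2) -> str:
    """Stand-in for an I/O-bound tool; the delay simulates disk or network latency."""
    await asyncio.sleep(delay)
    return f"<contents of {path}>"


async def timed(coro) -> float:
    """Await a coroutine and return how long it took in seconds."""
    start = time.perf_counter()
    await coro
    return time.perf_counter() - start


async def sequential(paths):
    # One read at a time: total time is the sum of the delays.
    return [await fake_read(p) for p in paths]


async def concurrent(paths):
    # All reads in flight at once: total time is about the longest delay.
    return await asyncio.gather(*(fake_read(p) for p in paths))


async def main():
    paths = ["auth.py", "utils.py", "tests/test_auth.py"]
    seq = await timed(sequential(paths))
    con = await timed(concurrent(paths))
    return seq, con


seq_seconds, con_seconds = asyncio.run(main())
print(f"sequential: {seq_seconds:.2f}s, concurrent: {con_seconds:.2f}s")
```

For tools that are implemented as ordinary blocking functions, `asyncio.to_thread` (Python 3.9+) is one way to run them concurrently without rewriting them as coroutines.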
When Parallel Calls Are Not Safe
Fan-out is only correct for independent operations. If tool call B depends on the result of tool call A (e.g., "search for a file, then read the result"), they must be sequential — B uses information from A's result. The model usually handles this correctly by emitting sequential turns for dependent calls. Your controller's job is to execute whatever the model emits in a single turn concurrently, without trying to impose its own ordering on independent calls.
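Independent calls should also fail independently: one missing file should not discard the other results in the same turn. A minimal sketch of that, assuming a toy in-memory `run_tool` dispatcher (hypothetical) and the `is_error` flag on Anthropic-style tool results:

```python
import asyncio


async def run_tool(name: str, args: dict) -> str:
    """Toy dispatcher: read_file succeeds unless the path is missing from a fake store."""
    files = {"auth.py": "def login(): ..."}
    if name != "read_file":
        raise ValueError(f"unknown tool: {name}")
    if args["path"] not in files:
        raise FileNotFoundError(args["path"])
    return files[args["path"]]


async def run_turn(tool_calls: list) -> list:
    """Run one turn's tool calls concurrently; a failure becomes an error result."""
    outcomes = await asyncio.gather(
        *(run_tool(c["name"], c["input"]) for c in tool_calls),
        return_exceptions=True,  # keep the other results when one call raises
    )
    results = []
    for call, outcome in zip(tool_calls, outcomes):
        failed = isinstance(outcome, Exception)
        results.append({
            "type": "tool_result",
            "tool_use_id": call["id"],
            "content": f"{type(outcome).__name__}: {outcome}" if failed else outcome,
            "is_error": failed,
        })
    return results


calls = [
    {"id": "t1", "name": "read_file", "input": {"path": "auth.py"}},
    {"id": "t2", "name": "read_file", "input": {"path": "missing.py"}},
]
turn_results = asyncio.run(run_turn(calls))
for r in turn_results:
    print(r["tool_use_id"], r["is_error"], r["content"])
```

Returning the error as a result lets the model react to it, for example by searching for the correct path, instead of the controller aborting the task.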
Tool Use as Structured Output
Tool use is not just for agents — it is also the most reliable way to extract structured output from a model. When you want the model to return a specific data shape (for parsing by downstream code), defining that shape as a tool schema and setting tool_choice to force that tool is far more reliable than asking the model to "return JSON."
| Method | Syntax reliability | Schema reliability | Use when |
|---|---|---|---|
| Prompt: "return JSON" | 80–95% | Low | Quick experiments only |
| JSON mode | 100% | Low (valid JSON, wrong shape) | When any valid JSON is acceptable |
| Forced tool call | 100% | High (schema-constrained) | Production extraction pipelines |
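The forced tool call approach works because the model is trained to produce arguments that conform to the schema. How strictly that is enforced depends on the provider and mode: where strict or constrained decoding is available and enabled, the arguments are guaranteed to match the schema; otherwise conformance is high but not guaranteed, and a response cut off by the token limit can still be incomplete. In a production pipeline, run a cheap schema validation on the result before treating it as a typed dict.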
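As an illustration of the forced-tool pattern, the sketch below builds the keyword arguments for an Anthropic-style request and extracts the arguments from a simulated response. The `record_invoice` tool, its fields, and the simulated response are all invented for the example; no API call is made.

```python
import json

# Hypothetical extraction tool; the schema describes the shape we want back.
invoice_tool = {
    "name": "record_invoice",
    "description": "Record the fields extracted from an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string", "description": "Vendor name as printed."},
            "total_cents": {"type": "integer", "description": "Invoice total in cents."},
        },
        "required": ["vendor", "total_cents"],
    },
}

# Keyword arguments for an Anthropic-style messages.create call. tool_choice
# forces the model to answer with this tool rather than free text.
request = {
    "model": "claude-opus-4-5",
    "max_tokens": 1024,
    "tools": [invoice_tool],
    "tool_choice": {"type": "tool", "name": "record_invoice"},
    "messages": [{"role": "user", "content": "Invoice: Acme Corp, total $41.50"}],
}


def extract_tool_input(content_blocks: list, tool_name: str) -> dict:
    """Pull the forced tool's arguments out of a response and check required keys."""
    for block in content_blocks:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            missing = [k for k in invoice_tool["input_schema"]["required"]
                       if k not in block["input"]]
            if missing:
                raise ValueError(f"missing fields: {missing}")
            return block["input"]
    raise ValueError(f"no {tool_name} call in response")


# Simulated response content (made up here; a real one comes from the API).
simulated = [{"type": "tool_use", "id": "t1", "name": "record_invoice",
              "input": {"vendor": "Acme Corp", "total_cents": 4150}}]
print(json.dumps(extract_tool_input(simulated, "record_invoice")))
```

The tool is never executed in this pattern; it exists only so that its schema shapes the model's output.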
What Is MCP?
Function calling as described above requires you to write the tool definitions (schemas), implement the execution logic, and wire everything together in your application. Every agent application reimplements this wheel independently. If you want your agent to talk to GitHub, you write a GitHub tool. If you want it to talk to Postgres, you write a Postgres tool. If someone else builds a different agent, they write their own GitHub tool. There is no reuse.
The Model Context Protocol (MCP), released by Anthropic in late 2024, addresses this directly. MCP is an open standard that defines a common protocol for exposing tools (and other capabilities) that any compliant host can consume without custom integration work. It is to agents what USB is to peripherals: a standard interface that makes components interoperable.
The Core Idea in One Sentence
MCP separates the tool implementation (an MCP server) from the tool consumer (an MCP host), with a standard protocol in between, so that any server works with any host.
MCP Architecture: Host, Client, Server
MCP defines three roles:
- MCP Host: the application that runs the LLM and manages the agentic loop. Claude Code, Cursor, a custom agent — any application that wants to use tools can be an MCP host. The host contains an MCP client for each server it connects to.
- MCP Client: a thin layer inside the host that maintains a persistent connection to one MCP server and handles the protocol framing (connection, capability negotiation, request/response).
- MCP Server: a process that exposes a set of tools (and optionally resources and prompts) over the MCP protocol. The server is where the actual functionality lives: it might wrap a GitHub API client, a database connection, a filesystem layer, a web browser, or anything else.
┌──────────────────────────────────────────────────┐
│                     MCP HOST                     │
│     (Claude Code, Cursor, custom agent app)      │
│                                                  │
│  ┌──────────┐   ┌──────────┐        ┌─────────┐  │
│  │   LLM    │   │   MCP    │        │   MCP   │  │
│  │ (Claude  │◄──│  Client  │        │ Client  │  │
│  │  /GPT)   │   │    A     │        │    B    │  │
│  └──────────┘   └────┬─────┘        └────┬────┘  │
└──────────────────────┼───────────────────┼───────┘
                       │                   │
                 MCP Protocol        MCP Protocol
                (JSON-RPC 2.0)      (JSON-RPC 2.0)
                       │                   │
              ┌────────▼───────┐  ┌────────▼───────┐
              │  MCP Server A  │  │  MCP Server B  │
              │  (GitHub API)  │  │  (PostgreSQL)  │
              │                │  │                │
              │ tools:         │  │ tools:         │
              │  - list_repos  │  │  - run_query   │
              │  - create_pr   │  │  - list_tables │
              │  - get_issues  │  │  - describe_db │
              └────────────────┘  └────────────────┘
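Messages are encoded as JSON-RPC 2.0 and typically carried over stdio (for local servers) or HTTP (for remote servers; the original HTTP with Server-Sent Events transport has since been superseded in the specification by Streamable HTTP, which can still stream responses over SSE). The framing is standardized; what varies is the tools each server exposes and their schemas, which the host discovers at connection time: an initialization handshake negotiates capabilities, after which the host asks the server to list its tools.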
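To make the wire format concrete, here is a sketch of the JSON-RPC messages a client might send to list and call tools. The `jsonrpc_request` helper is hypothetical, the `list_tables` tool comes from the diagram above, and the response content is invented; the initialization handshake that precedes these messages is omitted.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request as a single line of JSON."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Ask the server which tools it offers, then invoke one of them.
list_request = jsonrpc_request("tools/list")
call_request = jsonrpc_request(
    "tools/call",
    {"name": "list_tables", "arguments": {}},
)

# A response of the kind a server might send back for the call (contents made up).
call_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {"content": [{"type": "text", "text": "users\norders"}], "isError": False},
}

print(list_request)
print(call_request)
print(call_response["result"]["content"][0]["text"])
```

On the stdio transport each message is written as one line, which is why the helper serializes without embedded newlines.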
The Three Primitives MCP Servers Expose
| Primitive | What it is | Analogy |
|---|---|---|
| Tools | Functions the model can invoke; take arguments, have side effects, return results | REST API endpoints (POST/PUT/DELETE) |
| Resources | Read-only data the host can load into context; identified by URI | REST API endpoints (GET) or files |
| Prompts | Pre-written prompt templates the server exposes for common tasks | Canned SQL queries / saved searches |
Tools are the most important primitive and what most MCP servers focus on. Resources provide a lightweight way for a server to surface data (e.g., a database schema, a document index) without requiring the model to invoke a tool. Prompts allow servers to package up task-specific prompt templates that hosts can inject into the conversation.
Writing a Minimal MCP Server
import pathlib
import subprocess

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("dev-tools")

@mcp.tool()
def run_tests(test_path: str = "", flags: list[str] | None = None) -> str:
    """Run the project test suite. Pass test_path to run a specific file.
    Returns combined stdout+stderr from pytest."""
    # None instead of a mutable [] default, which would be shared across calls
    cmd = ["python", "-m", "pytest"] + (flags or [])
    if test_path:
        cmd.append(test_path)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr

@mcp.tool()
def list_changed_files() -> str:
    """Return files changed since the last git commit. Useful for scoping reviews."""
    result = subprocess.run(["git", "diff", "--name-only", "HEAD"],
                            capture_output=True, text=True)
    return result.stdout or "No changed files."

@mcp.resource("file://{path}")
def read_project_file(path: str) -> str:
    """Expose project files as resources for context loading."""
    # Reads whatever path the host asks for; restrict to the project root in real use
    return pathlib.Path(path).read_text()

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
This server exposes two tools (run_tests and list_changed_files) and one resource template (file://{path}). Any MCP host (Claude Code, Cursor, a custom agent) can launch this server, connect to it over stdio, and discover these capabilities through the protocol handshake. No custom integration code is needed on the host side, only a configuration entry that tells the host how to start the server.
MCP vs. Traditional Function Calling
Traditional function calling (as covered earlier in this article) is per-application: you define tools in your agent's code, implement the execution logic inline, and wire everything together. It works, but it does not compose across applications.
| Dimension | Traditional function calling | MCP |
|---|---|---|
| Reusability | Per-application; each agent reimplements the same tools | Servers are reusable across any MCP-compatible host |
| Discovery | Tools hardcoded into agent code | Dynamic: host discovers tools at runtime via capability negotiation |
| Deployment | Tool logic is inside the agent process | Servers are separate processes, independently deployable and scalable |
| Ecosystem | Proprietary per-team | Open standard; growing library of pre-built MCP servers |
| Complexity | Simpler for small, self-contained agents | More setup; pays off at scale and when reuse matters |
For a small, single-purpose agent where you control both the LLM loop and all the tools, traditional function calling is simpler. MCP's value compounds as you have more agents, more tools, or want to share tool implementations across teams. Think of it like the REST vs. direct database access tradeoff: direct is simpler for one use case; a standard interface wins when you have many clients.
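The two approaches meet inside the host: tools discovered over MCP still have to be handed to the model as ordinary function-calling definitions. A sketch of that bridge, assuming Anthropic-style definitions on the model side; the function names and the server-name prefix are conventions invented here, not something the protocol requires.

```python
def mcp_tools_to_anthropic(server_name: str, listed_tools: list) -> list:
    """Convert entries from an MCP tools/list result into Anthropic-style tool definitions.

    Prefixing names with the server name is a convention assumed here to avoid
    collisions between servers; it is not required by the protocol.
    """
    converted = []
    for tool in listed_tools:
        converted.append({
            "name": f"{server_name}__{tool['name']}",
            "description": tool.get("description", ""),
            "input_schema": tool["inputSchema"],  # MCP uses camelCase here
        })
    return converted


def route_tool_call(prefixed_name: str) -> tuple:
    """Split a prefixed tool name back into (server, tool) for dispatch."""
    server, _, tool = prefixed_name.partition("__")
    return server, tool


# Example tools/list payload for a hypothetical PostgreSQL server.
listed = [{
    "name": "run_query",
    "description": "Run a read-only SQL query and return rows as text.",
    "inputSchema": {
        "type": "object",
        "properties": {"sql": {"type": "string"}},
        "required": ["sql"],
    },
}]

anthropic_tools = mcp_tools_to_anthropic("postgres", listed)
print(anthropic_tools[0]["name"])
print(route_tool_call(anthropic_tools[0]["name"]))
```

When the model later emits a call to the prefixed name, the host routes it to the matching client, which forwards it to the server as a tools/call request.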
MCP vs. RAG: Complementary, Not Competing
Retrieval-Augmented Generation (RAG) fetches relevant documents and inserts them into the model's context before generation. It is read-only and passive: the model does not request retrieval; the application retrieves for the model. MCP tools are active and on-demand: the model decides when to call a tool and what to ask for.
| Dimension | RAG | MCP Tools |
|---|---|---|
| Initiation | Application pre-fetches before model turn | Model requests at runtime during its turn |
| Side effects | None — read-only | Can write, delete, call APIs, run commands |
| Specificity | Approximate — retrieval by semantic similarity | Exact — model specifies precise arguments |
| Latency | Adds to pre-generation latency | Adds to mid-generation latency (each tool call) |
| Best for | Providing background knowledge the model didn't have | Taking actions, fetching precise data, verifying output |
In practice, sophisticated agents use both: RAG to seed the context with background knowledge (codebase overview, documentation, recent issues), and MCP tools for precise runtime operations (reading a specific file, running a specific test, querying a specific database row). RAG reduces the number of tool calls needed for exploration; tools handle everything that requires precision or side effects.
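The division of labor can be sketched as follows: retrieval runs before the first model turn and seeds the conversation, while tools are left for the model to call later. The retriever here is a deliberately naive word-overlap ranking used as a stand-in for embedding search, and the documents are invented.

```python
def retrieve(query: str, documents: dict, k: int = 2) -> list:
    """Toy retriever: rank documents by shared words. Real systems use embeddings."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda name: len(query_words & set(documents[name].lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_messages(task: str, documents: dict) -> list:
    """Seed the conversation with retrieved background before the model's first turn."""
    background = "\n\n".join(
        f"[{name}]\n{documents[name]}" for name in retrieve(task, documents)
    )
    return [{"role": "user", "content": f"Background:\n{background}\n\nTask: {task}"}]


docs = {
    "auth_overview.md": "login tokens expire after one hour and are refreshed by the auth service",
    "billing.md": "invoices are generated nightly",
    "testing.md": "run the auth tests with pytest tests/test_auth.py",
}
seeded = build_messages("fix the failing auth login tests", docs)
print(seeded[0]["content"])
```

From here the seeded messages would go into the usual tool-use loop, where the model can still call tools for anything the background did not cover.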
The MCP Ecosystem in Practice
A key benefit of MCP being an open standard is that pre-built servers accumulate into a reusable ecosystem. As of 2026, commonly available MCP servers include:
- Filesystem — read/write local files with configurable access controls. The reference implementation from Anthropic.
- Git — list changed files, read diffs, create commits, manage branches.
- GitHub / GitLab — create/review PRs, manage issues, trigger CI workflows.
- PostgreSQL / SQLite — run queries, inspect schema, list tables.
- Browser — navigate pages, click elements, extract content (for web automation agents).
- Slack / Linear / Jira — read and create issues, post messages, manage projects.
- Custom internal services — any internal API can be wrapped as an MCP server and shared across all agents in an organization.
The practical implication: for common tools, you can pull an existing MCP server off the shelf rather than implementing it yourself. For internal tools, you implement an MCP server once and every agent in your stack gains access to it automatically.
Security Considerations for Tool Use and MCP
Giving a model the ability to take actions in the world is powerful and dangerous if done carelessly. Some principles for secure tool-use design:
- Principle of least privilege. Only expose tools the agent actually needs for its task. An agent that reviews code does not need a delete_database tool. Fewer tools = smaller attack surface.
- Validate all inputs. The model can hallucinate tool arguments. Validate argument values before executing, especially for operations with irreversible side effects (deletes, sends, writes).
- Confirm before irreversible actions. For high-stakes tools (sending emails, deleting records, deploying code), require a human confirmation step rather than executing automatically.
- Sandbox execution environments. Run agents in containers with restricted network access and filesystem scope. A sandboxed mistake is a contained mistake.
- Watch for prompt injection. If the agent reads user-controlled content (uploaded files, web pages) and that content contains instructions ("ignore your guidelines and..."), the model may follow them. Treat all external content as untrusted data, not instructions.
- Audit log everything. Log every tool call with its arguments and result. This makes agent behavior reproducible, debuggable, and auditable after the fact.
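Several of these principles can be combined in a thin guard around tool execution. The sketch below assumes a single read_file tool and hypothetical helper names (`resolve_inside`, `guarded_execute`); it shows an allowlist, path confinement, and an audit log, and is not a complete sandbox.

```python
from pathlib import Path
import tempfile

ALLOWED_TOOLS = {"read_file"}  # least privilege: expose only what the task needs


def resolve_inside(root: Path, requested: str) -> Path:
    """Resolve a model-supplied path and refuse anything outside the sandbox root."""
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root.resolve()):  # Python 3.9+
        raise PermissionError(f"path escapes sandbox: {requested}")
    return candidate


def guarded_execute(root: Path, name: str, args: dict, audit_log: list) -> str:
    """Validate, execute, and log a single tool call."""
    try:
        if name not in ALLOWED_TOOLS:
            raise PermissionError(f"tool not allowed: {name}")
        result = resolve_inside(root, args["path"]).read_text()
    except (OSError, KeyError) as exc:  # PermissionError is a subclass of OSError
        result = f"error: {exc}"
    audit_log.append({"tool": name, "args": args, "result": result[:200]})
    return result


with tempfile.TemporaryDirectory() as tmp:
    sandbox = Path(tmp)
    (sandbox / "notes.txt").write_text("hello")
    log: list = []
    ok = guarded_execute(sandbox, "read_file", {"path": "notes.txt"}, log)
    escaped = guarded_execute(sandbox, "read_file", {"path": "../../etc/passwd"}, log)
    denied = guarded_execute(sandbox, "delete_database", {}, log)
    print(ok, escaped, denied, sep="\n")
```

Checks like these complement, rather than replace, running the agent in a container with restricted permissions.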
Function calling is what turns a text model into an actor. The pattern is simple (define tool schemas, run the loop, fan out parallel calls), but the details matter: descriptions determine correct tool choice, schema constraints make output far more likely to parse, and execution must be validated and sandboxed. MCP lifts this pattern to a standard protocol, enabling a reusable ecosystem of tools that any compliant agent can consume without custom integration. Together, they form the core architecture of most production AI agents.
Explain the function calling loop at the API level. Model emits a tool_use block (name + arguments JSON) → your code executes the actual function → you append a tool_result message → model continues. The model never executes code directly; it only emits intents that your controller acts on.
Why is the tool description more important than the schema? The schema constrains the arguments once the model has decided to call a tool. The description is what the model reads to decide whether to call the tool at all. A vague description leads to wrong tool selection, which no schema can fix — the model will call the wrong tool with perfectly valid arguments.
What problem does MCP solve that function calling doesn't? Function calling is per-application: every agent reimplements the same tools independently. MCP standardizes the interface between tool implementations (servers) and agent hosts so a tool is written once and reusable across any MCP-compatible agent — the same composability benefit that REST APIs gave to web services.