Build an AI Agent With Claude

By James Killick · October 26, 2025

TL;DR: You can build an AI agent with Claude using Anthropic's API and a few lines of Python. The key decisions are your agent's tools, its memory model, and the loop that drives it. Get those three right and the rest follows.

You can build an AI agent with Claude in a weekend. Give it tools, give it memory, give it a loop, and it starts doing real work.

The hard part is not the code. It is knowing what decisions to make before you write the first line.

What is an AI agent, actually?

An AI agent is a model that takes actions, not just generates text. It reads inputs, picks a tool, runs that tool, reads the result, then decides what to do next.

Claude is well suited to this because it handles long context windows reliably and follows tool call schemas consistently. In practice it is less prone to inventing tool calls than smaller models, though tool inputs are still worth validating before you run them.

A basic Claude agent has three parts:

  • A system prompt that defines its job
  • A set of tools it can call (functions you write)
  • A loop that keeps running until the job is done or it hits a stop condition

That is the whole architecture. Build from there.
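As a rough sketch, the three parts can be held in one small container. `AgentSpec` and its field names are hypothetical, chosen for illustration. The only assumption about the API is that the Messages endpoint accepts `system`, `tools` and `messages` parameters.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    """Hypothetical container for the three parts of a basic agent."""
    system_prompt: str                         # part 1: defines the job
    tools: list = field(default_factory=list)  # part 2: tool schemas
    max_steps: int = 10                        # part 3: stop condition for the loop

    def request_kwargs(self, messages: list) -> dict:
        """Build the agent-specific arguments for one API call."""
        return {
            "system": self.system_prompt,
            "tools": self.tools,
            "messages": messages,
        }


spec = AgentSpec(system_prompt="You triage support tickets.")
kwargs = spec.request_kwargs([{"role": "user", "content": "Login fails"}])
```

The model name and `max_tokens` are left out here on purpose, since they are deployment settings rather than part of the agent's definition.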

What tools do you give it?

Tools are the only way an agent affects the world outside the conversation window. They are Python functions, and you pass their schemas to Claude so it knows what each one does and how to call it.

Common tools for a starter agent:

  • Web search (fetch a URL, run a search query)
  • File read and write
  • Code execution (sandboxed)
  • Database queries
  • API calls to your own services

Start with one or two tools. Adding more tools without clear job definitions turns the agent into a sprawl that is hard to debug.

A useful rule: if you cannot write a single sentence describing what the tool does and when the agent should use it, the tool is not ready.
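Here is a minimal sketch of one tool, assuming the `name` / `description` / `input_schema` shape that the Anthropic Messages API uses for tool definitions. The `read_file` tool, the `TOOL_FUNCTIONS` registry and this version of `call_tool` are illustrative names, not library code.

```python
from pathlib import Path


def read_file(path: str) -> str:
    """Return the text content of a file."""
    return Path(path).read_text(encoding="utf-8")


# The schema Claude sees: a name, a one-sentence job description,
# and a JSON Schema describing the inputs.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read a UTF-8 text file and return its contents. "
                   "Use when the task refers to a file by path.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path of the file to read."}
        },
        "required": ["path"],
    },
}

# Hypothetical registry mapping tool names to the functions behind them.
TOOL_FUNCTIONS = {"read_file": read_file}


def call_tool(name: str, tool_input: dict):
    """Dispatch a tool call by name; failures come back as error strings."""
    func = TOOL_FUNCTIONS.get(name)
    if func is None:
        return f"Error: unknown tool '{name}'"
    try:
        return func(**tool_input)
    except Exception as exc:  # report the failure to the model instead of crashing
        return f"Error: {exc}"
```

Returning errors as strings is a design choice: the model can read the message and try a different input, rather than the whole run dying on one bad call.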

How does memory work?

This is where most people get stuck. Claude does not remember anything between API calls unless you put that context back in the next call.

You have three options:

In-context memory puts the conversation history in every request. Simple, works fine for short sessions. Expensive at scale because you pay for every token in the history.

External memory stores key facts in a database and retrieves them at the start of each call. A vector store (like Pinecone) or a simple key-value store works. The agent writes notes to itself and reads them back.

Episodic memory logs what the agent did in past sessions so it can pick up where it left off. Useful for long-running tasks that span hours or days.

For most first agents, start with in-context memory. Add an external store when the context window cost becomes a problem.
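When you do reach for an external store, it can be very small. The sketch below is a hypothetical key-value memory backed by a JSON file. `NoteStore`, `remember` and `as_system_context` are made-up names, and a real build might swap the file for a database or vector store.

```python
import json
from pathlib import Path


class NoteStore:
    """Hypothetical key-value memory persisted to a JSON file."""

    def __init__(self, path: str):
        self.path = Path(path)
        # Load notes from an earlier session if the file already exists.
        self.notes = json.loads(self.path.read_text()) if self.path.exists() else {}

    def remember(self, key: str, value: str) -> None:
        """Store one fact and write the whole store back to disk."""
        self.notes[key] = value
        self.path.write_text(json.dumps(self.notes, indent=2))

    def as_system_context(self) -> str:
        """Render stored notes as text to place at the start of the next call."""
        if not self.notes:
            return ""
        lines = [f"- {key}: {value}" for key, value in sorted(self.notes.items())]
        return "Known facts from earlier sessions:\n" + "\n".join(lines)
```

One way to wire this in is to expose `remember` as a tool so the agent writes its own notes, then append `as_system_context()` to the system prompt on each new session.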

What does the agent loop look like?

Here is the core pattern in Python:

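```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def run_agent(task: str, tools: list, max_steps: int = 10):
    messages = [{"role": "user", "content": task}]

    for step in range(max_steps):
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )

        # Claude considers the task finished: return its final content blocks.
        if response.stop_reason == "end_turn":
            return response.content

        if response.stop_reason == "tool_use":
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    # call_tool is your own dispatcher from tool name to function.
                    result = call_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(result),
                    })

            # Record the assistant turn, then answer it with the tool results.
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Any other stop reason (for example "max_tokens") ends the run;
            # retrying with unchanged messages would only repeat the same call.
            return response.content

    # Step limit reached without an end_turn.
    return None
```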

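The loop runs until Claude says it is done (`end_turn`) or you hit `max_steps`, in which case the function returns `None` and the caller should treat the task as unfinished. Always set a max steps limit. Without one, a confused agent can keep looping indefinitely and burn through API credits.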

The full API reference for tool use and agentic patterns is in the Anthropic documentation, updated regularly as new model versions ship.

How do you stop it from going off the rails?

Agents fail in predictable ways. They pick the wrong tool, they loop on an error without stopping, or they misread a tool result and take a bad action.

Three things that help:

Hard stop conditions. Set a step limit and a token budget. If either trips, the agent stops and flags for human review.

Tool call validation. Before running any tool call, check that the inputs are sane. A web scraper tool with an empty URL should raise an error, not try to fetch nothing.

Human in the loop checkpoints. For anything that writes to a database, sends an email, or calls an external API, require a confirmation step. The agent proposes the action, a human approves, then it runs.
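The three guardrails above can be sketched in a few dozen lines. Everything here is illustrative: the limits, the `SIDE_EFFECT_TOOLS` names and the function names are assumptions, not part of any library. In the agent loop, the token count passed to `charge` would typically come from `response.usage.input_tokens + response.usage.output_tokens`.

```python
from urllib.parse import urlparse


class BudgetExceeded(Exception):
    """Raised when a hard stop condition trips."""


class Budget:
    """Tracks steps and tokens against hard limits (default limits are illustrative)."""

    def __init__(self, max_steps: int = 10, max_tokens: int = 200_000):
        self.max_steps, self.max_tokens = max_steps, max_tokens
        self.steps = self.tokens = 0

    def charge(self, tokens_used: int) -> None:
        """Record one loop step; raise if either limit is now exceeded."""
        self.steps += 1
        self.tokens += tokens_used
        if self.steps > self.max_steps or self.tokens > self.max_tokens:
            raise BudgetExceeded(f"steps={self.steps}, tokens={self.tokens}")


def validate_fetch_input(tool_input: dict) -> str:
    """Return a clean URL, or raise ValueError before any request is made."""
    url = str(tool_input.get("url", "")).strip()
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid URL: {url!r}")
    return url


# Hypothetical names of tools that change state outside the conversation.
SIDE_EFFECT_TOOLS = {"send_email", "write_db"}


def approved(tool_name: str, tool_input: dict, ask=input) -> bool:
    """Require a human 'y' before a side-effecting tool runs."""
    if tool_name not in SIDE_EFFECT_TOOLS:
        return True
    answer = ask(f"Run {tool_name} with {tool_input}? [y/N] ")
    return answer.strip().lower() == "y"
```

Passing `ask` as a parameter keeps the approval step testable and lets you replace the terminal prompt with a Slack message or a web form later.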

This is how we build agents at Devwiz. We look at AI agents for business as a build problem, not a research problem. The architecture decisions above are the ones that actually determine whether an agent ships or stays in a notebook.

What model should you use?

For most agent work, Claude Opus 4 gives you the best reasoning on complex multi-step tasks. Claude Sonnet 4 is faster and cheaper, which matters when your loop runs 20-30 steps per task.

A practical approach: prototype with Opus until your tool schemas and system prompt are stable. Switch to Sonnet for production when you know the task fits.

For long-running background tasks, look at Claude's extended thinking mode. It lets the model reason through hard problems before committing to a tool call. Useful for research agents that need to evaluate multiple sources before writing a summary.
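A sketch of how the model switch and extended thinking might look in code. The model IDs are assumptions, so check Anthropic's model list for the current names. The `thinking` parameter follows the Messages API shape as I understand it: `budget_tokens` must be at least 1,024 and lower than `max_tokens`. `build_request` and `MODELS` are hypothetical names.

```python
# Assumed model IDs; verify against Anthropic's current model list.
MODELS = {"prototype": "claude-opus-4-5", "production": "claude-sonnet-4-5"}


def build_request(stage: str, messages: list, thinking_budget: int = 0) -> dict:
    """Assemble keyword arguments for client.messages.create()."""
    request = {
        "model": MODELS[stage],
        "max_tokens": 4096,
        "messages": messages,
    }
    if thinking_budget:
        if thinking_budget < 1024:
            raise ValueError("thinking_budget must be at least 1024 tokens")
        request["thinking"] = {"type": "enabled", "budget_tokens": thinking_budget}
        # Leave room for the visible answer on top of the thinking budget.
        request["max_tokens"] = max(request["max_tokens"], thinking_budget + 2048)
    return request
```

Keeping the model choice in one lookup table means the move from prototype to production is a one-word change rather than a search through the codebase.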

What can you actually build with this?

Here are some real categories where Claude agents deliver results today:

  • Research agents that pull from multiple sources and write briefs
  • Code review agents that read a PR, run tests, and leave inline comments
  • Customer support agents that handle Tier 1 queries and escalate edge cases
  • Data pipeline agents that transform, validate, and load files on a schedule
  • Internal knowledge agents that answer questions against your own docs

We built a care coordination platform, CARED, that uses an agent layer to match clients with support workers based on complex eligibility rules. The agent reads client profiles, checks availability, and surfaces ranked matches. What used to take a coordinator 30 minutes now runs in seconds.

That is the kind of work where an AI agent earns its place.

When to get help building it

If you are prototyping for yourself, start with the loop above and add tools as you need them. The Anthropic documentation covers tool schemas, error handling, and the newer features like prompt caching in detail.

If you are building for a business, the decisions get harder fast. Which tools get access to which data? How do you audit what the agent did? What happens when it fails in production?

That is where having a team that has shipped 200+ apps since 2015 makes a difference. We work with clients like NSW Government, Briometrix, Vivid, and Huskee on this kind of build.

The team at AI Orchestrators also runs structured programs for businesses that want to build AI agent capability in-house, if that is a better fit for your situation.

If you want to build something production-ready, take a look at our AI app development work and get in touch. We can scope it out from there.

---

FAQ

Q: Do I need a backend to build an AI agent with Claude?

A: You need somewhere to run your Python or JavaScript code, but it does not have to be a full backend. A simple script or a serverless function is enough to get started. If the agent needs to persist memory or handle concurrent users, you will want a proper backend, but that is a later decision.

Q: How much does it cost to run a Claude agent?

A: Cost depends on model choice and how many tokens flow through each loop step. Claude Sonnet 4 runs at $3 per million input tokens and $15 per million output tokens (check Anthropic pricing for current rates). A 20-step research agent might use 50,000-100,000 tokens per run. Start with a token budget and monitor it before scaling.
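As a worked example using the rates quoted above, here is a small estimator. The 80,000 / 20,000 split between input and output tokens is an assumption for illustration. Real agents are usually input-heavy because the history is resent on every step.

```python
# Rates quoted above for Claude Sonnet 4, in USD per million tokens.
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00


def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one agent run from its token counts."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000


# Assumed split: 80,000 input tokens and 20,000 output tokens in one run.
cost = run_cost(80_000, 20_000)
print(f"${cost:.2f} per run")  # $0.54 per run
```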

Q: What is the difference between an AI agent and a chatbot?

A: A chatbot responds to messages. An agent takes actions. An agent can call APIs, read and write files, run code, and chain multiple steps together to complete a task. It acts on the world, not just inside a conversation window.

Q: How do I stop my agent from making mistakes in production?

A: Set hard stop conditions (step limit, token budget), validate all tool inputs before running them, and put a human approval step on any action that writes data or calls external services. Log every tool call and result so you can audit what happened. Start with low-stakes tasks and raise the autonomy level as confidence grows.
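A minimal sketch of the logging step, assuming a JSON Lines file is an acceptable audit trail. The function name, record fields and 2,000-character truncation are illustrative choices.

```python
import json
import time


def log_tool_call(log_path: str, name: str, tool_input: dict, result) -> dict:
    """Append one tool call and its result to a JSON Lines audit file."""
    record = {
        "ts": time.time(),
        "tool": name,
        "input": tool_input,
        "result": str(result)[:2000],  # truncate large results to keep the log small
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, default=str) + "\n")
    return record
```

Calling this right after each tool runs gives you one line per action, which is easy to search or load into a spreadsheet when something goes wrong.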

Q: Can Claude agents work with my existing software?

A: Yes. Any system with an API or that can be called from Python can become a tool. We have connected Claude agents to CRMs, ERPs, databases, custom web apps, and third-party services. The tool schema approach means you can wrap almost any callable function and give the agent access to it.

About James Killick

James is a co-founder of Devwiz and an AI product specialist. Since 2015 he has helped ship 200+ apps for founders, businesses and government, including work for NSW Government, Briometrix and Huskee. He builds AI-first platforms and writes about turning a proven program into software. He also hosts the Up in the AI podcast.

jameskillick.co · LinkedIn · AI Orchestrators
