Building Your First AI Agent: Tools, Memory, and the Loop That Makes It Real
A chatbot answers. An agent acts. The difference is three things — a tool layer, a memory layer, and a planning loop. Here is how to wire them together.
A chatbot answers. An agent acts. The difference between the two is not a different model — it is three pieces of plumbing around the model.
The first piece is the tool layer. A model on its own can only emit tokens. The moment you give it a tool — `search_email(query)`, `create_calendar_event(date, title)`, `query_database(sql)` — it goes from a writer to an operator. Every modern provider exposes this through structured tool calling: you describe each tool's name, parameters, and JSON schema, the model decides when to call it, and your code runs the actual function. The model never touches the network. Your code does. That separation is what keeps the agent safe.
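As a sketch, a tool description is a name, a human-readable description, and a JSON schema for the parameters. The exact envelope varies by provider; `search_email` below is a hypothetical tool, and the field names follow the common shape rather than any one API:

```
# Sketch of a tool description: name, description, JSON schema for parameters.
# Field names vary by provider; `search_email` is a hypothetical example.
SEARCH_EMAIL_TOOL = {
    "name": "search_email",
    "description": "Search the user's mailbox and return matching messages.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search terms."},
            "limit": {"type": "integer", "description": "Max results.", "default": 10},
        },
        "required": ["query"],
    },
}

def search_email(query, limit=10):
    # The function your runtime actually executes. The model only ever sees
    # the schema above and the value this returns.
    ...
```

The model proposes a call like `search_email(query="invoice")`; your code looks the function up by name and runs it.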
The second piece is memory. Short-term memory is the message history you replay on every turn — the model has no native persistence. Long-term memory is a vector store you write to when something matters and read from when context is needed. The trap is treating the vector DB as a brain. It is a filing cabinet. The brain is whatever you put back into the prompt.
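A minimal sketch of both paths, assuming a generic `store` with `add` and `search` methods (illustrative names, not any particular library's API):

```
# Illustrative only: `store` stands in for whatever vector DB you use.
def remember(store, text):
    store.add(text)  # write path: save it when something matters

def recall_into_prompt(store, goal, k=3):
    # Read path, and the step that actually matters: retrieved notes only
    # help if they are placed back into the prompt.
    notes = store.search(goal, top_k=k)
    bullets = "\n".join(f"- {note}" for note in notes)
    return f"Relevant notes from long-term memory:\n{bullets}\n\nGoal: {goal}"
```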
The third piece is the loop. A single model call is not an agent. An agent is a while-loop that says: think, act, observe, repeat — until the goal is met or a stop condition fires. Inside the loop the model emits a tool call, your runtime executes it, the result is appended to the conversation, and the model is invoked again with the new state. Five turns later the agent has booked a flight, sent a follow-up email, and updated a CRM row.
A minimal agent in pseudocode:
```
def run_agent(system_prompt, user_goal):
    messages = [system_prompt, user_goal]
    for step in range(MAX_STEPS):
        response = model.generate(messages, tools=TOOLS)
        if response.is_final:                  # no tool call: the model is done
            return response.text
        result = run_tool(response.tool_call)  # your runtime runs the tool, not the model
        messages.append(response)              # record the model's tool-call turn
        messages.append({"role": "tool", "content": result})  # and the observation
    raise RuntimeError("step budget exhausted")  # fail loudly instead of looping forever
```
Four things make this stop being a toy.
One — guardrails. Wrap every tool with input validation and an allowlist. The model will eventually try to read a file outside the working directory or run an SQL DELETE. Treat tool inputs as user input, not as model output.
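A sketch of that wrapper; `validate_args` and `TOOLS_BY_NAME` are hypothetical helpers, and the checks shown are examples, not a complete list:

```
ALLOWED_TOOLS = {"search_email", "create_calendar_event"}  # explicit allowlist

def run_tool(tool_call):
    if tool_call.name not in ALLOWED_TOOLS:
        return f"Error: tool '{tool_call.name}' is not permitted."
    # validate_args is a hypothetical helper: check arguments against the
    # tool's JSON schema, reject paths outside the working directory, refuse
    # anything but SELECT in query_database, and so on.
    args = validate_args(tool_call.name, tool_call.arguments)
    return TOOLS_BY_NAME[tool_call.name](**args)
```

Returning the refusal as a tool result, instead of raising, lets the model see the error and try a permitted route.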
Two — observability. Log every step: the prompt, the tool call, the tool result, the next prompt. When the agent goes off the rails — and it will — you need a trace, not a vibe.
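A minimal version is one structured record per loop iteration; the field names here are illustrative:

```
import json, logging, time

log = logging.getLogger("agent")

def log_step(step, response, result):
    # One structured record per loop iteration: enough to replay the trace.
    tool_call = getattr(response, "tool_call", None)
    log.info(json.dumps({
        "ts": time.time(),
        "step": step,
        "tool": tool_call.name if tool_call else None,
        "result": str(result)[:500],  # truncate: tool results can be huge
    }))
```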
Three — budget. Set a max-step count, a max-token count, and a max-wall-clock. Agents do not know how to give up. You do.
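Wired into the loop above, the three budgets might look like this; `MAX_TOKENS`, `MAX_SECONDS`, and the `response.usage` field are assumptions about your provider and config:

```
import time

start = time.monotonic()
tokens_used = 0
for step in range(MAX_STEPS):                    # budget 1: steps
    if tokens_used > MAX_TOKENS:                 # budget 2: tokens
        raise RuntimeError("token budget exhausted")
    if time.monotonic() - start > MAX_SECONDS:   # budget 3: wall clock
        raise RuntimeError("time budget exhausted")
    response = model.generate(messages, tools=TOOLS)
    tokens_used += response.usage.total_tokens   # assumes provider-reported usage
    ...
```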
Four — evals. Pick five real tasks the agent should be able to do. Run them on every prompt change. The day you stop running evals is the day silent regressions start landing in production.
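Even a tiny harness beats none. Here is a sketch that calls the `run_agent` loop from above; `SYSTEM_PROMPT` and the tasks are stand-ins for your own, and the string checks are deliberately crude (real checks should assert on side effects, such as the calendar event actually existing):

```
EVAL_TASKS = [
    # (goal, check) pairs; the checks inspect the agent's final answer.
    ("find the latest invoice email", lambda out: "invoice" in out.lower()),
    ("book a 30-minute call tomorrow", lambda out: "created" in out.lower()),
]

def run_evals():
    failures = [goal for goal, check in EVAL_TASKS
                if not check(run_agent(SYSTEM_PROMPT, goal))]
    assert not failures, f"regressions: {failures}"
```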
The model is the cheap part. The runtime around it is the product.