
AI agents go beyond chatbots — they take action. They can search databases, call APIs, make decisions, and execute multi-step workflows without human intervention. But moving from prototype to production means handling errors gracefully, adding observability, and ensuring reliability when the LLM makes unexpected choices.
This guide covers building production-ready AI agents with LangChain, from basic tool integration to deployment patterns.
An agent is an LLM that can:
- Reason about a goal and break it into steps
- Decide which tools to call, and in what order
- Execute those tools and incorporate the results
- Iterate until the task is complete

Example flow: User asks "What's my account balance and should I invest more this month?" The agent calls the get_account_balance() tool, then the get_monthly_expenses() tool, then reasons over both results to produce a recommendation.
| Agent Type | When to Use | Tool Support |
|---|---|---|
| ReAct | General-purpose reasoning | Any tool |
| OpenAI Functions | Structured tool calling | JSON schema tools |
| Plan-and-Execute | Multi-step complex tasks | Any tool |
| Conversational | Stateful multi-turn chat | Memory + tools |
Recommendation: use the OpenAI Functions agent for production. It's the most reliable and cheapest option, since it uses the native function-calling API rather than prompt engineering.
```python
from langchain.tools import tool
import requests

@tool
def search_company_data(query: str) -> str:
    """Search internal company database. Use for product info, pricing, policies."""
    # Call your API
    result = requests.post("https://api.internal.com/search", json={"query": query})
    return result.json()["answer"]

@tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send email to a user. Use when user requests notification or update."""
    # Call email service
    requests.post("https://api.sendgrid.com/v3/mail/send", json={...})
    return f"Email sent to {to}"

@tool
def get_weather(location: str) -> str:
    """Get current weather for a location."""
    response = requests.get(f"https://api.weather.com/v1/current?location={location}")
    return response.json()["summary"]
```
Tool design rules:
- Use descriptive, action-oriented names (search_company_data, not company_searcher)
- Write detailed docstrings: the LLM decides which tool to call based on them
- Take simple typed inputs and return plain strings the LLM can parse

```python
from langchain.agents import initialize_agent, AgentType
from langchain.chat_models import ChatOpenAI

tools = [search_company_data, send_email, get_weather]

# OPENAI_FUNCTIONS requires a chat model, so use ChatOpenAI (not the legacy OpenAI completion class)
llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    max_iterations=5,
    early_stopping_method="generate"
)

# Run the agent
response = agent.run("What's the weather in London and email me a summary?")
```
LLMs are non-deterministic. Agents can fail, call wrong tools, or get stuck in loops.
```python
import logging

from tenacity import retry, stop_after_attempt, wait_exponential

logger = logging.getLogger(__name__)

# Retry the agent itself; if we caught the exception inside the retried
# function, tenacity would never see it and never retry.
@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def _run_agent(query: str):
    return agent.run(query)

def run_agent_with_retry(query: str):
    try:
        return _run_agent(query)
    except Exception as e:
        logger.error(f"Agent failed after retries: {e}")
        # Fallback to a simple LLM call without tools
        return llm.predict(query)
Prevent agents from doing dangerous things:
```python
from langchain.callbacks import BaseCallbackHandler

class SafetyCallback(BaseCallbackHandler):
    def __init__(self):
        super().__init__()
        self.call_count = 0

    # LangChain passes the serialized tool dict and the raw input string
    def on_tool_start(self, serialized, input_str, **kwargs):
        tool_name = serialized.get("name", "")
        self.call_count += 1
        # Block destructive operations
        if tool_name == "delete_database" and "production" in input_str:
            raise ValueError("Agent attempted to delete production database!")
        # Rate limit expensive tools
        if tool_name == "expensive_api" and self.call_count > 10:
            raise ValueError("Rate limit exceeded for expensive_api")
```
```python
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    callbacks=[SafetyCallback()]
)
```
Track what agents are doing:
```python
from langsmith import Client

# Initialize LangSmith for tracing (requires LANGCHAIN_API_KEY in the environment)
client = Client()

# All agent calls are automatically traced
with client.trace("user_query", tags=["production", "customer_support"]):
    result = agent.run("Help me with my order #12345")

# Log to your metrics system (field values shown are illustrative)
logger.info("agent_execution", extra={
    "user_id": user_id,
    "query": query,
    "tools_called": tools_called,
    "iterations": iteration_count,
    "latency_ms": latency,
    "success": True,
})
```
Agents can burn through tokens fast with iterative tool calling, so cap max_iterations, truncate long tool outputs, and cache repeated tool results.

For larger systems, specialize agents for different tasks (e.g. one research agent, one support agent) and use CrewAI or AutoGen for orchestration.
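CrewAI and AutoGen each have their own orchestration APIs; as a framework-agnostic illustration, here is a minimal sketch of routing queries to specialized agents. The keyword rules and agent stubs are hypothetical, and in practice each stub would be a full LangChain agent with its own tools and system prompt.

```python
from typing import Callable, Dict

# Hypothetical specialized agents (stand-ins for real agent instances)
def billing_agent(query: str) -> str:
    return f"[billing] handled: {query}"

def support_agent(query: str) -> str:
    return f"[support] handled: {query}"

def general_agent(query: str) -> str:
    return f"[general] handled: {query}"

# Keyword-to-agent routing table; a production router might use an LLM classifier
ROUTES: Dict[str, Callable[[str], str]] = {
    "invoice": billing_agent,
    "refund": billing_agent,
    "error": support_agent,
    "bug": support_agent,
}

def route(query: str) -> str:
    """Dispatch a query to the first specialized agent whose keyword matches."""
    lowered = query.lower()
    for keyword, agent_fn in ROUTES.items():
        if keyword in lowered:
            return agent_fn(query)
    return general_agent(query)
```

The cheap keyword router keeps expensive models out of the dispatch path; swap it for an LLM-based classifier only if routing accuracy becomes a problem.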
Require approval for sensitive actions:
```python
@tool
def send_invoice(customer_id: str, amount: float) -> str:
    """Send invoice to customer. Requires human approval."""
    # Save to approval queue (save_to_queue is your own persistence layer)
    approval_id = save_to_queue({"customer_id": customer_id, "amount": amount})
    return f"Invoice queued for approval (ID: {approval_id}). Awaiting human review."
```
| Platform | Best For | Cost |
|---|---|---|
| AWS Lambda | Serverless, low traffic | $0.20 per 1M requests |
| GCP Cloud Run | Containerized agents | $0.40 per 1M requests |
| Modal.com | GPU-heavy workloads | $0.30 per 1M requests |
| Kubernetes | High scale, full control | $200-500/month base |
Expect $0.01-0.05 per agent execution with GPT-4, $0.001-0.01 with GPT-3.5-turbo. Multi-step agents (3-5 tool calls) can burn 5-10K tokens per run. Budget: $500-2000/month for 50K agent executions. Use GPT-3.5 for simple tasks, GPT-4 only when reasoning quality matters. Cache tool results aggressively.
Common failures: (1) unclear tool descriptions confuse the LLM, (2) tool returns unexpected format (LLM can't parse), (3) no termination condition (loops calling same tool). Fix: write detailed tool docstrings, validate tool outputs, set max_iterations=5, add early_stopping. Monitor iteration counts — if >3 regularly, your tools are poorly designed.
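LangChain enforces max_iterations for you, but the idea is just a hard cap on the reason-act loop. A stripped-down sketch of that guard, where step_fn stands in for one LLM planning-plus-tool-call step:

```python
def run_agent_loop(step_fn, max_iterations: int = 5):
    """Run a reason-act loop, stopping early if the agent never finishes.

    step_fn(i) returns (done, result); a real agent would call the LLM here.
    """
    result = None
    for i in range(max_iterations):
        done, result = step_fn(i)
        if done:
            return result
    # Termination guard hit: return a best effort instead of looping forever
    return f"Stopped after {max_iterations} iterations; partial result: {result}"
```

Logging how often the guard fires is the signal the paragraph above describes: if runs regularly hit the cap, the tools or their descriptions need work.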
Use chains when the workflow is fixed (A → B → C always). Use agents when the LLM needs to decide which tools to call and in what order. Example: "Summarize this doc" = chain. "Research this topic and email me a summary" = agent (needs to decide: search → synthesize → send email). Agents cost 2-5x more due to planning overhead.
Unit test each tool independently with mocked LLM responses. Integration test: run agent with deterministic queries (temperature=0) and assert expected tool call sequence. Use LangSmith evals: define test cases (query + expected tool usage + success criteria), run nightly, track pass rate. Target: 85%+ consistency before production. Regression test after every LLM provider upgrade.
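As a sketch of the "unit test each tool independently" step, here is the search tool tested against a fake HTTP layer with unittest.mock. The injectable post parameter is a testing convenience added here; in your codebase you would patch the real requests import instead:

```python
from unittest.mock import MagicMock

def search_company_data(query: str, post=None) -> str:
    """Search internal company database (post is injectable for testing)."""
    result = post("https://api.internal.com/search", json={"query": query})
    return result.json()["answer"]

def test_search_company_data():
    # Fake the HTTP layer: response.json() returns a canned answer
    fake_response = MagicMock()
    fake_response.json.return_value = {"answer": "Pro plan costs $49/month"}
    fake_post = MagicMock(return_value=fake_response)

    answer = search_company_data("pricing", post=fake_post)

    # Assert both the parsed output and the exact request the tool made
    assert answer == "Pro plan costs $49/month"
    fake_post.assert_called_once_with(
        "https://api.internal.com/search", json={"query": "pricing"}
    )
```

Testing tools this way needs no LLM at all, which keeps the unit suite fast and deterministic; the agent-level integration tests described above cover the LLM's tool-selection behavior separately.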
Agents are inherently slower (3-10s vs 1-2s for simple LLM calls). Optimize: (1) Parallel tool calls when possible (LangChain supports this with OpenAI Functions), (2) Use streaming responses to show progress, (3) Cache tool results, (4) Use faster models (GPT-3.5-turbo = 2x faster than GPT-4), (5) Pre-warm agent instances in serverless environments.
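Point (1) above, sketched with plain asyncio; the two tools are hypothetical stand-ins that simulate I/O latency with sleep:

```python
import asyncio
import time

async def fetch_weather(location: str) -> str:
    await asyncio.sleep(0.1)  # simulated API latency
    return f"Sunny in {location}"

async def fetch_balance(user_id: str) -> str:
    await asyncio.sleep(0.1)  # simulated API latency
    return f"Balance for {user_id}: $1,000"

async def gather_tools() -> list:
    # Both tool calls run concurrently: total latency ~0.1s, not ~0.2s
    return await asyncio.gather(fetch_weather("London"), fetch_balance("u42"))

start = time.perf_counter()
weather, balance = asyncio.run(gather_tools())
elapsed = time.perf_counter() - start
```

This only helps when tool calls are independent of each other; if one tool's input depends on another's output, the agent has to run them sequentially.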