Building Production AI Agents with LangChain: A Practical Guide

Feb 25, 2026
7 min read

AI agents go beyond chatbots — they take action. They can search databases, call APIs, make decisions, and execute multi-step workflows without human intervention. But moving from prototype to production means handling errors gracefully, adding observability, and ensuring reliability when the LLM makes unexpected choices.

This guide covers building production-ready AI agents with LangChain, from basic tool integration to deployment patterns.

What Are AI Agents?

An agent is an LLM that can:

  • Reason: Analyze a user query and plan steps
  • Act: Call external tools (APIs, databases, functions)
  • Observe: Process tool results and decide next action
  • Iterate: Continue until task is complete

Example flow: User asks "What's my account balance and should I invest more this month?"

  1. Agent calls get_account_balance() tool
  2. Receives result: $5,200
  3. Agent calls get_monthly_expenses() tool
  4. Receives result: $3,800
  5. Agent reasons: surplus of $1,400, safe to invest
  6. Returns recommendation with data
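Stripped of the framework, the loop above amounts to the following plain-Python sketch. The tool bodies are stubs standing in for real API calls, and the function names are illustrative, not from any real library:

```python
def get_account_balance() -> int:
    """Stub tool: replace with a real banking API call."""
    return 5200

def get_monthly_expenses() -> int:
    """Stub tool: replace with a real expense-tracking API call."""
    return 3800

def recommend_investment() -> str:
    # Steps 1-4: call tools and observe results
    balance = get_account_balance()
    expenses = get_monthly_expenses()
    # Step 5: reason over the observations
    surplus = balance - expenses
    # Step 6: return a recommendation backed by the data
    if surplus > 0:
        return f"Surplus of ${surplus:,}: safe to invest this month."
    return "No surplus this month: hold off on investing."
```

In a real agent, the LLM decides this call sequence at runtime instead of it being hard-coded.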

LangChain Agent Types

Agent Type       | When to Use               | Tool Support
-----------------|---------------------------|------------------
ReAct            | General-purpose reasoning | Any tool
OpenAI Functions | Structured tool calling   | JSON schema tools
Plan-and-Execute | Multi-step complex tasks  | Any tool
Conversational   | Stateful multi-turn chat  | Memory + tools

Recommendation: Use OpenAI Functions agent for production — it's the most reliable and cheapest (uses function calling API, not prompt engineering).

Building a Basic Agent

Step 1: Define Tools

from langchain.tools import tool
import requests

@tool
def search_company_data(query: str) -> str:
    """Search internal company database. Use for product info, pricing, policies."""
    # Call your API
    result = requests.post("https://api.internal.com/search", json={"query": query})
    return result.json()["answer"]

@tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send email to a user. Use when user requests notification or update."""
    # Call email service
    requests.post("https://api.sendgrid.com/v3/mail/send", json={...})
    return f"Email sent to {to}"

@tool
def get_weather(location: str) -> str:
    """Get current weather for a location."""
    response = requests.get(f"https://api.weather.com/v1/current?location={location}")
    return response.json()["summary"]

Tool design rules:

  • Clear, verb-based names (search_company_data not company_searcher)
  • Detailed docstrings — LLM uses these to decide when to call
  • Type hints for parameters
  • Return strings (LLMs consume text best)

Step 2: Create the Agent

from langchain.agents import initialize_agent, AgentType
from langchain.chat_models import ChatOpenAI

tools = [search_company_data, send_email, get_weather]

# The OPENAI_FUNCTIONS agent requires a chat model,
# not the completion-style OpenAI LLM
llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    max_iterations=5,
    early_stopping_method="generate"
)

# Run the agent
response = agent.run("What's the weather in London and email me a summary?")

Production-Ready Patterns

1. Error Handling and Retries

LLMs are non-deterministic. Agents can fail, call wrong tools, or get stuck in loops.

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def run_agent(query: str):
    # Let exceptions propagate so tenacity can retry; catching them
    # inside the retried function would prevent any retries
    return agent.run(query)

def run_agent_with_fallback(query: str):
    try:
        return run_agent(query)
    except Exception as e:
        logger.error(f"Agent failed after retries: {e}")
        # Fallback to a simple LLM call without tools
        return llm.predict(query)

2. Guardrails and Validation

Prevent agents from doing dangerous things:

from langchain.callbacks.base import BaseCallbackHandler

class SafetyCallback(BaseCallbackHandler):
    def __init__(self):
        self.call_counts = {}

    def on_tool_start(self, serialized, input_str, **kwargs):
        tool_name = serialized.get("name", "")
        self.call_counts[tool_name] = self.call_counts.get(tool_name, 0) + 1

        # Block destructive operations
        if tool_name == "delete_database" and "production" in input_str:
            raise ValueError("Agent attempted to delete production database!")

        # Rate limit expensive tools
        if tool_name == "expensive_api" and self.call_counts[tool_name] > 10:
            raise ValueError("Rate limit exceeded for expensive_api")

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    callbacks=[SafetyCallback()]
)

3. Observability and Logging

Track what agents are doing:

import os

# Enable LangSmith tracing via environment variables;
# every LangChain agent run is then traced automatically
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your LangSmith API key>"
os.environ["LANGCHAIN_PROJECT"] = "customer_support"

result = agent.run("Help me with my order #12345")

# Log to your metrics system (tools_called and iteration_count come from
# your own callback handler; they are not built-in agent attributes)
logger.info("agent_execution", extra={
    "user_id": user_id,
    "query": query,
    "tools_called": tools_called,
    "iterations": iteration_count,
    "latency_ms": latency,
    "success": True,
})

4. Cost Control

Agents can burn through tokens fast with iterative tool calling:

  • Set max_iterations: Cap at 3-5 to prevent runaway loops
  • Use cheaper models: GPT-3.5-turbo for simple tools, GPT-4 only for complex reasoning
  • Cache tool results: If user asks "weather in London" twice, don't call API twice
  • Monitor per-user spend: Alert when a user exceeds $10/day in LLM costs
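Caching is the easiest win. A minimal sketch using functools.lru_cache (a real deployment would want a TTL cache, since results like weather go stale; the counter below exists only to demonstrate the cache working):

```python
from functools import lru_cache

call_count = {"api": 0}  # demonstration only: counts real API hits

@lru_cache(maxsize=256)
def get_weather_cached(location: str) -> str:
    """Repeated queries for the same location reuse the first result."""
    call_count["api"] += 1
    return f"Weather for {location}: (stub result; replace with a real API call)"
```

Wrap each expensive tool this way before handing it to the agent, so identical tool calls within a session never hit the API twice.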

Advanced Patterns

Multi-Agent Systems

Specialize agents for different tasks:

  • Researcher agent: Searches docs, summarizes findings
  • Writer agent: Drafts emails, reports
  • Executor agent: Calls APIs, updates databases
  • Coordinator agent: Routes tasks to specialists

Use CrewAI or AutoGen for orchestration.
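A framework-free sketch of the coordinator pattern follows. The agent stubs are hypothetical, and a real coordinator would ask an LLM to route; crude keyword matching just keeps the sketch self-contained (CrewAI and AutoGen handle both routing and hand-offs for you):

```python
# Hypothetical specialist stubs; in practice each wraps its own LLM and tools
AGENTS = {
    "researcher": lambda task: f"[researcher] findings on: {task}",
    "writer":     lambda task: f"[writer] draft for: {task}",
    "executor":   lambda task: f"[executor] performed: {task}",
}

def coordinator(task: str) -> str:
    """Route a task to the right specialist agent."""
    lowered = task.lower()
    if "draft" in lowered or "email" in lowered:
        return AGENTS["writer"](lowered)
    if "search" in lowered or "research" in lowered:
        return AGENTS["researcher"](lowered)
    return AGENTS["executor"](lowered)
```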

Human-in-the-Loop

Require approval for sensitive actions:

@tool
def send_invoice(customer_id: str, amount: float) -> str:
    """Send invoice to customer. Requires human approval."""
    # Save to approval queue
    approval_id = save_to_queue({"customer_id": customer_id, "amount": amount})
    return f"Invoice queued for approval (ID: {approval_id}). Awaiting human review."

Deployment Options

Platform      | Best For                 | Cost
--------------|--------------------------|----------------------
AWS Lambda    | Serverless, low traffic  | $0.20 per 1M requests
GCP Cloud Run | Containerized agents     | $0.40 per 1M requests
Modal.com     | GPU-heavy workloads      | $0.30 per 1M requests
Kubernetes    | High scale, full control | $200-500/month base

FAQs

How much do production AI agents cost per request?

Expect $0.01-0.05 per agent execution with GPT-4, $0.001-0.01 with GPT-3.5-turbo. Multi-step agents (3-5 tool calls) can burn 5-10K tokens per run. Budget: $500-2000/month for 50K agent executions. Use GPT-3.5 for simple tasks, GPT-4 only when reasoning quality matters. Cache tool results aggressively.

What causes agents to fail or loop infinitely?

Common failures: (1) unclear tool descriptions confuse the LLM, (2) tool returns unexpected format (LLM can't parse), (3) no termination condition (loops calling same tool). Fix: write detailed tool docstrings, validate tool outputs, set max_iterations=5, add early_stopping. Monitor iteration counts — if >3 regularly, your tools are poorly designed.

When should you use an agent vs a simple chain?

Use chains when the workflow is fixed (A → B → C always). Use agents when the LLM needs to decide which tools to call and in what order. Example: "Summarize this doc" = chain. "Research this topic and email me a summary" = agent (needs to decide: search → synthesize → send email). Agents cost 2-5x more due to planning overhead.

How do you test AI agents reliably?

Unit test each tool independently with mocked LLM responses. Integration test: run agent with deterministic queries (temperature=0) and assert expected tool call sequence. Use LangSmith evals: define test cases (query + expected tool usage + success criteria), run nightly, track pass rate. Target: 85%+ consistency before production. Regression test after every LLM provider upgrade.
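For example, a tool like get_weather from earlier can be unit-tested without hitting the network. This sketch injects the HTTP function so it can be mocked with the standard library (an alternative is patching requests.get directly):

```python
from unittest.mock import MagicMock

# Same shape as the earlier get_weather tool, with the HTTP
# client injected for testability
def get_weather(location: str, http_get) -> str:
    response = http_get(f"https://api.weather.com/v1/current?location={location}")
    return response.json()["summary"]

def test_get_weather():
    fake_response = MagicMock()
    fake_response.json.return_value = {"summary": "Sunny, 18C"}
    http_get = MagicMock(return_value=fake_response)

    assert get_weather("London", http_get) == "Sunny, 18C"
    http_get.assert_called_once()
```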

How do you reduce agent latency?

Agents are inherently slower (3-10s vs 1-2s for simple LLM calls). Optimize: (1) Parallel tool calls when possible (LangChain supports this with OpenAI Functions), (2) Use streaming responses to show progress, (3) Cache tool results, (4) Use faster models (GPT-3.5-turbo = 2x faster than GPT-4), (5) Pre-warm agent instances in serverless environments.

© 2026 Propelius Technologies. All rights reserved.