
Tenant Data Isolation: Patterns and Anti-Patterns
Explore effective patterns and pitfalls of tenant data isolation in multi-tenant systems to enhance security and compliance.
Jul 30, 2025
Read More
AI agents are powerful but unpredictable. They can hallucinate facts, generate inappropriate content, leak sensitive data, or violate brand guidelines — all while sounding confident and helpful. The more autonomy you give them, the more critical guardrails become. Without them, you're one viral screenshot away from a PR disaster or regulatory investigation.
At Propelius Technologies, we build AI agents with safety layers baked in from day one. This guide covers the technical and policy frameworks to keep your AI agents safe, compliant, and on-brand.
Guardrails are safety mechanisms that constrain AI behavior. They detect and prevent unwanted outputs before they reach users. Think of them as automated quality control plus policy enforcement.
Risk: AI invents facts, cites non-existent sources, or confidently states falsehoods.
Real examples:
Guardrails:
Risk: Agent generates hate speech, violence, self-harm instructions, or NSFW content.
Guardrails:
Risk: Agent exposes PII, proprietary data, API keys, or confidential information.
Guardrails:
Risk: Users trick the agent into ignoring instructions or performing unauthorized actions.
Examples:
Guardrails:
Risk: Agent sounds unprofessional, uses wrong terminology, or contradicts brand values.
Guardrails:
Check user input before it reaches the AI.
def validate_input(user_message):
# Check length
if len(user_message) > 5000:
return False, "Message too long"
# Detect jailbreak attempts
if detect_jailbreak(user_message):
return False, "Prohibited content"
# Spam/abuse detection
if is_spam(user_message):
return False, "Spam detected"
return True, None
Instruct the model on safety boundaries.
You are a customer support assistant for Acme Corp.
RULES:
1. Never share PII or internal data
2. If you don't know, say "I don't know" — never guess
3. For refunds >$100, say "Let me transfer you to a specialist"
4. Maintain professional, friendly tone
5. Cite the knowledge base when answering policy questions
Check AI response before showing to user.
def filter_output(ai_response):
# Content moderation
moderation = openai.Moderation.create(input=ai_response)
if moderation['results'][0]['flagged']:
return "I apologize, I can't provide that information."
# PII detection and redaction
ai_response = redact_pii(ai_response)
# Check for prohibited phrases
if contains_prohibited(ai_response):
return "I apologize, I can't provide that information."
return ai_response
For agents that take actions (send emails, charge payments, modify data), add approval gates.
def execute_action(action):
# High-risk actions require human approval
if action['type'] == 'refund' and action['amount'] > 100:
return request_human_approval(action)
# Spending limits
if action['type'] == 'purchase' and action['amount'] > 1000:
return "Exceeds authorization limit"
# Dry run for destructive actions
if action['type'] == 'delete':
log_action(action)
return confirm_deletion(action)
return perform_action(action)
| Tool | Purpose | Pricing |
|---|---|---|
| OpenAI Moderation API | Content filtering (hate, violence, sexual) | Free |
| Guardrails AI | Schema validation, PII detection, custom rules | Open-source + paid |
| NeMo Guardrails (NVIDIA) | Programmable guardrails, safety rails | Open-source |
| Lakera Guard | Prompt injection detection, jailbreak prevention | $99+/month |
| Azure AI Content Safety | Multi-category safety, custom blocklists | $1-4/1K texts |
| AWS Comprehend PII | PII detection and redaction | $0.0001/unit |
| Anthropic Claude Safety | Built-in constitutional AI safety | Included |
Have humans or automated systems try to break your guardrails.
Test cases:
Track in production:
Yes, but minimally. Input validation adds <10ms. Output filtering (moderation API) adds 100-300ms. For most applications, this is acceptable. For latency-critical use cases, run guardrails async and show response immediately with post-hoc review.
Start strict, then relax based on data. High false positive rate (blocking legitimate requests) hurts UX. Monitor blocked responses and adjust thresholds. Use confidence scores — block only high-confidence violations.
No system is perfect. Determined attackers will find edge cases. That's why you need: (1) multiple layers, (2) continuous monitoring, (3) rapid response to new attacks, (4) user reporting mechanisms. Treat guardrails like security — ongoing work, not one-time fix.
Use free/open-source for common cases (OpenAI Moderation, Guardrails AI). Buy specialized tools for complex needs (Lakera for prompt injection, enterprise content safety). Build custom rules for domain-specific requirements (industry terminology, company policies).
Have an incident response plan: (1) Kill switch to disable AI immediately, (2) Fallback to human agents, (3) Postmortem and patch, (4) User notification if needed. Monitor social media and support tickets for reports of AI misbehavior.
Guardrails aren't about limiting AI — they're about deploying it responsibly. The more autonomy you give your AI agents, the more critical safety mechanisms become.
Start with basics: Content filtering, PII detection, output validation. These cover 80% of risks.
Layer defenses: Input validation → prompt engineering → output filtering → action approval.
Test continuously: Red team, monitor, and iterate. Attackers evolve; your guardrails must too.
At Propelius Technologies, we build AI agents with safety and compliance baked in. Get in touch to discuss building responsible AI for your business.
Need an expert team to provide digital solutions for your business?
Book A Free CallDive into a wealth of knowledge with our unique articles and resources. Stay informed about the latest trends and best practices in the tech industry.
View All articlesTell us about your vision. We'll respond within 24 hours with a free AI-powered estimate.
© 2026 Propelius Technologies. All rights reserved.