Deep Dive

Chatbot Architecture

How modern AI chatbots are built - from simple Q&A bots to fully autonomous agents that take actions on behalf of users.

3
Architecture Types
6
Processing Steps
Channel Support
Foundations

Three Types of AI Chatbots

Not all chatbots are created equal. Understanding these three architectures helps you choose the right approach for your needs.

Basic LLM Chatbot

Direct conversation with an LLM. Simple to set up but limited to the model's training data.

Fast to implement
Good for general Q&A
No custom knowledge
Can't take actions

RAG Chatbot

Popular

LLM enhanced with your documents. Answers based on your actual content.

Uses your knowledge base
Accurate, sourced answers
Easy to update content
Read-only (no actions)

Agent Chatbot

Full autonomy with function calling. Can retrieve data AND take actions.

Books, updates, creates
Accesses live systems
Full task automation
More complex setup
FeatureBasic LLMRAG ChatbotAgent Chatbot
Knowledge source
Where the bot gets information
LLM training data onlyYour documents + LLMDocuments + External systems
Accuracy on your data
Answers about your specific business
Low - often wrongHigh - from your docsHigh - plus real-time data
Can take actions
Book, update, create, etc.
Setup complexity
Time and effort to implement
LowMediumHigh
Best for
Ideal use cases
General Q&AKnowledge basesFull automation
Core Loop

The Conversation Processing Loop

Every message flows through this pipeline. Understanding each step helps you optimize performance and debug issues.

1~10ms

Chatbot Architecture | How It Works

User sends a message via any channel

Details

The system receives a message from WhatsApp, web chat, SMS, or any connected channel. Messages are normalized into a standard format regardless of source.

Example

WhatsApp: "Hi, I need to check my order status for order #12345"
2~50ms

2. Context Loading

Retrieve conversation history and user data

Details

The system loads previous messages from the current session, the user's profile data, and any relevant metadata. This context helps the LLM understand the full situation.

Example

Load: Last 10 messages, user profile, past orders
3~100ms

3. RAG Retrieval

Search knowledge base for relevant info

Details

If RAG is enabled, the user's message is embedded and used to search the vector database for relevant documents. The top matches are retrieved to provide context for the response.

Example

Retrieved: Order tracking FAQ, Shipping policies doc
4~300ms

4. LLM Processing

Generate response with all context

Details

The LLM receives: system prompt, conversation history, RAG context, user profile, and the current message. It generates a response and may trigger function calls.

Example

Function call: get_order_status(order_id: "12345")
5~200ms

5. Action Execution

Run function calls and get results

Details

If the LLM requested function calls, they're executed against your backend systems. Results are fed back to the LLM to incorporate into the response.

Example

Result: Order shipped, arriving Jan 18th
6~50ms

6. Response & Storage

Send response and save to memory

Details

The final response is sent back to the user via their original channel. The conversation is saved to memory for future context.

Example

"Your order #12345 shipped and will arrive January 18th!"

Total Processing Time: ~700ms

A well-optimized chatbot can process messages in under a second. The LLM step is typically the slowest, but streaming responses make the experience feel faster.

Channels

Multi-Channel Architecture

Modern chatbots meet customers where they are. A unified architecture handles WhatsApp, web chat, SMS, and more through a single brain.

WhatsApp

Rich media, quick replies

Web Widget

Embedded on your site

SMS

Universal reach

Messenger

Facebook integration

Unified conversation history

Start on WhatsApp, continue on web - context preserved

Single knowledge base

Update once, deploy everywhere

Channel-specific features

Rich media on WhatsApp, buttons on web

Channel Configuration Example

chatbot-config.json
1{
2 "channels": {
3 "whatsapp": {
4 "enabled": true,
5 "provider": "twilio",
6 "features": ["rich_media", "quick_replies", "location"]
7 },
8 "web_widget": {
9 "enabled": true,
10 "customization": {
11 "primary_color": "#1a73e8",
12 "position": "bottom-right"
13 }
14 },
15 "sms": {
16 "enabled": true,
17 "provider": "twilio",
18 "features": ["text_only"]
19 },
20 "facebook_messenger": {
21 "enabled": true,
22 "features": ["rich_media", "quick_replies", "persistent_menu"]
23 }
24 },
25 "unified_inbox": true,
26 "conversation_continuity": true
27}
Escalation

Human Handoff: When AI Steps Aside

The best chatbots know their limits. Seamless handoff to human agents ensures complex issues get proper attention.

Total: ~101ms

The AI recognizes triggers indicating a human should take over:

  • • User explicitly requests human ("talk to a person")
  • • Sentiment analysis detects frustration
  • • Complex issue outside AI's scope
  • • Multiple failed resolution attempts

AI generates a summary for the human agent:

Handoff Summary:

Customer frustrated about delayed order #12345. Shipping shows "in transit" for 5 days. Wants refund or express reshipping. High-value customer (3 previous orders).

Available agent receives notification with full context.

Routed based on: skill, availability, workload

Customer is smoothly connected to agent:

"I'm connecting you with Sarah from our support team. She has your order details and will help resolve this right away."

Automatic Escalation Triggers

  • • User explicitly requests human
  • • Negative sentiment detected (3+ messages)
  • • Issue marked as "high complexity"
  • • Failed to resolve after 3 attempts
  • • VIP customer flag
  • • Legal/compliance keywords detected

Handoff Best Practices

  • • Generate conversation summary
  • • Include customer sentiment score
  • • Pass user profile and history
  • • Skill-based agent routing
  • • Clear transition message to user
  • • Allow agent to see full transcript
Security

Security Considerations

Chatbots handle sensitive customer data. A secure architecture protects both your users and your business.

PII Protection

Automatically detect and redact credit card numbers, SSNs, and other sensitive data before it reaches the LLM. Never store raw PII in conversation logs.

Prompt Injection Defense

Users may try to manipulate the chatbot with crafted prompts. Implement input sanitization, output validation, and system prompt protection layers.

Data Handling

Encrypt all data in transit (TLS) and at rest. Implement data retention policies. Provide data export and deletion for GDPR compliance.

Security Features Comparison

FeatureEnterpriseStandardBasic
PII redaction
Automatically mask sensitive data
Prompt injection protection
Block manipulation attempts
Data encryption (at rest)
Encrypted storage
Data encryption (in transit)
TLS/SSL connections
Audit logging
Complete conversation logs
Role-based access
Control who sees what

Enterprise solutions include comprehensive security features. For compliance-heavy industries, these are non-negotiable.

Full Picture

Complete Architecture Overview

How all the pieces fit together in a production chatbot system.

Channels

WhatsApp
Web Chat
SMS

Core System

Message Router
LLM Engine
RAG Pipeline
Function Handler

Integrations

Vector DB
CRM/ERP
Help Desk

← Messages flow bidirectionally between all components →

Ready to Build Your Chatbot?

We architect and build production-ready chatbots that scale. From simple RAG assistants to full agent systems with deep integrations.