Agora RTC Lambda Functions

AWS Lambda functions for Agora RTC token generation and ConvoAI agent management to be used in conjunction with the Agora telephony/SIP gateway.

Overview

This repo contains two independent Lambda functions:

Lambda	File	Purpose
PSTN CallLookup	`token_gen.py`	Returns an RTC token + channel for inbound PSTN calls
ConvoAI Agent Launcher	`launch_agent.py`	Generates tokens, launches/hangs-up Agora ConvoAI agents

Both share the same v007 token generation code but serve different use cases.

token_gen.py — PSTN CallLookup

Handles the Agora PSTN gateway CallLookup webhook. When an inbound phone call arrives, the gateway POSTs caller information and this Lambda responds with an RTC token and channel name so the gateway can connect the caller.

How It Works

PSTN gateway sends POST {did, pin, callerid}
Lambda generates a random 10-character channel name
Builds a v007 RTC token (or uses APP_ID if no certificate)
Returns the CallLookup response so the gateway joins the caller to the channel

Environment Variables (token_gen)

Required

APP_ID=your_agora_app_id

Optional

APP_CERTIFICATE=your_agora_app_certificate   # omit to use APP_ID as token
USER_UID=101                                  # default: "101"
AUDIO_SCENARIO=0                              # default: "0"
WEBHOOK_URL=https://example.com/webhook       # included in response if set
SDK_OPTIONS={"key":"value"}                   # included in response if set

Request / Response

Request (POST from PSTN gateway):

{
  "did": "17177440111",
  "pin": "",
  "callerid": "1765740333"
}

Response:

{
  "token": "007eJxT...",
  "uid": "101",
  "channel": "A1B2C3D4E5",
  "appid": "your_app_id",
  "audio_scenario": "0"
}

Optional fields webhook_url and sdk_options are included when the corresponding environment variables are set.

Lambda Configuration (token_gen)

Setting	Value
Handler	`token_gen.lambda_handler`
Timeout	10 seconds
Memory	128 MB

launch_agent.py — ConvoAI Agent Launcher

Launches and manages Agora Conversational AI agents with configurable TTS, STT, and LLM providers.

Features

Multi-vendor TTS support: Rime, ElevenLabs, OpenAI, Cartesia
Multi-vendor STT support: Ares (Agora built-in), Deepgram
Flexible LLM backend: Any OpenAI-compatible API
Profile-based configuration: Support multiple agent configurations via profiles
Token-only mode: Generate tokens without starting an agent
Agent lifecycle management: Join and hangup capabilities
RTM support: Real-time messaging integration

Supported Providers

Text-to-Speech (TTS)

1. Rime

TTS_VENDOR=rime
RIME_API_KEY=your_api_key
RIME_SPEAKER=astra (default)
RIME_MODEL_ID=mistv2 (default)
RIME_LANG=eng (default)
RIME_SAMPLING_RATE=16000 (default)
RIME_SPEED_ALPHA=1.0 (default)

2. ElevenLabs

TTS_VENDOR=elevenlabs
TTS_KEY=your_api_key
TTS_VOICE_ID=your_voice_id
TTS_VOICE_STABILITY=1 (default: 0-1)
TTS_VOICE_SAMPLE_RATE=24000 (default)

3. OpenAI

TTS_VENDOR=openai
TTS_KEY=your_api_key
TTS_VOICE_ID=alloy|echo|fable|onyx|nova|shimmer
TTS_VOICE_SPEED=1.0 (default: 0.25-4.0)

4. Cartesia

TTS_VENDOR=cartesia
CARTESIA_API_KEY=your_api_key
CARTESIA_MODEL=sonic-3 (default)
CARTESIA_VOICE_ID=your_voice_id
CARTESIA_SAMPLE_RATE=24000 (default)

Speech-to-Text (STT/ASR)

Ares (default) — Agora's built-in ASR, no API key required:

ASR_VENDOR=ares (default)
ASR_LANGUAGE=en-US (default)

Deepgram

ASR_VENDOR=deepgram
DEEPGRAM_KEY=your_api_key
DEEPGRAM_MODEL=nova-3 (default)
DEEPGRAM_LANGUAGE=en (default)

Large Language Model (LLM)

Any OpenAI-compatible API:

LLM_URL=https://api.openai.com/v1/chat/completions
LLM_API_KEY=your_api_key
LLM_MODEL=gpt-4o-mini

Environment Variables (launch_agent)

Required

APP_ID=your_agora_app_id
LLM_URL=your_llm_endpoint
LLM_API_KEY=your_llm_api_key
LLM_MODEL=your_model_name

Authentication (one of the following)

# Option 1: APP_CERTIFICATE (recommended)
# Generates v007 tokens for both API auth and channel join.
# API calls use "agora token=<v007_token>" authorization.
APP_CERTIFICATE=your_agora_app_certificate

# Option 2: AGENT_AUTH_HEADER (Basic auth)
# Uses Basic auth for API calls, APP_ID as channel join token.
AGENT_AUTH_HEADER=Basic <base64_key:secret>

If both are set, AGENT_AUTH_HEADER takes priority for API auth. If neither is set, API calls will fail (APP_ID alone is not valid for API auth).

TTS Configuration

See provider-specific settings above.

STT Configuration

# Default: Ares (no API key needed)
ASR_VENDOR=ares

# Or use Deepgram:
ASR_VENDOR=deepgram
DEEPGRAM_KEY=your_deepgram_key
DEEPGRAM_MODEL=nova-3
DEEPGRAM_LANGUAGE=en

Optional Settings

# Agent Behavior
DEFAULT_PROMPT="Your custom system prompt"
DEFAULT_GREETING="hi there"
DEFAULT_FAILURE_MESSAGE="An error occurred, please try again later"
DEFAULT_MAX_HISTORY=32

# Voice Activity Detection
VAD_SILENCE_DURATION_MS=300

# Advanced Features
ENABLE_BHVS=true
ENABLE_RTM=true
ENABLE_AIVAD=true
ENABLE_ERROR_MESSAGE=true

# Agent Settings
IDLE_TIMEOUT=120

# Optional Graph ID
GRAPH_ID=your_graph_id

API Usage

Base URL

https://your-lambda-url.amazonaws.com/your-stage/

1. Launch Agent (Standard)

GET /?channel=my_channel

# Optional parameters:
# - profile: Configuration profile to use
# - prompt: Override system prompt
# - greeting: Override greeting message
# - tts_vendor: rime|elevenlabs|openai|cartesia
# - voice_id: TTS voice identifier
# - llm_model: Override LLM model
# - debug: Include debug information

Response:

{
  "audio_scenario": "10",
  "token": "user_rtc_token",
  "uid": "101",
  "channel": "my_channel",
  "appid": "your_app_id",
  "user_token": {
    "token": "user_rtc_token",
    "uid": "101"
  },
  "agent_video_token": {
    "token": "agent_video_rtc_token",
    "uid": "102"
  },
  "agent": {
    "uid": "100"
  },
  "agent_rtm_uid": "100-my_channel",
  "enable_string_uid": false,
  "agent_response": {
    "status_code": 200,
    "response": "{...}",
    "success": true
  }
}

2. Token-Only Mode (No Agent Launch)

GET /?connect=false

# Optional:
# - channel: Specify channel (auto-generated if omitted)
# - profile: Configuration profile

Response:

{
  "audio_scenario": "10",
  "token": "user_rtc_token",
  "uid": "101",
  "channel": "AUTOGEN123",
  "appid": "your_app_id",
  "user_token": {
    "token": "user_rtc_token",
    "uid": "101"
  },
  "agent_video_token": {
    "token": "agent_video_rtc_token",
    "uid": "102"
  },
  "agent": {
    "uid": "100"
  },
  "agent_rtm_uid": "100-AUTOGEN123",
  "enable_string_uid": false,
  "token_generation_method": "RTC tokens with privileges",
  "agent_response": {
    "status_code": 200,
    "response": "{\"message\":\"Token-only mode...\"}",
    "success": true
  }
}

3. Hangup Agent

GET /?hangup=true&agent_id=your_agent_id

# Required:
# - agent_id: ID of the agent to disconnect

Response:

{
  "agent_response": {
    "status_code": 200,
    "response": "{...}",
    "success": true
  }
}

4. Debug Mode

GET /?debug=true&channel=my_channel
GET /?debug=true&env_debug=true  # Show environment variables

UID Structure

User UID: "101" — For end-user RTC connection
Agent UID: "100" — For AI agent audio
Agent Video UID: "102" — For agent video stream (if applicable)
String UIDs: Disabled by default (enable_string_uid: false)

Advanced Features

RTM (Real-Time Messaging)

Enable text chat alongside voice:

ENABLE_RTM=true

AI VAD (Voice Activity Detection)

AI-powered voice activity detection:

ENABLE_AIVAD=true

Behaviors (BHVS)

Enable agent behavior extensions:

ENABLE_BHVS=true

Error Messages

Return error messages to users:

ENABLE_ERROR_MESSAGE=true

Profile-Based Configuration

Use profile suffix to override defaults for specific use cases:

# Default configuration
LLM_MODEL=gpt-4o-mini
DEFAULT_GREETING="Hi there"

# Profile-specific (accessed via ?profile=premium)
LLM_MODEL_premium=gpt-4o
DEFAULT_GREETING_premium="Welcome, premium user"

Example Configurations

ElevenLabs + Deepgram + OpenAI

TTS_VENDOR=elevenlabs
TTS_KEY=sk_...
TTS_VOICE_ID=cgSgspJ2msm6clMCkdW9

ASR_VENDOR=deepgram
DEEPGRAM_KEY=...
DEEPGRAM_MODEL=nova-3

LLM_URL=https://api.openai.com/v1/chat/completions
LLM_API_KEY=sk-...
LLM_MODEL=gpt-4o-mini

Rime + Deepgram + Custom LLM

TTS_VENDOR=rime
RIME_API_KEY=...
RIME_SPEAKER=astra
RIME_MODEL_ID=mistv2

ASR_VENDOR=deepgram
DEEPGRAM_KEY=...

LLM_URL=https://your-llm-endpoint.com/v1/chat/completions
LLM_API_KEY=...
LLM_MODEL=your-custom-model

Cartesia + Deepgram + OpenAI

TTS_VENDOR=cartesia
CARTESIA_API_KEY=...
CARTESIA_MODEL=sonic-3
CARTESIA_VOICE_ID=...
CARTESIA_SAMPLE_RATE=24000

ASR_VENDOR=deepgram
DEEPGRAM_KEY=...
DEEPGRAM_MODEL=nova-3

LLM_URL=https://api.openai.com/v1/chat/completions
LLM_API_KEY=sk-...
LLM_MODEL=gpt-4o-mini

Lambda Configuration (launch_agent)

Setting	Value
Handler	`launch_agent.lambda_handler`
Timeout	30 seconds
Memory	256 MB
CORS	Enable for browser clients

Token Generation

Both Lambdas use v007 service-based tokens.

With APP_CERTIFICATE

Generates v007 tokens with RTC privileges:

RTC Service: JOIN_CHANNEL, PUBLISH_AUDIO/VIDEO/DATA_STREAM privileges
RTM Service (launch_agent only): LOGIN privilege with separate RTM UID ({agent_uid}-{channel})

token_gen.py generates RTC-only tokens (PSTN callers don't use RTM). launch_agent.py generates tokens with both RTC and RTM services.

Token expires in 24 hours.

Without APP_CERTIFICATE

Returns APP_ID as token for channel join (testing mode). For launch_agent.py, this requires AGENT_AUTH_HEADER for API authentication.

Troubleshooting

Agent doesn't join channel (launch_agent)

Verify either APP_CERTIFICATE or AGENT_AUTH_HEADER is set
Check APP_ID matches your Agora project
Ensure Lambda has internet access (VPC configuration)

No audio from agent (launch_agent)

Verify TTS provider credentials
Check TTS_VENDOR matches your configuration
Review CloudWatch logs for TTS errors

Speech recognition not working (launch_agent)

Verify Deepgram API key
Check microphone permissions on client side
Ensure audio is being sent to channel

PSTN caller not connecting (token_gen)

Verify the PSTN gateway is configured to POST to your Lambda URL
Check APP_ID is correct
Review CloudWatch logs for the CallLookup request

Token errors

Verify APP_CERTIFICATE is correct (must be 32-character hex)
Check token hasn't expired (24h default)
Ensure UID matches between client and token

License

See repository license.

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
old		old
README.md		README.md
launch_agent.py		launch_agent.py
token_gen.py		token_gen.py

Folders and files

Latest commit

History

Repository files navigation

Agora RTC Lambda Functions

Table of Contents

Overview

token_gen.py — PSTN CallLookup

How It Works

Environment Variables (token_gen)

Required

Optional

Request / Response

Lambda Configuration (token_gen)

launch_agent.py — ConvoAI Agent Launcher

Features

Supported Providers

Text-to-Speech (TTS)

Speech-to-Text (STT/ASR)

Large Language Model (LLM)

Environment Variables (launch_agent)

Required

Authentication (one of the following)

TTS Configuration

STT Configuration

Optional Settings

API Usage

Base URL

1. Launch Agent (Standard)

2. Token-Only Mode (No Agent Launch)

3. Hangup Agent

4. Debug Mode

UID Structure

Advanced Features

RTM (Real-Time Messaging)

AI VAD (Voice Activity Detection)

Behaviors (BHVS)

Error Messages

Profile-Based Configuration

Example Configurations

ElevenLabs + Deepgram + OpenAI

Rime + Deepgram + Custom LLM

Cartesia + Deepgram + OpenAI

Lambda Configuration (launch_agent)

Token Generation

With APP_CERTIFICATE

Without APP_CERTIFICATE

Troubleshooting

Agent doesn't join channel (launch_agent)

No audio from agent (launch_agent)

Speech recognition not working (launch_agent)

PSTN caller not connecting (token_gen)

Token errors

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages