Skip to content

dcrearer/docint

Repository files navigation

Document Intelligence Service

Last updated: 2026-06-07

A production-grade RAG (Retrieval-Augmented Generation) system built with:

  • Rust Lambda functions for vector search (5-10x faster than Python)
  • AgentCore Runtime hosting a Claude-powered agent (container deployment, ARM64)
  • AgentCore Gateway exposing Lambda functions as MCP tools (SigV4 auth)
  • PostgreSQL + pgvector for hybrid vector + full-text search
  • S3 auto-ingest — drop a file, it's automatically chunked, embedded, and searchable
  • Cognito authentication — CLI sign-up/login with tenant isolation via Cognito sub (UUID)

Architecture

Project Structure

docint/
├── crates/                     # Rust workspace
│   ├── docint-core/            # Shared library
│   │   ├── src/
│   │   │   ├── chunker.rs      # Semantic text chunking
│   │   │   ├── db.rs           # Connection pool + RLS tenant context
│   │   │   ├── embeddings.rs   # Bedrock Titan v2 embeddings
│   │   │   ├── models.rs       # Data types (Document, Chunk, SearchResult)
│   │   │   └── store.rs        # Vector store (similarity, hybrid, RRF)
│   │   └── Cargo.toml
│   ├── lambda-search/          # Hybrid search Lambda
│   ├── lambda-metadata/        # Document metadata Lambda
│   ├── lambda-compare/         # Document comparison Lambda
│   ├── lambda-ingest/          # S3 ingestion pipeline Lambda (S3 events + direct invoke)
│   └── docint-cli/             # Rust CLI client for querying the agent
│       └── src/
│           ├── main.rs         # CLI entry point, chat REPL
│           └── auth.rs         # Cognito auth (login, signup, token cache)
├── agent/
│   ├── agent.py                # Strands agent (Claude Haiku + Gateway tools via mcp-proxy-for-aws)
│   ├── Dockerfile              # ARM64 container for AgentCore Runtime
│   └── requirements.txt
├── infrastructure/             # CDK (Python)
│   ├── app.py                  # 6 stacks wired together
│   └── stacks/
│       ├── auth_stack.py       # Cognito User Pool + auto-confirm trigger
│       ├── database_stack.py   # Aurora Serverless v2 + VPC endpoints
│       ├── lambda_stack.py     # 4 Lambdas (VPC, IAM) + S3 bucket with event triggers
│       ├── gateway_stack.py    # AgentCore Gateway + MCP tool targets
│       ├── agent_stack.py      # AgentCore Runtime (container) + Endpoint + Memory
│       └── monitoring_stack.py # CloudWatch dashboard + alarms
├── docs/ 
│   └── architecture.png        # Architecture diagram
├── migrations/                 # SQL migrations (sqlx)
├── local/                      # Podman compose + test events
├── .github/workflows/ci.yml   # CI/CD pipeline
├── Cargo.toml                  # Workspace config + release profile
└── Dockerfile.lambda           # Container-based cross-compile

Prerequisites

  • Rust 1.75+
  • Python 3.11+
  • Podman & Podman Compose (for local development and testing)
  • AWS CLI v2 (configured with Bedrock access)
  • cargo-lambda (cargo install cargo-lambda)
  • cargo-llvm-cov (cargo install cargo-llvm-cov) - for test coverage reports
  • AWS CDK (npm install -g aws-cdk)

Quick Start — Query the Agent

# Set environment variables (from CDK outputs)
export DOCINT_RUNTIME_ARN="arn:aws:bedrock-agentcore:us-east-1:<ACCOUNT_ID>:runtime/<RUNTIME_ID>"
export DOCINT_CLIENT_ID="<COGNITO_CLIENT_ID>"

# Launch interactive chat (sign up or login on first run)
cargo run --bin docint-cli

On first run you'll see:

Welcome to docint

> Login
  Sign up
  Quit

Sign up creates an account and logs you in immediately. Your tenant ID is your Cognito sub (UUID) — unique, collision-free, and used for all data isolation.

Chat commands:

  • Type a question to query the agent
  • logout — clear cached credentials and exit
  • quit / exit — exit

Ingest Documents

Upload files to the S3 bucket — they're automatically ingested. Tenant is derived from the S3 key prefix, which should be your Cognito sub (UUID shown at login):

# Your Cognito sub is shown at login: ✓ Logged in as alice (tenant: a1b2c3d4-...)
# Use it as the S3 key prefix:
aws s3 cp my-notes.md s3://docint-docs-<ACCOUNT_ID>/a1b2c3d4-e5f6-7890-abcd-ef1234567890/docs/

# Sync an entire directory (only .md files)
aws s3 sync ./my-docs/ s3://docint-docs-<ACCOUNT_ID>/a1b2c3d4-e5f6-7890-abcd-ef1234567890/ --exclude "*" --include "*.md"

# TIP: Use --dryrun to preview what would be uploaded before syncing
aws s3 sync ./my-docs/ s3://docint-docs-<ACCOUNT_ID>/a1b2c3d4-.../ --exclude "*" --include "*.md" --dryrun

Useful sync options:

  • --dryrun - Preview changes without uploading (recommended for large syncs)
  • --delete - Remove files in S3 that don't exist locally (use with caution)
  • --exclude "*" --include "*.md" - Filter by file extension

Supported formats: .txt .md .csv .json .html .xml .yaml .yml .log .rst

Testing

The project has comprehensive test coverage (~70%) across Rust and Python code.

Quick Test Commands

# Rust: Run all unit tests (no database needed)
cargo test --workspace --lib

# Rust: Run integration tests (requires test database)
podman-compose -f docker-compose.test.yml up -d
cargo test --workspace --test '*' -- --ignored

# Rust: Generate coverage report
cargo llvm-cov --workspace --html --open

# Python: Run agent unit tests
cd agent
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements-dev.txt
pytest test_agent.py -v

# Python: Run with coverage
pytest test_agent.py --cov=agent --cov-report=html

Test Categories

Type Count Command Dependencies
Rust Unit 15 CLI tests cargo test --bin docint-cli None
Rust Integration 40 tests cargo test --workspace --test '*' -- --ignored Test DB (podman)
Python Unit 12 tests cd agent && pytest test_agent.py -v venv + requirements-dev.txt
Total 67 tests cargo test --workspace && cd agent && pytest Both

Test Database Setup

Integration tests use a dedicated PostgreSQL instance:

# Start test database (PostgreSQL 16 + pgvector on port 5433)
podman-compose -f docker-compose.test.yml up -d

# Each test creates a unique database: docint_test_<uuid>
# Tests run in parallel with full isolation
# Non-privileged test_user enforces RLS policies

# Stop test database
podman-compose -f docker-compose.test.yml down

What's Tested

Rust (docint-core + lambdas):

  • ✅ RLS tenant isolation (7 tests)
  • ✅ Store business logic: search, RRF, insert, metadata (22 tests)
  • ✅ Lambda handlers: search, metadata, compare (18 tests)
  • ✅ Embeddings: JSON, dimension validation, Unicode (8 tests)
  • ✅ CLI: tool call parsing, SSE extraction, XML filtering (15 tests)

Python (agent):

  • ✅ TenantInjectorMCPClient: injection, delegation, isolation (9 tests)
  • ✅ System prompt: parameter documentation, workflow guidance (3 tests)

Coverage: ~70% overall (85% core business logic)

See docs/TEST-COVERAGE-FINAL.md for detailed breakdown.

Local Development

# 1. Start PostgreSQL (for local dev)
cd local && podman-compose up -d

# 2. Run migrations
export DATABASE_URL="postgres://docint:docint_local@localhost:5432/docint"
sqlx migrate run --source migrations

# 3. Build
cargo build --workspace

# 4. Test the agent locally (requires AWS credentials)
cd agent && python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
export GATEWAY_URL="https://<GATEWAY_ID>.gateway.bedrock-agentcore.us-east-1.amazonaws.com/mcp"
export MODEL_ID="us.anthropic.claude-haiku-4-5-20251001-v1:0"
python3 -c "from agent import invoke; print(invoke({'prompt':'What documents do you have?','tenant_id':'<YOUR_COGNITO_SUB>'}))"

# 5. Test a Lambda locally
cargo lambda watch --invoke-port 9001 &
cargo lambda invoke lambda-search \
  --data-file local/test-events/search.json \
  --invoke-port 9001

Deployment

First-time setup

# 1. Bootstrap CDK (if not done)
cd infrastructure
cdk bootstrap

# 2. Deploy GitHub OIDC role (update OWNER/docint in the file first)
cdk deploy -a "python3 bootstrap_github_oidc.py"

# 3. Add the output role ARN as GitHub secret: AWS_DEPLOY_ROLE_ARN

Deploy via CI/CD

Push to main triggers the GitHub Actions pipeline:

  1. Testcargo test --workspace --lib (unit tests) + cargo test --workspace --test '*' -- --ignored (integration tests)
  2. Buildcargo lambda build --release --arm64 + Docker image (QEMU cross-compile)
  3. Deploycdk deploy --all

Manual deploy

cargo lambda build --release --arm64 --workspace

cd infrastructure
source .venv/bin/activate
pip install -r requirements.txt
cdk deploy --all

Key Design Decisions

  • Hybrid search with RRF — combines vector similarity and PostgreSQL full-text search for better recall than either alone
  • RRF stands for Reciprocal Rank Fusion — combines two different search strategies into one ranked result set.
  • Row-level security (RLS) — PostgreSQL enforces tenant isolation at the DB level, not just application code. Transaction-scoped set_config('app.tenant_id', ...) prevents cross-tenant data leaks even with connection pooling.
  • Comprehensive test coverage — 52 tests (12 unit + 40 integration) covering ~70% of codebase, including full verification of RLS tenant isolation
  • VPC endpoints instead of NAT — saves ~$32/month while keeping Lambdas in isolated subnets
  • OnceCell for Lambda state — DB pool and embedder initialize once on cold start, reused across invocations
  • Standalone Embedder — decoupled from VectorStore so ingestion and search can share it independently
  • Container-based agent — ARM64 Docker image for AgentCore Runtime avoids cold start timeouts
  • SigV4 Gateway authmcp-proxy-for-aws handles IAM signing for MCP connections
  • Claude Haiku — faster responses (~5-7s) vs Sonnet (~15s) for interactive use
  • S3 event-driven ingestion — upload a file, it's automatically chunked, embedded, and stored
  • Tenant-from-key derivation — S3 key prefix (e.g., a1b2c3d4-.../docs/file.md) determines tenant_id automatically
  • Conversational memory — AgentCore Memory with semantic, summary, and user preference strategies for cross-session recall
  • Cognito sub as tenant ID — UUID assigned by Cognito on sign-up, guarantees no collision, used for RLS and data isolation
  • Auto-confirm Lambda trigger — skips email verification for fast sign-up; easily reverted to email verification for production

TODO

Conversational Memory (AgentCore Memory)

  • Infrastructure: add AgentCore Memory resource to agent_stack.py (semantic strategy, 30-day event expiry)
  • Infrastructure: pass MEMORY_ID env var to agent container + IAM permissions for memory data plane
  • Agent: integrate AgentCoreMemorySessionManager with Strands agent
  • Agent: handle empty memory (first conversation) and write failures gracefully
  • CLI: add --chat flag for interactive REPL mode with streaming responses
  • CLI: add --actor flag, generate session_id per session, include both in payload
  • Docs: update README with --chat usage, MEMORY_ID env var
  • Test: verify cross-session memory retrieval, tenant isolation, failure resilience

Authentication

  • Cognito User Pool with self-signup and auto-confirm trigger
  • CLI auth module (login, signup, logout, token cache)
  • Tenant ID derived from Cognito sub (UUID)

Ingestion

  • S3 auto-ingest: derive tenant_id from S3 key prefix (supports UUIDs and legacy tenant names)

Testing

  • Integration test infrastructure (unique DB per test, RLS enforcement)
  • RLS tenant isolation tests (7 tests covering P0 security fix)
  • Business logic tests (15 tests for store.rs: insert, search, RRF, metadata)
  • Lambda handler tests (18 tests for search, metadata, compare handlers)
  • Embeddings unit tests (8 tests for JSON serialization, dimension validation)
  • Test coverage: ~70% overall, ~85% for core business logic
  • Auth module tests (requires Cognito mocking)
  • End-to-end smoke tests in staging environment

Performance

  • Tighter system prompt to reduce output tokens
  • Single-turn tool use (skip tool selection LLM round-trip)
  • Reduce search results returned to minimize synthesis input
  • Evaluate Amazon Nova Micro/Lite as faster model alternatives
  • Lambda-based agent (eliminate AgentCore Runtime + MCP Gateway overhead)
  • Query response cache (DynamoDB/ElastiCache)

Cost Estimate (Demo)

Component Monthly
Aurora Serverless v2 (min capacity) ~$15
Lambda (1K invocations) ~$0.50
AgentCore Runtime (1K) ~$2
Bedrock Claude Haiku (1K conversations) ~$1
Bedrock Embeddings ~$0.30
VPC Endpoints ~$21
S3 (document storage) ~$0.02
Cognito (first 10K MAUs free) $0
Total ~$40

About

Production-grade RAG system built with Rust Lambdas, Aurora Serverless (pgvector), and a Claude-powered Strands agent on AWS AgentCore. Features hybrid vector + full-text search with RRF, S3 auto-ingest pipeline, row-level tenant isolation via Cognito, and conversational memory — deployed with CDK

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors