In a large NBFC, a Regional Business Head or Zonal Head manages thousands of loan accounts across dozens of branches and field executives. Every morning, the same questions come up:
"Which accounts should my team prioritize today?" "Which executive is underperforming and why?" "How many accounts from last year's advances have gone delinquent?" "Show me customers who haven't paid in 3 months in the Western region."
Getting answers meant raising a request to an analyst, waiting for a report, and then asking a follow-up. The cycle repeated for every business question, every day.
Leaders were dependent on coordinators and analysts for information that should have been at their fingertips.
CollectionIQ is a self-serve portfolio intelligence platform built for collection leaders. Upload your monthly LCC Excel extract and the entire portfolio becomes queryable, visual, and explainable, without writing a single formula or waiting for a report.
It answers questions in plain English, surfaces risks automatically, and generates board-ready analysis on demand.
Live Link : collectioniq.streamlit.app
No data? No setup? Click Fill Sample Data on the landing page. It fetches a real LCC extract directly from this repository and builds the full dashboard instantly. No file upload, no API key, no configuration needed.
Landing Page - Upload regional files or load sample data instantly
Dashboard - KPIs and portfolio health with Month-on-Month movement
Portfolio Analysis - DPD bucket distribution, branch collection %, arrears exposure
|
Executive Scorecard - Every field executive ranked by collection %, strike rate, NPA count and roll rates |
Smart Alerts - 6 automatic risk flags with SOH exposure and recommended actions |
|
Bucket Migration - Roll-forward / roll-backward rates and the prev-month → curr-month migration matrix |
Monthly Portfolio Intelligence Report - board-ready HTML report with narrative, rankings and action plan |
AI Query Assistant - Plain English queries powered by Gemini 2.5 Flash + LangGraph
|
AI Query summary - top regions, branches, executives and bucket distribution for the matched accounts |
AI Queried table Observations - domain-aware narrative generated for every query result |
| Role | How They Use It |
|---|---|
| Regional Business Head | Monitor region-wide collection efficiency, bucket migration, and executive rankings |
| Zonal Head | Compare branch performance, identify concentration risk, track NPA formation |
| Business Unit Head | Query specific customer segments, get prioritized action lists, analyse new advances |
Plain English Query Engine : Ask any question in NBFC language. Get back a filtered loan table, a ranked executive comparison, or a single stat answer. Every result includes AI observations and an Excel download.
Multi-Region File Support : Upload multiple regional LCC files at once for a unified view. Supports .xlsx, .xls, and .xlsb with automatic deduplication and datetime normalisation across files.
Robust Column Normalisation : A four-pass pipeline handles truncated headers, trailing spaces, and capitalisation differences. Verified at 100% accuracy across all 85 expected columns. Multi-sheet workbooks auto-detect the LCC sheet.
Automated Priority Action List : Seven-tier business priority framework ranks accounts by impact. Non-starters, easy settlements, insurance arrears, co-lending risk, and NPA accounts each get their own actionable tier.
Field Executive Performance Scorecard : Every executive ranked by collection %, strike rate, NPA count, SMA-2 count, and bucket roll rates using quartile-based tiers relative to the current portfolio.
Bucket Migration Analysis : Two months of data reveal exactly how accounts moved between DPD buckets, the NPA formation rate, and which executives are improving or deteriorating.
6 Smart Risk Alerts : Pure pandas, no LLM, always accurate:
| Alert | Severity | What It Catches |
|---|---|---|
| Non Starters | Critical | Never paid 1st EMI |
| Co-lending at Risk | Critical | Partner bank exposure showing delinquency |
| High Arrears - Loan at Risk | Critical | Inst+Exp+BC arrears exceed 50% of original loan |
| Insurance-Driven Delinquency | High | EMI paid but insurance charge causing false delinquency |
| Recent Advances at Risk | High | Loans under 12 months already delinquent |
| Easy Settlements | Medium | Closing arrears under ₹1,000 |
SOH as the True Exposure Metric : Uses Sum of Hire (POS + Closing Arrears) instead of POS alone. For MAT and S&S accounts where POS = 0, SOH correctly reflects what is actually owed.
Monthly Portfolio Intelligence Report : Board-ready HTML report with AI narrative, branch league tables, executive rankings, and a five-point action plan. Email-safe layout. Generate once, send to any number of recipients without regenerating.
Before CollectionIQ, every portfolio question meant raising a request, waiting for an analyst, and asking a follow-up. Each loop took hours to a day.
With CollectionIQ, the same question is answered in under 30 seconds by the leader directly.
| Before | After |
|---|---|
| Analyst dependency for every query | Self-serve in plain English |
| Manual Excel work for breakdowns | Instant filtered results with AI observations |
| Subjective account prioritisation | Seven-tier data-driven priority framework |
| Month-end reports for portfolio health | Real-time bucket migration and alerts on every upload |
CollectionIQ runs two independent AI pipelines orchestrated with LangGraph - one for answering queries in real time, one for generating the monthly portfolio report.
Every question typed in plain English passes through four agents in sequence before a result appears on screen.
flowchart TD
User(["Plain English Query\ne.g. rank executives by MAT/RUN ratio"])
User --> DE
subgraph QP [" Query Pipeline (LangGraph) "]
direction TB
DE["Domain Expert Agent\nGemini 2.5 Flash\n\nUnderstands NBFC terminology · Maps intent to columns\nDecides result shape · Detects priority mode\nInjects today's date for relative time filters"]
PP["Query Parser Agent\nGemini 2.5 Flash\n\nTranslates enriched query into structured filter spec\nHandles conditions · GROUP BY · HAVING · sort"]
EX["Data Executor\nPandas\n\nApplies row filters · Aggregations · Priority rules\nComputes KPIs and executive rankings"]
IG["Insight Generator Agent\nGemini 2.5 Flash\n\nReads computed KPIs and rankings\nGenerates domain-aware observations and recommendations"]
DE --> PP --> EX --> IG
end
IG --> R1["Loan Table\nFiltered customer records"]
IG --> R2["Ranked Table\nOne row per executive / branch / region"]
IG --> R3["Single Stat\nDirect answer with supporting context"]
Triggered on demand. Runs fully autonomously - no user input needed after clicking Generate.
flowchart TD
Trigger(["Generate Monthly Report"])
Trigger --> PA
subgraph RP [" Report Pipeline (LangGraph) "]
direction TB
PA["Portfolio Analyzer\nPandas\n\nComputes all five report sections in parallel\nHealth snapshot · Risk flags · Bucket migration\nBranch performance · Executive rankings"]
RN["Risk Narrator\nGemini 2.5 Flash\n\nWrites 6-8 bullet-point executive narrative\nGenerates 5 prioritized action items with owner and timeline"]
RB["Report Builder\nPython\n\nAssembles fully self-contained HTML report\nTable-based layout · Email-safe · No external CSS"]
ED["Email Dispatcher\nSMTP\n\nSends report as body and attachment\nFires only if SMTP is configured in .env"]
PA --> RN --> RB --> ED
end
RB --> DL["Download HTML Report"]
ED --> EM["Email to Configured Recipients"]
Both pipelines operate on the same in-memory DataFrame loaded from the Excel upload. No database, no cloud storage. Data never leaves the machine.
flowchart LR
XL["LCC Excel File\n.xlsx / .xls / .xlsb\nSingle or multiple regional files"] --> VAL["Validation\nSchema check · Column normalisation\nMulti-sheet detection · Date parsing"]
VAL --> BK["Bucketing\nDPD bucket assignment\nSTD · 1-30 · SMA-1 · SMA-2 · NPA\nSOH = POS + Closing Arrears"]
BK --> KPI["KPI Computation\nCollection % · SOH · Arrears · MoM delta"]
KPI --> DF[("In-Memory\nDataFrame")]
DF --> QP2["Query Pipeline"]
DF --> RP2["Report Pipeline"]
DF --> DB["Dashboard\nKPIs · Charts · Alerts · Scorecard"]
CollectionIQ/
├── app.py # Main Streamlit app - all UI layout and state
├── graph.py # AI query pipeline (LangGraph state machine)
├── utils.py # Data loading, column normalisation, metrics, charts
├── smart_alerts.py # 6 rule-based risk alerts (pure pandas, no LLM)
│
├── agents/
│ ├── domain_expert.py # Agent 1 - query enrichment + NBFC intent detection
│ ├── query_parser.py # Agent 2 - natural language to structured filter spec
│ ├── data_executor.py # Agent 3 - pandas filter / aggregation / priority mode
│ └── insight_generator.py # Agent 4 - AI observations on query results
│
├── analysis/
│ ├── executive_scorecard.py # Per-executive KPIs with quartile tier ranking
│ └── roll_rate.py # Bucket migration matrix and roll-rate KPIs
│
├── report_agent/
│ ├── graph.py # Report pipeline (LangGraph)
│ └── nodes/
│ ├── portfolio_analyzer.py # Computes all report sections
│ ├── risk_narrator.py # AI executive narrative + action plan
│ ├── report_builder.py # Assembles email-safe self-contained HTML report
│ └── email_dispatcher.py # SMTP delivery
│
├── sample_data/
│ ├── Current_Month_Demo.xlsx # Sample LCC extract - current month
│ └── Previous_Month_Demo.xlsx # Sample LCC extract - previous month
│
└── requirements.txt
| Layer | Technology | Role |
|---|---|---|
| UI and Dashboard | Streamlit | Interactive web interface, session state, multi-file upload |
| AI Models | Google Gemini 2.5 Flash | All four AI agents across both pipelines |
| Agent Orchestration | LangGraph | Stateful multi-agent graph with conditional routing |
| Data Processing | Pandas | Filtering, aggregation, bucketing, KPI computation |
| Charts | Plotly | DPD distribution, bucket migration heatmap, branch charts |
| AI SDK | google-genai | Gemini API with retry and exponential backoff |
| Report Delivery | Python smtplib | SMTP email with HTML body and attachment |
| Observability | LangSmith | Query tracing and result quality feedback |
| Excel Formats | openpyxl · xlrd · pyxlsb | Handles .xlsx, .xls, and .xlsb with serial-date correction |
| Date Handling | python-dateutil | Relative date resolution for time-based queries |
The domain knowledge layer: NBFC terminology, loan status values, strike rate definition, SOH calculation, priority framework, insurance delinquency logic - is embedded in the agent system prompts and verified against real portfolio data. The AI understands the difference between a RUN account, a MAT account, and an S&S account without any fine-tuning. Business context is injected at query time, making it straightforward to extend with new domain rules.
Both pipelines are stateless between runs. Each query or report generation starts fresh, no stale context, no memory leak, no shared state between users.









