Skip to content

Agent-Tracking-Mech/agentic-spreadsheet

Repository files navigation

agentic-spreadsheet

TypeScript library and HTTP/CLI tools for AI-assisted spreadsheet work: load Excel/CSV, ask natural-language questions, generate formulas, run structured cleaning transforms, and explain existing formulas. Built on SheetJS and the Vercel AI SDK.

TypeScript License: MIT Node.js

Contents: Features · Installation · Configuration · Usage · HTTP API · CLI · Docker · Testing · Troubleshooting

Features

  • Formula generation — Describe logic in English; get an Excel-style formula string.
  • Natural-language queries — Ask questions over tabular data (sample rows are sent to the model).
  • Transforms — LLM proposes a JSON plan (filter, dedupe, fill nulls, sort, rename columns, add column placeholders); the engine applies it deterministically.
  • Formula explanation — Plain-English explanation of a formula.
  • Formats — Load/save XLSX, CSV, ODS, XLS (via SheetJS).
  • Local-first — Files stay on disk; only prompts and excerpts go to your chosen LLM provider.

Requirements

  • Node.js 22+
  • An API key for at least one provider: OpenAI (OPENAI_API_KEY) and/or Anthropic (ANTHROPIC_API_KEY), depending on the model id you pass.

Installation

cd agentic-spreadsheet
npm install
npm run build

Configuration

Copy .env.example to .env and set keys:

OPENAI_API_KEY=sk-...
# Optional:
# ANTHROPIC_API_KEY=
# AGENT_MODEL=openai/gpt-4o-mini
PORT=3000
# API_KEY=your-secret-key
# MAX_BODY_BYTES=1048576

Model strings use a provider prefix, for example:

  • openai/gpt-4o-mini — requires OPENAI_API_KEY
  • anthropic/claude-sonnet-4-20250514 — requires ANTHROPIC_API_KEY

If you omit the prefix (e.g. gpt-4o-mini), OpenAI is assumed and OPENAI_API_KEY is required.

In code, new SpreadsheetAgent({ model: 'gpt-4o-mini' }) and model: 'openai/gpt-4o-mini' are equivalent for OpenAI (the prefix only selects the provider).

Scripts

Command Description
npm run build Compile TypeScript to dist/
npm run dev Run the HTTP server with hot reload (tsx watch)
npm start Run compiled server (node dist/api/main.js)
npm run cli Interactive CLI
npm test Run all Vitest tests (no API key required)
npm run typecheck TypeScript check without emit
npm run clean Remove the dist/ build output
npm run test:watch Vitest watch mode
npm run test:integration Live LLM query test (needs OPENAI_API_KEY)
npm run test:formula Schema unit tests only

npm publish runs prepack, which executes npm run build so dist/ is always included in the package tarball.

Usage (library)

import { SpreadsheetAgent } from './src/index.js'; // or from 'agentic-spreadsheet' when installed as a dependency
import * as path from 'node:path';

const agent = new SpreadsheetAgent({
  model: 'openai/gpt-4o-mini',
  maxRows: 10000,           // optional: trim each sheet to this many rows on load
  maxContextRows: 200,      // optional: rows sent to the LLM per request
  createModel: myModelFactory, // optional: inject model for tests/custom providers
});

const stats = agent.getWorkbookStats(workbook);
console.log(stats.rowCount, stats.columns);

const workbook = await agent.loadWorkbook(path.join('samples', 'sales-sample.csv'), 'csv');

const result = await agent.query(workbook, 'What is the total revenue?');
console.log(result.answer);
console.log(result.formula);

const formula = await agent.generateFormula(
  'Commission is 10% of sales if sales > 1000, else 5%',
  { columns: ['Sales'] }
);

const cleaned = await agent.transform(
  workbook,
  'Remove rows where Status equals Cancelled'
);
await agent.saveWorkbook(cleaned, 'output/cleaned.xlsx', 'xlsx');

Direct data access (no LLM)

const rows = agent.getRows(workbook); // first sheet; pass { sheetName: '…' } for others
const columns = agent.getColumns(workbook);
const filtered = agent.filterRows(workbook, (row) => Number(row['Revenue']) > 1000);

API surface

Method Purpose
loadWorkbook(path | Buffer, format?) Load a file or buffer. For a Buffer, pass format unless it is XLSX (the default). Paths infer format from the extension.
saveWorkbook(workbook, path, format?) Save as XLSX or CSV. CSV writes the first sheet only (same limitation as typical Excel CSV export).
query(workbook, question, { sheetName? }) NL question → { answer, formula?, rows? }.
generateFormula(description, { columns? }) Single formula string.
transform(workbook, instruction, { sheetName? }) NL → structured actions → updated workbook.
explainFormula(formula, { columns? }) Text explanation.
getRows / getColumns / filterRows Non-LLM helpers.
getWorkbookStats Row/column/sheet metadata for a workbook.
detectFormatFromPath(path) Infer SheetFormat from a file path extension.
Type SheetFormat 'xlsx' | 'csv' | 'ods' | 'xls'

query, transform, generateFormula, and explainFormula reject empty or whitespace-only text inputs. For query and transform, sheetName is optional (defaults to the first sheet). Unknown sheet names throw an error listing available sheet names.

The HTTP API rejects whitespace-only command values and trims formula for explain mode.

HTTP API

npm run dev
# or after build: npm start
  • GET /health — Liveness check (includes version; not rate-limited).
  • POST /spreadsheet — JSON body (validated with Zod; 1 MiB body limit by default):

Optional production settings via environment:

  • API_KEY — When set, POST /spreadsheet requires X-API-Key: … or Authorization: Bearer … (GET /health stays public).
  • MAX_BODY_BYTES — Override the JSON body size limit (default 1048576).
Field Description
mode query (default), transform, formula, or explain
command Natural language command (not required for explain)
file Path under the uploads directory (required for query/transform; optional for formula/explain to supply column context)
formula For mode: explain only
sheetName Optional sheet name

Responses for query and transform include a stats object (rowCount, columns, etc.). Missing upload files return 404.

Examples:

# Place a file in ./uploads (created on startup), e.g. uploads/data.csv

curl -s http://localhost:3000/health

curl -s -X POST http://localhost:3000/spreadsheet \
  -H "Content-Type: application/json" \
  -d '{"mode":"formula","command":"Sum column B if column A equals Yes"}'

curl -s -X POST http://localhost:3000/spreadsheet \
  -H "Content-Type: application/json" \
  -d '{"mode":"query","command":"What is the average revenue?","file":"data.csv"}'

curl -s -X POST http://localhost:3000/spreadsheet \
  -H "Content-Type: application/json" \
  -d '{"mode":"explain","formula":"=IF(A1>0,SUM(B1:B10),0)"}'

POST /spreadsheet only accepts file paths that resolve inside the configured uploads directory (traversal and the uploads root itself are rejected).

CLI

npm run cli

Prompts for a mode (query, transform, formula, explain), then runs an interactive loop:

  • query — Ask questions over a loaded file (default).
  • transform — Apply NL cleaning; saves results to ./output/cli-out-<timestamp>.xlsx.
  • formula — Generate formulas (optional file for column context).
  • explain — Explain a formula (optional file for column context).

Docker

npm run build
docker compose build
docker compose up

Mount ./uploads and ./output as in docker-compose.yml. Set OPENAI_API_KEY in the environment.

Testing

npm test                  # Unit + API tests (no API key)
npm run typecheck         # tsc --noEmit
npm run test:integration  # LLM query test (skipped without OPENAI_API_KEY)
npm run test:formula      # Schema unit tests

CI runs on push/PR to main via GitHub Actions (.github/workflows/ci.yml).

Project layout

agentic-spreadsheet/
├── src/
│   ├── SpreadsheetAgent.ts
│   ├── index.ts
│   ├── llm/model.ts
│   ├── engine/
│   ├── prompts/
│   ├── schemas/
│   ├── api/
│   │   ├── server.ts    # buildServer(), routes
│   │   └── main.ts      # HTTP listen
│   └── cli/
├── samples/
├── tests/
├── .github/workflows/ci.yml
├── README.md
├── Dockerfile
├── docker-compose.yml
├── .env.example
├── .gitignore
├── .dockerignore
├── package.json
├── LICENSE
├── tsconfig.json
└── vitest.config.ts

dist/ is produced by npm run build. dist/ and .vitest-cache/ (Vitest) are gitignored.

Troubleshooting

  • OPENAI_API_KEY is required — Export the key or add it to .env before calling LLM features (query, transform, generateFormula, explainFormula).
  • Port already in use — Set PORT=3010 (or another free port) in .env.
  • Integration test skippednpm run test:integration only runs when OPENAI_API_KEY is set in the environment.
  • Unauthorized (401) — Set X-API-Key or Authorization: Bearer when API_KEY is configured.
  • File not found (404) — Ensure the file exists under ./uploads and the path does not traverse outside that directory.
  • Wrong sheet or empty workbookquery / transform require a valid sheetName when you do not want the first sheet; empty workbooks are rejected.

Embed the HTTP server

import { buildServer } from 'agentic-spreadsheet/api';

const app = await buildServer({ uploadDir: './uploads', outputDir: './output' });
await app.listen({ port: 3000, host: '0.0.0.0' });

Google Sheets

The library does not call Google APIs. You can fetch values with googleapis, build a CSV buffer, and call loadWorkbook(buffer, 'csv') — see the Google Sheets API documentation.

License

MIT

Contact Info

Telegram: @tradingtermin

About

TypeScript library: AI agent for spreadsheets (queries, formulas, transforms) with SheetJS & LLMs.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors