CLI tool that extracts text from documents and outputs it to stdout.
macOS / Linux
curl -sL https://raw.githubusercontent.com/tq303/dedox/main/scripts/install.sh | sh

Windows (run as administrator)
curl -sL https://raw.githubusercontent.com/tq303/dedox/main/scripts/install.bat -o install.bat && install.bat

npm / yarn
npm install @tq303/dedox
yarn add @tq303/dedox

Go
go install github.com/tq303/dedox@latest

Local development
make install

Link to a Node.js project (no publish needed)
Build the binary into the npm package, register it globally, then link from your project:
# in ddx/
make dev # builds into npm/bin/ddx
cd npm && yarn link # registers @tq303/dedox globally
# in your project/
yarn link "@tq303/dedox" # symlinks node_modules to local ddx/npm/

After any Go change, just run make dev; the linked project picks it up immediately.

ddx [file]

Supported file types: .pdf, .docx, .xlsx, .pptx, .html, .rtf, .jpg, .txt
ddx report.pdf
ddx notes.docx | grep "keyword"
ddx data.xlsx > output.txt

Apply one or more named filters with --filter (repeatable, applied in order):

| Filter | What it does |
|---|---|
| pii | Redacts emails, phone numbers, SSNs, credit card numbers |
| urls | Redacts URLs |
| ip | Redacts IPv4 addresses |
| boilerplate | Strips page numbers, copyright lines, "Confidential" stamps |
| norm | Collapses multiple spaces and consecutive blank lines |
| uniq | Removes duplicate lines |
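
Because filters are applied in the order given, the order can change the result. The sketch below illustrates this with JavaScript stand-ins for three of the filters. It is not dedox's implementation: the regular expressions, the `[REDACTED]` placeholder, and the `applyFilters` helper are all assumptions made for illustration.

```javascript
// Illustrative stand-ins for three filters. The patterns and the
// "[REDACTED]" placeholder are assumptions, not dedox's actual rules.
const filters = {
  // Replace anything shaped like a dotted-quad IPv4 address.
  ip: (text) => text.replace(/\b(?:\d{1,3}\.){3}\d{1,3}\b/g, '[REDACTED]'),
  // Collapse runs of spaces, then runs of three or more newlines.
  norm: (text) => text.replace(/ {2,}/g, ' ').replace(/\n{3,}/g, '\n\n'),
  // Keep only the first occurrence of each line.
  uniq: (text) => {
    const seen = new Set()
    return text
      .split('\n')
      .filter((line) => {
        if (seen.has(line)) return false
        seen.add(line)
        return true
      })
      .join('\n')
  },
}

// Apply the named filters left to right, feeding each output into the next.
function applyFilters(text, names) {
  return names.reduce((acc, name) => filters[name](acc), text)
}

// Two lines that differ only in spacing.
const input = 'error  from 10.0.0.1\nerror from 10.0.0.1'

// norm first makes the lines identical, so uniq drops the second one.
console.log(applyFilters(input, ['norm', 'uniq']))

// uniq first sees two different lines and keeps both; norm then
// normalises them, leaving a duplicate behind.
console.log(applyFilters(input, ['uniq', 'norm']))
```

In this sketch, putting norm before uniq is what lets near-duplicate lines collapse into one.
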
ddx report.pdf --filter boilerplate --filter norm
ddx contract.docx --filter pii --filter urls > redacted.txt
ddx logs.txt --filter ip | grep "error"

const ddx = require('@tq303/dedox')
const text = ddx('report.pdf')
const redacted = ddx('contract.docx', { filters: ['pii', 'norm'] })

- Non-destructive: never modifies the source file
- No runtime dependencies: single compiled binary
- Composable: output to stdout, works in pipelines like any Unix tool
- Fast: no LLM, no network, all local processing
- Simple: no config needed to get started
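
Since ddx writes its result to stdout, any program that can spawn a process can use it without the npm package. The following is a minimal sketch in Node.js, assuming the ddx binary is on PATH. The helper names `buildArgs` and `extractText` are hypothetical and not part of dedox; only the `ddx [file] --filter <name>` invocation comes from the usage shown earlier.

```javascript
const { execFileSync } = require('node:child_process')

// Hypothetical helper: turn a filter list into repeated --filter flags,
// preserving order because ddx applies filters in the order given.
function buildArgs(file, filters = []) {
  const args = [file]
  for (const name of filters) {
    args.push('--filter', name)
  }
  return args
}

// Hypothetical helper: run the ddx binary (assumed to be on PATH) and
// return everything it wrote to stdout as a string. execFileSync throws
// if the binary is missing or exits with a non-zero status.
function extractText(file, filters = []) {
  return execFileSync('ddx', buildArgs(file, filters), { encoding: 'utf8' })
}

// Prints the argument list equivalent to:
//   ddx report.pdf --filter boilerplate --filter norm
console.log(buildArgs('report.pdf', ['boilerplate', 'norm']))
```

Using `execFileSync` with an argument array avoids shell quoting problems with file names that contain spaces.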