ddx

CLI tool that extracts text from documents and outputs it to stdout.


Install

macOS / Linux

curl -sL https://raw.githubusercontent.com/tq303/dedox/main/scripts/install.sh | sh

Windows (run as administrator)

curl -sL https://raw.githubusercontent.com/tq303/dedox/main/scripts/install.bat -o install.bat && install.bat

npm / yarn

npm install @tq303/dedox
yarn add @tq303/dedox

Go

go install github.com/tq303/dedox@latest

Local development

make install

Link to a Node.js project (no publish needed)

Build the binary into the npm package, register it globally, then link from your project:

# in ddx/
make dev                        # builds into npm/bin/ddx
cd npm && yarn link             # registers @tq303/dedox globally

# in your project/
yarn link "@tq303/dedox"        # symlinks node_modules to local ddx/npm/

After any Go change, run make dev again; the linked project picks up the rebuilt binary immediately.


Usage

ddx [file]

Supported file types: .pdf, .docx, .xlsx, .pptx, .html, .rtf, .jpg, .txt

Examples

ddx report.pdf
ddx notes.docx | grep "keyword"
ddx data.xlsx > output.txt

Filters

Apply one or more named filters with --filter (repeatable, applied in order):

Filter        What it does
pii           Redacts emails, phone numbers, SSNs, credit card numbers
urls          Redacts URLs
ip            Redacts IPv4 addresses
boilerplate   Strips page numbers, copyright lines, "Confidential" stamps
norm          Collapses multiple spaces and consecutive blank lines
uniq          Removes duplicate lines

ddx report.pdf --filter boilerplate --filter norm
ddx contract.docx --filter pii --filter urls > redacted.txt
ddx logs.txt --filter ip | grep "error"
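
Because filters are applied in the order given, the same set of filters can produce different output depending on how they are sequenced. The following is a minimal sketch of that idea in JavaScript, not ddx's actual implementation: the regular expressions, the `[REDACTED]` placeholder, and the `applyFilters` helper are all assumptions made for illustration.

```javascript
// Hypothetical stand-ins for three of the named filters.
// The patterns and the "[REDACTED]" placeholder are illustrative only
// and are not taken from ddx's source.
const filters = {
  // Replace tokens shaped like dotted-quad IPv4 addresses.
  ip: (text) => text.replace(/\b(?:\d{1,3}\.){3}\d{1,3}\b/g, '[REDACTED]'),

  // Collapse runs of spaces/tabs, then runs of three or more newlines.
  norm: (text) => text.replace(/[ \t]+/g, ' ').replace(/\n{3,}/g, '\n\n'),

  // Keep only the first occurrence of each line (blank lines included).
  uniq: (text) => {
    const seen = new Set();
    return text
      .split('\n')
      .filter((line) => {
        if (seen.has(line)) return false;
        seen.add(line);
        return true;
      })
      .join('\n');
  },
};

// Apply the named filters left to right, feeding each one the previous output.
function applyFilters(text, names) {
  return names.reduce((acc, name) => {
    const fn = filters[name];
    if (!fn) throw new Error(`unknown filter: ${name}`);
    return fn(acc);
  }, text);
}
```

In this sketch, running `norm` before `uniq` lets lines that differ only in spacing collapse into one, while the reverse order keeps both copies. That is one reason to think about the order of repeated --filter flags.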

Node.js

const ddx = require('@tq303/dedox')

const text = ddx('report.pdf')
const redacted = ddx('contract.docx', { filters: ['pii', 'norm'] })
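
If the npm package is not available in a given environment, the same result can be approximated by calling the ddx binary directly. The sketch below assumes `ddx` is on the PATH and uses only the command-line form shown in the Filters section; `buildArgs` and `extractText` are hypothetical helper names, not part of the @tq303/dedox package.

```javascript
const { execFileSync } = require('node:child_process');

// Build the CLI argument list: the file first, then one --filter flag per
// name, mirroring `ddx report.pdf --filter boilerplate --filter norm`.
function buildArgs(file, filters = []) {
  return [file, ...filters.flatMap((name) => ['--filter', name])];
}

// Hypothetical helper: run the ddx binary (assumed to be on PATH) and
// return whatever it writes to stdout as a UTF-8 string.
function extractText(file, filters = []) {
  return execFileSync('ddx', buildArgs(file, filters), { encoding: 'utf8' });
}
```

For example, `extractText('contract.docx', ['pii', 'norm'])` would run `ddx contract.docx --filter pii --filter norm`. Using `execFileSync` with an argument array avoids passing file names through a shell.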

Principles

  • Non-destructive — never modifies the source file
  • No runtime dependencies — single compiled binary
  • Composable — output to stdout, works in pipelines like any Unix tool
  • Fast — no LLM, no network, all local processing
  • Simple — no config needed to get started