I'm Thomas van Dongen. I'm head of AI engineering at Springer Nature and co-founder of Minish, an open-source NLP lab working on efficient models and packages.

| Project | Description |
|---|---|
| model2vec | Distill sentence transformers into static embeddings that are orders of magnitude faster |
| semhash | Multimodal semantic deduplication, outlier detection, and representative filtering |
| semble | A code-search MCP/CLI tool for AI agents that drastically reduces token consumption |
| pyversity | Diversify search & retrieval results to reduce redundancy and improve coverage |
| vicinity | Fast, lightweight nearest neighbor search with pluggable backends |
| model2vec-rs | A Rust port of Model2Vec |
| tokenlearn | Pre-train static embedding models |
| agentcheck | A Go CLI that audits what an AI agent can access before you run it |





