document-chunking

Here are 14 public repositories matching this topic...

GiovanniPasq / chunky

Open-source toolkit for reliable RAG pipelines: convert PDFs to Markdown, clean documents, inspect chunks, compare chunking strategies, and enrich metadata for LLM applications.

Updated Jun 6, 2026
Python

messkan / rag-chunk

A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.

python nlp ia chunking rag vector-search embedding-vectors llm langchain retrieval-augmented-generation text-splitting rag-pipeline document-chunking

Updated Jan 18, 2026
Python

speedyk-005 / chunklet-py

One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.

visualization python nlp natural-language-processing chunking code-structure code-chunking sentence-boundary-detection rag chunks-processing chunks-algorithm text-splitting document-chunking

Updated Jun 8, 2026
Python

SStephanJX / Snowflake-RAG-System

Production-ready Snowflake RAG system with type-specific chunking

snowflake embedding rag vector-search retrieval-augmented-generation snowflake-cortex document-chunking resume-processing

Updated Dec 11, 2025
PLpgSQL

davidmoserai / AzureDocumentIntelligenceChunker

A lightweight Python library for metadata-rich document chunking in Retrieval-Augmented Generation (RAG) workflows. It leverages Azure AI Document Intelligence to enhance chunking by retaining hierarchical structure, page numbers, and bounding boxes for seamless integration with PDF viewers.

react python agent azure chunking agents unstructured-data rag production-grade react-pdf-viewer layout-parser llm langchain retrieval-augmented-generation azure-ai-search azure-ai-document-intelligence layout-parsing document-chunking

Updated Jan 11, 2025
Python

southpawriter02 / haiku-protocol

A Controlled Natural Language (CNL) for AI designed to "minify" language and make AI context denser.

python documentation information-extraction developer-tools technical-writing entity-extraction text-compression encoder-decoder cnl controlled-natural-language streamlit llm prompt-engineering context-window semantic-compression document-chunking token-optimization

Updated Feb 14, 2026
Python

ItzikAquaMotek / rag-chunk

📝 Parse, chunk, and evaluate Markdown for RAG pipelines with token-accurate support and flexible strategies for optimal context management.

tree-sitter library ai csharp dotnet chroma ia code-structure embedding-vectors streamlit hybrid-search aisearch semantickernel text-chunking rag-pipeline llama3 document-chunking propositional-models

Updated Jun 11, 2026
Python
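
Both rag-chunk entries above (messkan and ItzikAquaMotek) work on Markdown input. The sketch below is not taken from either project. It is a minimal illustration of one common Markdown strategy, assuming plain ATX headings (`#` to `######`) and only basic handling of fenced code blocks. The idea is to start a new chunk at every heading and keep the heading path as metadata. The name `split_markdown_by_headings` is hypothetical.

```python
import re

# Matches ATX headings such as "## Setup" (levels 1-6).
HEADING = re.compile(r"^(#{1,6})\s+(.+)$")


def split_markdown_by_headings(text: str) -> list[dict]:
    """Hypothetical helper (not any listed project's API): one chunk per
    heading section, with the path of enclosing headings kept as metadata."""
    chunks: list[dict] = []
    path: list[str] = []  # e.g. ["Intro", "Setup"] while inside "## Setup" under "# Intro"
    buf: list[str] = []   # body lines of the current section
    in_fence = False      # simplified tracking of ``` code fences

    def flush() -> None:
        body = "\n".join(buf).strip()
        if body:  # skip empty sections (a heading directly followed by a subheading)
            chunks.append({"headings": list(path), "text": body})
        buf.clear()

    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        # Lines inside a code fence are never treated as headings.
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2).strip())
        else:
            buf.append(line)
    flush()
    return chunks
```

Long sections would normally be split further by size. Comparing variants like that is the kind of experiment the benchmarking tools above are described as supporting.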

kooroshsajadi / retrieval-augmented-generation

This repository provides a fully modular implementation of a Retrieval-Augmented Generation (RAG) pipeline tailored for Italian legal-domain documents.

vectorization reranking rag hybrid-retrieval retrieval-augmented-generation document-chunking

Updated Nov 25, 2025
Python

FoxRav / RL-astradb-

Astra Vector DB is a Python package that stores documents in the DataStax Astra DB vector database and performs semantic search.

python nlp law finland embeddings openai semantic-search finlex document-search rag vector-search legal-tech vector-database astradb rag-pipeline document-chunking

Updated Jan 10, 2026
Python

meethardik / QandAUsingLLM

Building a CPU-only "PDF Q&A System" using Hugging Face, ChromaDB vector search, and Python.

embedding-models pymupdf sentence-transformers chromadb all-minilm-l6-v2 document-chunking

Updated Jan 3, 2026
Python

choudaryhussainali / Langchain_Learnings

My complete LangChain learning journey, from basics to advanced RAG, LCEL, LangGraph, LangServe, and LangSmith, with hands-on code examples.

embeddings chains agents rag retrieval-systems prompt-engineering generative-ai langchain langsmith ai-reasoning vector-databases langserve lcel langgraph ragpipeline document-chunking memory-in-ai llms-integration

Updated Aug 12, 2025
Jupyter Notebook

ahmetguness / doc-chunking-api

FastAPI service for document chunking and sentence-transformer embeddings for RAG, semantic search, and vector database ingestion.

Updated Jun 7, 2026
Python

dkoustubh / KChunker

KChunker is a lightweight, ultra-fast document parsing and chunking engine designed for RAG systems. It intelligently structures native/scanned PDFs, Excel files, Word documents, and email trails by preserving layout hierarchy, extracting tables, and generating dense vector embeddings for local search databases (ChromaDB and FAISS).

python nlp ocr pdf-parser faiss rag vector-database dearpygui chromadb document-chunking

Updated May 22, 2026
Python

alienveryilmaz / RAG-text-splitter-document-chunking-tool

Smart text chunking tool for RAG systems. Splits long texts into sentence-based chunks with ~10%-15% overlap for better context retention. Runs fully in-browser with a clean UI and copyable outputs.

ai splitter chunking rag llm ai-tool text-chunking document-chunking

Updated Dec 12, 2025
HTML
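
The entry above describes sentence-based chunks with roughly 10% to 15% overlap. Below is a minimal Python sketch of that general idea, not the tool's in-browser implementation. It assumes a naive regex sentence splitter and measures size in characters rather than tokens. `chunk_sentences`, `max_chars`, and `overlap_ratio` are hypothetical names.

```python
import re


def chunk_sentences(text: str, max_chars: int = 500, overlap_ratio: float = 0.12) -> list[str]:
    """Hypothetical helper: pack whole sentences into chunks of roughly
    max_chars, repeating trailing sentences of each chunk (up to
    overlap_ratio * max_chars characters) at the start of the next one.
    overlap_ratio is expected to be well below 1."""
    # Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
    # Abbreviations such as "e.g." will be split incorrectly.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sent in sentences:
        if current and len(" ".join(current + [sent])) > max_chars:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward while they fit in the overlap budget.
            budget = int(max_chars * overlap_ratio)
            carried: list[str] = []
            for prev in reversed(current):
                if len(" ".join([prev] + carried)) > budget:
                    break
                carried.insert(0, prev)
            current = carried
        # A single sentence longer than max_chars is kept whole rather than cut.
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

With the defaults, each chunk starts with up to about 12% of `max_chars` taken from the end of the previous chunk, always in whole sentences. A production splitter would more likely count tokens with the embedding model's tokenizer than count characters.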
