Show HN: Chonkie – A Fast, Lightweight Text Chunking Library for RAG

199 points by bhavnicksm, 264 days 15 hours ago (Hacker News)

I built Chonkie because I was tired of rewriting chunking code for RAG applications. Existing libraries were either too bloated (80MB+) or too basic, with no middle ground.

Core features:

- 21 MB default install vs. 80-171 MB for alternatives

- 33x faster token chunking than popular alternatives (per the benchmarks in the repo)

- Supports multiple chunking strategies: token, word, sentence, and semantic

- Works with all major tokenizers (transformers, tokenizers, tiktoken)

- Zero external dependencies for basic functionality
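To make the token strategy concrete, here is a minimal sketch of fixed-size token chunking with overlap. It is not Chonkie's API: `token_chunk`, `Chunk`, `chunk_size`, and `overlap` are hypothetical names, and the toy word-level tokenizer stands in for a real one. Taking the tokenizer as a pair of `encode`/`decode` callables is one way to stay tokenizer-agnostic; tiktoken's `Encoding.encode`/`decode` already have this shape, and tokenizers or transformers objects can be wrapped to match.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Chunk:
    text: str
    start_token: int  # index of the first token in the chunk
    end_token: int  # one past the last token in the chunk
    token_count: int


def token_chunk(
    text: str,
    encode: Callable[[str], Sequence[int]],
    decode: Callable[[Sequence[int]], str],
    chunk_size: int = 512,
    overlap: int = 64,
) -> List[Chunk]:
    """Split text into windows of at most chunk_size tokens; consecutive
    windows share `overlap` tokens. (Hypothetical API, not Chonkie's.)"""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    tokens = list(encode(text))  # tokenize once, then only slice id lists
    step = chunk_size - overlap
    chunks: List[Chunk] = []
    for start in range(0, len(tokens), step):
        window = tokens[start : start + chunk_size]
        chunks.append(Chunk(decode(window), start, start + len(window), len(window)))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Toy word-level tokenizer standing in for tiktoken / tokenizers / transformers.
_vocab: Dict[str, int] = {}
_inverse: Dict[int, str] = {}


def toy_encode(s: str) -> List[int]:
    ids = []
    for word in s.split():
        if word not in _vocab:
            _vocab[word] = len(_vocab)
            _inverse[_vocab[word]] = word
        ids.append(_vocab[word])
    return ids


def toy_decode(ids: Sequence[int]) -> str:
    return " ".join(_inverse[i] for i in ids)


text = " ".join(f"w{i}" for i in range(10))  # "w0 w1 ... w9", 10 tokens
for c in token_chunk(text, toy_encode, toy_decode, chunk_size=4, overlap=1):
    print(c.start_token, c.end_token, c.text)
# 0 4 w0 w1 w2 w3
# 3 7 w3 w4 w5 w6
# 6 10 w6 w7 w8 w9
```

Each chunk records its token offsets, so overlapping windows can be mapped back to positions in the original token sequence.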

Technical optimizations:

- Uses tiktoken with multi-threading for faster tokenization

- Implements aggressive caching and precomputation

- Running mean pooling for efficient semantic chunking

- Modular dependency system (install only what you need)
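One plausible reading of "running mean pooling" is sketched below, assuming a sentence-level embedding function: keep the mean embedding of the current chunk, compare each new sentence to it with cosine similarity, and start a new chunk when similarity drops below a threshold. The mean is updated incrementally, mean += (v - mean) / (n + 1), so each step costs O(d) instead of re-averaging the whole chunk. `semantic_chunk`, `threshold`, `max_sentences`, and the keyword-count `toy_embed` are hypothetical; a real setup would call a sentence-embedding model, and Chonkie's own implementation may differ.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_chunk(
    sentences: List[str],
    embed: Callable[[str], Sequence[float]],
    threshold: float = 0.7,
    max_sentences: int = 8,
) -> List[str]:
    """Group consecutive sentences while each new sentence stays similar to
    the running mean embedding of the current chunk (hypothetical API)."""
    chunks: List[List[str]] = []
    current: List[str] = []
    mean: List[float] = []
    for sentence in sentences:
        vec = [float(x) for x in embed(sentence)]  # each sentence embedded exactly once
        if current and (
            len(current) >= max_sentences or cosine(mean, vec) < threshold
        ):
            chunks.append(current)  # topic shift or size cap: close the chunk
            current, mean = [], []
        if not current:
            current, mean = [sentence], vec
        else:
            n = len(current)
            # Running mean: fold in the new vector in O(d) without revisiting old ones.
            mean = [m + (v - m) / (n + 1) for m, v in zip(mean, vec)]
            current.append(sentence)
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]


# Toy "embedding": counts of words from two hand-picked topics.
TOPICS = [("cat", "dog", "pet"), ("stock", "market", "price")]


def toy_embed(sentence: str) -> List[float]:
    words = sentence.lower().replace(".", "").split()
    return [float(sum(w in topic for w in words)) for topic in TOPICS]


sentences = [
    "The cat sat.",
    "A dog barked at the cat.",
    "The stock market fell.",
    "Market price data was noisy.",
]
for chunk in semantic_chunk(sentences, toy_embed):
    print(chunk)
# The cat sat. A dog barked at the cat.
# The stock market fell. Market price data was noisy.
```

Comparing against the chunk's mean rather than only the previous sentence makes a single off-topic sentence less likely to pull the boundary along with it, and `max_sentences` keeps chunks bounded even when the topic never shifts.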
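For the caching and precomputation point, here is a minimal sketch of the general idea applied to sentence chunking, assuming sentences are already split: count tokens per sentence once (memoized with `functools.lru_cache`), build prefix sums, and binary-search for each chunk boundary instead of re-summing token counts. `count_tokens` uses a whitespace split as a hypothetical stand-in for a real tokenizer, and `sentence_chunk` is an illustrative name, not Chonkie's API.

```python
import bisect
from functools import lru_cache
from itertools import accumulate
from typing import List


@lru_cache(maxsize=None)
def count_tokens(sentence: str) -> int:
    # Hypothetical stand-in for a real tokenizer; the cache means a repeated
    # sentence is only tokenized once.
    return len(sentence.split())


def sentence_chunk(sentences: List[str], chunk_size: int) -> List[List[str]]:
    """Greedily pack whole sentences into chunks of at most chunk_size tokens."""
    # Precompute prefix sums once: prefix[i] = tokens in sentences[:i].
    prefix = [0, *accumulate(count_tokens(s) for s in sentences)]
    chunks: List[List[str]] = []
    start = 0
    while start < len(sentences):
        # Binary search for the largest end with prefix[end] - prefix[start] <= chunk_size.
        end = bisect.bisect_right(prefix, prefix[start] + chunk_size) - 1
        end = max(end, start + 1)  # an oversized sentence still gets its own chunk
        chunks.append(sentences[start:end])
        start = end
    return chunks


sents = ["a b c", "d e", "f g h i", "j"]  # 3, 2, 4, 1 tokens
print(sentence_chunk(sents, 5))  # [['a b c', 'd e'], ['f g h i', 'j']]
print(sentence_chunk(sents, 3))  # [['a b c'], ['d e'], ['f g h i'], ['j']]
```

A sentence longer than `chunk_size` is emitted on its own rather than dropped; splitting it further would be a job for the token strategy.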

Benchmarks and code: https://github.com/bhavnicksm/chonkie

Looking for feedback on the architecture and performance optimizations. What other chunking strategies would be useful for RAG applications?