Recursively splits text by a prioritized list of separators (paragraphs, then lines, then sentences, then words) to produce roughly uniform chunks, e.g. 512 characters (or 512 tokens, if a token-based length function is supplied), with 10-20% overlap between consecutive chunks so context is not lost at boundaries.
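The recursive strategy can be sketched in plain Python. This is a simplified illustration of the idea (overlap omitted for brevity), not LangChain's actual implementation:

```python
def recursive_split(text, chunk_size=512, separators=("\n\n", "\n", ". ", " ", "")):
    """Simplified sketch of recursive character splitting (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []

    # Pick the first (coarsest) separator that actually appears in the text.
    sep = ""
    for s in separators:
        if s == "" or s in text:
            sep = s
            break

    if sep == "":
        # No usable separator left: hard-split by size.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks, current = [], ""
    for part in text.split(sep):
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate            # keep merging small pieces
        else:
            if current:
                chunks.append(current)     # flush the full chunk
            if len(part) > chunk_size:
                # Piece is still too big: recurse with the finer separators.
                rest = tuple(separators)[list(separators).index(sep) + 1:]
                chunks.extend(recursive_split(part, chunk_size, rest))
                current = ""
            else:
                current = part
    if current:
        chunks.append(current)
    return chunks
```

Splitting coarsest-first is what keeps paragraphs and sentences intact whenever they fit; the splitter only falls back to finer separators (and finally to a hard character cut) when a piece exceeds the size limit.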
## LangChain Implementation

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
# In newer releases: from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,       # maximum chunk length, measured in characters by default
    chunk_overlap=100,    # ~20% overlap between consecutive chunks
    separators=["\n\n", "\n", ". ", " ", ""],  # tried in order, coarsest first
)

chunks = splitter.split_text(text)  # text: the document string to split
print(f"Created {len(chunks)} chunks")
```
Best practices: chunk sizes around 512 tokens work well for most embedding models; keep 10-20% overlap; customize the separator list for code or structured documents (e.g. split on function or heading boundaries); and attach metadata such as source, section, and chunk position to each chunk to enable filtering at retrieval time.
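The metadata tip can be sketched as a small helper that pairs each chunk with source and position fields for retrieval-time filtering. The `attach_metadata` helper and its field names are illustrative, not a LangChain API:

```python
def attach_metadata(chunks, source, extra=None):
    """Pair each chunk with metadata (source name, position, plus any
    extra fields) so a vector store can filter on them at query time."""
    return [
        {"text": c, "metadata": {"source": source, "chunk_index": i, **(extra or {})}}
        for i, c in enumerate(chunks)
    ]

# Usage: tag chunks from a hypothetical guide.md with their section.
docs = attach_metadata(
    ["first chunk", "second chunk"],
    source="guide.md",
    extra={"section": "intro"},
)
```

Most vector stores accept a metadata dict alongside each text, so structuring chunks this way up front avoids re-splitting later just to add filter fields.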