A from-scratch PyTorch implementation of TurboQuant (ICLR 2026), Google's two-stage vector quantization algorithm for compressing LLM key-value caches — enhanced with a comprehensive, research-grade ...
A complete PyTorch and MLX (Apple Silicon) implementation of the Titans architecture from Google Research. Titans introduce a neural Long-term Memory Module (LMM) that learns to memorize historical ...