Researchers at Tsinghua University and Z.ai built IndexCache to eliminate redundant computation in sparse attention models ...
That much was clear in 2025, when we first saw China's DeepSeek — a slimmer, lighter LLM that required way less data center ...