Abstract: This work evaluates the impact of matrix reordering on the performance of sparse matrix-vector multiplication across different multicore CPU platforms. Reordering can enhance performance by ...
In this tutorial, we build an elastic vector database simulator that mirrors how modern RAG systems shard embeddings across distributed storage nodes. We implement consistent hashing with virtual ...
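As a rough illustration of the sharding idea in the tutorial description above, here is a minimal consistent-hashing sketch. It assumes the truncated phrase refers to the common virtual-node variant, where each storage node is placed on the ring many times. The class name `HashRing`, the `vnodes` parameter, the `node#i` labelling scheme, and the use of SHA-256 are all illustrative choices, not taken from the tutorial.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit ring position (first 8 bytes of SHA-256)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; each node owns `vnodes` positions."""

    def __init__(self, vnodes: int = 64):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> physical node name

    def add_node(self, node: str) -> None:
        # Place the node on the ring once per virtual node.
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            if pos in self._owner:
                continue  # position already taken; skip (extremely unlikely)
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        # Drop every ring position owned by this node.
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        # A key belongs to the first ring position clockwise from its hash,
        # wrapping around to the start of the ring if necessary.
        if not self._positions:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_right(self._positions, _hash(key))
        return self._owner[self._positions[idx % len(self._positions)]]
```

Under these assumptions, calling `add_node` for a new storage node reassigns only the keys that now fall to that node, and every other embedding ID stays on the shard `lookup` returned for it before. That is the property that makes this layout attractive for elastic scaling.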
Abstract: Understanding the causes of performance gaps between a portable programming model and a vendor-specific programming model is important for improving performance portability. This paper ...
A complete, educational implementation of Retrieval-Augmented Generation (RAG) using Python, FastAPI, local embeddings, Chroma vector database, and Ollama LLM. This project is designed to teach RAG ...