# TP=1: 128 heads, TP=2: 64 heads, TP=4: 32 heads, TP=8: 16 heads
- "16q1s1k"  # 16 requests, 1k KV cache
- "16q1s2k"  # 16 requests, 2k KV cache
- "16q1s4k"  # 16 requests, 4k KV cache
# Medium batches ...
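The head counts and batch labels in the fragment above follow simple arithmetic, which the minimal Python sketch below tries to make explicit. It is not taken from the repository: `heads_per_rank` and `parse_batch_spec` are hypothetical helper names. The reading of a label such as `16q1s1k` as "request count, `q` query length, `s` KV cache length" is an assumption inferred from the inline comments, as is treating the `k` suffix as 1024 rather than 1000.

```python
import re

TOTAL_HEADS = 128  # from the config comment: TP=1 corresponds to 128 heads


def heads_per_rank(tp: int, total_heads: int = TOTAL_HEADS) -> int:
    """Attention heads held by each tensor-parallel rank (total_heads / tp)."""
    if tp <= 0 or total_heads % tp != 0:
        raise ValueError(f"{total_heads} heads cannot be split evenly across TP={tp}")
    return total_heads // tp


# Assumed label grammar: <count>q<query_len>[k]s<kv_len>[k]
_SPEC = re.compile(r"^(\d+)q(\d+)(k?)s(\d+)(k?)$")


def _scaled(digits: str, suffix: str) -> int:
    # Assumption: a trailing "k" means a multiple of 1024 tokens.
    return int(digits) * (1024 if suffix == "k" else 1)


def parse_batch_spec(spec: str) -> tuple[int, int, int]:
    """Return (num_requests, query_len, kv_len) for a label like '16q1s1k'."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"unrecognized batch spec: {spec!r}")
    count, q_len, q_suffix, kv_len, kv_suffix = match.groups()
    return int(count), _scaled(q_len, q_suffix), _scaled(kv_len, kv_suffix)


if __name__ == "__main__":
    for tp in (1, 2, 4, 8):
        print(f"TP={tp}: {heads_per_rank(tp)} heads")
    for spec in ("16q1s1k", "16q1s2k", "16q1s4k"):
        print(spec, parse_batch_spec(spec))
```

Under these assumptions, the three listed labels keep the request count and query length fixed and vary only the KV cache length, while the TP comment describes how the same 128 heads are divided among ranks.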