Understanding LLM Inference

Characterization of GPU-based Inference for Reasoning-Centric LLMs (Micron, Argonne)

Researchers from Micron Technology and Argonne National Laboratory have released “Understanding Inference Scaling for LLMs: ...

SDxCentral

AI inference crisis: Google engineers on why network latency and memory trump compute

Google researchers have warned that large language model (LLM) inference is hitting a wall amid fundamental problems with memory and networking problems, not compute. In a paper authored by ...

NextBigFuture

Defeating Nondeterminism in LLM Inference by Thinking Machines

A research article by Horace He and the Thinking Machines Lab (X-OpenAI CTO Mira Murati founded) addresses a long-standing issue in large language models (LLMs). Even with greedy decoding bu setting ...

VentureBeat

Large language model expands natural language understanding, moves beyond English

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now One of the primary use cases for artificial ...

Semiconductor Engineering

LLM Inference On CPUs (Intel)

“Large language models (LLMs) have demonstrated remarkable performance and tremendous potential across a wide range of tasks. However, deploying these models has been challenging due to the ...

InfoQ

Cactus v1: Cross-Platform LLM Inference on Mobile with Zero Latency and Full Privacy

A monthly overview of things you need to know as an architect or aspiring architect. Unlock the full InfoQ experience by logging in! Stay updated with your favorite authors and topics, engage with ...

TweakTown

Dell PowerEdge XE9712: NVIDIA GB200 NVL72-based AI GPU cluster for LLM training, inference

Dell has just unleashed its new PowerEdge XE9712 with NVIDIA GB200 NVL72 AI servers, with 30x faster real-time LLM performance over the H100 AI GPU. Dell Technologies' new AI Factory with NVIDIA sees ...

MUO on MSN

Local LLM setup: how to use RAG and an embedding model to stop wasting context

Local LLMs degrade fast when context fills up. An embedding model and RAG pipeline fixes that — and runs entirely on your ...

VentureBeat

Fine-tuning vs. in-context learning: New research guides better LLM customization for real-world tasks

Two popular approaches for customizing large language models (LLMs) for downstream tasks are fine-tuning and in-context learning (ICL). In a recent study, researchers at Google DeepMind and Stanford ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results