Large Language Models Benchmarks

DeepSeek open-sources V4 large language model series

Chinese artificial intelligence developer DeepSeek today released a new series of open-source large language models. V4, as ...

Science Media Centre

expert reaction to study evaluating performance of a large language model on the reasoning tasks of a physician

April 30, 2026. A study published in Science evaluates the perform ...

6d on MSN

DeepSeek previews new AI model that ‘closes the gap’ with frontier models

DeepSeek says both models are more efficient and performant than DeepSeek V3.2 due to architectural improvements, and have ...

Hosted on MSN

New standards and benchmarks reshape 2026 LLM choices

A wave of 2026 developments — from Anthropic's Model Context Protocol to Microsoft's GraphRAG concept and rigorous benchmarks like Terminal-Bench 2.0 and SWE-Bench Pro — is redefining how AI teams ...

Hosted on MSN

AI model tops doctors in clinical reasoning tests

A Harvard-led study found that OpenAI’s o1-preview large language model matched or exceeded hundreds of physicians in six clinical reasoning experiments, particularly excelling in emergency triage.

Unite.AI

Simbian Launches Cyber Defense Benchmark, Reveals Major Gap in AI Security Capabilities

A new benchmark released by Simbian is challenging one of the most widely held assumptions in artificial intelligence: that the same models capable of finding vulnerabilities can also defend against ...

Renal & Urology News

Large Language Models Perform Poorly for Differential Diagnosis

Differential diagnosis was less accurate than diagnostic testing, but final diagnosis and management were more accurate.

14h

The AI industry’s massive bet on transformer models may not be enough for true AGI

As Big Tech pours unprecedented resources into scaling large language models, critics argue that transformer-based systems ...

TMCnet

ShengShu Technology Unveils World Action Model "Motubrain": One Brain, Infinite Possibilities for Robotic Intelligence

ShengShu Technology today announces Motubrain, a World Action Model that replaces multiple task-specific systems with a single, unified model that functions as a robotic brain for the physical world.

TMCnet

Hippocratic AI Launches Polaris 5.0: The First Evidence-based AI for Healthcare Proven to Outperform Every Frontier Model on Critical Medical Tasks and Safety

Built on more than 180m real patient interactions, validated by U.S.-licensed clinicians and now benchmarked against every leading frontier model, Polaris 5.0 leads safety, compliance and empathy for ...

Don't Default To The Biggest AI Model: Agentic Systems Deserve Better

This isn't about rejecting large models; it's about having the engineering discipline to use smaller, specialized models ...
