The National Security Agency (NSA) has officially begun testing a specialized version of Anthropic’s latest large language ...
Chinese artificial intelligence developer DeepSeek today released a new series of open-source large language models. V4, as ...
April 30, 2026 expert reaction to study evaluating performance of a large language model on the reasoning tasks of a physician. A study published in Science evaluates the perform ...
A Cairo-based artificial intelligence startup has released Horus 1.0-4B, a fully open-source large language model built in Egypt that outperforms several ...
The latest Chinese model trails U.S. competitors on benchmarks. But it may not have to win the performance race to reshape ...
A new benchmark released by Simbian is challenging one of the most widely held assumptions in artificial intelligence: that the same models capable of finding vulnerabilities can also defend against ...
Ultimately, hallucinations are a systemic feature of today’s LLMs. Unfortunately, they’re not an anomaly. But with the right ...
AgentClinic is a multimodal benchmark that tests clinical AI agents in simulated, dialogue-driven diagnostic settings rather ...
A wave of 2026 developments — from Anthropic's Model Context Protocol to Microsoft's GraphRAG concept and rigorous benchmarks like Terminal-Bench 2.0 and SWE-Bench Pro — is redefining how AI teams ...
ShengShu Technology today announces Motubrain, a World Action Model that replaces multiple task-specific systems with a single, unified model that functions as a robotic brain for the physical world.
In one case, a patient came into the emergency department with a pulmonary embolism. The condition initially improved with ...
A Nature-published study by an international research team has found that current AI benchmarks fail to accurately measure large language models’ core capabilities. Existing tests often mix skills ...