LLM benchmarks - AI News

AI Search Agents Fail Live Web Tests

2026-05-31 research

New research reveals leading AI models rely on training data rather than live web browsing, exposing critical reliabilit…

2026-05-07 research

A new benchmark called ProgramBench challenges language models to reconstruct entire programs from specifications, revea…

2026-05-06 llm

Hugging Face releases open-weight reasoning models that match proprietary systems from OpenAI and Google on key benchmar…

2026-05-03 research

A bizarre thought experiment from China's Zhihu platform reveals both the power and limits of AI-driven scientific reaso…