Benchmarks - AI News | GogoAI News

Meta Llama 3.1 Shatters Benchmarks

2026-05-31 llm 👁 10

Meta's Llama 3.1 redefines open-source AI, outperforming closed rivals like GPT-4 in key benchmarks.

2026-05-19 llm 👁 28

Anthropic shifts focus from benchmark scores to developing autonomous AI agents with distinct personalities and reasonin…

2026-05-07 research 👁 28

OpenAI unveils a new visual reasoning benchmark designed to stress-test multimodal AI systems on complex perception task…

2026-05-07 llm 👁 26

Leading open-source LLMs from Meta, Mistral, and others now match or exceed proprietary systems across major benchmarks,…

2026-05-05 research 👁 22

Hugging Face proposes a comprehensive benchmark framework to standardize how AI agents are evaluated across real-world t…

2026-05-03 llm 👁 24

xAI releases Grok 4.3 as a pragmatic upgrade — cheaper and faster, but still trailing GPT-5.5 and Claude Opus 4.7 in key…