quantization - AI News

Quantization Powers On-Device AI Transformers

2026-06-03 research

New quantization techniques enable large transformer models to run efficiently on mobile devices, reducing latency and e…

2026-05-18 tutorial

New tutorial demonstrates compressing instruction-tuned LLMs using llmcompressor. Compare FP8, GPTQ, and SmoothQuant for…

2026-05-06 research

Apple's ML team reveals techniques to compress large language models below 1-bit precision, enabling powerful AI on iPho…

2026-05-05 tutorial

A practical guide to reducing LLM inference costs by up to 80% using quantization and distillation techniques without sa…

2026-05-05 research

Microsoft Research introduces BitNet b2, pushing extreme quantization to slash LLM memory and compute costs while preser…

2026-05-03 tutorial

A practical guide to calculating exact GPU memory needs before deploying large language models locally.
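
As a rough illustration of the kind of estimate such a guide covers (this is not taken from the linked tutorial), weight memory is approximately the parameter count times the bits per weight, divided by 8 to get bytes. The sketch below assumes a hypothetical `overhead_fraction` to stand in for KV cache, activations, and runtime buffers; that value is a placeholder, and real overhead depends on context length, batch size, and the inference runtime.

```python
GIB = 2**30  # bytes per GiB


def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Memory needed for the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / GIB


def estimate_total_gib(num_params: float, bits_per_weight: float,
                       overhead_fraction: float = 0.2) -> float:
    """Weights plus a flat overhead allowance.

    overhead_fraction is a hypothetical placeholder, not a measured value.
    """
    return weight_memory_gib(num_params, bits_per_weight) * (1 + overhead_fraction)


if __name__ == "__main__":
    # Compare a 7-billion-parameter model at three weight precisions.
    for bits in (16, 8, 4):
        weights = weight_memory_gib(7e9, bits)
        total = estimate_total_gib(7e9, bits)
        print(f"{bits:>2}-bit: weights {weights:.2f} GiB, with overhead {total:.2f} GiB")
```

Halving the bits per weight halves the weight footprint, which is why 4-bit quantization lets a model fit in roughly a quarter of the memory its 16-bit version needs, before overhead.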