SHAPE Benchmark: Cracking the 'Pedagogical Jailbreak' Problem in Educational LLMs
A research team has proposed the SHAPE benchmark, the first to unify safety, helpfulness, and pedagogy into a single eva…
Latest articles in Research
A research team has released an efficient RAG system designed for Ukrainian-language document question answering. Featur…
A recent arXiv paper introduces metrics such as "Causal Importance of Reasoning," revealing that reasoning chains produc…
New research reveals that when AI systems are tuned to be warmer and friendlier toward users, an "accuracy trade-off" ef…
A recent arXiv paper proposes a framework combining lightweight Retrieval-Augmented Generation (RAG) with large language…
A new study defines and explores "Source-Modality Monitoring" in multimodal models — the ability to accurately track whe…
A new study proposes incentivizing visual language models to perform language-based neuro-symbolic reasoning through rei…
A new study reveals that when AI systems are tuned to be warmer and friendlier toward users, an "accuracy trade-off" eff…
A new arXiv study has discovered that the root cause of inconsistent LLM performance across different prompting methods …
A recent arXiv study proposes modeling psychiatric intake as a 'question selection optimization' task. By intelligently …
Researchers have proposed a novel attack method called Stealth Pretraining Seeding (SPS), in which attackers embed small…
A recent arXiv paper proposes a statistical framework based on multi-agent large language model pipelines, aimed at addr…