Are you the asshole? Of course not!—quantifying LLMs’ sycophancy problem

Measured sycophancy rates on the BrokenMath benchmark. Lower is better. Credit: Petrov et al

GPT-5 also showed the best "utility" across the tested models, solving 58 percent of the original problems despite the errors introduced in the modified theorems. The researchers also found that LLMs tended to be more sycophantic when the original problem was harder to solve.

While hallucinating proofs for false theorems is obviously a big problem, the researchers

→ Continue reading at Ars Technica
