Experiments on whether disclosing logarithmic scoring rules reduces LLM overconfidence in multiple-choice QA.
calibration hallucination confidence-estimation abstention selective-prediction llm mmlu proper-scoring-rules mmlu-pro simpleqa
Updated Jun 13, 2026 - Python
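
For readers unfamiliar with the term in the description, here is a minimal sketch of a logarithmic scoring rule for a multiple-choice question. It is not taken from the repository: the function name `log_score`, its parameters, and the example probabilities are all hypothetical, and the experiments may define or disclose the rule differently.

```python
import math


def log_score(probs, correct_index):
    """Return the natural log of the probability placed on the correct option.

    Hypothetical helper for illustration only. `probs` is the model's stated
    probability for each answer option; `correct_index` is the position of
    the right answer. The score is at most 0, and lower is worse.
    """
    p = probs[correct_index]
    if p <= 0.0:
        # Zero probability on the correct answer is penalized without bound.
        return float("-inf")
    return math.log(p)


# An overconfident wrong answer: 97% on option 0, but option 1 is correct.
confident_wrong = log_score([0.97, 0.01, 0.01, 0.01], correct_index=1)

# A uniform guess over four options, scored against the same correct answer.
hedged = log_score([0.25, 0.25, 0.25, 0.25], correct_index=1)

print(confident_wrong, hedged)
```

With these made-up numbers the overconfident wrong answer scores about -4.61 while the uniform guess scores about -1.39, which is the sense in which a logarithmic rule penalizes misplaced confidence.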