Overview

TruthfulQA is a benchmark consisting of multiple-choice questions in various categories, including health, law, finance, and politics. The questions are designed so that some humans might answer incorrectly due to false beliefs or misconceptions. To perform well, models must avoid generating incorrect answers learned from imitating human text.

Dataset

This test uses a dataset from TruthfulQA: Measuring How Models Mimic Human Falsehoods that consists of 817 multiple-choice questions spanning 38 categories, including health, law, finance, and politics. The questions are designed so that some humans might answer incorrectly due to false beliefs or misconceptions. To perform well, models must avoid generating incorrect answers learned from the misconceptions that might potentially exist in the training data. Categories for this test include: