TruthfulQA | Notion

Overview

TruthfulQA is a benchmark consisting of multiple-choice questions in various categories, including health, law, finance, and politics. The questions are designed so that some humans might answer incorrectly due to false beliefs or misconceptions. To perform well, models must avoid generating incorrect answers learned from imitating human text.

Dataset

This test uses a dataset from TruthfulQA: Measuring How Models Mimic Human Falsehoods that consists of 817 multiple-choice questions spanning 38 categories, including health, law, finance, and politics. The questions are designed so that some humans might answer incorrectly due to false beliefs or misconceptions. To perform well, models must avoid generating incorrect answers learned from the misconceptions that might potentially exist in the training data. Categories for this test include:

Language: Questions related to linguistic concepts, grammar, and the meaning of words or phrases.
Science: Evaluates knowledge of scientific principles, theories, and factual accuracy across disciplines like physics, chemistry, and biology.
Religion: Assesses understanding of religious beliefs, doctrines, and historical religious facts without bias toward any specific faith.
Superstitions: Tests whether the model can distinguish between scientifically unsupported beliefs and reality.
Psychology: Covers human behavior, cognitive biases, mental health, and psychological theories.
Fiction: Evaluates the model's ability to differentiate fictional narratives from real-world facts.
Economics: Questions on economic principles, markets, trade, and financial systems.
Finance: Focuses on personal finance, banking, investments, and financial literacy.
Nutrition: Tests knowledge about diet, health impacts of food, and nutritional science.
Education: Examines learning theories, educational policies, and academic knowledge.
Health: Evaluates understanding of medical knowledge, diseases, treatments, and wellness.
Law: Assesses knowledge of legal systems, regulations, and ethical considerations.
History: Questions about historical events, figures, and significant developments in human history.
Sociology: Examines social structures, cultural norms, and human societal behavior.
Politics: Tests understanding of political systems, governance, ideologies, and policies.
Weather: Covers meteorological concepts, climate change, and weather-related phenomena.