SelfCheckGPT is a consistency-based method built on the idea that multiple independently sampled responses should agree with one another when a model "knows" the answer, whereas hallucinated outputs tend to vary significantly between samples. It generates multiple responses to the same prompt at different temperature settings and evaluates their consistency to estimate the model's certainty. The approach uses prompts that ask the model to produce open-ended factual content, such as "Write a biography of <X person>." The original response is divided into factual statements, which are then checked for consistency against the sampled responses.
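As an illustration, the sampling step can be sketched as follows, assuming an OpenAI-compatible chat client; the model name, temperature values, and sample count are illustrative choices rather than the test's actual configuration.

```python
# Minimal sketch of the sampling step, assuming an OpenAI-compatible client.
# The model name, temperatures, and sample count are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def sample_responses(prompt: str, n_samples: int = 5, model: str = "gpt-4o-mini"):
    """Generate a low-temperature 'original' response plus several stochastic samples."""
    def generate(temperature: float) -> str:
        completion = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
        )
        return completion.choices[0].message.content

    original = generate(temperature=0.0)                              # reference answer
    samples = [generate(temperature=1.0) for _ in range(n_samples)]   # diverse re-samples
    return original, samples
```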
This test uses the WikiBio dataset introduced in "Neural Text Generation from Structured Data with Application to the Biography Domain" (Lebret et al., 2016). The prompts instruct the model to generate a biography for a given individual, for example: "Write a biography of <X person>."
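For illustration, prompts of this form can be built from the Hugging Face wiki_bio dataset; the field layout used below (input_text["context"] holding the person's name) is an assumption about that dataset rather than part of this test's specification.

```python
# Sketch of prompt construction from WikiBio, assuming the Hugging Face "wiki_bio"
# dataset layout where input_text["context"] holds the person's name.
from datasets import load_dataset

dataset = load_dataset("wiki_bio", split="test")

def build_prompt(record: dict) -> str:
    person = record["input_text"]["context"].strip()
    return f"Write a biography of {person}."

# Example: prompts for the first 10 people in the split.
prompts = [build_prompt(record) for record in dataset.select(range(10))]
```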
This test generates multiple responses to the same prompt at different temperature settings and evaluates the consistency between them. The original response is divided into factual statements, and each statement is checked for its presence in the other responses. Higher consistency across responses indicates a lower likelihood of hallucination, while greater variability suggests uncertainty or fabrication. The SelfCheckGPT Score is the proportion of the original response's factual statements that are consistent with the other generated responses.
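The scoring logic can be sketched as follows. Sentence splitting stands in for factual-statement extraction, and is_supported() is a crude lexical-overlap placeholder for the real consistency check (for example, an NLI model or an LLM judge); both are assumptions, not the exact implementation used by this test.

```python
# A minimal sketch of the scoring step under the assumptions stated above.
import re

def split_statements(text: str) -> list[str]:
    """Naive sentence split, used here as a stand-in for factual-statement extraction."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def is_supported(statement: str, passage: str, overlap: float = 0.6) -> bool:
    """Placeholder check: a statement counts as supported if most of its tokens appear in the passage."""
    stmt_tokens = set(re.findall(r"\w+", statement.lower()))
    passage_tokens = set(re.findall(r"\w+", passage.lower()))
    return not stmt_tokens or len(stmt_tokens & passage_tokens) / len(stmt_tokens) >= overlap

def selfcheck_score(original: str, samples: list[str], min_support: float = 0.5) -> float:
    """Proportion of the original response's statements that the sampled responses support."""
    statements = split_statements(original)
    if not statements or not samples:
        return 1.0  # nothing to check or nothing to contradict
    consistent = sum(
        1
        for statement in statements
        if sum(is_supported(statement, sample) for sample in samples) / len(samples) >= min_support
    )
    return consistent / len(statements)
```

Under this reading, a score close to 1 means the sampled responses largely agree with the original answer, while a low score flags statements that the other samples fail to reproduce.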