In the hallucination service, we evaluate LLMs on their ability to generate factually accurate responses while recognizing the limits of their own knowledge. Unlike traditional factual-knowledge assessments, the hallucination service classifies a response as non-hallucinated if it explicitly acknowledges uncertainty about the answer; if the model instead fabricates information, the response is marked as hallucinated.
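To make the labeling rule concrete, the sketch below shows one way it could be expressed in code. This is a minimal illustration, not the service's actual implementation: the `JudgedResponse` fields and the `classify` function are hypothetical names, and the two boolean judgments are assumed to come from an upstream judge (human annotators or an LLM-as-judge).

```python
from dataclasses import dataclass


@dataclass
class JudgedResponse:
    """Hypothetical output of an upstream judging step (names are illustrative)."""
    text: str
    acknowledges_uncertainty: bool  # e.g. "I'm not sure, but ..."
    is_factually_supported: bool    # claims verified against reference material


def classify(response: JudgedResponse) -> str:
    """Apply the labeling rule described above."""
    # Explicitly expressing uncertainty counts as non-hallucinated,
    # even when the model does not supply the correct fact.
    if response.acknowledges_uncertainty:
        return "non-hallucinated"
    # Otherwise the response must be grounded; fabricated content is flagged.
    return "non-hallucinated" if response.is_factually_supported else "hallucinated"
```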
To ensure a comprehensive evaluation, we combine multiple detection strategies: assessing response consistency across repeated generations, measuring accuracy on fact-seeking questions, evaluating the model's ability to avoid common human misconceptions, and analyzing factual correctness in tasks such as question answering, dialogue, and summarization. Integrating these complementary approaches yields a robust framework for detecting hallucinations in black-box settings.
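As an illustration of the first strategy, the following sketch estimates a consistency score by resampling answers to the same question and measuring how well they agree with the primary answer; a low score is a signal that the primary answer may be hallucinated. This is an assumption-laden sketch, not the service's actual detector: the `generate` callable stands in for any black-box sampling interface (temperature > 0), and simple token overlap is used as a stand-in for a stronger agreement measure such as NLI or LLM-based scoring.

```python
import re
from collections import Counter
from typing import Callable, List


def _token_overlap(a: str, b: str) -> float:
    """Crude lexical agreement score between two responses, in [0, 1]."""
    ta = Counter(re.findall(r"\w+", a.lower()))
    tb = Counter(re.findall(r"\w+", b.lower()))
    if not ta or not tb:
        return 0.0
    shared = sum((ta & tb).values())  # multiset intersection of tokens
    return shared / max(sum(ta.values()), sum(tb.values()))


def consistency_score(
    question: str,
    primary_answer: str,
    generate: Callable[[str], str],  # assumed black-box sampling interface
    n_samples: int = 5,
) -> float:
    """Average agreement of the primary answer with resampled answers.

    Low consistency suggests the primary answer may be fabricated, since
    hallucinated content tends not to be reproduced across samples.
    """
    samples: List[str] = [generate(question) for _ in range(n_samples)]
    return sum(_token_overlap(primary_answer, s) for s in samples) / n_samples
```

In practice, a threshold on this score (e.g., flagging answers that fall below it for closer review) would be tuned on labeled data, and the overlap metric would typically be replaced by a semantic agreement measure.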