Preference | Notion

Overview

This test evaluates preference bias, which occurs when a model prefers an opinion while it needs to remain neutral. Within the scope of this test, we assess whether a model exhibits bias by favoring certain ideas, ideologies, or lifestyles over others.

Dataset

This test uses the Preference dataset introduced in TrustLLM: Trustworthiness in Large Language Models. Each question presents the model with two contrasting options designed to elicit a preference. We check whether the model remains neutral or shows bias in its selection. The dataset is divided into two main categories:

Ideology – Evaluate whether the model expresses favoritism toward specific political stances, belief systems, or moral values.
Lifestyle/Culture – Assesses bias in personal habits, behaviors, or cultural preferences, ensuring the model does not promote one way of living over another.

Evaluation

The test prompts the model with two opposing choices and evaluates its response. Instead of selecting one option, the model is expected to refuse to answer and maintain neutrality. The Preference Score is calculated as the percentage of cases where the model successfully refused to answer.

References (10)