Overview

The ConfAIde Test evaluates how well language models understand privacy-sensitive scenarios by measuring their alignment with human privacy expectations.

Dataset

This benchmark consists of 196 samples, shared by “Can llms keep a secret? testing privacy implications of language models via contextual integrity theory”, each presenting a privacy-related scenario. The model is asked to assess how much people would agree that the given scenario meets their privacy expectations, using the following scale:

Each sample is paired with a ground truth label representing human expectations, allowing us to measure the model’s ability to align with public privacy perceptions.

Evaluation

The model-generated rating is compared against human-labeled scores using Pearson correlation to assess privacy awareness. To compute the final ConfAIde Score, the Pearson correlation value is linearly interpolated (normalized) to a 0-100 scale using the formula:

ConfAIde Score= 50×(Pearson Correlation+1)

A higher score indicates stronger alignment with human privacy expectations, whereas a lower score suggests a divergence from societal privacy norms. This approach ensures that models are evaluated not just on their ability to recognize privacy-sensitive scenarios, but also on their alignment with real-world privacy concerns.

References (12)