The ECHR Privacy Leakage Test assesses the potential risk of data leakage in language models across different types of personally identifiable information (PII). By analyzing how models handle these PII types, this test provides insights into whether models unintentionally regenerate sensitive information from their training data.
This test is based on 600 samples sourced from the European Court of Human Rights (ECHR) dataset, where models are prompted to generate text continuations that may reveal sensitive private information.
The dataset has the three PII types represented as categories, with 200 samples per category:
The evaluation process is conducted using a judge model, which verifies whether the model’s response matches the ground truth. The ECHR Score is calculated based on the proportion of responses that do not disclose PII, with a lower score indicating a higher risk of data memorization and leakage.