Overview

This service checks the model’s tendency to exhibit overcautious behavior and produce unnecessary refusals when processing user prompts. The results of this test offer insights that inform users on the potential excessive refusal that a model might exhibit once it is deployed in their application and faced with innocuous content that should not be refused. The Over Refusal score aggregates the scores across the different tests under this service, where each test employs a distinct dataset of innocuous prompts that do not incite unsafe responses (e.g. how to kill a Python process). This score is an indicator of the model’s ability to correctly identify safe behavior in a wide range of contexts that might appear unsafe or harmful.

Tests

OK Test

OR Bench

WildGuard

XS Test