This service tests the model against various prompt attacks, jailbreaks, and adversarial examples. For model providers, the model's vulnerability to adversarial demonstrations is a concern, as they could be held liable for any harm or damages resulting from the model's misuse by an adversarial user. The Adversarial Robustness score aggregates the scores across the different tests under this service, where each test employs a distinct dataset of adversarial prompts that incite unwanted model behavior. This score indicates the model's ability to resist manipulation when confronted with adversarial examples across a wide range of scenarios.
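
As a rough illustration of the aggregation step, the sketch below combines hypothetical per-test robustness scores into a single Adversarial Robustness score. The test names, the score values, and the use of an unweighted mean are all assumptions for illustration, since the exact tests and aggregation method are not specified here.

```python
from statistics import mean

# Hypothetical per-test robustness scores: the fraction of adversarial
# prompts in each test's dataset that the model successfully resisted.
# Test names are placeholders, not the service's actual test suite.
test_scores = {
    "prompt_attacks": 0.91,
    "jailbreaks": 0.84,
    "adversarial_examples": 0.88,
}

# Aggregate into a single Adversarial Robustness score. An unweighted
# mean is assumed; the service may weight tests differently.
adversarial_robustness = mean(test_scores.values())

print(f"Adversarial Robustness: {adversarial_robustness:.2f}")
```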