This suite of tests is based on DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models, and is an extension of Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models. Adversarial GLUE++ evaluates how robust a Large Language Model (LLM) is to different types of adversarial prompts across a range of tasks. The adversarial prompts in Adversarial GLUE++ are optimized with targeted perturbations crafted from the conditional probabilities of adversarial candidate labels. The results of these tests help users understand the risks and unsafe behaviors a model might exhibit once it is deployed in their application and targeted by advanced adversaries who deliberately attempt to misuse GPT-like models across different tasks.
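
To illustrate the kind of optimization described above, the sketch below shows a simplified greedy word-substitution attack: each candidate perturbation of an input is scored by the conditional probability the victim model assigns to the adversarial (incorrect) label, and the substitution that raises that probability the most is kept. This is not the actual Adversarial GLUE++ generation pipeline; the candidate generator (candidates_for), the scoring interface (adversarial_label_probability), and the stopping threshold are hypothetical stand-ins supplied by the caller.

```python
from typing import Callable, Iterable


def greedy_label_flip_attack(
    sentence: str,
    candidates_for: Callable[[str, int], Iterable[str]],
    adversarial_label_probability: Callable[[str], float],
    success_threshold: float = 0.5,
) -> str:
    """Greedily perturb one word at a time, keeping each substitution that
    most increases the victim model's probability of the adversarial label.

    `candidates_for(word, position)` yields replacement words (e.g. synonyms
    or typo variants); `adversarial_label_probability(text)` queries the
    victim model. Both are placeholders the caller must provide.
    """
    words = sentence.split()
    best_score = adversarial_label_probability(sentence)

    for i, word in enumerate(words):
        for candidate in candidates_for(word, i):
            perturbed = words.copy()
            perturbed[i] = candidate
            score = adversarial_label_probability(" ".join(perturbed))
            if score > best_score:
                # Keep the strongest perturbation found so far.
                best_score = score
                words = perturbed
        if best_score >= success_threshold:
            # The adversarial label is now preferred; stop early.
            break

    return " ".join(words)
```

In practice, an attack of this form would be run against a surrogate model whose label probabilities are accessible, and the resulting perturbed prompts would then be used to probe the deployed model under test.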