Resistance to Misuse

Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models’ Alignment

Liu et al. (2024)

Category

Prohibiting misuse by malicious attackers to do harm (p. 8)

Entity: Who or what caused the harm

Human

Due to a decision or action made by humans

AI system

Due to a decision or action made by an AI system

Other

Due to some other reason or is ambiguous

Intent: Whether the harm was intentional or accidental

Intentional

Due to an expected outcome from pursuing a goal

Unintentional

Due to an unexpected outcome from pursuing a goal

Other

Without clearly specifying the intentionality

Timing: Whether the risk is pre- or post-deployment

Pre-deployment

Occurring before the AI is deployed

Post-deployment

Occurring after the AI model has been trained and deployed

Other

Without a clearly specified time of occurrence

Sub-categories (4)

Propaganda

LLMs can be leveraged, by malicious users, to proactively generate propaganda information that can facilitate the spreading of a target

4.1 Disinformation, surveillance, and influence at scale

Human · Intentional · Post-deployment

Cyberattack

LLMs can write reasonably good-quality code at extremely low cost and high speed, and this assistance can equally facilitate malicious attacks. In particular, malicious hackers can exploit the low cost of LLMs to assist with and automate cyberattacks.

4.2 Cyberattacks, weapon development or use, and mass harm

Human · Intentional · Post-deployment

Social-Engineering

Psychologically manipulating victims into performing the attacker's desired actions for malicious purposes.

4.3 Fraud, scams, and targeted manipulation

Human · Intentional · Post-deployment

Copyright

Because LLMs memorize parts of their training data, users can extract copyright-protected content that was included in that data.

6.3 Economic and cultural devaluation of human effort

Human · Intentional · Post-deployment
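
The Entity, Intent, and Timing classification applied to the four sub-categories above can be expressed as a small data structure. The following is a minimal sketch only, not a schema published by the source: the names `Entity`, `Intent`, `Timing`, `RiskEntry`, and `MISUSE_SUBCATEGORIES` are hypothetical and chosen for illustration. The values are copied from the entries listed on this page.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical enums mirroring the three classification axes described above.
class Entity(Enum):
    HUMAN = "Human"
    AI_SYSTEM = "AI system"
    OTHER = "Other"


class Intent(Enum):
    INTENTIONAL = "Intentional"
    UNINTENTIONAL = "Unintentional"
    OTHER = "Other"


class Timing(Enum):
    PRE_DEPLOYMENT = "Pre-deployment"
    POST_DEPLOYMENT = "Post-deployment"
    OTHER = "Other"


@dataclass(frozen=True)
class RiskEntry:
    """One classified risk (illustrative record type, not from the source)."""
    name: str
    domain: str
    entity: Entity
    intent: Intent
    timing: Timing

    def tags(self) -> str:
        # Join the three axis labels into one readable tag line.
        return " · ".join([self.entity.value, self.intent.value, self.timing.value])


# The four "Resistance to Misuse" sub-categories as listed on this page.
MISUSE_SUBCATEGORIES = [
    RiskEntry("Propaganda",
              "4.1 Disinformation, surveillance, and influence at scale",
              Entity.HUMAN, Intent.INTENTIONAL, Timing.POST_DEPLOYMENT),
    RiskEntry("Cyberattack",
              "4.2 Cyberattacks, weapon development or use, and mass harm",
              Entity.HUMAN, Intent.INTENTIONAL, Timing.POST_DEPLOYMENT),
    RiskEntry("Social-Engineering",
              "4.3 Fraud, scams, and targeted manipulation",
              Entity.HUMAN, Intent.INTENTIONAL, Timing.POST_DEPLOYMENT),
    RiskEntry("Copyright",
              "6.3 Economic and cultural devaluation of human effort",
              Entity.HUMAN, Intent.INTENTIONAL, Timing.POST_DEPLOYMENT),
]

for entry in MISUSE_SUBCATEGORIES:
    print(f"{entry.name}: {entry.tags()} [{entry.domain}]")
```

Using enums rather than free-text labels keeps the three axes from being confused with one another, since "Other" appears as a value on all three.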

Other risks from Liu et al. (2024) (34)

Reliability

3.1 False or misleading information

AI system · Unintentional · Post-deployment

Reliability > Misinformation

3.1 False or misleading information

AI system · Unintentional · Post-deployment

Reliability > Hallucination

3.1 False or misleading information

AI system · Unintentional · Post-deployment

Reliability > Inconsistency

7.3 Lack of capability or robustness

AI system · Unintentional · Post-deployment

Reliability > Miscalibration

3.1 False or misleading information

AI system · Unintentional · Post-deployment

Reliability > Sycophancy

3.1 False or misleading information

AI system · Intentional · Post-deployment
