This page is still being polished. If you have thoughts, please share them via the feedback form.
Data on this page is preliminary and may change. Please do not share or cite these figures publicly.
Cryptographic protections, access controls, and hardware security.
Also in Non-Model
Most existing vulnerabilities in programming languages, deep learning frameworks, and pre-processing tools, aim to hijack control flows. Therefore, control-flow integrity (CFI), which ensures that the control flows follow a predefined set of rules, can prevent the exploitation of these vulnerabilities. However, CFI solutions incur high overheads when applied to large-scale software such as LLMs [320], [321]. To tackle this issue, a low-precision version of CFI was proposed to reduce overheads [322]. Hardware optimizations are proposed to improve the efficiency of CFI [323]. In addition, it is critical to analyze and prevent security accidents in the environments of LLMs developing and deploying. We argue that data provenance analysis tools can be leveraged to forensic security issues [324]–[327] and detect attacks against LLM actively [328]–[330]. The key concept of data provenance revolves around the provenance graph, which is constructed based on audit systems. Specifically, the vertices in the graph represent file descriptors, e.g., files, sockets, and devices. Meanwhile, the edges depict the relationships between these file descriptors, such as system calls.
However, conducting data provenance on LLM-based systems remains a challenging task [324], [327], [338]. We identify several issues that contribute to the challenges of conducting data provenance on LLM-based systems: • Computational Resources. LLMs are computationally intensive models that require significant processing power and memory resources. Capturing and storing detailed data provenance information for every input and output can result in a substantial increase in computational overheads. • Storage Requirements. LLMs generate a large volume of data, including intermediate representations, attention weights, and gradients. Storing this data for provenance purposes can result in substantial storage requirements. • Latency and Response Time. Collecting detailed data provenance information in real-time can introduce additional latency and impact the overall response time of LLM-based systems. This overhead can be particularly challenging for real-time processing, such as language translation services. • Privacy and Security. LLMs often handle sensitive or confidential data, e.g., personal information or proprietary business data. Capturing and maintaining data provenance raises concerns about privacy and security, as such information increases attack surfaces for breaches or unauthorized access. • Model Complexity and Interpretability. LLMs, especially advanced architectures like GPT-3, are highly complex models. Tracing and understanding the provenance of specific model outputs or decisions can be challenging due to the complexity and lack of interpretability of these models.
Reasoning
Secure development practices protecting software toolchain through control-flow integrity and forensic provenance analysis.
Mitigation in Input Modules
Mitigating the threat posed by the input module presents a significant challenge for LLM developers due to the diversity of the harmful inputs and adversarial prompts [209], [210].
1.2.1 Guardrails & FilteringMitigation in Input Modules > Defensive Prompt Design
Directly modifying the input prompts is a viable approach to steer the behavior of the model and foster the generation of responsible outputs. This method integrates contextual information or constraints in the prompts to provide background knowledge and guidelines while generating the output [22].
1.2.1 Guardrails & FilteringMitigation in Input Modules > Malicious Prompt Detection
Different from the methods of designing defensive prompts to preprocess the input, the malicious prompt detection method aims to detect and filter out the harmful prompts through the input safeguard
1.2.1 Guardrails & FilteringMitigation in Language Models
This section delves into mitigating risks associated with models, encompassing privacy preservation, detoxification and debiasing, mitigation of hallucinations, and defenses against model attacks.
1.1 ModelMitigation in Language Models > Privacy Preserving
Privacy leakage is a crucial risk of LLMs, since the powerful memorization and association capabilities of LLMs raise the risk of revealing private information within the training data. Researchers are devoted to designing privacypreserving frameworks in LLMs [226], [227], aiming to safeguard sensitive PII from possible disclosure during humanmachine conservation
1.1 ModelMitigation in Language Models > Detoxifying and Debiasing
To reduce the toxicity and bias of LLMs, prior efforts mainly focus on enhancing the quality of training data and conducting safety training.
1.1 ModelRisk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Cui, Tianyu; Wang, Yanling; Fu, Chuanpu; Xiao, Yong; Li, Sijia; Deng, Xinhao; Liu, Yunpeng; Zhang, Qinglin; Qiu, Ziyi; Li, Peiyang; Tan, Zhixing; Xiong, Junwu; Kong, Xinyu; Wen, Zujie; Xu, Ke; Li, Qi (2024)
Despite their impressive capabilities, large lan- guage models (LLMs) have been observed to generate responses that include inaccurate or fabricated information, a phenomenon com- monly known as “hallucination”. In this work, we propose a simple Induce-then-Contrast De- coding (ICD) strategy to alleviate hallucina- tions. We first construct a factually weak LLM by inducing hallucinations from the original LLMs. Then, we penalize these induced hallu- cinations during decoding to enhance the fac- tuality of the generated content. Concretely, we determine the final next-token predictions by amplifying the predictions from the orig- inal model and downplaying the induced un- truthful predictions via contrastive decoding. Experimental results on both discrimination- based and generation-based hallucination eval- uation benchmarks, such as TruthfulQA and FACTSCORE, demonstrate that our proposed ICD methods can effectively enhance the factu- ality of LLMs across various model sizes and families. For example, when equipped with ICD, Llama2-7B-Chat and Mistral-7B-Instruct achieve performance comparable to ChatGPT and GPT4 on TruthfulQA, respectively.
Other (multiple stages)
Applies across multiple lifecycle stages
Infrastructure Provider
Entity providing compute, platforms, or tooling for AI systems
Manage
Prioritising, responding to, and mitigating AI risks