
Training-related (Robust overfitting in adversarial training)

Sub-category: Training-related (Robust overfitting in adversarial training)

Risk Domain: AI systems that fail to perform reliably or effectively under varying conditions, exposing them to errors and failures that can have significant consequences, especially in critical applications or areas that require moral reasoning.

"Adversarial training can be affected by robust overfitting, where the model's robustness on test data decreases during further training, particularly after the learning rate decay. This issue has been consistently observed across various datasets and algorithms in adversarial training settings [163, 230]. Robust overfitting can affect the model's ability to generalize effectively and reduce its resilience to adversarial attacks."
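The excerpt describes adversarial training and its standard mitigation for robust overfitting: monitoring robust accuracy on a held-out set and keeping the best checkpoint (early stopping) rather than the final weights. Below is a minimal, hedged sketch of that loop using single-step FGSM on a toy logistic-regression model with synthetic data. All names, the toy model, the epsilon/learning-rate values, and the decay schedule are illustrative assumptions; the robust-overfitting phenomenon itself is reported for deep networks, and this convex toy model only demonstrates the training/monitoring mechanics, not the failure mode.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): two Gaussian blobs, labels in {0, 1}.
n = 200
X = np.vstack([rng.normal(-1.0, 1.0, (n, 2)), rng.normal(1.0, 1.0, (n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])
idx = rng.permutation(2 * n)
X, y = X[idx], y[idx]
X_tr, y_tr, X_va, y_va = X[:300], y[:300], X[300:], y[300:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(w, b, X, y, eps):
    # One-step FGSM: move each input in the sign of the loss gradient.
    # For logistic regression with BCE loss, d(loss)/dx = (p - y) * w.
    p = sigmoid(X @ w + b)
    grad_x = np.outer(p - y, w)
    return X + eps * np.sign(grad_x)

def robust_acc(w, b, X, y, eps):
    # Accuracy measured on adversarially perturbed inputs.
    X_adv = fgsm(w, b, X, y, eps)
    return np.mean((sigmoid(X_adv @ w + b) > 0.5) == y)

w, b = np.zeros(2), 0.0
eps, lr = 0.3, 0.1
best = (-1.0, w.copy(), b)  # early-stopping checkpoint: (robust acc, w, b)

for epoch in range(200):
    if epoch == 100:
        lr *= 0.1  # learning-rate decay: the point after which robust
                   # overfitting is typically reported in deep networks
    X_adv = fgsm(w, b, X_tr, y_tr, eps)  # train on adversarial examples
    p = sigmoid(X_adv @ w + b)
    w -= lr * X_adv.T @ (p - y_tr) / len(y_tr)
    b -= lr * np.mean(p - y_tr)
    acc = robust_acc(w, b, X_va, y_va, eps)
    if acc > best[0]:
        best = (acc, w.copy(), b)

# Mitigation: deploy the checkpoint with the best robust validation
# accuracy rather than the final weights.
final_acc = robust_acc(w, b, X_va, y_va, eps)
print(f"final robust acc: {final_acc:.3f}, best checkpoint: {best[0]:.3f}")
```

By construction the retained checkpoint's robust validation accuracy is at least that of the final weights, which is the point of the mitigation: when robust overfitting occurs, the gap between the two grows after the learning-rate decay.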
