
Model outputs inconsistent with chain-of-thought reasoning

Risk Domain

Challenges in understanding or explaining the decision-making processes of AI systems, which can lead to mistrust, difficulty in enforcing compliance standards or holding relevant actors accountable for harms, and the inability to identify and correct errors.

"Chain-of-thought reasoning is sometimes employed to get a better understanding of the model’s output, where it encourages transparent reasoning in text form. However, in some cases, this reasoning is not consistent with the final answer given by the AI model, and as such does not give sufficient transparency [113]."(p. 25)

Part of Model Evaluations (Interpretability/Explainability)
