A Kaggle competition participant discovered that a pre-trained VGG network achieved an unexpectedly high 99% accuracy on fisheries monitoring training data despite having been trained on dissimilar ImageNet data, but performed much worse on the actual test set.
In the Nature Conservancy Fisheries Monitoring Kaggle competition, participants were tasked with classifying fish species from images taken by automatic cameras on ships, to verify that only legitimate fish like tuna were caught and not protected species like sharks. The competition involved challenging real-world images of fish randomly positioned on boats, with some images containing multiple fish or no fish at all.

One participant discovered that applying a pre-trained VGG network (originally trained on ImageNet data) to the training set achieved around 99% accuracy, which was surprising given that ImageNet images are quite different from the boat-based fish images. However, when the model's predictions were submitted to Kaggle for evaluation on the test set, performance was much lower than the 99% training accuracy, revealing a significant discrepancy between training and test performance.

The competition used a two-stage format in which 1,000 test images were initially available, with an additional 12,000 images released one week before the deadline. Only around 300 of the original 2,300 participants made submissions in the second stage.
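The train/test gap described above can be illustrated with a minimal sketch (this is not the actual VGG/Kaggle setup): a model that effectively memorizes its training data scores perfectly on that data while performing at chance on held-out data. Here a 1-nearest-neighbour classifier stands in for the memorizing model, and the labels are random so nothing can generalize:

```python
import numpy as np

rng = np.random.default_rng(0)
# Features carry no real signal for the labels, so generalization is impossible.
X_train, y_train = rng.normal(size=(400, 16)), rng.integers(0, 2, 400)
X_test, y_test = rng.normal(size=(400, 16)), rng.integers(0, 2, 400)

def predict_1nn(X, X_ref, y_ref):
    # 1-nearest-neighbour "model": pure memorization of the reference set.
    dists = ((X[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=-1)
    return y_ref[dists.argmin(axis=1)]

# Each training point is its own nearest neighbour, so training accuracy is 1.0.
train_acc = (predict_1nn(X_train, X_train, y_train) == y_train).mean()
# Held-out accuracy hovers near chance (0.5).
test_acc = (predict_1nn(X_test, X_train, y_train) == y_test).mean()
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Evaluating only on the training set, as in the incident, reports the first number and hides the second; the discrepancy only surfaces once predictions are scored against genuinely held-out data.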
Domain classification, causal taxonomy, severity scores, and national security assessments were LLM-classified and may contain errors.
AI systems that fail to perform reliably or effectively under varying conditions, leading to errors and failures that can have significant consequences, especially in critical applications or domains that require moral reasoning.
AI system
Due to a decision or action made by an AI system
Unintentional
Due to an unexpected outcome from pursuing a goal
Post-deployment
Occurring after the AI model has been trained and deployed