Visual robustness and neural alignment in a shared foraging task: The Mouse vs. AI benchmark
Abstract
Visual robustness under real-world conditions remains a critical bottleneck for modern reinforcement learning agents. In contrast, biological systems such as mice show remarkable resilience to environmental changes, maintaining stable performance under degraded visual input with minimal exposure. Motivated by this gap, we established the Mouse vs. AI: Robust Foraging Competition, a novel bio-inspired benchmark for visual robustness and neural alignment in which agents and mice perform the same naturalistic 3D foraging task. The benchmark consists of two tracks: (1) a robustness track evaluating generalization to unseen visual perturbations, and (2) a neural alignment track measuring how well model representations predict large-scale neural recordings from mouse visual cortex. We provide a public benchmark suite comprising a Unity-based environment for training and evaluating virtual agents, centralized evaluation protocols, and an extensive neural dataset with approximately 178 minutes of multimodal recordings of behavior, visual input, and neural activity from more than 50,000 neurons in mice performing the task. As part of the NeurIPS competition, we received submissions from 22 teams proposing diverse architectures and training strategies. By embedding evaluation in a shared sensorimotor task performed by both biological and artificial agents, the benchmark enables a class of evaluative claims that neither existing robustness nor neural alignment benchmarks support in isolation: namely, whether robust, brain-like visual representations emerge spontaneously from behavior-driven learning, and whether robustness and neural alignment reinforce or trade off against one another. The dataset and benchmarking infrastructure will remain available at https://robustforaging.github.io.