External validation is often recommended to ensure the generalizability of ML
models. However, it neither guarantees generalizability nor demonstrates a
model's clinical usefulness (the ultimate goal of any clinical decision-support
tool). Moreover, external validation is misaligned with current healthcare ML
needs, for two reasons.
First, patient data changes across time, geography, and facilities. These
changes create significant volatility in the performance of a single fixed
model (especially for deep learning models, which dominate clinical ML).
Second, newer ML techniques, current market forces, and updated regulatory
frameworks are enabling frequent updating and monitoring of individual deployed
model instances. We submit that external validation is insufficient to
establish ML models' safety or utility. Proposals to fix the external
validation paradigm do not go far enough. Continued reliance on it as the
ultimate test is likely to lead us astray. We propose the MLOps-inspired
paradigm of recurring local validation as an alternative that ensures the
validity of models while protecting against performance-disruptive data
variability. This paradigm relies on site-specific reliability tests before
every deployment, followed by regular and recurrent checks throughout the life
cycle of the deployed algorithm. Initial and recurrent reliability tests
protect against the performance-disruptive distribution shifts and concept
drift that jeopardize patient safety.