Generative morphodynamic forecasting enables robust zero-shot volumetric medical segmentation

Written on 01/07/2026
by Duwei Dai

Med Image Anal. 2026 Jun 29;113:104180. doi: 10.1016/j.media.2026.104180. Online ahead of print.

ABSTRACT

Video foundation models show strong potential for interactive volumetric medical image parsing but can suffer from memory drift when applied directly to 3D medical volumes with severe, patient-specific topological changes. In the evaluated settings, standard reactive memory architectures and geometric prompts provide limited guidance for newly appearing, fragmented, or low-contrast structures, which can lead to accumulated segmentation errors. To mitigate this issue, we introduce a generative anticipatory framework that augments reactive tracking with morphological forecasting. By combining the semantic reasoning of large language models with the continuous latent dynamics of neural stochastic differential equations, our approach forecasts plausible anatomical trajectories. We propose an uncertainty-guided dual-stream memory architecture to integrate these forecasted priors with noisy visual evidence. This mechanism uses predictive spatial variance to balance empirical visual observations and forecasted priors, improving topological consistency under challenging imaging conditions in our benchmarks. We further formulate human-in-the-loop interactive corrections via Bayesian state resetting, translating expert interventions into more efficient volumetric correction. Evaluations across diverse clinical datasets show strong training-free generalization, improved robustness to morphodynamic variation, and higher interactive efficiency among the evaluated methods. These results suggest a promising uncertainty-aware direction for promptable volumetric medical segmentation.
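
The abstract does not give the fusion rule, so the following is only a minimal sketch of one common way to let predictive variance balance a forecasted prior against a visual observation: inverse-variance (precision) weighting. The function name `fuse_with_uncertainty`, the `visual_var` default, and the choice of weighting are all assumptions made for illustration, not the authors' architecture.

```python
def fuse_with_uncertainty(visual_prob, forecast_prob, forecast_var, visual_var=0.05):
    """Blend an observed and a forecasted foreground probability.

    Hypothetical illustration: inverse-variance weighting, NOT the paper's method.
    Uses plain arithmetic only, so it accepts floats or, unchanged, NumPy arrays
    holding per-voxel values.
    """
    # Precision weighting: w = (1/forecast_var) / (1/forecast_var + 1/visual_var),
    # which simplifies to the expression below and stays finite at forecast_var = 0.
    w_forecast = visual_var / (visual_var + forecast_var)
    # A confident forecast (low variance) dominates; an uncertain one defers
    # to the visual evidence.
    return w_forecast * forecast_prob + (1.0 - w_forecast) * visual_prob


if __name__ == "__main__":
    # Equal variances: the two sources are averaged.
    print(fuse_with_uncertainty(0.2, 0.8, forecast_var=0.05))
    # Very uncertain forecast: the result stays close to the visual observation.
    print(fuse_with_uncertainty(0.2, 0.8, forecast_var=1e6))
```

Under this assumed rule, voxels where the forecast has high spatial variance fall back to what the image shows, while voxels with weak visual evidence but a confident forecast follow the prior.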

PMID:42385278 | DOI:10.1016/j.media.2026.104180