Deep Learning with Simulated Data

Deep learning refers to the approximation of high-dimensional, non-linear input-output relationships, observed in a large corpus of training data, by neural networks composed of many successive layers. It has been applied with good results to challenging problems in medical imaging, e.g. registration, segmentation and anatomical labelling, annotation and classification, super-resolution imaging, quality assessment, and image synthesis.
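As a minimal illustration of this idea, the sketch below fits a small fully-connected network with two hidden non-linear layers to noisy samples of a toy non-linear function. All numbers, the architecture and the target function are invented for illustration; real medical-imaging networks are far larger.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training corpus: noisy samples of a non-linear 1-D relationship.
x = rng.uniform(-1, 1, size=(256, 1))
y = np.sin(3 * x) + 0.05 * rng.normal(size=(256, 1))

# Two hidden layers of 16 units with tanh non-linearities.
W1 = rng.normal(0, 0.5, (1, 16));  b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 16)); b2 = np.zeros(16)
W3 = rng.normal(0, 0.5, (16, 1));  b3 = np.zeros(1)

def forward(x):
    h1 = np.tanh(x @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    return h1, h2, h2 @ W3 + b3

def mse(pred):
    return float(((pred - y) ** 2).mean())

loss_before = mse(forward(x)[2])

# Plain gradient descent via manual backpropagation.
lr = 0.05
for _ in range(2000):
    h1, h2, pred = forward(x)
    err = 2 * (pred - y) / len(x)          # dMSE/dpred
    gW3 = h2.T @ err
    d2 = (err @ W3.T) * (1 - h2 ** 2)      # backprop through tanh
    gW2 = h1.T @ d2
    d1 = (d2 @ W2.T) * (1 - h1 ** 2)
    gW1 = x.T @ d1
    W3 -= lr * gW3; b3 -= lr * err.sum(0)
    W2 -= lr * gW2; b2 -= lr * d2.sum(0)
    W1 -= lr * gW1; b1 -= lr * d1.sum(0)

loss_after = mse(forward(x)[2])
```

The stacked non-linear layers are what let the network represent the curved input-output mapping; a single linear layer could not.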

Many deep learning approaches are supervised, and large amounts of high-quality annotated training data are required to train the models. Manual annotation of medical images by experts is time-consuming, and the resulting annotations may suffer from inter-observer variability. We therefore develop techniques for training deep learning models from simulated data: computer simulations of anatomy, physiology, and image formation are used to generate annotated synthetic medical images of virtual patients, which are then used to train deep learning models.
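The key point is that the simulator produces annotations for free. A deliberately simple toy sketch (the disc "anatomy", noise levels and threshold "model" are all invented stand-ins for real simulators and networks):

```python
import numpy as np

rng = np.random.default_rng(1)

def virtual_patient(size=32):
    """Simulate one 'patient': a disc of random radius and position
    (the anatomy), imaged with additive noise. Returns (image, mask);
    the mask is a pixel-level annotation produced by the simulator."""
    yy, xx = np.mgrid[:size, :size]
    cx, cy = rng.uniform(10, 22, 2)
    r = rng.uniform(4, 8)
    mask = ((xx - cx) ** 2 + (yy - cy) ** 2 < r ** 2).astype(float)
    image = 0.8 * mask + 0.2 + 0.1 * rng.normal(size=mask.shape)
    return image, mask

# Build a labelled training set with no manual annotation at all.
images, masks = zip(*(virtual_patient() for _ in range(50)))
X = np.stack(images).reshape(50, -1)   # per-pixel intensities
Y = np.stack(masks).reshape(50, -1)    # per-pixel labels, for free

# A trivially simple 'model': a single intensity threshold fitted to
# the simulated data (a stand-in for a segmentation network).
thresholds = np.linspace(0, 1, 101)
accs = [((X > t) == Y).mean() for t in thresholds]
t_best = thresholds[int(np.argmax(accs))]
```

Swapping the threshold for a segmentation network leaves the workflow unchanged: the simulator supplies both images and ground truth.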

The four components of deep learning with simulated data are:

1. Anatomical model of the organ or structure of interest. This can be a computational phantom or a statistical shape model based on data from large-scale population imaging;

2. Computational physiology model, which can take a wide variety of forms: systems of differential or algebraic equations, rule-based models, agent-based models, cellular automata, graph-based models, etc.;

3. Image formation model that generates synthetic images for a given virtual anatomy, combined with the outputs of the virtual physiology model;

4. Domain adaptation technique that learns the shift between simulated and true medical images, so that models trained on synthetic data generalise to real clinical images.
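Components 1–3 can be composed into an end-to-end simulation pipeline. The sketch below is illustrative only: the random ellipse stands in for a statistical shape model, the first-order contrast-uptake ODE for a physiology model, and the box blur plus sensor noise for an image formation model.

```python
import numpy as np

rng = np.random.default_rng(2)

# 1. Anatomical model: an ellipse whose axes are drawn from a simple
#    population distribution (stand-in for a statistical shape model).
def anatomy(size=48):
    yy, xx = np.mgrid[:size, :size]
    a, b = rng.normal(10, 1), rng.normal(6, 1)
    return ((xx - size / 2) ** 2 / a ** 2
            + (yy - size / 2) ** 2 / b ** 2) < 1

# 2. Physiology model: contrast uptake as a toy first-order ODE,
#    dc/dt = k (1 - c), evaluated analytically at time t.
def physiology(t, k=0.5):
    return 1.0 - np.exp(-k * t)

# 3. Image formation: map tissue and contrast to intensities, apply a
#    crude box-blur point-spread function and additive sensor noise.
def image_formation(mask, contrast):
    img = 0.2 + 0.6 * mask * contrast
    blurred = (img + np.roll(img, 1, 0) + np.roll(img, -1, 0)
                   + np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 5
    return blurred + 0.02 * rng.normal(size=img.shape)

mask = anatomy()
img = image_formation(mask, physiology(t=2.0))
# (img, mask) is one annotated training pair for one virtual patient.
```

Each component can be replaced independently by a more faithful model without changing the overall pipeline.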

Using only synthetic images to train a deep learning model leads to over-fitting and poor generalisability to true medical images, because some physical effects are absent from the simulated images and they lack the richness and noise of real data. Therefore, we explore learning the mapping between the simulated and real image domains, commonly referred to as domain adaptation.
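One simple, widely used family of domain adaptation techniques matches feature statistics across domains, e.g. correlation alignment (CORAL, Sun et al.), which re-colours simulated features so their mean and covariance match those of real data. The sketch below uses invented Gaussian "features" purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def coral_align(Xs, Xt, eps=1e-5):
    """CORAL-style alignment: transform simulated features Xs so
    their mean and covariance match the real features Xt. A simple
    stand-in for learning the simulated-to-real domain shift."""
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])
    # Whiten the source features, then re-colour with the target
    # covariance (Cholesky factors of Cs^-1 and Ct).
    Ws = np.linalg.cholesky(np.linalg.inv(Cs))
    Wt = np.linalg.cholesky(Ct)
    return (Xs - Xs.mean(0)) @ Ws @ Wt.T + Xt.mean(0)

# 'Simulated' features are too clean; 'real' ones are shifted, noisier.
Xs = rng.normal(0.0, 0.5, size=(500, 4))
Xt = rng.normal(1.0, 1.5, size=(500, 4))
Xa = coral_align(Xs, Xt)   # simulated features, moved into real domain
```

A model trained on the aligned features `Xa` sees statistics closer to the real domain than one trained on the raw simulated features; richer approaches (e.g. adversarial image translation) learn the shift at the image level instead.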