Disimpy: A massively parallel Monte Carlo simulator for generating diffusion-weighted MRI data in Python

Summary Disimpy is a simulator for generating diffusion-weighted magnetic resonance imaging (dMRI) data that is useful in the development and validation of new methods for data acquisition and analysis. Diffusion of water is modelled as an ensemble of random walkers whose trajectories are generated on an Nvidia (Nvidia Corporation, Santa Clara, California, United States) CUDA-capable (Nickolls, Buck, Garland, & Skadron, 2008) graphical processing unit (GPU). The massive parallelization results in a significant performance gain, enabling simulation experiments to be performed on standard laptop and desktop computers. Disimpy is written in Python (Python Software Foundation), making its source code very approachable and easily extensible.


Statement of need
Since the diffusion of water in biological tissues is restricted by microscopic obstacles such as cell organelles, myelin, and macromolecules, dMRI enables the study of tissue microstructure in vivo by probing the displacements of water molecules (Behrens & Johansen-Berg, 2009) . It has become a standard tool in neuroscience (Assaf, Johansen-Berg, & Thiebaut de Schotten, 2019) , and a large number of data acquisition and analysis methods have been developed to tackle the difficult inverse problem of inferring microstructural properties of tissue from the dMRI signal (Novikov, Fieremans, Jespersen, & Kiselev, 2019) .
Simulations have played an important role in the development of the field because they do not require the use of expensive scanner time and they provide a powerful tool for investigating the accuracy and precision of new methods, e.g., (Tournier, Calamante, & Connelly, 2007). Generally, dMRI simulations are based on modelling diffusion inside some geometry to obtain a solution to the diffusion equation, e.g., (Ianuş, Alexander, & Drobnjak, 2016;Li et al., 2019), or modelling diffusion using a more generalizable Monte Carlo approach, e.g., (Hall & Alexander, 2009;Rafael-Patino et al., 2020). The Monte Carlo method enables the use of complex and realistic tissue microstructure models, e.g., (Callaghan, Alexander, Palombo, & Zhang, 2020), and can be significantly accelerated using GPU computing, e.g., (Nguyen, Hernández-Garzón, & Valette, 2018).
Here, we present Disimpy, a GPU-accelerated dMRI simulator that enables a large amount of synthetic data to be generated on standard desktop and laptop computers without needing to access high performance computing clusters. Disimpy is written in Python, making its source code very approachable to researchers with no prior experience in GPU computing.

Features
Disimpy uses efficient numerical methods from Numpy (Van der Walt, Colbert, & Varoquaux, 2011) and Scipy (Virtanen et al., 2020) . Numba (Lam, Pitrou, & Seibert, 2015) is used to compile Python code into CUDA kernels and device functions (Nickolls et al., 2008) which are executed on the GPU. The random walker steps are generated in a massively parallel fashion on individual threads of one-dimensional CUDA blocks, resulting in a performance gain of over an order of magnitude when compared to Camino (Hall & Alexander, 2009) , a popular dMRI simulator written in Java (Figure 1). Given that random walker Monte Carlo dMRI simulations require at least 10 4 random walkers for sufficient convergence (Hall & Alexander, 2009) , it is important that Disimpy's runtime does not linearly depend on the number of random walkers until the number of walkers is in the thousands or tens of thousands, depending on the GPU.
Disimpy supports arbitrary diffusion encoding gradient sequences, such as those used in conventional pulsed field gradient experiments (Stejskal & Tanner, 1965) as well as the recently developed q-space trajectory encoding (Eriksson, Lasic, & Topgaard, 2013;Sjölund et al., 2015) . Useful helper functions for generating and manipulating gradient arrays are provided. Synthetic data from multiple gradient encoding schemes can be generated from the same simulation.

Signal generation
Signal generation in Disimpy follows the framework established in (Hall & Alexander, 2009)  In a dMRI experiment, the target nuclei are exposed to time-dependent magnetic field gradients which render the signal sensitive to diffusion. During the experiment, the spin of a nucleus experiences a path-dependent phase shift given by where γ is the gyromagnetic ratio of the nucleus, B 0 is the static main magnetic field of the scanner, G(t) is the diffusion encoding gradient, and r(t) is the location of the nucleus. G and B 0 change sign after the application of the refocusing pulse.
An imaging voxel contains an ensemble of nuclei for which the total signal is given by where P is the spin phase distribution and S 0 is the signal without diffusion-weighting while keeping other imaging parameters unchanged.
In Disimpy, diffusion is modelled as a three-dimensional random walk over discrete time. The steps, which are randomly sampled from a uniform distribution over the surface of a sphere using the xoroshiro128+ pseudorandom number generator (Blackman & Vigna, 2018), have a fixed length where D is the diffusion coefficient and dt is the duration of a time step. At every time point in the simulation, each walker accumulates phase given by At the end of the simulated dynamics, the normalized diffusion-weighted signal is calculated as the sum of the real parts of signals from all random walkers where N is the number of random walkers.
The initial positions of the random walkers are drawn from a uniform distribution across the diffusion environment. When a random walker collides with a restricting barrier, it is elastically reflected off the collision point in such a way that the random walker's total path length during dt is equal to l. Figure 1: Performance comparison between Disimpy and Camino, a popular dMRI simulator that runs single-threaded on the CPU. The comparison was performed on a desktop computer with an Intel Xeon E5-1620 v3 3.50 GHz x 8 CPU and an Nvidia Quadro K620 GPU. The simulations were performed using a mesh consisting of 10 4 triangles, shown in Figure 2.  (Gyori, Hall, Clark, Alexander, & Kaden, 2020) . (B) Example trajectories of 100 random walkers whose initial positions were randomly positioned inside the spheres. Some spheres contain more than one walker. (C) Example trajectories of 100 random walkers outside the spheres.