fastSF: A parallel code for computing the structure functions of turbulence

Turbulence is a complex phenomenon in fluid dynamics involving nonlinear interactions between multiple scales. Structure functions are popular diagnostics in the study of statistical properties properties of turbulent flows (Frisch, 1995; Kolmogorov, 1941a, 1941b). Some of the earlier works comprising of such analysis are those of Gotoh et al. (2002), Ishihara et al. (2003), and Ishihara & Gotoh (2009) for three-dimensional (3D) hydrodynamic turbulence; Yeung et al. (2005) and Ray et al. (2008) for passive scalar turbulence; Biferale et al. (2004) for two-dimensional (2D) hydrodynamic turbulence; and Kunnen et al. (2008), Kaczorowski & Xia (2013), and Bhattacharya et al. (2019) for turbulent thermal convection. Structure functions are two-point statistical quantities; thus, an accurate computation of these quantities requires averaging over many points. However, incorporation of a large number of points makes the computations very expensive and challenging. Therefore, we require an efficient parallel code for accurate computation of structure functions. In this paper, we describe the design and validation of the results of fastSF, a parallel code to compute the structure functions for a given velocity or scalar field.


Summary
Turbulence is a complex phenomenon in fluid dynamics involving nonlinear interactions between multiple scales. Structure functions are popular diagnostics in the study of statistical properties properties of turbulent flows (Frisch, 1995;Kolmogorov, 1941aKolmogorov, , 1941b. Some of the earlier works comprising of such analysis are those of Gotoh et al. (2002), Ishihara et al. (2003), and Ishihara & Gotoh (2009) for three-dimensional (3D) hydrodynamic turbulence; Yeung et al. (2005) and Ray et al. (2008) for passive scalar turbulence; Biferale et al. (2004) for two-dimensional (2D) hydrodynamic turbulence; and Kunnen et al. (2008), Kaczorowski & Xia (2013), and Bhattacharya et al. (2019) for turbulent thermal convection. Structure functions are two-point statistical quantities; thus, an accurate computation of these quantities requires averaging over many points. However, incorporation of a large number of points makes the computations very expensive and challenging. Therefore, we require an efficient parallel code for accurate computation of structure functions. In this paper, we describe the design and validation of the results of fastSF, a parallel code to compute the structure functions for a given velocity or scalar field.
fastSF is a C++ application for computing the structure functions of scalar and vector fields on Cartesian grids of a 2D or 3D periodic box, stored as HDF5 files. The code employs MPI (Message Passing Interface) parallelization with equal load distribution and vectorization for efficiency on SIMD architectures. The user can select the range of the orders of the structure functions to be computed and the computed structure functions are written to HDF5 files that can be further processed by the user.
We are not aware of any other open soure or commercial packages for computing structure functions; prior studies have relied on in-house software that was never publicly released. As an open source package, fastSF provides a standard high-performance implementation and thus facilitates wider use of structure functions.
fastSF uses MPI (Pacheco, 2011) for parallelism, HDF5 (Koziol & Robinson, 2018) via H5SI (Chatterjee & Chatterjee, 2020) for reading gridded field data and writing structure functions, as well as blitz++ (Veldhuizen, 1998) for vectorized computation and yaml-cpp (Beder, 2019) for reading control parameters. In the next section, we will briefly explain the velocity and scalar structure functions in turbulent flows.

Velocity and scalar structure functions
We denote the velocity and scalar fields using u and θ respectively. The velocity difference between any two points r and r + l is δu = u(r + l) − u(r). The difference in the parallel components of the velocity field along l is δu ∥ = δu ·l. The corresponding difference in the perpendicular component is δu ⊥ = |δu − δu ∥l |. Assuming statistical homogeneity, we define the longitudinal velocity structure functions of order q as and the transverse velocity structure functions of order q as Here, ⟨·⟩ denotes spatial averaging. Similarly, we can define the scalar structure functions for the scalar field as For isotropic turbulence (in addition to being homogeneous), the structure functions become functions of l, where l = |l|. The second-order velocity structure function S u ∥ q (l) provides an estimate for the energy in the eddies of size l or less (Davidson, 2004). For 3D incompressible hydrodynamic turbulence with homegeneity and isotropy, the thirdorder longitudinal velocity structure function in the inertial range (scales lying between the large-scale forcing regime and the small-scale dissipation regime) is given by (Frisch, 1995;Kolmogorov, 1941aKolmogorov, , 1941b where ϵ is the viscous dissipation rate. For an arbitrary order q, She & Leveque (1994) proposed that the longitudinal structure functions scale as S u ∥ q (l) ∼ ζ q , where the exponent ζ q is given by Figure~1 exhibits the plots of the negative of the normalized 3rd, 5th, and 7th-order longitudinal velocity structure functions computed using the simulation data of 3D hydrodynamic turbulence . The structure functions are normalized by (ϵl) ζq , where ζ q is given by Eq. (5). In the inertial range (0.2 < l < 0.7), the normalized third-order longitudinal velocity structure function is fairly close to 4/5 (represented by dashed line), consistent with Kolmogorov's theory. Moreover, the normalized fifth and seventh-order structure functions show a plateau for the same range of l, thus exhibiting consistency with She-Leveque's model.
In the next section, we provide a brief description of the code.

Design of the Code
In this section, we present a sketch of the structure function computation for the velocity structure functions. We employ vectorization and loops over l, thus requiring three loops for 3D fields and two loops for 2D fields. In the following, we provide the algorithm for structure function computation for a 2D velocity field.

Pseudo-code
Data: Velocity field u in domain (L x , L z ); number of processors P .

Procedure:
• Divide l's among various processors. The process of data division among the processors has been described later in this section.
• For every processor: -for l = (l x , l z ) assigned to the processor: * Compute δu(l x , l z ) by taking the difference between two points with the same indices in pink and green subdomains as shown in Fig. 2. This feature enables vectorized subtraction operation. * δu ∥ (l x , l z ) = δu ·l (Vectorized). * δu ⊥ (l x , l z ) = |δu − δu ∥l | (Vectorized). * for order q: , q, l x , and l z to the root process.
• The root process stores S u ∥ q (l x , l z ) and S u ⊥ q (l x , l z ). Since S u q (l) is important for intermediate scales (inertial range) only, we vary l upto half the domain size, that is, upto (L x /2, L z /2), to save computational cost. The l's are divided among MPI processors along x and z directions. Each MPI processor computes the structure functions for the points assigned to it and has access to the entire input data. After computing the structure function for a given l, each processor communicates the result to the root process, which stores the S u ∥ q (l) and S u ⊥ q (l) arrays. It is clear from Fig. 2 that the sizes of the pink or green subdomains are (L x − l x )(L z − l z ), which are function of l's. This function decreases with increasing l leading to larger computational costs for small l and less cost of larger l. Hence, a straightforward division of the domain among the processors along x and z directions will lead to a load imbalance. Therefore, we assign both large and small l's to each processor to achieve equal load distribution. We illustrate the above idea using the following example. We need to compute the structure functions for l ranging from 0 to 7. We divide the task among four processors, with two l's assigned to each processor. The following distribution of Similarly, if two processors are used, then the following distribution results in load balance.

• Stop
Processor 0: l = {0, 7, 2, 5}, This idea of load distribution has been implemented in our program and has been extended to higher dimensions.
Note that for 2D, l x > 0, but l z can take both positive and negative values. However, for isotropic turbulence, the structure functions for +l z and −l z are statistically equal. Therefore, in our computations, we restrict to l x > 0, l z > 0. For anisotropic turbulence, not discussed here, the structure functions will depend on (l x , l z ) rather than l; our code will be extended to such systems in future.
For 3D turbulence, the structure functions will depend on (l x , l y , l z ). We divide the tasks among processors over l x and l y as done above for 2D turbulence. The aforementioned algorithm can be easily extended to the 3D case. We employ a similar method for the computation of scalar structure functions as well.
In the next section, we discuss the scaling of our code.

Scaling of fastSF
fastSF is scalable over many processors due to vectorization and equal load distribution. We demonstrate the scaling of fastSF for the third-order longitudinal structure function for an idealized velocity field on a 128 3 grid. For our computation we employ a maximum of 1024 cores. We take the velocity field as We perform four runs on a Cray XC40 system (Shaheen II of KAUST) for this problem using a total of 16, 64, 256, and 1024 cores. We used 16 cores per node for each run. In Fig. 3, we plot the inverse of time taken in seconds versus the number of cores. The best fit curve for these data points yields T −1 ∼ p 0.986±0.002 , Thus, the data-points follow T −1 ∼ p curve to a good approximation. Hence, we conclude that our code exhibits good scaling.

Conclusions
This paper provides a brief description of fastSF, an efficient parallel C++ code that computes structure functions for given velocity and scalar fields. This code is shown to be scalable over many processors. An earlier version of the code was used by Bhattacharya et al. (2019) for analyzing the structure functions of turbulent convection. We believe that fastSF will be useful to turbulence community as it facilitates wider use of structure functions.