gLBM: A GPU enabled Lattice Boltzmann Method Library

Lattice Boltzmann Methods (LBM) are a class of computational fluid dynamics (CFD) algorithms for simulation. Unlike traditional formulations that simulate fluid dynamics on a macroscopic level with a mesh, the LBM characterizes the problem on a mesoscopic level applied to a grid discretization. LBM solves the fluid density problem with collide and stream (relaxation) processes. This approach has several advantages, including its adaptability to numerous fluid domains (i.e., vapours, liquid droplets), complex boundaries, irregular interior geometries, and parallelization. Traditional CFD methods are limited in the ability to parallelize the algorithm; however, the LBM algorithm discretization can be easily parallelized both for CPUs and GPUs. This enables fast fluid solutions for complex fluid domains. There are limitations associated with the LBM, including high Mach number applications. However, active research is addressing these limitations.


Summary
Lattice Boltzmann Methods (LBM) are a class of computational fluid dynamics (CFD) algorithms for simulation. Unlike traditional formulations that simulate fluid dynamics on a macroscopic level with a mesh, the LBM characterizes the problem on a mesoscopic level applied to a grid discretization. LBM solves the fluid density problem with collide and stream (relaxation) processes. This approach has several advantages, including its adaptability to numerous fluid domains (i.e., vapours, liquid droplets), complex boundaries, irregular interior geometries, and parallelization. Traditional CFD methods are limited in the ability to parallelize the algorithm; however, the LBM algorithm discretization can be easily parallelized both for CPUs and GPUs. This enables fast fluid solutions for complex fluid domains. There are limitations associated with the LBM, including high Mach number applications. However, active research is addressing these limitations.

Statement of need
The gLBM library is the three-dimensional Lattice Boltzmann algorithm implemented using GPUs to accelerate the fluid solver. The library is implemented in C++ and Cuda and is validated using a robust suite of custom verification and validation tools for sustainable community-based use and development. gLBM leverages an easy to use API that is welldocumented to import geometries for analysis using formats supported by ITK (McCormick et al., 2014), the open source Insight Toolkit, and configuration files that define the fluid parameters, grid discretization properties, and simulation parameters. An Apache 2.0 license was selected to support the widest distribution and use of the gLBM library. This stands in contrast to the copyleft licenses of other available libraries ("Lattice Boltzmann (LBM) Simulation Package for GPUs (CUDA, OpenCL)," 2020; "OpenLB -Open Source Lattice Boltzmann Code," 2020; "Palabos," 2020), which limit usability for commercial projects seeking to protect intellectual property.
The LBM algorithm builds on established work by Krüger et al. (2017), He & Luo (1997), Latt et al. (2008), Succi (2001), and Ubertini et al. (2010) to solve for the pressure and flow in the fluid domain. gLBM expands on this work to integrate solutions for wall shear stress and temperature. Initial work using this library has been published, applied to analysis of the upper airways (Rachel B. Clipp et al., 2018;Rachel B. Clipp et al., 2019;Quammen et al., 2016). The combination of GPU implementation, platform independence, permissive licensing, and integrated verification and validation make this library unique when compared to other available libraries of the LBM, some of which include a GPU implementation ("Advanced Simulation Library," 2015; "GPU-LBM-Simulation," 2014; "Lattice Boltzmann (LBM) Simulation Package for GPUs (CUDA, OpenCL)," 2020; "Lbm-Gpu," 2014; "OpenLB -Open Source Lattice Boltzmann Code," 2020; "Palabos," 2020). When investigating these libraries, we found that some had no license stipulations, which defaults to GitLab's most stringent license, or had a license ideal for academics and researchers but placed limitations on commercial use. Therefore, we chose the permissible Apache 2.0 license to ensure academic and commercial freedom and address our needs. Our library is limited to NVIDIA graphics cards due to the CUDA implementation, but does provide cross-platform functionality. Our D3Q19 implementation of the LBM algorithm has been tested in standard geometries, such as channel flow and biomedical applications, such as the nasal airways, and can be used for fluid flow in closed domains at this time. The use of a verification and validation suite provides a means to optimize and update algorithms and easily ensure the integrity of the solution is maintained. This library is ideal for students, researchers, and industry users looking to expand their use of the LBM and will be supported and maintained by Kitware, Inc., leaders in open source software development.

gLBM in Action
The typical LBM algorithm relies on a lattice connectivity rather than a mesh configuration. The lattice connectivity then results in a probability distribution function (PDF) for velocity (Equation 1).
The "stream and collide" algorithm can be described by the time step shift and relaxation that occur for the PDF. The PDF (f i ), for component i is shifted during one time step to the new position, x + c i t; this is the "streaming" step. At each node, the continuum fluid density ρ and velocity u are evaluated as the moments of the PDF. The PDFs for each node in the lattice are then relaxed towards the thermodynamic equilibrium values (Equation 2). This represents the molecular collision or the "collision" step.
where u is the continuum velocity (first moment of the particle velocities), c s is the speed of sound, and ω i are lattice-specific constants.
We applied a constant pressure boundary condition at the inlet and outlet and enforced a zero velocity boundary condition at the walls for all simulations.

Numerical Verification
We verified the LBM implemented in gLBM by comparing the solution in cylindrical and rectangular channels against the analytic solutions. The cylindrical channel of length 0.1 meters and diameter of 0.01 meters was executed for a grid spacing (Δx) of 0.0004, 0.0005, 0.0006, and 0.0007 meters. A rectangular channel with dimensions of 0.08 meters in the x-direction, 0.02 meters in y-direction, and 0.01 meters in the z-direction was analyzed with the same grid spacings. The solutions were executed for 10,000 iterations under an imposed constant pressure gradient. The results for numerical and grid convergence were similar for both geometries, with the results showing an iteration convergence occurring at less than 3,000 iterations and grid convergence evident. The analytical solutions are implemented in the gLBM library for future verification and validation efforts. The comparison between the analytical and the computed solutions for the rectangular channel are shown in Figure 1. For both geometries, we observed an entrance effect and a fast convergence to a fully developed parabolic profile with less than 1% error. This is clearer when investigating the results slice by slice through both channels, as shown in Figure 2.

Automated Verification and Validation
To maintain its validity, we developed a verification and validation (V&V) suite to continually verify any algorithm changes and automatically execute multiple geometries. gLBM includes a verification execution library that is designed to simulate a list of geometries found in a configuration file. The analytical solutions for the cylindrical and rectangular channels are included in the V&V library. The dimensions, grid discretization, and fluid and simulation parameters can be set in the configuration file, which applies to both the LBM and analytical results. This allows for reproducible, easy simulation of the analytic cases with auotmatic error reporting and plotting, which provides a method to continuously verify the numerical solution with the analytic solution during algorithm development and comparison. We automatically calculate the error at designated slices (axial locations) along the geometry and list these errors in a table for easy evaluation. We also automatically plot the velocity profile in each dimension, overlaying the analytic solution, the baseline (the previously validated or best case results), and the computed solution for easy visual inspection, as shown in Figure 3. For more complex geometries, where the analytical solution is unavailable, only the baseline and current results are plotted for evaluation. As changes to the gLBM libary are made, it is easy to compare results to ensure they are moving closer to the analytic solution. We also plot the overall error at each iteration of the solution to evaluate the convergence time for each solution. A summary table is also generated for developers to quickly assess the overal performance and verification data for each V&V run performed. Computational performance of each run is performed by calculating the Million Lattice Updates Per Second (MLUPS) on the provided geometry.

Future Directions
Our team is working towards a fast fluid solution for the upper airways to enable clinically relevant analysis of patient-specific surgical analysis. Our initial studies have successfully executed the gLBM library for this domain with mixed results. The library is able to produce reasonable results across the geometry studied; however, the local results within the geometry need further work and the execution speed is not optimal. An example of this geometry is included in the open source repository. Though initial results show promise, more work is required to improve the accuracy of the simulation. We also plan to address the high mach number limitations to perform aerospace simulations in hours, rather than the currently required days of analysis. Our future work will expand on the initial results shown in Figure  5 and advancements will be committed to the gLBM repository. We also plan to optimize the CUDA implementation for faster performance. A final domain we are exploring is in open boundary solutions, such as airfoils, which requires a difference boundary and boundary condition implementation.