BoxKit: A Python library to manage analysis of block-structured simulation datasets

BoxKit is a


Statement of need
Simulation software instruments like Flash-X (Dubey et al., 2022) store output in the form of Hierarchical Data Format (HDF5) datasets. Each dataset is often gigabytes (GB) in size and requires cache-efficient techniques to enable its integration with Python packages. BoxKit data structures act as wrappers around simulation output stored in HDF5 files and provide metadata for the adaptive mesh refinement (AMR) blocks that describe the simulation domain. The wrapper objects are lightweight and represent chunks of data stored on disk, acting as array-like input for Python functions and methods. This approach allows data to be loaded selectively from disk into memory in the form of chunks or blocks, which improves cache efficiency. The library also enables creation of new datasets for data-intensive workflows and can be extended beyond its current application to numerical simulations. Compared to existing data analysis packages like yt (Turk et al., 2011), BoxKit offers more intuitive abstraction layers over AMR blocks through its metadata wrappers. These wrappers provide raw access to simulation data, allowing users to develop their own low-level methods for spatiotemporal interpolation and stenciled computations. We aim for this library to complement existing packages rather than replace them.
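The idea of a lightweight, metadata-only handle that reads its data from disk on demand can be illustrated with a minimal sketch. It does not use BoxKit or HDF5: the flat binary file, the `LazyBlock` class, and the `write_demo_file` helper are hypothetical stand-ins chosen so the example runs with the Python standard library alone.

```python
import struct
import tempfile
from dataclasses import dataclass

# Hypothetical stand-in for an on-disk dataset: a flat binary file of float64
# values, with each block stored as one contiguous run of `block_size` values.
ITEM = struct.Struct("<d")


@dataclass(frozen=True)
class LazyBlock:
    """Metadata-only handle for one block; it holds no simulation data itself."""
    path: str
    index: int        # position of the block within the file
    block_size: int   # number of float64 values per block
    xmin: float       # physical bounds of the block (metadata only)
    xmax: float

    def __len__(self):
        return self.block_size

    def __getitem__(self, key):
        # Read only this block from disk, then index into it.
        return self.load()[key]

    def load(self):
        nbytes = self.block_size * ITEM.size
        with open(self.path, "rb") as f:
            f.seek(self.index * nbytes)
            raw = f.read(nbytes)
        return [value[0] for value in struct.iter_unpack("<d", raw)]


def write_demo_file(nblocks, block_size):
    """Write nblocks * block_size consecutive floats (0.0, 1.0, ...) to a temp file."""
    values = [float(i) for i in range(nblocks * block_size)]
    with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
        f.write(struct.pack(f"<{len(values)}d", *values))
        return f.name


path = write_demo_file(nblocks=4, block_size=8)
blocks = [LazyBlock(path, i, 8, xmin=float(i), xmax=float(i + 1)) for i in range(4)]

# Only block 2 is read from disk here; the other handles stay as metadata.
block_sum = sum(blocks[2][:])
```

Because each handle stores only a path, an offset, and bounds, a list of thousands of such objects stays small in memory, and a function that accepts array-like input triggers a read only for the blocks it actually touches.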
BoxKit also offers wrappers to scale the deployment of workflows on non-uniform memory access (NUMA) and distributed computing architectures, providing decorators that parallelize a Python operation written for a single data structure so that it operates over a list. This can be understood better using the workflow described in Figure 1, which has been applied to data analysis and machine learning applications in chemical and thermal science engineering (Dhruv, 2023; Hassan et al., 2023). Output from Flash-X boiling simulations is created and stored on multinode clusters. Processing this output through BoxKit allows a simple operation over a block to be scaled to a list of blocks. The Action wrapper converts the function, operation_on_block, into a parallel method that can be deployed on a multinode cluster with the desired backend (JobLib/Dask). BoxKit does not interfere with the parallelization schema of target applications like SciKit, OpticalFlow, and PyTorch, which function independently using available resources. Loading all the datasets into cache memory at the same time is very inefficient for this problem; BoxKit's metadata wrappers are needed to load data chunks from disk efficiently, operate locally in space, and scale the computation across multiple threads. Based on the graph in Figure 2, the parallel performance scales better as   increases.
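The pattern of lifting a per-block function to a list of blocks can be sketched with a small decorator. This is an illustrative stand-in and not BoxKit's Action wrapper: the name `parallel_over_list`, its `nthreads` parameter, and the toy body of `operation_on_block` are assumptions made for this example, and the standard-library thread pool replaces the JobLib/Dask backends mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps


def parallel_over_list(nthreads=2):
    """Hypothetical decorator: lift a per-block function to a list of blocks.

    Illustrative stand-in only; this is not BoxKit's Action wrapper.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(blocks, *args, **kwargs):
            with ThreadPoolExecutor(max_workers=nthreads) as pool:
                futures = [pool.submit(func, block, *args, **kwargs) for block in blocks]
                # Collect in submission order so results line up with blocks.
                return [future.result() for future in futures]
        return wrapper
    return decorator


@parallel_over_list(nthreads=4)
def operation_on_block(block, scale):
    """Toy per-block operation: scaled mean of the block's values."""
    return scale * sum(block) / len(block)


list_of_blocks = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
results = operation_on_block(list_of_blocks, 2.0)
```

The function body is written once for a single block, and the decorator decides how the list is distributed. Threads in CPython mainly help when the per-block work is I/O-bound or releases the interpreter lock, so a process-based or distributed backend would likely be the better fit for CPU-bound operations.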
Mapping of AMR data to contiguous arrays becomes important for applications where global operations in space are required. An example of this is SciKit's skimage.measure module, which can be used to measure bubble shape and size for Flash-X boiling simulations. BoxKit improves performance of this operation by ~5x. Data for these performance studies, along with corresponding IPython notebooks, can be found in (BoxKit Performance, n.d.).
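What merging blocks into a contiguous array involves can be shown with a minimal sketch. It assumes a single refinement level with equally sized two-dimensional blocks held as nested lists. The function `merge_blocks` and its parameters are hypothetical and do not reflect BoxKit's implementation, which has to handle real AMR hierarchies.

```python
def merge_blocks(blocks, nblockx, nblocky, nx, ny):
    """Merge uniform 2D blocks into one contiguous row-major grid.

    blocks: dict mapping (ib, jb) block coordinates to ny-by-nx nested lists.
    Assumes a single refinement level with equally sized blocks.
    """
    merged = [[0.0] * (nblockx * nx) for _ in range(nblocky * ny)]
    for (ib, jb), data in blocks.items():
        for j in range(ny):
            # Copy one row of the block into its slot in the global row.
            row = merged[jb * ny + j]
            row[ib * nx:(ib + 1) * nx] = data[j]
    return merged


# Four 2x2 blocks tiled on a 2x2 block grid; each block is filled with its id.
blocks = {
    (0, 0): [[0.0, 0.0], [0.0, 0.0]],
    (1, 0): [[1.0, 1.0], [1.0, 1.0]],
    (0, 1): [[2.0, 2.0], [2.0, 2.0]],
    (1, 1): [[3.0, 3.0], [3.0, 3.0]],
}
merged = merge_blocks(blocks, nblockx=2, nblocky=2, nx=2, ny=2)
```

Once the data sits in one contiguous grid, routines that expect a global image-like array can run on it directly, at the cost of holding the merged array in memory.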

Ongoing work
Our ongoing work focuses on developing BoxKit to improve the performance of Scientific Machine Learning (SciML) applications and on using it as part of a broader workflow that integrates Fortran/C++ based applications with state-of-the-art machine learning packages available in Python, as highlighted in Figure 1.

Figure 1: BoxKit is designed to integrate simulation software instruments like Flash-X with Python-based machine learning and data analysis packages. Large simulation datasets (~10 GB) can leverage BoxKit to improve performance of offline training/analysis. This mechanism is part of a broader workflow to integrate simulations with machine learning using a Fortran-Python bridge, shown with dotted lines.

Figure 2: Preliminary performance analysis of BoxKit on a single 22-core IBM Power9 node (L1 cache: 32+32 kibibytes (KiB) per core; L2 cache: 512 KiB per core) for operations involving calculation of the temporal mean across multiple datasets (left) and merging block-structured AMR datasets into contiguous arrays (right).

Figure 2 provides results of performance tests performed on a single 22-core node on Summit (ORNL, 2023) for two basic operations: (1) calculation of the temporal mean of heat flux in Flash-X boiling simulations, and (2) a block merge operation to convert AMR data into contiguous arrays. Calculation of the temporal mean requires operating on data across multiple datasets, with each dataset approximately 10 GB in size. Following is the mathematical representation of the problem where   represents the total number of datasets,