array_split: Multi-dimensional array partitioning


The array_split (Latham 2017) Python package extends the dense array partitioning capabilities of the numpy (Walt, Colbert, and Varoquaux 2011) and skimage (Van der Walt et al. 2014) Python packages, namely numpy.array_split and skimage.util.view_as_blocks. In particular, it can compute a partitioning from an array shape alone, without requiring an actual numpy.ndarray object, and it can partition into sub-arrays according to a variety of criteria: the number of partitions per axis, the total number of sub-arrays (optionally constrained by per-axis partition counts), an explicit sub-array shape, or an upper bound on the number of bytes in each resulting sub-array.
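To make the idea of shape-based partitioning concrete, the following is a minimal sketch that computes a per-axis partitioning from a shape alone, with no numpy.ndarray allocated up front. The helper name split_shape is hypothetical and is not part of the array_split API; it only assumes the same size convention as numpy.array_split, where leading sections receive the extra elements.

```python
import itertools

import numpy as np


def split_shape(shape, sections_per_axis):
    """Return a list of slice-tuples that tile an array of the given shape.

    Hypothetical helper for illustration only; not the array_split API.
    """
    per_axis = []
    for extent, n in zip(shape, sections_per_axis):
        # Same convention as numpy.array_split: the first (extent % n)
        # sections get one extra element.
        base, extra = divmod(extent, n)
        sizes = np.array([base + 1] * extra + [base] * (n - extra))
        stops = np.cumsum(sizes)
        starts = stops - sizes
        per_axis.append([slice(int(a), int(b)) for a, b in zip(starts, stops)])
    # The Cartesian product of the per-axis slices gives one tuple per sub-array.
    return list(itertools.product(*per_axis))


# Partition a (10, 7) shape into 3 sections along axis 0 and 2 along axis 1.
tiles = split_shape((10, 7), (3, 2))
```

Each element of tiles is a tuple of slices that can later index any array of shape (10, 7), for example a[tiles[0]], so the partitioning can be planned before the data exists in memory.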

Application areas include:

Parallel Processing
Data parallelism by partitioning an array for multi-process concurrency (e.g. multiprocessing (“Multiprocessing – Process-Based Parallelism” 2017) or mpi4py (Dalcin et al. 2011)) based on the number of cores, or partitioning for accelerator hardware concurrency (e.g. pyopencl or pycuda (Klöckner et al. 2012)) based on hardware memory limits.
File I/O
Partitioning large arrays for output to separate files (e.g. as part of a virtual dataset (The HDF Group 1997-NNNN; Collette 2013)) based on maximum file size, or out-of-core partitioning based on in-core memory limits.
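The memory-bounded and file-size-bounded cases above can be sketched as follows. This is a simplified illustration that splits only along axis 0; the function names rows_per_chunk and split_by_bytes and the 64 MiB limit are assumptions chosen for the example and are not taken from the array_split package.

```python
import math


def rows_per_chunk(shape, itemsize, max_bytes):
    """Largest number of axis-0 rows whose chunk fits within max_bytes.

    Hypothetical helper for illustration only.
    """
    # Bytes occupied by one index along axis 0 (all trailing axes included).
    row_bytes = itemsize * math.prod(shape[1:])
    rows = max_bytes // row_bytes
    if rows < 1:
        raise ValueError("a single axis-0 row exceeds max_bytes")
    return min(rows, shape[0])


def split_by_bytes(shape, itemsize, max_bytes):
    """Return axis-0 slices such that no chunk exceeds max_bytes."""
    step = rows_per_chunk(shape, itemsize, max_bytes)
    return [
        slice(start, min(start + step, shape[0]))
        for start in range(0, shape[0], step)
    ]


# A (1000, 512, 512) array of 4-byte elements, split into chunks of at
# most 64 MiB (an assumed limit, e.g. a device or per-file budget).
chunks = split_by_bytes((1000, 512, 512), itemsize=4, max_bytes=64 * 1024**2)
```

Splitting along a single axis keeps the sketch short; a general solution would also need to subdivide the trailing axes when a single row is larger than the byte limit, which is the case this sketch rejects with an error.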


