Imagedata: A Python library to handle medical image data in NumPy array subclass Series

Imagedata provides a Series class inheriting the numpy.ndarray class (Harris et al., 2020), adding DICOM data structures. Plugins provide functions to import and export DICOM and other data formats. The DICOM plugin can read complete series, sorting the data as requested into multidimensional arrays. Input and output data can be accessed on various locations, including local files, DICOM servers and XNAT servers (Marcus et al., 2007). The Series class enables NumPy and derived libraries (like SciPy (Virtanen et al., 2020)) to work on medical images, simplifying input and output.


Summary
Imagedata is a Python library to read medical image data into Series objects and to write Series objects back out. In particular, imagedata will read, sort and write DICOM® 3D and 4D series based on defined attributes. As far as possible, imagedata will convert geometry information between medical image data formats such as DICOM, NIfTI (Cox et al., 2004) and ITK (Yoo et al., 2002).
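One concrete part of converting geometry between formats is the coordinate convention. DICOM patient coordinates are LPS+ (x towards the patient's left, y posterior, z superior), and ITK's physical space uses the same convention. NIfTI world coordinates are RAS+. Moving between DICOM and NIfTI therefore negates the first two axes. The sketch below illustrates only that flip. It is not imagedata's implementation, and representing the affine as a plain nested list is an assumption made to keep the example self-contained.

```python
# Sketch only: DICOM patient coordinates are LPS+ and NIfTI world
# coordinates are RAS+. Converting between them negates x and y.

def lps_to_ras(point):
    """Convert an (x, y, z) point between LPS+ and RAS+."""
    x, y, z = point
    return (-x, -y, z)


# The flip is its own inverse, so the same function converts back.
ras_to_lps = lps_to_ras


def affine_lps_to_ras(affine):
    """Pre-multiply a 4x4 voxel-to-world affine (nested lists) by diag(-1, -1, 1, 1)."""
    flip = (-1.0, -1.0, 1.0, 1.0)
    return [[flip[i] * affine[i][j] for j in range(4)] for i in range(4)]


# ImagePositionPatient of a slice in millimetres (example values)
print(lps_to_ras((-120.0, -95.5, 30.0)))  # (120.0, 95.5, 30.0)
```

Pre-multiplying the affine flips both the direction columns and the translation, so voxel indices map to the same anatomical location in either convention.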
A key feature is the conversion between different image formats. For example, in a pipeline a clinical DICOM series can be converted to NIfTI and processed by a NIfTI-based tool (e.g., FSL). Finally, the result can be converted back to DICOM and stored as a new series in a PACS (Picture Archiving and Communication System).
A viewer is included, allowing the display of a stack of images, including modifying window width and centre, and scrolling through 3D and 4D image stacks. A region of interest (ROI) can be drawn, resulting in a mask as a NumPy ndarray, or as an outline.
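Window width and centre determine which range of pixel values is spread over the display grey levels. The sketch below shows a simplified linear version of that mapping. It is our illustration and not the viewer's code. The DICOM standard's VOI LUT definition uses a slightly different formula with half-pixel offsets.

```python
def apply_window(value, centre, width, out_max=255):
    """Map one pixel value to a display grey level in 0..out_max.

    Simplified linear window: values at or below centre - width/2 become 0,
    values at or above centre + width/2 become out_max, and values in
    between are scaled linearly.
    """
    if width <= 0:
        raise ValueError("window width must be positive")
    lower = centre - width / 2
    upper = centre + width / 2
    if value <= lower:
        return 0
    if value >= upper:
        return out_max
    return round((value - lower) / width * out_max)


# A common soft-tissue CT setting: centre 40 HU, width 400 HU
print([apply_window(v, centre=40, width=400) for v in (-1000, 140, 1000)])
# [0, 191, 255]
```

Narrowing the width increases contrast within the window. Moving the centre shifts which tissues fall inside it.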

Statement of need
DICOM is the standard image format and protocol for clinical medical images in hospitals. In tomographic imaging, legacy DICOM information object definitions (IODs), such as the computed tomography (CT) and magnetic resonance (MR) image IODs, are in common use. These formats store one slice per file, leaving the sorting of the files to the user. The more recent enhanced formats, which can accommodate a complete 3D or 4D acquisition in one file, are being adopted only slowly by manufacturers of medical equipment.
Working with legacy DICOM medical images in Python can be accomplished using libraries such as pydicom (Mason et al., 2021), GDCM (Malaterre, 2008), NiBabel or ITK. Pydicom and GDCM are native DICOM libraries and, as such, do not provide access to medical images stored in other formats. NiBabel and ITK focus mostly on the NIfTI and ITK MetaIO image formats, respectively. These formats are popular in research tools, but the DICOM support of these two libraries is rudimentary. All these libraries typically leave the sorting of legacy DICOM image files to the user.
Highdicom focuses on parametric maps, annotations and segmentations, using enhanced DICOM images. Highdicom does an excellent job of promoting the enhanced DICOM standards, including storage of boolean and floating-point data. The handling of legacy DICOM objects is left to pydicom.
NumPy ndarrays are the data objects of choice for numerical computations in Python. Imagedata extends NumPy arrays with DICOM information and functionality. In addition, images can be imported from and exported to other image formats through the plugin architecture.
When setting up pipelines to process clinical data, patient information should be maintained throughout to ensure patient safety. If the process involves DICOM data only, this requirement is easily fulfilled. However, some popular image processing systems require formats that do not maintain patient information. The ability to attach DICOM metadata to data in these other formats lets the user exploit a wider set of image processing software.
Imagedata builds on several of these libraries, attempting to solve the problem of sorting legacy DICOM images, providing NumPy ndarrays, and accessing medical images in various formats.

Architecture
The Series class is a numpy.ndarray subclass. A Series object is instantiated from an image source, either from input files, from a server connection, or from an ndarray. DICOM metadata is handled by a Header class, which also maintains an Axes class defining the axes of the array dimensions.
Specific image data formats are handled by Formats plugins. Archives plugins give access to files stored both in the filesystem and in compressed archives. Transports plugins let the user access networked resources given by a URL. See the plugin architecture and main classes in Figure 1. For example, an xnat:// URL will employ the XnatTransport plugin to fetch a compressed zip archive, and the ZipfileArchive plugin will extract individual files from the archive.
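Selecting a transport from a URL amounts to dispatching on the URL scheme. The sketch below illustrates that idea with the standard library. Only the xnat → XnatTransport pairing comes from the text. The other registry entries, and the rule that a location without a scheme is treated as a local file, are hypothetical and do not describe imagedata's actual lookup.

```python
from urllib.parse import urlparse

# Hypothetical registry: only "xnat" -> "XnatTransport" is taken from the
# text above; the other entries are illustrative assumptions.
TRANSPORTS = {
    "file": "FileTransport",
    "dicom": "DicomTransport",
    "xnat": "XnatTransport",
}


def select_transport(location):
    """Return the transport plugin name for a path or URL."""
    # Plain paths such as 'dirA' have no scheme; treat them as local files.
    scheme = urlparse(location).scheme or "file"
    try:
        return TRANSPORTS[scheme]
    except KeyError:
        raise ValueError(f"no transport plugin for scheme {scheme!r}") from None


print(select_transport("xnat://server/project/subject"))  # XnatTransport
```

In a layered design like the one described, the chosen transport only delivers bytes. Archive and format plugins then decide how to unpack and interpret them.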
Plugins are defined using Python's entry point mechanism (PyPA, 2022). By convention, any plugin must advertise itself in the imagedata_plugins entry point group.
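With entry points, a host package can discover plugins that other installed packages advertise under a named group. The sketch below uses importlib.metadata from the standard library to list everything registered under imagedata_plugins. The helper name and the example pyproject.toml declaration in the comment are hypothetical and are not taken from imagedata's code.

```python
from importlib.metadata import entry_points

# A third-party package could advertise a plugin in its pyproject.toml,
# e.g. (hypothetical names):
#   [project.entry-points.imagedata_plugins]
#   myformat = "mypackage.myplugin:MyFormatPlugin"


def discover_plugins(group="imagedata_plugins"):
    """Return {name: EntryPoint} for every plugin advertised in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):      # Python 3.10 and later
        selected = eps.select(group=group)
    else:                           # Python 3.8/3.9 return a plain dict
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


for name, ep in discover_plugins().items():
    print(name, ep.value)  # ep.load() would import the plugin class
```

Loading is deferred until ep.load() is called, so listing plugins does not import them.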

Examples
In this section, we demonstrate the use of imagedata in Python. In addition, a console application, image_data, comes in handy when the sole purpose is to convert and store an image dataset from one format to another.

Compute mean of two datasets
A basic example reading two time series from folders dirA and dirB, and writing their mean to folder dirMean. The format of the input data is detected automatically and is not specified:

    from imagedata.series import Series

    a = Series('dirA', 'time')
    b = Series('dirB', 'time')
    assert a.shape == b.shape, "Shape of a and b differ"
    # Notice how series a and b are treated as NumPy arrays
    c = (a + b) / 2
    c.write('dirMean')

Sorting
Sorting of DICOM slices is an important feature. Imagedata will sort slices into volumes based on slice location, and volumes may in turn be sorted on various DICOM attributes. Some non-DICOM formats do not specify the labelling of 4D data. In this case, the sorting can be specified manually.
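The two-level sort can be sketched as follows: first group slices by the chosen attribute, then order each group by slice location. Plain dicts stand in for DICOM headers here, and the record layout is an assumption made for illustration. Sorting on SliceLocation is also a simplification. A more robust approach projects ImagePositionPatient onto the slice normal. This is not imagedata's sorting code.

```python
from collections import defaultdict


def sort_slices(slices, attribute):
    """Arrange single-slice records into volumes[tag][slice].

    Each record is a dict with a 'SliceLocation' key, the sort `attribute`
    key (e.g. 'AcquisitionTime') and a 'pixels' payload.
    Returns (sorted tag values, nested list of pixel payloads).
    """
    groups = defaultdict(list)
    for s in slices:
        groups[s[attribute]].append(s)

    tags = sorted(groups)
    volumes = []
    reference = None
    for tag in tags:
        volume = sorted(groups[tag], key=lambda s: s["SliceLocation"])
        locations = [s["SliceLocation"] for s in volume]
        # Every volume must cover the same slice positions to form a 4D array.
        if reference is None:
            reference = locations
        elif locations != reference:
            raise ValueError(f"volume {tag!r} has inconsistent slice locations")
        volumes.append([s["pixels"] for s in volume])
    return tags, volumes


files = [  # unsorted input, as read from disk
    {"SliceLocation": 10.0, "AcquisitionTime": 2, "pixels": "t2z10"},
    {"SliceLocation": 0.0, "AcquisitionTime": 1, "pixels": "t1z0"},
    {"SliceLocation": 10.0, "AcquisitionTime": 1, "pixels": "t1z10"},
    {"SliceLocation": 0.0, "AcquisitionTime": 2, "pixels": "t2z0"},
]
print(sort_slices(files, "AcquisitionTime"))
# ([1, 2], [['t1z0', 't1z10'], ['t2z0', 't2z10']])
```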

Slicing
Like ndarray, the Series object can be sliced. The imagedata package attempts to maintain the geometry of the sliced data.
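Maintaining geometry under slicing mainly means updating the position of the first voxel and the voxel spacing. The sketch below shows this for the three spatial axes. It assumes non-negative slice starts and positive steps, and it represents geometry as plain tuples. Imagedata's Header and Axes classes may handle this differently.

```python
def sliced_geometry(origin, spacing, directions, starts, steps=None):
    """Return (origin, spacing) of a sub-volume taken with starts/steps.

    origin     -- patient coordinates (mm) of voxel index (0, 0, 0)
    spacing    -- voxel size (mm) along each array axis
    directions -- unit vector (patient x, y, z) along which each index grows
    starts     -- first index kept along each axis (non-negative)
    steps      -- slice step along each axis (default 1)
    """
    if steps is None:
        steps = [1] * len(spacing)
    if any(k < 0 for k in starts) or any(st < 1 for st in steps):
        raise ValueError("only non-negative starts and positive steps are handled")
    new_origin = list(origin)
    for k, d, s in zip(starts, directions, spacing):
        for i in range(3):
            # Skipping k voxels moves the origin k * spacing along that axis.
            new_origin[i] += k * s * d[i]
    # Taking every st-th voxel multiplies the spacing by st.
    new_spacing = tuple(s * st for s, st in zip(spacing, steps))
    return tuple(new_origin), new_spacing


# Axes ordered (slice, row, column); origin in patient (x, y, z)
origin = (-100.0, -100.0, 0.0)
spacing = (5.0, 0.5, 0.5)
directions = ((0, 0, 1), (0, 1, 0), (1, 0, 0))
# Geometry of volume[2:, 10::2, 20::2]
print(sliced_geometry(origin, spacing, directions, starts=(2, 10, 20), steps=(1, 2, 2)))
# ((-90.0, -95.0, 10.0), (5.0, 1.0, 1.0))
```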

Draw a region of interest
A region of interest (ROI) can be drawn, producing a mask as a NumPy ndarray. For example, one can obtain a mask for an image segment, convert the original grayscale image into a corresponding RGB image, and mask the green and blue color bands inside the ROI.
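The masking step can be illustrated independently of the viewer. This sketch assumes that masking the green and blue bands means setting them to zero, so the ROI appears red. It uses nested lists to stay self-contained. With NumPy, the equivalent on a (rows, columns, 3) array is roughly rgb[mask, 1:] = 0.

```python
def highlight_roi(gray, mask):
    """Convert a 2D grey-scale image to RGB and zero G and B inside the ROI.

    gray -- 2D list of grey levels
    mask -- 2D list of booleans, True inside the region of interest
    Returns a 2D list of (R, G, B) tuples.
    """
    if len(gray) != len(mask) or any(len(g) != len(m) for g, m in zip(gray, mask)):
        raise ValueError("image and mask must have the same shape")
    rgb = []
    for grey_row, mask_row in zip(gray, mask):
        # Outside the ROI all bands equal the grey level (a grey pixel);
        # inside, only the red band is kept.
        rgb.append([(v, 0, 0) if inside else (v, v, v)
                    for v, inside in zip(grey_row, mask_row)])
    return rgb


print(highlight_roi([[10, 20], [30, 40]], [[False, True], [False, False]]))
# [[(10, 10, 10), (20, 0, 0)], [(30, 30, 30), (40, 40, 40)]]
```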

Converting data from DICOM and back
Some workflows process patient data using a tool that does not accept DICOM data. To maintain the link to the patient data, the images can be converted to, e.g., NIfTI and back, using the original DICOM series as a template.
    # Convert the niftiResult back to DICOM,
    # using dicomDir as a template
    image_data --of dicom --template dicomDir dicomResult niftiResult
    # The resulting dicomResult will be a new DICOM series
    # that could be added to a PACS

    # Set series number and series description before
    # transmitting to PACS using DICOM transport
    image_data --sernum 1004 --serdes 'Processed data' \
        dicom://server:104/AETITLE dicomResult

Example using Python code
In Python, the same workflow stores the Series data in NIfTI format and lets some NIfTI-dependent code produce a result in niftiResult. This NIfTI dataset is then loaded into a Series object, using the original DICOM data as a template to maintain patient and study metadata. Finally, the new dataset is sent to a DICOM server using the DICOM protocol.
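The core of the template idea is to combine the new pixel data and geometry with the template's patient and study identity, while giving the result a new series identity. The sketch below models headers as plain dicts. The list of copied attributes is a hypothetical choice, not imagedata's. The series number and description values mirror the image_data command-line example. New Series Instance UIDs use the UUID-derived "2.25." form permitted by the DICOM standard.

```python
import uuid

# Hypothetical selection of patient/study attributes to carry over.
PATIENT_STUDY_KEYS = (
    "PatientName", "PatientID", "PatientBirthDate",
    "StudyInstanceUID", "StudyDate", "AccessionNumber",
)


def apply_template(result_header, template_header,
                   series_number, series_description):
    """Return a new header: result data + template patient/study identity."""
    header = dict(result_header)
    for key in PATIENT_STUDY_KEYS:
        if key in template_header:
            header[key] = template_header[key]
    # The processed data is a new series within the same study,
    # so it gets its own UID instead of reusing the template's.
    header["SeriesInstanceUID"] = f"2.25.{uuid.uuid4().int}"
    header["SeriesNumber"] = series_number
    header["SeriesDescription"] = series_description
    return header


template = {"PatientName": "Doe^Jane", "PatientID": "123",
            "StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.4"}
result = {"Rows": 256, "Columns": 256, "PatientName": "anonymous"}
print(apply_template(result, template, 1004, "Processed data")["PatientName"])
# Doe^Jane
```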