pointcloudset: Efficient Analysis of Large Datasets of Point Clouds Recorded Over Time

Summary
Point clouds are a very common format for representing three-dimensional data. They can be acquired with different sensor types and methods, such as lidar (light detection and ranging), radar (radio detection and ranging), RGB-D (red, green, blue, depth) cameras, and photogrammetry. In many cases, multiple point clouds are recorded over time; for example, automotive lidars record point clouds at high acquisition frequencies (typically around 10-20 Hz), resulting in millions of points per second. Analyzing such a large collection of point clouds is challenging because of the sheer amount of measurement data. The Python package pointcloudset provides a way to handle, analyze, and visualize large datasets consisting of multiple point clouds recorded over time. pointcloudset features lazy evaluation and parallel processing and is designed to enable the development of new point cloud algorithms and their application to big datasets.
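To make the "millions of points per second" figure concrete, the following back-of-the-envelope sketch estimates the point rate of a spinning lidar. The sensor parameters (64 channels, 1024 columns per rotation, 20 Hz) and the 16 bytes per point are illustrative assumptions, not values taken from this paper or from a specific data sheet.

```python
# Rough point-rate estimate for a spinning lidar.
# All numeric parameters below are illustrative assumptions.


def points_per_second(channels: int, columns_per_rotation: int, rotation_hz: float) -> float:
    """Points per frame (channels * columns) times frames per second."""
    return channels * columns_per_rotation * rotation_hz


def raw_megabytes_per_second(point_rate: float, bytes_per_point: int = 16) -> float:
    """Uncompressed data rate, assuming e.g. x, y, z, intensity stored as 4-byte floats."""
    return point_rate * bytes_per_point / 1e6


rate = points_per_second(channels=64, columns_per_rotation=1024, rotation_hz=20)
data_rate = raw_megabytes_per_second(rate)
points_per_hour = rate * 3600

print(f"{rate:,.0f} points/s, {data_rate:.1f} MB/s, {points_per_hour:,.0f} points/h")
```

Under these assumptions a single sensor produces roughly 1.3 million points per second, so even a recording of a few minutes already contains hundreds of millions of points.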

Emerging Point Cloud Sensor Technologies
Considering recently emerging and promising lidar technologies such as micro-electromechanical systems, optical phased arrays, vertical-cavity surface-emitting lasers, and single-photon avalanche diodes, combined with the large efforts invested, in particular by the automotive industry (Hecht, 2018; Thakur, 2016; Warren, 2019), to further develop low-cost lidar systems, lidar sensors have the potential to enable a new cost-efficient way to perceive and measure the environment. This will not only have a strong impact on automotive applications but also offers large potential for other research fields and application domains, such as robotics and geophysics. Already today, state-of-the-art lidar sensors designed for automotive applications, such as the Ouster OS-1 [https://ouster.com] or the Velodyne Ultra Puck [https://velodynelidar.com], offer many advantages compared to previous lidar systems used for 3D surveying or terrestrial laser scanning: automotive lidar sensors are small, lightweight, and robust, have a low eye safety class, and support high scanning speeds. The expected substantial decrease in costs and increase in performance in the upcoming years will open up many new application areas for lidar systems.
Apart from the progress in the lidar sector, technological improvements, as well as size and cost reductions, can also be observed for other 3D sensing technologies such as radar and RGB-D cameras. This will open up further possibilities and application areas for 3D sensing methods and increase the importance of Python packages that are able to process large amounts of point cloud data.

Statement of Need
Other Python packages for point clouds, such as Open3D, pyntcloud, and PCL (Rusu & Cousins, 2011) with its Python bindings (Python-PCL Contributors, 2017), focus on processing single point clouds rather than time series of point clouds. Another related library is PDAL (PDAL Contributors, 2018), which also works with pipelines on point clouds but is likewise focused on single point cloud processing. ROS (Robot Operating System) (Stanford Artificial Intelligence Laboratory et al., 2018) provides a way to store, access, and visualize multiple point clouds stored as rosbags. However, rosbags are meant to be accessed only in a serial fashion, which is not ideal for postprocessing and not well suited for extracting subsets of a point cloud dataset.
Compared to these packages, pointcloudset provides efficient analysis of time series of point clouds using parallel processing. For instance, the package is a helpful toolkit for postprocessing lidar datasets recorded by ROS or for postprocessing multiple lidar scans from terrestrial laser scanners. In addition, pointcloudset can be used to develop algorithms on a single point cloud and apply them to big datasets of point clouds.
Figure 1 illustrates the structure of the Dataset class, including import and export possibilities. A Dataset consists of many PointCloud objects, which can be accessed like list elements in Python. Alternatively, a PointCloud object can also be created directly from files, as illustrated in Figure 2.
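The idea of a dataset that behaves like a list of timestamped point clouds, with an algorithm developed on one cloud and then applied to all of them, can be sketched in plain Python. This is a conceptual illustration only and not the pointcloudset API: the names `Cloud`, `CloudSeries`, `apply`, and `max_range` are hypothetical, the data is synthetic, and a simple thread pool stands in for the package's lazy, parallel execution.

```python
import math
from collections.abc import Sequence
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Cloud:
    """Hypothetical stand-in for one point cloud with its acquisition time."""
    timestamp: datetime
    points: Tuple[Point, ...]


class CloudSeries(Sequence):
    """Hypothetical list-like container of clouds, ordered by timestamp."""

    def __init__(self, clouds: Iterable[Cloud]) -> None:
        self._clouds: List[Cloud] = sorted(clouds, key=lambda c: c.timestamp)

    def __len__(self) -> int:
        return len(self._clouds)

    def __getitem__(self, index):
        # Slices return a new series; integers return a single cloud.
        if isinstance(index, slice):
            return CloudSeries(self._clouds[index])
        return self._clouds[index]

    def apply(self, func: Callable[[Cloud], float], max_workers: int = 4) -> List[float]:
        """Run func on every cloud concurrently; results keep the series order."""
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(func, self._clouds))


def max_range(cloud: Cloud) -> float:
    """Per-cloud algorithm: largest Euclidean distance of any point from the origin."""
    return max(math.sqrt(x * x + y * y + z * z) for x, y, z in cloud.points)


# Synthetic series: five clouds recorded 100 ms apart (made-up data).
start = datetime(2021, 1, 1, 12, 0, 0)
series = CloudSeries(
    Cloud(start + timedelta(milliseconds=100 * i),
          ((float(i), 0.0, 0.0), (0.0, float(i + 1), 0.0)))
    for i in range(5)
)

first = series[0]                  # list-style access to one cloud
subset = series[1:3]               # list-style slicing to a subset
ranges = series.apply(max_range)   # same function applied to every cloud
```

The function `max_range` is written and tested against a single cloud and then handed unchanged to `apply`, which mirrors the workflow described above. A thread pool is used here only to keep the sketch self-contained; for CPU-bound pure-Python functions it would bring little speed-up, and a real implementation would typically rely on process-based or task-graph parallelism.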

Contributions
T.G. developed the concept and architecture; T.G., B.S., and S.H. developed the software; T.G. wrote the automatic tests; B.S. and T.G. wrote the software documentation; T.H. and T.G. created Jupyter notebooks for example usage; S.M., T.G., and B.S. wrote the manuscript. All authors contributed to the manuscript and software testing.