CeRULEo: Comprehensive utilitiEs for Remaining Useful Life Estimation methOds

CeRULEo, which stands for Comprehensive utilitiEs for Remaining Useful Life Estimation methOds, is a Python package designed to train and evaluate regression models for predicting the remaining useful life (RUL) of equipment. RUL estimation uses prediction methods to forecast the future performance of machinery and to estimate the time left before it can no longer operate. RUL estimation is considered a central technology of Predictive Maintenance (PdM) (Heimes, 2008; X. Li et al., 2018). Using prediction tools based on historical data, PdM techniques can statistically evaluate a piece of equipment's health status, enabling early identification of impending failures and prompt pre-failure interventions (Susto et al., 2014). CeRULEo offers a comprehensive suite of tools to help with the analysis and pre-processing of preventive maintenance data. These tools also enable the training and evaluation of RUL models that are tailored to the specific needs of the problem at hand.

In Industry 5.0, industrial machines produce large amounts of data that can be used to predict an asset's life (Khan et al., 2023). RUL estimation uses prediction techniques to forecast a machine's future performance based on historical data, enabling early identification of potential failures and prompt pre-failure interventions.
Within the PdM and RUL regression ecosystem, finding a library that effectively combines modelling, feature extraction capabilities, and tools for model comparison poses a significant challenge. While numerous repositories and libraries exist for models and feature extraction in time series data (Christ et al., 2018; Tavenard et al., 2020), few offer a comprehensive solution that integrates both aspects effectively. The prog_models and prog_algs libraries from NASA (Teubert et al., 2022) come closest to fulfilling this requirement. However, they have a strong focus on simulation and lack extensive mechanisms for feature extraction from time series data.
CeRULEo, on the other hand, provides a comprehensive set of utilities designed to train and evaluate regression models for predicting the RUL of equipment. CeRULEo emphasizes a data-driven approach using industrial data, particularly when a simulation model is unavailable or costly to develop, and it is agnostic to the modelling library so that models can be deployed easily in any production environment.
To achieve good performance, RUL regression requires data preparation and feature engineering. Typically, machinery data is provided as time series data from various sensors during operation. The first step in data preparation is often to create a dataset based on run-to-failure cycles. This involves dividing the time series into segments where the equipment starts in a healthy state and ends in a failure state, or is close to failure. The second step of data preparation is preprocessing. While PdM models can be used in a variety of contexts with different data sources and errors, there are some general techniques that can be applied (Serradilla et al., 2022), such as time-series validation, imputing missing values, handling homogeneous or non-homogeneous sampling rates, addressing differences in values, ranges, and behaviour across machines, and creating run-to-failure-cycle-based data.
CeRULEo addresses these issues by providing a comprehensive toolkit for preprocessing time series data for use in PdM models, with a focus on run-to-failure cycles. The preprocessing includes sensor data validation methods, for studying not only missing and corrupted values but also distribution drift among different pieces of equipment.
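The run-to-failure segmentation described above can be illustrated with a minimal, library-independent sketch. It does not use CeRULEo's API; the function names `split_run_to_failure` and `rul_targets`, and the assumption that failures are marked by a boolean flag per sample, are hypothetical choices made only for illustration.

```python
from typing import List, Sequence


def split_run_to_failure(failure_flags: Sequence[bool]) -> List[List[int]]:
    """Group sample indices into cycles that each end at a failure event.

    Samples recorded after the last failure are dropped, because their
    true RUL is unknown (the cycle has not ended yet).
    """
    cycles: List[List[int]] = []
    current: List[int] = []
    for index, failed in enumerate(failure_flags):
        current.append(index)
        if failed:
            # A failure closes the current cycle; the next sample starts a new one.
            cycles.append(current)
            current = []
    return cycles


def rul_targets(timestamps: Sequence[float], cycle: Sequence[int]) -> List[float]:
    """Compute the RUL of every sample in a cycle as time left until its end."""
    end_time = timestamps[cycle[-1]]
    return [end_time - timestamps[i] for i in cycle]


# Toy recording (assumed data): two failures, then one trailing healthy sample.
timestamps = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
failure_flags = [False, False, True, False, False, False, True, False]

cycles = split_run_to_failure(failure_flags)
targets = [rul_targets(timestamps, cycle) for cycle in cycles]
print(cycles)   # [[0, 1, 2], [3, 4, 5, 6]]
print(targets)  # [[2.0, 1.0, 0.0], [3.0, 2.0, 1.0, 0.0]]
```

Each resulting cycle starts right after a failure and ends at the next one, so the RUL target decreases to zero at the last sample of the cycle.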
In addition to preprocessing, it enables the iteration of machine data for use in both mini-batch and full-batch regression models, and is compatible with popular machine learning frameworks such as scikit-learn (Pedregosa et al., 2011) and TensorFlow (Abadi et al., 2015). The library also includes a catalog of successful deep learning models (Chen et al., 2022; Jayasinghe et al., 2019; H. Li et al., 2020) from the literature and a collection of commonly used RUL datasets for quick model evaluation.
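As a rough illustration of what iterating over a run-to-failure cycle for mini-batch training can look like, the following sketch builds fixed-length sliding windows and groups them into batches. It is written in plain Python and is not CeRULEo's iterator; `sliding_windows`, `batches`, and the choice of labelling each window with the RUL at its last sample are assumptions made for this example.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Window = List[List[float]]
Sample = Tuple[Window, float]


def sliding_windows(
    features: Sequence[Sequence[float]],
    rul: Sequence[float],
    window: int,
    step: int = 1,
) -> Iterator[Sample]:
    """Yield (window, target) pairs from one run-to-failure cycle.

    The target is the RUL at the last sample of each window (an assumed,
    commonly used labelling convention).
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    if len(features) != len(rul):
        raise ValueError("features and rul must have the same length")
    for end in range(window, len(features) + 1, step):
        rows = [list(row) for row in features[end - window:end]]
        yield rows, rul[end - 1]


def batches(samples: Iterable[Sample], batch_size: int) -> Iterator[List[Sample]]:
    """Group samples into mini-batches; the last batch may be smaller."""
    batch: List[Sample] = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


# Toy cycle (assumed data): five time steps, two sensor channels.
features = [[0.1, 1.0], [0.2, 1.1], [0.4, 1.3], [0.7, 1.6], [1.1, 2.0]]
rul = [4.0, 3.0, 2.0, 1.0, 0.0]

samples = list(sliding_windows(features, rul, window=3))
batch_sizes = [len(b) for b in batches(samples, batch_size=2)]
```

A full-batch model would simply consume `samples` at once, while a mini-batch model would loop over `batches(samples, batch_size)`.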
The acceptance of PdM technologies is pivotal in Industry 5.0 for successful implementation, but hesitations or reluctance by decision-makers can still pose significant barriers (Oudenhoven et al., 2022). One effective approach to foster acceptance and understanding is through explainability, which plays a crucial role in PdM. As such, CeRULEo incorporates explainable models capable of providing additional information about the predictions, enhancing comprehension: one that can select the most relevant features for the model (Lemhadri et al., 2021), and a convolutional model (Fauvel et al., 2021) that provides post-hoc explanations of the predictions to understand the reasoning behind the predicted RUL.
Moreover, CeRULEo provides tools for evaluating and comparing PdM models based on not only traditional regression metrics, but also on their ability to prevent errors and reduce costs. In many cases, the costs of not accurately detecting or anticipating faults can be much higher than the cost of inspections or maintenance due to reduced efficiency, unplanned downtime, and corrective maintenance expenses. In PdM, it is particularly important to be accurate about the RUL of equipment near the end of its lifespan, as an overestimation of RUL can have serious consequences when immediate action is required. CeRULEo addresses this issue by providing mechanisms for weighting samples according to their importance and asymmetric losses for training models, as well as visualization tools for understanding model performance in relation to true RUL.
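To make the idea of asymmetric losses and sample weighting concrete, the sketch below penalizes overestimated RUL more heavily than underestimated RUL and gives more weight to samples close to the end of life. It is one possible formulation and not necessarily the one implemented in CeRULEo; the function names and the parameters `over_penalty`, `horizon`, and `boost` are hypothetical.

```python
from typing import Sequence


def asymmetric_squared_error(
    y_true: float, y_pred: float, over_penalty: float = 2.0
) -> float:
    """Squared error, scaled by `over_penalty` when the RUL is overestimated."""
    error = y_pred - y_true
    scale = over_penalty if error > 0 else 1.0
    return scale * error ** 2


def end_of_life_weight(y_true: float, horizon: float = 50.0, boost: float = 3.0) -> float:
    """Weight that grows linearly from 1 to 1 + boost as the true RUL goes
    from `horizon` down to 0; samples beyond the horizon keep weight 1."""
    return 1.0 + boost * max(0.0, 1.0 - y_true / horizon)


def weighted_asymmetric_loss(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    over_penalty: float = 2.0,
    horizon: float = 50.0,
    boost: float = 3.0,
) -> float:
    """Weighted mean of the asymmetric squared error over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    weight_sum = 0.0
    for t, p in zip(y_true, y_pred):
        w = end_of_life_weight(t, horizon, boost)
        total += w * asymmetric_squared_error(t, p, over_penalty)
        weight_sum += w
    return total / weight_sum


# Same absolute error of 5 time units, in opposite directions (assumed data).
loss_over = weighted_asymmetric_loss([10.0], [15.0])   # predicted too late
loss_under = weighted_asymmetric_loss([10.0], [5.0])   # predicted too early
```

With these defaults an overestimate costs twice as much as an underestimate of the same size, which reflects the point made above that predicting a failure too late is usually the more expensive mistake.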

Financial Acknowledgement
This work was partially carried out within the MICS (Made in Italy - Circular and Sustainable) Extended Partnership and received funding from Next-GenerationEU (Italian PNRR - M4 C2, Invest 1.3 - D.D. 1551.11-10-2022, PE00000004). Moreover, this study was also partially carried out within the PNRR research activities of the consortium iNEST (Interconnected North-Est Innovation Ecosystem) funded by the European Union Next-GenerationEU (Piano Nazionale di Ripresa e Resilienza (PNRR) - Missione 4 Componente 2, Investimento 1.5 - D.D. 1058 23/06/2022, ECS00000043). This work was also co-funded by the European Union in the context of the Horizon Europe project 'AIMS5.0 - Artificial Intelligence in Manufacturing leading to Sustainability and Industry5.0', Grant agreement ID: 101112089.