Abstract:
Predictive and preventive maintenance strategies are critical in modern industries for ensuring optimal equipment performance and longevity. Predictive maintenance uses data analytics to predict equipment failures and schedule maintenance proactively, reducing downtime and costs. Preventive maintenance, in contrast, relies on scheduled inspections and repairs to prevent failures before they occur, improving the reliability and efficiency of industrial operations. Although deep learning techniques such as Convolutional Neural Networks [1] and Long Short-Term Memory networks [2] have shown superior performance, they require extensive labeled datasets, which are often impractical to obtain in real-world settings. This work integrates machine learning models such as Decision Trees, Support Vector Machines (SVMs), and Random Forests to obtain reliable Remaining Useful Life (RUL) predictions in Prognostics and Health Management (PHM) applications, in line with the objectives of Industry 4.0. The approach first applies unsupervised learning to extract features from raw, unlabeled data and then trains the machine learning models in a supervised manner to improve RUL prediction accuracy. The hyperparameters of this semi-supervised methodology are fine-tuned with a Genetic Algorithm [3], and its effectiveness is demonstrated on the C-MAPSS dataset [4]. First, the dataset was acquired and fed, without any preprocessing, to three machine learning algorithms: Decision Tree, SVM, and Random Forest. Although these initial models did not deliver sufficient precision, they still offered several benefits. Recognizing the importance of data preprocessing, we then returned to data cleaning and preparation, addressing three major issues: missing values, noise, and normalization. Next, the three machine learning algorithms were applied again to the preprocessed data.
In this setting, Random Forest consistently outperformed the other models, largely owing to its robustness and adaptability to high-dimensional data. This adaptability suits our dataset, which consists of time series of several parameters.
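The abstract names missing values, noise, and normalization as the main preprocessing issues but does not state which techniques were used. The following is a minimal sketch, assuming forward-filling for missing readings, a trailing moving average for noise, and min-max scaling for normalization; these choices and the function names (`fill_missing`, `smooth`, `min_max`, `preprocess`) are illustrative assumptions, not the authors' pipeline. It handles a single sensor channel; a multivariate time series would be processed channel by channel.

```python
from typing import List, Optional

# Illustrative sketch only: forward-fill, moving average, and min-max scaling
# are assumed techniques, not necessarily those used in the paper.


def fill_missing(series: List[Optional[float]]) -> List[float]:
    """Forward-fill missing readings (None); leading gaps take the first observed value."""
    observed = [x for x in series if x is not None]
    if not observed:
        raise ValueError("series contains no observed values")
    last = observed[0]
    filled: List[float] = []
    for x in series:
        if x is not None:
            last = x
        filled.append(last)
    return filled


def smooth(series: List[float], window: int = 3) -> List[float]:
    """Trailing moving average over up to `window` samples to damp sensor noise."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[float] = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def min_max(series: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(x - lo) / (hi - lo) for x in series]


def preprocess(series: List[Optional[float]], window: int = 3) -> List[float]:
    """Assumed order: impute missing values, then denoise, then normalize."""
    return min_max(smooth(fill_missing(series), window))


if __name__ == "__main__":
    print(preprocess([1.0, None, 3.0, 5.0], window=2))
    # [0.0, 0.0, 0.3333333333333333, 1.0]
```

In practice the scaling bounds would normally be computed on the training split only and reused for validation and test data, so that no information leaks from held-out samples.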
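The Genetic Algorithm [3] hyperparameter step can be illustrated with a small, self-contained sketch. It assumes a hypothetical integer search space over two Random Forest hyperparameters (`n_estimators`, `max_depth`), tournament selection, uniform crossover, random-reset mutation, and elitism; none of these settings come from the paper. The `toy_fitness` function is a hypothetical stand-in surrogate so the example runs without training any model.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Genome = Dict[str, int]

# Hypothetical search space (illustrative bounds, not taken from the paper).
SPACE: Dict[str, Tuple[int, int]] = {
    "n_estimators": (10, 300),
    "max_depth": (2, 30),
}


def random_genome(rng: random.Random) -> Genome:
    """Sample each hyperparameter uniformly within its bounds."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Uniform crossover: each gene is copied from one parent at random."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SPACE}


def mutate(g: Genome, rng: random.Random, rate: float = 0.2) -> Genome:
    """Random-reset mutation: each gene is resampled with probability `rate`."""
    child = dict(g)
    for k, (lo, hi) in SPACE.items():
        if rng.random() < rate:
            child[k] = rng.randint(lo, hi)
    return child


def tournament(pop: List[Genome], scores: List[float], rng: random.Random, k: int = 3) -> Genome:
    """Pick k distinct individuals at random and return the fittest."""
    idx = rng.sample(range(len(pop)), k)
    return pop[max(idx, key=lambda i: scores[i])]


def genetic_search(fitness: Callable[[Genome], float], pop_size: int = 20,
                   generations: int = 30, seed: int = 0) -> Tuple[Genome, float]:
    """Maximize `fitness` over SPACE; returns the best genome seen and its score."""
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    best: Optional[Genome] = None
    best_score = float("-inf")
    for gen in range(generations):
        scores = [fitness(g) for g in pop]
        for g, s in zip(pop, scores):
            if s > best_score:
                best, best_score = dict(g), s
        if gen == generations - 1:
            break  # last generation evaluated; no need to breed another
        # Elitism: carry the current generation's fittest individual over unchanged.
        elite = pop[max(range(pop_size), key=lambda i: scores[i])]
        nxt = [dict(elite)]
        while len(nxt) < pop_size:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            nxt.append(mutate(crossover(p1, p2, rng), rng))
        pop = nxt
    assert best is not None
    return best, best_score


def toy_fitness(g: Genome) -> float:
    """Hypothetical surrogate (higher is better), peaking at n_estimators=150, max_depth=12."""
    return -float((g["n_estimators"] - 150) ** 2 + 10 * (g["max_depth"] - 12) ** 2)


if __name__ == "__main__":
    best, score = genetic_search(toy_fitness)
    print(best, score)
```

In a real run, `fitness` would train a model with the candidate hyperparameters and return a score to maximize, for example the mean of scikit-learn's `cross_val_score` with `scoring="neg_root_mean_squared_error"` on a `RandomForestRegressor(n_estimators=..., max_depth=...)`. Because the best individual is tracked and carried over unchanged, the best score found never decreases from one generation to the next.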