Wind Turbine Fault Prediction with SVM, PCA and LDA

Role: Researcher

Duration: Master's degree

Technical-business analysis of an AI pipeline for the wind sector


Abstract

This article proposes a method based on generator electrical signals to detect defects caused by mass imbalance or blade pitch faults. The approach combines Support Vector Machines, Principal Component Analysis, and Linear Discriminant Analysis for early detection, which is essential to reduce downtime and avoid unnecessary maintenance. A framework composed of TurbSim, FAST, and MATLAB/Simulink simulated electrical signals from a 1.5 MW turbine under different wind scenarios and blade imbalances. In identifying failure conditions, 97.61% accuracy was achieved using only generator signals. The method was compared with eight classifiers and showed superior performance.


1. Business Perspective

Unplanned wind turbine downtime can reach 237 h/year, directly affecting revenue and increasing maintenance costs. By analyzing only generator current signatures — a non-invasive and low-CAPEX strategy — the method achieves 97.6% accuracy in early detection, enabling predictive maintenance and increasing capacity factor.


2. Data Generation and Simulation

As real SCADA data were unavailable, a synthetic dataset was created:

"TurbSim generated turbulent wind fields, FAST simulated the GE 1.5s turbine, and a MATLAB/Simulink PMSG model recorded three-phase currents for SVM training." (GitHub)

The figure below illustrates the data generation process:

[Figure: Synthetic data generation process]

Parameter        Value
Rated power      1.5 MW
Generator        Permanent-magnet synchronous (PMSG)
Hub height       84 m
Rotor diameter   70 m
Configuration    3-blade, upwind
Rated speed      20 RPM
Rated torque     736.79 kN·m
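
As a quick consistency check on the table above, the rated torque and rated speed can be combined through P = T·ω. The sketch below assumes both values refer to the same shaft, which the source does not state explicitly. Reading the small gap between the computed shaft power and the 1.5 MW rating as conversion losses is likewise an assumption, not a figure from the study.

```python
import math

# Values taken from the turbine parameter table.
RATED_TORQUE_NM = 736.79e3   # 736.79 kN·m
RATED_SPEED_RPM = 20.0
RATED_POWER_W = 1.5e6        # 1.5 MW

# Convert rotational speed from RPM to rad/s, then apply P = T * omega.
omega_rad_s = RATED_SPEED_RPM * 2.0 * math.pi / 60.0
shaft_power_w = RATED_TORQUE_NM * omega_rad_s

# Ratio of the rated power to the computed shaft power.
# ASSUMPTION: torque and speed are measured on the same shaft.
ratio = RATED_POWER_W / shaft_power_w

print(f"omega       = {omega_rad_s:.4f} rad/s")
print(f"shaft power = {shaft_power_w / 1e6:.3f} MW")
print(f"rated/shaft = {ratio:.3f}")
```

The computed shaft power comes out at roughly 1.54 MW, slightly above the 1.5 MW rating, so the tabulated torque and speed are mutually consistent.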

Operating Conditions – Database Scope

Feature                            Value or range
Mean wind speed                    15.0 – 24.0 m/s
Turbulence intensity               5% – 30%
Imbalance conditions               Balanced; mass −3%, +2%, +5%; aerodynamic 2°, 3°, 4°
Simulations per pair (wind × TI)   12
Simulation time                    120 s (last 60 s stored)
Sampling frequency                 2 kHz
Operating region                   3

The final dataset comprises 7 classes × 5 wind speeds × 5 turbulence levels × 12 runs = 2,100 recordings, each with 60 s × 2 kHz × 3 phases = 360k samples.
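
The dataset size above can be reproduced by enumerating the simulation grid. This is a minimal sketch: the class labels are paraphrased from the operating-conditions table, and wind speeds and turbulence levels are represented only by their index, because the source gives their ranges but not the individual values.

```python
from itertools import product

# Class labels paraphrased from the operating-conditions table.
CLASSES = [
    "balanced",
    "mass -3%", "mass +2%", "mass +5%",
    "aero 2 deg", "aero 3 deg", "aero 4 deg",
]
N_WIND_SPEEDS = 5       # individual values not given in the source
N_TURBULENCE = 5        # individual values not given in the source
RUNS_PER_PAIR = 12
STORED_SECONDS = 60     # last 60 s of each 120 s simulation
SAMPLE_RATE_HZ = 2_000
N_PHASES = 3

# One entry per simulated recording: (class, wind index, TI index, run index).
recordings = list(product(
    CLASSES, range(N_WIND_SPEEDS), range(N_TURBULENCE), range(RUNS_PER_PAIR)
))

n_recordings = len(recordings)
samples_per_phase = STORED_SECONDS * SAMPLE_RATE_HZ
samples_per_recording = samples_per_phase * N_PHASES

print("recordings:", n_recordings)
print("samples per phase:", samples_per_phase)
print("samples per recording:", samples_per_recording)
```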


3. Methodology

The fault detection pipeline follows the flow below:

[Figure: Fault detection methodology flow]

  1. Pre-processing
    • z-score normalization → PCA (99.99% variance) → LDA (6 dimensions).
  2. Model
    • SVM with RBF kernel, tuned via grid search (10-fold CV, 168 combinations).
  3. Benchmarks
    • k-NN, QDA, Decision Trees, Random Forest, AdaBoost, ANN, linear SVM.
  4. Metrics
    • Accuracy, Precision, Recall, F1, confusion matrix.
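
The pre-processing and model steps listed above could be chained roughly as follows with scikit-learn. This is a sketch under stated assumptions, not the study's implementation. The data are random stand-ins from make_classification rather than simulated generator currents. The 9-point grid is hypothetical and much smaller than the 168 combinations searched in the study, and the C and gamma values are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 7 random classes, NOT the simulated generator currents.
X, y = make_classification(
    n_samples=420, n_features=40, n_informative=10, n_redundant=10,
    n_classes=7, n_clusters_per_class=1, class_sep=2.0, random_state=0,
)

pipeline = Pipeline([
    ("scale", StandardScaler()),                           # z-score normalization
    ("pca", PCA(n_components=0.9999, svd_solver="full")),  # keep 99.99% of variance
    ("lda", LinearDiscriminantAnalysis(n_components=6)),   # 7 classes -> 6 axes
    ("svm", SVC(kernel="rbf")),                            # RBF-kernel SVM
])

# HYPOTHETICAL grid (3 x 3 = 9 combinations); the study searched 168.
param_grid = {
    "svm__C": [1.0, 10.0, 100.0],
    "svm__gamma": [0.01, 0.1, 1.0],
}

# 10-fold stratified cross-validation, as in the methodology.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

# Apply every step except the SVM to see what the classifier receives.
reduced = search.best_estimator_[:-1].transform(X)

print("best params:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 4))
print("features fed to the SVM:", reduced.shape[1])
```

Wrapping the scaler, PCA and LDA inside the pipeline means they are re-fitted on each training fold, so the cross-validated score is not inflated by information from the held-out fold.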

The PCA/LDA visualization below shows class separability:

[Figure: PCA/LDA class visualization]
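
LDA yields at most (number of classes − 1) discriminant axes, which is why seven classes give the six dimensions used in the pipeline. The sketch below shows how a 2-D view of class separability might be produced from the first two axes. It uses random blobs from make_blobs as stand-in data, so the resulting coordinates say nothing about the actual fault classes.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Stand-in data: 7 Gaussian blobs in 20 dimensions (NOT the turbine dataset).
X, y = make_blobs(
    n_samples=700, centers=7, n_features=20, cluster_std=2.0, random_state=0
)

# With n_components left at its default, LDA keeps every available axis,
# i.e. min(n_classes - 1, n_features).
lda = LinearDiscriminantAnalysis()
Z = lda.fit_transform(X, y)

# First two discriminant axes, e.g. for a scatter plot coloured by class.
Z2 = Z[:, :2]
centroids = np.array([Z2[y == c].mean(axis=0) for c in np.unique(y)])

print("discriminant axes:", Z.shape[1])
print("variance ratio per axis:", np.round(lda.explained_variance_ratio_, 3))
print("class centroids in 2-D:")
print(np.round(centroids, 2))
```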


4. Applied Skills

Throughout the end-to-end study — from data generation to model validation — the following capabilities were exercised:

  • Dimensionality reduction & statistics: PCA, LDA, variance analysis.
  • Advanced machine learning: SVM and margin theory.
  • Data engineering & MLOps: Python-Simulink-FAST automation, grid search, cross-validation.
  • High-performance computing: Parallel PCA on matrices > 120k × 2k.
  • Wind turbine domain knowledge: Physical modeling, pitch and rotor faults.

5. Results

The confusion matrix below shows model performance:

[Figure: Model confusion matrix]

  • Global accuracy: 97.6%; class-wise F1 ≥ 0.92.
  • Efficiency: PCA + LDA reduce the dataset by 98%, cutting hyperparameter search to ≈ 2 min.
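
For reference, global accuracy and class-wise precision, recall and F1 can all be derived from a confusion matrix. The sketch below uses a hypothetical 3-class matrix chosen only for illustration. It does not contain the study's results, and the row/column convention (rows are true classes, columns are predictions) is an assumption.

```python
# HYPOTHETICAL confusion matrix: rows = true class, columns = predicted class.
confusion = [
    [50, 2, 0],
    [3, 45, 2],
    [0, 1, 47],
]


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


n = len(confusion)
total = sum(sum(row) for row in confusion)

# Global accuracy: correctly classified samples (diagonal) over all samples.
accuracy = sum(confusion[i][i] for i in range(n)) / total

per_class = []
for k in range(n):
    tp = confusion[k][k]
    predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
    actual_k = sum(confusion[k])                          # row sum
    precision = safe_div(tp, predicted_k)
    recall = safe_div(tp, actual_k)
    f1 = safe_div(2 * precision * recall, precision + recall)
    per_class.append({"precision": precision, "recall": recall, "f1": f1})

print(f"accuracy = {accuracy:.4f}")
for k, m in enumerate(per_class):
    print(f"class {k}: P={m['precision']:.3f} R={m['recall']:.3f} F1={m['f1']:.3f}")
```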

6. Impact & Next Steps

  • Stakeholders: operators (↓ OPEX), investors (↑ capacity factor), OEMs (new SaaS revenue).
  • Roadmap: extend fault types (gearbox, yaw), add explainability (SHAP), deploy on ARM edge devices in the nacelle.

7. Conclusion

The project demonstrates, in a tangible way, how modern AI techniques can increase the reliability of renewable assets, combining physical simulation, machine learning, and robust software engineering.