Abstract:
The rapid expansion of the Internet of Things (IoT) has significantly increased the cyber-attack surface, exposing infrastructures to adaptive and stealthy threats. Traditional Intrusion Detection Systems (IDS), whether signature-based or anomaly-based, remain constrained by high false-positive rates, limited adaptability, and reliance on centralized data. While machine learning-based IDS improve detection, centralized training introduces privacy and governance risks. Federated Learning (FL) offers a privacy-preserving alternative; however, its dominant algorithm, Federated Averaging (FedAvg), assumes independent and identically distributed (IID) data and homogeneous models, which leads to degraded performance under label skew and limited resilience to concept drift. Although ensemble-based and heterogeneous FL methods have been proposed, most are restricted to homogeneous deep models, lack interpretability, and rarely integrate drift awareness, leaving open gaps in robustness and practicality. To address these challenges, this study proposes Heterogeneous Federated Learning with Ensemble Voting (HFL-EV), in which diverse classical models (Random Forest, Decision Tree, k-Nearest Neighbors (KNN), and Logistic Regression) are trained locally on non-IID client data and their outputs are aggregated by a learned meta-learner at the server. Drift detection based on Adaptive Windowing (ADWIN) enables autonomous, client-side retraining during distributional shifts without centralized coordination. Using the NF-ToN-IoT-V2 dataset, HFL-EV is benchmarked against (i) centralized baselines trained on SMOTE-balanced data and (ii) two federated configurations: FedAvg with a multilayer perceptron (MLP) under IID partitions, and classical models under non-IID label skew with prediction-level aggregation. Results demonstrate that HFL-EV achieves near-centralized performance while operating under privacy-preserving federated constraints, substantially outperforming FedAvg on minority-class detection and Cohen’s κ.
The framework sustains reliable detection across both benign and attack classes, adapts autonomously to evolving traffic patterns, and preserves privacy by retaining raw traffic and base model parameters on-device. The meta-learner incurs minimal computational and communication overhead, making the approach suitable for resource-constrained deployments. These findings establish HFL-EV as a practical, interpretable, and adaptive IDS for heterogeneous IoT/IIoT environments that addresses key limitations of existing federated intrusion detection approaches while maintaining strong detection performance under realistic non-IID conditions.
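
As an illustration of why Cohen's κ is a useful companion to accuracy on imbalanced traffic, the following minimal Python sketch computes κ from its standard definition, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The helper name `cohen_kappa` and the 95/5 benign-to-attack split are illustrative assumptions and are not taken from this study's experiments.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa from its standard definition (hypothetical helper)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    # Observed agreement p_o: fraction of samples where prediction == label.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement p_e: sum over classes of the product of marginals.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if expected == 1.0:
        # Kappa is undefined when both raters use a single class;
        # returning 0.0 here is an assumption made for this sketch.
        return 0.0
    return (observed - expected) / (1.0 - expected)


# Illustrative (made-up) imbalanced sample: 95 benign flows, 5 attack flows.
y_true = ["benign"] * 95 + ["attack"] * 5

# A degenerate detector that never raises an alert: 95% accuracy, kappa 0.0.
always_benign = ["benign"] * 100

# A detector that flags 2 benign flows and catches 4 of the 5 attacks:
# 97% accuracy, kappa of roughly 0.71.
detector = ["benign"] * 93 + ["attack"] * 2 + ["attack"] * 4 + ["benign"] * 1

print(cohen_kappa(y_true, always_benign))
print(round(cohen_kappa(y_true, detector), 2))
```

In this toy case the two detectors differ by only two points of accuracy, yet κ separates them clearly, which is why minority-class metrics and κ are informative under label skew.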