Resilient, Automated Monitoring and Fault-Tolerant Control for Critical Building Systems: Integrating GPU-Accelerated Anomaly Detection, Infrastructure-as-Code, and Self-Correcting HVAC Strategies

Dr. Elena Moretti, Institute for Systems Resilience, University of Lausanne

Abstract

This article presents an integrative framework for designing resilient, automated monitoring and fault-tolerant control systems for critical building infrastructure, with an emphasis on heating, ventilation, and air-conditioning (HVAC) systems. The framework synthesizes advances in GPU-accelerated anomaly detection, infrastructure-as-code (IaC) for reproducible deployment, self-correcting control strategies, and organizational preparedness for emergency scenarios. The study draws on interdisciplinary evidence spanning disaster and emergency planning, GPU concurrency and diagnostic automation, advanced anomaly detection in large sensor networks, automated deployment methods, and domain-specific best practices for HVAC control and fault detection. The framework is organized into four interlocking pillars: (1) high-throughput real-time anomaly detection using graph and spatio-temporal models accelerated on commodity GPUs, (2) deterministic, auditable deployment and lifecycle management of monitoring pipelines through IaC, (3) model-based self-correcting controls and fault-tolerant supervisory strategies aligned with ASHRAE standards, and (4) organizational preparedness and recovery planning to close the loop between technical detection, operational response, and disaster resilience. The methodological exposition details the system architecture, data handling, algorithmic choices, and software lifecycle practices; a descriptive results section synthesizes expected outcomes and system behavior under a range of fault modes; and an extended discussion elaborates theoretical implications, limitations, and a pathway for future research and deployment. The design emphasizes safety, reproducibility, and operational viability: detection precision and recall are framed alongside the latencies achievable with GPU acceleration and the governance benefits afforded by IaC.
The conclusions highlight how coordinated technical and organizational practices can materially improve the readiness, response, and recovery of building systems facing sensor faults, actuator failures, or anomalous dynamics during emergencies. This research contributes a pragmatic, integrative roadmap for researchers, facility engineers, and system integrators seeking to move beyond isolated algorithms toward production-ready resilience for critical built-environment infrastructure.

Keywords

GPU-accelerated anomaly detection, HVAC fault-tolerance, infrastructure-as-code, self-correcting controls

References

Alexander, D. E. (2015). Disaster and emergency planning for preparedness, response, and recovery. Oxford University Press.

Alglave, J., Batty, M., Donaldson, A. F., Gopalakrishnan, G., Ketema, J., Poetzl, D., ... & Wickerson, J. (2015). GPU concurrency: Weak behaviours and programming assumptions. ACM SIGARCH Computer Architecture News, 43(1), 577-591.

Asres, M. W., Omlin, C. W., Wang, L., Yu, D., Parygin, P., Dittmann, J., ... & CMS-HCAL Collaboration. (2023). Spatio-temporal anomaly detection with graph networks for data quality monitoring of the Hadron Calorimeter. Sensors, 23(24), 9679.

Bengea, S. C., Li, P., Sarkar, S., Vichik, S., Adetola, V., Kang, K., Lovett, T., Leonardi, F., Kelman, A.D. (2015). Fault-tolerant optimal control of a building HVAC system. Science and Technology for the Built Environment, 21(6), 734–751.

Bhattacharjee, A. (2020). Algorithms and Techniques for Automated Deployment and Efficient Management of Large-Scale Distributed Data Analytics Services (Doctoral dissertation, Vanderbilt University).

Brambley, M., Fernandez, N., Wang, W., Cort, K.A., Cho, H., Ngo, H., Goddard, J.K. (2011). Final project report: Self-correcting controls for VAV system faults filter/fan/coil and VAV box sections. PNNL-20452. Pacific Northwest National Laboratory (PNNL), Richland, WA, USA.

Chavan, A. (2022). Importance of identifying and establishing context boundaries while migrating from monolith to microservices. Journal of Engineering and Applied Sciences Technology, 4, E168. http://doi.org/10.47363/JEAST/2022(4)E168

Chinamanagonda, S. (2019). Automating Infrastructure with Infrastructure as Code (IaC). Available at SSRN 4986767.

Deep, A. T. (2024). Advanced financial market forecasting: integrating Monte Carlo simulations with ensemble Machine Learning models.

Dexter, A., Pakanen, J. (Eds.). (2001). Demonstrating Automated Fault Detection and Diagnosis Methods in Real Buildings. Technical Research Centre of Finland, Finland.

Dong, M. (2019). Combining unsupervised and supervised learning for asset class failure prediction in power systems. IEEE Transactions on Power Systems, 34(6), 5033-5043.

Economidou, M. (2011). Europe’s buildings under the microscope: A country-by-country review of the energy performance of buildings. Technical report, Buildings Performance Institute Europe.

Fernandez, N., Brambley, M., Katipamula, S. (2009a). Self-correcting HVAC controls: Algorithms for sensors and dampers in air-handling units. PNNL-19104. Pacific Northwest National Laboratory (PNNL), Richland, WA, USA.

Goel, G., & Bhramhabhatt, R. (2024). Dual sourcing strategies. International Journal of Science and Research Archive, 13(2), 2155. https://doi.org/10.30574/ijsra.2024.13.2.2155

Lulla, K., Chandra, R., & Ranjan, K. (2025). Factory-grade diagnostic automation for GeForce and data centre GPUs. International Journal of Engineering, Science and Information Technology, 5(3), 537-544.

Dhanagari, M. R. (2024). Scaling with MongoDB: Solutions for handling big data in real-time. Journal of Computer Science and Technology Studies, 6(5), 246-264. https://doi.org/10.32996/jcsts.2024.6.5.20

ASHRAE. (2018). Guideline 36-2018: High Performance Sequences of Operation for HVAC Systems. ASHRAE, Atlanta, GA, USA.

ASHRAE. (2020). Standard 135-2020: BACnet™, A Data Communication Protocol for Building Automation and Control Networks. Available online: https://www.ashrae.org/technical-resources/bookstore/bacnet (accessed on 10 Aug 2021).

How to Cite

Resilient, Automated Monitoring and Fault-Tolerant Control for Critical Building Systems: Integrating GPU-Accelerated Anomaly Detection, Infrastructure-as-Code, and Self-Correcting HVAC Strategies. (2025). Global Multidisciplinary Journal, 4(10), 72-83. https://www.grpublishing.org/journals/index.php/gmj/article/view/242