TY - GEN
T1 - Performance evaluation of distributed machine learning for load forecasting in smart grids
AU - Syed, Dabeeruddin
AU - Refaat, Shady S.
AU - Abu-Rub, Haitham
N1 - Funding Information:
This publication was made possible by NPRP grant [NPRP10-0101-170082] from the Qatar National Research Fund (a member of Qatar Foundation), co-funding by IBERDROLA QSTP LLC, and sponsorship by the Texas A&M Energy Institute Fellowship. Portions of this research were conducted with the advanced computing resources provided by Texas A&M High Performance Research Computing. The statements made herein are solely the responsibility of the authors.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/1
Y1 - 2020/1
N2 - Load forecasting in smart grids is the process of predicting the amount of electrical power needed to meet short-, medium- and long-term demand. Accurate load forecasting helps electrical utilities manage their energy production, operations and control. Most state-of-the-art forecasting methodologies use classical machine learning algorithms to predict the electrical load. Big data platforms and parallel distributed computing need to be used to their full potential in the available solutions. In this paper, Apache Spark and Apache Hadoop are used as big data platforms for distributed computing to predict the load from available big data. MLlib, Spark's machine learning library, is used to test classic regression algorithms such as linear regression, generalized linear regression, decision tree, random forest and gradient-boosted trees, in addition to survival regression and isotonic regression. The results show that Spark achieves high accuracy while parallelizing the load forecasting process, with competitive training and test times. Actual big data are used in the load forecasting process.
AB - Load forecasting in smart grids is the process of predicting the amount of electrical power needed to meet short-, medium- and long-term demand. Accurate load forecasting helps electrical utilities manage their energy production, operations and control. Most state-of-the-art forecasting methodologies use classical machine learning algorithms to predict the electrical load. Big data platforms and parallel distributed computing need to be used to their full potential in the available solutions. In this paper, Apache Spark and Apache Hadoop are used as big data platforms for distributed computing to predict the load from available big data. MLlib, Spark's machine learning library, is used to test classic regression algorithms such as linear regression, generalized linear regression, decision tree, random forest and gradient-boosted trees, in addition to survival regression and isotonic regression. The results show that Spark achieves high accuracy while parallelizing the load forecasting process, with competitive training and test times. Actual big data are used in the load forecasting process.
KW - Apache Spark
KW - Distributed Computing
KW - Distributed Machine Learning
KW - Load Forecast
KW - Smart Grids
UR - http://www.scopus.com/inward/record.url?scp=85083110917&partnerID=8YFLogxK
U2 - 10.1109/KI48306.2020.9039797
DO - 10.1109/KI48306.2020.9039797
M3 - Conference contribution
AN - SCOPUS:85083110917
T3 - Proceedings of the 30th International Conference on Cybernetics and Informatics, K and I 2020
BT - Proceedings of the 30th International Conference on Cybernetics and Informatics, K and I 2020
A2 - Ciganek, Jan
A2 - Kozak, Stefan
A2 - Kozakova, Alena
PB - Institute of Electrical and Electronics Engineers (IEEE)
T2 - 30th International Conference on Cybernetics and Informatics, K and I 2020
Y2 - 29 January 2020 through 1 February 2020
ER -