Using Ensemble Data Mining Modelling for Nonbinary Overflow Detection in Urban Flooding

Kourosh Behzadian, Farzad Piadeh, Albert S. Chen, Luiza C. Campos, Zoran Kapelan

Research output: Contribution to conference, Poster, peer-reviewed


Data-driven modelling, especially with data mining techniques, has recently received significant attention in flood warning systems, mainly because it offers a well-explored, sustainable way to alleviate the disruptive socio-economic effects of flooding [1]. Various machine learning models with hybrid data mining techniques have been applied to water level prediction or overflow detection. However, time-series ensemble modelling is not yet well understood, particularly the application of nonbinary classification to overflow detection and the associated flood risk management [2].

This study presents a new real-time nonbinary overflow detection framework for urban flooding. It extracts key rainfall features, develops weak-learner base models, and proposes a time-series multi-class ensemble model. The framework is demonstrated on a real case study of urban drainage systems (UDS) in London, UK. The rainfall features, selected by partial least squares analysis, are (1) rainfall duration, (2) rainfall intensity, (3) evidence of previous rainfall occurrence, and (4) rainfall date of the year. These features are used to develop seven base models: (1) discriminant analysis, (2) decision tree, (3) Gaussian process regression, (4) K-nearest neighbours, (5) naïve Bayes, (6) neural network pattern recognition, and (7) support vector machine. Each model detects one of three conditions: (1) overflow, (2) a water level rise that is expected but drained successfully without overflow, or (3) no expected water level rise. A novel ensemble model (ENS), which blends the developed base models into a decision tree structure, was then developed to detect overflow over the next twelve 15-min timesteps (i.e., 3 h). Its performance is compared with that of two well-established models: stacked random forest (ERF) and nagging K-nearest neighbours (EKN) [3]. Performance is assessed with the confusion matrix, using total positive ratio, accuracy, and total negative ratio as key performance indicators.
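To make the performance indicators concrete, the following is a minimal sketch of how such ratios could be read off a three-class confusion matrix. It assumes that "total positive ratio" and "total negative ratio" correspond to the one-vs-rest true positive rate and true negative rate of each class; the abstract does not define them, so this reading is an assumption. The class labels, function names, and counts are hypothetical and are not taken from the study.

```python
from typing import Dict, List

# Hypothetical class labels, in the order the three conditions are listed.
CLASSES = ["overflow", "rise_drained", "no_rise"]


def one_vs_rest_rates(cm: List[List[int]]) -> Dict[str, Dict[str, float]]:
    """Per-class true positive and true negative rates.

    cm[i][j] counts samples whose actual class is i and predicted class is j.
    Each class is treated in turn as "positive" and the other two as "negative".
    """
    total = sum(sum(row) for row in cm)
    rates = {}
    for k, name in enumerate(CLASSES):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                  # actual k, predicted otherwise
        fp = sum(row[k] for row in cm) - tp   # predicted k, actually otherwise
        tn = total - tp - fn - fp
        rates[name] = {
            "tpr": tp / (tp + fn) if tp + fn else 0.0,
            "tnr": tn / (tn + fp) if tn + fp else 0.0,
        }
    return rates


def accuracy(cm: List[List[int]]) -> float:
    """Share of all samples that lie on the diagonal of the matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total if total else 0.0


# Made-up counts for illustration only; these are not results from the study.
example_cm = [
    [8, 2, 0],
    [1, 15, 4],
    [0, 3, 67],
]
example_rates = one_vs_rest_rates(example_cm)
example_accuracy = accuracy(example_cm)
```

Reporting the rates per class, rather than a single pooled figure, keeps the rare overflow class visible even when most timesteps show no water level rise.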

Results show that the two newly proposed rainfall features, “evidence of previous rainfall occurrence” and “rainfall date of the year”, can significantly enhance the accuracy of the base models. Furthermore, the ENS model could reduce overestimation and underestimation miss rates by nearly 10% in total for 3 h-ahead overflow detection, whereas the total miss rates of ERF and EKN over the same detection horizon are 37% and 39%, respectively. In addition, the ENS model correctly detects 88% of high-hazard overflows, compared with 64% for ERF and 24% for EKN, which highlights the superior ability of the proposed model to raise early warnings in high-hazard situations.
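The overestimation and underestimation miss rates can be illustrated with a short sketch. It assumes the three conditions are ordered by severity (no rise, rise but drained, overflow), that an overestimate is a prediction more severe than the observed condition, and that an underestimate is a prediction less severe than it. These definitions are assumptions made for illustration, since the abstract does not state how the rates were computed. The function name and counts are hypothetical.

```python
from typing import List, Tuple


def directional_miss_rates(cm: List[List[int]]) -> Tuple[float, float]:
    """Return (overestimation, underestimation) rates as fractions of all samples.

    Rows are actual classes and columns are predicted classes, both ordered by
    increasing severity: 0 = no rise, 1 = rise but drained, 2 = overflow.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        return 0.0, 0.0
    # Above the diagonal: predicted condition is more severe than observed.
    over = sum(cm[i][j] for i in range(n) for j in range(n) if j > i)
    # Below the diagonal: predicted condition is less severe than observed.
    under = sum(cm[i][j] for i in range(n) for j in range(n) if j < i)
    return over / total, under / total


# Made-up counts for illustration only; these are not results from the study.
toy_cm = [
    [67, 3, 0],
    [4, 15, 1],
    [0, 2, 8],
]
over_rate, under_rate = directional_miss_rates(toy_cm)
total_miss_rate = over_rate + under_rate
```

Under these assumptions the total miss rate equals one minus the accuracy, while splitting it by direction separates false alarms from missed warnings, which carry different costs in an early warning system.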
Original language: English
Number of pages: 1
Publication status: Published - 28 Apr 2023
Event: European Geosciences Union: EGU General Assembly 2023 - Vienna, Austria
Duration: 23 Apr 2023 to 28 Apr 2023
Conference number: 23


Conference: European Geosciences Union
Abbreviated title: EGU23

