University of Hertfordshire

Standard

Evaluation of Machine Learning Algorithms for Classification of Primary Biological Aerosol using a new UV-LIF spectrometer. / Ruske, Simon; Topping, D. O.; Foot, V.E.; Kaye, Paul; Stanley, Warren; Crawford, I.P.; Morse, Andrew; Gallagher, Martin W.

In: Atmospheric Measurement Techniques Discussions, Vol. 10, No. 2, 13.07.2016, p. 695-708.

Research output: Contribution to journal › Article › peer-review

Author

Ruske, Simon ; Topping, D. O. ; Foot, V.E. ; Kaye, Paul ; Stanley, Warren ; Crawford, I.P. ; Morse, Andrew ; Gallagher, Martin W. / Evaluation of Machine Learning Algorithms for Classification of Primary Biological Aerosol using a new UV-LIF spectrometer. In: Atmospheric Measurement Techniques Discussions. 2016 ; Vol. 10, No. 2. pp. 695-708.

Bibtex

@article{3347f2fd76b2430e82f1553707211c03,
title = "Evaluation of Machine Learning Algorithms for Classification of Primary Biological Aerosol using a new UV-LIF spectrometer",
abstract = "Characterisation of bio-aerosols has important implications within Environment and Public Health sectors. Recent developments in Ultra-Violet Light Induced Fluorescence (UV-LIF) detectors such as the Wideband Integrated bio-aerosol Spectrometer (WIBS) and the newly introduced Multiparameter bio-aerosol Spectrometer (MBS) have allowed for the real time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification. For unsupervised learning we test Hierarchical Agglomerative Clustering with various different linkages. For supervised learning, ten methods were tested, including decision trees; ensemble methods: Random Forests, Gradient Boosting and AdaBoost; two implementations for support vector machines: libsvm and liblinear; Gaussian methods: Gaussian na{\"i}ve Bayesian, quadratic and linear discriminant analysis; and finally the k-nearest neighbours algorithm. The methods were applied to two different data sets measured using a new Multiparameter bio-aerosol Spectrometer which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 72.7 and 91.1 percent for the two data sets respectively.
For supervised learning the gradient boosting algorithm was found to be the most effective, on average correctly classifying 88.1 and 97.8 percent of the testing data respectively across the two data sets.",
author = "Simon Ruske and Topping, {D. O.} and V.E. Foot and Paul Kaye and Warren Stanley and I.P. Crawford and Andrew Morse and Gallagher, {Martin W.}",
note = "{\textcopyright} Author(s) 2016. This work is distributed under the Creative Commons Attribution 3.0 License.",
year = "2016",
month = jul,
day = "13",
doi = "10.5194/amt-2016-214",
language = "English",
volume = "10",
pages = "695--708",
journal = "Atmospheric Measurement Techniques Discussions",
issn = "1867-8610",
number = "2",

}

RIS

TY - JOUR

T1 - Evaluation of Machine Learning Algorithms for Classification of Primary Biological Aerosol using a new UV-LIF spectrometer

AU - Ruske, Simon

AU - Topping, D. O.

AU - Foot, V.E.

AU - Kaye, Paul

AU - Stanley, Warren

AU - Crawford, I.P.

AU - Morse, Andrew

AU - Gallagher, Martin W.

N1 - © Author(s) 2016. This work is distributed under the Creative Commons Attribution 3.0 License.

PY - 2016/7/13

Y1 - 2016/7/13

N2 - Characterisation of bio-aerosols has important implications within Environment and Public Health sectors. Recent developments in Ultra-Violet Light Induced Fluorescence (UV-LIF) detectors such as the Wideband Integrated bio-aerosol Spectrometer (WIBS) and the newly introduced Multiparameter bio-aerosol Spectrometer (MBS) have allowed for the real time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification. For unsupervised learning we test Hierarchical Agglomerative Clustering with various different linkages. For supervised learning, ten methods were tested, including decision trees; ensemble methods: Random Forests, Gradient Boosting and AdaBoost; two implementations for support vector machines: libsvm and liblinear; Gaussian methods: Gaussian naïve Bayesian, quadratic and linear discriminant analysis; and finally the k-nearest neighbours algorithm. The methods were applied to two different data sets measured using a new Multiparameter bio-aerosol Spectrometer which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 72.7 and 91.1 percent for the two data sets respectively.
For supervised learning the gradient boosting algorithm was found to be the most effective, on average correctly classifying 88.1 and 97.8 percent of the testing data respectively across the two data sets.

AB - Characterisation of bio-aerosols has important implications within Environment and Public Health sectors. Recent developments in Ultra-Violet Light Induced Fluorescence (UV-LIF) detectors such as the Wideband Integrated bio-aerosol Spectrometer (WIBS) and the newly introduced Multiparameter bio-aerosol Spectrometer (MBS) have allowed for the real time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification. For unsupervised learning we test Hierarchical Agglomerative Clustering with various different linkages. For supervised learning, ten methods were tested, including decision trees; ensemble methods: Random Forests, Gradient Boosting and AdaBoost; two implementations for support vector machines: libsvm and liblinear; Gaussian methods: Gaussian naïve Bayesian, quadratic and linear discriminant analysis; and finally the k-nearest neighbours algorithm. The methods were applied to two different data sets measured using a new Multiparameter bio-aerosol Spectrometer which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 72.7 and 91.1 percent for the two data sets respectively.
For supervised learning the gradient boosting algorithm was found to be the most effective, on average correctly classifying 88.1 and 97.8 percent of the testing data respectively across the two data sets.

U2 - 10.5194/amt-2016-214

DO - 10.5194/amt-2016-214

M3 - Article

VL - 10

SP - 695

EP - 708

JO - Atmospheric Measurement Techniques Discussions

JF - Atmospheric Measurement Techniques Discussions

SN - 1867-8610

IS - 2

ER -