TY - JOUR

T1 - Photometric redshift estimation using Gaussian processes

AU - Bonfield, D.

AU - Sun, Yi

AU - Davey, N.

AU - Jarvis, M.J.

AU - Abdalla, F.B.

AU - Banerji, M.

AU - Adams, R.G.

N1 - The definitive article can be found at: http://www3.interscience.wiley.com/ Copyright Royal Astronomical Society

PY - 2010

Y1 - 2010

AB - We present a comparison between Gaussian processes (GPs) and artificial neural networks (ANNs) as methods for determining photometric redshifts for galaxies, given training-set data. In particular, we compare their degradation in performance as the training-set size is degraded in ways which might be caused by the observational limitations of spectroscopy. Using publicly available regression codes, we find that performance with large, complete training sets is very similar, although the ANN achieves slightly smaller rms errors. Training sets with brighter magnitude limits than the test data do not strongly affect the performance of either algorithm, until the limits are so severe that they remove almost all of the high-redshift training objects. Similarly, the introduction of a plausible number (up to 10 per cent) of inaccurate redshifts into the training set has little effect on either method. However, if the size of the training set is reduced by random sampling, the rms errors of both methods increase, but they do so to a lesser extent and in a much smoother manner for the case of GP regression; for the example presented ANNz has rms errors 20 per cent worse than GP regression in the small training-set limit. Also, when training objects are removed at redshifts 1.3<z<1.7, to simulate the effects of the 'redshift desert' of optical spectroscopy, the GP regression is successful at interpolating across the redshift gap, while the ANN suffers from strong bias for test objects in this redshift range. Overall, GP regression has attractive properties for photometric redshift estimation, particularly for deep, high-redshift surveys where it is difficult to obtain a large, complete training set. At present, unlike the ANN code, public GP regression codes do not take account of inhomogeneous measurement errors on the photometric data, and thus cannot estimate reliable uncertainties on the predicted redshifts. However, a better treatment of errors is in principle possible, and the promising results in this paper suggest that such improved GP algorithms should be pursued.

U2 - 10.1111/j.1365-2966.2010.16544.x

DO - 10.1111/j.1365-2966.2010.16544.x

M3 - Article

SN - 0035-8711

VL - 405

SP - 987

EP - 994

JO - Monthly Notices of the Royal Astronomical Society

JF - Monthly Notices of the Royal Astronomical Society

IS - 2

ER -