University of Hertfordshire

Analysis of context-dependent errors for Illumina sequencing

Research output: Contribution to journal › Article

Standard

Analysis of context-dependent errors for Illumina sequencing. / Abnizova, Irina; Leonard, Steven; Skelly, Tom; Brown, Andy; Jackson, David; Gourtovaia, Marina; Qi, Guoying; Faruque, Nadeem; Lewis, Kevin; Cox, Tony; te Boekhorst, Rene.

In: Journal of Bioinformatics and Computational Biology, Vol. 10, No. 2, 1241005, 03.04.2012.

Harvard

Abnizova, I, Leonard, S, Skelly, T, Brown, A, Jackson, D, Gourtovaia, M, Qi, G, Faruque, N, Lewis, K, Cox, T & te Boekhorst, R 2012, 'Analysis of context-dependent errors for Illumina sequencing', Journal of Bioinformatics and Computational Biology, vol. 10, no. 2, 1241005. DOI: 10.1142/S0219720012410053

APA

Abnizova, I., Leonard, S., Skelly, T., Brown, A., Jackson, D., Gourtovaia, M., ... te Boekhorst, R. (2012). Analysis of context-dependent errors for Illumina sequencing. Journal of Bioinformatics and Computational Biology, 10(2), [1241005]. DOI: 10.1142/S0219720012410053

Vancouver

Abnizova I, Leonard S, Skelly T, Brown A, Jackson D, Gourtovaia M et al. Analysis of context-dependent errors for Illumina sequencing. Journal of Bioinformatics and Computational Biology. 2012 Apr 3;10(2):1241005. DOI: 10.1142/S0219720012410053

Author

Abnizova, Irina; Leonard, Steven; Skelly, Tom; Brown, Andy; Jackson, David; Gourtovaia, Marina; Qi, Guoying; Faruque, Nadeem; Lewis, Kevin; Cox, Tony; te Boekhorst, Rene / Analysis of context-dependent errors for Illumina sequencing.

In: Journal of Bioinformatics and Computational Biology, Vol. 10, No. 2, 1241005, 03.04.2012.

Bibtex

@article{fc2eae97465a47ab838d9f54b553fe46,
title = "Analysis of context-dependent errors for Illumina sequencing",
keywords = "Next-generation sequencing, statistical measures, error probability, quality value",
author = "Irina Abnizova and Steven Leonard and Tom Skelly and Andy Brown and David Jackson and Marina Gourtovaia and Guoying Qi and Nadeem Faruque and Kevin Lewis and Tony Cox and {te Boekhorst}, Rene",
year = "2012",
month = "4",
doi = "10.1142/S0219720012410053",
volume = "10",
journal = "Journal of Bioinformatics and Computational Biology",
issn = "0219-7200",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "2",

}

RIS

TY - JOUR

T1 - Analysis of context-dependent errors for Illumina sequencing

AU - Abnizova,Irina

AU - Leonard,Steven

AU - Skelly,Tom

AU - Brown,Andy

AU - Jackson,David

AU - Gourtovaia,Marina

AU - Qi,Guoying

AU - Faruque,Nadeem

AU - Lewis,Kevin

AU - Cox,Tony

AU - te Boekhorst,Rene

PY - 2012/4/3

Y1 - 2012/4/3

N2 - The new generation of short-read sequencing technologies requires reliable measures of data quality. Such measures are especially important for variant calling. However, in the particular case of SNP calling, a great number of false-positive SNPs may be obtained. One needs to distinguish putative SNPs from sequencing or other errors. We found that not only the probability of sequencing errors (i.e. the quality value) is important to distinguish an FP-SNP but also the conditional probability of "correcting" this error (the "second best call" probability, conditional on that of the first call). Surprisingly, around 80% of mismatches can be "corrected" with this second call. Another way to reduce the rate of FP-SNPs is to retrieve DNA motifs that seem to be prone to sequencing errors, and to attach a corresponding conditional quality value to these motifs. We have developed several measures to distinguish between sequence errors and candidate SNPs, based on a base call’s nucleotide context and its mismatch type. In addition, we suggested a simple method to correct the majority of mismatches, based on conditional probability of their "second" best intensity call. We attach a corresponding second call confidence (quality value) of being corrected to each mismatch.

AB - The new generation of short-read sequencing technologies requires reliable measures of data quality. Such measures are especially important for variant calling. However, in the particular case of SNP calling, a great number of false-positive SNPs may be obtained. One needs to distinguish putative SNPs from sequencing or other errors. We found that not only the probability of sequencing errors (i.e. the quality value) is important to distinguish an FP-SNP but also the conditional probability of "correcting" this error (the "second best call" probability, conditional on that of the first call). Surprisingly, around 80% of mismatches can be "corrected" with this second call. Another way to reduce the rate of FP-SNPs is to retrieve DNA motifs that seem to be prone to sequencing errors, and to attach a corresponding conditional quality value to these motifs. We have developed several measures to distinguish between sequence errors and candidate SNPs, based on a base call’s nucleotide context and its mismatch type. In addition, we suggested a simple method to correct the majority of mismatches, based on conditional probability of their "second" best intensity call. We attach a corresponding second call confidence (quality value) of being corrected to each mismatch.

KW - Next-generation sequencing

KW - statistical measures

KW - error probability

KW - quality value

U2 - 10.1142/S0219720012410053

DO - 10.1142/S0219720012410053

M3 - Article

VL - 10

JO - Journal of Bioinformatics and Computational Biology

T2 - Journal of Bioinformatics and Computational Biology

JF - Journal of Bioinformatics and Computational Biology

SN - 0219-7200

IS - 2

M1 - 1241005

ER -
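
Illustrative sketch

The abstract relates a quality value to an error probability and describes a conditional "second best call" probability for mismatches. The following is a minimal sketch of those two ideas, not the authors' implementation. It assumes quality values are on the standard Phred scale, Q = -10 log10(p), and it estimates the second-call correction rate as a simple fraction over mismatched bases. The function names and the (reference, first call, second call) tuple layout are hypothetical, chosen only for illustration; the abstract does not specify how the paper computes these quantities.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality value Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability p (0 < p <= 1) to a Phred-scaled quality value."""
    return -10 * math.log10(p)


def second_call_correction_rate(calls):
    """Estimate P(second call == reference | first call != reference).

    `calls` is a hypothetical iterable of (reference_base, first_call, second_call)
    tuples. Positions where the first call already matches the reference are
    ignored, because only mismatches are candidates for correction.
    Returns None when there are no mismatches.
    """
    mismatches = [(ref, second) for ref, first, second in calls if first != ref]
    if not mismatches:
        return None
    corrected = sum(1 for ref, second in mismatches if second == ref)
    return corrected / len(mismatches)


# Toy data (invented for illustration): five mismatches, four of which
# would be fixed by taking the second best call, plus one correct first call.
toy_calls = [
    ("A", "C", "A"),
    ("G", "T", "G"),
    ("C", "A", "C"),
    ("T", "G", "T"),
    ("A", "G", "C"),
    ("A", "A", "C"),
]
rate = second_call_correction_rate(toy_calls)
```

Under these assumptions, the estimated rate could itself be expressed as a confidence value by passing `1 - rate` to `error_prob_to_phred`, which mirrors the abstract's idea of attaching a second-call quality value to each mismatch.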