Conference details

Authors: E. Tapia; P. Bulacio; F. Spetale; L. Angelone; L. Ornella.

Abstract: Motivation. Molecular barcoding makes it possible to spread, i.e., to multiplex, NGS capacity across multiple individuals at specific portions of their genomes, thus providing a better compromise between cost, coverage and throughput for population genomics projects. Molecular barcoding relies on the ability of short oligos, known as barcodes, to tag DNA fragments belonging to different samples. Regardless of the deployment method, barcodes must ensure both resilience against sequencing errors and minimum interference with DNA sequencing reactions. Current barcode designs rely on exhaustive searches of the oligo space, which limits barcoding applications to tens of samples and low sequencing error rates. Considering that upcoming third-generation (3G) sequencing technologies will offer improved read lengths but also higher sequencing error rates [1], the problem of designing barcoding schemes with improved multiplexing and error resilience capacities deserves special attention. In this communication, we consider the intelligent, i.e., non-exhaustive, design of barcoding schemes based on linear error-correcting codes. We aim to design barcodes that allow the parallel sequencing of tens to hundreds of samples under rather high sequencing error rates. We explore the barcoding capacity of binary Hamming and BCH codes [2] and quaternary LDPC codes [3], i.e., LDPC codes over GF(4). Without loss of generality, we restrict ourselves to barcoding schemes for pyrosequencing applications.
Results. Experimental results show that quaternary LDPC codes can induce more robust and more flexible barcoding schemes, with higher multiplexing capacity, than BCH and Hamming codes. Specifically, quaternary LDPC codes can achieve almost error-free identification of hundreds of barcodes under sequencing error rates up to 0.04. Experimental results also show that improved error correction capacity can only be achieved by increasing the number of flows per barcode and by reducing the multiplexing capacity. However, quaternary LDPC codes achieve the best compromise between barcode processing complexity, error resilience and multiplexing capacity.
Methods. C++ software for standard applications of linear error-correcting codes was adapted to simulate barcodes constructed from linear codes [2-5]. Barcoding schemes containing long homopolymer regions, which favour insertion and deletion errors in pyrosequencing applications, were avoided by introducing suitable interleaver designs after the standard generation of codewords. Similarly, potentially interfering barcoding schemes were avoided by introducing a novel codeword filter after the interleaving stage; the filter was developed based on [6]. Sequencing errors were modeled by a quaternary symmetric channel: a nucleotide (nt) remains unchanged with probability 1-p and is substituted by each of the other three nt with probability p/3. Barcode lengths were limited to 32 nt. Barcoding schemes comprising 8, 16 and 32 nt were evaluated for binary Hamming and BCH codes. Barcoding schemes comprising between 8 and 32 nt were evaluated for quaternary LDPC codes.
Conclusions. Experimental results suggest that LDPC codes in GF(4) are good candidates for improving the multiplexing capacity of current 2G and upcoming 3G sequencing platforms.
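
Illustrative sketch (not the authors' C++ software). The quaternary symmetric channel described in the Methods can be simulated in a few lines of Python. The function names, the example barcodes and the error rate used below are assumptions made for illustration only. In particular, the nearest-barcode rule based on Hamming distance is a simple stand-in for identification and is not the decoder used for the Hamming, BCH or LDPC codes in this work; the homopolymer check only measures run lengths and does not reproduce the interleaver or the codeword filter.

```python
import random
from itertools import groupby

NUCLEOTIDES = "ACGT"


def quaternary_symmetric_channel(barcode, p, rng):
    """Pass a barcode through a quaternary symmetric channel.

    Each nt is kept with probability 1 - p and otherwise replaced by one of
    the three other nt chosen uniformly, i.e. each with probability p / 3.
    """
    out = []
    for nt in barcode:
        if rng.random() < p:
            others = [x for x in NUCLEOTIDES if x != nt]
            out.append(rng.choice(others))
        else:
            out.append(nt)
    return "".join(out)


def longest_homopolymer(barcode):
    """Return the length of the longest run of identical nt (0 if empty)."""
    return max((len(list(run)) for _, run in groupby(barcode)), default=0)


def hamming_distance(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def identify_barcode(read, barcodes):
    """Assumed stand-in decoder: pick the barcode closest to the read."""
    return min(barcodes, key=lambda b: hamming_distance(read, b))


if __name__ == "__main__":
    rng = random.Random(2011)  # fixed seed so repeated runs match
    # Hypothetical 8-nt barcodes, not taken from the paper.
    barcodes = ["ACGTACGT", "TGCATGCA", "CATGGTAC"]
    p = 0.04  # highest sequencing error rate quoted in the Results
    for sent in barcodes:
        read = quaternary_symmetric_channel(sent, p, rng)
        print(sent, read, identify_barcode(read, barcodes),
              longest_homopolymer(sent))
```

Repeating the loop many times and counting how often the identified barcode differs from the one sent gives an empirical misidentification rate for a given p, which is the kind of quantity the Results refer to.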

Meeting type: Conference.

Type of work: Abstract.

Title: Barcodes in GF(4): Multiplexing for upcoming 3G sequencing technology.

Scientific meeting: 7th International Conference of the Brazilian Association for Bioinformatics and Computational Biology (AB3C) and 3rd International Conference of the IberoAmerican Society for Bioinformatics (SoIBio).

Venue: Florianopolis.

Organizing institution: IberoAmerican Society for Bioinformatics - Brazilian Association for Bioinformatics and Computational Biology.

Published: Yes

Place of publication: Florianopolis

Month of meeting: 10

Year: 2011.
