Authors: Ornella, L.; Esteban, L.; Serra, E.; Tapia, E.
Title: A classification approach for the analysis of coding vs non-coding sequences in the Trypanosoma cruzi genome.
We propose a classification approach for gene finding in highly divergent genomes and apply it to the Trypanosoma cruzi genome. Bona fide T. cruzi genomic sequences available in GenBank were retrieved and converted into a training data set containing 450 coding and 540 non-coding sequences. Each sequence was then represented by a fixed number of attributes: the frequencies of its mers of a given length, plus a binary class label (coding or non-coding). The best classification results, a 6.1 percent median error measured over 100 Monte Carlo runs of ten-fold cross-validation, were obtained with Support Vector Machine classifiers using a radial basis function (RBF) kernel, at a mer length equal to the codon length (three nucleotides). We note that the current standard approach to gene finding, e.g., the one implemented by the Glimmer tool, is not directly applicable to the T. cruzi genome.
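The attribute construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the use of overlapping windows, the handling of ambiguous bases, and the 0/1 encoding of the class label are all assumptions made for illustration. The classifier and cross-validation steps are not shown.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: only unambiguous nucleotides are counted


def kmer_frequencies(sequence, k=3):
    """Relative frequencies of all 4**k mers of length k, in lexicographic order.

    Assumption: mers are counted over overlapping windows; the abstract does
    not say whether windows overlap or follow a reading frame.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def make_example(sequence, is_coding, k=3):
    """One training row: mer frequencies followed by a binary class label.

    Assumption: 1 encodes coding and 0 encodes non-coding.
    """
    return kmer_frequencies(sequence, k) + [1 if is_coding else 0]


# Hypothetical input sequence, used only to show the shape of a row.
row = make_example("ATGGCGTTTAAG", is_coding=True)
print(len(row))  # 65: 64 trinucleotide frequencies plus the class label
```

With k = 3, the mer length reported as giving the best results, every sequence maps to the same 64 frequency attributes regardless of its length, which is what allows a fixed-dimension classifier to be trained on sequences of varying size.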
Journal: Actas de la Academia Nacional de Ciencias.
Publisher: Academia Nacional de Ciencias.
Place of publication: Cordoba.
Reference type: Peer-reviewed.