Authors: Murillo, Javier; Spetale, Flavio; Arce, Débora; Tapia, Elizabeth; Guillaume, Serge; Bulacio, Pilar.
Title: Subset selection of protein attributes by Fuzzy Integrals.
Abstract: Electronic protein annotation is traditionally performed by similarity searches against databases of protein sequences and functional domains. Though robust, these methods usually leave a large portion of sequences unannotated, a gap that supervised machine learning methods with corresponding training datasets may fill. Creating training datasets, however, requires characterizing protein sequences by a suitable set of physicochemical attributes. In principle, an almost unlimited number of attributes can be used, but this may lead to machine learning models of unwieldy complexity. To tackle this problem, attribute selection methods must be considered. Standard attribute selection methods provide rankings based on individual scores, ignoring relationships among attributes. We note, however, that the support of an individual attribute may be small while its collective contribution is high. Newer proposals for attribute selection aim at selecting subsets of attributes based on their collective behavior. Along this line, collective attribute selection based on the Choquet Fuzzy Integral (FI) is proposed. Briefly, the FI is a trained aggregation method defined with respect to a fuzzy measure. The fuzzy measure is a set function that characterizes the contributions of coalitions (n attributes entail 2^n coalitions and coefficients). Coalitions can then be aggregated by the FI for protein-sequence classification (annotation). In this work, fuzzy measure identification is performed with the HLMSr algorithm, considering coalitions of up to 3 attributes. To interpret the fuzzy measures, the generalized Shapley index (gindex) is used. As a result, relevant, redundant and complementary physicochemical protein attributes can be recognized and used for robust protein-sequence annotation. The proposal is evaluated with attributes from Arabidopsis data, and the biological interpretation of the results is discussed.
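As an illustration of the aggregation described in the abstract, the following minimal sketch computes a discrete Choquet integral with respect to a fuzzy measure and the ordinary Shapley value of each attribute. It is not the authors' implementation: the attribute names, the measure values in MU and the scores are invented for illustration, the measure is written by hand rather than identified with HLMSr, and the Shapley value shown covers single attributes only, not the generalized index for coalitions.

```python
from itertools import combinations
from math import factorial

# Hypothetical attributes and a hand-written monotone fuzzy measure
# (2^3 = 8 coefficients). All values are assumptions for illustration only.
ATTRS = ["a", "b", "c"]
MU = {
    frozenset(): 0.0,
    frozenset("a"): 0.4,
    frozenset("b"): 0.4,
    frozenset("c"): 0.2,
    frozenset("ab"): 0.5,   # less than 0.4 + 0.4: a and b look redundant
    frozenset("ac"): 0.9,   # more than 0.4 + 0.2: a and c look complementary
    frozenset("bc"): 0.6,
    frozenset("abc"): 1.0,
}

def choquet(x, mu):
    """Discrete Choquet integral of scores x (attr -> value) w.r.t. measure mu."""
    order = sorted(x, key=x.get)  # attributes sorted by ascending score
    total, prev = 0.0, 0.0
    for i, attr in enumerate(order):
        coalition = frozenset(order[i:])  # attributes scoring at least x[attr]
        total += (x[attr] - prev) * mu[coalition]
        prev = x[attr]
    return total

def shapley(attr, attrs, mu):
    """Shapley value of one attribute: its average marginal contribution."""
    n = len(attrs)
    others = [a for a in attrs if a != attr]
    value = 0.0
    for k in range(n):
        weight = factorial(n - k - 1) * factorial(k) / factorial(n)
        for subset in combinations(others, k):
            s = frozenset(subset)
            value += weight * (mu[s | {attr}] - mu[s])
    return value

scores = {"a": 0.6, "b": 0.3, "c": 0.9}  # hypothetical normalized attribute scores
aggregated = choquet(scores, MU)
importance = {a: shapley(a, ATTRS, MU) for a in ATTRS}
print(aggregated, importance)
```

Because the measure is normalized (the full coalition has weight 1), the Shapley values sum to 1 and can be read as each attribute's share of the overall importance.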
Meeting type: Conference.
Type of work: Abstract.
Production: Subset selection of protein attributes by Fuzzy Integrals.
Scientific meeting: 4to Congreso Bioinformática 4CA2B2C / 4SoIBio - Rosario 2013.
Meeting place: Rosario.
Organizing Institution: CIFASIS.
Published: Yes
Publication place: Rosario
Meeting month: 10