A New Approach to Motif Templates Analysis via Compilation Technique

Abstract

Motif template assertion and analysis is a required operation in most bioinformatics systems, such as motif search tools, sequential pattern miners, and bioinformatics database analysis. A motif template can be of any length, so the number of typing errors grows with the length of the motif. Moreover, when structured motifs are submitted to bioinformatics systems, their components must be specified, i.e., the simple motifs, the gaps, and the limits of the gaps. This research proposes a context-free grammar (CFG) to describe the motif structure, and then uses this CFG to design an interpreter that detects and debugs errors and decomposes the motif template into its components. All errors in 100 motifs, ranging in length from 100 bases to 10 kilobases, were detected. These motifs were entered by 10 data-entry operators. The experiments showed a high correlation between the number of errors and the number of gaps, the size of the simple motifs, and the motif template size. The target code of the interpreter is the set of components of a submitted motif template, to be used by bioinformatics systems in subsequent steps.
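
To make the idea of grammar-driven template analysis concrete, the following is a minimal sketch of a recursive-descent parser in Python. It is not the grammar or interpreter proposed in this work. It assumes a hypothetical template notation in which simple motifs are runs of the bases A, C, G, T and each gap is written as [low,high], for example ACGT[3,5]TTGA. The names parse_template, Simple, Gap, and MotifSyntaxError are illustrative only.

```python
from dataclasses import dataclass

# Assumed toy grammar (not the CFG of the paper):
#   template := simple (gap simple)*
#   simple   := ('A' | 'C' | 'G' | 'T')+
#   gap      := '[' integer ',' integer ']'


@dataclass
class Simple:
    seq: str          # a run of bases, e.g. "ACGT"


@dataclass
class Gap:
    low: int          # lower limit of the gap
    high: int         # upper limit of the gap


class MotifSyntaxError(ValueError):
    """Raised when the template does not match the assumed grammar."""


def parse_template(text):
    """Split a template into its components or report where it is malformed."""
    pos = 0
    n = len(text)

    def fail(msg):
        raise MotifSyntaxError(f"position {pos}: {msg}")

    def simple():
        nonlocal pos
        start = pos
        while pos < n and text[pos] in "ACGT":
            pos += 1
        if pos == start:
            fail("expected a base (A, C, G or T)")
        return Simple(text[start:pos])

    def integer():
        nonlocal pos
        start = pos
        while pos < n and text[pos] in "0123456789":
            pos += 1
        if pos == start:
            fail("expected a number")
        return int(text[start:pos])

    def expect(ch):
        nonlocal pos
        if pos >= n or text[pos] != ch:
            fail(f"expected '{ch}'")
        pos += 1

    def gap():
        expect("[")
        low = integer()
        expect(",")
        high = integer()
        expect("]")
        if low > high:
            fail("gap lower limit exceeds upper limit")
        return Gap(low, high)

    # A template starts with a simple motif; every gap must be
    # followed by another simple motif.
    parts = [simple()]
    while pos < n:
        parts.append(gap())
        parts.append(simple())
    return parts
```

Under these assumptions, parse_template("ACGT[3,5]TTGA") yields the component list Simple("ACGT"), Gap(3, 5), Simple("TTGA"), while a mistyped template such as "ACGT[3,5" raises MotifSyntaxError with the position at which parsing failed. A parser of this kind both reports typing errors and produces the component list that a downstream system would consume.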