Research and Implementation of DNA Molecular Sequence Pattern Matching Algorithm Based on Bioinformatics

1 January 2023

journal article
Published by Hans Publishers in Computer Science and Application

Vol. 13 (02), 236-250
https://doi.org/10.12677/csa.2023.132024

Abstract

Bioinformatics integrates biological science with mathematics, information science and computer technology to organize, sort and summarize biological and medical information. DNA sequence alignment is one of its most important and fundamental research directions and an important means of exploring the relationship between genes and diseases. The main objective of this paper is to find, in uncertain molecular sequence data, all subsequences that are identical to the target sequence and whose occurrence probability exceeds a given threshold, and to report the total number of such occurrences and the starting site of each. To this end, a DNA sequence pattern matching algorithm based on weighted suffix trees is proposed. It addresses two limitations of existing methods: molecular sequence pattern matching algorithms that trade space for time are limited to counting occurrences, and the image stereo matching method based on the pairwise DNA sequence alignment algorithm is limited when the source data are uncertain. The method uses the weighted suffix tree as its main data structure, which improves matching accuracy on uncertain source data and overcomes the restriction of map data structures to count calculation. Experimental results show that the proposed algorithm improves matching speed and sensitivity to a certain extent.
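
To make the matching problem stated in the abstract concrete, the following is a minimal sketch of a naive baseline scan. It is not the weighted suffix tree algorithm proposed in the paper. It assumes, for illustration only, that an uncertain sequence is stored as one dictionary per site mapping each base to its probability, that sites are independent so the occurrence probability of a pattern is the product of the per-site probabilities, and that starting sites are reported 0-based. The name `find_probable_occurrences` and the data layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: one dict per site, mapping base -> probability.
WeightedSequence = List[Dict[str, float]]


def find_probable_occurrences(
    seq: WeightedSequence, pattern: str, threshold: float
) -> Tuple[int, List[int]]:
    """Return (count, 0-based starting sites) of pattern occurrences
    whose occurrence probability is strictly greater than threshold.

    Naive O(n * m) scan, assuming independent sites.
    """
    if not pattern:
        return 0, []
    starts: List[int] = []
    for start in range(len(seq) - len(pattern) + 1):
        prob = 1.0
        for offset, base in enumerate(pattern):
            # A base absent from a site has probability 0 there.
            prob *= seq[start + offset].get(base, 0.0)
            if prob <= threshold:
                break  # the product can only shrink, so stop early
        else:
            starts.append(start)  # loop finished with prob > threshold
    return len(starts), starts


# Illustrative data only, not taken from the paper.
example = [
    {"A": 1.0},
    {"C": 0.5, "G": 0.5},
    {"A": 0.9, "T": 0.1},
    {"C": 1.0},
    {"A": 1.0},
    {"C": 0.8, "T": 0.2},
]
count, sites = find_probable_occurrences(example, "AC", 0.4)
print(count, sites)
```

With a threshold of 0.4, the occurrence at site 0 (probability 0.5) is reported; raising the threshold to 0.6 excludes it. A weighted suffix tree, as described in the abstract, would serve the same kind of query from a prebuilt index rather than by rescanning the sequence.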
