BioInfFast
Development of novel algorithms and software library for biomedical engineering application
About
The aim of this project is to develop a new algorithms for DNA alignment, search and clustering, new recommendation algorithm for collaborative health care and use of different meta-heuristic algorithms for biomedical engineering applications. We plan to create a software library that will consists of existing and new designed algorithms.
Description
Development of a novel algorithm for DNA alignment, search and clustering.
In recent years, several successful applications of data mining techniques for solving biological problems have been reported. Some examples of analyzing large data sets and inferring structure are: prediction of protein structure, gene classification, clustering of gene expression data, statistical modeling of protein-protein interaction, etc. Some common activities in bioinformatics that benefit from using information techniques are mapping and analyzing DNA and protein sequences, aligning DNA and protein sequences and creating 3-D models of protein structures.
Currently in the field of bioinformatics there is an urgent need of algorithms for DNA alignment, search and clustering, as well as optimization of the existing ones. In this activity, firstly we will try to develop and implement novel algorithms for DNA local pairwise alignment. Namely, the time inefficiency has been the major disadvantage of current known local pairwise alignment techniques. Despite the time complexity, space complexity is often found as a limiting factor when aligning large nucleotide sequences. In order to reduce space complexity of an alignment, methodology presented in (D. STOJANOV et al. [6]) represents each region of consecutive matching nucleotides with a triple, identifying region’s length and starting positions at the sequences. Based on this representation, measurements performed in [6] clearly show that linear space is required, while the time complexity is O(nm2). Most of the time in [6] is wasted on examining all combinations of un-gapped local alignments within overlapping sections and the number of alignments being examined. When aligning similar nucleotide sequences, large regions of consecutive matching nucleotides are part of the optimal un-gapped local alignment. Also, the probability of finding an optimal un-gapped local alignment within m–nucleotides long overlapping sections is higher than the probability of finding it in overlapping sections with less than m nucleotides, where m is the length of the smaller nucleotide sequence, subject of an alignment.
Thus, the aim of this activity is development of fast local alignment generating methodology, requiring linear time and space – O(m), when aligning approximately same size, similar nucleotide sequences.
We are going to work with data mining algorithms, that can be applied to gene finding, protein function domain detection, function motif detection, protein function inference, disease diagnosis, disease prognosis, disease treatment optimization, protein and gene interaction network reconstruction, etc.
Development of a novel recommendation algorithm for collaborative health care
The recent trend in healthcare support systems is the development of patient-centric pervasive environments in addition to the hospital-centric one. Pervasive health care takes steps to design, develop, and evaluate computer technologies that help citizens participate more closely in their own healthcare, on one hand, and on the other to provide flexibility in patients’ active everyday life with work, family and friends. In the framework of this working package mathematical model of a novel algorithm that generates recommendations and suggestions for preventive intervention instead of emergency care and hospital admissions is planned to be developed. The main purpose of this algorithm is to find the dependence between users’ health condition and performed physical activities. In this way, the proposed algorithm should generate recommendations that will help the user to adapt his physical activities in order to improve his own health. These recommendations should also encourage users to lead an active life filled with physical activities and thus become direct participants in maintaining their own health care.
The main purpose of this algorithm is to find the dependency of the users’ health condition and physical activities they perform. The algorithm will incorporate collaboration and classification techniques in order to generate recommendations and suggestions for preventive intervention. To achieve this, we will consider datasets from the health history of users and use classification algorithms on these datasets for grouping the users based on their similarity. Use of classified data when generating the recommendation provides more relevant recommendations, because they are enacted on knowledge for users with similar medical conditions and reference parameters.
There are a number of parameters that might be used to characterize a person. They are all mainly continuous variables and they are measured with (near) continuous resolution. On the other hand, the bio-medical parameters and phenomena are often too complex and too little understood to be modelled analytically. Because of its continuous nature, the fuzzy systems are very close to the medical reality and at the same time, fuzzy sets allow natural description of bio-medical variables using symbolic models and their formalisms, avoiding the analytical modelling.
Therefore, in our algorithm, fuzzy sets and fuzzy discretization will be considered as a suitable approach that can bridge the gap between the discrete way reasoning in the IT systems and the continuity of biomedical parameters. The proposed algorithm will be implemented within the library and validated using generic data. The results of this validation will be presented and conclusions will be derived.
Implementation of meta-heuristic algorithms for biomedical engineering applications
In this action several meta-heuristic algorithms will be implemented with emphasis on their application in solving optimization biomedical engineering problems. The following algorithms will be developed: genetic algorithms, ant-colony, particle swarm optimization and harmony search.
Genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover.
Ant colony optimization algorithm (ACO) is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs. This algorithm is a member of the ant colony algorithms family, in swarm intelligence methods, and it constitutes some metaheuristic optimizations. The initial idea was aiming to search for an optimal path in a graph, based on the behavior of ants seeking a path between their colony and a source of food. The original idea has since diversified to solve a wider class of numerical problems, and as a result, several problems have emerged, drawing on various aspects of the behavior of ants.
Particle swarm optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. PSO optimizes a problem by having a population of candidate solutions, here dubbed particles, and moving these particles around in the search-space according to simple mathematical formulae over the particle's position and velocity. Each particle's movement is influenced by its local best known position and is also guided toward the best known positions in the search-space, which are updated as better positions are found by other particles. This is expected to move the swarm toward the best solutions.
Harmony search (HS) is a phenomenon-mimicking algorithm inspired by the improvisation process of musicians. Nowadays, music-inspired phenomenon-mimicking harmony search algorithm is fast growing with many applications. One of key success factors of the algorithm is the employment of a novel stochastic derivative which can be used even for discrete variables. Instead of traditional calculus-based gradient, the algorithm utilizes musician’s experience as a derivative in searching for an optimal solution. This can be a new paradigm and main reason in the successes of various applications.
Design and implementation of software library for biomedical applications
The last activity of this project foresees implementation of all the previously developed algorithms and their integration into a single software library. In this activity the architecture of the library will be designed first. Two architecture models will be evaluated: three-tier and MVC as possible solutions. The main aim of this activity will be to produce an architecture model that will enable plug-and-play possibilities for various independent modules implementing various algorithms and providing the same mathematical foundation for all of them. Moreover, the library will be designed so, that it will enable various interfaces towards the end-users applications.
Publications
Paper: Towards Computational Improvement of DNA Database Iindexing and Short DNA Query Searching
Authors: Stojanov, Done, Koceski, Saso, Mileva, Aleksandra, Koceska, Natasa, and Martinovska-Bande, Cveta
Journal: Biotechnology & Biotechnological Equipment, ISSN 1310-2818 (Print), 1314-3530 (Online)
Volume and number: 28(5), pp. 958-967
Year: 2014
Paper: Topological Prostate Segmentation Method in MRI
Authors: Stojanov, Done, and Koceski, Saso
Conference: 9th International Symposium Advances in Artificial Intelligence and Applications, FedCSIS 2014
Warsaw, Poland, 7 - 10 September, 2014
Annals of Computer Science and Information Systems,
Volume: 2, pp. 219–225
Paper: A short survey of pair-wise sequence alignment algorithms
Authors: Stojanov, Done, and Mileva, Aleksandra
Conference: CIIT 2014
Bitola, Macedonia, April 11-13, 2014
Paper: Development of a novel recommendation algorithm for collaborative health - care system model
Authors: Kulev, Igor, Vlahu-Gjorgievska, Elena, Trajkovik, Vladimir, and Koceski, Saso
Journal: Computer Science and Information Systems, ISSN 1820-0214
Volume and number: 10(3), pp. 1455-1471
Year: 2013
Paper: Improved alignment of homologous DNA sequences
Authors: Stojanov, Done, and Martinovska, Cveta
Journal: Annals of West University of Timisoara, ser. Biology, ISSN print 1453-7680, on-line: 2285-7044
Volume and number: 16(2), pp. 97-106
Year: 2013
Paper: FLAG: Fast Local Alignment Generating Methodology
Authors: Stojanov, Done, Koceski, Saso, and Mileva, Aleksandra
Journal: Romanian Biotechnological Letters, ISSN 1224-5984
Volume and number: 18(1), pp. 7881-7888
Year: 2013
Paper: A new, space-efficient local pairwise alignment methodology
Authors: Stojanov, Done, Mileva, Aleksandra, and Koceski, Saso
Journal: Advanced Studies in Biology, ISSN 1313-9495
Volume and number: 4(2), pp. 85-93
Year: 2012
Paper: Providing Collaborative Algorithms Support for Personal Health Care
Authors: Vladimir Trajkovik, Elena Vlahu-Gjorgievska, Igor Kulev, Saso Koceski
Journal: American Journal of Bioinformatics, p-ISSN 2167-6992
Volume and number: 1, pp. 41-49
Year: 2012
Paper: Conceptual Clustering and Analysis of Data from Gynecological Database
Author: Martinovska, Cveta
Conference: ICT Innovations 2009
Publisher: Springer-Verlag
Year: 2009
Bibliography
DNA alignment, search and clustering
[1] D. STOJANOV, S. KOCESKI, A. MILEVA, Fast Local Alignment Generating Methodology. Romanian Biotechnological Letters, 18(1),7881, 7888 (2013)
[2] D. STOJANOV, A. MILEVA, S. KOCESKI, A new, space-efficient local pairwise alignment methodology. Advanced Studies in Biology, 4(2),85, 93 (2012)
[3] Bray N., Dubchak I., and Pachter L., AVID: A global alignment program. Genome research, 13(1),97, 102 (2003)
[4] Shen S., Yang J., Yao A., and Hwang P., Super pairwise alignment (SPA): an efficient approach to global alignment for homologous sequences. Journal of Computational Biology, 9(3),477, 486 (2002)
[5] Cedric N., Higgins D., and Heringa J., T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 302(1),205, 218 (2000)
[6] Morgenstern B., DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics, 15(3),211, 218 (1999)
[7] Thompson J., Higgins D., and Gibson T., CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic acids research, 22(22),4673, 4680 (1994)
[8] Chao K., Pearson W., and Miller W., Aligning two sequences within a specified diagonal band. Computer Applications in The Biosciences - BIOINFORMATICS, 8(5),481, 487 (1992)
[9] X. HUANG, W. MILLER, A time-efficient, linear-space local similarity algorithm. Advances in Applied Mathematics, 12,337, 357 (1991)
[11] S. ALTSCHUL, W. GISH, W. MILLER, E. MYERS, D. LIPMAN, Basic local alignment search tool. Journal of Molecular Biology, 215(3),403, 410 (1990)
[12] M. WATERMAN, M. EGGERT, A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons. Journal of Molecular Biology, 197,723, 728 (1987)
[13] D. LIPMAN, W. PEARSON, Rapid and sensitive protein similarity searches. Science, 227(4693),1435, 1441 (1985)
[14] Fickett W., Fast optimal alignment. Nucleic Acids Res, 12(1),175, 179 (1984)
[15] Gotoh O., An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162(3),705, 108 (1982)
[16] T. SMITH, M. WATERMAN, Identification of common molecular subsequences. Journal of Molecular Biology, 147(1), 195, 197 (1981)
Development of a novel recommendation algorithm for collaborative health care
[1] Nachman, L., Baxi, A., Bhattacharya, S., Darera, V.: Jog Falls: A Pervasive Healthcare Platform for Diabetes Management. Pervasive Computing, Vol. 6030, 94–111.( 2010)
[2] Mohan, P., Marin, D., Sultan, S., Deen, A.: MediNet: Personalizing the Self-Care Process for Patients with Diabetes and Cardiovascular Disease Using Mobile Telephony. In Proc. 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vancouver, Canada, 755-758. (2008)
[3] Lee, R.G., Chen, K.C., Hsiao, C.C., Tseng C.L.: A Mobile Care System With Alert Mechanism. Trans. Info. Tech. Biomed, Vol.11, No.5, 507-517. (2007)
Implementation of meta-heuristic algorithms for biomedical engineering applications
[1] Chandra Mohan, B., and R. Baskaran. A survey: Ant Colony Optimization based recent research and implementation on several engineering domain. Expert Systems with Applications. (2011)
[2] Ahmadlou, Mehran, and Hojjat Adeli. Enhanced probabilistic neural network with local decision circles: A robust classifier.Integrated Computer-Aided Engineering 17, no. 3, 197-210. (2010)
[3] Guo, Pengfei, Xuezhi Wang, and Yingshi Han. The enhanced genetic algorithms for the optimization design." In Biomedical Engineering and Informatics (BMEI), 3rd International Conference on, vol. 7, pp. 2990-2994. IEEE. (2010)
[4] Parsopoulos, Konstantinos E., and Michael N. Vrahatis. Particle swarm optimization and intelligence: advances and applications. Hershey: Information Science Reference. (2010)
[5] Sharma, Neeraj, Amit K. Ray, Shiru Sharma, K. K. Shukla, Lalit Aggarwal, and Satyajit Pradhan. Segmentation of medical images using simulated annealing based fuzzy C Means algorithm. International Journal of Biomedical Engineering and Technology 2, no. 3, 260-278. (2009)
[6] Geem, Z.W., Kim, J.H., Loganathan, G.V.: A new heuristic optimization algorithm: harmony search. Simulation 76, 60–68. (2001)
Software Library
- Primary nucleotide sequence databases
- Software for DNA search and alignment
DNA search:
- BLAST
- FASTA
DNA pairwise alignment:
- Matcher: Waterman-Eggert local alignment of two sequences
- Stretcher: Needleman-Wunsch rapid global alignment of two sequences
- Water: Smith-Waterman local alignment of sequences
DNA multiple alignment:
- ClustalW
- T-Coffee
- Muscle
People