Transforming DNA Sequences into Musical Patterns Via A 3-mer Classification

Abel Prayoga(1), Elis Khatizah(2)


(1) IPB University
(2) IPB University
(*) Corresponding Author

Abstract


DNA can be viewed as a symbolic sequence with patterns that vary across species. This study explores DNA sequences through two complementary approaches: species classification using simple machine learning methods and transformation of DNA into musical note representations. In the first task, DNA sequences from five organisms with different evolutionary distances are represented using 3-mer and 6-mer features. These k-mers form a vocabulary whose frequency counts are converted into feature vectors. Random Forest (RF) and Support Vector Machine (SVM) models are then applied for five-class classification. Using an 80:20 train-test split and 10-fold cross-validation, the SVM model achieved average accuracies above 0.90 for 3-mer features, with low standard deviation, indicating stable performance. In the second approach, 3-mer motifs are mapped to musical notes to generate species-based musical patterns. The resulting musical representations exhibit distinct structural differences across species, reflecting variations in underlying sequence composition. Overall, the results demonstrate that 3-mer features are effective for species discrimination and that musical transformation provides an alternative and intuitive way to visualize DNA sequence patterns.
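The two steps described in the abstract, turning a sequence into a 3-mer count vector and mapping 3-mers to notes, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say whether k-mers are taken with overlap, how ambiguous bases are handled, or which notes are used. The overlapping window, the skipping of non-ACGT symbols, the C major pentatonic scale, and the modulo-based assignment of 3-mers to notes are all hypothetical choices, as are the function names.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def vocabulary(k):
    """All 4**k possible k-mers over ACGT, in lexicographic order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmers(seq, k=3):
    """Yield overlapping k-mers (assumption: stride 1).

    Windows containing a symbol outside ACGT (for example N) are skipped.
    """
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set(ALPHABET):
            yield window


def kmer_count_vector(seq, k=3):
    """Frequency-count feature vector, one entry per vocabulary k-mer."""
    counts = Counter(kmers(seq, k))
    return [counts[m] for m in vocabulary(k)]


# Hypothetical note set: one octave of the C major pentatonic scale.
PENTATONIC = ["C4", "D4", "E4", "G4", "A4"]

# Hypothetical mapping: the position of each 3-mer in the vocabulary,
# taken modulo the number of notes, selects its note.
NOTE_OF = {
    m: PENTATONIC[i % len(PENTATONIC)]
    for i, m in enumerate(vocabulary(3))
}


def sequence_to_notes(seq):
    """Translate a DNA sequence into a list of note names, one per 3-mer."""
    return [NOTE_OF[m] for m in kmers(seq, 3)]
```

With these choices, `kmer_count_vector("ACGTAC")` is a 64-entry vector whose entries sum to 4, and `sequence_to_notes("ACGTAC")` returns `['D4', 'E4', 'A4', 'A4']`. Count vectors of this kind are what a Random Forest or SVM classifier would receive as input; the classifier itself is left out of the sketch.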

Keywords: DNA Classification, DNA-to-Music, Random Forest, SVM.







DOI: http://dx.doi.org/10.36722/sst.v11i1.5258



LP2M (Lembaga Penelitian dan Pengembangan Masyarakat)

Universitas AL-AZHAR INDONESIA, Lt.2 Ruang 207

Kompleks Masjid Agung Al Azhar

Jl. Sisingamangaraja, Kebayoran Baru

Jakarta Selatan 12110


 This work is licensed under CC BY 4.0