Audio-to-score singing transcription based on a CRNN-HSMM hybrid model

Open Access

20 April 2021

journal article
research article
Published by Now Publishers in APSIPA Transactions on Signal and Information Processing

Vol. 10 (1)
https://doi.org/10.1017/atsip.2021.4

Abstract

This paper describes an automatic singing transcription (AST) method that estimates a human-readable musical score of a sung melody from an input music signal. Because of the considerable pitch and temporal variation of a singing voice, a naive cascading approach that estimates an F0 contour and quantizes it with estimated tatum times cannot avoid many pitch and rhythm errors. To solve this problem, we formulate a unified generative model of a music signal that consists of a semi-Markov language model representing the generative process of latent musical notes conditioned on musical keys and an acoustic model based on a convolutional recurrent neural network (CRNN) representing the generative process of an observed music signal from the notes. The resulting CRNN-HSMM hybrid model enables us to estimate the most likely musical notes from a music signal with the Viterbi algorithm, while leveraging both the grammatical knowledge about musical notes and the expressive power of the CRNN. The experimental results showed that the proposed method outperformed the conventional state-of-the-art method and that the integration of the musical language model with the acoustic model had a positive effect on AST performance.
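To make the decoding step concrete, the following is a minimal sketch of Viterbi decoding for a generic hidden semi-Markov model, in which each latent state (for example, a pitch) emits a whole segment of frames with an explicit duration. It is not the paper's model: the function name `hsmm_viterbi`, the array names, and the assumption that the acoustic model supplies per-frame log-likelihoods are all hypothetical, and the key-conditioned note and onset structure described in the abstract is omitted.

```python
import numpy as np

def hsmm_viterbi(frame_loglik, log_init, log_trans, log_dur):
    """Return the most likely list of (state, start, end) segments.

    All argument names are illustrative assumptions, not the paper's notation.
    frame_loglik: (T, K) per-frame log-likelihood of each state
    log_init:     (K,)   log initial-state probabilities
    log_trans:    (K, K) log_trans[j, k] = log p(next state k | state j)
    log_dur:      (K, D) log_dur[k, d-1] = log p(duration d | state k)
    """
    T, K = frame_loglik.shape
    D = log_dur.shape[1]
    # cum[t, k] = sum of frame_loglik[:t, k]; a segment score is a difference.
    cum = np.vstack([np.zeros((1, K)), np.cumsum(frame_loglik, axis=0)])
    # delta[t, k]: best score of frames [0, t) whose last segment has state k.
    delta = np.full((T + 1, K), -np.inf)
    back_dur = np.zeros((T + 1, K), dtype=int)
    back_state = np.full((T + 1, K), -1, dtype=int)

    for t in range(1, T + 1):
        for d in range(1, min(D, t) + 1):
            s = t - d  # start frame of the candidate last segment
            seg = cum[t] - cum[s] + log_dur[:, d - 1]
            if s == 0:
                score = log_init + seg
                prev = np.full(K, -1)
            else:
                cand = delta[s][:, None] + log_trans  # (previous, current)
                prev = np.argmax(cand, axis=0)
                score = cand[prev, np.arange(K)] + seg
            better = score > delta[t]
            delta[t][better] = score[better]
            back_dur[t][better] = d
            back_state[t][better] = prev[better]

    # Backtrace from the best final state (assumes a finite-score path exists).
    segments = []
    t, k = T, int(np.argmax(delta[T]))
    while t > 0:
        d = int(back_dur[t, k])
        segments.append((int(k), t - d, t))
        t, k = t - d, int(back_state[t, k])
    return segments[::-1]
```

Because segment scores are computed from cumulative sums, the cost is on the order of T x D x K^2 operations. In a hybrid setup like the one the abstract describes, `frame_loglik` would be derived from the neural acoustic model and the other three arrays from the language model, but how the paper does this is not specified here.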
