
A Discriminative Model for Polyphonic Piano Transcription

Polyphonic Piano Transcription

Music sequences exhibit temporal structure, and a transcription system needs a model that captures it. A recurrent architecture, which models the temporal correlations between notes by looking back at previous frames, is an effective solution. Such a model is called a music language model (MLM), and it is an increasingly popular component of automatic music transcription systems. However, several questions remain open about MLMs and how effective they are in music transcription.

Despite their success in onset detection, MLMs are weak at predicting note sequences. The reason lies in the large output space of music. A piano has 88 keys, so any frame may contain any combination of up to 88 pitches, and each pitch has a different time-frequency representation. This makes it difficult to use the same input representation for all frames in a music sequence. In addition, the acoustic models cannot detect multiple simultaneous occurrences of the same pitch. A discriminative model that is trained to distinguish different pitches could therefore significantly improve note-level polyphonic piano transcription.
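To make the size of that output space concrete, here is a minimal sketch of how one frame can be encoded as an 88-dimensional binary vector. The helper name `frame_to_vector` and the mapping from MIDI note numbers (21 for A0 through 108 for C8) are illustrative assumptions, not details taken from the model described here.

```python
# Assumption: pitches are given as MIDI note numbers, with the 88 piano
# keys spanning A0 (21) through C8 (108).
LOWEST_MIDI = 21
NUM_KEYS = 88


def frame_to_vector(active_midi_notes):
    """Encode the notes sounding in one frame as an 88-dim binary vector."""
    vector = [0] * NUM_KEYS
    for note in active_midi_notes:
        index = note - LOWEST_MIDI
        if not 0 <= index < NUM_KEYS:
            raise ValueError(f"MIDI note {note} is outside the piano range")
        vector[index] = 1
    return vector


# A C major triad (C4, E4, G4) sounding in a single frame.
c_major = frame_to_vector([60, 64, 67])

# Every key is either on or off, so one frame has 2**88 possible states.
num_configurations = 2 ** NUM_KEYS
print(sum(c_major), "active pitches;", num_configurations, "possible frames")
```

With 2**88 possible frames, a model cannot treat each combination as its own class, which is why per-pitch binary outputs are the usual representation.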


In this article, we propose an integrated architecture that combines an MLM and an acoustic model. The MLM predicts a sequence of vectors, one per frame, giving the probability that each pitch is active, while the acoustic model is trained to detect pitches from a specific set of input features. The outputs of the acoustic model are used as inputs for the MLM, and the log-likelihood of the MLM is maximized by repeating the same binary vectors as inputs.


The proposed framework is applied to a piano dataset with the goal of improving the performance of note-level transcription. The MLMs are evaluated on the MAPS database, which includes audio recordings of a real piano and an aligned MIDI file for each recording. The results are compared to the GBS, RNN and LSTM baselines.
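The comparison against the baselines is reported in terms of precision, recall and F-measure. The sketch below shows how these can be computed from reference and estimated notes. It assumes each note is an `(onset_frame, midi_pitch)` pair and counts only exact matches; practical note-level evaluation usually allows a small onset tolerance, which is omitted here for brevity. The function name and the sample notes are hypothetical.

```python
def precision_recall_f(reference, estimated):
    """Return (precision, recall, F-measure) for two collections of notes.

    Assumption: a note is an (onset_frame, midi_pitch) tuple and an
    estimated note is correct only if it matches a reference note exactly.
    """
    reference, estimated = set(reference), set(estimated)
    true_positives = len(reference & estimated)
    precision = true_positives / len(estimated) if estimated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative data only: four reference notes, three estimated notes.
reference_notes = [(0, 60), (0, 64), (4, 67), (8, 72)]
estimated_notes = [(0, 60), (4, 67), (6, 65)]

precision, recall, f_measure = precision_recall_f(reference_notes, estimated_notes)
print(f"P={precision:.3f} R={recall:.3f} F={f_measure:.3f}")
```

Two of the three estimated notes are correct and two of the four reference notes are found, so precision and recall differ, and the F-measure is their harmonic mean.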

In the hands of a skilled transcriber, the piano becomes a canvas for artistic expression, capable of conveying a vast array of emotions, moods, and textures. From the delicate intimacy of a solo piano sonata to the grandeur and power of a symphonic transcription, the piano offers limitless possibilities for interpretation and exploration.

The GBS method improves recall and precision, but it suffers from the drawback that prediction errors accumulate over time. The MLM-based inference algorithm can repair the thresholding transcription results by using the accumulated posterior probability p(y_n | x_n) to determine the sequence of non-blank onsets.
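For context, the thresholding step that the MLM-based inference is said to repair can be sketched as follows. This shows only the plain thresholding baseline, not the MLM-based algorithm itself. The 0.5 threshold, the function names and the toy posteriors are assumptions made for illustration.

```python
def threshold_frames(posteriors, threshold=0.5):
    """Turn per-frame pitch posteriors into a binary piano roll.

    Assumption: posteriors[n][k] approximates p(y_n | x_n) for pitch k.
    """
    return [[1 if p >= threshold else 0 for p in frame] for frame in posteriors]


def find_onsets(piano_roll):
    """List (frame, pitch) pairs where a pitch switches from off to on."""
    onsets = []
    previous = [0] * len(piano_roll[0]) if piano_roll else []
    for n, frame in enumerate(piano_roll):
        for pitch, active in enumerate(frame):
            if active and not previous[pitch]:
                onsets.append((n, pitch))
        previous = frame
    return onsets


# Toy posteriors for 4 frames and 3 pitches (illustrative values only).
posteriors = [
    [0.9, 0.1, 0.2],
    [0.8, 0.6, 0.1],
    [0.3, 0.7, 0.1],
    [0.7, 0.2, 0.1],
]

roll = threshold_frames(posteriors)
onsets = find_onsets(roll)
print(roll)
print(onsets)
```

In this toy input, pitch 0 dips below the threshold for a single frame and is then reported as a new onset. Frame-by-frame thresholding has no temporal context to avoid errors of this kind, which is the gap a sequence model is meant to fill.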

The results show that the proposed method outperforms the GBS and LSTM-based systems in both recall and F-measure. This demonstrates that the MLM-based inference algorithm is a promising method for improving note-level polyphonic piano transcription. The authors would like to thank the MAPS project for providing the piano dataset. This work was supported by the Royal Academy of Engineering Research Fellowship.
