Generating Convincing Harmony Parts with Simple Long Short-Term Memory Networks

Faitas, Andrei and Baumann, Synne Engdahl and Næss, Torgrim Rudland and Torresen, Jim and Martin, Charles Patrick

Proceedings of the International Conference on New Interfaces for Musical Expression

Generating convincing music via deep neural networks is a challenging problem that shows promise for many applications, including interactive musical creation. One part of this challenge is generating convincing accompaniment parts for a given melody, as could be used in an automatic accompaniment system. Despite much progress in this area, systems that can automatically learn to generate interesting-sounding, as well as harmonically plausible, accompanying melodies remain somewhat elusive. In this paper we explore the problem of sequence-to-sequence music generation, where a human user provides a sequence of notes and a neural network model responds with a harmonically suitable sequence of equal length. We consider two sequence-to-sequence models: one featuring a standard unidirectional long short-term memory (LSTM) architecture, and the other featuring a bidirectional LSTM. Both were successfully trained to produce a sequence based on the given input. Both are fairly dated architectures; part of the investigation is to see what can be achieved with such models. The models are evaluated and compared via a qualitative study in which 106 respondents listened to eight random samples from our set of generated music, as well as two human samples. The results show a preference for the sequences generated by the bidirectional model, as well as an indication that these sequences sound more human.