Inspecting and Interacting with Meaningful Music Representations using VAE

Yang, Ruihan and Chen, Tianyao and Zhang, Yiyi and gus xia

Proceedings of the International Conference on New Interfaces for Musical Expression

Variational Autoencoder has already achieved great results on image generation and recently made promising progress on music sequence generation. However, the model is still quite difficult to control in the sense that the learned latent representations lack meaningful music semantics. What users really need is to interact with certain music features, such as rhythm and pitch contour, in the creation process so that they can easily test different composition ideas. In this paper, we propose a disentanglement by augmentation method to inspect the pitch and rhythm interpretations of the latent representations. Based on the interpretable representations, an intuitive graphical user interface demo is designed for users to better direct the music creation process by manipulating the pitch contours and rhythmic complexity.