Interacting with musical avatars has become increasingly popular over the years with the introduction of games like Guitar Hero and Rock Band. These games provide MIDI-equipped controllers that resemble their real-world counterparts (e.g. a MIDI guitar or a MIDI drum kit), which users play to control their designated avatar in the game. The user's performance is measured against a score that must be followed. However, the avatar does not move in response to how the user plays; it follows a predefined movement pattern. If the user plays badly, the game ends with the avatar aborting the performance (e.g. throwing the guitar on the floor). The gaming experience would be enhanced if the avatar moved in accordance with the user's input. This paper presents an architecture that couples musical input with body movement. Using imitation learning, a simulated humanoid robot learns to play the drums the way human drummers do, both visually and aurally. Training data are recorded using MIDI and motion tracking. The system implements imitation learning with an artificial intelligence approach, employing artificial neural networks.
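The coupling described above can be sketched as a supervised mapping from MIDI events to motion-capture poses. The following is a minimal, hypothetical illustration only: the feature layout (pitch, velocity, inter-onset interval), the number of joint angles, and the synthetic data are assumptions for demonstration, not details taken from this paper.

```python
import numpy as np

# Hypothetical sketch: learn a mapping from MIDI drum events
# (pitch, velocity, inter-onset interval) to arm joint angles
# captured by motion tracking. All dimensions are illustrative.
rng = np.random.default_rng(0)

# Synthetic stand-in for recorded training data:
# 200 MIDI events (3 features) paired with 4 joint angles.
X = rng.uniform(0.0, 1.0, size=(200, 3))
true_W = rng.normal(size=(3, 4))
Y = np.tanh(X @ true_W)          # pretend motion-capture targets

# One-hidden-layer neural network trained with gradient descent.
W1 = rng.normal(scale=0.1, size=(3, 16))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 4))
b2 = np.zeros(4)

def forward(X):
    H = np.tanh(X @ W1 + b1)     # hidden-layer activations
    return H, H @ W2 + b2        # predicted joint angles

lr = 0.05
losses = []
for step in range(500):
    H, pred = forward(X)
    err = pred - Y               # mean-squared-error residual
    losses.append(float(np.mean(err ** 2)))
    # Backpropagate through both layers.
    gW2 = H.T @ err / len(X)
    gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1 - H ** 2)
    gW1 = X.T @ dH / len(X)
    gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In the real system, the network would be trained on paired MIDI and motion-tracking recordings of human drummers, so the avatar's movement responds to how the user actually plays rather than following a fixed animation.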