Mobile devices are a promising platform for musical performance thanks to the variety of sensors readily available on board. In particular, mobile cameras can provide rich input because they capture a wide range of user gestures and environmental dynamics. However, raw camera input provides only continuous parameters and requires computationally expensive processing. In this paper, we propose combining motion/gesture input with touch input in order to filter movement information both temporally and spatially, thereby increasing expressiveness while reducing computation time. We present a design space that demonstrates the diversity of interactions our technique enables. We also report the results of a user study in which we observe how musicians appropriate the interaction space with an example instrument.