We discuss how to model "gestures" in music performance with statistical latent-state models. A music performance can be described by compositional and expressive properties that vary over time. In the changes of those properties we often observe particular patterns, and such a pattern can be understood as a "gesture", since it serves as a medium for conveying specific emotions. Assuming a finite number of latent states underlying the changes in each property's value, we can describe those gestures with statistical latent-state models and train them with unsupervised learning algorithms. In addition, model entropy provides a measure of how each property affects gesture implementation. Test results on a number of real performances indicate that the trained models could capture the structure of the gestures observed in the given performances and detect their boundaries. The entropy-based measure was informative for understanding the effect of each property on gesture implementation. Test results on large corpora indicate that our model has potential for further improvement.