This paper describes a machine learning approach in the context of non-idiomatic human-machine improvisation. In an attempt to avoid explicit mapping of user actions to machine responses, an experimental machine learning strategy is suggested where rewards are derived from the implied motivation of the human interactor – two motivations are at work: integration (aiming to connect with machine generated material) and expression (independent activity). By tracking consecutive changes in musical distance (i.e. melodic similarity) between human and machine, such motivations can be inferred. A variation of Q-learning is used featuring a self-optimizing variable length state-action-reward list. The system (called Pock) is tunable into particular behavioral niches by means of a limited number of parameters. Pock is designed as a recursive structure and behaves as a complex dynamical system. When tracking systems variables over time, emergent non-trivial patterns reveal experimental evidence of attractors demonstrating successful adaptation.