An entertainment environment to enrich the music listening experience is proposed. The environment is composed of three modules: a MIDI player, a music animation, and a haptic module that translates the notes played by one instrument into a similar vibration. To create the haptic vibration, each note's relative pitch within the song is calculated, and this relative position is then mapped to the amplitude and frequency of the haptic signal. In addition, the envelope of the haptic signal is shaped with an ADSR (attack, decay, sustain, release) filter so that it matches the envelope of the audio signal. To evaluate the cross-modal similarity perceived by users, two experiments were performed. In both, users worked with the complete entertainment environment and ranked the similarity of three haptic signals, with triangular, square and analogue envelopes, for four different instruments in a classical song. The first experiment used the proposed amplitude and frequency mapping, while the second used constant frequency and amplitude. The results show that envelope preferences differed between the two experiments: the square and triangular envelopes were preferred in the first, while only the analogue envelopes were preferred in the second. This suggests that the users' perception of the envelope was masked by the changes in amplitude and frequency.
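The note-to-vibration mapping described above can be illustrated with a minimal sketch. It is not the implementation used in this work. The linear mapping, the amplitude and frequency ranges, the piecewise-linear ADSR shape, the sample rate, and all function names are assumptions made for illustration only. Notes are given as MIDI note numbers.

```python
import math

def relative_pitch(note, lowest, highest):
    """Position of a MIDI note within the song's pitch range, in [0, 1]."""
    if highest == lowest:
        return 0.5  # assumption: a single-pitch part maps to the midpoint
    return (note - lowest) / (highest - lowest)

def map_to_haptic(rel, amp_range=(0.2, 1.0), freq_range=(50.0, 300.0)):
    """Linearly map a relative pitch to (amplitude, frequency in Hz).

    Both ranges are hypothetical placeholders, not values from the study.
    """
    amp = amp_range[0] + rel * (amp_range[1] - amp_range[0])
    freq = freq_range[0] + rel * (freq_range[1] - freq_range[0])
    return amp, freq

def adsr_envelope(n, attack=0.1, decay=0.1, sustain=0.7, release=0.2):
    """Piecewise-linear ADSR envelope with n samples.

    attack, decay and release are fractions of the note length and are
    expected to sum to at most 1; sustain is the sustain level in [0, 1].
    """
    a, d, r = int(n * attack), int(n * decay), int(n * release)
    s = max(n - a - d - r, 0)
    env = [i / a for i in range(a)]                            # rise 0 -> 1
    env += [1.0 - (1.0 - sustain) * i / d for i in range(d)]   # fall 1 -> sustain
    env += [sustain] * s                                       # hold
    env += [sustain * (1.0 - i / r) for i in range(r)]         # fall sustain -> 0
    return env

def haptic_note(note, lowest, highest, duration=0.5, sample_rate=8000):
    """Sine vibration for one note, scaled by the mapped amplitude and ADSR."""
    amp, freq = map_to_haptic(relative_pitch(note, lowest, highest))
    env = adsr_envelope(int(duration * sample_rate))
    return [amp * e * math.sin(2.0 * math.pi * freq * i / sample_rate)
            for i, e in enumerate(env)]

# Example: middle C (MIDI 60) in a part spanning MIDI notes 48 to 72.
signal = haptic_note(60, lowest=48, highest=72)
```

In this sketch, the constant-frequency, constant-amplitude condition of the second experiment would correspond to skipping `map_to_haptic` and using fixed values for `amp` and `freq`, so that only the envelope varies.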