Harmonic pitch class profiles


Harmonic pitch class profiles (HPCP) is a vector of features extracted from an audio signal, based on the Pitch Class Profile descriptor proposed by Fujishima in the context of a chord recognition system.[1] HPCP is an enhanced pitch distribution feature: a sequence of feature vectors describing tonality, each measuring the relative intensity of the 12 pitch classes of the equal-tempered scale within an analysis frame. It is also called chroma.

By processing a musical signal, the HPCP feature can be computed and used to estimate the key of a piece,[2] to measure the similarity between two musical pieces (cover version identification),[3] and to classify music by composer, genre, or mood. The process is related to time-frequency analysis. In general, chroma features are robust to noise (e.g., ambient noise or percussive sounds), independent of timbre and instrumentation, and independent of loudness and dynamics.

HPCPs are tuning independent and take into account the presence of harmonic frequencies, so the reference frequency can differ from the standard A 440 Hz. The result of the HPCP computation is a 12-, 24-, or 36-bin octave-independent histogram, depending on the desired resolution, representing the relative intensity of each semitone, half-semitone, or third of a semitone of the equal-tempered scale.
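The octave-independent mapping from a frequency to one of the 12, 24, or 36 bins can be sketched as follows. This is a minimal illustration of the idea, not a reference implementation; the function name and defaults are assumptions.

```python
import math

def pitch_class_bin(freq_hz, ref_hz=440.0, bins_per_octave=36):
    """Map a frequency to an octave-independent pitch class bin.

    ref_hz is the (possibly estimated) reference tuning frequency;
    bins_per_octave of 12, 24, or 36 gives semitone, quarter-tone,
    or third-of-a-semitone resolution.
    """
    # Distance from the reference in octaves, scaled to bins and
    # folded into a single octave (hence "octave-independent").
    octaves = math.log2(freq_hz / ref_hz)
    return int(round(octaves * bins_per_octave)) % bins_per_octave
```

With the default 36-bin resolution, A 440 Hz maps to bin 0, and the octave above (880 Hz) folds back onto the same bin.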

General HPCP feature extraction procedure

Fig.1 General HPCP feature extraction block diagram

The block diagram of the procedure is shown in Fig.1[4]; the procedure is described in further detail in [5].

The General HPCP feature extraction procedure is summarized as follows:

  1. Take the input musical signal.
  2. Perform spectral analysis to obtain the frequency components of the signal.
  3. Use the Fourier transform to convert the signal into a spectrogram. (The Fourier transform is a form of time-frequency analysis.)
  4. Apply frequency filtering: only the frequency band between 100 and 5000 Hz is used.
  5. Perform peak detection: only the local maxima of the spectrum are considered.
  6. Run the reference frequency computation procedure: estimate the tuning deviation with respect to 440 Hz.
  7. Perform pitch class mapping with respect to the estimated reference frequency. This procedure determines the pitch class value from the frequency values, using a weighting scheme based on a cosine function. It accounts for the presence of harmonic frequencies (harmonic summation procedure), considering a total of 8 harmonics for each frequency. To map values at a resolution of one-third of a semitone, the pitch class distribution vector must have a size of 36.
  8. Normalize each feature frame by dividing by its maximum value, eliminating the dependency on global loudness. The result is an HPCP sequence like the one in Fig.2.
Fig.2 Example of a high-resolution HPCP sequence
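Steps 4 through 8 above can be sketched for a single analysis frame as follows. This is an illustrative sketch only: the parameter names, the cosine-squared window width, the harmonic decay factor, and the low-frequency cutoff for candidate fundamentals are assumptions for demonstration, not values from the cited papers.

```python
import numpy as np

def hpcp_frame(spectrum, sample_rate, n_fft, size=36, ref_hz=440.0,
               f_min=100.0, f_max=5000.0, n_harmonics=8, win=1.5):
    """Compute one HPCP frame from a magnitude spectrum (steps 4-8).

    `spectrum` is the magnitude spectrum of one windowed frame obtained
    from the Fourier transform (steps 2-3).
    """
    freqs = np.arange(len(spectrum)) * sample_rate / n_fft
    hpcp = np.zeros(size)
    for i in range(1, len(spectrum) - 1):
        f, a = freqs[i], spectrum[i]
        # Step 4: frequency filtering to the 100-5000 Hz band.
        if not (f_min <= f <= f_max):
            continue
        # Step 5: peak detection -- keep local spectral maxima only.
        if a <= spectrum[i - 1] or a <= spectrum[i + 1]:
            continue
        # Step 7: harmonic summation -- treat the peak as harmonic h of
        # a candidate fundamental f/h, with a decaying weight per
        # harmonic (0.8 is an assumed decay factor).
        for h in range(1, n_harmonics + 1):
            f0 = f / h
            if f0 < 27.5:  # assumed lower limit of the pitch range
                break
            pos = (size * np.log2(f0 / ref_hz)) % size
            for b in range(size):
                # Circular (octave-folded) distance from peak to bin b.
                d = min(abs(pos - b), size - abs(pos - b))
                if d <= win / 2:
                    # Cosine-squared weighting around the bin centre.
                    w = np.cos(np.pi * d / win) ** 2
                    hpcp[b] += w * (a ** 2) * (0.8 ** (h - 1))
    # Step 8: normalise by the frame maximum to remove loudness.
    if hpcp.max() > 0:
        hpcp /= hpcp.max()
    return hpcp
```

Feeding a spectrum whose only peak sits near 440 Hz produces a vector whose maximum lies in bin 0, with the harmonic summation also crediting the bins of the lower candidate fundamentals.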

System of measuring similarity between two songs

Fig.3 System of measuring similarity between two songs

Once the HPCP features have been extracted, the pitch content of the signal in each time section is known. The HPCP feature has been used to compute the similarity between two songs in much research. A system for measuring the similarity between two songs is shown in Fig.3. First, time-frequency analysis is applied to extract the HPCP features of both songs. The two HPCP sequences are then transposed to a global HPCP, providing a common reference for comparison. The next step is to use the two features to construct a binary similarity matrix. The Smith–Waterman algorithm is then used to construct a local alignment matrix H through dynamic programming local alignment. Finally, after post-processing, the distance between the two songs is computed.
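The binary similarity matrix and the Smith–Waterman-style local alignment can be sketched as follows. The similarity threshold and the match, mismatch, and gap scores here are illustrative assumptions, not the values used in the cited papers, and the final post-processing into a distance is only indicated.

```python
import numpy as np

def binary_similarity(hpcp_a, hpcp_b, threshold=0.75):
    """Binary similarity matrix between two HPCP sequences.

    Each input is a (frames x bins) array; frame pairs are compared with
    a dot product after unit normalisation, then thresholded to 0/1.
    """
    a = hpcp_a / np.linalg.norm(hpcp_a, axis=1, keepdims=True)
    b = hpcp_b / np.linalg.norm(hpcp_b, axis=1, keepdims=True)
    return (a @ b.T >= threshold).astype(int)

def local_alignment_score(s, match=1.0, mismatch=-0.9, gap=-0.7):
    """Smith-Waterman-style local alignment over a binary matrix `s`."""
    n, m = s.shape
    h = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1, j - 1] else mismatch
            h[i, j] = max(0.0,
                          h[i - 1, j - 1] + sub,  # align this frame pair
                          h[i - 1, j] + gap,      # gap in one sequence
                          h[i, j - 1] + gap)      # gap in the other
    # The best local alignment score; post-processing would turn this
    # into a distance, e.g. by normalising by sequence length.
    return h.max()
```

Two identical sequences yield an identity-like similarity matrix, and the alignment score then grows along the diagonal, one match per frame.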

See also

References

  1. ^ Fujishima, T. "Realtime Chord Recognition of Musical Sound: A System Using Common Lisp Music". Proceedings of the International Computer Music Conference (ICMC), Beijing, China, 1999, pp. 464–467.
  2. ^ Gomez, E.; Herrera, P. (2004). "Estimating the Tonality of Polyphonic Audio Files: Cognitive Versus Machine Learning Modelling Strategies". ISMIR 2004 – 5th International Conference on Music Information Retrieval.
  3. ^ Serra, J.; Gomez, E.; Herrera, P.; Serra, X. "Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification". August 2008.
  4. ^ Serra, J.; Gomez, E.; Herrera, P.; Serra, X. "Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification". August 2008.
  5. ^ Gomez, E. "Tonal Description of Polyphonic Audio for Music Content Processing". INFORMS Journal on Computing, Special Cluster on Music Computing (Chew, E., guest editor), 2004.

External links

HPCP – Harmonic Pitch Class Profile plugin, available for download: http://mtg.upf.edu/technologies/hpcp