UWEE Tech Report Series

Modulation Scale Analysis for Content Identification


Somsak Sukittanon, Les E. Atlas, and James W. Pitton

Keywords: 2D features, content identification, feature extraction, feature normalization, long-term features, modulation features, short-term features.


For nonstationary signal classification, e.g., of speech or music, features are traditionally extracted from a time-shifted, yet short, data window. For many applications, these short-term features do not efficiently capture or represent longer-term signal variation. Partially motivated by human audition, we overcome the deficiencies of short-term features by employing modulation scale analysis for long-term feature analysis. Our analysis, which uses time-frequency theory integrated with psychoacoustic results on modulation frequency perception, not only contains short-term information about the signals, but also provides long-term information representing patterns of time variation. This paper describes these features and their normalization. We demonstrate the effectiveness of our long-term features over conventional short-term features in content-based audio identification. A simulated study using a large data set, including nearly ten thousand songs and requiring over a billion pairwise audio comparisons, shows that modulation scale features improve content identification accuracy substantially, especially when time and frequency distortions are imposed.
