UWEE Tech Report Series

Modulation Scale Analysis for Content Identification


Somsak Sukittanon, Les E. Atlas, and James W. Pitton

2D features, content identification, feature extraction, feature normalization, long-term features, modulation features, short-term features.


For nonstationary signal classification, e.g., of speech or music, features are traditionally extracted from a short, time-shifted data window. For many applications, these short-term features do not efficiently capture or represent longer-term signal variation. Partially motivated by human audition, we overcome the deficiencies of short-term features by employing modulation scale analysis for long-term feature analysis. Our analysis, which integrates time-frequency theory with psychoacoustic results on modulation frequency perception, not only contains short-term information about the signals but also provides long-term information representing patterns of time variation. This paper describes these features and their normalization. We demonstrate the effectiveness of our long-term features over conventional short-term features in content-based audio identification. A simulated study using a large data set, including nearly ten thousand songs and requiring over a billion pairwise audio comparisons, shows that modulation scale features improve content identification accuracy substantially, especially when time and frequency distortions are imposed.
