Audio Representation for Machine Learning

When training machine learning systems on audio data for tasks like speech recognition, it is useful to first transform the audio into a rich intermediate representation such as a spectrogram. Although effective models can be trained directly on raw audio given enough data, models that begin with rich representations typically perform better. I will talk about several different audio representation schemes, including spectrograms, mel filter banks, MFCCs, and wavelets. We will discuss how each of these representations works, the types of information each one preserves and destroys, and their strengths and weaknesses from a machine learning perspective.
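For readers who want something concrete before the talk, here is a minimal NumPy sketch of the spectrogram, mel filter bank, and MFCC chain named above. It is not the speaker's code. The function names and all parameter values are illustrative assumptions: 16 kHz mono audio, 400-sample (25 ms) Hann-windowed frames with a 160-sample (10 ms) hop, a 512-point FFT, 40 triangular mel filters using the common 2595·log10(1 + f/700) mel formula, and 13 coefficients from an orthonormal DCT-II. Wavelets are not covered here.

```python
import numpy as np

def stft_magnitude(signal, frame_len=400, hop=160, n_fft=512):
    """Magnitude spectrogram of shape (n_frames, n_fft // 2 + 1).

    Assumed framing: Hann window, no padding at the edges.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(signal, sample_rate=16000, n_mels=40, n_mfcc=13, n_fft=512):
    """Power spectrogram, then mel filter bank energies, then log, then DCT."""
    power = stft_magnitude(signal, n_fft=n_fft) ** 2
    mel_energies = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    log_mel = np.log(mel_energies + 1e-10)  # small floor avoids log(0)
    return dct_ii(log_mel, n_mfcc)

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)   # one second of a 440 Hz tone
    print(stft_magnitude(tone).shape)      # (98, 257)
    print(mfcc(tone, sample_rate=sr).shape)  # (98, 13)
```

Each stage discards information: the magnitude spectrogram drops phase, the mel filter bank blurs high frequencies more than low ones, and truncating the DCT keeps only the smooth spectral envelope. That progression is the kind of "information preserved and destroyed" trade-off the abstract refers to.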

Tim Anderton @anderton_tim

Friday, Jun 8th, 03:00pm-03:50pm
Room 200C (data)