| Top | Calendar | Links | Readings |
This course covers machine extraction of structure from audio files, in areas such as source separation (unmixing audio recordings into individual component sounds), sound object recognition (labeling sounds), melody tracking, beat tracking, and perceptual mapping of audio to machine-quantifiable measures.
This course is approved for the Interfaces breadth and project requirements in the CS curriculum.
Prerequisites: prior programming experience sufficient to complete laboratory assignments in Python, implementing algorithms and using libraries without being taught how (there is no language instruction on Python). Having taken EECS 211 and 214 would demonstrate this experience.
Fundamentals of Music Processing
Lecture: Tue, Thu, 3:30 - 4:50pm CST in Technological Institute M164
Instructor Dr. Jason Smith 1pm - 2pm Tuesdays and Fridays in Mudd 3506
TA Yuchen Cao 11am - 1pm Thursdays in Mudd 3108
Peer Mentor Nandini Ventakesh 11am - 12pm Wednesdays in Mudd 3108
Peer Mentor Aidan Mott 3pm - 4pm Mondays in Mudd 3rd Floor Front Counter
Please use CampusWire for class-related questions.
You will be graded on a 100 point scale (93-100 = A, 90-92 = A-, 87-89 = B+, 83-86 = B, 80-82 = B-, and so on).
Every assignment is worth 20 points. There are 5 assignments (including the final project). Your final grade is the sum of your midterm grade and your 4 highest assignment grades (20 + 4 × 20 = 100 points). This means you can skip any one assignment.
Homework and reading assignments are solo assignments and must be your original work.
You are expected to write your own code and write up your own answers to questions. This means you. Not ChatGPT or Gemini or Copilot. This is an optional class you are (presumably) taking because you’re interested. So put in the time to learn this stuff, yourself.
Assignments must be submitted on the due date by the time specified on Canvas. If you are worried you can’t finish on time, upload a safety submission an hour early with what you have. I will grade the most recent item submitted before the deadline. Late submissions will not be graded.
| Week | Date | Topic | Assignment | Points |
|---|---|---|---|---|
| 1 | Thu Apr 2 | Course intro, Recording basics | | |
| 2 | Tue Apr 7 | Frequency & Pitch, Tuning Systems | | |
| 2 | Thu Apr 9 | Loudness & Amplitude | | |
| 3 | Tue Apr 14 | Fourier Transforms & Spectrograms | | |
| 3 | Thu Apr 16 | Convolution & Filtering | HW 1 Audio Basics | 20 |
| 4 | Tue Apr 21 | Convolution & FFT notebooks | | |
| 4 | Thu Apr 23 | Source Separation with REPET | | |
| 5 | Tue Apr 28 | MFCCs and Chromagrams, MFCC & Chroma notebooks | | |
| 5 | Thu Apr 30 | Self Similarity | HW 2 Spectrograms, Masking | 20 |
| 6 | Tue May 5 | Midterm review + Pitch Tracking | | |
| 6 | Thu May 7 | MIDTERM | MIDTERM | 20 |
| 7 | Tue May 12 | Sound Object Labeling | | |
| 7 | Thu May 14 | Deep Learning & Autoencoders | HW 3 Infinite Jukebox | 20 |
| 8 | Tue May 19 | Embeddings & Embeddings Notebook | | |
| 8 | Thu May 21 | Final projects, VoiceID, Source Separation | | |
| 9 | Tue May 26 | Final project group formation & proposals | HW 4 Using Embeddings | 20 |
| 9 | Thu May 28 | Current research in music & audio | Project proposal due | 3 (of 20) |
| 10 | Tue Jun 2 | Zoom meetings with project groups (no class; meetings by appointment) | Project meeting | 3 (of 20) |
| 10 | Thu Jun 4 | Current research in music & audio | | |
| 11 | Tue Jun 9 | Zoom meetings with project groups (no class; meetings by appointment) | Project meeting | 3 (of 20) |
| 11 | Thu Jun 11 | Final project presentations, 3-5pm, NEW LOCATION: HCI+D Center in Francis Searle Building | Final project | 11 (of 20) |
Fundamentals of Music Processing, Chapter 1
Fundamentals of Music Processing, Chapter 2 & Section 3.1
Fundamentals of Music Processing, Chapter 4
Fundamentals of Music Processing, Chapter 6
Fundamentals of Music Processing, Chapter 7
REPET for Background/Foreground Separation in Audio
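For intuition, the core of REPET can be sketched in a few lines of NumPy/SciPy: model the repeating background as the element-wise median of the magnitude spectrogram across repetitions, then build a soft mask for the non-repeating foreground. This is a simplified sketch, not the full algorithm: it assumes the repeating period is given, whereas REPET estimates it from the beat spectrum, and the function name and parameters here are illustrative.

```python
import numpy as np
from scipy.signal import stft

def repet_foreground_mask(x, fs, period_s, nperseg=1024):
    """Simplified REPET sketch: soft mask for the non-repeating foreground.

    period_s is the (assumed known) repeating period in seconds.
    """
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)
    V = np.abs(Z)                                # magnitude spectrogram
    hop_s = t[1] - t[0]                          # frame hop in seconds
    p = max(1, int(round(period_s / hop_s)))     # repeating period in frames
    n_frames = V.shape[1]
    n_seg = max(1, n_frames // p)
    # Median across the repeating segments models the repeating background
    segs = V[:, :n_seg * p].reshape(V.shape[0], n_seg, p)
    W = np.median(segs, axis=1)                  # (freq, p)
    reps = int(np.ceil(n_frames / p))
    W_full = np.tile(W, (1, reps))[:, :n_frames]
    # The repeating model cannot exceed the mixture's magnitude
    B = np.minimum(W_full, V)
    # Soft mask: near 1 where the mixture dominates the repeating background
    M = 1.0 - B / (V + 1e-10)
    return M, Z
```

Multiplying the mask `M` by the STFT `Z` and inverting with `scipy.signal.istft` would give the estimated foreground signal.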
Chapter 4 of Machine Learning: Tom Mitchell’s book. Historical overview plus an explanation of backpropagation of error. It’s a good starting point for actually understanding deep nets.
YIN: a fundamental frequency estimator for speech and music - This is, perhaps, the most popular pitch tracker.
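The core of YIN fits in a screenful of NumPy. The sketch below implements the paper's difference function, cumulative mean normalized difference, absolute threshold, and descent to the dip's local minimum; parabolic interpolation is omitted for brevity, and the function name and default parameters are illustrative rather than the paper's recommendations.

```python
import numpy as np

def yin_f0(x, fs, fmin=80.0, fmax=1000.0, threshold=0.1):
    """Minimal YIN sketch: estimate f0 of one frame of audio."""
    tau_min = int(fs / fmax)
    tau_max = int(fs / fmin)
    w = len(x) - tau_max                         # integration window size
    # Difference function d(tau) = sum_n (x[n] - x[n+tau])^2
    d = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff = x[:w] - x[tau:tau + w]
        d[tau] = np.dot(diff, diff)
    # Cumulative mean normalized difference d'(tau) = d(tau) * tau / sum_{j<=tau} d(j)
    dprime = np.ones(tau_max + 1)
    dprime[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(np.cumsum(d[1:]), 1e-12)
    # First dip below the threshold (fall back to the global minimum),
    # then descend to the dip's local minimum
    below = np.where(dprime[tau_min:tau_max + 1] < threshold)[0]
    tau = (below[0] + tau_min) if below.size else (int(np.argmin(dprime[tau_min:])) + tau_min)
    while tau + 1 <= tau_max and dprime[tau + 1] < dprime[tau]:
        tau += 1
    return fs / tau
```

Without step 5's parabolic interpolation, the estimate is quantized to integer lags (e.g. a 440 Hz sine at 8 kHz comes out near 8000/18 ≈ 444 Hz).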
CREPE: A Convolutional Representation for Pitch Estimation - A deep learning pitch tracker that improves on YIN.
The dummy’s guide to MFCC - an easy, high-level read. Start with this.
From Frequency to Quefrency: A History of the Cepstrum - a historical analysis of the uses of the cepstrum.
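The cepstral idea behind both readings is compact enough to show directly: take the inverse FFT of the log magnitude spectrum, and periodicity in the spectrum (harmonics spaced f0 apart) shows up as a peak at quefrency fs/f0 samples. A minimal NumPy sketch (the function name is mine; MFCCs additionally replace the linear frequency axis with a mel filterbank and the inverse FFT with a DCT):

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.

    Low quefrencies capture the spectral envelope; a peak at a
    higher quefrency marks the pitch period in samples.
    """
    spectrum = np.fft.rfft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # guard against log(0)
    return np.fft.irfft(log_mag, n=len(frame))
```

For example, a 125 Hz harmonic tone sampled at 8 kHz produces a cepstral peak near quefrency 8000/125 = 64 samples, which is the basis of cepstral pitch detection.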
Recovering sound sources from embedded repetition - This is a paper on how humans actually listen to and parse audio based on repetition. Read any time.
EECS 352 Final projects from 2017 and 2015
Facebook’s Universal Music Translation
A Coursera course on pitch tracking
U of Iowa’s Musical Instrument Samples Dataset
The SocialFX data set of word descriptors for audio
VocalSketch: thousands of vocal imitations of a large set of diverse sounds
Bach10: audio recordings of each part and the ensemble of ten pieces of four-part J.S. Bach chorales
Python Utilities for Detection and Classification of Acoustic Scenes
Librosa: audio and music processing in Python
Essentia: an open source music analysis toolkit that includes a bunch of feature extractors and pre-trained models for extracting e.g. beats per minute, mood, genre, etc.
Yaafe - audio feature extraction toolbox
Sonic Visualiser music visualization software
LilyPond, open source music notation software
SoundSlice guitar tab and notation website