
CS 352 MACHINE PERCEPTION OF MUSIC AND AUDIO

Northwestern University Spring 2026


Course Description

This course covers machine extraction of structure in audio files covering areas such as source separation (unmixing audio recordings into individual component sounds), sound object recognition (labeling sounds), melody tracking, beat tracking, and perceptual mapping of audio to machine-quantifiable measures.

This course is approved for the Interfaces breadth requirement and the project requirement in the CS curriculum.

Prerequisites: prior programming experience sufficient to complete the laboratory assignments in Python, implementing algorithms and using libraries without being taught how (there is no Python language instruction in this course). Having taken EECS 211 and 214 demonstrates this experience.

Course Textbook

Fundamentals of Music Processing

Time & Place

Lecture: Tue & Thu, 3:30 - 4:50pm Central Time in Technological Institute M164

Instructors & Office Hours

Dr. Jason Smith 1pm - 2pm Tuesdays and Fridays in Mudd 3506

TA Yuchen Cao 11am - 1pm Thursdays in Mudd 3108

Peer Mentor Nandini Ventakesh 11am - 12pm Wednesdays in Mudd 3108

Peer Mentor Aidan Mott 3pm - 4pm Mondays in Mudd 3rd Floor Front Counter

Course Policies

Questions outside of class

Please use CampusWire for class-related questions.

Grading Policy

You will be graded on a 100 point scale (e.g. 93-100 = A, 90-92 = A-, 87-89 = B+, 83-86 = B, 80-82 = B-, and so on).

Each assignment is worth 20 points. There are 5 assignments (including the final project) plus the midterm. Your final grade is your midterm grade plus your 4 highest assignment grades. This means you can skip any one assignment.
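
The drop-the-lowest policy above can be expressed in a few lines. This is a hypothetical illustration (the function name and example scores are made up), not an official grade calculator:

```python
# Hypothetical illustration of the grading formula: the final grade is the
# midterm score plus the four highest of the five assignment scores,
# each item out of 20 points.
def final_grade(midterm, assignments):
    assert len(assignments) == 5
    return midterm + sum(sorted(assignments)[1:])   # drop the lowest score

# Skipping one assignment entirely still allows a perfect 100.
print(final_grade(20, [20, 20, 0, 20, 20]))  # 100
```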

Homework and reading assignments are solo assignments and must be your original work.

AI policy

You are expected to write your own code and write up your own answers to questions. This means you. Not ChatGPT or Gemini or Copilot. This is an optional class you are (presumably) taking because you're interested, so put in the time to learn this material yourself.

Submitting assignments

Assignments must be submitted on the due date by the time specified on Canvas. If you are worried you can’t finish on time, upload a safety submission an hour early with what you have. I will grade the most recent item submitted before the deadline. Late submissions will not be graded.

Course Calendar

Week | Date | Topic | Assignment | Points
1 | Thu Apr 2 | Course intro, Recording basics | |
2 | Tue Apr 7 | Frequency & Pitch, Tuning Systems | |
2 | Thu Apr 9 | Loudness & Amplitude | |
3 | Tue Apr 14 | Fourier Transforms & Spectrograms | |
3 | Thu Apr 16 | Convolution & Filtering | HW 1: Audio Basics | 20
4 | Tue Apr 21 | Convolution & FFT notebooks | |
4 | Thu Apr 23 | Source Separation with REPET | |
5 | Tue Apr 28 | MFCCs & Chromagrams; MFCC & Chroma notebooks | |
5 | Thu Apr 30 | Self Similarity | HW 2: Spectrograms, Masking | 20
6 | Tue May 5 | Midterm review + Pitch Tracking | |
6 | Thu May 7 | MIDTERM | Midterm | 20
7 | Tue May 12 | Sound Object Labeling | |
7 | Thu May 14 | Deep Learning & Autoencoders | HW 3: Infinite Jukebox | 20
8 | Tue May 19 | Embeddings + Embeddings notebook | |
8 | Thu May 21 | Final projects, VoiceID, Source Separation | |
9 | Tue May 26 | Final project group formation & proposals | HW 4: Using Embeddings | 20
9 | Thu May 28 | Current research in music & audio | Project proposal due | 3 (of 20)
10 | Tue Jun 2 | Zoom meetings with project groups (no class; meetings by appointment) | Project meeting | 3 (of 20)
10 | Thu Jun 4 | Current research in music & audio | |
11 | Tue Jun 9 | Zoom meetings with project groups (no class; meetings by appointment) | Project meeting | 3 (of 20)
11 | Thu Jun 11 | Final project presentations, 3-5pm, NEW LOCATION: HCI+D Center in Francis Searle Building | Final project | 11 (of 20)

Course Reading

Fundamentals of Music Processing, Chapter 1

Fundamentals of Music Processing, Chapter 2 & Section 3.1

Fundamentals of Music Processing, Chapter 4

Fundamentals of Music Processing, Chapter 6

Fundamentals of Music Processing, Chapter 7

* REPET for Background/Foreground Separation in Audio
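
The REPET paper's central idea can be sketched compactly. The sketch below assumes the repeating period p (in spectrogram frames) is already known; the full method estimates it automatically from a beat spectrum, and the function name and toy spectrogram here are illustrative:

```python
import numpy as np

# Sketch of REPET's core step: model the repeating background as the
# element-wise median across period-length spectrogram segments, then
# derive a soft mask that extracts the background from the mixture.
def repet_mask(V, p):
    n_bins, n_frames = V.shape
    n_seg = n_frames // p
    V = V[:, :n_seg * p]                       # trim to whole periods
    segs = V.reshape(n_bins, n_seg, p)
    W = np.median(segs, axis=1)                # one period of the background
    W = np.minimum(np.tile(W, (1, n_seg)), V)  # model cannot exceed mixture
    return W / (V + 1e-8)                      # soft background mask in [0, 1]

# Toy mixture: a constant repeating pattern plus one loud foreground event.
V = np.ones((4, 8))
V[:, 3] += 5.0
mask = repet_mask(V, p=4)  # mask values drop where the foreground dominates
```

Multiplying the mask by the mixture spectrogram (and inverting) recovers the repeating background; one minus the mask recovers the foreground.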

Chapter 4 of Machine Learning: This is Tom Mitchell's book. A historical overview plus an explanation of backpropagation of error. It's a good starting point for actually understanding deep nets.

YIN: a fundamental frequency estimator for speech and music - This is, perhaps, the most popular pitch tracker.
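
YIN's core steps fit in a short sketch: a difference function over candidate lags, cumulative mean normalization, and an absolute threshold. The real algorithm adds parabolic interpolation and local search; the frame length, threshold, and lag range below are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

# Minimal sketch of YIN-style pitch estimation on a single frame.
def yin_f0(frame, sr, tau_max=400, threshold=0.1):
    n = len(frame)
    # difference function d(tau): energy of the frame minus its shifted self
    d = np.array([np.sum((frame[:n - tau] - frame[tau:]) ** 2)
                  for tau in range(tau_max)])
    # cumulative-mean-normalized difference; d'(0) is defined as 1
    dn = np.ones(tau_max)
    dn[1:] = d[1:] * np.arange(1, tau_max) / (np.cumsum(d[1:]) + 1e-12)
    # take the first lag below the threshold, then walk to the local minimum
    tau = 1
    while tau < tau_max - 1 and dn[tau] >= threshold:
        tau += 1
    while tau + 1 < tau_max and dn[tau + 1] < dn[tau]:
        tau += 1
    return sr / tau

sr = 8000
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 200 * t)   # 200 Hz sine, period = 40 samples
print(yin_f0(frame, sr))              # expect roughly 200 Hz
```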

CREPE: A Convolutional Representation for Pitch Estimation - A deep learning pitch tracker that improves on YIN.

The dummy’s guide to MFCC - an easy, high-level read. Start with this.

From Frequency to Quefrency: A History of the Cepstrum - a historical analysis of the uses of the cepstrum
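
The cepstrum underlying both MFCC readings is only a few lines of code: the inverse FFT of the log magnitude spectrum. A minimal sketch (the test signal and the small floor added before the log are illustrative choices):

```python
import numpy as np

# Real cepstrum: inverse FFT of the log magnitude spectrum. For a harmonic
# sound, "rahmonic" peaks appear at the period (in samples) and its
# multiples; the floor keeps near-zero spectral bins from dominating the log.
def real_cepstrum(x, floor=1e-3):
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + floor)).real

sr = 8000
t = np.arange(4000) / sr
# harmonic-rich 200 Hz tone: five harmonics with 1/k amplitudes
x = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 6))
c = real_cepstrum(x)
period = sr // 200                     # 40 samples
# a sharp cepstral peak sits at the 40-sample period (and its multiples)
print(c[period] > c[period - 5], c[period] > c[period + 5])
```

MFCCs go one step further: they take the spectrum on a mel-scaled filter bank before the log and use a DCT instead of the inverse FFT.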

Recovering sound sources from embedded repetition - This is a paper on how humans actually listen to and parse audio based on repetition. Read any time.

Places to get ideas

EECS 352 Final projects from 2017 and 2015

Google’s Project Magenta

Facebook’s Universal Music Translation

A Coursera course on pitch tracking

Datasets

U of Iowa’s Music Instrument Samples Dataset

The SocialFX data set of word descriptors for audio

VocalSet: a singing voice dataset consisting of 10.1 hours of monophonic recorded audio of professional singers

VocalSketch: thousands of vocal imitations of a large set of diverse sounds

Bach10: audio recordings of each part and the ensemble of ten pieces of four-part J.S. Bach chorales

The Million Song Dataset

Software

Python Utilities for Detection and Classification of Acoustic Scenes

Librosa audio and music processing in Python

Essentia: an open source music analysis toolkit that includes feature extractors and pre-trained models for estimating, e.g., beats per minute, mood, and genre

Yaafe - audio features extraction toolbox

Sonic Visualiser music visualization software

LilyPond, open source music notation software

SoundSlice guitar tab and notation website
