
CS 352 MACHINE PERCEPTION OF MUSIC AND AUDIO

Northwestern University Spring 2026


Course Description

This course covers machine extraction of structure in audio files covering areas such as source separation (unmixing audio recordings into individual component sounds), sound object recognition (labeling sounds), melody tracking, beat tracking, and perceptual mapping of audio to machine-quantifiable measures.

This course is approved for the Interfaces breadth requirement and the project requirement in the CS curriculum.

Prerequisites: prior programming experience sufficient to complete the laboratory assignments in Python, implementing algorithms and using libraries without being taught how (there is no Python language instruction in this course). Having taken EECS 211 and 214 demonstrates this experience.

Course Textbook

Fundamentals of Music Processing

Time & Place

Lecture: Tue & Thu, 3:30 - 4:50pm Central Time in Technological Institute M164

Instructors & Office Hours

Dr. Jason Smith 1pm - 2pm Tuesdays and Fridays in Mudd 3506

TA Yuchen Cao 11am - 1pm Thursdays in Mudd 3108

Peer Mentor Nandini Ventakesh 11am - 12pm Wednesdays in Mudd 3108

Peer Mentor Aidan Mott 3pm - 4pm Mondays in Mudd 3rd Floor Front Counter

Course Policies

Questions outside of class

Please use CampusWire for class-related questions.

Grading Policy

You will be graded on a 100 point scale (e.g. 93-100 = A, 90-92 = A-, 87-89 = B+, 83-86 = B, 80-82 = B-, and so on).

Each assignment is worth 20 points. There are 5 assignments (including the final project) plus the midterm. Your final grade is your midterm grade plus your 4 highest assignment grades. This means you can skip any one assignment.
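
The drop-the-lowest policy above can be expressed in a few lines. This is a hypothetical illustration (the function name and example scores are made up), not an official grade calculator:

```python
# Hypothetical illustration of the grading formula: the final grade is the
# midterm score plus the four highest of the five assignment scores,
# each item out of 20 points.
def final_grade(midterm, assignments):
    assert len(assignments) == 5
    return midterm + sum(sorted(assignments)[1:])   # drop the lowest score

# Skipping one assignment entirely still allows a perfect 100.
print(final_grade(20, [20, 20, 0, 20, 20]))  # 100
```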

Homework and reading assignments are solo assignments and must be your original work.

AI policy

You are expected to write your own code and write up your own answers to questions. This means you. Not ChatGPT or Gemini or Copilot. This is an optional class you are (presumably) taking because you're interested, so put in the time to learn this material yourself.

Submitting assignments

Assignments must be submitted on the due date by the time specified on Canvas. If you are worried you can’t finish on time, upload a safety submission an hour early with what you have. I will grade the most recent item submitted before the deadline. Late submissions will not be graded.

Course Calendar

Week | Date | Topic | Assignment | Points
1 | Thu Apr 2 | Course intro, Recording basics | |
2 | Tue Apr 7 | Frequency & Pitch, Tuning Systems | |
2 | Thu Apr 9 | Loudness & Amplitude | |
3 | Tue Apr 14 | Fourier Transforms & Spectrograms | |
3 | Thu Apr 16 | Convolution & Filtering | HW 1: Audio Basics | 20
4 | Tue Apr 21 | Convolution & FFT notebooks | |
4 | Thu Apr 23 | Source Separation with REPET | |
5 | Tue Apr 28 | MFCCs & Chromagrams; MFCC & Chroma notebooks | |
5 | Thu Apr 30 | Self Similarity | HW 2: Spectrograms, Masking | 20
6 | Tue May 5 | Midterm review + Pitch Tracking | |
6 | Thu May 7 | MIDTERM | Midterm | 20
7 | Tue May 12 | Sound Object Labeling | |
7 | Thu May 14 | Deep Learning & Autoencoders | HW 3: Infinite Jukebox | 20
8 | Tue May 19 | Embeddings + Embeddings notebook | |
8 | Thu May 21 | Final projects, VoiceID, Source Separation | |
9 | Tue May 26 | Final project group formation & proposals | HW 4: Using Embeddings | 20
9 | Thu May 28 | Current research in music & audio | Project proposal due | 3 (of 20)
10 | Tue Jun 2 | Zoom meetings with project groups (no class; meetings by appointment) | Project meeting | 3 (of 20)
10 | Thu Jun 4 | Current research in music & audio | |
11 | Tue Jun 9 | Zoom meetings with project groups (no class; meetings by appointment) | Project meeting | 3 (of 20)
11 | Thu Jun 11 | Final project presentations, 3-5pm, NEW LOCATION: HCI+D Center in Francis Searle Building | Final project | 11 (of 20)

Course Reading

Fundamentals of Music Processing, Chapter 1

Fundamentals of Music Processing, Chapter 2 & Section 3.1

Fundamentals of Music Processing, Chapter 4

Fundamentals of Music Processing, Chapter 6

Fundamentals of Music Processing, Chapter 7

* REPET for Background/Foreground Separation in Audio
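
The REPET paper's central idea can be sketched compactly. The sketch below assumes the repeating period p (in spectrogram frames) is already known; the full method estimates it automatically from a beat spectrum, and the function name and toy spectrogram here are illustrative:

```python
import numpy as np

# Sketch of REPET's core step: model the repeating background as the
# element-wise median across period-length spectrogram segments, then
# derive a soft mask that extracts the background from the mixture.
def repet_mask(V, p):
    n_bins, n_frames = V.shape
    n_seg = n_frames // p
    V = V[:, :n_seg * p]                       # trim to whole periods
    segs = V.reshape(n_bins, n_seg, p)
    W = np.median(segs, axis=1)                # one period of the background
    W = np.minimum(np.tile(W, (1, n_seg)), V)  # model cannot exceed mixture
    return W / (V + 1e-8)                      # soft background mask in [0, 1]

# Toy mixture: a constant repeating pattern plus one loud foreground event.
V = np.ones((4, 8))
V[:, 3] += 5.0
mask = repet_mask(V, p=4)  # mask values drop where the foreground dominates
```

Multiplying the mask by the mixture spectrogram (and inverting) recovers the repeating background; one minus the mask recovers the foreground.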

Chapter 4 of Machine Learning: This is Tom Mitchell's book. A historical overview plus an explanation of backpropagation of error. It's a good starting point for actually understanding deep nets.

YIN: a fundamental frequency estimator for speech and music - This is, perhaps, the most popular pitch tracker.
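
YIN's core steps fit in a short sketch: a difference function over candidate lags, cumulative mean normalization, and an absolute threshold. The real algorithm adds parabolic interpolation and local search; the frame length, threshold, and lag range below are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

# Minimal sketch of YIN-style pitch estimation on a single frame.
def yin_f0(frame, sr, tau_max=400, threshold=0.1):
    n = len(frame)
    # difference function d(tau): energy of the frame minus its shifted self
    d = np.array([np.sum((frame[:n - tau] - frame[tau:]) ** 2)
                  for tau in range(tau_max)])
    # cumulative-mean-normalized difference; d'(0) is defined as 1
    dn = np.ones(tau_max)
    dn[1:] = d[1:] * np.arange(1, tau_max) / (np.cumsum(d[1:]) + 1e-12)
    # take the first lag below the threshold, then walk to the local minimum
    tau = 1
    while tau < tau_max - 1 and dn[tau] >= threshold:
        tau += 1
    while tau + 1 < tau_max and dn[tau + 1] < dn[tau]:
        tau += 1
    return sr / tau

sr = 8000
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 200 * t)   # 200 Hz sine, period = 40 samples
print(yin_f0(frame, sr))              # expect roughly 200 Hz
```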

CREPE: A Convolutional Representation for Pitch Estimation - A deep learning pitch tracker that improves on YIN.

The dummy’s guide to MFCC - an easy, high-level read. Start with this.

From Frequency to Quefrency: A History of the Cepstrum - a historical analysis of the uses of the cepstrum
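
The cepstrum underlying both MFCC readings is only a few lines of code: the inverse FFT of the log magnitude spectrum. A minimal sketch (the test signal and the small floor added before the log are illustrative choices):

```python
import numpy as np

# Real cepstrum: inverse FFT of the log magnitude spectrum. For a harmonic
# sound, "rahmonic" peaks appear at the period (in samples) and its
# multiples; the floor keeps near-zero spectral bins from dominating the log.
def real_cepstrum(x, floor=1e-3):
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + floor)).real

sr = 8000
t = np.arange(4000) / sr
# harmonic-rich 200 Hz tone: five harmonics with 1/k amplitudes
x = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 6))
c = real_cepstrum(x)
period = sr // 200                     # 40 samples
# a sharp cepstral peak sits at the 40-sample period (and its multiples)
print(c[period] > c[period - 5], c[period] > c[period + 5])
```

MFCCs go one step further: they take the spectrum on a mel-scaled filter bank before the log and use a DCT instead of the inverse FFT.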

Recovering sound sources from embedded repetition - This is a paper on how humans actually listen to and parse audio based on repetition. Read any time.

Places to get ideas

EECS 352 Final projects from 2017 and 2015

Google’s Project Magenta

Facebook’s Universal Music Translation

A Coursera course on pitch tracking

Datasets

U of Iowa’s Music Instrument Samples Dataset

The SocialFX data set of word descriptors for audio

VocalSet: a singing voice dataset consisting of 10.1 hours of monophonic recorded audio of professional singers

VocalSketch: thousands of vocal imitations of a large set of diverse sounds

Bach10: audio recordings of each part and the ensemble of ten pieces of four-part J.S. Bach chorales

The Million Song Dataset

Software

Python Utilities for Detection and Classification of Acoustic Scenes

Librosa audio and music processing in Python

Essentia: an open source music analysis toolkit that includes feature extractors and pre-trained models for estimating, e.g., beats per minute, mood, and genre

Yaafe - audio features extraction toolbox

Sonic Visualiser music visualization software

LilyPond, open source music notation software

SoundSlice guitar tab and notation website
