What distinguishes speech and language processing from other kinds of data processing is its use of knowledge of language. In this course, we will study how to describe, process, and compute different levels of language knowledge, including phonetics and phonology, morphology, syntax, and semantics, and how this knowledge is used in speech and language applications such as named entity recognition, information extraction, question answering, speech recognition, and speech synthesis.

Teaching team


Instructor: Zhizheng Wu
TA: Li Wang

News

This page is for Spring 2024. Please note that there is a significant change in assessment from Spring 2023.

I only write recommendation letters for students who get an 'A'. Sorry, an 'A-' is not enough.

I am happy to refer you to internship opportunities if you make high-quality contributions to Amphion. This is an example of a high-quality contribution. See the comments from a top Silicon Valley investor.

Logistics

Course Information


This course is designed as a first course for students who are interested in speech and language technology. The first half of the course focuses on the fundamentals and introduces tools for students to use, and the second half emphasises applications, giving students the opportunity to see how speech and language technology could impact human life. In particular, the topics include:

  • Understanding human speech: spectrogram, fundamental frequency, formant, etc.
  • Human sounds and their organization
  • Words and their relationship to other words
  • Syntax: Structure of sentences
  • Text processing and regular expressions
  • Language models
  • Embedding: Representations of the meaning of words
  • Word classification and named entity recognition
  • Applications: speech recognition, speech synthesis, machine translation, chatbot, etc.

Prerequisites

Textbooks

Recommended Books:

Grading Policy (CSC3160)

Assignments (40%)

Midterm exam (25%)

Date: Mar 11 (same as lecture time)

Scope: Lecture 1 - 11

Final exam (30%)

Date: May 8 (same as lecture time)

Scope: All lectures (including high-level concepts from guest lectures)

CSC3160 students who would like to work on the final project instead of the final exam should let me know in advance.

Final project (30%) [AIR6063 students only]

Due date: May 8 (EOD)

The project is to reproduce a paper within Amphion. The Amphion team will provide computational resources and guidelines. Topics span text processing, modeling, and speech processing. If you have a preference, please let the teaching team know; otherwise, we can discuss and select a paper for you to work on. You can work in a team of two students or individually. You need to write a project report (max 6 pages) for the final project. Here is the report template.

Rating guideline

Participation (5%)

Here are some ways to earn the participation credit, which is capped at 5%.

Late Policy

The penalty is 0.5% off the final course grade for each late day.
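
As a rough illustration of how the component weights and the late penalty combine, here is a minimal sketch in Python. It rests on several assumptions that are not stated on this page: every component is scored on a 0-100 scale, "0.5% off" means 0.5 percentage points of the final course grade per late day, and the grade cannot drop below zero. The function name `course_grade` and the input format are hypothetical. The weights are those of the CSC3160 track; AIR6063 students would substitute the final project for the final exam at the same 30% weight.

```python
# Component weights in percent, taken from the grading policy (CSC3160 track).
WEIGHTS = {"assignments": 40, "midterm": 25, "final": 30, "participation": 5}

# Assumption: "0.5% off" means 0.5 percentage points per late day.
LATE_PENALTY_PER_DAY = 0.5


def course_grade(scores, late_days=0):
    """Return the weighted course grade on a 0-100 scale.

    `scores` maps each component name to a 0-100 score (hypothetical format).
    """
    weighted = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
    # Assumption: the penalty is applied after weighting, floored at zero.
    return max(0.0, weighted - LATE_PENALTY_PER_DAY * late_days)


example = {"assignments": 90, "midterm": 80, "final": 85, "participation": 100}
print(course_grade(example))               # 86.5
print(course_grade(example, late_days=2))  # 85.5
```

Under these assumptions, two late days cost one full point of the final course grade, regardless of which assignment was late.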

Schedule (To be revised)


| Date | Lecture Description | Readings | Lecture Note | Events/Deadlines |
|---|---|---|---|---|
| Jan 8 | Lecture 1: Introduction and course overview | | [Slides] | |
| Jan 10 | Lecture 2: Understanding sounds | Pitch, loudness and timbre; What is a Sound Spectrum? | [Colab demo] [Slides] | |
| Jan 15 | Lecture 3: Basics of speech signal processing and analysis | | [Slides] | |
| Jan 17 | Lecture 4: Introduction of speech production | | [Slides] | All assignments are out: [Assignment 1] [Assignment 2] [Assignment 3] [Assignment 4] |
| Jan 22 | Lecture 5: Speech representation | Basic Representations | [Slides] | |
| Jan 24 | Lecture 6: Phones and Phonation | | [Slides] | |
| Jan 29 | No class | | | |
| Jan 31 | Lecture 8: Text processing | | [Slides] | Assignment 1 due (11:59pm) |
| Feb 26 | Lecture 9: Words, morphology, and parts of speech | | [Slides] | |
| Feb 28 | Lecture 10: Word embedding | | [Slides] | Assignment 2 due (11:59pm) |
| Mar 4 | Lecture 11: Syntax: structure of sentences | | [Slides] | |
| Mar 6 | Lecture 7: Speech perception | | [Slides] | |
| Mar 11 | Mid-term exam (scope: lecture 1 - 11) | | | |
| Mar 13 | Lecture 12: Language model | | [Slides] | Assignment 3 due (11:59pm) |
| Mar 18 | No class | | | |
| Mar 20 | Lecture 13: Word2Vec and Sentiment Analysis | | [Slides] | |
| Mar 25 | Lecture 14: TTS | | [Slides] | |
| Mar 27 | Lecture 15: Voice conversion | | [Slides] | |
| Apr 1 | Lecture 16: Automatic Speech Recognition | | [Slides] | |
| Apr 3 | Lecture 17: Machine Translation | | [Slides] | |
| Apr 8 | Lecture 18: Question answering | | [Slides] | |
| Apr 10 | Lecture 19: Chatbot | | [Slides] | |
| Apr 15 | No class: Instructor attending ICASSP 2024 | | | |
| Apr 17 | No class: Instructor attending ICASSP 2024 | | | Assignment 4 due (11:59pm) |
| Apr 22 | Invited talk | | | |
| Apr 24 | Invited talk | | | |
| Apr 29 | Invited talk | | | |
| May 5 | Lecture 22: In class QA | | | |
| May 8 | Final exam | | | |