What distinguishes speech and language processing from other kinds of data processing is its use of knowledge of language. In this course, we will study how to describe, process, and compute with different levels of linguistic knowledge, including phonetics and phonology, morphology, syntax, and semantics, and how this knowledge is used in speech and language applications such as named entity recognition, information extraction, question answering, speech recognition, and speech synthesis.

Teaching team


Instructor: Zhizheng Wu
TA: Xi Chen

Poster Session


A final project poster session is planned for the end of the course (tentatively May 20th, 2023). It gives students an opportunity to connect with the speech and language research and industry community.

Anyone from CUHK-Shenzhen or the speech and language technology community is welcome to join. More details will be provided closer to the event. Feel free to reach out!

Logistics


Course Information


This course is designed as a first course for students who are interested in speech and language technology. The first half of the course focuses on the fundamentals and introduces tools for students to use, and the second half emphasises applications, giving students the opportunity to see how speech and language technology can impact human life. In particular, the topics include:

  • Understanding human speech: spectrograms, fundamental frequency, formants, etc.
  • Human sounds and their organization
  • Words and their relationship to other words
  • Syntax: Structure of sentences
  • Text processing and regular expressions
  • Language models
  • Embedding: Representations of the meaning of words
  • Word classification and named entity recognition
  • Applications: speech recognition, speech synthesis, machine translation, chatbots, etc.

Prerequisites

Textbooks

Recommended Books:

Grading Policy (CSC3160/MDS6002)

Assignments (30%)

Midterm exam (25%)

The midterm exam will be held on March 9th, 2023. It covers lectures 1 through 12.

Final project (40%)

For the final project, you need to write a project proposal (2 pages) and a project report (at most 6 pages). Here is the report template. You are also expected to report on project milestones and give a poster presentation. After the final project deadline, feel free to make your project open source.

Participation (5%)

Here are some ways to earn the participation credit, which is capped at 5%.

Late Policy

The penalty is 0.5% off the final course grade for each late day.
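As a rough illustration of how the weights above and the late penalty might combine, here is a minimal Python sketch. It is not an official grade calculator: the function name `final_grade`, the 0-100 scale for each component, reading "0.5%" as 0.5 percentage points per late day, and flooring the result at zero are all assumptions made for the example.

```python
# Weights from the grading policy, in percent of the final course grade.
WEIGHTS = {"assignments": 30, "midterm": 25, "final_project": 40, "participation": 5}

# Late policy: 0.5 off the final course grade per late day
# (assumed here to mean 0.5 percentage points).
LATE_PENALTY_PER_DAY = 0.5


def final_grade(scores, late_days=0):
    """Return a course grade on a 0-100 scale (hypothetical helper).

    `scores` maps each component name to a score on a 0-100 scale.
    Flooring the result at 0 is an assumption, not part of the stated policy.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must contain exactly the keys in WEIGHTS")
    weighted = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
    return max(0.0, weighted - LATE_PENALTY_PER_DAY * late_days)


example = {"assignments": 90, "midterm": 80, "final_project": 85, "participation": 100}
print(final_grade(example, late_days=3))  # weighted score 86.0, minus 1.5 for three late days
```

Under these assumptions, three late days cost 1.5 points of the final course grade regardless of which deliverable was late.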

Schedule


| Date | Lecture Description | Readings | Lecture Note | Events/Deadlines |
| --- | --- | --- | --- | --- |
| Jan 4 | Tutorial 0: GitHub, LaTeX, and Colab | Learn LaTeX in 30 minutes; Colab official tutorial; Official tutorials of GitHub | [Slides] | Self-study |
| Jan 5 | Lecture 1: Introduction and course overview | | [Slides] [Video] | |
| Jan 10 | Lecture 2: Machine learning in a nutshell | Deep Learning in a Nutshell: Core Concepts; Machine learning, explained | [Slides] [Video] | |
| Jan 11 | Tutorial 1: PyTorch | PyTorch Quickstarts; PyTorch Installation | [Slides] [Video] [Colab] | |
| Jan 12 | Lecture 3: Understanding sound and acoustics | Pitch, loudness and timbre; What is a Sound Spectrum? | [Slides] [HTML] [Video] | Assignment 1 out |
| Jan 15 | Tutorial 2: TorchAudio (by Torchaudio team) | TorchAudio Documentation | [Slides] [Video] | 10:00am via zoom |
| Jan 17 | Lecture 4: Understanding human speech | Voice Acoustics: an introduction; Introduction to Speech Processing | [Slides] [Video] | |
| Feb 9 | Lecture 5: Human sounds and their organization | Chapter 25: Phonetics | [Slides] [Video] | |
| Feb 14 | Lecture 6: Text processing and regular expressions | Chapter 2: Regular Expressions, Text Normalization, Edit Distance | [Slides] [Video] | Assignment 2 out; Assignment 1 due (11:59pm) |
| Feb 15 | Tutorial 3: Text processing | | | |
| Feb 16 | Lecture 7: Words and their relationship to other words | | [Slides] [Video] | |
| Feb 21 | Lecture 8: Syntax: Structure of sentences | | [Slides] [Video] | |
| Feb 23 | Lecture 9: Language models | Chapter 3: N-gram Language Models; Chapter 7: Neural Networks and Neural Language Models | [Slides] [Video] | Assignment 2 due (11:59pm) |
| Feb 28 | Lecture 10: Language models | Chapter 3: N-gram Language Models; Chapter 7: Neural Networks and Neural Language Models | [Slides] [Video] | |
| Mar 2 | Lecture 11: Embedding: Representations of the meaning of words | Chapter 6: Vector Semantics and Embeddings | [Slides] [Video] | Project proposal due (11:59pm) |
| Mar 7 | Lecture 12: Embedding: Representations of the meaning of words | Chapter 6: Vector Semantics and Embeddings | [Slides] [Video] | |
| Mar 8 | Tutorial 4: Word embedding | | | |
| Mar 9 | Midterm exam | | | Assignment 3 out |
| Mar 14 | Lecture 13: Word classifications and Named entities recognition | | [Slides] [Video] | |
| Mar 15 | Tutorial 5: Visualization and plotting | | | |
| Mar 16 | Lecture 14: SLP Application - Sentiment analysis | | [Slides] [Video] | |
| Mar 21 | Lecture 15: SLP Application - Text summarization | | [Slides] [Video] | Assignment 3 due (11:59pm) |
| Mar 22 | Lecture 16: Summarizing Conversations: From Meetings to Social Media (by Nancy Chen) | | | Invited talk. Location: DY103, Time: 12-13 |
| Mar 28 | Lecture 17: SLP Application - Fundamentals of speech recognition (by Xiong Xiao) | | | Invited guest lecture |
| Mar 30 | Lecture 18: SLP Application - Text-to-speech synthesis | | [Slides] [Video] | Project milestone 1 due (11:59pm) |
| Apr 6 | Lecture 19: SLP Application - Voice conversion | | [Slides] [Video] | |
| Apr 11 | Lecture 20: SLP Application - Machine translation | Chapter 10: Machine Translation | [Slides] [Video] | |
| Apr 13 | Lecture 21: SLP Application - Question answering | Chapter 23: Question Answering | [Slides] [Video] | |
| Apr 18 | Lecture 22: SLP Application - Chatbot | Chapter 24: Chatbots and Dialogue Systems | [Slides] [Video] | Project milestone 2 due (11:59pm) |
| Apr 20 | Lecture 23: Industry applications of speech and language processing | | | Invited talk |
| Apr 25 | Lecture 24: Industry applications of speech and language processing | | | Invited talk |
| Apr 27 | Final project review preparation | | | |
| May 4 | Final project review preparation | | | Final project report early submission due (11:59pm) |
| May 11 | | | | Final project report due (11:59pm) |
| May 20 | Final project poster session | | | This session is open to the CUHK-Shenzhen community and invited guests. Details will be available soon. Tentative Time: (9am - 1pm). |