This tutorial combines the theory and practical application of Deep Neural Networks (DNNs) for Text-to-Speech (TTS). It illustrates how DNNs are rapidly advancing performance in all areas of TTS, including waveform generation and text processing, using a variety of model architectures. We link the theory to implementation with the open-source Merlin toolkit.

A video recording is available via this link.