Zhizheng WU (武执正), Ph.D.
Apple Siri speech team, Cupertino, California, USA
Email: wuzhizheng {at} gmail {dot} com
Zhizheng Wu has been a Research Scientist at Apple Inc. since 2016. Before that, he was a research fellow at the University of Edinburgh from 2014 to 2016. He received his Ph.D. from Nanyang Technological University, Singapore. During his studies, he joined Microsoft Research Asia (2007-2009) and the University of Eastern Finland (2012) as a visiting scientist, and he received the best paper award at the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2012. He co-organised the first Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2015) at Interspeech 2015 and the first Voice Conversion Challenge (VCC 2016) as a special session at Interspeech 2016. He delivered a tutorial on “Spoofing and Anti-Spoofing: A Shared View of Speaker Verification, Speech Synthesis and Voice Conversion” at APSIPA ASC 2015. He is the principal architect of Merlin, an open-source speech synthesis system.
Program code and dataset
Google Scholar Profile
Journal papers
- Zhizheng Wu, Junichi Yamagishi, Tomi Kinnunen, Cemal Hanilci, Md Sahidullah, Aleksandr Sizov, Nicholas Evans, Massimiliano Todisco, Hector Delgado, "ASVspoof: the Automatic Speaker Verification Spoofing and Countermeasures Challenge", IEEE Journal of Selected Topics in Signal Processing, 2017
- Xiaohai Tian, Siu-Wa Lee, Zhizheng Wu, Eng Siong Chng, Haizhou Li, "An Exemplar-based Approach to Frequency Warping for Voice Conversion", IEEE/ACM Transactions on Audio, Speech and Language Processing, 2017
- Yanmin Qian, Nanxin Chen, Heinrich Dinkel, Zhizheng Wu, "Deep Feature Engineering for Noise Robust Spoofing Detection", IEEE/ACM Transactions on Audio, Speech and Language Processing, 2017
- Zhizheng Wu, Simon King, "Improving Trajectory Modelling for DNN-based Speech Synthesis by using Stacked Bottleneck Features and Minimum Trajectory Error Training", IEEE/ACM Transactions on Audio, Speech and Language Processing, 2016 (accepted) [PDF]
- Ibon Saratxaga, Jon Sanchez, Zhizheng Wu, Inma Hernaez, Eva Navas, "Synthetic Speech Detection Using Phase Information", Speech Communication, 2016 (A).
- Zhizheng Wu, Phillip L. De Leon, Cenk Demiroglu, Ali Khodabakhsh, Simon King, Zhen-Hua Ling, Daisuke Saito, Bryan Stewart, Tomoki Toda, Mirjam Wester, Junichi Yamagishi, "Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance", IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol 24, Issue 4, pp 768-783, 2016 [PDF] [Dataset]
- Zhizheng Wu, Haizhou Li, "On the study of replay and voice conversion attacks to text-dependent speaker verification", Multimedia Tools and Applications, Springer, 2015. DOI:10.1007/s11042-015-3080-9 [PDF]
- Aleksandr Sizov, Elie Khoury, Tomi Kinnunen, Zhizheng Wu, Sebastien Marcel, "Joint Speaker Verification and Anti-Spoofing in the i-Vector Space", IEEE Transactions on Information Forensics and Security, Vol 10, Issue 4, pp. 821-832, 2015. [PDF]
- Zhizheng Wu, Nicholas Evans, Tomi Kinnunen, Junichi Yamagishi, Federico Alegre, Haizhou Li, "Spoofing and countermeasures for speaker verification: a survey", Speech Communication, Volume 66, Pages 130–153, 2015 [PDF]
- Zhizheng Wu, Eng Siong Chng, Haizhou Li, "Exemplar-based voice conversion using joint nonnegative matrix factorization", Multimedia Tools and Applications, Vol 74, Issue 22, pp 9943-9958, Springer, 2015
- Zhizheng Wu, Haizhou Li, "Voice conversion versus speaker verification: an overview", APSIPA Transactions on Signal and Information Processing, Vol 3, e17, 2014. DOI:10.1017/ATSIP.2014.17 [PDF] [Invited paper]
- Zhizheng Wu, Tuomas Virtanen, Eng Siong Chng, Haizhou Li, "Exemplar-based sparse representation with residual compensation for voice conversion", IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol 22, Issue 10, pp. 1506-1521, 2014. [PDF] [Code]
- Zhizheng Wu, Tomi Kinnunen, Eng Siong Chng, Haizhou Li, "Mixture of Factor Analyzers using priors from non-parallel speech for voice conversion", IEEE Signal Processing Letters, Vol 19, Issue 12, pp. 914-917, 2012. [PDF]
- Yao Qian, Zhizheng Wu, Boyang Gao, Frank K Soong, "Improved Prosody Generation by Maximizing Joint Likelihood of State and Longer Units", IEEE Transactions on Audio, Speech and Language Processing, Vol 19, Issue 6, pp. 1702-1710, 2011. [PDF]
Book Chapter
- Nicholas Evans, Tomi Kinnunen, Junichi Yamagishi, Zhizheng Wu, Federico Alegre, Phillip De Leon, "Speaker recognition anti-spoofing", Book Chapter in "Handbook of Biometric Anti-spoofing", Springer, S. Marcel, S. Li and M. Nixon, Eds., 2014. [PDF]
- Nicholas Evans, Federico Alegre, Zhizheng Wu, Tomi Kinnunen, "Anti-spoofing: voice conversion", Book chapter in "Encyclopedia of Biometrics", 2nd Edition, Springer, Stan Z. Li and Anil K. Jain, Eds., 2014
- Federico Alegre, Nicholas Evans, Tomi Kinnunen, Zhizheng Wu, Junichi Yamagishi, "Anti-spoofing: voice databases", Book chapter in "Encyclopedia of Biometrics", 2nd Edition, Springer, Stan Z. Li and Anil K. Jain, Eds., 2014
Conference papers
- Zhizheng Wu, Oliver Watts, Simon King, "Merlin: An Open Source Neural Network Speech Synthesis System", the 9th ISCA Speech Synthesis Workshop (2016).
- Mirjam Wester, Zhizheng Wu, Junichi Yamagishi, "Multidimensional scaling of systems in the Voice Conversion Challenge 2016", the 9th ISCA Speech Synthesis Workshop (2016).
- Mei Li, Zhizheng Wu, Lei Xie, "On the impact of phoneme alignment in DNN-based speech synthesis", the 9th ISCA Speech Synthesis Workshop (2016).
- Srikanth Ronanki, Gustav Eje Henter, Zhizheng Wu, Simon King, "A template-based approach for speech synthesis intonation generation using LSTMs", Interspeech 2016.
- Felipe Espic, Cassia Valentini-Botinhao, Zhizheng Wu, Simon King, "Waveform generation based on signal reshaping for statistical parametric speech synthesis", Interspeech 2016.
- Mirjam Wester, Zhizheng Wu, Junichi Yamagishi, "Analysis of the Voice Conversion Challenge 2016 Evaluation Results", Interspeech 2016.
- Tomoki Toda, Ling-Hui Chen, Daisuke Saito, Fernando Villavicencio, Mirjam Wester, Zhizheng Wu, Junichi Yamagishi, "The Voice Conversion Challenge 2016", Interspeech 2016.
- Xiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li, "An investigation of spoofing speech detection under additive noise and reverberant conditions", Interspeech 2016.
- Manu Airaksinen, Bajibabu Bollepalli, Lauri Juvela, Zhizheng Wu, Simon King, Paavo Alku, "GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis", Interspeech 2016.
- Zhizheng Wu, Simon King, "Investigating gated recurrent neural networks for speech synthesis", ICASSP 2016 [PDF]
- Xiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li, "Spoofing detection from a feature representation perspective", ICASSP 2016 [PDF]
- Thomas Merritt, Robert A.J. Clark, Zhizheng Wu, Junichi Yamagishi, Simon King, "Deep neural network-guided unit selection synthesis", ICASSP 2016 [PDF]
- Oliver Watts, Gustav Eje Henter, Thomas Merritt, Zhizheng Wu, Simon King, "From HMMs to DNNs: where do the improvements come from?", ICASSP 2016 [PDF]
- Gustav Eje Henter, Srikanth Ronanki, Oliver Watts, Mirjam Wester, Zhizheng Wu, Simon King, "Robust TTS duration modelling using DNNs", ICASSP 2016 [PDF]
- Zhizheng Wu, Simon King, "Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features", Interspeech 2015. [PDF] [Poster]
- Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King, "A study of speaker adaptation for DNN-based speech synthesis", Interspeech 2015. [PDF] [Slides]
- Zhizheng Wu, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Cemal Hanilci, Md Sahidullah, Aleksandr Sizov, "ASVspoof 2015: the First Automatic Speaker Verification Spoofing and Countermeasures Challenge", Interspeech 2015. [PDF] [Slides]
- Qiong Hu, Zhizheng Wu, Korin Richmond, Junichi Yamagishi, Yannis Stylianou, Ranniery Maia, "Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning", Interspeech 2015. [PDF]
- Cassia Valentini-Botinhao, Zhizheng Wu, Simon King, "Towards minimum perceptual error training for DNN-based speech synthesis", Interspeech 2015. [PDF]
- Oliver Watts, Zhizheng Wu, Simon King, "Sentence-level control vectors for deep neural network speech synthesis", Interspeech 2015. [PDF]
- Xiaohai Tian, Zhizheng Wu, Siu-Wa Lee, Nguyen Quy Hy, Minghui Dong, Eng Siong Chng, "System Fusion for High-Performance Voice Conversion", Interspeech 2015. [PDF]
- Mirjam Wester, Zhizheng Wu, Junichi Yamagishi, "Human vs Machine Spoofing Detection on Wideband and Narrowband Data", Interspeech 2015. [PDF]
- Thomas Merritt, Junichi Yamagishi, Zhizheng Wu, Oliver Watts, Simon King, "Deep neural network context embeddings for model selection in rich-context HMM synthesis", Interspeech 2015. [PDF]
- Oliver Watts, Srikanth Ronanki, Zhizheng Wu, Tuomo Raitio, Antti Suni, "The NST–GlottHMM entry to the Blizzard Challenge 2015", The Blizzard Challenge workshop 2015. [PDF]
- Zhizheng Wu, Cassia Valentini-Botinhao, Oliver Watts, Simon King, "Deep neural network employing multi-task learning and stacked bottleneck features for speech synthesis", ICASSP 2015. [PDF]
- Zhizheng Wu, Ali Khodabakhsh, Cenk Demiroglu, Junichi Yamagishi, Daisuke Saito, Tomoki Toda, Simon King, "SAS: A speaker verification spoofing database containing diverse attacks", ICASSP 2015. [PDF] [Slides]
- Xiaohai Tian, Zhizheng Wu, Siu Wa Lee, Nguyen Quy Hy, Eng Siong Chng, Minghui Dong, "Sparse representation for frequency warping based voice conversion", ICASSP 2015 [PDF] [Code]
- Zhizheng Wu, Eng Siong Chng, Haizhou Li, "Joint nonnegative matrix factorization for exemplar-based voice conversion", Interspeech 2014. [PDF]
- Siu-Wa Lee, Zhizheng Wu, Minghui Dong, Xiaohai Tian, Haizhou Li, "A Comparative Study of Spectral Transformation Techniques for Singing Voice Synthesis", Interspeech 2014. [PDF]
- Elie Khoury, Tomi Kinnunen, Aleksandr Sizov, Zhizheng Wu, Sebastien Marcel, "Introducing I-Vectors for Joint Anti-spoofing and Speaker Verification", Interspeech 2014. [PDF]
- Zhizheng Wu, Sheng Gao, Eng Siong Chng, Haizhou Li, "A study on replay attack and anti-spoofing for text-dependent speaker verification", Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2014. [PDF]
- Xiaohai Tian, Zhizheng Wu, Siu-Wa Lee, Eng Siong Chng, "Correlation-based frequency warping for voice conversion", International Symposium on Chinese Spoken Language Processing (ISCSLP) 2014. [PDF]
- Zhizheng Wu, Haizhou Li, "Voice conversion and spoofing attack on speaker verification systems", Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2013. [Invited paper] [PDF] [Slides]
- Xiaohai Tian, Zhizheng Wu, Eng Siong Chng, "Local partial least square regression for spectral mapping in voice conversion", Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2013. [PDF]
- Zhizheng Wu, Tuomas Virtanen, Tomi Kinnunen, Eng Siong Chng, Haizhou Li, "Exemplar-based voice conversion using non-negative spectrogram deconvolution", the 8th ISCA Speech Synthesis Workshop (SSW8), 2013. [PDF] [Slides]
- Zhizheng Wu, Tuomas Virtanen, Tomi Kinnunen, Eng Siong Chng, Haizhou Li, "Exemplar-based unit selection for voice conversion utilizing temporal information", Interspeech 2013. [PDF]
- Zhizheng Wu, Anthony Larcher, Kong Aik Lee, Eng Siong Chng, Tomi Kinnunen, Haizhou Li, "Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints", Interspeech 2013. [PDF]
- Zhizheng Wu, Eng Siong Chng, Haizhou Li, "Conditional restricted Boltzmann machine for voice conversion", ChinaSIP 2013. [PDF]
- Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li, "Synthetic speech detection using temporal modulation feature", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2013. [PDF]
- Zhizheng Wu, Tomi Kinnunen, Eng Siong Chng, Haizhou Li, Eliathamby Ambikairajah, "A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case", Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2012. [PDF] [Slides] [Code] [Best Paper Award]
- Zhizheng Wu, Eng Siong Chng, Haizhou Li, "Detecting Converted Speech and Natural Speech for anti-Spoofing Attack in Speaker Recognition", Interspeech 2012. [PDF] [Slides] [Code]
- Tomi Kinnunen, Zhizheng Wu, Kong Aik Lee, Filip Sedlak, Eng Siong Chng, Haizhou Li, "Vulnerability of Speaker Verification Systems Against Voice Conversion Spoofing Attacks: the Case of Telephone Speech", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2012. [PDF]
- Zhizheng Wu, Tomi Kinnunen, Eng Siong Chng, Haizhou Li, "Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion", Interspeech, Makuhari, Japan, 2010. [PDF]
- Zhizheng Wu, Eng Siong Chng, Haizhou Li, "Development of HMM-based Malay Text-to-Speech System", Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2010, Singapore, 2010.
- Yao Qian, Frank Soong, Miaomiao Wang, Zhizheng Wu, "A Minimum V/U Error Approach to F0 Generation in HMM-Based TTS", Interspeech, Brighton, UK, 2009.
- Yao Qian, Zhizheng Wu, Frank K Soong, "Improved Prosody Generation by Maximizing Joint Likelihood of State and Longer Units", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009.
- Zhizheng Wu, Yao Qian, Frank K Soong, Bo Zhang, "Modeling and Generating Tone Contour with Phrase Intonation for Mandarin Chinese Speech", International Symposium on Chinese Spoken Language Processing (ISCSLP), Kunming, China, 2008.
- Boyang Gao, Yao Qian, Zhizheng Wu, Frank K Soong, "Duration Refinement by Jointly Optimizing State and Longer Units", Interspeech, Brisbane, Australia, 2008.
Technical report
- Zhizheng Wu, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, "ASVspoof 2015: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan", 2014. [PDF]
- Zhizheng Wu, "Automatic speaker verification spoofing and countermeasures (ASVspoof 2015): introductory talk", Interspeech 2015. [Slides]
Reviewer for
- IEEE Transactions on Audio, Speech and Language Processing
- IEEE Transactions on Information Forensics and Security
- Computer Speech and Language (Elsevier)
- Digital Signal Processing (Elsevier)
- Multimedia Tools and Applications (Springer)
- The Journal of Signal Processing Systems (Springer)
- Interspeech (2014, 2015), International Symposium on Chinese Spoken Language Processing (ISCSLP) (2014), ChinaSIP (2013, 2014)
Programming: C/C++, Matlab, Python, Perl...
Languages: Mandarin (Native), English