Zhizheng WU (武执正), Ph.D.


Apple Siri Speech Team, Cupertino, California, USA
 
 
Email: wuzhizheng {at} gmail {dot} com

Biography

Zhizheng Wu has been a Research Scientist at Apple Inc. since 2016. From 2014 to 2016 he was a research fellow at the University of Edinburgh. He received his Ph.D. from Nanyang Technological University, Singapore. During his studies, he was a visiting scientist at Microsoft Research Asia (2007-2009) and the University of Eastern Finland (2012), and he received the Best Paper Award at the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2012. He co-organised the first Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2015) at Interspeech 2015 and the first Voice Conversion Challenge (VCC 2016) as a special session at Interspeech 2016. He delivered a tutorial on "Spoofing and Anti-Spoofing: A Shared View of Speaker Verification, Speech Synthesis and Voice Conversion" at APSIPA ASC 2015. He is the principal architect of Merlin, an open-source speech synthesis system.

Demo

Program code and dataset

Publications

Google Scholar Profile

Journal

  1. Zhizheng Wu, Junichi Yamagishi, Tomi Kinnunen, Cemal Hanilci, Md Sahidullah, Aleksandr Sizov, Nicholas Evans, Massimiliano Todisco, Hector Delgado, "ASVspoof: the Automatic Speaker Verification Spoofing and Countermeasures Challenge", IEEE Journal of Selected Topics in Signal Processing, 2017
  2. Xiaohai Tian, Siu-Wa Lee, Zhizheng Wu, Eng Siong Chng, Haizhou Li, "An Exemplar-based Approach to Frequency Warping for Voice Conversion", IEEE/ACM Transactions on Audio, Speech and Language Processing, 2017
  3. Yanmin Qian, Nanxin Chen, Heinrich Dinkel, Zhizheng Wu, "Deep Feature Engineering for Noise Robust Spoofing Detection", IEEE/ACM Transactions on Audio, Speech and Language Processing, 2017
  4. Zhizheng Wu, Simon King, "Improving Trajectory Modelling for DNN-based Speech Synthesis by using Stacked Bottleneck Features and Minimum Trajectory Error Training", IEEE/ACM Transactions on Audio, Speech and Language Processing, 2016 (accepted) [PDF]
  5. Ibon Saratxaga, Jon Sanchez, Zhizheng Wu, Inma Hernaez, Eva Navas, "Synthetic Speech Detection Using Phase Information", Speech Communication, 2016 (A).
  6. Zhizheng Wu, Phillip L. De Leon, Cenk Demiroglu, Ali Khodabakhsh, Simon King, Zhen-Hua Ling, Daisuke Saito, Bryan Stewart, Tomoki Toda, Mirjam Wester, Junichi Yamagishi, "Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance", IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol 24, Issue 4, pp. 768-783, 2016. [PDF] [Dataset]
  7. Zhizheng Wu, Haizhou Li, "On the study of replay and voice conversion attacks to text-dependent speaker verification", Multimedia Tools and Applications, Springer, 2015. DOI:10.1007/s11042-015-3080-9 [PDF]
  8. Aleksandr Sizov, Elie Khoury, Tomi Kinnunen, Zhizheng Wu, Sebastien Marcel, "Joint Speaker Verification and Anti-Spoofing in the i-Vector Space", IEEE Transactions on Information Forensics and Security, Vol 10, Issue 4, pp. 821-832, 2015. [PDF]
  9. Zhizheng Wu, Nicholas Evans, Tomi Kinnunen, Junichi Yamagishi, Federico Alegre, Haizhou Li, "Spoofing and countermeasures for speaker verification: a survey", Speech Communication, Vol 66, pp. 130-153, 2015. [PDF]
  10. Zhizheng Wu, Eng Siong Chng, Haizhou Li, "Exemplar-based voice conversion using joint nonnegative matrix factorization", Multimedia Tools and Applications, Springer, Vol 74, Issue 22, pp. 9943-9958, 2015.
  11. Zhizheng Wu, Haizhou Li, "Voice conversion versus speaker verification: an overview", APSIPA Transactions on Signal and Information Processing, Vol 3, e17, 2014. DOI:10.1017/ATSIP.2014.17 [PDF] [Invited paper]
  12. Zhizheng Wu, Tuomas Virtanen, Eng Siong Chng, Haizhou Li, "Exemplar-based sparse representation with residual compensation for voice conversion", IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol 22, Issue 10, pp. 1506-1521, 2014. [PDF] [Code]
  13. Zhizheng Wu, Tomi Kinnunen, Eng Siong Chng, Haizhou Li, "Mixture of Factor Analyzers using priors from non-parallel speech for voice conversion", IEEE Signal Processing Letters, Vol 19, Issue 12, pp. 914-917, 2012. [PDF]
  14. Yao Qian, Zhizheng Wu, Boyang Gao, Frank K Soong, "Improved Prosody Generation by Maximizing Joint Likelihood of State and Longer Units", IEEE Transactions on Audio, Speech and Language Processing, Vol 19, Issue 6, pp. 1702-1710, 2011. [PDF]

Book Chapter

  • Nicholas Evans, Tomi Kinnunen, Junichi Yamagishi, Zhizheng Wu, Federico Alegre, Phillip De Leon, "Speaker recognition anti-spoofing", Book chapter in "Handbook of Biometric Anti-spoofing", Springer, S. Marcel, S. Li and M. Nixon, Eds., 2014. [PDF]
  • Nicholas Evans, Federico Alegre, Zhizheng Wu, Tomi Kinnunen, "Anti-spoofing: voice conversion", Book chapter in "Encyclopedia of Biometrics", 2nd Edition, Springer, Stan Z. Li and Anil K. Jain, Eds., 2014.
  • Federico Alegre, Nicholas Evans, Tomi Kinnunen, Zhizheng Wu, Junichi Yamagishi, "Anti-spoofing: voice databases", Book chapter in "Encyclopedia of Biometrics", 2nd Edition, Springer, Stan Z. Li and Anil K. Jain, Eds., 2014.

Conference

  1. Zhizheng Wu, Oliver Watts, Simon King, "Merlin: An Open Source Neural Network Speech Synthesis System", the 9th ISCA Speech Synthesis Workshop (2016).
  2. Mirjam Wester, Zhizheng Wu, Junichi Yamagishi, "Multidimensional scaling of systems in the Voice Conversion Challenge 2016", the 9th ISCA Speech Synthesis Workshop (2016).
  3. Mei Li, Zhizheng Wu, Lei Xie, "On the impact of phoneme alignment in DNN-based speech synthesis", the 9th ISCA Speech Synthesis Workshop (2016).
  4. Srikanth Ronanki, Gustav Eje Henter, Zhizheng Wu, Simon King, "A template-based approach for speech synthesis intonation generation using LSTMs", Interspeech 2016.
  5. Felipe Espic, Cassia Valentini-Botinhao, Zhizheng Wu, Simon King, "Waveform generation based on signal reshaping for statistical parametric speech synthesis", Interspeech 2016.
  6. Mirjam Wester, Zhizheng Wu, Junichi Yamagishi, "Analysis of the Voice Conversion Challenge 2016 Evaluation Results", Interspeech 2016.
  7. Tomoki Toda, Ling-Hui Chen, Daisuke Saito, Fernando Villavicencio, Mirjam Wester, Zhizheng Wu, Junichi Yamagishi, "The Voice Conversion Challenge 2016", Interspeech 2016.
  8. Xiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li, "An investigation of spoofing speech detection under additive noise and reverberant conditions", Interspeech 2016.
  9. Manu Airaksinen, Bajibabu Bollepalli, Lauri Juvela, Zhizheng Wu, Simon King, Paavo Alku, "GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis", Interspeech 2016.
  10. Zhizheng Wu, Simon King, "Investigating gated recurrent neural networks for speech synthesis", ICASSP 2016. [PDF]
  11. Xiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li, "Spoofing detection from a feature representation perspective", ICASSP 2016. [PDF]
  12. Thomas Merritt, Robert A.J. Clark, Zhizheng Wu, Junichi Yamagishi, Simon King, "Deep neural network-guided unit selection synthesis", ICASSP 2016. [PDF]
  13. Oliver Watts, Gustav Eje Henter, Thomas Merritt, Zhizheng Wu, Simon King, "From HMMs to DNNs: where do the improvements come from?", ICASSP 2016. [PDF]
  14. Gustav Eje Henter, Srikanth Ronanki, Oliver Watts, Mirjam Wester, Zhizheng Wu, Simon King, "Robust TTS duration modelling using DNNs", ICASSP 2016. [PDF]

  15. Zhizheng Wu, Simon King, "Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features", Interspeech 2015. [PDF] [Poster]
  16. Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King, "A study of speaker adaptation for DNN-based speech synthesis", Interspeech 2015. [PDF] [Slides]
  17. Zhizheng Wu, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Cemal Hanilci, Md Sahidullah, Aleksandr Sizov, "ASVspoof 2015: the First Automatic Speaker Verification Spoofing and Countermeasures Challenge", Interspeech 2015. [PDF] [Slides]
  18. Qiong Hu, Zhizheng Wu, Korin Richmond, Junichi Yamagishi, Yannis Stylianou, Ranniery Maia, "Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning", Interspeech 2015. [PDF]
  19. Cassia Valentini-Botinhao, Zhizheng Wu, Simon King, "Towards minimum perceptual error training for DNN-based speech synthesis", Interspeech 2015. [PDF]
  20. Oliver Watts, Zhizheng Wu, Simon King, "Sentence-level control vectors for deep neural network speech synthesis", Interspeech 2015. [PDF]
  21. Xiaohai Tian, Zhizheng Wu, Siu-Wa Lee, Nguyen Quy Hy, Minghui Dong, Eng Siong Chng, "System Fusion for High-Performance Voice Conversion", Interspeech 2015. [PDF]
  22. Mirjam Wester, Zhizheng Wu, Junichi Yamagishi, "Human vs Machine Spoofing Detection on Wideband and Narrowband Data", Interspeech 2015. [PDF]
  23. Thomas Merritt, Junichi Yamagishi, Zhizheng Wu, Oliver Watts, Simon King, "Deep neural network context embeddings for model selection in rich-context HMM synthesis", Interspeech 2015. [PDF]
  24. Oliver Watts, Srikanth Ronanki, Zhizheng Wu, Tuomo Raitio, Antti Suni, "The NST–GlottHMM entry to the Blizzard Challenge 2015", The Blizzard Challenge workshop 2015. [PDF]
  25. Zhizheng Wu, Cassia Valentini-Botinhao, Oliver Watts, Simon King, "Deep neural network employing multi-task learning and stacked bottleneck features for speech synthesis", ICASSP 2015. [PDF]
  26. Zhizheng Wu, Ali Khodabakhsh, Cenk Demiroglu, Junichi Yamagishi, Daisuke Saito, Tomoki Toda, Simon King, "SAS: A speaker verification spoofing database containing diverse attacks", ICASSP 2015. [PDF] [Slides]
  27. Xiaohai Tian, Zhizheng Wu, Siu Wa Lee, Nguyen Quy Hy, Eng Siong Chng, Minghui Dong, "Sparse representation for frequency warping based voice conversion", ICASSP 2015. [PDF] [Code]

  28. Zhizheng Wu, Eng Siong Chng, Haizhou Li, "Joint nonnegative matrix factorization for exemplar-based voice conversion", Interspeech 2014. [PDF]
  29. Siu-Wa Lee, Zhizheng Wu, Minghui Dong, Xiaohai Tian, Haizhou Li, "A Comparative Study of Spectral Transformation Techniques for Singing Voice Synthesis", Interspeech 2014. [PDF]
  30. Elie Khoury, Tomi Kinnunen, Aleksandr Sizov, Zhizheng Wu, Sebastien Marcel, "Introducing I-Vectors for Joint Anti-spoofing and Speaker Verification", Interspeech 2014. [PDF]
  31. Zhizheng Wu, Sheng Gao, Eng Siong Chng, Haizhou Li, "A study on replay attack and anti-spoofing for text-dependent speaker verification", Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2014. [PDF]
  32. Xiaohai Tian, Zhizheng Wu, Siu-Wa Lee, Eng Siong Chng, "Correlation-based frequency warping for voice conversion", International Symposium on Chinese Spoken Language Processing (ISCSLP) 2014. [PDF]
  33. Zhizheng Wu, Haizhou Li, "Voice conversion and spoofing attack on speaker verification systems", Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2013. [Invited paper] [PDF] [Slides]
  34. Xiaohai Tian, Zhizheng Wu, Eng Siong Chng, "Local partial least square regression for spectral mapping in voice conversion", Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2013. [PDF]
  35. Zhizheng Wu, Tuomas Virtanen, Tomi Kinnunen, Eng Siong Chng, Haizhou Li, "Exemplar-based voice conversion using non-negative spectrogram deconvolution", the 8th ISCA Speech Synthesis Workshop (SSW8), 2013. [PDF] [Slides]
  36. Zhizheng Wu, Tuomas Virtanen, Tomi Kinnunen, Eng Siong Chng, Haizhou Li, "Exemplar-based unit selection for voice conversion utilizing temporal information", Interspeech 2013. [PDF]
  37. Zhizheng Wu, Anthony Larcher, Kong Aik Lee, Eng Siong Chng, Tomi Kinnunen, Haizhou Li, "Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints", Interspeech 2013. [PDF]
  38. Zhizheng Wu, Eng Siong Chng, Haizhou Li, "Conditional restricted Boltzmann machine for voice conversion", ChinaSIP 2013. [PDF]
  39. Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li, "Synthetic speech detection using temporal modulation feature", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2013. [PDF]
  40. Zhizheng Wu, Tomi Kinnunen, Eng Siong Chng, Haizhou Li, Eliathamby Ambikairajah, "A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case", Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2012. [PDF] [Slides] [Code] [Best Paper Award]
  41. Zhizheng Wu, Eng Siong Chng, Haizhou Li, "Detecting Converted Speech and Natural Speech for Anti-Spoofing Attack in Speaker Recognition", Interspeech 2012. [PDF] [Slides] [Code]
  42. Tomi Kinnunen, Zhizheng Wu, Kong Aik Lee, Filip Sedlak, Eng Siong Chng, Haizhou Li, "Vulnerability of Speaker Verification Systems Against Voice Conversion Spoofing Attacks: the Case of Telephone Speech", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2012. [PDF]
  43. Zhizheng Wu, Tomi Kinnunen, Eng Siong Chng, Haizhou Li, "Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion", Interspeech, Makuhari, Japan, 2010. [PDF]
  44. Zhizheng Wu, Eng Siong Chng, Haizhou Li, "Development of HMM-based Malay Text-to-Speech System", Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2010, Singapore, 2010.
  45. Yao Qian, Frank Soong, Miaomiao Wang, Zhizheng Wu, "A Minimum V/U Error Approach to F0 Generation in HMM-Based TTS", Interspeech, Brighton, UK, 2009.
  46. Yao Qian, Zhizheng Wu, Frank K Soong, "Improved Prosody Generation by Maximizing Joint Likelihood of State and Longer Units", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009.
  47. Zhizheng Wu, Yao Qian, Frank K Soong, Bo Zhang, "Modeling and Generating Tone Contour with Phrase Intonation for Mandarin Chinese Speech", International Symposium on Chinese Spoken Language Processing (ISCSLP), Kunming, China, 2008.
  48. Boyang Gao, Yao Qian, Zhizheng Wu, Frank K Soong, "Duration Refinement by Jointly Optimizing State and Longer Units", Interspeech, Brisbane, Australia, 2008.

Technical report

  • Zhizheng Wu, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, "ASVspoof 2015: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan", 2014. [PDF]

Talks

  • Zhizheng Wu, "Automatic speaker verification spoofing and countermeasures (ASVspoof 2015): introductory talk", Interspeech 2015. [Slides]

Reviewer for

  • IEEE Transactions on Audio, Speech and Language Processing
  • IEEE Transactions on Information Forensics and Security
  • Computer Speech and Language (Elsevier)
  • Digital Signal Processing (Elsevier)
  • Multimedia Tools and Applications (Springer)
  • Journal of Signal Processing Systems (Springer)

  • Interspeech (2014, 2015), International Symposium on Chinese Spoken Language Processing (ISCSLP) (2014), ChinaSIP (2013, 2014)

Skills

Programming: C/C++, Matlab, Python, Perl, etc.
Languages: Mandarin (Native), English