Slides, video and bibliography for my invited keynote at Odyssey 2018, Les Sables d’Olonne, France, June 2018.
Speaking naturally? It depends who is listening…
Presented at Speaker Odyssey 2018. PDF slides
Links to the videos and sounds included in this presentation:
- Audio for Carlini and Wagner “Audio Adversarial Examples: Targeted Attacks on Speech-to-Text”, eprint arXiv:1801.01944
- Video for Carlini, Mishra, Vaidya, Zhang, Sherr, Shields, Wagner and Zhou. “Hidden Voice Commands”, USENIX Security Symposium (Security), August 2016 PDF
- Video for Athalye, Engstrom, Ilyas and Kwok “Synthesizing Robust Adversarial Examples”, eprint arXiv:1707.07397
Readings
Speech quality
- John G. Beerends, Christian Schmidmer, Jens Berger, Matthias Obermann, Raphael Ullmann, Joachim Pomy, Michael Keyhl. Perceptual Objective Listening Quality Assessment (POLQA), The Third Generation ITU-T Standard for End-to-End Speech Quality Measurement Part II–Perceptual Model. J. Audio Eng. Soc. 61(6), Jun 2013
- Florian Hinterleitner, Sebastian Moeller, Tiago H. Falk, Tim Polzehl. Comparison of Approaches for Instrumentally Predicting the Quality of Text-to-Speech Systems: Data from Blizzard Challenges 2008 and 2009. Proc. Blizzard Challenge workshop 2010, Kansai Science City, Japan, Sep 2010
- Christoph R. Norrenbrock, Florian Hinterleitner, Ulrich Heute, Sebastian Moeller. Quality prediction of synthesized speech based on perceptual quality dimensions. Speech Communication 66 pp17–35, Feb 2015
- A Nguyen, J Yosinski, J Clune. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, Jun 2015
- Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and harnessing adversarial examples. Unreviewed report arXiv:1412.6572v3
- Anish Athalye, Logan Engstrom, Andrew Ilyas, Kevin Kwok. Synthesizing robust adversarial examples. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, July 2018.
- Nicholas Carlini, Pratyush Mishra, Tavish Vaidya, Yuankai Zhang, Micah Sherr, Clay Shields, David Wagner, Wenchao Zhou. Hidden Voice Commands. 25th USENIX Security Symposium, Austin, TX, USA, Aug 2016.
- Guoming Zhang, Chen Yan, Xiaoyu Ji, Tianchen Zhang, Taimin Zhang, Wenyuan Xu. DolphinAttack: Inaudible Voice Commands. In Proc. 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS), Dallas, TX, USA, Oct-Nov 2017.
- Tomi Kinnunen, Haizhou Li. An overview of text-independent speaker recognition: From features to supervectors. Speech Communication 52 pp12–40, Jan 2010
- Zhizheng Wu, Junichi Yamagishi, Tomi Kinnunen, Cemal Hanilci, Mohammed Sahidullah, Aleksandr Sizov, Nicholas Evans, Massimiliano Todisco, Hector Delgado. ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge. IEEE Journal of Selected Topics in Signal Processing, 11(4), June 2017.
- Hector Delgado, Massimiliano Todisco, Md Sahidullah, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Junichi Yamagishi. ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements. Proc. Speaker Odyssey 2018 The Speaker and Language Recognition Workshop, Les Sables d’Olonne, France, Jun 2018.
- Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu. Natural TTS synthesis by conditioning WaveNet on Mel Spectrogram Predictions. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, Apr 2018.
- Cassia Valentini-Botinhao, Zhizheng Wu, Simon King. Towards minimum perceptual error training for DNN-based speech synthesis. Proc. 16th Annual Conference of the International Speech Communication Association (Interspeech), Dresden, Germany, Sep 2015.
- Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dongyan Huang, Haizhou Li. Statistical parametric speech synthesis using generative adversarial networks under a multi-task learning framework. ASRU 2017
- Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari. Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis. Proc. ICASSP 2017
- Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari. Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks. IEEE/ACM Trans Audio, Speech, and Language Processing, 26(1), Jan 2018. DOI 10.1109/TASLP.2017.2761547
- Linghan Zhang, Sheng Tan, Jie Yang, Yingying Chen. VoiceLive: A Phoneme Localization based Liveness Detection for Voice Authentication on Smartphones. CCS’16, October 24-28, 2016, Vienna, Austria
- Linghan Zhang, Sheng Tan, Jie Yang. Hearing Your Voice is Not Enough: An Articulatory Gesture Based Liveness Detection for Voice Authentication. CCS’17, October 30-November 3, 2017, Dallas, TX, USA
- Sree Hari Krishnan Parthasarathi. Privacy-Sensitive Audio Features for Conversational Speech Processing. PhD thesis no. 5234 (2011), EPFL, Switzerland