Spontaneous Speech Transcription Strategy

This topic has 4 replies, 2 voices, and was last updated 10 years, 1 month ago by Simon King.

- February 4, 2016 at 14:51 #2416
  Joseph M
  Student
  Would it be possible to record spontaneous speech in the studio, without a script, and then use some form of ASR (commercial grade, therefore hopefully robust) to generate a post-facto script? There would be errors, of course, but hopefully relatively few, and these could be hand-fixed. That would give an accurate script of the spontaneous speech without needing to hand-transcribe the entire recording. This script would then be used to generate phone strings for forced alignment, etc.
  
  Has this been done?
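
Whether the ASR errors really are "relatively few" can be checked on a small sample before committing to the workflow. The sketch below is an illustration, not a method from this thread: it assumes you have hand-corrected a short stretch of the ASR draft, and it computes the word error rate (WER) of the draft against that corrected reference. The function names and the example sentences are hypothetical.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cur.append(min(
                prev[j] + 1,                           # word missing from the ASR draft
                cur[j - 1] + 1,                        # extra word in the ASR draft
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = edit distance / number of words in the reference transcript."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


# Hypothetical example: the hand-corrected script keeps the filler and the
# repeated word, which the (made-up) ASR draft dropped.
corrected = "um so i was like going to the the shop"
asr_draft = "so i was going to the shop"
print(f"WER of ASR draft: {word_error_rate(corrected, asr_draft):.2f}")
```

A high WER on the sample suggests that fixing the draft may cost about as much as transcribing by hand.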
- February 4, 2016 at 15:59 #2418
  Simon King
  Professor
  Using spontaneous speech as the basis for a speech synthesiser is an attractive idea, but is rather hard in practice, for several reasons. Here are some of them:
  
  Word-level transcription: spontaneous speech is harder to transcribe than read speech, even at the word level, because it is not entirely made of words (as found in a lexicon). ASR could be tried, as could hand-transcription, but both would have difficulty with this. Remember that commercial ASR is designed for careful, planned speech such as dictation and will not work very well for unplanned speech.
  
  Phonetic transcription: even harder than word-level transcription, because the pronunciations deviate considerably from those found in the lexicon (due to co-articulation, assimilation, deletion, …).
  
  Phonetic alignment: the idea that speech is a linear string of phones (“beads on a string”) was never quite true even for read speech, but is even more problematic for spontaneous speech.
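
To make the lexicon-mismatch point concrete, here is a minimal sketch that lists where an observed phone string departs from the lexicon pronunciation. It uses Python's standard `difflib.SequenceMatcher`. The phone strings are illustrative assumptions (a reduced "probably"), not taken from any real lexicon or recording, and `phone_differences` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def phone_differences(lexicon_phones, observed_phones):
    """Return (operation, lexicon_span, observed_span) for every mismatch."""
    # autojunk=False keeps the heuristic for long sequences out of the way.
    matcher = SequenceMatcher(a=lexicon_phones, b=observed_phones, autojunk=False)
    return [
        (tag, lexicon_phones[i1:i2], observed_phones[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Assumed citation form versus an assumed reduced spontaneous form.
lexicon = "p r aa b ax b l iy".split()
observed = "p r aa l iy".split()

for tag, lex_span, obs_span in phone_differences(lexicon, observed):
    print(tag, lex_span, "->", obs_span)
```

A forced aligner that is given only the lexicon pronunciation still has to place boundaries for phones that were never produced, which is one reason the resulting labels are unreliable for spontaneous speech.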
  
  Here’s an experiment to try:
  1. record a spontaneous utterance
  2. transcribe the words
  3. record a read-text version of that
  4. compare the spontaneous and read-text versions side by side:
     - listen
     - examine waveforms and spectrograms
     - try to hand-label word and phone boundaries
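
Once both versions have hand-labelled word boundaries, the comparison can be put into numbers. The sketch below is an illustration only. It assumes each version is stored as a list of `(word, start_seconds, end_seconds)` tuples with the same word sequence. The label values are made up, and the function name is hypothetical.

```python
def compare_word_durations(spontaneous, read):
    """Pair up words and report both durations plus the spontaneous/read ratio."""
    if [w for w, _, _ in spontaneous] != [w for w, _, _ in read]:
        raise ValueError("both versions must contain the same word sequence")
    rows = []
    for (word, s_start, s_end), (_, r_start, r_end) in zip(spontaneous, read):
        spont_dur = s_end - s_start
        read_dur = r_end - r_start
        rows.append((word, spont_dur, read_dur, spont_dur / read_dur))
    return rows


# Made-up hand labels, for illustration only.
spontaneous_labels = [("going", 0.00, 0.25), ("to", 0.25, 0.30), ("the", 0.30, 0.36)]
read_labels = [("going", 0.00, 0.50), ("to", 0.50, 0.70), ("the", 0.70, 0.85)]

for word, spont_dur, read_dur, ratio in compare_word_durations(spontaneous_labels, read_labels):
    print(f"{word:>6}  spontaneous {spont_dur:.2f}s  read {read_dur:.2f}s  ratio {ratio:.2f}")
```

Ratios well below 1 for function words would be consistent with the heavy reduction that makes boundaries hard to place in the spontaneous version.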
- February 4, 2016 at 16:02 #2420
  Simon King
  Professor
  Sebastian Andersson, Kallirroi Georgila, David Traum, Matthew Aylett, and Robert Clark. Prediction and realisation of conversational characteristics by utilising spontaneous speech for unit selection. In Proc. Speech Prosody, Chicago, USA, May 2010.
  
  Sebastian Andersson, Junichi Yamagishi, and Robert A.J. Clark. Synthesis and evaluation of conversational characteristics in HMM-based speech synthesis. Speech Communication, 54(2):175-188, 2012. DOI: 10.1016/j.specom.2011.08.001
- February 5, 2016 at 11:23 #2428
  Joseph M
  Student
  Do the audio examples from these 2 papers still exist somewhere? Can I listen to them?
- February 7, 2016 at 10:37 #2497
  Simon King
  Professor
  There are some examples for the Speech Communication paper.