Forum Replies Created
You are running the wrong version of Festival (an old version that is also installed in the VM). You need to set your PATH as described in the instructions for Module 3 Tutorial B, to pick up the correct version.

That means you are not connected to the VPN. Have you installed the VPN client (Forticlient) and used it to connect to the VPN?
OK – good – you can reach the machine. That’s possible without the VPN, but logging in to it requires the VPN.
To test if you are on the VPN, use a browser (try this on both your host computer and in the VM) and type “What’s my IP” into Google. You are looking for an IP address starting with one of these:

129.215
192.41

or maybe one of these:

192.107
192.82
193.62
193.63
194.80
194.81

Make sure your host computer is connected to the VPN. Then, in the VM, try these and report your results back here:
$ nslookup scp1.ppls.ed.ac.uk
$ ping scp1.ppls.ed.ac.uk
$ ping 129.215.204.88
A very interesting topic. I’m not going to answer this now; instead, please ask this question again in Module 7, when we will use knowledge of human hearing to motivate feature extraction for Automatic Speech Recognition.
A simplified description of the vocal folds is that they are closed most of the time. The air pressure from the lungs eventually forces them to burst open and release the pressure, after which they snap shut very rapidly.
This is not random motion, but a very particular pattern of opening and closing.
The signal generated by one cycle of the vocal folds (mainly related to the very rapid closing) can be approximated as an impulse.
Hence, the acoustic wave generated at the glottis is approximately an impulse train. So, we use an impulse train as a model of this signal, and our model has just one parameter: F0.
In the literature on phonation you will of course find much more sophisticated descriptions and models than this. The glottal wave is not exactly an impulse train, but has a particular shape which can be parameterised – e.g., the Liljencrants-Fant (LF) model has 4 parameters. This is beyond the scope of Speech Processing.
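To make this concrete, here is a minimal Python sketch (my own illustration, with assumed values for the sampling rate and frame length – it is not part of the course materials) of a source model whose only parameter is F0:

import numpy as np

fs = 16000        # assumed sampling rate, in Hz
f0 = 100          # fundamental frequency, in Hz - the model's only parameter
duration = 0.05   # length of the generated signal, in seconds

# One impulse per fundamental period, zero everywhere else.
t0 = int(fs / f0)                 # fundamental period, in samples
x = np.zeros(int(fs * duration))
x[::t0] = 1.0                     # place an impulse every t0 samples

print(int(np.sum(x)), "impulses in", duration, "s at F0 =", f0, "Hz")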
Yes, that’s why there is a non-zero magnitude in the 0 Hz DFT bin.
Is the waveform of an impulse train centred on zero?
Yes, that’s correct – we always plot all the bins, even if their magnitude is zero. Normally we just join them up with a line for easy visualisation, but the notebooks plot them as distinct points to help you understand the process.
Your last sentence gets it: the 0 Hz basis function is a horizontal line. Its amplitude is not 0, because that wouldn’t be much use when weighted and summed to make the signal being analysed, so it’s at an amplitude of 1.
We need this 0 Hz component to account for any offset (bias) in the signal: is it on average above zero, below zero, or centred on zero?
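Here is a minimal sketch (mine, not from the notebooks, using numpy with arbitrary example values) showing that the 0 Hz bin measures exactly this offset:

import numpy as np

t = np.arange(64) / 64           # a 1 s frame at an assumed 64 Hz sampling rate
x = np.sin(2 * np.pi * 4 * t)    # a signal centred on zero: its mean is 0

print(np.abs(np.fft.rfft(x))[0])        # 0 Hz bin magnitude: ~0
print(np.abs(np.fft.rfft(x + 0.5))[0])  # add an offset of 0.5: now 0.5 * 64 = 32

The magnitude of bin 0 is just the number of samples times the signal’s mean, so it is zero only when the signal is centred on zero.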
Welcome to the wonderful world of prosody! Terminology is sometimes used differently by different authors.
This is why the video ‘Prosody’ avoided getting into definitions and concentrated on the engineering application. So, keeping to what is relevant for speech synthesis, and talking only about English:
Words are made of syllables. In the pronunciation dictionary, at least one syllable in the word is marked as having primary lexical stress, and perhaps some other syllables as having secondary lexical stress.
When spoken in citation form (i.e., as an isolated word, obeying the dictionary pronunciation), the primary lexically-stressed syllable will sound more prominent than the others. The speaker will make some F0 movement on it to achieve this, and probably also make it louder and longer than usual. This is called a pitch accent. There might also be smaller pitch accents on the other lexically-stressed syllables.
In connected speech, not every lexically-stressed syllable in a spoken sentence will receive a pitch accent (there would be too many). Only some words in the sentence will be chosen by the speaker to receive a pitch accent, which will be placed on a lexically-stressed syllable.
So: lexical stress marked in the dictionary indicates syllables that might receive a pitch accent in connected speech.
Syllables that are not accented may have their vowels reduced, potentially all the way to schwa.
‘Sampling’ is indeed used to mean several things.
When talking about digital signals, sampling means the process of converting an analogue signal (in continuous time) to a digital one (in discrete time): we take samples at regular intervals. A ‘sample’ here is a single number, stored in binary form (e.g., using 16 bits).
But we could also use the term ‘sample’ to describe a speech waveform taken from a larger one, perhaps for the purposes of speech synthesis. This use of ‘sampling’ is like in music production when someone ‘borrows’ a sample (e.g., a few notes, or a drum beat) from an existing track, and makes new music with it.
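Here is a minimal sketch of the first sense (my own illustration in Python, with an assumed sampling rate and a stand-in signal – not from the course materials):

import numpy as np

fs = 16000                              # sampling rate: 16000 samples per second
t = np.arange(0, 0.01, 1 / fs)          # sample times, at regular intervals
x = 0.5 * np.sin(2 * np.pi * 440 * t)   # stand-in for the analogue signal

x16 = np.int16(x * 32767)   # each sample stored using 16 bits
print(x16[:8])              # the first few samples, as integers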
In your plot, make sure you know the difference between the bins (all the red points) and the energy in the signal (in this case, the harmonics), which is carried only by the red points with non-zero magnitude.
Earlier, we agreed that the lowest frequency basis function is the one with a single cycle in the analysis frame (e.g., 1 Hz for a 1 s analysis frame).
Your plot is consistent with that, except that there is also a bin at 0 Hz. We often disregard this because it doesn’t tell us about the frequency content of the signal, but about something else.
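If it helps, here is a minimal sketch (mine, with an assumed sampling rate) listing the DFT bin frequencies for a 1 s analysis frame – note the 0 Hz bin at the start:

import numpy as np

fs = 16000                           # assumed sampling rate, in Hz
n = fs                               # a 1 s frame, so the bins are 1 Hz apart
freqs = np.fft.rfftfreq(n, d=1/fs)   # the frequency of each bin
print(freqs[:5])                     # [0. 1. 2. 3. 4.]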
To understand why there is some energy at 0 Hz, and what that means, can you first describe what that 0 Hz basis function looks like?
That will work fine, or use ctrl-C in the Terminal to terminate the Jupyter process (it will ask you to confirm that you really want to quit).

You are correct: “bins are then the discrete frequencies of the basis functions” – they are indexed by k in the DFT equation in the notebooks.
The DFT bins are determined only by the duration of the analysis frame and the sampling rate. They do not and cannot depend on the signal (e.g., on its fundamental frequency), because the DFT works for any signal and gives the same frequency resolution in all cases (for a particular sampling rate and analysis frame duration).
You correctly state that, for a 1 s analysis frame duration, the lowest frequency bin will be at 1 Hz. (We don’t even need to know the sampling rate to work this out.)
You are also correct in stating that when analysing a signal with energy at an “awkward” frequency (i.e., not an exact bin frequency), we will get leakage across several adjacent bins around that frequency.
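A minimal sketch of that leakage (my own example, with assumed values):

import numpy as np

fs = 64                          # assumed sampling rate, in Hz
t = np.arange(fs) / fs           # a 1 s frame, so the bins are 1 Hz apart
x = np.sin(2 * np.pi * 2.5 * t)  # 2.5 Hz: not an exact bin frequency

mags = np.abs(np.fft.rfft(x))
print(np.round(mags[:6], 1))     # energy leaks into the bins around 2.5 Hz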
Your final point is also correct: to construct a single impulse, we need to sum together all basis functions with equal amplitudes. The spectrum is flat, and there are no harmonics (because the single impulse is not a periodic signal).
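Again as a sketch (mine, with an arbitrary frame length), you can verify that flat spectrum directly:

import numpy as np

x = np.zeros(64)
x[0] = 1.0   # a single impulse - not periodic, so no harmonics

print(np.abs(np.fft.rfft(x)))   # every bin has magnitude 1: a flat spectrum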
Overall, your understanding looks pretty good!