Forum Replies Created
Token_POS is where Festival stores additional disambiguation information needed for certain homographs where POS alone is not sufficient (e.g., “read”). See http://www.cstr.ed.ac.uk/projects/festival/manual/festival_15.html#SEC59
Yes. The tokens “e.g.” and “i.e.”, which are abbreviations for Latin expressions, should be read as letter sequences (LSEQ). Text normalisation can be expected to handle them correctly.
(Since these are very common, many dictionaries will include them as “words”.)
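As a toy illustration of what reading a token as a letter sequence means, here is a hypothetical helper (not Festival's actual implementation). It assumes the token has already been classified as LSEQ, drops the punctuation, and returns the letters to be spoken by name.

```python
def expand_lseq(token: str) -> list[str]:
    """Expand a token already classified as a letter sequence (LSEQ)
    into the letters to be spoken, dropping punctuation such as '.'."""
    return [ch.lower() for ch in token if ch.isalpha()]


if __name__ == "__main__":
    for tok in ["e.g.", "i.e."]:
        print(tok, "->", " ".join(expand_lseq(tok)))
```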
Perhaps you have some padding space within your figure, and it really is full line width already? Can you crop it in whatever tool you used to create it?
Or try giving an explicit width, for example:
\includesvg[width=10cm]{yourfigurefile}
or
\includesvg[width=10cm]{path/to/yourfigurefile}
If that fails, then export to PDF (not PNG).
In general, no.
Anything that you might be able to find in a dictionary (if you imagine having a really huge dictionary) is a Standard Word (even if our particular dictionary doesn’t include it).
Another way to decide what is a Standard Word might be to say that its pronunciation has to be determined directly from its spelling, using the same method as for all other Standard Words, without any other processing first.
Anything that you would not expect to find in the dictionary, however large it was, is a Non-Standard Word (NSW). These need converting to Standard Word(s) before attempting dictionary lookup – and that process is called normalisation.
For example, I just made up the word “Simonification”. If that got into common usage (it’s possible!) then one day dictionary writers would include it in their dictionaries. So, that’s a Standard Word, even though no current dictionary yet includes it.
In contrast, no dictionary would ever attempt to include:
- £1.25
- £1.26
- £1.27
- etc.
so those are NSWs.
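To make the distinction concrete, here is a toy normaliser for tokens like those above. It is only a sketch under simple assumptions (amounts below £1000, exactly two pence digits, the pence read as a bare number); the function names are hypothetical, and a real TTS front end handles far more cases. The point is that the NSW “£1.25” is never looked up directly: it is first converted into Standard Words, which can then be looked up in the dictionary.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Hypothetical pattern for tokens such as "£1.25" (pounds, then two pence digits)
CURRENCY = re.compile(r"^£(\d{1,3})\.(\d{2})$")


def number_to_words(n: int) -> str:
    """Spell out 0 <= n < 1000 as English words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else "-" + ONES[ones])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " and " + number_to_words(rest)


def normalise_currency(token: str) -> str:
    """Expand a token like '£1.25' into Standard Words; other tokens pass through."""
    m = CURRENCY.match(token)
    if m is None:
        return token
    pounds, pence = int(m.group(1)), int(m.group(2))
    words = number_to_words(pounds) + (" pound" if pounds == 1 else " pounds")
    if pence:
        words += " " + number_to_words(pence)
    return words


if __name__ == "__main__":
    for tok in ["£1.25", "£1.26", "£1.27", "£12.50"]:
        print(tok, "->", normalise_currency(tok))
```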
October 25, 2019 at 15:01 in reply to: Linear Predictive Coding (LPC) – what is the residual? #10028
The residual is a special waveform. It is what you need to input to the filter in order to exactly reconstruct the speech signal.
The filter is not a perfect simulation of the vocal tract, and the vocal folds do not generate a perfect impulse train. Therefore, putting an impulse train through an LPC filter will not produce perfectly natural speech.
We really like the simple form of the filter because it is easy to solve equations to find its coefficients, given a frame of natural speech waveform. So, we account for the imperfections in the model by replacing the pulse train with the residual, which contains all the information “left over” from the original speech waveform that the filter was not able to model (that’s why it’s called the “residual”).
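The claim that the residual lets us reconstruct the speech exactly can be checked numerically. The sketch below assumes NumPy; the synthetic test frame, the LPC order, the Hamming window and the zero initial conditions are illustrative choices, not values from the course. It estimates the filter coefficients with the autocorrelation method, inverse-filters the frame to obtain the residual, and then drives the all-pole synthesis filter with that residual.

```python
import numpy as np


def lpc_coefficients(frame: np.ndarray, order: int) -> np.ndarray:
    """Autocorrelation-method LPC.

    Returns predictor coefficients a_1..a_p such that s[n] is approximated
    by sum_k a_k * s[n - k].
    """
    windowed = frame * np.hamming(len(frame))
    # Autocorrelation r[0..order] of the windowed frame
    r = np.array([np.dot(windowed[: len(windowed) - lag], windowed[lag:])
                  for lag in range(order + 1)])
    # Toeplitz normal equations: R a = [r[1], ..., r[order]]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])


def inverse_filter(frame: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Residual e[n] = s[n] - sum_k a_k s[n - k], assuming zeros before the frame."""
    e = np.zeros_like(frame)
    for n in range(len(frame)):
        prediction = sum(a[k] * frame[n - k - 1] for k in range(len(a)) if n - k - 1 >= 0)
        e[n] = frame[n] - prediction
    return e


def synthesis_filter(e: np.ndarray, a: np.ndarray) -> np.ndarray:
    """All-pole LPC filter driven by e: s[n] = e[n] + sum_k a_k s[n - k]."""
    s = np.zeros_like(e)
    for n in range(len(e)):
        s[n] = e[n] + sum(a[k] * s[n - k - 1] for k in range(len(a)) if n - k - 1 >= 0)
    return s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = np.arange(400)
    # Synthetic stand-in for a frame of speech (an illustrative assumption)
    frame = np.sin(2 * np.pi * 0.05 * n) + 0.1 * rng.standard_normal(400)
    a = lpc_coefficients(frame, order=12)
    residual = inverse_filter(frame, a)
    reconstructed = synthesis_filter(residual, a)
    print("exact reconstruction:", np.allclose(reconstructed, frame))
```

Replacing `residual` with an impulse train at this point would still produce a signal with roughly the right spectral envelope, but not the original waveform: that is the gap the residual fills.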
Join smoothing:
- we can manipulate the filter coefficients of a few frames around each join, if we wish to remove discontinuities in the spectral envelope.
- we can use PSOLA on the residual (it’s just a waveform) to manipulate the fundamental frequency, if we wish to remove pitch discontinuities across joins.
So, in residual-excited LPC (RELP) we still need to use PSOLA to manipulate F0 and duration of the residual! Why not do that processing on the speech waveform (i.e., TD-PSOLA)? It’s because PSOLA works better on residual waveforms than on speech waveforms. This is because the residual is closer to an impulse train and so overlap-add creates fewer artefacts.
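One way to sketch the first join-smoothing option described above (manipulating the filter coefficients of a few frames around each join) is to interpolate between the filters on either side of the join. This sketch assumes NumPy and swaps in a specific representation: it interpolates reflection coefficients rather than the direct-form filter coefficients, because a weighted average of values between -1 and 1 stays between -1 and 1, which keeps every interpolated filter stable (line spectral frequencies are another common choice). The coefficient values and the number of frames are made up for illustration.

```python
import numpy as np


def reflection_to_predictor(k: np.ndarray) -> np.ndarray:
    """Step-up recursion: reflection coefficients k_1..k_p -> predictor
    coefficients a_1..a_p (convention: s[n] ~ sum_j a_j s[n - j])."""
    a = np.zeros(0)
    for ki in k:
        # a_j <- a_j - k_i * a_{i-j} for j < i, then append a_i = k_i
        a = np.concatenate([a - ki * a[::-1], [ki]])
    return a


def is_stable(a: np.ndarray) -> bool:
    """True if all poles of 1 / (1 - sum_j a_j z^-j) lie inside the unit circle."""
    poles = np.roots(np.concatenate([[1.0], -a]))
    return bool(np.all(np.abs(poles) < 1.0))


def smooth_join(k_left: np.ndarray, k_right: np.ndarray, n_frames: int) -> list[np.ndarray]:
    """Reflection coefficients for n_frames frames that move linearly from the
    frame before the join (k_left) to the frame after it (k_right)."""
    weights = np.linspace(0.0, 1.0, n_frames + 2)[1:-1]  # exclude the end frames themselves
    return [(1.0 - w) * k_left + w * k_right for w in weights]


if __name__ == "__main__":
    # Made-up coefficients for the frames either side of a join
    k_left = np.array([0.9, -0.5, 0.3])
    k_right = np.array([-0.2, 0.6, -0.7])
    for k in smooth_join(k_left, k_right, n_frames=4):
        a = reflection_to_predictor(k)
        print(np.round(a, 3), "stable:", is_stable(a))
```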
October 25, 2019 at 13:58 in reply to: Referencing several parts of the same work (e.g., chapters in a book). #10026
If the whole book is by the same author(s), it should appear only once in the bibliography.
If the book is a collection of chapters by different authors, then each chapter should be a separate entry in the bibliography, under the appropriate author(s).
When citing any longer work, and especially a book, you should narrow down the citation at the point where you cite it, to help your reader precisely locate the material you are implicitly asking them to read. This could be to a chapter, section, or page(s), such as:
“Text processing for TTS usually involves a sequence of diverse operations. Some are as simple as splitting sentences into tokens, others as challenging as disambiguating homographs (Taylor, 2009, Section 4.1).”
Omitting the section number in the above example implies that you expect the reader to read the entire book in order to locate the material supporting your claim. That is an easy way to annoy your reader.
I generally prefer to use section or chapter numbers rather than page numbers, because these should be more consistent across different editions, the e-book version, and so on.
Citing an entire long work would only be appropriate if it really is the entire work that supports your claim, such as:
“Taylor (2009) provides a comprehensive overview of the state of the art in TTS as it was in 2009, but there have been significant developments since then.”
October 25, 2019 at 11:25 in reply to: Linear Predictive Coding (LPC) – what is the residual? #10022
The spectral envelope of the speech signal (the output of the source-filter model) is equal to the frequency response of the filter. That is, the filter is solely responsible for shaping the speech spectral envelope.
In the frequency domain, the spectrum of the filter’s input is multiplied by the filter’s frequency response. So, for that product to equal the speech spectral envelope, the residual spectrum has to be flat.
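In symbols (introduced here only for illustration: E(f) is the spectrum of the filter input, i.e. the residual, H(f) is the filter’s frequency response, and S(f) is the speech spectrum):

$$
S(f) = E(f)\,H(f) \qquad\Rightarrow\qquad |S(f)| = |E(f)|\,|H(f)|
$$

If the envelope of |E(f)| is flat, i.e. a constant c at every frequency, then the envelope of |S(f)| is c|H(f)|, so its shape comes entirely from the filter.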
In the idealised (simplified) version of the model, the residual is replaced with a train of impulses. This has a flat spectral envelope, as you will have discovered in the lab when analysing that signal. It has harmonic structure, but all harmonics are of the same amplitude.
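A quick numerical check of the flat-envelope claim, assuming NumPy and illustrative values (16 kHz sampling rate, F0 of 125 Hz, and a 4096-point rectangular window that holds exactly 32 periods, so there is no leakage between bins). Every harmonic comes out with the same magnitude (equal to the number of impulses in the window), and the bins in between are zero to within rounding error.

```python
import numpy as np

fs = 16000                      # sampling rate in Hz (illustrative)
f0 = 125                        # fundamental frequency in Hz (illustrative)
period = fs // f0               # 128 samples between impulses
n_fft = 4096                    # rectangular window holding exactly 32 periods

impulse_train = np.zeros(n_fft)
impulse_train[::period] = 1.0

magnitude = np.abs(np.fft.rfft(impulse_train))   # 2049 bins from 0 Hz to fs/2
bins_per_harmonic = n_fft // period              # harmonics fall every 32 bins (125 Hz)

harmonic_bins = np.arange(0, len(magnitude), bins_per_harmonic)
harmonic_mags = magnitude[harmonic_bins]
other_mags = np.delete(magnitude, harmonic_bins)

print("harmonic magnitudes range from", harmonic_mags.min(), "to", harmonic_mags.max())
print("largest magnitude between harmonics:", other_mags.max())
```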
There is a note on the library page for the ebook that says “Access to this ebook is unavailable until 04/07/2020” – this is probably because the publisher has withdrawn it, or we have exceeded the number of views allowed.
This book is published by a very strange and uncooperative publisher who won’t let us purchase any more “copies” of the ebook and limits the number of views (per year, I think).
You’ll have to do things the normal way (you might think “old-fashioned”) and actually go to the library to read the book! There are copies on reserve.
You might also ask classmates if they have long-term loan copies or their own copy, and ask to share.
Note: I expect taught MSc students on degrees including SLP and some of the Informatics programmes to buy this book and not use library copies.
The filter parameters control the spectral envelope (e.g., the formants) and so this is what we manipulate to smooth joins. So we do alter the values of the filter coefficients (which are model parameters).
There is an answer to your question in Forums > Speech Processing – Exam revision > Top Hat questions
October 15, 2019 at 13:46 in reply to: Installing the Scottish male voice for festival at home #9975
No, those are the instructions for actually building a new voice. That’s the coursework for Speech Synthesis in semester 2.
To install a voice, you simply need to copy the appropriate files from the computers in the lab. Korin is the best person to write instructions for that, so I’ll ask him to do so.
That’s a good idea and I will try to do this.
It will take a little time because the 3rd edition looks substantially different from the 2nd edition and therefore I need to read it and compare the content to the current readings.
The official readings (i.e., what is examinable) will remain those from the 2nd edition.
All classes are recorded and can be accessed via the Learn page for the course.
You are probably using an analysis window that is too short, which prevents you from resolving the spectrum in enough detail.
In Wavesurfer’s spectrum window, the parameter to change is “FFT points”. Try a larger value such as 4096 (which would be about 0.25 seconds at a 16 kHz sampling rate) or 8192.
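The arithmetic behind that suggestion, as a small sketch assuming the analysis window is the same length as the FFT (no zero-padding): the window lasts N/fs seconds and adjacent FFT bins are fs/N Hz apart, so more points give finer frequency detail. The helper name is hypothetical, and 512 points is just an illustrative small setting for comparison.

```python
def fft_resolution(n_points: int, fs: float) -> tuple[float, float]:
    """Window duration (s) and spacing between FFT bins (Hz), assuming the
    analysis window is n_points samples long with no zero-padding."""
    return n_points / fs, fs / n_points


for n_points in (512, 4096, 8192):
    duration, spacing = fft_resolution(n_points, fs=16000)
    print(f"{n_points:5d} points: {duration:.3f} s window, {spacing:.2f} Hz between bins")
```

At 16 kHz, 4096 points gives a 0.256 s window with bins about 3.9 Hz apart, and 8192 points gives 0.512 s with bins about 2 Hz apart.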