Forum Replies Created

November 17, 2020 at 15:15 in reply to: LPC for voice conversion #13111
Simon
Professor
Yes, that’s a reasonable approach.

In this classic paper on voice conversion, they use the cepstrum to represent the spectral envelope rather than LPC coefficients.

In this paper, we do use linear prediction as the parameterisation of the spectral envelope. But we don’t use the coefficients of the difference equation (the LPC parameters) directly – we transform them to another representation called line spectral frequencies (LSFs) for reasons explained in the paper.

Rather than use the source speaker’s residual (which is one option), we predict the residual from the converted spectral envelope.
November 17, 2020 at 14:59 in reply to: Online Access For Jurafsky & Martin #13108
Simon
Professor
If you are in Edinburgh, please remember that there are plenty of hardcopies for loan in the main library.
November 17, 2020 at 14:56 in reply to: Speech Zone Logo #13107
Simon
Professor
Yes – you can read waveforms!
November 17, 2020 at 14:54 in reply to: Is filterbank removable? #13106
Simon
Professor
Yes – this is a very reasonable proposition. Like many good ideas, it has been tried. Here’s a paper from Tokuda (most famous for speech synthesis) et al. on what they call Mel-Generalized Cepstral Analysis – this also shows the relationship between the cepstrum and LPC analysis. (This is well beyond the scope of the Speech Processing course!)
November 17, 2020 at 08:55 in reply to: Online Access For Jurafsky & Martin #13101
Simon
Professor
I have sent you a personal message in Teams. Any other students in the same situation should contact me for personal advice.
November 17, 2020 at 08:47 in reply to: 2013-2014 Part 2, Question 1 #13100
Simon
Professor
This answer really needs to be in the form of a diagram. Try uploading one here, annotated with the terms you use in your written explanation, and I’ll check it for you.
November 16, 2020 at 15:54 in reply to: Spectral Envelope Confusion #13091
Simon
Professor
The cepstrum is created after the cosine transform – the cepstral coefficients are the coefficients of that series expansion (the weights on the cosine basis functions).

Any form of source-filter separation has to make some assumption about either the form of the filter, or of the source (or both). Otherwise, the problem is insoluble.

For this discussion, let’s assume “the vocal tract filter’s frequency response” and “the spectral envelope” are the same thing.

We assume the lower cepstral coefficients represent the vocal tract filter’s frequency response because they capture the slower-changing (with respect to frequency) components of the spectrum. We are making the assumption that the vocal tract filter’s frequency response is rather slowly-changing (with respect to frequency).

You are right that the mel-scale filterbank was designed to smooth away F0, so in fact the truncation step might not be needed for that purpose. Nevertheless, we still want to truncate so that we have a small number of features in our final feature vector. Truncation serves multiple purposes, of which “removal of any remaining traces of F0” is just one.

You are right that we can only observe the vocal tract filter’s frequency response (which includes the formant peaks) at frequencies where the source has energy. For voiced speech, that means at the harmonics. But the vocal tract filter’s frequency response exists at all frequencies, even between harmonics – it’s just that we can’t see it directly. Think of it as “joining the dots” that are the harmonics.
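
To make the idea of truncation concrete, here is a minimal sketch in Python. It is a toy illustration, not course material: the log magnitude spectrum is synthetic, built directly from cosine basis functions (a slowly-varying "envelope" plus a fast "harmonic" ripple), and the sizes `N` and `KEEP` are arbitrary assumptions. With real speech you would start from the log magnitude spectrum (or log filterbank energies) of a frame, and the separation would only be approximate.

```python
import math

def dct(x):
    """DCT-II (unnormalised): one weight per cosine basis function."""
    n = len(x)
    return [sum(x[k] * math.cos(math.pi * (k + 0.5) * q / n) for k in range(n))
            for q in range(n)]

def idct(c, n):
    """Inverse of dct() for n points, using only the coefficients given in c."""
    return [c[0] / n + (2.0 / n) * sum(c[q] * math.cos(math.pi * (k + 0.5) * q / n)
                                       for q in range(1, len(c)))
            for k in range(n)]

N = 128    # number of frequency bins (arbitrary choice for this toy)
KEEP = 12  # number of low-order cepstral coefficients we keep (arbitrary)

def basis(q, k):
    """Value of the q-th cosine basis function at frequency bin k."""
    return math.cos(math.pi * (k + 0.5) * q / N)

# Toy log magnitude spectrum: slow envelope + fast ripple (standing in for F0 harmonics).
envelope = [1.0 * basis(2, k) + 0.5 * basis(5, k) for k in range(N)]
ripple = [0.8 * basis(40, k) for k in range(N)]
log_spectrum = [e + r for e, r in zip(envelope, ripple)]

cepstrum = dct(log_spectrum)         # weights on the cosine basis functions
smoothed = idct(cepstrum[:KEEP], N)  # truncate, then go back to the frequency domain

max_error = max(abs(s - e) for s, e in zip(smoothed, envelope))
print(f"coefficient 40 (the ripple): {cepstrum[40]:.1f}")
print(f"max |smoothed - envelope|: {max_error:.2e}")
```

In this toy case the ripple lands entirely in one high-order coefficient, so keeping only the first `KEEP` coefficients recovers the envelope. That is the assumption described above: the envelope changes slowly with respect to frequency, so it lives in the lower coefficients.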
November 16, 2020 at 14:45 in reply to: Online Access For Jurafsky & Martin #13089
Simon
Professor
I’m sorry about this. The publisher of this book has a very unusual pricing policy that means the library is unable to buy more viewing ‘credits’ for this copy, and additional copies are prohibitively expensive (nearly £2000 and still only allowing 100 views for that copy!).
November 16, 2020 at 09:11 in reply to: Alignment length in dynamic programming grid #13083
Simon
Professor
The lengths of both the exemplar (template) and unknown can vary. The dynamic programming algorithm works in all cases.

But, during the process of alignment, for a particular pair of exemplar+unknown, the lengths of both are known and are constant.
November 15, 2020 at 09:45 in reply to: Features in Chapter 8-Holmes&Holmes #13066
Simon
Professor
They are pointing to a problem with simple distance metrics such as the Euclidean distance. This metric assumes all dimensions of the feature vector are equally important and simply sums up the squared differences between corresponding elements in the two vectors being compared.

This is sensitive to the scale of each element.

Take the example of filterbank energies as our feature vector, and suppose that – in general and on average across all the data – the amount of energy in the 2nd filter is around 10 times larger than that in the 11th filter. (Look at a typical speech magnitude spectrum to see why this could be the case.)

The 2nd element of the feature vector will contribute about 10 times as much to the total distance being calculated as the 11th element. It is being treated as more important.

One solution to this would be to weight the elements as we sum them up in the Euclidean distance, to balance their contributions according to how important we think they are.

This is precisely what the Gaussian distribution does for us: it is what the standard deviation parameter is for. This scales each dimension of the distance calculation according to the amount of variability we see along that dimension for the class we are modelling.

Out of scope for this course, but something you will see in the literature, is a scaled Euclidean distance called the Mahalanobis distance. That is the same form that appears in the exponent of the Gaussian equation.
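
Here is a small sketch of that scaling in Python. The numbers are made up for illustration, and the function names are my own rather than from any toolkit. It compares the plain squared Euclidean distance with a version scaled by a per-dimension standard deviation, and shows that the scaled version is the quantity appearing in the exponent of a Gaussian with diagonal covariance.

```python
import math

def squared_euclidean(x, y):
    """Sum of squared differences: every dimension is treated as equally important."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def scaled_squared_distance(x, mean, std):
    """Squared differences, each divided by that dimension's variance."""
    return sum(((a - m) / s) ** 2 for a, m, s in zip(x, mean, std))

def log_gaussian_diag(x, mean, std):
    """Log density of a Gaussian with diagonal covariance (independent dimensions)."""
    log_norm = sum(math.log(s * math.sqrt(2.0 * math.pi)) for s in std)
    return -0.5 * scaled_squared_distance(x, mean, std) - log_norm

# Hypothetical two-element feature vector: the first element has a much larger scale.
mean = [100.0, 10.0]
std = [20.0, 2.0]
x = [120.0, 12.0]  # exactly one standard deviation from the mean in each dimension

print("unscaled:", squared_euclidean(x, mean))          # dominated by the first element
print("scaled:  ", scaled_squared_distance(x, mean, std))  # both elements contribute equally
print("log p(x):", log_gaussian_diag(x, mean, std))
```

With a diagonal covariance matrix, the square root of `scaled_squared_distance` is the Mahalanobis distance; the general form also accounts for correlation between dimensions.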
November 15, 2020 at 09:23 in reply to: CW2 Files #13062
Simon
Professor
You can enable file sharing between VM and host in the VMware settings. We generally recommend working in the VM because everything you need is there.

However, it is possible to work on your own Linux machine if you have the skills to compile HTK from source. You will need to download your own copy of HTK, and agree to the license.

If you copy the data from the VM to your local machine, you must delete it at the end of the course.
November 11, 2020 at 14:33 in reply to: Errors running initialise_models #13033
Simon
Professor
You also need to uncomment the line setting the path to all the data:

# DATA=${DATA:-/Volumes/Network/courses/sp/data}

This week’s B tutorial will explain the assignment – you don’t need to start working on it until after that tutorial.
November 11, 2020 at 10:30 in reply to: Alignment length in dynamic programming grid #13030
Simon
Professor
The path you show is perfectly valid, although of course it seems a rather unlikely one and will probably result in a large global distance (indicating that the template and unknown are probably not very similar). The number of local distances summed up to make the global distance can indeed vary depending on the path taken through the grid.

The two sequences of feature vectors can be of differing lengths – this will generally be the case, in fact.
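
A minimal sketch of this in Python may help. It makes assumptions that are not necessarily those used in the course: the "feature vectors" are single numbers, the local distance is the absolute difference, and the allowed moves are one step along either axis or one diagonal step. The function returns the global distance and the path, so you can count how many local distances were summed.

```python
def dtw(template, unknown):
    """Align two sequences; return (global distance, path of (i, j) grid points)."""
    n, m = len(template), len(unknown)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]   # best accumulated distance to each grid point
    back = [[None] * m for _ in range(n)]  # predecessor of each grid point on its best path

    for i in range(n):
        for j in range(m):
            local = abs(template[i] - unknown[j])
            if i == 0 and j == 0:
                cost[i][j] = local
                continue
            candidates = []
            if i > 0:
                candidates.append((cost[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((cost[i][j - 1], (i, j - 1)))
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], (i - 1, j - 1)))
            best, prev = min(candidates)
            cost[i][j] = local + best
            back[i][j] = prev

    # Trace back from the final grid point to the first one.
    path = [(n - 1, m - 1)]
    while back[path[-1][0]][path[-1][1]] is not None:
        i, j = path[-1]
        path.append(back[i][j])
    path.reverse()
    return cost[n - 1][m - 1], path

template = [1, 2, 3]
for unknown in ([1, 2, 3], [1, 1, 2, 3, 3]):
    distance, path = dtw(template, unknown)
    print(f"unknown length {len(unknown)}: global distance {distance}, "
          f"{len(path)} local distances summed")
```

The same template is aligned with unknowns of two different lengths, and the number of grid points on the best path differs between the two cases even though the algorithm is unchanged.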
November 7, 2020 at 18:09 in reply to: Virtual Machine not working #13015
Simon
Professor
Try forcibly quitting the VMware host program. On a Mac this would be the Apple menu (top left of your screen), then “Force Quit…”, or via the Activity Monitor. On Windows, use the Task Manager.
November 7, 2020 at 17:08 in reply to: CW2 Files #13013
Simon
Professor
Work in the virtual machine. HTK is already installed. You do not need Wavesurfer because this year we are going to skip the data collection part.