Forum Replies Created
Here’s a way to ssh via a gateway machine (outside the University firewall) to a machine inside the firewall, in a single line. It does not require the VPN:

$ ssh -t s1234567@student.ssh.inf.ed.ac.uk ssh s1234567@ppls-atl-0020.ppls.ed.ac.uk
Password:
s1234567@ppls-atl-0020.ppls.ed.ac.uk's password:
The first password request is for student.ssh.inf.ed.ac.uk, the second for ppls-atl-0020.ppls.ed.ac.uk.

Setting up ssh keys appropriately would normally let you do this without passwords, except that Informatics don’t allow ssh keys, so you need to use Kerberos instead – see their support pages.
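An alternative, if the OpenSSH version on your machine supports the -J (jump host) option, is to let ssh do the chaining for you; the same two password prompts apply:

$ ssh -J s1234567@student.ssh.inf.ed.ac.uk s1234567@ppls-atl-0020.ppls.ed.ac.uk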
This part does work though, to avoid needing a password for the lab computer: generate keys on student.ssh.inf.ed.ac.uk and copy them to ppls-atl-0020.ppls.ed.ac.uk using ssh-copy-id.
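For example (these are my suggested commands; the key type and the exact prompts may differ on your system):

$ ssh-keygen -t ed25519    # run on student.ssh.inf.ed.ac.uk; accept the defaults
$ ssh-copy-id s1234567@ppls-atl-0020.ppls.ed.ac.uk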
The error “ssh_exchange_identification: read: Connection reset by peer” usually means you have had a few failed login attempts in a short period of time. Wait and try again later.
Please include the complete command line you are running, and the full error message, so I can help you.
Semantically Unpredictable Sentences (SUS) follow a simple template format, given in the paper along with links to word lists. From these, a simple script can be written to randomly generate SUS.
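For example, a sketch of such a script in Python (the word lists here are placeholders, and you should check the paper for the exact templates and use its linked word lists):

import random

# placeholder word lists: replace with the lists linked from the paper
nouns = ["table", "plan", "voice", "truth"]
verbs = ["draws", "lifts", "eats"]
adjectives = ["green", "sudden", "heavy"]

def make_sus():
    # one plausible SUS-style template: det noun verb det adjective noun
    return " ".join(["The", random.choice(nouns), random.choice(verbs),
                     "the", random.choice(adjectives), random.choice(nouns)]) + "."

for _ in range(3):
    print(make_sus())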
Remember that SUS may not be necessary if you don’t have a ceiling effect on intelligibility – you will want to informally find that out before proceeding with SUS. Using SUS with a very low-intelligibility voice might lead to a floor effect!
Harvard sentences are semantically plausible and (supposedly) phonetically-balanced when used in groups of 10. They are still widely used for intelligibility testing when there is no risk of ceiling effect, such as in noise (or, in the case of this assignment, when the synthetic voice is far from perfect!).
I would expect students to continue studying even when there are no classes. Therefore, yes, I would expect you to have worked through all materials according to the originally planned class schedule.
What we actually cover in each remaining class may be adjusted to make best use of the available class time (but without scheduling additional hours to replace cancelled classes).
It’s too early for me to make an announcement about what the effect on the exam might be. I have not yet written the exam.
February 19, 2020 at 10:02 in reply to: Festival inserting extra lines when running bulk processing script #10675

There might be some non-ASCII (and non-printing – therefore hard to detect) characters in a few sentences. Here’s one way to remove all non-ASCII characters:
cat input.txt | iconv -c -t ASCII > output.txt
Or you could simply manually remove those sentences that get split across two lines by Festival.
F0 is real-valued. Taylor argues that this means there is a very natural way to measure the distance between two F0 values. For example, we could take their difference. I would make this argument on the basis of perception: it is clear that a larger difference in F0 values will generally produce a larger perceived difference in two speech sounds. The relationship is not linear, but at least it is monotonic.
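To illustrate the non-linearity, here is a small sketch (my example, using the semitone scale, which is one common logarithmic rescaling of F0): the same 10 Hz difference is a smaller perceptual step at a higher F0.

import math

def semitone_distance(f0_a, f0_b):
    # distance on a logarithmic (semitone) scale
    return abs(12.0 * math.log2(f0_a / f0_b))

print(semitone_distance(110.0, 100.0))  # about 1.65 semitones
print(semitone_distance(210.0, 200.0))  # about 0.85 semitones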
This is in contrast to using multiple high-level features such as stress, accentuation, phrasing and phonetic identity. It is not at all clear what distance metric we should use here, for reasons including:
- they are not real-valued
- we don’t know their relative importance
- we don’t know if/how they are correlated with one another
- the relationship with perception is not so obvious as for F0
make_mfcc_list uses utts.data as its source of filenames, so perhaps you have modified that?
Weijia W: Adding a word to the script like that is a very good technique – this is exactly the right line of thinking to explore what is happening in each step of voice building.
Bingzi Y: you are right – “prprprfg” wasn’t a good choice of “word” because this would have been classified by Festival as an NSW (non-standard word) and expanded into something else (perhaps treated as an LSEQ, a letter sequence?). Your “Moschops” is a better choice because this is clearly a possible word in English (in fact, it happens to be a real word in this case).
You need to carefully distinguish two very different ways in which we save computation in both DTW and the Viterbi algorithm for HMMs.
Dynamic Programming: this algorithm efficiently evaluates all possible paths (= state sequences for HMMs). All paths are evaluated, and none are disregarded. This algorithm is exact and introduces no errors compared to a naive exhaustive search of the paths one at a time.
Pruning: this involves not exploring some paths (state sequences) at all. In DTW, this means that we will not visit every single point in the grid. In the Viterbi algorithm for HMMs implemented as token passing, it means that not all states will have tokens at all time steps. Pruning introduces errors whenever an unexplored part of the grid would have been on the globally most likely path.
In Dynamic Programming, we talk about “throwing away” all but the locally best path when two or more paths meet. The paths that are “thrown away” have already been evaluated up to that point. Extending those paths further would involve exactly the same computations as extending the best path. So we are effectively still evaluating all paths. We save computation without introducing any error: that’s the magic of Dynamic Programming.
This is not the same as pruning, in which we stop exploring some of the locally best paths, because there is another path (into another point on the DTW grid, or arriving at a different state in the HMM) that is much better.
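Here is a minimal sketch (my own illustration in Python, not the course’s code) showing both ideas in DTW: the min over predecessors is the exact Dynamic Programming step, and the optional beam threshold implements pruning.

import numpy as np

def dtw(x, y, beam=None):
    # x, y: lists of 1-D numpy feature vectors; returns total path cost
    n, m = len(x), len(y)
    INF = float("inf")
    D = np.full((n, m), INF)
    D[0, 0] = np.linalg.norm(x[0] - y[0])
    for i in range(1, n):
        D[i, 0] = D[i - 1, 0] + np.linalg.norm(x[i] - y[0])
    for j in range(1, m):
        D[0, j] = D[0, j - 1] + np.linalg.norm(x[0] - y[j])
    for i in range(1, n):
        # pruning threshold, relative to the best cost in the previous row
        threshold = INF if beam is None else D[i - 1].min() + beam
        for j in range(1, m):
            # Dynamic Programming: all incoming paths are evaluated here,
            # and all but the locally best are "thrown away" -- no error
            best_prev = min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            if best_prev > threshold:
                continue  # pruning: this grid point is never explored
            D[i, j] = best_prev + np.linalg.norm(x[i] - y[j])
    return D[n - 1, m - 1]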
Your first explanation of search space is correct.
Token passing is an algorithm, not a model.
A token generates the given observation and, in doing so, we compute the probability of that observation being generated from the state’s pdf. Yes, we just “look up” that probability.
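For concreteness, here is a minimal token passing sketch (my own illustration; the log-domain arithmetic, the single start state, and the function names are my assumptions):

import math

def viterbi_token_passing(observations, log_trans, log_emit, n_states):
    # one token per state: the log probability of the best partial
    # state sequence ending in that state at the current time step
    tokens = [-math.inf] * n_states
    tokens[0] = log_emit(0, observations[0])  # assume we start in state 0
    for o in observations[1:]:
        new_tokens = []
        for j in range(n_states):
            # every incoming token is extended; only the locally best
            # survives (Dynamic Programming: exact, no pruning here)
            best = max(tokens[i] + log_trans[i][j] for i in range(n_states))
            # the "look up": evaluate the state's pdf at this observation
            new_tokens.append(best + log_emit(j, o))
        tokens = new_tokens
    return max(tokens)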
December 10, 2019 at 10:57 in reply to: Question 8 — why finite state language module popular #10540

You are right – writing a language model by hand for a “very-large-vocabulary” would be impractical, and it would be impossible to manually set appropriate probabilities on all the transitions.
December 10, 2019 at 10:56 in reply to: Question 24, computation efficiency of Euclidian distance vs Gaussian #10539

In the question, we are not being asked about actually doing classification, but just about the computational cost of calculating a distance vs calculating a probability.
Correct – the Euclidean distance measure is not learned from the data.
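To make the comparison concrete, here is a sketch of the two computations (my illustration, assuming a diagonal-covariance Gaussian):

import math

def euclidean_distance(x, mu):
    # no parameters learned from data beyond the reference point;
    # per dimension: a subtraction, a square, and a sum (plus one sqrt)
    return math.sqrt(sum((xi - mi) ** 2 for xi, mi in zip(x, mu)))

def gaussian_log_prob(x, mu, var):
    # mean and variance are learned from data; per dimension we also
    # pay for a division by the variance and a log-normalisation term
    return sum(-0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
               for xi, mi, vi in zip(x, mu, var))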
Pronunciation model is another name for the word model: a model of words that emits a phoneme sequence. For example, the word model for “cat” emits the phoneme sequence /k ae t/.