Word frequency
October 17, 2020 at 09:58 #12544
In the video “Learning Decision Trees”, 10m33s-10m54s,
I was surprised that <sch> wouldn’t most commonly be seen as /sk/ (e.g. school, scheme).
Perhaps this is because the algorithm only looks at type frequency (counts of entries in a dictionary) and not token frequency (counts of occurrences in a corpus), and the examples where <sch> == /sk/ tend to sit at the steep end of the Zipf curve. Is that so?
Wouldn’t it be better to factor in token frequency, so as to maximise correct judgements across naturalistic output?
I wondered whether my intuition was skewed by the human mind’s preference for recalling word-initial features, meaning I tend not to notice examples like <kitsch>. But a scan of a dictionary (https://www.thefreedictionary.com/words-containing-sch) makes me think that isn’t the issue, as most instances of <sch> are word-initial.
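To check my own intuition, I sketched how the two counts could be compared. This is rough and makes assumptions of my own: a CMU-style dictionary file, a plain word-count list, and a crude "does the pronunciation contain /S K/ anywhere" heuristic — the file names and the heuristic are mine, not anything from the video.

```python
# Rough sketch: type vs token frequency of <sch> pronunciations.
# Assumes a CMU-style dictionary ("WORD PH PH ..." per line) and a
# corpus word-count file ("word count" per line); both file names
# are hypothetical.
from collections import Counter

freqs = {}
with open("corpus_word_counts.txt") as f:
    for line in f:
        word, count = line.split()
        freqs[word.lower()] = int(count)

type_counts, token_counts = Counter(), Counter()
with open("cmudict.txt") as f:
    for line in f:
        word, *phones = line.split()
        word = word.lower()
        if "sch" not in word:
            continue
        # Crude heuristic: does the pronunciation contain /S K/ anywhere?
        is_sk = any(a == "S" and b == "K" for a, b in zip(phones, phones[1:]))
        label = "sk" if is_sk else "other"
        type_counts[label] += 1                    # each entry counts once
        token_counts[label] += freqs.get(word, 0)  # weighted by corpus count

print("type frequency :", dict(type_counts))
print("token frequency:", dict(token_counts))
```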
October 18, 2020 at 18:01 #12576
The approach to G2P in the video predicts exactly one phone for every letter (not a sequence of letters to a sequence of phones, as you suggest). This is a small simplification for the purposes of explanation – the video is about decision trees as a model, not G2P as a problem. A real G2P system using a CART, such as the one in Festival, would typically still process the input one letter at a time but predict 0, 1 or 2 phones. A state-of-the-art G2P model would indeed map the complete letter sequence to the complete phoneme sequence.
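For concreteness, here is a minimal sketch of that letter-at-a-time formulation as a classification tree. It uses scikit-learn rather than Festival’s CART, and the toy alignments, the epsilon symbol, and the context window size are all illustrative assumptions, not the actual system.

```python
# Minimal sketch: one-phone-per-letter G2P with a decision tree.
# Training pairs are assumed to be pre-aligned letter/phone sequences;
# "_" is an epsilon (letter -> no phone), which is how "0 phones" can
# be handled, and a compound label like "k+s" could cover "2 phones".
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy aligned data: one phone (or "_") per letter.
aligned = [
    ("school", ["s", "k", "_", "uw", "_", "l"]),
    ("fish",   ["f", "ih", "sh", "_"]),
]

W = 2  # letters of context on each side (an illustrative choice)

def contexts(word):
    padded = "#" * W + word + "#" * W
    return [list(padded[i:i + 2 * W + 1]) for i in range(len(word))]

X = [ctx for word, _ in aligned for ctx in contexts(word)]
y = [ph for _, phones in aligned for ph in phones]

enc = OneHotEncoder(handle_unknown="ignore").fit(X)
tree = DecisionTreeClassifier().fit(enc.transform(X), y)

def g2p(word):
    return [p for p in tree.predict(enc.transform(contexts(word))) if p != "_"]

print(g2p("schooling"))  # don't expect much from two training words!
```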
The video uses real data, so those frequencies are correct for the dictionary being used.
You suggest using data comprising word tokens from a corpus rather than word types from a dictionary. This seems reasonable, except that the goal of the CART is not to predict pronunciation for high-frequency words (they are very likely to be in the dictionary, after all), but to predict for previously unseen words.
That gives rise to the rather difficult question: what test set should we measure accuracy on? We have little choice but to hold words out of the dictionary for this purpose, but will they be representative of the yet-to-be-seen words we have to deal with later? How could we know?
So, it’s not obvious whether types or tokens (i.e., frequency-weighted types) make the better training data. Try it for yourself, both ways!
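If you do, the comparison only needs one switch: train once counting each dictionary entry equally (types), then again weighting each entry by its corpus count (tokens), and score both trees on the same held-out dictionary words. A self-contained toy sketch — the aligned lexicon, the corpus counts, and the context window are all invented for illustration:

```python
# Toy sketch: types vs tokens as training data, same held-out test set.
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

lexicon = {  # word -> one phone (or epsilon "_") per letter
    "school":   ["s", "k", "_", "uw", "_", "l"],
    "scheme":   ["s", "k", "_", "iy", "m", "_"],
    "fish":     ["f", "ih", "sh", "_"],
    "schnapps": ["sh", "_", "_", "n", "ae", "p", "_", "s"],
}
freqs = {"school": 5000, "scheme": 800, "fish": 3000, "schnapps": 20}

W = 2  # letters of context on each side

def examples(words):
    X, y, wts = [], [], []
    for w in words:
        padded = "#" * W + w + "#" * W
        for i, ph in enumerate(lexicon[w]):
            X.append(list(padded[i:i + 2 * W + 1]))
            y.append(ph)
            wts.append(freqs[w])  # token weight for this word's examples
    return X, y, wts

train_words = ["school", "scheme", "fish"]
test_words  = ["schnapps"]  # held out: stands in for "unseen" words

X_tr, y_tr, wts = examples(train_words)
X_te, y_te, _   = examples(test_words)

enc = OneHotEncoder(handle_unknown="ignore").fit(X_tr)

# Types: every dictionary entry contributes equally.
tree_types = DecisionTreeClassifier().fit(enc.transform(X_tr), y_tr)
# Tokens: each entry weighted by its corpus frequency.
tree_tokens = DecisionTreeClassifier().fit(enc.transform(X_tr), y_tr,
                                           sample_weight=wts)

print("types :", tree_types.score(enc.transform(X_te), y_te))
print("tokens:", tree_tokens.score(enc.transform(X_te), y_te))
```

Per-letter accuracy on held-out words is the number that matters here — and, as above, whether those held-out words are representative of genuinely unseen words is the part you can’t really control.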