Sentence tokenization in festival
I’m a bit unclear on how sentence tokenization works in festival…
In the Jurafsky reading it says sentence tokenization algorithms are “trained on machine learning methods rather than being hand built” (and this explanation is used in one of the examples in the feedback).
But in the festival manual (section 9.1) it says about the utterance chunking decision tree “these are heuristics and written by hand not trained from data”.
Have I got confused about sentence tokenization vs. utterance chunking?
Yes, J&M (2nd edition, Section 8.1.1) are discussing segmenting a longer text into sentences, rather than dividing a sentence into tokens for further processing. The former is the harder problem, for the reasons they explain.
They don’t explicitly discuss the latter, but imply that hand-written rules using whitespace and punctuation would be enough, given that this is what happens to the entire text before a classifier is used to find End-Of-Sentence boundaries.
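To make the two steps concrete, here is a rough Python sketch: whitespace tokenization first, then a decision at each token about whether it ends a sentence. This is not Festival's code or J&M's algorithm. The function names, the tiny abbreviation list and the capitalisation rule are all assumptions made for illustration, and the hand-written `is_sentence_end` simply stands in for the classifier that J&M describe.

```python
# Hypothetical abbreviation list (an assumption); a real system needs far more.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "prof.", "e.g.", "i.e.", "etc."}
CLOSERS = "\"')]"  # closing quotes/brackets that may follow final punctuation


def tokenize(text):
    """Split on whitespace only; punctuation stays attached to its token."""
    return text.split()


def is_sentence_end(token, next_token):
    """Hand-written heuristic standing in for a trained EOS classifier."""
    core = token.rstrip(CLOSERS)
    if not core or core[-1] not in ".!?":
        return False  # no sentence-final punctuation on this token
    if core.lower() in ABBREVIATIONS:
        return False  # e.g. "Dr." rarely ends a sentence
    if next_token is None:
        return True  # end of the text
    # Otherwise require the next token to look like a sentence start.
    return next_token[0].isupper() or next_token[0] in "\"'("


def split_sentences(text):
    """Group whitespace tokens into sentences using the heuristic above."""
    tokens = tokenize(text)
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if is_sentence_end(tok, nxt):
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing tokens with no final punctuation
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    for s in split_sentences("Dr. Smith arrived at 3.30 p.m. yesterday. He was late!"):
        print(s)
```

The tokenization step is trivial, whereas the boundary decision needs special cases (abbreviations, numbers, capitalisation) even in this toy version, which is why that step is the one usually handed to a trained classifier.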
Festival will process multi-sentence text, although its internal data structure assumes a single utterance and there is no representation of “sentences” within an utterance.
So, for the purposes of this assignment you should restrict yourself to isolated single sentences.