Handwritten rules

Every user of a language holds a lot of knowledge about that language in their mind. One way to capture and make use of that knowledge is in the form of rules.

The first stage in Text-to-Speech is to tokenise the written form and normalise it into a sequence of words.
We've already seen that that's non-trivial because the relationship between written form and underlying words can contain a lot of ambiguity.
That means we now need to develop some techniques for performing tokenisation and normalisation, and some subsequent tasks that are coming up, such as determining pronunciation.
I've already asked you to think quite a lot about the methods that you might use.
I've given you some strong hints that sometimes the best solution involves using your own linguistic knowledge, or the linguistic knowledge in the mind of a native speaker of the language, and capturing that in a form that can be implemented in software.
An obvious form to capture that knowledge would be as rules.
I'm going to call them here 'handwritten rules' to distinguish them from other rules and rule-like systems that might be learned from data that we'll encounter further on.
Let's try tokenising a sentence by rule.
I've written it here in a programming language: in Python.
Don't worry if you don't know Python; you should still be able to read this code.
We've got some input and we're just going to scan through the characters.
If the input character is a white space, we'll split.
In other words, there's just one rule, and the rule is here.
If the character is equal to white space, we split.
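
The exact code from the video isn't reproduced here, but a minimal sketch in the same spirit might look like this:

```python
# A minimal sketch of a hard-coded tokeniser: the single rule
# (split on white space) is buried inside the scanning loop.
def tokenise(text):
    tokens = []
    current = ""
    for char in text:       # the engine: scan character by character
        if char == " ":     # the rule: hard-coded deep inside the code
            if current:
                tokens.append(current)
            current = ""
        else:
            current += char
    if current:
        tokens.append(current)
    return tokens

print(tokenise("The cat sat on the mat"))
# ['The', 'cat', 'sat', 'on', 'the', 'mat']
```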
Now that's such a simple rule, there's no problem implementing it in this little fragment of software.
That works fine for this really simple rule.
But it's actually very bad engineering practice, because there's no separation between where the rules are and the general engine that applies the rules.
The rules are deep inside the code here.
The engine that applies them is the thing that scans through the characters and applies the rule character by character.
If we decide this rule is not quite good enough - we want to change it - we have to change code.
It would be much better to have some separation between the code and the rules.
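
As an illustration (a hypothetical refactoring, not code from the video), the same engine could take the rule as data:

```python
# The same tokeniser with the rule separated from the engine:
# the engine is generic, and the rule is just data passed in.
def tokenise(text, delimiters):
    tokens = []
    current = ""
    for char in text:             # the engine: scans the characters
        if char in delimiters:    # the rule: supplied from outside
            if current:
                tokens.append(current)
            current = ""
        else:
            current += char
    if current:
        tokens.append(current)
    return tokens

# the rule can be changed or extended without touching the engine
print(tokenise("The cat sat on the mat", {" "}))
print(tokenise("one,two;three", {",", ";"}))
```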
There are lots of different formalisms for writing rules.
We're not going to get too hung up on the fine details of them, but here's one.
They're called context-sensitive rewrite rules.
I'm not using any particular notation for writing these.
Hopefully it's intuitive and easy to understand.
The rules are here, and they say that if you've got this token, then it rewrites to the following characters if you find it in the context of a capitalised word to the left and anything at all to the right.
So these rules mean if you find these characters after a capitalised word, it should be 'Drive' and if you find it before a capitalised word it should be 'Doctor'.
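
To make this concrete, here is one possible sketch of such rules and a tiny engine that applies them; the rule format and the predicates are illustrative, not any standard notation:

```python
# Context-sensitive rewrite rules of the form
# (token, left context, right context, replacement),
# where the contexts are predicates on the neighbouring tokens.
def capitalised(token):
    return token is not None and token[:1].isupper()

def anything(token):
    return True

RULES = [
    # token   left context   right context  rewrite
    ("Dr.",   capitalised,   anything,      "Drive"),    # "Paterson Dr." -> Drive
    ("Dr.",   anything,      capitalised,   "Doctor"),   # "Dr. Paterson" -> Doctor
]

def apply_rules(tokens, rules):
    out = list(tokens)
    for i, tok in enumerate(out):
        left = out[i - 1] if i > 0 else None
        right = out[i + 1] if i < len(out) - 1 else None
        for target, left_ok, right_ok, rewrite in rules:
            if tok == target and left_ok(left) and right_ok(right):
                out[i] = rewrite
                break   # first matching rule wins: order matters
    return out

print(apply_rules(["Paterson", "Dr."], RULES))   # ['Paterson', 'Drive']
print(apply_rules(["Dr.", "Paterson"], RULES))   # ['Doctor', 'Paterson']
```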
The big advantage of rules is that they're human-friendly.
We can write them down by hand and other humans can read them and understand them.
But there is a downside, and one downside is that they're going to be sensitive to the order in which you apply them.
In particular, if we come across a case where 'Dr.' is after a capitalised word and before a capitalised word, the order of application of the rules will change the result.
Now that might be a rare case, but still it's a bad property of rules.
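
Continuing the sketch above, we can see that order sensitivity directly:

```python
# 'Dr.' both after and before a capitalised word: whichever matching
# rule comes first in the list wins, so reordering the rules flips the result.
print(apply_rules(["Paterson", "Dr.", "Smith"], RULES))
# ['Paterson', 'Drive', 'Smith']
print(apply_rules(["Paterson", "Dr.", "Smith"], list(reversed(RULES))))
# ['Paterson', 'Doctor', 'Smith']
```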
So these are just rules, and we would have some general-purpose software that applies them.
If we wanted to, we could update them at any time, or swap them out for a set for another language, without changing the software - the code - at all.
That's good engineering.
Context-sensitive rewrite rules examine the immediate context, but sometimes we might need to look across the entire sentence.
Here are some rules of a similar form that do that.
This rule says that, if you find this spelling, you should annotate it as being a fish if any of these other words occur in the same sentence.
But if you find any of these other words in the same sentence, you should annotate it as the musical term.
I'm not claiming these are comprehensive rules or even very good rules.
Again, don't get too hung up on the way of writing the rule down.
I'm not using any particular notation.
Just understand the general concept that there are some things we could write down rules for, if we think we can come up with a comprehensive set of rules and, in this case, a comprehensive set of these trigger words that tell us how to disambiguate 'bass'.
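
A sketch of how such collocation rules might look in code; the trigger-word lists here are illustrative, certainly not comprehensive:

```python
# Sentence-level collocation rules for disambiguating 'bass':
# each rule is a set of trigger words plus the label to assign.
BASS_RULES = [
    ({"fish", "fishing", "river", "lake", "caught"}, "bass-the-fish"),
    ({"guitar", "music", "band", "drums", "player"}, "bass-the-musical-term"),
]

def disambiguate_bass(sentence_tokens, rules=BASS_RULES):
    words = {t.lower() for t in sentence_tokens}
    for triggers, label in rules:
        if words & triggers:   # any trigger word occurs in the sentence
            return label       # note: if both sets matched, the first rule
                               # would win - order matters here too
    return "unknown"           # no trigger word found: cannot decide

print(disambiguate_bass("He caught a huge bass in the lake".split()))
# bass-the-fish
print(disambiguate_bass("She plays bass in a band".split()))
# bass-the-musical-term
```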
For some problems in spoken language processing, handwritten rules are the right thing to do because they have some nice properties.
They directly capture linguistic knowledge from a native speaker, and that's acquired over a lifetime of experience.
That speaker has distilled all of their data that they've been exposed to - all the language they've experienced - into some internal representation, which we ask them to express directly as a rule.
So we're effectively using a very large amount of data there, without having to go and gather that data directly.
They're computationally efficient: they take very little space to store, and they're very fast to apply.
They're interpretable, so we can write them down and then come back to them later and still understand them, and modify them, and improve them.
That's not going to be true for many forms of machine learning.
Context-sensitive rewrite rules and collocation rules are one option.
But there is a more general formal framework in which you can write down something that looks like rules, and that's called a 'finite state transducer'.
That's a very important class of model because it could be created by hand, either directly or by compiling it from some other formalism, such as a set of rules.
Or, importantly, it could be learned automatically from data.
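
For a first flavour of what that means, here's a toy sketch (my own construction, not a standard implementation): a hand-built transducer mimicking a simplified version of the 'Dr.' rules above, using only the left context so that everything can be decided while reading left to right:

```python
# A minimal finite state transducer: states, plus transitions labelled
# with an input symbol and an output symbol. Simplification: only the
# left context is used, so 'Dr.' after a capitalised word becomes
# 'Drive', and otherwise becomes 'Doctor'.

def symbol_class(token):
    # map each token to one of the FST's input symbols
    if token == "Dr.":
        return "DR"
    return "CAP" if token[:1].isupper() else "OTHER"

# transitions[(state, input_symbol)] = (next_state, output)
# 'q0' = previous token was not capitalised, 'q1' = it was;
# an output of None means: copy the input token unchanged
TRANSITIONS = {
    ("q0", "CAP"):   ("q1", None),
    ("q0", "OTHER"): ("q0", None),
    ("q0", "DR"):    ("q0", "Doctor"),
    ("q1", "CAP"):   ("q1", None),
    ("q1", "OTHER"): ("q0", None),
    ("q1", "DR"):    ("q0", "Drive"),
}

def transduce(tokens, transitions, start="q0"):
    state, output = start, []
    for token in tokens:
        state, emit = transitions[(state, symbol_class(token))]
        output.append(token if emit is None else emit)
    return output

print(transduce(["Paterson", "Dr."], TRANSITIONS))
# ['Paterson', 'Drive']
print(transduce(["visit", "Dr.", "Paterson"], TRANSITIONS))
# ['visit', 'Doctor', 'Paterson']
```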
So next we'll look at finite state transducers for performing some text processing tasks in speech synthesis.
But much later on, we'll come back to finite state models again when we do Automatic Speech Recognition.
Whilst rules are sometimes the right answer, they're not always.
With rules, we can very quickly end up with large rule sets that have complicated interactions: their output becomes sensitive to the order in which we apply them, and they become much harder to maintain.
Rules are just one tool in our toolbox.
For some languages, handwritten rules are sufficient for determining the pronunciation of words.
Unfortunately, English is not one of those languages, and so that's a case where we're going to need to use a model that can be learned from data.
For that we'll choose a decision tree.
