All of the training data starts at the root node.
Place all your data at the root node and compute the entropy. This is a measure of how predictable the value of the predictee is at this point. I’ll start you off:
The value “BREAK” occurs 12 times out of 34, so its probability is 12/34, which is about 0.35.
The value “NO BREAK” occurs 22 times out of 34, so its probability is 22/34, which is about 0.65.
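Now compute the entropy using “- sum of p log p”, where the sum runs over the two values of the predictee and p is the probability of each value. Make sure you know how to do this for yourself first. Then, to save time, you could use this entropy calculator.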
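If you want to check your hand calculation, here is a minimal sketch in Python. It assumes the logarithm is base 2, so the entropy is measured in bits; these notes do not state the base, so treat that as an assumption. The function name entropy and the variable root_counts are made up for this illustration.

```python
import math


def entropy(counts):
    """Return "- sum of p log p" for a collection of class counts.

    Assumption: the log is base 2, giving entropy in bits.
    """
    counts = list(counts)
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c == 0:
            # A value that never occurs contributes nothing (0 log 0 is taken as 0).
            continue
        p = c / total
        h -= p * math.log2(p)
    return h


# Counts of each predictee value at the root node, as given above.
root_counts = {"BREAK": 12, "NO BREAK": 22}
root_entropy = entropy(root_counts.values())
print(f"Entropy at the root: {root_entropy:.3f} bits")
```

With these counts the result should come out a little below 1 bit: the two values are not equally likely, so the predictee is slightly more predictable than a fair coin toss.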
Video to be added after the lecture…