Weighted/unweighted sum
I’ve encountered this term in Jurafsky & Martin (somewhere about probabilities, I think Ch. 5?) and it was mentioned in one of the videos as well. I’m afraid I still don’t understand what it means, or what the difference between the two is.
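A weighted sum gives a weight (or “importance”) to each of the items being added together, so that some items count for more than others. Items with larger weights have more effect on the result, and items with smaller weights have less. An unweighted sum just adds the items together, which is the same as giving every item a weight of 1.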
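To make the difference concrete, here is a minimal sketch in Python. The function names and the numbers are made up purely for illustration; they don't come from the course materials.

```python
def unweighted_sum(values):
    """Add the values together; every value counts equally."""
    return sum(values)


def weighted_sum(values, weights):
    """Multiply each value by its weight, then add up the products."""
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per value")
    return sum(w * v for v, w in zip(values, weights))


values = [10.0, 2.0]  # hypothetical numbers

plain = unweighted_sum(values)              # both items count fully
equal = weighted_sum(values, [0.5, 0.5])    # both items matter equally
skewed = weighted_sum(values, [0.9, 0.1])   # the first item dominates

print(plain, equal, skewed)
```

With the weights 0.9 and 0.1, the result is pulled towards the first value, because that item has been given more importance.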
In the CART training algorithm, a weighted sum is used to compute the total entropy of a possible partition of the data. The weighting is needed to correct for the fact that each side of the partition (the “Yes” and “No” branches) might have differing numbers of data points, and to make the result comparable to the entropy at the parent node. We set the weights in the weighted sum to reflect the fraction of data points on each side.
Imagine this example:
We have 1000 data points at a particular node in the tree, and the entropy here is 3.4 bits.
We try a question, and the result is that 500 data points go down the “No” branch and 500 data points go down the “Yes” branch.
This question turns out to be pretty useless, because the distribution of predictee values in each branch remains about the same as at the parent node. So, the entropy on each side is going to be about 3.4 bits.
An unweighted sum of these two values would give the wrong answer of 6.8 bits, which is double the entropy at the parent node even though the question told us nothing. We need to do a weighted sum:
(0.5 x 3.4) + (0.5 x 3.4) = 3.4 bits
The same argument holds whatever the entropy of the two branches, and whatever proportion of data points goes down each branch.
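As a rough sketch of that calculation in Python (the function name `partition_entropy` is my own, and only the 500/500 case comes from the example above; the other two splits are hypothetical numbers):

```python
def partition_entropy(branches):
    """Weighted sum of branch entropies, in bits.

    branches is a list of (num_points, entropy_in_bits) pairs, one per
    branch. Each branch's weight is its fraction of the data points.
    """
    total = sum(n for n, _ in branches)
    return sum((n / total) * h for n, h in branches)


# The example above: a 500/500 split with 3.4 bits on each side.
even_split = partition_entropy([(500, 3.4), (500, 3.4)])

# Hypothetical: an equally useless question that splits 900/100.
uneven_split = partition_entropy([(900, 3.4), (100, 3.4)])

# Hypothetical: a useful question that lowers the entropy in both branches.
useful_split = partition_entropy([(200, 1.0), (800, 3.0)])

print(f"{even_split:.2f} {uneven_split:.2f} {useful_split:.2f}")
# prints: 3.40 3.40 2.60
```

Because the weights are fractions that add up to 1, a useless question always comes out at the parent's 3.4 bits however unevenly it splits the data, and only a question that actually reduces the entropy in the branches gives a lower total.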