Forums › Speech Synthesis › The front end › CART › Weighted/unweighted sum
I’ve encountered this term in Jurafsky & Martin (somewhere in the discussion of probabilities, I think Ch. 5?) and it was mentioned in one of the videos as well. I’m afraid I still can’t understand what it means, or what the difference between the two is.
A weighted sum gives a weight (or “importance”) to each of the items being added together. Items with larger weights have more effect on the result, and items with smaller weights have less.
In the CART training algorithm, a weighted sum is used to compute the total entropy of a possible partition of the data. The weighting is needed to correct for the fact that each side of the partition (the “Yes” and “No” branches) might have differing numbers of data points, and to make the result comparable to the entropy at the parent node. We set the weights in the weighted sum to reflect the fraction of data points on each side.
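As a minimal sketch of that idea (not code from the course materials; the function names are my own), here is the weighted entropy of a candidate partition, with each branch weighted by the fraction of data points it receives:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of the distribution of predictee values."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def split_entropy(yes_labels, no_labels):
    """Weighted sum of the two branch entropies.

    The weights are the fractions of data points that went down
    each branch, which makes the result directly comparable to
    the entropy at the parent node.
    """
    n = len(yes_labels) + len(no_labels)
    w_yes = len(yes_labels) / n
    w_no = len(no_labels) / n
    return w_yes * entropy(yes_labels) + w_no * entropy(no_labels)
```

During training, the question whose partition gives the lowest `split_entropy` (i.e., the biggest drop from the parent’s entropy) would be chosen at each node.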
Imagine this example:
We have 1000 data points at a particular node in the tree, and the entropy here is 3.4 bits.
We try a question, and the result is that 500 data points go down the “No” branch and 500 data points go down the “Yes” branch.
This question turns out to be pretty useless, because the distribution of predictee values in each branch remains about the same as at the parent node. So, the entropy in each side is going to be about 3.4 bits.
An unweighted sum of these two values would give the wrong answer of 6.8 bits. We need a weighted sum instead:
(0.5 x 3.4) + (0.5 x 3.4) = 3.4 bits
The same argument holds whatever the entropy of the two branches, and whatever proportion of data points goes down each branch.
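The numeric example above can be checked directly (a sketch with the example's made-up figures, not real training data):

```python
# The "useless question" example: 1000 points with entropy 3.4 bits
# split 500/500, and each branch keeps roughly the parent's entropy.
n_yes, n_no = 500, 500
h_yes = h_no = 3.4  # branch entropies in bits, same as the parent

n = n_yes + n_no
weighted = (n_yes / n) * h_yes + (n_no / n) * h_no  # (0.5 x 3.4) + (0.5 x 3.4)
unweighted = h_yes + h_no                           # wrong: ignores branch sizes

print(weighted)    # 3.4 bits, correctly matching the parent node
print(unweighted)  # 6.8 bits, not comparable to the parent's 3.4
```

Because the weighted result equals the parent's entropy, this question yields zero reduction in entropy and would not be selected.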