CART: measuring and using entropy

This topic has 3 replies, 3 voices, and was last updated 9 years, 7 months ago by Simon.

- October 9, 2015 at 11:32 #282
  Qianchu L
  Student
  In CART, to determine which split is best, we measure entropy and compare the entropy values of different features/questions (for example, between “Is it mainly red?” and “Is it mainly yellow?” in the video example). Because each question splits the node into two branches (e.g. “mainly red” and “not mainly red”), do we actually measure two entropy values, one for “mainly red” and one for “not mainly red”, and then calculate a sum or average of them to get the entropy for the question (e.g. “Is it mainly red?”)? Could we then compare this value across different features to determine which is the best split?
- October 12, 2015 at 08:50 #284
  Simon
  Professor
  You are correct – we measure the entropy of each half of the split, and sum these values (weighted by occupancy) to get the total entropy value for that question.
- October 26, 2015 at 10:06 #417
  Verity S
  Student
  When making CART trees, you said we use the weighted sum of entropy across the branches to work out the total entropy after a split. What equation do we use for this?
- October 26, 2015 at 11:11 #420
  Simon
  Professor
  The weights are simply the fractions of data points on each side of the split. So we compute entropy as usual for each side (“yes” vs “no”), and when we sum these two values, we weight each one by the fraction of the data that went down that branch. For example, if 1/3 of the data points had “yes” as the answer to the question under consideration, we would weight the “yes” side’s entropy by 1/3 and the “no” side’s by 2/3, then add them together.
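
As a minimal sketch (not taken from the course materials), the weighted sum described above can be written as H_split = (n_yes/n)·H(yes) + (n_no/n)·H(no), where H(S) = -Σ p_c log2 p_c over the class proportions p_c within that branch. The Python below assumes entropy measured in bits and uses a small hypothetical fruit dataset; the function names entropy, split_entropy and best_question are illustrative only. In this toy data the “mainly yellow” question sends 1/3 of the points down the “yes” branch, so it reproduces the 1/3 vs 2/3 weighting from the reply above.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # an empty branch contributes nothing
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def split_entropy(yes_labels, no_labels):
    """Entropy after a binary split: each side's entropy is weighted by the
    fraction of data points that went down that branch, then summed."""
    n = len(yes_labels) + len(no_labels)
    return (len(yes_labels) / n) * entropy(yes_labels) + \
           (len(no_labels) / n) * entropy(no_labels)


def best_question(data, questions):
    """Score every candidate question and return the one with the lowest
    split entropy, plus all scores.
    data: list of (features, label) pairs.
    questions: dict mapping a question name to a yes/no predicate."""
    scores = {}
    for name, ask in questions.items():
        yes = [label for feats, label in data if ask(feats)]
        no = [label for feats, label in data if not ask(feats)]
        scores[name] = split_entropy(yes, no)
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: (features, class label)
data = [
    ({"colour": "red"}, "apple"),
    ({"colour": "red"}, "apple"),
    ({"colour": "red"}, "cherry"),
    ({"colour": "yellow"}, "banana"),
    ({"colour": "yellow"}, "banana"),
    ({"colour": "green"}, "apple"),
]

questions = {
    "mainly red": lambda f: f["colour"] == "red",
    "mainly yellow": lambda f: f["colour"] == "yellow",
}

best, scores = best_question(data, questions)
print({name: round(h, 3) for name, h in scores.items()})
# {'mainly red': 0.918, 'mainly yellow': 0.541}
print(best)  # mainly yellow
```

Here “mainly yellow” gives a pure “yes” branch (entropy 0, weight 1/3) and a “no” branch with entropy of about 0.811 (weight 2/3), so its total of about 0.541 beats the 0.918 of “mainly red”, and it would be chosen as the split.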