- This topic has 3 replies, 3 voices, and was last updated 8 years, 9 months ago by .
CART: measuring and using entropy
In CART, to determine which split is best, we measure entropy and compare the entropy values of different features/questions (for example, “Is it mainly red?” versus “Is it mainly yellow?” in the video example). Because each question splits the node into two branches (e.g., “mainly red” and “not mainly red”), do we actually measure two entropy values, one for “mainly red” and one for “not mainly red”, and then calculate their sum or average to get a single entropy value for the question (e.g., “Is it mainly red?”)? Could we then compare this value across different features to determine which is the best split?
You are correct – we measure the entropy of each half of the split, and sum these values (weighted by occupancy) to get the total entropy value for that question.
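A minimal Python sketch of that procedure, assuming a small made-up set of fruit labelled by colour (the data, the question wording, and the function names `entropy`, `split_entropy` and `best_question` are all hypothetical, not taken from the video). Each candidate question is scored by the occupancy-weighted sum of the entropies of its two branches, and the question with the lowest score is chosen as the best split.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of the class labels that reach one node or branch."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def split_entropy(items, question, label_of):
    """Sum of the "yes" and "no" branch entropies, each weighted by the
    fraction of items (occupancy) that goes down that branch.
    Assumes items is non-empty."""
    yes = [label_of(x) for x in items if question(x)]
    no = [label_of(x) for x in items if not question(x)]
    n = len(items)
    return (len(yes) / n) * entropy(yes) + (len(no) / n) * entropy(no)

def best_question(items, questions, label_of):
    """Score every candidate question; return the (name, score) with the lowest score."""
    scored = {name: split_entropy(items, q, label_of) for name, q in questions.items()}
    best = min(scored, key=scored.get)
    return best, scored[best]

# Hypothetical toy data: we want to predict the fruit from its colour.
fruits = [
    {"colour": "red", "fruit": "apple"},
    {"colour": "red", "fruit": "apple"},
    {"colour": "red", "fruit": "strawberry"},
    {"colour": "yellow", "fruit": "banana"},
    {"colour": "yellow", "fruit": "banana"},
    {"colour": "green", "fruit": "apple"},
]

questions = {
    "Is it mainly red?": lambda f: f["colour"] == "red",
    "Is it mainly yellow?": lambda f: f["colour"] == "yellow",
}

best, score = best_question(fruits, questions, lambda f: f["fruit"])
print(best, round(score, 3))
```

With this toy data, “Is it mainly yellow?” wins: its “yes” branch contains only bananas (entropy 0), so its weighted entropy (about 0.541 bits) is lower than that of “Is it mainly red?” (about 0.918 bits).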
When making CART trees, you said we use the weighted sum of entropy across the branches to work out the total entropy after a split. What equation do we use for this?
The weights are simply the fractions of data points on each side of the split. So, we compute entropy as usual for each side (“yes” vs “no”), and when we sum these two values, we weight each of them by the fraction of the data that went down that branch (e.g., if 1/3 of the data points had “yes” as the answer to the question under consideration, then we would weight the “yes” side’s entropy by 1/3 and the “no” side’s by 2/3, then add them together).
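Written as an equation (the notation here is ours, not from the course materials): if question $q$ splits the data $D$ at a node into $D_{\text{yes}}$ and $D_{\text{no}}$, and $p_k$ is the proportion of class $k$ within a set, then a sketch of the total entropy after the split is:

```latex
H_q = \frac{|D_{\text{yes}}|}{|D|}\, H(D_{\text{yes}})
    + \frac{|D_{\text{no}}|}{|D|}\, H(D_{\text{no}}),
\qquad
H(D) = -\sum_{k} p_k \log_2 p_k

% Example from the reply: 1/3 of the data answers "yes", 2/3 answers "no"
H_q = \tfrac{1}{3}\, H(D_{\text{yes}}) + \tfrac{2}{3}\, H(D_{\text{no}})
```

Using base 2 gives entropy in bits; changing the base only scales every $H_q$ by the same constant, so it does not change which question gives the lowest value.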