CART – Distinguishing between majority and pure labels

This topic has 1 reply, 2 voices, and was last updated 9 years, 5 months ago by Simon King.

Viewing 1 reply thread

Author

Posts
- October 9, 2016 at 12:02 #5270
  Rose W
  Student
  In the CART model, is there any way of distinguishing between pure labels and majority labels to help you work out the likelihood that the unlabelled data will be classified correctly?
- October 9, 2016 at 13:34 #5272
  Simon King
  Professor
  I think you are asking about the distribution of labels at a leaf of the tree – is that what you mean?
  
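  In general, with real data, we will not get pure leaves (i.e., leaves where all the training data points have the same label). So, in practice, there is a distribution of labels at every leaf; a pure leaf is just the special case where a single label has all of the probability.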
  
  The question then becomes: how do we make use of that, when making predictions for unseen test data points? There are two possibilities:
  1. give the test data point the majority label of the leaf that it reaches
  2. give the test data point the probabilistic label of the leaf that it reaches (i.e., that leaf's distribution across all possible label values)
  In the second case, some subsequent process will have to resolve the uncertainty about the label – perhaps by using additional information such as the sequence of labels assigned to preceding and following points (in a sequence).
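
To make the two options concrete, here is a minimal Python sketch. The leaf contents and the function names (`leaf_distribution`, `majority_label`) are invented for illustration and are not taken from any particular CART implementation. It shows that a pure leaf and a mixed leaf can give the same majority label, while the distribution keeps the difference visible.

```python
from collections import Counter


def leaf_distribution(labels):
    """Relative frequency of each label among the training points at one leaf."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def majority_label(distribution):
    """Option 1: the single most frequent label at the leaf."""
    return max(distribution, key=distribution.get)


# Hypothetical training labels that ended up at two different leaves.
pure_leaf = ["a", "a", "a", "a"]
mixed_leaf = ["a", "a", "a", "b"]

pure_dist = leaf_distribution(pure_leaf)    # option 2 for the pure leaf
mixed_dist = leaf_distribution(mixed_leaf)  # option 2 for the mixed leaf

for name, dist in [("pure", pure_dist), ("mixed", mixed_dist)]:
    label = majority_label(dist)
    # The majority label's share is a rough, training-data estimate of how
    # often that prediction would be correct for points reaching this leaf.
    print(name, label, dist[label], dist)
```

Both leaves predict `a` under option 1, but the share of the majority label (1.0 versus 0.75 here) is what separates them. That share is only an estimate based on the training data at the leaf, so it can be unreliable when the leaf holds few points.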