Label the test data

The tree can now be used to make predictions for unseen test data, where only the predictors' values are known.

Now use your CART to predict where phrase breaks occur in the test data. You’ll first need to extract the same features (predictors) from the test data as you did from the training data. Once you have predicted the test data labels, compare them to the correct labels and compute the accuracy: the proportion of test items for which the predicted label matches the correct one. How often does your tree get it right?
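If you would like to check your accuracy calculation, here is a minimal sketch in Python. It assumes you already have the predicted and correct labels as two lists in the same order, one entry per test item. The label names `B` (break) and `NB` (no break), the example lists, and the function name `accuracy` are all made up for illustration; use whatever labels and file formats your own data has.

```python
def accuracy(predicted, correct):
    """Return the fraction of items whose predicted label equals the correct label."""
    if len(predicted) != len(correct):
        raise ValueError("predicted and correct must have the same length")
    if not correct:
        raise ValueError("no labels to score")
    # Count the positions where the two label sequences agree.
    matches = sum(p == c for p, c in zip(predicted, correct))
    return matches / len(correct)


# Hypothetical labels (an assumption, not real data):
# "B" = phrase break after this word, "NB" = no break.
correct_labels = ["NB", "NB", "B", "NB", "B", "NB", "NB", "B"]
predicted_labels = ["NB", "NB", "B", "NB", "NB", "NB", "B", "B"]

print(f"Accuracy: {accuracy(predicted_labels, correct_labels):.3f}")
```

In this toy example the tree gets 6 of the 8 items right, so the accuracy is 0.750.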

Video to be added after the lecture…