Worked example 2 – phrase breaks

In this second worked example, you have to decide which predictors to use, and then build the tree.
  • The task

    We want to predict the locations of phrase breaks, just from text. Our chosen method is a simple form of machine learning: a classification tree (CART).

  • The training data

    Supervised machine learning starts with training data, labelled with the value of the predictee. You now need to decide what features (predictors) to extract.

  • The questions

    To use your chosen predictors in a CART, you need to devise binary yes/no questions that query their values.

  • The root node

    All of the training data starts at the root node.

  • Build the tree

    The tree-building algorithm recursively partitions the data: at each node, it chooses the question that best separates the labels, then repeats on each resulting subset.

  • Label the test data

    The tree can now be used to make predictions for unseen test data, where only the predictors' values are known.
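
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in any particular toolkit. The toy data, the two predictors (`punctuation` and `next_pos`), the three questions, and the use of entropy as the splitting criterion are all assumptions made for the sake of the example.

```python
from collections import Counter
from math import log2

# Toy training data (invented): one item per word juncture, described by
# two hypothetical predictors and labelled with the predictee.
TRAIN = [
    ({"punctuation": True,  "next_pos": "CONJ"}, "break"),
    ({"punctuation": True,  "next_pos": "NOUN"}, "break"),
    ({"punctuation": True,  "next_pos": "DET"},  "break"),
    ({"punctuation": False, "next_pos": "CONJ"}, "break"),
    ({"punctuation": False, "next_pos": "NOUN"}, "no-break"),
    ({"punctuation": False, "next_pos": "VERB"}, "no-break"),
    ({"punctuation": False, "next_pos": "DET"},  "no-break"),
    ({"punctuation": False, "next_pos": "NOUN"}, "no-break"),
]

# Binary yes/no questions that query the predictors' values (hypothetical).
QUESTIONS = {
    "Is there punctuation after this word?": lambda f: f["punctuation"],
    "Is the next word a conjunction?": lambda f: f["next_pos"] == "CONJ",
    "Is the next word a noun?": lambda f: f["next_pos"] == "NOUN",
}


def entropy(items):
    """Entropy (in bits) of the label distribution in a set of items."""
    counts = Counter(label for _, label in items)
    total = len(items)
    return -sum(c / total * log2(c / total) for c in counts.values())


def majority(items):
    """Most frequent label among the items."""
    return Counter(label for _, label in items).most_common(1)[0][0]


def build(items):
    """Recursively partition the items; return a nested dict or a leaf label."""
    if len({label for _, label in items}) == 1:
        return items[0][1]  # pure node: stop and make a leaf
    best = None
    for text, ask in QUESTIONS.items():
        yes = [it for it in items if ask(it[0])]
        no = [it for it in items if not ask(it[0])]
        if not yes or not no:
            continue  # this question does not split the data
        # Weighted entropy of the two partitions: lower is better.
        score = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(items)
        if best is None or score < best[0]:
            best = (score, text, yes, no)
    if best is None:
        return majority(items)  # no useful question left: make a leaf
    _, text, yes, no = best
    return {"question": text, "yes": build(yes), "no": build(no)}


def predict(tree, features):
    """Follow the answers from the root down to a leaf label."""
    while isinstance(tree, dict):
        answer = QUESTIONS[tree["question"]](features)
        tree = tree["yes"] if answer else tree["no"]
    return tree


# All of the training data starts at the root node.
tree = build(TRAIN)

# Label an unseen item, for which only the predictors' values are known.
unseen = {"punctuation": False, "next_pos": "VERB"}
print(tree["question"])
print(predict(tree, unseen))
```

With this toy data, the root question is the punctuation one, because it gives the lowest weighted entropy of the three candidates. A real system would use far more data, richer predictors, and a stopping criterion to avoid over-fitting.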