Unit selection is a search problem

The key difference from old-fashioned diphone synthesis is that the database now contains multiple examples of each diphone type. Synthesis therefore becomes a problem of selecting the best sequence from amongst many possible unit sequences.
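To see why selection is a search problem and not a simple lookup, consider how quickly the number of possible unit sequences grows. This is a minimal sketch that assumes a hypothetical 10-diphone utterance in which every diphone type has 50 candidates in the database. Both numbers are invented for illustration.

```python
from math import prod

def count_unit_sequences(candidates_per_target):
    """Number of distinct unit sequences when each target position
    can be filled by any one of its candidates independently."""
    return prod(candidates_per_target)

# Hypothetical counts: 10 target diphones, 50 candidates each
print(count_unit_sequences([50] * 10))  # prints 97656250000000000
```

That is roughly 10^17 sequences, far too many to score one at a time, so the search has to avoid enumerating them.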
  • From diphones to unit selection

    A quick recap of diphone synthesis and its limitations: it needs a lot of signal processing; that processing can only really manipulate F0 and duration; and those manipulations are in any case based on relatively poor predictions from simplistic models. The signal processing also introduces artefacts that degrade the waveform quality.

  • Context-dependent "beads-on-a-string"

    The entire concept of unit selection rests on the assumption that speech can be broken down into a sequence of non-overlapping units, like the beads on a necklace. That assumption is too strong, because neighbouring sounds influence one another (coarticulation), but we can mitigate its problems by using context-dependent units.

  • A first look at the database

    A more complete description of how the database is constructed comes later, but we need some idea of what it contains before looking at the search procedure.

  • The search procedure

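    The core algorithm is a simple search. Because the cost functions are formulated so that the total cost is a sum of local terms (a target cost for each unit and a join cost between each pair of adjacent units), we can use Dynamic Programming to do this very efficiently.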
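The search can be sketched as a Viterbi-style dynamic programme over a lattice of candidate units. This is a minimal illustration, not a production implementation: the `Unit` record, the diphone labels, and the F0-only `target_cost` and `join_cost` functions are all hypothetical stand-ins (real systems combine many weighted features). The database is modelled as a dictionary from diphone type to its list of candidate units.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    """Hypothetical database unit: diphone type, ID and toy F0 values (Hz)."""
    diphone: str
    uid: int
    f0_start: float  # F0 at the unit's left edge
    f0_end: float    # F0 at the unit's right edge

def target_cost(target_f0, unit):
    """Toy target cost: distance between predicted F0 and the unit's mean F0."""
    return abs(target_f0 - (unit.f0_start + unit.f0_end) / 2)

def join_cost(left, right):
    """Toy join cost: F0 discontinuity at the concatenation point."""
    return abs(left.f0_end - right.f0_start)

def select_units(targets, database):
    """Return (lowest-cost unit sequence, its total cost).

    targets:  list of (diphone_type, predicted_f0) pairs
    database: dict mapping diphone_type -> list of candidate Units
    """
    # best[i][u] = (cost of cheapest path ending in candidate u at position i,
    #               previous unit on that path)
    first_type, first_f0 = targets[0]
    best = [{u: (target_cost(first_f0, u), None) for u in database[first_type]}]

    for diphone, f0 in targets[1:]:
        column = {}
        for u in database[diphone]:
            # Join costs only involve adjacent units, so the cheapest way to reach u
            # is found by extending each cheapest path from the previous position.
            prev, cost = min(
                ((p, c + join_cost(p, u)) for p, (c, _) in best[-1].items()),
                key=lambda pc: pc[1],
            )
            column[u] = (cost + target_cost(f0, u), prev)
        best.append(column)

    # Pick the cheapest final candidate, then follow the back-pointers.
    unit, (total, _) = min(best[-1].items(), key=lambda kv: kv[1][0])
    path = [unit]
    for column in reversed(best[1:]):
        unit = column[unit][1]
        path.append(unit)
    path.reverse()
    return path, total

# Hypothetical database: two candidates for each diphone type
db = {
    "sil-k": [Unit("sil-k", 1, 100, 110), Unit("sil-k", 2, 130, 138)],
    "k-ae":  [Unit("k-ae", 3, 112, 120), Unit("k-ae", 4, 140, 150)],
    "ae-t":  [Unit("ae-t", 5, 121, 115), Unit("ae-t", 6, 150, 145)],
}
targets = [("sil-k", 105), ("k-ae", 118), ("ae-t", 118)]
path, cost = select_units(targets, db)
print([u.uid for u in path], cost)  # prints [1, 3, 5] 5.0
```

Because each join cost depends only on the two adjacent units, the cheapest path to any candidate can be built from the cheapest paths to the candidates at the previous position. For N targets with K candidates each, that needs on the order of N·K² cost evaluations instead of scoring all K^N sequences.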