The join cost measures potentially audible mismatch at the points where candidate units from the database are concatenated. To speed up synthesis at runtime, we can precompute the acoustic features that the join cost uses.
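To make the idea concrete, here is a minimal, hypothetical sketch of a join cost. It is not Festival's own implementation. It assumes each candidate unit is stored as a plain-text file with one frame per line and whitespace-separated feature values. It measures mismatch as the Euclidean distance between the last frame of the left unit and the first frame of the right unit. The function name `join_distance` is illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not Festival's join cost): Euclidean distance between
# the last frame of the left unit and the first frame of the right unit.
# Each unit file holds one frame per line, whitespace-separated values.
join_distance() {
  local left_unit=$1 right_unit=$2
  local last_frame first_frame
  # Only the boundary frames matter for the join.
  last_frame=$(tail -n 1 "$left_unit")
  first_frame=$(head -n 1 "$right_unit")
  awk -v a="$last_frame" -v b="$first_frame" 'BEGIN {
    n = split(a, x, " "); m = split(b, y, " ")
    if (n != m) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
    sum = 0
    for (i = 1; i <= n; i++) sum += (x[i] - y[i]) ^ 2
    printf "%.4f\n", sqrt(sum)
  }'
}
```

Because only the boundary frames are read, precomputing and storing just those frames is enough to evaluate this kind of cost.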
Festival’s join cost measures mismatch in both the spectrum (represented as MFCCs) and F0. So we will now normalise the MFCCs and F0 and combine them into a single file per utterance:
bash$ mkdir coef
bash$ make_norm_join_cost_coefs coef f0 mfcc '.*.mfcc'
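As a rough illustration of what "normalise and combine" can mean, the sketch below pastes an utterance's MFCC frames and F0 values side by side. It then scales each column to zero mean and unit variance. This rests on several assumptions. The inputs are plain-text files with one frame per line and equal frame counts. Statistics are computed per utterance, although `make_norm_join_cost_coefs` may gather them differently, for example over the whole database. Unvoiced frames receive no special treatment. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: combine MFCC and F0 frames into one matrix and
# z-score normalise each column. Assumes equal frame counts in both files.
combine_and_normalise() {
  local mfcc_file=$1 f0_file=$2
  # MFCC columns first, F0 appended as the last column.
  paste -d ' ' "$mfcc_file" "$f0_file" |
  awk '
    { for (i = 1; i <= NF; i++) { v[NR, i] = $i; s[i] += $i; ss[i] += $i * $i }
      nf = NF; nr = NR }
    END {
      for (i = 1; i <= nf; i++) {
        mean[i] = s[i] / nr
        var = ss[i] / nr - mean[i] ^ 2
        sd[i] = (var > 0) ? sqrt(var) : 1   # avoid dividing by zero for constant columns
      }
      for (r = 1; r <= nr; r++) {
        line = ""
        for (i = 1; i <= nf; i++)
          line = line (i > 1 ? " " : "") sprintf("%.3f", (v[r, i] - mean[i]) / sd[i])
        print line
      }
    }'
}
```

Normalising puts spectral and F0 dimensions on comparable scales, so that no single feature dominates a distance computed over the combined vector.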
and, because the join cost is only ever evaluated using the first and last frames of each candidate unit, we can strip these files of all values that are not close to diphone boundaries. This makes them much smaller and therefore faster to load into Festival:
bash$ mkdir coef2
bash$ strip_join_cost_coefs coef coef2 utt/*.utt
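The stripping step can be pictured as a filter that keeps only frames lying within a small time window of some boundary. This is a hypothetical sketch, not how `strip_join_cost_coefs` works internally, since that tool reads boundary times from the `.utt` files. It assumes the first column of each frame line is a time in seconds and that the boundary times sit in a separate file, one per line. The window size, the file layout and the function name are all illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: keep only frames within +/- window seconds of a boundary.
# frames: one frame per line, time (s) in column 1; boundaries: one time per line.
# Note: the NR == FNR idiom assumes the boundaries file is non-empty.
strip_to_boundaries() {
  local frames=$1 boundaries=$2 window=${3:-0.01}
  awk -v w="$window" '
    NR == FNR { b[++nb] = $1; next }     # first file: collect boundary times
    {
      for (i = 1; i <= nb; i++) {
        d = $1 - b[i]; if (d < 0) d = -d
        if (d <= w) { print; next }      # near a boundary: keep the frame once
      }
    }' "$boundaries" "$frames"
}
```

For example, with boundaries at 0.10 s and 0.30 s and a 0.01 s window, frames at 0.095, 0.100, 0.105 and 0.300 s are kept, and frames at 0.050, 0.200 and 0.350 s are discarded.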