If you are making several variants with and without your additional sentences, think about whether you should keep the total size of the database the same in all cases.
Exclude data the simple way
This is as simple as commenting out lines in utts.data: put a semicolon at the start of each line you want to exclude. Here, the third utterance is excluded:
( arctic_a0001 "Author of the danger trail, Philip Steels, etc." )
( arctic_a0002 "Not at this particular case, Tom, apologized Whittemore." )
; ( arctic_a0003 "For the twentieth time that evening the two men shook hands." )
You can now restart Festival and load the voice.
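If you want to exclude many utterances, editing by hand becomes tedious. The following is a minimal sketch of how the commenting-out step could be automated. It assumes each utterance occupies its own line in utts.data, in the form shown above; the function name comment_out is hypothetical and not part of Festival or the exercise scripts.

```python
import re


def comment_out(utts_text, exclude_ids):
    """Return the text of utts.data with the chosen utterances commented out.

    Assumption: one utterance per line, written as ( utt_id "text" ).
    Lines that are already commented out (starting with ';') are left alone.
    """
    out = []
    for line in utts_text.splitlines():
        # Capture the utterance ID that follows the opening parenthesis.
        match = re.match(r"\s*\(\s*(\S+)", line)
        if match and match.group(1) in exclude_ids:
            line = "; " + line
        out.append(line)
    return "\n".join(out) + "\n"


# Small in-memory demonstration; no files are read or written here.
sample = (
    '( arctic_a0001 "Author of the danger trail, Philip Steels, etc." )\n'
    '( arctic_a0002 "Not at this particular case, Tom, apologized Whittemore." )\n'
)
print(comment_out(sample, {"arctic_a0002"}))
```

To apply this to a real file, you would read utts.data into a string, pass it through the function, and write the result back, keeping a copy of the original first.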
Exclude data the correct way
The simple method above is acceptable for the purposes of this exercise, but it won’t reveal the full effect of a smaller database because the label alignments are left unchanged. A smaller database will also have an impact (probably negative) on the quality of the alignments, because the alignment models are then trained on less data. So the correct way to remove utterances is to actually delete the lines from utts.data (make a backup copy first) and re-run the forced alignment. You will then need to rebuild the utterance structures to incorporate the new label timestamps.
(There is no need to re-run the pitch marking, F0 estimation, MFCC extraction, or LPC stages.)
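The backup-and-delete step could be sketched as follows. This is only an illustration under the same assumption of one utterance per line in utts.data; the function name delete_utterances and the .bak suffix are hypothetical choices. It does not re-run forced alignment or rebuild the utterance structures, which you still need to do with the usual steps of the exercise.

```python
import re
import shutil
from pathlib import Path


def delete_utterances(utts_path, exclude_ids):
    """Back up utts.data, then rewrite it without the chosen utterances.

    Assumption: one utterance per line, written as ( utt_id "text" ).
    Returns the path of the backup copy.
    """
    path = Path(utts_path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # make the backup before changing anything

    kept = []
    for line in path.read_text(encoding="utf-8").splitlines():
        match = re.match(r"\s*\(\s*(\S+)", line)
        if match and match.group(1) in exclude_ids:
            continue  # drop this utterance entirely
        kept.append(line)

    path.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return backup


if __name__ == "__main__":
    # Example call; adjust the path and IDs to your own voice directory.
    delete_utterances("utts.data", {"arctic_a0003"})
```

Keeping the backup makes it easy to restore the full database before building the next variant.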