Join Costs
- This topic has 5 replies, 2 voices, and was last updated 8 years, 10 months ago by Simon.
-
February 25, 2016 at 14:50 #2647
From lecture 2:
On slide 53, you mention that we can bias against joins in difficult places (e.g., vowels before an [r]). Does Festival do this?
On slide 55, you state ‘Even for a naturally-occurring sequence, cost will not be exactly 0’. This makes sense: if frames are 10ms apart, we would expect some small degree of change to show up in MFCCs, energy, or F0 pretty much all the time. However, Festival frequently reports ‘0’ for join costs, for diphones that were originally contiguous in the same utterance. Does Festival use a kind of threshold, below which it simply gives the cost as ‘0’? Or is this another kind of bias being employed, towards originally-contiguous units?
On slide 88, you mention the possibility of adding context to the join cost, which induces a dependency in both directions: unit choice depends on both the previous and the succeeding unit choices. Is Festival using this idea?
-
February 25, 2016 at 18:39 #2649
Festival doesn’t do anything to bias against joins in [r] etc – but commercial systems certainly do.
The join cost for naturally-contiguous units is simply defined to be zero and isn’t even calculated.
Festival computes join cost entirely locally, just from the frames either side of the join.
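To make the two points above concrete, here is a minimal sketch in Python. It is not Festival's actual code: the `Unit` class, its field names, and the plain Euclidean distance are all assumptions made for illustration (Festival combines separately normalised MFCC, F0 and energy coefficients). It only shows the shape of the idea: contiguous units get a cost of zero by definition, and everything else is measured locally from the frames either side of the join.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Unit:
    """Hypothetical candidate diphone; field names are illustrative only."""
    source_utt: str               # utterance the diphone was cut from
    index: int                    # position of the diphone in that utterance
    first_frame: Sequence[float]  # join coefficients at the unit's left edge
    last_frame: Sequence[float]   # join coefficients at the unit's right edge


def join_cost(left: Unit, right: Unit) -> float:
    """Local join cost between two consecutive candidates."""
    # Naturally contiguous units: the cost is defined to be zero and no
    # distance is computed at all.
    if left.source_utt == right.source_utt and right.index == left.index + 1:
        return 0.0
    # Otherwise use only the frames either side of the join.
    return math.dist(left.last_frame, right.first_frame)


a = Unit("arctic_a0002", 10, [0.0, 0.0], [0.0, 0.0])
b = Unit("arctic_a0002", 11, [3.0, 4.0], [1.0, 1.0])
c = Unit("arctic_b0233", 40, [3.0, 4.0], [1.0, 1.0])

print(join_cost(a, b))  # contiguous in the same utterance
print(join_cost(a, c))  # taken from a different utterance
```

Note that `a` to `b` costs nothing even though the frames differ, which is why originally-contiguous diphones show a join cost of exactly 0.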
-
February 25, 2016 at 21:23 #2650
Ahh, OK, so this takes us back to the idea that we discussed in the lab. Festival is currently mildly biased towards naturally-contiguous units, as they always have zero join cost. So we could, in theory, increase this bias by scaling up the actual MFCC vector values in the make_norm_join_cost_coefs script. That should increase the Euclidean distance across every join that is NOT naturally contiguous, which raises those join costs, but because they all go up by the same relative amount, their ‘cost relationship’ to one another stays the same. Thus, there is no net effect on unit selection EXCEPT that zero-join-cost (naturally-contiguous) diphones are now more likely to ‘win’, as their zero join cost becomes more valuable in offsetting non-zero target costs. Am I getting that right?
-
February 26, 2016 at 08:00 #2652
Yes, I think that would work. Changing the normalisation of the join cost coefficients (not just MFCCs – also F0 and energy) effectively changes the relative weight between join cost and target cost.
Try making the join cost coeffs very small – you should get more joins (fewer contiguous sequences of candidates), and therefore presumably more bad joins.
Try making them rather large, and you should get more contiguous sequences of candidates, but which match the target context less well.
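A toy example may help show why the scaling acts as a relative weight. Everything in this sketch is made up for illustration: the candidate lists, the single-number "join feature", and the brute-force search (Festival uses a Viterbi search over real candidate lists, not an exhaustive one). The `scale` argument stands in for rescaling the join cost coefficients.

```python
from itertools import product

# One list of candidates per target diphone. Each candidate is
# (source_utt, index_in_utt, target_cost, join_feature); all values invented.
candidates = [
    [("A", 0, 0.30, 1.0), ("B", 5, 0.00, 1.2)],
    [("A", 1, 0.30, 1.1), ("C", 2, 0.00, 0.9)],
    [("A", 2, 0.30, 1.0), ("D", 7, 0.00, 1.3)],
]


def is_contiguous(left, right):
    """True if the two candidates were adjacent in the same recording."""
    return left[0] == right[0] and right[1] == left[1] + 1


def join_cost(left, right, scale):
    """Zero for contiguous units, otherwise a scaled feature distance."""
    if is_contiguous(left, right):
        return 0.0
    return scale * abs(left[3] - right[3])


def total_cost(seq, scale):
    """Sum of target costs plus the join costs between neighbours."""
    target = sum(unit[2] for unit in seq)
    joins = sum(join_cost(a, b, scale) for a, b in zip(seq, seq[1:]))
    return target + joins


def best_sequence(candidates, scale):
    """Exhaustive search; fine for a toy, far too slow for a real voice."""
    return min(product(*candidates), key=lambda seq: total_cost(seq, scale))


def count_joins(seq):
    """Number of places where non-contiguous units are concatenated."""
    return sum(not is_contiguous(a, b) for a, b in zip(seq, seq[1:]))


for scale in (0.1, 10.0):
    seq = best_sequence(candidates, scale)
    print(scale, [unit[0] for unit in seq], count_joins(seq))
```

With the small scale the search takes the units with the best target costs and accepts a join at every boundary; with the large scale it switches to the contiguous run from utterance "A" and pays for it in target cost.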
-
February 26, 2016 at 17:34 #2655
So… I tried making the join costs ‘rather large’. I scaled by a factor of 10, then 100, then… 1000. This definitely caused Festival to make some different unit choices, but not in the way we would have predicted. I am including the Unit relation output here (a screenshot is also attached). I chose a sentence from the Arctic script, which is in the database in its entirety. We would expect Festival to choose diphones from only that utterance if the join costs for NOT doing so were sufficiently high. It did not behave this way. Please see the Festival output below, and note the very large join costs (in the hundreds). Can you shed some light on this?
id _88 ; name #_n ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 9 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 0.92 ; target_cost 0.0833333 ; join_cost 0 ; end 0.139313 ; num_frames 15 ;
id _89 ; name n_aa ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 7 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 1.024 ; target_cost 0 ; join_cost 0 ; end 0.269188 ; num_frames 18 ;
id _90 ; name aa_t ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 11 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 1.174 ; target_cost 0 ; join_cost 0 ; end 0.346563 ; num_frames 12 ;
id _91 ; name t_a ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 1 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 1.186 ; target_cost 0 ; join_cost 0 ; end 0.380876 ; num_frames 5 ;
id _92 ; name a_t ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 4 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 1.24 ; target_cost 0 ; join_cost 0 ; end 0.494875 ; num_frames 15 ;
id _93 ; name t_dh ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 0 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 1.338 ; target_cost 0 ; join_cost 0 ; end 0.520125 ; num_frames 2 ;
id _94 ; name dh_i ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 1 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 1.374 ; target_cost 0 ; join_cost 0 ; end 0.557625 ; num_frames 4 ;
id _95 ; name i_s ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 2 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 1.404 ; target_cost 0 ; join_cost 0 ; end 0.654688 ; num_frames 12 ;
id _96 ; name s_p ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 8 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 1.574 ; target_cost 0 ; join_cost 0 ; end 0.758375 ; num_frames 10 ;
id _97 ; name p_@r ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 1 ; source_utt arctic_b0233 ; source_ph1 “[Val item]” ; source_end 2.948 ; target_cost 0.3125 ; join_cost 313.7 ; end 0.822875 ; num_frames 7 ;
id _98 ; name @r_r ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 5 ; source_utt arctic_b0233 ; source_ph1 “[Val item]” ; source_end 3.056 ; target_cost 0.291667 ; join_cost 0 ; end 0.892875 ; num_frames 7 ;
id _99 ; name r_t ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 2 ; source_utt arctic_a0299 ; source_ph1 “[Val item]” ; source_end 3.378 ; target_cost 0.291667 ; join_cost 312.956 ; end 1.00556 ; num_frames 11 ;
id _100 ; name t_i ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 8 ; source_utt arctic_a0358 ; source_ph1 “[Val item]” ; source_end 4.28 ; target_cost 0.270833 ; join_cost 175.823 ; end 1.10281 ; num_frames 11 ;
id _101 ; name i_k ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 3 ; source_utt arctic_a0113 ; source_ph1 “[Val item]” ; source_end 1.882 ; target_cost 0.375 ; join_cost 206.539 ; end 1.16381 ; num_frames 9 ;
id _102 ; name k_y ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 6 ; source_utt arctic_a0059 ; source_ph1 “[Val item]” ; source_end 1.416 ; target_cost 0.0833333 ; join_cost 371.751 ; end 1.22725 ; num_frames 7 ;
id _103 ; name y_@ ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 0 ; source_utt arctic_a0059 ; source_ph1 “[Val item]” ; source_end 1.426 ; target_cost 0 ; join_cost 0 ; end 1.23375 ; num_frames 1 ;
id _104 ; name @_lw ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 1 ; source_utt arctic_a0059 ; source_ph1 “[Val item]” ; source_end 1.432 ; target_cost 0.0625 ; join_cost 0 ; end 1.27219 ; num_frames 6 ;
id _105 ; name lw_@r ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 9 ; source_utt arctic_b0201 ; source_ph1 “[Val item]” ; source_end 3.26 ; target_cost 0 ; join_cost 210.443 ; end 1.33881 ; num_frames 10 ;
id _106 ; name @r_r ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 0 ; source_utt arctic_b0201 ; source_ph1 “[Val item]” ; source_end 3.266 ; target_cost 0.0625 ; join_cost 0 ; end 1.38719 ; num_frames 7 ;
id _107 ; name r_k ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 3 ; source_utt arctic_a0197 ; source_ph1 “[Val item]” ; source_end 2.784 ; target_cost 0.1875 ; join_cost 574.409 ; end 1.48956 ; num_frames 10 ;
id _108 ; name k_ei ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 10 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 2.322 ; target_cost 0 ; join_cost 170.495 ; end 1.612 ; num_frames 14 ;
id _109 ; name ei_s ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 11 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 2.436 ; target_cost 0 ; join_cost 0 ; end 1.77081 ; num_frames 18 ;
id _110 ; name s_# ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 6 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 2.608 ; target_cost 0.0625 ; join_cost 0 ; end 1.86469 ; num_frames 7 ;
id _111 ; name #_B_150 ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 1 ; source_utt nina_x1_001 ; source_ph1 “[Val item]” ; source_end 1.39 ; target_cost 0.291667 ; join_cost 392.459 ; end 1.94481 ; num_frames 8 ;
id _112 ; name B_150_# ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 7 ; source_utt nina_x1_001 ; source_ph1 “[Val item]” ; source_end 1.54 ; target_cost 0.270833 ; join_cost 0 ; end 2.02488 ; num_frames 8 ;
id _113 ; name #_t ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 0 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 2.612 ; target_cost 0.0833333 ; join_cost 392.495 ; end 2.06181 ; num_frames 3 ;
id _114 ; name t_aa ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 8 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 2.714 ; target_cost 0.3125 ; join_cost 0 ; end 2.249 ; num_frames 20 ;
id _115 ; name aa_m ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 11 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 2.95 ; target_cost 0.3125 ; join_cost 0 ; end 2.41513 ; num_frames 16 ;
id _116 ; name m_# ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 5 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 3.038 ; target_cost 0.375 ; join_cost 0 ; end 2.52394 ; num_frames 12 ;
id _117 ; name #_B_150 ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 1 ; source_utt nina_x1_001 ; source_ph1 “[Val item]” ; source_end 1.39 ; target_cost 0.291667 ; join_cost 336.479 ; end 2.60406 ; num_frames 8 ;
id _118 ; name B_150_# ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 7 ; source_utt nina_x1_001 ; source_ph1 “[Val item]” ; source_end 1.54 ; target_cost 0.270833 ; join_cost 0 ; end 2.68413 ; num_frames 8 ;
id _119 ; name #_@ ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 8 ; source_utt arctic_a0187 ; source_ph1 “[Val item]” ; source_end 0.912 ; target_cost 0.145833 ; join_cost 336.796 ; end 2.80275 ; num_frames 13 ;
id _120 ; name @_p ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 2 ; source_utt arctic_a0113 ; source_ph1 “[Val item]” ; source_end 1.212 ; target_cost 0.25 ; join_cost 384.506 ; end 2.91031 ; num_frames 12 ;
id _121 ; name p_aa ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 11 ; source_utt arctic_b0057 ; source_ph1 “[Val item]” ; source_end 1.662 ; target_cost 0.375 ; join_cost 217.368 ; end 3.06812 ; num_frames 17 ;
id _122 ; name aa_lw ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 7 ; source_utt arctic_a0203 ; source_ph1 “[Val item]” ; source_end 1.41 ; target_cost 0.145833 ; join_cost 293.85 ; end 3.19712 ; num_frames 18 ;
id _123 ; name lw_@ ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 10 ; source_utt arctic_b0382 ; source_ph1 “[Val item]” ; source_end 3.458 ; target_cost 0.1875 ; join_cost 240.775 ; end 3.26556 ; num_frames 11 ;
id _124 ; name @_jh ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 4 ; source_utt arctic_a0427 ; source_ph1 “[Val item]” ; source_end 2.744 ; target_cost 0.1875 ; join_cost 590.098 ; end 3.40162 ; num_frames 14 ;
id _125 ; name jh_ai ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 6 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 3.702 ; target_cost 0 ; join_cost 265.882 ; end 3.50825 ; num_frames 10 ;
id _126 ; name ai_z ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 10 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 3.858 ; target_cost 0 ; join_cost 0 ; end 3.66731 ; num_frames 14 ;
id _127 ; name z_d ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 3 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 3.944 ; target_cost 0 ; join_cost 0 ; end 3.71181 ; num_frames 4 ;
id _128 ; name d_hw ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 5 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 3.996 ; target_cost 0 ; join_cost 0 ; end 3.82456 ; num_frames 11 ;
id _129 ; name hw_i ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 6 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 4.124 ; target_cost 0 ; join_cost 0 ; end 3.909 ; num_frames 8 ;
id _130 ; name i_t ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 3 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 4.17 ; target_cost 0 ; join_cost 0 ; end 3.98606 ; num_frames 9 ;
id _131 ; name t_m ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 1 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 4.242 ; target_cost 0 ; join_cost 0 ; end 4.0315 ; num_frames 4 ;
id _132 ; name m_or ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 3 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 4.294 ; target_cost 0 ; join_cost 0 ; end 4.1045 ; num_frames 7 ;
id _133 ; name or_r ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 3 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 4.386 ; target_cost 0 ; join_cost 0 ; end 4.22169 ; num_frames 10 ;
id _134 ; name r_# ; ph1 “[Val item]” ; sig “[Val wave]” ; coefs “[Val track]” ; middle_frame 8 ; source_utt arctic_a0002 ; source_ph1 “[Val item]” ; source_end 4.532 ; target_cost 0 ; join_cost 0 ; end 4.31669 ; num_frames 10 ;
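A dump in this format is easier to inspect once it is summarised. The following is a small Python sketch, not part of Festival: the function names are made up, and the sample lines are four units from the output above (_96 to _99) trimmed to a few fields. It assumes each line is a list of `key value` pairs separated by `;`.

```python
from collections import Counter

# Four units from the Unit relation output, trimmed to the relevant fields.
SAMPLE = """\
id _96 ; name s_p ; source_utt arctic_a0002 ; target_cost 0 ; join_cost 0 ;
id _97 ; name p_@r ; source_utt arctic_b0233 ; target_cost 0.3125 ; join_cost 313.7 ;
id _98 ; name @r_r ; source_utt arctic_b0233 ; target_cost 0.291667 ; join_cost 0 ;
id _99 ; name r_t ; source_utt arctic_a0299 ; target_cost 0.291667 ; join_cost 312.956 ;
"""


def parse_unit_line(line):
    """Turn one 'key value ; key value ; ...' line into a dict of strings."""
    fields = {}
    for part in line.split(";"):
        part = part.strip()
        if not part:
            continue
        # The key is the first word; the value is everything after it.
        key, _, value = part.partition(" ")
        fields[key] = value.strip()
    return fields


def summarise(text):
    """Count units per source utterance and list the non-zero join costs."""
    units = [parse_unit_line(line) for line in text.splitlines() if line.strip()]
    per_utt = Counter(unit["source_utt"] for unit in units)
    joins = [(unit["name"], float(unit["join_cost"]))
             for unit in units if float(unit["join_cost"]) > 0]
    return per_utt, joins


per_utt, joins = summarise(SAMPLE)
print(per_utt)
print(joins)
```

Running the same summary over the full output would show at a glance how many units come from arctic_a0002 and where the expensive joins fall.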
-
March 1, 2016 at 11:49 #2666
I’ve realised there is indeed a run-time interface to all of the various join and target cost weights, beam widths, etc. I had originally thought that these were deprecated and that the values were compiled into the code, but I was wrong.
See the full list of functions – look for those that start “du_” (which means “diphone unit”).
This should be simpler than what you’re doing above.