Forum Replies Created
Thank you, I’ve got two questions leading on from the answer above.
1) Is it right to think of there being an infinite number of phones for each phoneme (both in the articulatory and acoustic realms) since two utterances of the same phone will never be precisely alike? Is there value in using three levels instead:
– phonemes
…which each map onto…
– phones we use for our analysis [like dental versus alveolar t]
…which each map onto…
– an infinite range of physical phones, with tiny physical differences which are of no use to the analysis
2) At what point does the signal become auditory in the sense mentioned above? For example, is it when it enters the cochlea, enters the brain, or something else?
It’s solved now. I hadn’t noticed the formula spread beyond the visible end of the line (i.e. “ + sum(right_counts))”). I should have counted my brackets.
The rest of this post is the original problem:
I want to achieve this for my second level (having asked one question): “Now we need to calculate the entropy of each of those two distributions, and compute the weighted sum. This will give the total entropy of the partitioned data.”
The above instruction is in tts-m4-2-decision-tree-pencil-and-paper (search for “weighted”)
I think I should use the following calculation: “total_entropy = ( sum(left_counts)*entropy(left_counts) + sum(right_counts)*entropy(right_counts) ) / (sum(left_counts)”
The above instruction is in tts-m4-1-entropy (search for “weighted”).
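The weighted sum described above can be sketched as follows. This is a minimal sketch, not the course notebook's own code: the `entropy` helper and the example counts are assumptions made for illustration, and the denominator is written out in full, including the ` + sum(right_counts))` part.

```python
import math

def entropy(counts):
    """Entropy in bits of a distribution given as raw counts (assumed helper)."""
    total = sum(counts)
    if total == 0:
        return 0.0  # an empty partition contributes nothing
    # Skip zero counts, treating 0 * log2(0) as 0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Hypothetical class counts in the two partitions after asking one question
left_counts = [4, 0]
right_counts = [2, 2]

# Weight each partition's entropy by its size, then divide by the total size
total_entropy = (
    sum(left_counts) * entropy(left_counts)
    + sum(right_counts) * entropy(right_counts)
) / (sum(left_counts) + sum(right_counts))

print(total_entropy)
```

Splitting the expression over several lines inside one pair of parentheses makes a missing bracket much easier to spot than a single long line.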
(I am using the Virtual Machine, not the Remote Desktop. My mistake.)
I followed the steps from the thread linked above, i.e. added the following two lines to the bottom of config.scm:
(Parameter.set 'Audio_Method 'Audio_Command)
(Parameter.set 'Audio_Command "play -t raw -r 16000 -b 16 -c 1 -e signed-integer $FILE")
and the problem is fixed. Thanks!
Having opened Festival with the config file, the voice doesn’t sound right.
It seems to have changed voice and could well be a Scottish male, but it’s as though I only hear a few random fragments of any utterance I play. It sounds almost as brief as a click, but I can tell it’s speech. I say random because on repeat attempts, it changes. Sometimes it’s silent.
Earlier when I’d tried the basic voice, I could hear the complete utterance as expected. However, I’ve just tried this:
(quit)
festival
(SayText "hello world")
which I guess would use the original voice again, but it sounds the same as I described in the 2nd paragraph above.
I’d missed that step, sorry again – it worked fine once I was using the VPN.
I hadn’t, sorry, I was confused by having two sets of instructions.
I’ve now had a go at rsync and got this error (attached screenshot) – saying things like “connection refused” and “rsync error”.
In sp-m2-2-fir-filters,
“all the input values get scaled up, with the the middle of the filter window get biggest relative increase.”
What should it say?
I was starting from https://speech.zone/courses/speech-processing/module-2-basics-speech-production/videos/impulse-train/ at 1m00s (i.e. it is in time domain)
After some Slack debate, I think we got to this answer:
The impulse train is a composite wave, which means it’s made up of a bunch of different sine/cosine waves. The way this wave would occur is if you have a cosine wave at every multiple of the fundamental frequency, at the same amplitude. When you add them together, you’ll get this graph on your image. If you do a Fourier transform (decomposing the complex wave back to a series of simple cosine waves) of this graph, you’ll get the frequency domain graph where indeed it’ll show that there is a frequency spike at every integer multiple of F0, which is what made up the impulse train in the first place.
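The additive view above can be checked numerically. The following is a rough sketch with illustrative values (`f0`, `n_harmonics` and `fs` are assumptions, not taken from the video). It adds equal-amplitude cosines at integer multiples of F0. With a finite number of harmonics the result only approximates an impulse train, but the sharp peaks once per period are already visible.

```python
import math

f0 = 100.0          # fundamental frequency in Hz (illustrative value)
n_harmonics = 20    # cosines to add; an ideal impulse train needs infinitely many
fs = 16000          # sampling rate in Hz, used only to choose the time points

def harmonic_sum(t):
    """Sum of equal-amplitude cosines at f0, 2*f0, ..., n_harmonics*f0 at time t."""
    return sum(math.cos(2 * math.pi * k * f0 * t) for k in range(1, n_harmonics + 1))

samples_per_period = round(fs / f0)
# Two full periods of the summed waveform
samples = [harmonic_sum(n / fs) for n in range(2 * samples_per_period + 1)]

# All cosines line up at the start of every period, giving a tall peak there
print(samples[0], samples[samples_per_period])
# Halfway through a period they largely cancel
print(samples[samples_per_period // 2])
```

Increasing `n_harmonics` makes the peaks taller and narrower relative to the rest of the waveform, which is the sense in which the sum approaches an impulse train.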
Thanks for the tool!
Seems to work fine without the quotes, is that bad in some way?
for i in *.aac; do ffmpeg -i $i ./test/${i%.aac}.wav; done
Perfect.
To do it to several files at once, I’ve found this:
for i in *.aac; do ffmpeg -i "$i" "${i%.*}.wav"; done
What’s going on in ${i%.*}? Particularly the % sign.
Edit: Got it, % seems to mean “delete the following suffix”.
My copy (edition 2) seems to have a different Fig4.13 from the attachment above (#9948). It has 4 cycles per 0.02s on the top graph. But the text says it is 100Hz (top p54) and shows ‘a pair of cycles’ (bottom p52). It is a mistake in the book, right?
And I can only see 5 peaks, not 6 as stated at top of p54.
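The arithmetic behind the question can be written out as a quick check. This is only a sketch using the numbers quoted in the post (4 cycles, 0.02 s, 100 Hz); the function names are made up for illustration and nothing here is read from the figure itself.

```python
def frequency_hz(cycles, duration_s):
    """Frequency implied by counting whole cycles in a span of time."""
    return cycles / duration_s

def cycles_in(frequency, duration_s):
    """Number of cycles a given frequency completes in a span of time."""
    return frequency * duration_s

# What the plot as described would imply: 4 cycles in 0.02 s
print(frequency_hz(4, 0.02))

# What the text's 100 Hz would look like over the same 0.02 s
print(cycles_in(100, 0.02))
```

On these numbers, four cycles in 0.02 s corresponds to about 200 Hz, whereas 100 Hz would give two cycles in the same span, which matches the ‘a pair of cycles’ wording.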
In sp-m1-3-sampling-sinusoids, section 3.1:
“For a given sampling rate $f_s$ (seconds/sample) we can work out the time between each sample as:
$t_s = \frac{1}{f_s}$
The units of $t_s$ is seconds/sample.”
I think $f_s$ should be “samples/second” while $t_s$ is “seconds/sample”?
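The units can be sanity-checked with a small calculation. This is a sketch with an illustrative sampling rate of 16000 (an assumption, not a value from the notebook): if the rate is in samples/second, its reciprocal comes out in seconds/sample.

```python
fs = 16000        # sampling rate in samples/second (illustrative value)
ts = 1 / fs       # time between samples in seconds/sample

print(ts)

# Multiplying a number of samples by seconds/sample gives a duration in seconds
n_samples = 160
duration_s = n_samples * ts
print(duration_s)
```

The units only cancel this way round: samples × (seconds/sample) = seconds.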
Windows 10
VMware Workstation 15 Player – installed Aug 26