Forum Replies Created

Viewing 13 posts - 1 through 13 (of 13 total)

Author

Posts
October 28, 2020 at 19:14 in reply to: The term “phone” #12779
Ross C
Student
Thank you, I’ve got two questions leading on from the answer above.

1) Is it right to think of there being an infinite number of phones for each phoneme (both in the articulatory and acoustic realms) since two utterances of the same phone will never be precisely alike? Is there value in using three levels instead:
– phonemes
…which each map onto…
– phones we use for our analysis [like dental versus alveolar t]
…which each map onto…
– an infinite range of physical phones, with tiny physical differences which are of no use to the analysis

2) At what point does the signal become auditory in the sense mentioned above? For example, is it when it enters the cochlea, enters the brain, or something else?
October 22, 2020 at 11:53 in reply to: Weighted sum of the two entropies #12655
Ross C
Student
It’s solved now. I hadn’t noticed that the formula ran on beyond the visible end of the line (i.e. " + sum(right_counts))"). I should have counted my brackets.

The rest of this post is the original problem:

I want to achieve this for my second level (having asked one question): “Now we need to calculate the entropy of each of those two distributions, and compute the weighted sum. This will give the total entropy of the partitioned data.”

The above instruction is in tts-m4-2-decision-tree-pencil-and-paper (search for “weighted”)

I think I should use the following calculation: “total_entropy = ( sum(left_counts)*entropy(left_counts) + sum(right_counts)*entropy(right_counts) ) / (sum(left_counts)”

The above instruction is in tts-m4-1-entropy (search for “weighted”).
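For reference, here is a minimal sketch of the weighted sum with the brackets balanced. The `entropy` function below is my own stand-in (Shannon entropy in bits, computed from raw counts) and may differ from the one in the course notebook. The `weighted_entropy` name and the example counts are hypothetical.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts.

    Stand-in for the notebook's own entropy function (assumption).
    """
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def weighted_entropy(left_counts, right_counts):
    """Entropy of each partition, weighted by the number of items in it."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    return (
        n_left * entropy(left_counts) + n_right * entropy(right_counts)
    ) / (n_left + n_right)

# Hypothetical class counts on each side of one question
left_counts = [4, 0]    # pure partition, entropy 0 bits
right_counts = [2, 2]   # evenly mixed partition, entropy 1 bit

total_entropy = weighted_entropy(left_counts, right_counts)
print(total_entropy)
```

Each side is weighted by how many data points it contains, so a large, mixed partition counts for more than a small one. The sketch does not handle an empty partition (that would divide by zero inside `entropy`).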
October 13, 2020 at 17:15 in reply to: Festival config file #12422
Ross C
Student
(I am using the Virtual Machine, not the Remote Desktop. My mistake.)

I followed the steps from the thread linked above, i.e. added the following two lines to the bottom of config.scm:
(Parameter.set 'Audio_Method 'Audio_Command)
(Parameter.set 'Audio_Command "play -t raw -r 16000 -b 16 -c 1 -e signed-integer $FILE")

and the problem is fixed. Thanks!
October 13, 2020 at 14:27 in reply to: Festival config file #12417
Ross C
Student
Having opened Festival with the config file, the voice doesn’t sound right.

It seems to have changed voice and could well be a Scottish male, but it’s as though I only hear a few random fragments of any utterance I play. It sounds almost as brief as a click, but I can tell it’s speech. I say random because on repeat attempts, it changes. Sometimes it’s silent.

Earlier when I’d tried the basic voice, I could hear the complete utterance as expected. However, I’ve just tried this:
(quit)
festival
(SayText "hello world")
which I guess would use the original voice again, but it sounds the same as I described in the second paragraph above.

October 13, 2020 at 14:19 in reply to: Festival config file #12416
Ross C
Student
I’d missed that step, sorry again – it worked fine once I was using the VPN.
October 12, 2020 at 21:17 in reply to: Festival config file #12362
Ross C
Student
I hadn’t, sorry, I was confused by having two sets of instructions.

I’ve now had a go at rsync and got this error (attached screenshot), with messages such as “connection refused” and “rsync error”.

October 6, 2020 at 14:45 in reply to: Correction #12240
Ross C
Student
In sp-m2-2-fir-filters,

“all the input values get scaled up, with the the middle of the filter window get biggest relative increase.”

What should it say?
October 5, 2020 at 18:51 in reply to: energy at every multiple #12233
Ross C
Student
I was starting from https://speech.zone/courses/speech-processing/module-2-basics-speech-production/videos/impulse-train/ at 1m00s (i.e. it is in time domain)

After some Slack debate, I think we got to this answer:

The impulse train is a composite (complex) wave, which means it’s made up of a number of different sine/cosine waves. You get this wave if you have a cosine wave at every integer multiple of the fundamental frequency (F0), all at the same amplitude. When you add them together, you get the time-domain graph in your image. If you do a Fourier transform of that graph (decomposing the complex wave back into a series of simple cosine waves), you get the frequency-domain graph, which does indeed show a spike at every integer multiple of F0: exactly the components that made up the impulse train in the first place.
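Here is a small sketch that checks this numerically, using only the Python standard library. The sampling rate, F0 and number of periods are values I picked for illustration, and the DFT is computed directly from its definition rather than with an FFT library.

```python
import cmath
import math

fs = 16000          # sampling rate in samples/second (assumed value)
f0 = 100            # fundamental frequency in Hz (assumed value)
period = fs // f0   # samples per fundamental period
n_samples = 2 * period             # analyse exactly two periods
n_harmonics = fs // (2 * f0) - 1   # harmonics strictly below the Nyquist frequency

# Time domain: add equal-amplitude cosines at every integer multiple of f0.
train = [
    sum(math.cos(2 * math.pi * k * f0 * n / fs) for k in range(1, n_harmonics + 1))
    for n in range(n_samples)
]

def dft_magnitude(x, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    size = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * n / size) for n, v in enumerate(x)))

# Where the pulses fall in the time-domain signal (sample indices).
peak_value = max(train)
pulse_positions = [n for n, v in enumerate(train) if v > 0.5 * peak_value]

# Frequency domain: which bins carry significant energy?
bin_hz = fs / n_samples   # spacing between DFT bins in Hz
magnitudes = [dft_magnitude(train, k) for k in range(n_samples // 2 + 1)]
threshold = 0.5 * max(magnitudes)
peak_freqs = [k * bin_hz for k, m in enumerate(magnitudes) if m > threshold]

print(pulse_positions)   # one pulse at the start of each period
print(peak_freqs[:5])    # the lowest few frequencies that carry energy
```

With these settings the summed cosines pile up once per fundamental period and nearly cancel everywhere else, and the only frequency bins with significant energy are the multiples of F0 that were added in.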

Thanks for the tool!
October 3, 2020 at 18:53 in reply to: .aac files #12184
Ross C
Student
Seems to work fine without the quotes, is that bad in some way?

for i in *.aac; do ffmpeg -i $i ./test/${i%.aac}.wav; done
October 3, 2020 at 17:43 in reply to: .aac files #12183
Ross C
Student
Perfect.
To do it to several files at once, I’ve found this:

for i in *.aac; do ffmpeg -i "$i" "${i%.*}.wav"; done

What’s going on in ${i%.*}? Particularly the % sign.
Edit: Got it, % seems to mean “delete the shortest match of the following suffix pattern from the end of the value”.
October 2, 2020 at 18:36 in reply to: Ladefoged – Chapter 4 #12170
Ross C
Student
My copy (edition 2) seems to have a different Fig. 4.13 from the attachment above (#9948). It has 4 cycles per 0.02 s on the top graph, which would be 200 Hz. But the text says it is 100 Hz (top of p54) and shows ‘a pair of cycles’ (bottom of p52). It is a mistake in the book, right?

And I can only see 5 peaks, not 6 as stated at top of p54.
September 27, 2020 at 00:56 in reply to: Correction #12014
Ross C
Student
In sp-m1-3-sampling-sinusoids
section 3.1

“For a given sampling rate $f_s$ (seconds/sample) we can work out the time between each sample as:

$t_s = \frac{1}{f_s}$

The units of $t_s$ is seconds/sample.”

I think fs should be “samples/second” while ts is “seconds/sample”?
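A quick numeric check of the units, as a sketch: the 16 kHz rate is just an example value I picked, and `sample_time` is a hypothetical helper.

```python
fs = 16000     # sampling rate in samples/second (assumed example value)
ts = 1 / fs    # sampling period in seconds/sample

def sample_time(n, fs):
    """Time in seconds at which sample number n (counting from 0) is taken."""
    return n / fs

print(ts)                        # seconds between consecutive samples
print(sample_time(16000, fs))    # time reached after fs samples
```

Reading fs as samples/second makes 1/fs come out in seconds/sample, and fs samples then span exactly one second.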
September 22, 2020 at 11:36 in reply to: Took part in trial #11940
Ross C
Student
Windows 10
VMWare Workstation 15 Player – installed Aug 26