Forum Replies Created
-
AuthorPosts
-
Hi Shona,
Based on our AI policy, you can use an AI-based tool to help with coding suggestions as long as you declare this in the report.
However, you should check coding suggestions from LLMs carefully, as they can be incorrect. To use them effectively you need to be able to judge whether the generated code is good or not. The same advice applies to looking up answers to questions on websites like Stack Overflow (not all answers are good!).
Also, while you’re not being directly assessed on coding for this assignment, we do want you to learn some scripting skills. The best way to learn these skills in the long term is to try to figure things out yourself first!
cheers,
Catherine
Hi Sophie,
The AT 4.02 lab is open 8am-8pm Monday to Friday. If you are a PPLS student that’s probably the only lab that you can use in Appleton Tower. Informatics students will have access to other labs, but the computers in those labs won’t have access to the voice you need to use for the assignment.
Sorry that the remote desktop isn’t working! Could you report it to IS helpline with details of what exactly is not working? https://www.ishelpline.ed.ac.uk/forms/
cheers,
Catherine
Hi Emily,
Sorry, I think that’s a reference to a previous version of the assignment! We’ll see about making a new template, though you really just need to make sure you use the section headings as described in the assignment instructions.
cheers,
Catherine
Hi Patricjia,
When you sample a sine wave whose frequency is higher than the Nyquist frequency (i.e., half the sampling rate, which is 8000 Hz in the question), you still “record” a sine wave; it’s just not the one that was your actual input. When you’re just a bit above the Nyquist frequency, the inaccuracy in sampling the underlying input doesn’t make much difference. But it has a bigger and bigger impact the higher the frequency gets past the Nyquist frequency.
In fact, frequencies are mirrored around the Nyquist frequency. So, with a 16000 Hz sampling rate, a sine wave with frequency 8640 Hz = 8000 + 640 Hz looks like 8000 – 640 = 7360 Hz due to aliasing.
You can see a (different) example of this in the lab materials when we plot the sound sweep with naive downsampling from 22050 Hz to 8000 Hz sampling rate (in the “Sampling and Aliasing” section). The sound generated changes from a continuously rising frequency, to one that goes up to 4000 Hz and then starts going down. The spectrogram shows this with a turning point at 4000 Hz in the attached image. You can use the code blocks in the notebook just below that figure to see what’s happening in sampling terms.
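If it helps to check the mirroring numerically, here is a minimal sketch (not the lab notebook code). The function name `apparent_frequency` is just made up for illustration. It folds a frequency around the Nyquist frequency and then compares the actual samples of the 8640 Hz and 7360 Hz waves. I've used cosines for the comparison because their samples come out identical; with sines you get the same values with the sign flipped.

```python
import math

def apparent_frequency(f_hz, fs_hz):
    """Frequency (Hz) that a sinusoid at f_hz appears to have when sampled at fs_hz."""
    f = f_hz % fs_hz  # after sampling, f and f + k*fs are indistinguishable
    # Anything above the Nyquist frequency (fs/2) is mirrored back below it.
    return fs_hz - f if f > fs_hz / 2 else f

fs = 16000  # sampling rate from the question, in Hz

print(apparent_frequency(8640, fs))  # 7360
print(apparent_frequency(7360, fs))  # 7360

# Compare the sampled values directly: the two cosines give the same samples.
num_samples = 32
above = [math.cos(2 * math.pi * 8640 * n / fs) for n in range(num_samples)]
below = [math.cos(2 * math.pi * 7360 * n / fs) for n in range(num_samples)]
max_diff = max(abs(a - b) for a, b in zip(above, below))
print(max_diff < 1e-9)  # True
```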
cheers,
Catherine
Yes! The relevant documentation is here:
https://www.fon.hum.uva.nl/praat/manual/Intro_3_7__Configuring_the_spectral_slice.html
If you look in the default “Advanced Spectrogram Settings…” (in the Spectrogram drop down) you’ll see that the default setting for Window Shape is “Gaussian”. The documentation says that for a window length of 0.005 s, “If the window shape is Gaussian, Praat will extract a part of the sound that runs from 5 milliseconds before the cursor to 5 ms after the cursor. The spectrum will then be based on a “physical” window length of 10 ms, although the “effective” window length is still 5 ms”. This is because a Gaussian window shape basically reduces the amplitudes at the edges of the window. It’s a tapered window as discussed in this video: https://speech.zone/courses/speech-processing/module-3-digital-speech-signals/videos-2/short-term-analysis/.
So the number of samples in that window actually corresponds to the number of samples in 0.010 s at a sampling rate of 44100 Hz (the sampling rate of the recording). The length of the window is then 44100 * 0.01 = 441 samples. But that still doesn’t quite get you the analysis frequencies you see in Praat for that specific example.
Praat actually does one more thing that changes the sample size (hence real window length), which is that it uses the Fast Fourier Transform. This is an efficient implementation of the Discrete Fourier Transform that is much faster than the original formulation. The catch is that it only works if the number of input samples is a power of 2. So in this case, the window is padded out with zeros to 512 samples (=2^9).
This is an option if you create a spectrum from the objects menu. See the setting “fast” here (though I don’t recommend trying to understand the Fourier transform from the rest of that page!): https://www.fon.hum.uva.nl/praat/manual/Sound__To_Spectrum___.html
However, the use of the Fast Fourier Transform appears to be a fixed setting if you generate the spectral slice from the Sound viewer.
So, if you have 512 samples and a sampling rate of 44100, you get DFT analysis frequencies as multiples of 44100/512 = 86.13281 Hz (which is what Eli observed above!).
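The arithmetic above can be sketched in a few lines of Python. This only reproduces the calculation described in this post, under the assumptions stated here (Gaussian window, zero-padding to the next power of 2). It is not Praat's actual code, and the variable and function names are made up for illustration.

```python
def next_power_of_two(n):
    """Smallest power of 2 that is greater than or equal to n."""
    power = 1
    while power < n:
        power *= 2
    return power

sampling_rate = 44100        # Hz, the sampling rate of the recording
effective_window_s = 0.005   # the window length set in the spectrogram settings

# Per the manual quote above: a Gaussian window is physically twice as long.
physical_window_s = 2 * effective_window_s

window_samples = round(sampling_rate * physical_window_s)
fft_size = next_power_of_two(window_samples)  # zero-padded length for the FFT
bin_spacing_hz = sampling_rate / fft_size     # spacing of DFT analysis frequencies

print(window_samples)   # 441
print(fft_size)         # 512
print(bin_spacing_hz)   # 86.1328125

# The first few analysis frequencies are multiples of the bin spacing.
print([k * bin_spacing_hz for k in range(4)])
```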
The moral of this story is that for software (like Praat) which has a lot of potential options “under the hood”, you may need to check the documentation to understand exactly what it’s doing!
Hi Xueyan,
Many apologies for this. It appears the sound wasn’t recorded in the first half. I’m not sure why. I’m fairly certain I turned the microphone on but may have done something while setting up.
If you have any questions about the lecture, feel free to ask here or in office hours!
Apologies again,
Catherine
Hi Ha Anh,
You should be able to see the answers now!
cheers,
Catherine
Hi Yujia,
Grapheme-to-phoneme is more specific, in that “sounds” aren’t always the same as phonemes. But in general you can consider them the same thing: in practice both refer to methods to map from written text to pronunciations. You’ll see grapheme-to-phoneme (or g2p) much more often in the recent literature.
cheers,
Catherine
Hi Claire,
1. {exam number}_{lab report word count} should be the name of the file you submit. You should use another title for the title of the actual report. For assignment 1, something like “Speech Processing Assignment 1” is fine. Though, do note for assignment 2, we’ll be expecting a more informative title.
2. No, you don’t need a table of contents for assignment 1. But do use the section headings noted in the instructions.
cheers,
Catherine
Hi Iakovi,
At this point it looks like the PPLS AT lab won’t be accessible on the weekends. We’re looking into whether we can get access for you guys but can’t guarantee anything – sorry!
best wishes,
Catherine
Sorry! I think I still had the settings wrong! Hopefully it should work now?
cheers,
Catherine
Hi Ha Anh,
[just copying this from another thread since I think it’s the same problem!]
I think the scp (i.e., file transfer) server is down. I’ll put in a ticket about this, but in the meantime you can rsync from the remote desktop machines instead (they all get you to the same file system!).
So, you’d just need to replace scp1.ppls.ed.ac.uk with something like ppls-atl-1066.ppls.ed.ac.uk in your rsync call.
The list of remote desktop machines is here (you’ll need UoE VPN on to access):
https://resource.ppls.ed.ac.uk/whoson/atlab.php
cheers,
Catherine
Hi Ha Anh,
Yes, sorry I thought you could already see the results but forgot to select the option! You should be able to see your results per question now.
cheers,
Catherine
Hi Yujia,
Great question! In general, sound waves don’t have to be symmetric like sinusoids (e.g. sine waves). But what Fourier Analysis tells us is that we can approximate periodic waves that don’t look at all like sinusoids by adding together scaled and shifted sinusoids of different frequencies.
So, yes, if we want to make a spiky and non-symmetric waveform like an impulse train, we actually need to use sinusoids of every frequency we have available to us (scaled and shifted to get rid of the negative amplitude bits and also all the curvy bits!).
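To see this numerically, here is a minimal sketch (my own illustration, not code from the course materials). It builds one sample at a time from a constant (0 Hz) term plus cosine harmonics of the pulse rate. The function name `harmonic_sum` and the period of 16 samples are arbitrary choices, and the sketch assumes an even period. With only a couple of harmonics you get a smooth, wavy bump. With every harmonic up to the Nyquist frequency, the sum is exactly 1 at the start of each period and 0 everywhere else.

```python
import math

def harmonic_sum(n, period, num_harmonics):
    """Sample n of a constant term plus the first num_harmonics cosine harmonics.

    Assumes an even period (in samples), so that the highest available
    harmonic sits exactly at the Nyquist frequency.
    """
    total = 1.0  # the 0 Hz (constant) term, which lifts the waveform upwards
    for k in range(1, num_harmonics + 1):
        # The harmonic at the Nyquist frequency is counted once, the others twice.
        weight = 1.0 if 2 * k == period else 2.0
        total += weight * math.cos(2 * math.pi * k * n / period)
    return total / period

period = 16  # samples per pulse (arbitrary, even)

# Only two harmonics: a smooth bump that still dips below zero.
few = [harmonic_sum(n, period, 2) for n in range(2 * period)]

# All harmonics up to Nyquist: an impulse train of ones and zeros.
full = [harmonic_sum(n, period, period // 2) for n in range(2 * period)]

print([round(abs(v), 3) for v in full])
```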
An important thing to note here is that the impulse train of zeros and ones is really an idealised model of the source. Though we can use it to synthesize speech sounds, the actual human vocal source is much more complicated. Actual glottal pulses don’t really look like a single spike, and they do produce negative pressure, but they still have a positive bias as they are not symmetric.
cheers,
Catherine
Hi Ha Anh,
Yes, there’ll be practice quizzes for the other online tests. I’m aiming to have the one for test 2 out tomorrow! The one for the ASR part of the course should follow not too long after that.
cheers,
Catherine