Forum Replies Created
-
AuthorPosts
-
With MFCCs, we are able to control the size of the feature vectors. Is that the same as “dimensionality reduction”, and are both aspects advantageous because smaller feature vectors speed up computation?
Just to check my understanding: would the FFT dimension be 200 for speech sampled at 16 kHz and an analysis frame of size 25 ms?
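For reference, the arithmetic behind this question can be sketched as below. This is only a sketch under stated assumptions: the DFT length is taken to equal the frame length (no zero-padding), and the helper names `frame_length_samples` and `num_unique_dft_bins` are made up for illustration. In practice the frame is often zero-padded to a longer FFT size, which would change the bin count.

```python
# Hypothetical helpers for illustration; the values come from the question above.

def frame_length_samples(sample_rate_hz, frame_duration_s):
    """Number of samples in one analysis frame."""
    return round(sample_rate_hz * frame_duration_s)


def num_unique_dft_bins(n_samples):
    """For a real-valued input of length N, bins 0..N//2 are unique;
    the remaining bins mirror them."""
    return n_samples // 2 + 1


n = frame_length_samples(16000, 0.025)   # samples in a 25 ms frame at 16 kHz
bins = num_unique_dft_bins(n)            # unique bins, from 0 Hz up to the Nyquist frequency
print(n, bins)
```

Under these assumptions, the frame holds 400 samples, and the unique bins run from 0 Hz to 8 kHz in steps of 16000 / 400 = 40 Hz.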
I’ve worked on the first part a) of this question and uploaded my diagrams here.
Could you please check them for me? I’m especially unsure about a) ii.: is it correct to just discard anything from the waveform that exceeds the duration that we want?
Changing the resolution in the VM settings worked for me, thank you!
I tried to turn on Large Text, but nothing changed.
Any other suggestions?
I would say it is centred above zero because it has no values with a negative amplitude?
I would say the 0 Hz component is on average centered on zero, just because all the speech signals that we’ve looked at so far were centered on zero in the time domain.
What then does a magnitude > 0 dB mean in the magnitude spectrum?
Or how does the energy at 0 Hz relate to a bias?
This means the plot shows me all possible basis functions that could be used to reconstruct the original signal, but only the bins that have a magnitude > 0 are relevant? So the DFT will just output 0 dB for the first possible basis function and every other basis function that is not a component of the original signal?
A 0 Hz basis function has no cycles per second, but as it is a basis function it still has to be a sinusoid. So it’s difficult for me to imagine that.
Maybe it is a flat line on the x-axis or it doesn’t exist?
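One way to picture the 0 Hz basis function is to evaluate a cosine at 0 cycles: it comes out as a constant line at 1, not a line on the x-axis. The sketch below also shows how a constant offset (bias) in a signal shows up in bin 0 of the DFT. It is a minimal illustration, not the course notebook's code: the 64-sample length, the test sinusoid, the offset of 0.5, and the helper `dft_bin` are all assumptions chosen for the example.

```python
import cmath
import math


def dft_bin(x, k):
    """Direct evaluation of DFT bin k of the sequence x (naive, for illustration)."""
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))


N = 64  # assumed length, chosen only for this example

# The 0 Hz basis function: a cosine with zero cycles is the constant 1.
basis_0 = [math.cos(2 * math.pi * 0 * t / N) for t in range(N)]

# A sinusoid centred on zero, and the same sinusoid with a constant offset of 0.5.
sine = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
biased = [s + 0.5 for s in sine]

# Bin 0 is the sum of all samples, i.e. N times the mean of the signal.
dc_sine = abs(dft_bin(sine, 0))      # approximately 0
dc_biased = abs(dft_bin(biased, 0))  # approximately 0.5 * N
print(dc_sine, dc_biased)
```

On a dB scale, a linear magnitude of 1 maps to 0 dB, while a magnitude of exactly zero has no finite dB value, so "not present in the signal" would correspond to a very low dB value rather than to 0 dB.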
I’m glad to hear that, thank you!
But then in the example/exercise in notebook sp-m2-1-impulse-as-source.ipynb, where there is an impulse train with 64 samples (see impulse-train.png attached), why then is the lowest frequency in the magnitude spectrum at 1 Hz and not at 0 Hz (see magnitude-spectrum.png)?
The other frequencies that have a positive magnitude make sense to me in this plot, but I cannot understand the first one.
I tried changing the resolution to various options, but unfortunately this didn’t have an impact on the font and app sizes. The window only gets smaller, and there is a black border around it (see attached screenshot).
Is there a different way to solve this?
PS: in another course, we are using the Informatics Remote Desktop service, and there the solution was to go to Preferences > Mate Tweak > Windows > HiDPI and set it to HiDPI instead of auto-detect.
But I can’t find this setting on the VM / this Linux version.
I had the same problem that (SayText "phrase") just gives a blip of sound, and the commands
festival> (Parameter.set 'Audio_Method 'Audio_Command)
festival> (Parameter.set 'Audio_Command "play -t raw -r 16000 -b 16 -c 1 -e signed-integer $FILE")
fixed this problem, thanks!
However, I have to enter those commands every time I start festival for SayText to work. Should it be like this or is there maybe another workaround?
It’s also super strange for me to read or say that we choose a sampling frequency above the Nyquist frequency.
The Nyquist frequency is defined in terms of the sampling rate, so how can you make your sampling rate dependent on the Nyquist frequency when the sampling rate comes “prior to” the Nyquist frequency, as it is used in its definition?
October 1, 2020 at 10:54 in reply to: Handbook of Phonetic Sciences – Chapter 20 – Intro to Signal Processing #12158
Hi,
in the Section on Sinusoids on page 761, it is explained that “resonant behavior always involves such an exchange between two energy forms”.
So there are two major aspects of resonant behavior: periodic transfer of energy between two forms, and exponential decay. In the domain of speech, exponential decay makes sense to me because it is also illustrated in the book in Fig. 20.4.
However, I don’t understand which two energy forms are exchanged in the vocal tract as a resonant system?
I just want to better understand the analogy between “swinging on a swing and the corresponding exchange between kinetic and potential energy” and producing speech.