Forum Replies Created
I apologise that the Remote Desktop is unreliable for sound playback. Please can all students submit support requests to the IS Helpline about this.
Unfortunately, at the weekend, you cannot access the lab and will have to use the Remote Desktop.
1. refer to the Formatting instructions for what is included and excluded from the word count
2. no, use any reasonable format that you like – keep your reader in mind: what format will be easiest for them to read?
3. your goal is to demonstrate your understanding, so you will very likely need to say something brief about what Text Normalisation is, so that you can explain the error and why it occurred; if you propose a possible solution, you will of course need to say something about how Text Normalisation is performed (what are the sub-steps? how is each done? rules? machine learning?).
4. If you insist, yes, but it will be included in the word count, so I cannot see any good reason to include one. An Appendix is for optional material that the reader does not have to look at unless they wish to – so you would be using up word count on something that may not be read…
5. To get a good mark, yes! Refer to the Bibliography section of Report Write-up.
That’s correct – actually car is a core Scheme function, which returns the first item from a list.
October 31, 2023 at 17:01 in reply to: Module 6 – Clarification for pitch period, impulse response, fundamental period #17064
Moving on to Module 6 and the video Pitch period, we are now looking at how to extract the vocal tract’s impulse response from a natural speech waveform.
If the impulse response actually did decay all the way to zero before the next glottal pulse, this would be easy for the reason stated above: one pitch period of the speech waveform would be exactly the impulse response we want.
Unfortunately, in natural speech, things are not that simple: the impulse responses overlap. So all we can do is deal in terms of pitch periods. We extract overlapping frames from the waveform so that we can reconstruct the waveform later using overlap-and-add. Since the analysis frames overlap, they will contain more than one pitch period. A good choice is an analysis frame capturing two pitch periods.
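As a rough illustration of the framing and overlap-and-add described above, here is a minimal Python sketch. It assumes the pitch marks are given as strictly increasing sample indices, and it uses a triangular window peaking at each pitch mark; the window shape and the function names extract_frames and overlap_add are choices made for this example only, not something taken from the course materials.

```python
import math


def extract_frames(waveform, pitch_marks):
    """Return (start_index, samples) frames, each spanning two pitch periods.

    Each frame runs from the previous pitch mark to the next one and is
    weighted by a triangular window that peaks at the central pitch mark.
    Assumes pitch_marks are strictly increasing sample indices.
    """
    frames = []
    for prev, centre, nxt in zip(pitch_marks, pitch_marks[1:], pitch_marks[2:]):
        samples = []
        for n in range(prev, nxt):
            if n < centre:
                weight = (n - prev) / (centre - prev)  # rising half of the window
            else:
                weight = (nxt - n) / (nxt - centre)    # falling half of the window
            samples.append(weight * waveform[n])
        frames.append((prev, samples))
    return frames


def overlap_add(frames, n_samples):
    """Sum the windowed frames back into a single waveform."""
    output = [0.0] * n_samples
    for start, samples in frames:
        for k, value in enumerate(samples):
            output[start + k] += value
    return output


# Toy signal and hypothetical pitch marks (note that T0 varies a little).
waveform = [math.sin(0.1 * n) for n in range(500)]
pitch_marks = [0, 100, 190, 300, 400, 499]

frames = extract_frames(waveform, pitch_marks)
rebuilt = overlap_add(frames, len(waveform))

# Between the first and last frame centres the overlapping windows sum to one,
# so the original samples are recovered up to floating-point rounding.
error = max(abs(rebuilt[n] - waveform[n]) for n in range(100, 400))
print(f"largest reconstruction error: {error:.2e}")
```

A Hann window would be a common alternative to the triangular window used here.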
October 31, 2023 at 16:55 in reply to: Module 6 – Clarification for pitch period, impulse response, fundamental period #17063
A pitch period is the period between two consecutive glottal pulses. Its duration is denoted T0 (measured in seconds). The term ‘pitch period’ is used to refer both to the duration (T0) and to that stretch of the speech waveform itself.
The term ‘fundamental period’ is another way of saying ‘pitch period’ (and is more technically correct, of course, because ‘fundamental frequency’ is more correct than ‘pitch’ when talking about a speech waveform).
A ‘pitch mark’ is a label we might place on a speech waveform to mark the position of a glottal pulse. Assuming the pitch marks are accurate, then the duration between two consecutive pitch marks is T0, by definition.
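To make the relationship between pitch marks, T0 and fundamental frequency concrete, here is a small Python sketch. The pitch mark times are invented for illustration, and the function names pitch_periods and fundamental_frequencies are hypothetical; it simply assumes the pitch marks are accurate and given in seconds.

```python
def pitch_periods(pitch_marks):
    """Duration T0 (in seconds) between each pair of consecutive pitch marks."""
    return [later - earlier for earlier, later in zip(pitch_marks, pitch_marks[1:])]


def fundamental_frequencies(pitch_marks):
    """Local fundamental frequency F0 = 1 / T0 (in Hz) for each pitch period."""
    return [1.0 / t0 for t0 in pitch_periods(pitch_marks)]


# Hypothetical pitch mark times in seconds.
marks = [0.100, 0.108, 0.116, 0.125]

for t0, f0 in zip(pitch_periods(marks), fundamental_frequencies(marks)):
    print(f"T0 = {t0 * 1000:.1f} ms, F0 = {f0:.1f} Hz")
```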
If the glottal pulses are sufficiently far apart in time (a large T0), then the impulse response of the vocal tract will decay away to zero before the next glottal pulse. In this case, each pitch period is equal to one impulse response of the vocal tract. This is almost the case in the (synthetic) speech waveform in the video Impulse Response where the waveform has decayed almost to zero before the next period starts.
So, a simple way to understand voiced speech is as a sequence of impulse responses of the vocal tract. This is a useful and helpful simplification for developing our understanding of speech signals. The video Source-filter model also makes this simplifying assumption (and all the speech signals used as examples are synthetic, to make things clearer).
However, in natural speech, the waveform generally does not decay all the way to zero before the next glottal pulse. Therefore, the impulse responses overlap (and we can assume they are simply summed, using our simplified model of the vocal tract).
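The idea that overlapping impulse responses are simply summed can be sketched in a few lines of Python. Everything here is a toy assumption: a damped sinusoid stands in for the vocal tract's impulse response, and the names impulse_response and synthesise, the decay rate, the resonance frequency and the pulse positions are all invented for illustration.

```python
import math


def impulse_response(n_samples, decay=0.01, resonance_hz=500.0, sample_rate=16000):
    """A toy damped sinusoid standing in for a vocal tract impulse response."""
    return [
        math.exp(-decay * n) * math.sin(2 * math.pi * resonance_hz * n / sample_rate)
        for n in range(n_samples)
    ]


def synthesise(pulse_positions, response, n_samples):
    """Start one copy of the impulse response at each glottal pulse and sum them."""
    waveform = [0.0] * n_samples
    for start in pulse_positions:
        for k, value in enumerate(response):
            if start + k < n_samples:
                waveform[start + k] += value
    return waveform


response = impulse_response(400)

# Large T0: each response has finished before the next pulse arrives,
# so one pitch period of the waveform is exactly one impulse response.
far_apart = synthesise([0, 400, 800], response, 1200)
print(far_apart[:400] == response)

# Small T0: the next pulse arrives while the previous response is still
# ringing, so a pitch period is no longer a clean copy of the response.
close = synthesise([0, 100, 200], response, 1200)
print(close[100:200] == response[:100])
```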
I’ve been pushing hard for longer access hours, but unfortunately have been refused on the grounds of “Health and Safety”. The instructions I have been given by the PPLS Head of Information Services are that students should use other spaces on campus (which I think includes Informatics rooms in the same building) for group study, and access the PPLS AT 4.02 lab machines remotely from there.
I’m sorry about this – I have been unable to get a satisfactory explanation of why it’s unsafe to use AT 4.02 at the weekend, whilst it is safe to use other rooms.
How much disk space is available?
$ df -h .
If the Use% column is showing close to 100%, that means the disk is nearly full.
If you are using a disk that is shared with other people (as is the case in the PPLS lab), then the amount of available space is the total for everyone sharing that disk (it doesn’t belong to you individually). The number reported by df will fluctuate up and down, as other users create or delete files.
How much disk am I using?
Change to your home directory, then measure the size of all the items there:
$ cd
$ du -sh *
That may take a minute or two to run and may produce a lot of output. It will be more convenient to sort the output by size:
$ du -sh * | sort -h
Now that you know which directory is the largest, you could cd into it and repeat the above, drilling down to find what is using the most space.
Or, get clever and find all directories at once and measure their size, reporting this in a sorted list (this will take some time, so be patient):
$ find . -type d -exec du -sh {} \; | sort -h
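If you would rather do the same checks from a script, the commands above can be approximated in Python using only the standard library. This is a sketch under assumptions: the function names percent_used, tree_size and largest_items are made up for this example, the percentage may differ slightly from the Use% column that df prints, and file sizes are summed as apparent sizes in bytes rather than the disk blocks that du counts.

```python
import os
import shutil


def percent_used(path="."):
    """Percentage of the disk holding `path` that is in use (roughly df's Use%)."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def tree_size(path):
    """Total apparent size in bytes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # do not follow or count symbolic links
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable: skip it
    return total


def largest_items(path=".", top=10):
    """(size_in_bytes, name) for the largest items directly inside `path`."""
    sizes = []
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((tree_size(entry.path), entry.name))
        elif entry.is_file(follow_symlinks=False):
            sizes.append((entry.stat(follow_symlinks=False).st_size, entry.name))
    return sorted(sizes, reverse=True)[:top]
```

For example, largest_items(os.path.expanduser("~")) would list the biggest items in your home directory, largest first, and percent_used(".") gives a number to compare against Use%.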
One example would be a convolutional layer. This has a very specific pattern of connections that expresses the operation of convolution between the activations output by a layer and a “kernel” (which is expressed by weight sharing).
We might use a convolutional layer when we wish to apply the same operation to all parts of some representation (potentially of varying size). Convolutional layers are very commonly used in image processing, but have their uses in speech processing too. For example, we might use them to create a learnable feature extractor for waveform-input ASR.
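To make the weight sharing concrete, here is a minimal pure-Python sketch of a one-dimensional convolutional layer with a single kernel. The function name conv1d and the example values are made up for illustration. Like most neural network toolkits, it slides the kernel along the input without flipping it (strictly speaking, cross-correlation), and a real layer would have many kernels, learned weights and a non-linearity.

```python
def conv1d(inputs, kernel, bias=0.0):
    """Apply one shared kernel at every position of the input.

    Only positions where the kernel fits entirely inside the input are
    computed, so the output is len(inputs) - len(kernel) + 1 values long.
    """
    width = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, inputs[i:i + width])) + bias
        for i in range(len(inputs) - width + 1)
    ]


# The same three weights are reused at every position (weight sharing)...
kernel = [1.0, 0.0, -1.0]
print(conv1d([1, 2, 4, 7, 11], kernel))

# ...so the layer works unchanged on an input of a different length.
print(len(conv1d(list(range(10)), kernel)))
```

The number of parameters is just the kernel width (plus the bias), however long the input is.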
bash$ sox recordings/arctic_a0001.wav -b16 -r 16k wav/arctic_a0001.wav remix 1
works as expected for me on your file.
Use soxi to inspect your output file: does it have the expected sampling rate, bit depth and duration?
One explanation for the large size of your output file could be that you accidentally combined multiple files, which would happen if you did this:
bash$ sox recordings/*.wav -b16 -r 16k wav/arctic_a0001.wav remix 1
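If soxi is not to hand, the same three properties can be checked with Python's standard wave module. This is only a sketch: describe_wav is a hypothetical helper name, and the wave module reads plain uncompressed PCM WAV files only, so it may fail on other formats that sox can handle.

```python
import wave


def describe_wav(path):
    """Report the channel count, sampling rate, bit depth and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        n_frames = wav.getnframes()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "bit_depth": 8 * wav.getsampwidth(),  # sample width is in bytes
            "duration_seconds": n_frames / rate,
        }
```

Calling describe_wav("wav/arctic_a0001.wav") on the converted file should then report one channel, a sampling rate of 16000 and a bit depth of 16, with a duration of a few seconds rather than the total of all your recordings.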
Run soxi recordings/arctic_a0001.wav to see information about that file’s format, and post the output here. If you wish, attach one file, such as recordings/arctic_a0001.wav, to your post so I can investigate.
The -r option indicates the sampling rate of the output file. sox will automatically determine the sampling rate of the input.
Here is a screenshot of another example aggregate device, this time combining an external USB microphone with the built-in headphone port of a laptop.
The problem is that, on newer Macs, the microphone and the headphones/speakers appear as separate audio devices. So there is no single device with both inputs and outputs.
Here’s a possible solution:
try creating an aggregate device, using Audio MIDI Setup (which you’ll find in /Applications/Utilities). Press the small “+” in the lower left corner to create a new device. The attached screenshot shows you what to do.
Then, select this as your device in SpeechRecorder.
Warning! If you use the built-in microphone of your laptop at the same time as the built-in speakers, you will get audio feedback! Use headphones (being careful about the volume in case of feedback), or mute the speakers whilst recording / turn the microphone volume to zero for playback.
Correct! Can you explain how they contribute to modelling duration?
Please state the word count on the first page of your assignment, and also include it in the name of the submission (as the instructions state). Using the word count from Overleaf is perfectly acceptable. If that word count is within the limit, you will not be penalised.