Forum Replies Created
How much disk space is available?
$ df -h .

If the Use% column is showing close to 100%, that means the disk is nearly full.
If you are using a disk that is shared with other people (as is the case in the PPLS lab), then the amount of available space is the total for everyone sharing that disk (it doesn’t belong to you individually). The number reported by df will fluctuate up and down as other users create or delete files.

How much disk am I using?

Change to your home directory, then measure the size of all the items there:
$ cd
$ du -sh *

That may take a minute or two to run and may produce a lot of output. It will be more convenient to sort the output by size:
$ du -sh * | sort -h

Now you know which directory is the largest, you could cd into it and repeat the above, drilling down to find what is using the most space. Or, get clever and find all directories at once and measure their size, reporting this in a sorted list (this will take some time, so be patient):
$ find . -type d -exec du -sh {} \; | sort -h

One example would be a convolutional layer. This has a very specific pattern of connections that express the operation of convolution between the activations output by a layer and a “kernel” (which is expressed by weight sharing).
We might use a convolutional layer when we wish to apply the same operation to all parts of some representation (potentially of varying size). They are very commonly used in image processing, but have their uses in speech processing too. For example, we might use them to create a learnable feature extractor for waveform-input ASR.
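To make the weight-sharing idea concrete, here is a minimal numpy sketch of a 1-D convolutional layer (my own illustration, not course code; the input and kernel values are arbitrary). The same small set of weights is applied at every position of the input:

```python
import numpy as np

def conv1d(x, kernel):
    """Valid (no-padding) 1-D convolution: slide one shared kernel along x.

    Note: like most deep learning toolkits, this actually computes
    cross-correlation, i.e. convolution with an un-flipped kernel.
    """
    k = len(kernel)
    # Every output value uses the SAME weights (weight sharing),
    # just applied to a different local window of the input.
    return np.array([np.dot(kernel, x[i:i + k])
                     for i in range(len(x) - k + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # e.g. activations from a previous layer
kernel = np.array([0.5, 0.5])             # the shared weights (the "kernel")

print(conv1d(x, kernel))                  # a 2-point moving average: [1.5 2.5 3.5 4.5]
```

Because the kernel is shared, the number of parameters is independent of the input length, which is one reason the same layer can handle representations of varying size.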
The command

$ sox recordings/arctic_a0001.wav -b16 -r 16k wav/arctic_a0001.wav remix 1

works as expected for me on your file.
Use soxi to inspect your output file: does it have the expected sampling rate, bit depth and duration? One explanation for the large size of your output file could be that you accidentally combined multiple files, which would happen if you did this:

$ sox recordings/*.wav -b16 -r 16k wav/arctic_a0001.wav remix 1

Run

$ soxi recordings/arctic_a0001.wav

to see information about that file format, and post the output here. If you wish, attach one file, such as recordings/arctic_a0001.wav, to your post so I can investigate.

The -r option specifies the sampling rate of the output file. sox will automatically determine the sampling rate of the input.

Here is a screenshot of another example aggregate device, this time combining an external USB microphone with the built-in headphone port of a laptop.
The problem is that, on newer Macs, the microphone and the headphones/speakers appear as separate audio devices. So there is no single device with both inputs and outputs.
Here’s a possible solution:
try creating an aggregate device, using Audio MIDI Setup (which you’ll find in /Applications/Utilities). Press the small “+” in the lower left corner to create a new device. The attached screenshot shows you what to do.
Then, select this as your device in SpeechRecorder.
Warning! If you use the built-in microphone of your laptop at the same time as the built-in speakers, you will get audio feedback! Use headphones (being careful about the volume in case of feedback), or mute the speakers whilst recording and turn the microphone volume down to zero for playback.
Correct! Can you explain how they contribute to modelling duration?
Please state the word count on the first page of your assignment, and also include it in the name of the submission (as the instructions state). Using the word count from Overleaf is perfectly acceptable. If that word count is within the limit, you will not be penalised.
You are right that HMMs have “a set of probability distributions” but there are two different types of probability distribution in an HMM. One type is the emission probability density functions: the multivariate Gaussian in each emitting state that generates observations (MFCCs).
What is the other type?
In the video Cepstral Analysis, Mel-Filterbanks, MFCCs we first had a recap of filterbank features. These would be great features, except that they exhibit covariance: neighbouring filterbank coefficients are strongly correlated with one another.
We then reminded ourselves of how the source and filter combine in the time domain using convolution, or in the frequency domain using multiplication. We made them additive by taking the log and devised a way to deconvolve the source and filter. This video only explained the classical cepstrum – there was no Mel scale or filterbank.
Finally, in the video From MFCCs, towards a generative model using HMMs we developed MFCCs, by using our filterbank features as a starting point, then applying the same crucial steps as we would for using the cepstrum to obtain the filter without the source: take the log (make source and filter additive), series expansion (separate source and filter along the quefrency axis), truncate (discard the source).
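As a compact summary of those steps (the notation here is mine: e[n] for the source, h[n] for the filter, x[n] for the speech), the key move is that taking the log turns the multiplicative combination of source and filter spectra into an additive one:

```latex
x[n] = e[n] * h[n]
\;\Longrightarrow\;
|X(f)| = |E(f)|\,|H(f)|
\;\Longrightarrow\;
\log|X(f)| = \log|E(f)| + \log|H(f)|
```

The series expansion then lays the smoothly-varying filter and the rapidly-varying source out at different positions along the quefrency axis, so that truncation keeps the low-quefrency part (the filter) and discards the high-quefrency part (the source).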
If, after reading the questions again carefully, you are certain they are duplicates, please email the screenshots to me.
First, remember that pitch is the perceptual correlate of F0. We can only measure F0 from a speech waveform. Pitch only exists in the mind of the listener.
When we say F0 is not lexically contrastive in ASR, we mean that it is not useful for telling two words apart. The output of ASR is the text, so we do not need to distinguish “preSENT” from “PREsent”, for example, we simply need to output the written form “present”.
Duration is lexically contrastive because there are pairs of words in the language that differ in their vowel length.
Hidden Markov Models do model duration. Can you explain how they do that?
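As a hint, in standard HMM notation (not from the original post: a_{ii} is the self-transition probability of state i), remaining in a state for exactly d frames means taking the self-transition d-1 times and then leaving, which implies a geometric duration distribution:

```latex
P(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}),
\qquad
\mathbb{E}[d] = \frac{1}{1 - a_{ii}}
```

This is a rather crude model of duration: the most probable duration under a geometric distribution is always a single frame.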
It’s preferred to copy-paste text into your forum post rather than use screenshots (which are not searchable, and which we cannot quote in our replies).
You are trying to use an MFCC file which does not exist – that’s what the HTK error “Cannot open Parm File” means.
Take a look in /Volumes/Network/courses/sp/data/mfcc/ to see how the data are organised into train and test partitions, and what the filenames are.

The train and test data are arranged differently:
Because we have labels for the train data, we can keep all of the training examples from each speaker in a single MFCC file, where the corresponding label file specifies not just the labels (e.g., “one” or “seven”) but also the start and end times.
The test data are cut into individual digits, ready to be recognised.
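For illustration only, a label file for a training utterance might look like this in HTK's plain-text label format (the words and times here are invented; HTK expresses times in units of 100 ns):

```
      0  3500000 one
3500000  7200000 seven
7200000 11000000 three
```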
November 10, 2022 at 21:39, in reply to: Clarification on ‘What in the speech signal differentiates different phones?’ #16489

Only you can answer that: you need to give enough background for your reader to understand the points you will make later in the report. For example, if your explanation for an audible concatenation refers to source and filter properties, you should have provided enough background about that for your explanation to be understood by your reader.