Forum Replies Created
Also, I think that run_dnn.py might be hardwired to use only tanh layers regardless of what the configuration file specifies.
Your reasoning behind why we need to cluster (also called “tie”) models is correct, yes.
The nodes in the tree each contain a question about a phonetic feature (e.g., “is the previous phone nasal?”). The tree is simply a CART. The phonetic features are the predictors. The predictee is the current model state’s parameters (mean and variance of its Gaussian).
The tree is learned in very much the same way as a classification or regression tree.
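For concreteness, here is a minimal sketch (in Python, with made-up feature names and Gaussian values, not the HTK/HTS implementation) of descending such a tree: internal nodes ask yes/no questions about the phonetic context, and every model state that reaches the same leaf shares that leaf’s Gaussian parameters.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Leaf:
    mean: list       # shared Gaussian mean for every state clustered here
    variance: list   # shared Gaussian variance

@dataclass
class Node:
    question: Callable[[dict], bool]   # e.g. "is the previous phone nasal?"
    yes: object                        # Node or Leaf
    no: object                         # Node or Leaf

def find_leaf(tree, context):
    """Descend the tree by answering phonetic-feature questions about the context."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.question(context) else tree.no
    return tree

# Two different full-context states that answer the questions the same way
# end up at the same leaf, i.e. they are tied to one shared Gaussian.
tree = Node(
    question=lambda c: c["prev_phone_nasal"],
    yes=Leaf(mean=[0.1], variance=[1.0]),
    no=Leaf(mean=[0.5], variance=[1.2]),
)
assert find_leaf(tree, {"prev_phone_nasal": True}) is find_leaf(
    tree, {"prev_phone_nasal": True}
)
```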
Your question about how this eventually affects the generated waveform can be restated in two parts:
1. how does this affect the models’ parameters?
2. how do model parameters affect the waveform that they generate?
The answer to 1. you have already figured out: the models share parameters, that’s all. We don’t need to average the group of models (actually, model states) that end up at a leaf – we simply have only one shared (= tied) state there, and it is trained on all the corresponding data. So, if you like, you might instead think of the tree as finding all the suitable data that this shared state should be trained on, pooled across a group of sufficiently-similar contexts.
The answer to 2. is via the usual generation process of statistical parametric speech synthesis: the models generate trajectories of vocoder parameters, and those are then vocoded into a waveform.
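To make that generation step concrete, here is a schematic sketch: each tied state contributes its mean vector of vocoder parameters for its predicted number of frames, the frames are concatenated into a trajectory, and the trajectory is vocoded. This skips the dynamic-feature smoothing (MLPG) used in real systems, and `vocode` is a hypothetical stand-in for a real vocoder call.

```python
import numpy as np

def generate_trajectory(state_means, durations):
    """Repeat each tied state's mean vector for its predicted duration (in frames)."""
    frames = [np.tile(mean, (n_frames, 1))
              for mean, n_frames in zip(state_means, durations)]
    return np.vstack(frames)   # shape: (total_frames, vocoder_param_dim)

means = [np.array([1.0, 0.5]), np.array([0.8, 0.7])]
trajectory = generate_trajectory(means, durations=[3, 5])
# waveform = vocode(trajectory)   # hypothetical vocoder call (e.g. STRAIGHT/WORLD)
```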
To be more precise: most frames of all regions labelled as silence are removed.
It improves training (as found empirically) because otherwise the training data is dominated by silence frames and the network will optimise for generating silence in preference to speech sounds (it’s very easy to minimise the error on silence, and that contributes too much to total error if there are a lot of silence frames).
To prevent the truncation of phrase-final speech sounds, the correct solution is to improve the forced alignment.
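For illustration, here is a minimal sketch of the silence-removal idea, assuming frame-level labels in which silence is marked “sil”; the label name, the keep fraction, and the random sampling strategy are all illustrative (a real recipe would more likely trim each silence region down to a few frames at its boundaries).

```python
import random

def remove_most_silence(frames, labels, keep_fraction=0.05, sil_label="sil"):
    """Keep every speech frame, but only a small fraction of silence frames."""
    kept = []
    for frame, label in zip(frames, labels):
        if label != sil_label or random.random() < keep_fraction:
            kept.append(frame)
    return kept
```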
I believe you can (and should) now switch to using run_lstm.py in all cases, both LSTM and purely feed-forward architectures.
This error is relevant because it leads to incorrect labels on the database, which unit selection is not robust to. So, it may be worth mentioning. Perhaps you could suggest some solutions (one would be better WSD of course) and think about how a statistical parametric approach would be affected by the same kind of front-end error.
Correct – join locations are at pitch marks, and the signal (*) is overlap-added at the joins with an overlap region of one pitch period.
(*) the signal could in principle be the speech waveform, but in Festival it is the residual waveform, because the released version of Festival uses RELP signal processing and not TD-PSOLA.
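A minimal sketch of what happens at a join, assuming a linear cross-fade over one pitch period; in Festival the `left` and `right` signals would be residual segments rather than raw waveforms, and the actual windowing may differ.

```python
import numpy as np

def overlap_add_join(left, right, pitch_period_samples):
    """Concatenate two units, cross-fading over one pitch period at the join."""
    n = pitch_period_samples
    fade_out = np.linspace(1.0, 0.0, n)
    fade_in = np.linspace(0.0, 1.0, n)
    overlap = left[-n:] * fade_out + right[:n] * fade_in
    return np.concatenate([left[:-n], overlap, right[n:]])
```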
In the Unilex dictionary (RPX accent) there are the following entries for “lower”:
("lower" (nn glare) (((l ow @) 1))) ("lower" (vb glare) (((l ow @) 1))) ("lower" (vb make-low) (((l ou @) 1))) ("lower" (vbp glare) (((l ow @) 1))) ("lower" (vbp make-low) (((l ou @) 1)))
The “glare” part of the entry is the word sense, which could be used by Word Sense Disambiguation. Festival’s WSD module very probably doesn’t know about the rather obscure “glare” sense of the word “lower” (as in to glare at someone).
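For illustration, a small sketch of how the (part-of-speech, sense) pair selects between the two pronunciations; the lookup structure is made up, not Festival’s internal representation, but the pronunciations are taken from the entries above.

```python
# Pronunciations from the Unilex (RPX) entries for "lower" shown above.
UNILEX_LOWER = {
    ("nn", "glare"):     "l ow @",
    ("vb", "glare"):     "l ow @",
    ("vb", "make-low"):  "l ou @",
    ("vbp", "glare"):    "l ow @",
    ("vbp", "make-low"): "l ou @",
}

def pronounce_lower(pos, sense):
    return UNILEX_LOWER[(pos, sense)]

print(pronounce_lower("vb", "make-low"))   # l ou @
print(pronounce_lower("vb", "glare"))      # l ow @  (only reachable with correct WSD)
```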
The pitch marks determine the possible join locations (the forced-alignment boundaries are rounded up or down to the nearest pitch mark), so unit choices can change.
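A tiny sketch of that rounding step, with illustrative pitch mark times:

```python
def snap_to_pitch_mark(boundary_time, pitch_marks):
    """Round a forced-alignment boundary to the nearest pitch mark."""
    return min(pitch_marks, key=lambda pm: abs(pm - boundary_time))

pitch_marks = [0.512, 0.520, 0.529, 0.537]      # times in seconds (illustrative)
print(snap_to_pitch_mark(0.526, pitch_marks))   # 0.529
```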
To turn off the target cost, set its weight to zero. The target costs will all be 0 when you inspect the Unit relation.
(Lecture 5 was about pitch tracking and pitchmarking. Do you mean Lecture 6?)
Unit selection systems do not necessarily require a vocoder, but could optionally use one so that joins can be smoothed. Alternatively, we might only do time-domain processing (i.e., direct waveform concatenation).
The RELP coding used in Festival could be thought of as a type of vocoding, although the need for the residual (which is itself a waveform) would make this a rather inconvenient vocoder for other purposes, such as statistical parametric speech synthesis.
Use direct quotes very sparingly, and wherever possible state things in your own words. One situation where a direct quote is needed is where reporting the precise wording used by that author is important (e.g., you are going to criticise it).
If you are just attributing an idea or supporting a claim, then there is generally no need for a direct quote. Describe the idea in your own words, and place the citation at a point that makes the connection between the idea and the citation obvious.
In your example, you are making a claim that something is well-known, and so a citation is essential there. No direct quote is needed in this case.
Many citation styles are possible. The APA style is a good default choice.
Any reasonable style (i.e., anything in current use in a major journal in our field) is acceptable. APA is a particularly good choice of referencing style.
The first thing we might observe is that objective measures (in your case, you are proposing “number of pitchmarking errors”) do not always correlate with perceptual results. If they did, life would be much easier!
There are clearly some complex interactions between the various factors that you mention. That’s typical of unit selection, and reflects the difficulty in automatically tuning this type of system.
I can’t suggest a simple reason for the results you are observing, but some other things to look at might be:
– total number of joins in each case
– what happens when you turn off the target cost, in each case
You’ve got all the essential points. The coefficients needed for RELP synthesis are stored in two parallel sets of files: the LPC filter coefficients and the residual signals. The filter coefficients are a sequence of vectors (one vector = one set of filter coefficients at a certain point in time) and these are pitch-synchronous, and so implicitly represent the pitch marks (your point 2 is correct). The answer to point 3 is that the information is already there in the filter coefficients, and there is no need to duplicate it in the residuals. Filter coefficients and residuals “belong together”, and for each utterance there is a pair of files.
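For illustration, a sketch of how that pair is used at synthesis time: each frame’s residual segment is passed through the all-pole LPC synthesis filter defined by the matching coefficient vector. File formats and frame bookkeeping are greatly simplified here; this is not Festival’s actual code.

```python
import numpy as np
from scipy.signal import lfilter

def relp_resynthesise(frames):
    """frames: list of (lpc_coefficients, residual_segment) pairs for one utterance.

    Each coefficient vector is [1, a1, ..., ap]; the residual segment is the
    excitation for that pitch-synchronous frame.  Filter state is not carried
    across frames here, which is a simplification.
    """
    output = []
    for a, residual in frames:
        # All-pole synthesis filter 1/A(z): residual (excitation) -> speech
        output.append(lfilter([1.0], a, residual))
    return np.concatenate(output)
```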
Here’s a paper on which I am a co-author, to give an example of reducing both the word count and the amount of space (from nearly 5 pages down to 4 pages), as well as making editorial improvements to the text.
Attachments: