Worked example 1 – letter-to-sound
- This topic has 1 reply, 2 voices, and was last updated 8 years, 2 months ago by Simon.
October 9, 2016 at 11:32 #5269
I am trying to apply the logic from the video "Entropy: understanding the equation" to the CART worked example, but the math doesn’t quite work out. The shortest way I can think of to transmit information about EH, AA and AO is to refer to EH with 0, to AA with 1 and to AO with 01. On average that is (3×1 + 2×1 + 2×2)/7 = 1.29 bits, but the equation seems to give the entropy that would result from the system EH:0, AA:01, AO:10. I see how AA and AO having the same probability could relate to them being referred to with the same number of bits, but is that a requirement?
-
October 11, 2016 at 08:26 #5377
In the video "Entropy: understanding the equation", I used a few carefully chosen example distributions to help you understand the general formula for entropy.
At 5:30 in the video, you will see that I used this code for sending messages about three values:
Code 1
- green = 0
- blue = 10
- red = 11
but I didn’t go into the precise reason for choosing that code rather than, say
Code 2
- green = 0
- blue = 1
- red = 01
So, let’s clear that detail up now. When we transmit a variable-length code, the receiver must also be able to tell where each item in the message starts and finishes. In other words, for any string of bits, there has to be a single unambiguous decoding of the message.
Consider sending the message “green green blue red”
Using Code 1: 001011
Using Code 2: 00101
At this point, it looks like Code 2 is better: it can send the message with fewer bits. But now let’s try to decode them. Using Code 1, the message is unambiguous:
001011 = green green blue red, and there are no other possible ways to decode it.
But using Code 2 we have more than one possible decoding:
00101 = green green blue red
00101 = green red red
So, that code is not allowed!
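The decoding check can also be done mechanically. Here is a minimal Python sketch (not from the video; the function names `encode` and `all_decodings` are made up for illustration) that lists every way a bit string can be split into codewords of a given code.

```python
def encode(message, code):
    """Concatenate the codewords for each symbol in the message."""
    return "".join(code[symbol] for symbol in message)


def all_decodings(bits, code):
    """Return every way to split `bits` into codewords from `code`."""
    if bits == "":
        return [[]]  # one way to decode nothing: the empty message
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            # Consume this codeword, then decode the remainder recursively.
            for rest in all_decodings(bits[len(word):], code):
                results.append([symbol] + rest)
    return results


code1 = {"green": "0", "blue": "10", "red": "11"}
code2 = {"green": "0", "blue": "1", "red": "01"}

message = ["green", "green", "blue", "red"]
for name, code in [("Code 1", code1), ("Code 2", code2)]:
    bits = encode(message, code)
    decodings = all_decodings(bits, code)
    print(f"{name}: {bits} has {len(decodings)} decoding(s)")
    for decoding in decodings:
        print("   ", " ".join(decoding))
```

Code 1 gives exactly one decoding because no codeword is the start of another codeword (a prefix code). In Code 2, green (0) is the start of red (01), so the receiver cannot tell where one item ends and the next begins.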
Your code has the same problem:
- EH = 0
- AA = 1
- AO = 01
because the message “EH AA” is coded as 01, and this cannot be decoded unambiguously (it might mean “EH AA” or it might mean “AO”).
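To connect this back to the entropy equation, here is a rough Python sketch. It assumes the counts 3, 2 and 2 for EH, AA and AO, taken from the arithmetic in the question. The code EH = 0, AA = 10, AO = 11 is my own assumption, built by analogy with Code 1, and the variable names are only illustrative.

```python
import math

# Counts assumed from the question's arithmetic: (3×1 + 2×1 + 2×2)/7
counts = {"EH": 3, "AA": 2, "AO": 2}
total = sum(counts.values())
probs = {symbol: count / total for symbol, count in counts.items()}

# Entropy in bits: H = -sum(p * log2(p))
entropy = -sum(p * math.log2(p) for p in probs.values())


def average_length(code):
    """Expected number of bits per symbol under the distribution above."""
    return sum(probs[symbol] * len(word) for symbol, word in code.items())


# Hypothetical prefix code, analogous to Code 1 (an assumption, not from the post)
prefix_code = {"EH": "0", "AA": "10", "AO": "11"}
# The code proposed in the question, which cannot be decoded unambiguously
ambiguous_code = {"EH": "0", "AA": "1", "AO": "01"}

print(f"entropy:              {entropy:.3f} bits")
print(f"prefix code average:  {average_length(prefix_code):.3f} bits")
print(f"ambiguous code:       {average_length(ambiguous_code):.3f} bits")
```

The ambiguous code averages about 1.29 bits, which is below the entropy of about 1.56 bits. No uniquely decodable code can average fewer bits than the entropy, so that alone tells you something is wrong with it. The prefix code averages about 1.57 bits, just above the entropy.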