covariance parameters
This topic has 4 replies, 3 voices, and was last updated 9 years ago by Simon.
November 10, 2015 at 12:24 #599
Why do we need two covariance parameters in the variance matrix for two dimensions? What are these covariance parameters? Should there only be one covariance value for every pair of dimensions?
-
November 10, 2015 at 19:23 #600
Well spotted!
The covariance matrix is symmetrical. Along the diagonal are the variance values (the “covariance between each dimension and itself” if you like). Off the diagonal are the covariance values between pairs of dimensions.
Since the covariance between a and b is the same as the covariance between b and a, the upper triangle of this matrix (above the diagonal) is the same as the lower triangle (below the diagonal).
Let’s define covariance formally to understand why this is:
Here are two variables – the elements of a two-dimensional feature vector: \([X_1 , X_2]\)
First, let’s write down the variance of \(X_1\), which is defined simply as the average squared distance from the mean – in other words, to estimate it from data, we simply compute the squared difference between every data point and the mean, and take the average of that.
[latex]
\sigma^2_1 = var(X_1) = E[ (X_1 - \mu_1)(X_1 - \mu_1) ]
[/latex]The “E[…]” notation is just a fancy formal way of saying “the average value” and the E stands for “expected value” or “expectation”.
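For example, with some made-up numbers: if we had observed the values 1, 3 and 5 for \(X_1\), then the mean is 3 and the estimated variance is
[latex]
\sigma^2_1 = \frac{(1-3)^2 + (3-3)^2 + (5-3)^2}{3} = \frac{8}{3}
[/latex]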
Here’s the covariance between \(X_1\) and \(X_2\)
[latex]
cov(X_1,X_2) = E[ (X_1 - \mu_1)(X_2 - \mu_2) ]
[/latex]Now, for yourself, write down the covariance between \(X_2\) and \(X_1\). You will find that it’s equal to the value above.
[showhide more_text="Reveal the answer" less_text="Hide the answer" hidden="yes"]
Here’s the covariance between \(X_2\) and \(X_1\)
[latex]
cov(X_2,X_1) = E[ (X_2 - \mu_2)(X_1 - \mu_1) ]
[/latex]and because multiplication is commutative we can write
[latex]
(X_1 - \mu_1)(X_2 - \mu_2) = (X_2 - \mu_2)(X_1 - \mu_1)
[/latex]and therefore
[latex]
cov(X_1,X_2) = cov(X_2,X_1)
[/latex]Let’s move up to three dimensions. Noting that \(cov(X_1,X_2)\) can be written as \(\Sigma_{12}\), the full covariance matrix looks like this:
[latex]
\Sigma = \left( \begin{array}{ccc}
\Sigma_{11} & \Sigma_{12} & \Sigma_{13} \\
\Sigma_{21} & \Sigma_{22} & \Sigma_{23} \\
\Sigma_{31} & \Sigma_{32} &\Sigma_{33} \end{array} \right)
[/latex]But we normally write \(\sigma_1^2\) rather than \(\Sigma_{11}\), and since \(\Sigma_{12} = \Sigma_{21}\) we can write this:
[latex]
\Sigma = \left( \begin{array}{ccc}
\sigma_1^2 & \Sigma_{12} & \Sigma_{13} \\
\Sigma_{12} & \sigma_2^2 & \Sigma_{23} \\
\Sigma_{13} & \Sigma_{23} & \sigma_3^2 \end{array} \right)
[/latex]See how the matrix is symmetrical. It has just over half as many parameters as you might have thought at first. But, the number of parameters in a covariance matrix is still proportional to the square of the dimension of the feature vector. That’s one reason we might try to make feature vectors as low-dimensional as possible before modelling them with a Gaussian.
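To be exact, a full covariance matrix for a D-dimensional feature vector has
[latex]
D + \frac{D(D-1)}{2} = \frac{D(D+1)}{2}
[/latex]free parameters: D variances on the diagonal plus one covariance for each of the \(\frac{D(D-1)}{2}\) pairs of dimensions. For a typical 39-dimensional feature vector, for example, that is 780 parameters per Gaussian, compared with the \(39^2 = 1521\) entries in the matrix.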
Confused by the notation?
The subscripts are always indexing the dimension of the feature vector. The superscript “2” in \(\sigma^2\) just means “squared”: \(\sigma^2 = \sigma \times \sigma\)
The notation of upper and lower case sigma is also potentially confusing, because \(\Sigma\) is a covariance matrix, \(\sigma\) is standard deviation, and \(\sigma^2\) is variance. We do not write \(\Sigma^2\) for the covariance matrix!
[/showhide]
PS – let me know if the maths doesn’t render in your browser.
-
November 13, 2015 at 09:37 #612
This is now clear to me. Thank you so much. I can see the maths in IE but not in Chrome or Firefox.
-
December 1, 2015 at 13:47 #983
Then why do we say that the computational cost of a full covariance matrix is [latex]D^{2}[/latex]? Shouldn’t it be just [latex]\frac{D!}{2(D-2)!}[/latex]? Or do we simply not differentiate between the two costs?
-
December 1, 2015 at 15:03 #990
Let’s separate out a few different aspects of this.
The storage space required for a covariance matrix is [latex]O(D^2)[/latex], where D is the dimensionality of the observation vector.
The computational cost can be worked out by looking at the vector-matrix-vector multiplication in the formula for a multivariate Gaussian – can you work that out?
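As a hint, the relevant term is the quadratic form in the exponent of the Gaussian:
[latex]
(\mathbf{x}-\boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})
[/latex]Try counting the multiplications needed to evaluate this, assuming \(\Sigma^{-1}\) has already been computed (once, at training time).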
But the real issue is the large number of model parameters — which is also [latex]O(D^2)[/latex] — that need to be estimated from data.