Statistics`MultinormalDistribution`

The most commonly used probability distributions for multivariate data analysis are those derived from the multinormal (multivariate Gaussian) distribution. This package contains the multinormal, multivariate Student $t$, Wishart, Hotelling $T^2$, and quadratic form distributions. Distributions are usually represented in the symbolic form name[param1, param2, ...]. When there are many parameters, they may be organized into lists, as in the case of QuadraticFormDistribution. Functions such as Mean, which give properties of statistical distributions, take the symbolic representation of the distribution as an argument.

Standard probability distributions derived from the multivariate Gaussian distribution.

A $p$-variate multinormal distribution with mean vector $\mu$ and covariance matrix $\Sigma$ is denoted $N_p(\mu, \Sigma)$. If $X_i$, $i = 1, \ldots, m$, are independently distributed $N_p(\mathbf{0}, \Sigma)$ (where $\mathbf{0}$ is the zero vector), and $X$ denotes the $m \times p$ data matrix composed of the row vectors $X_i$, then the $p \times p$ matrix $X^{\mathsf{T}} X$ has a Wishart distribution with scale matrix $\Sigma$ and degrees of freedom parameter $m$, denoted $W_p(\Sigma, m)$. The Wishart distribution is most typically used to describe the covariance matrix of multinormal samples.

A vector that has a multivariate Student $t$ distribution can also be written as a function of a multinormal random vector. Let $Y$ be a standardized multinormal vector with covariance matrix $R$, and let $S$ be a chi-square variable with $m$ degrees of freedom, independent of $Y$. (Since $Y$ is standardized, $\mathbf{0}$ is the mean vector of $Y$ and $R$ is also the correlation matrix of $Y$.) Then $Y / \sqrt{S/m}$ has a multivariate $t$ distribution with correlation matrix $R$ and $m$ degrees of freedom, denoted $t_p(R, m)$. The multivariate Student $t$ distribution is elliptically contoured like the multinormal distribution, and characterizes the ratio of a multinormal vector to a standard deviation common to all variates. When $m = 1$ and $R = I_p$, the multivariate $t$ distribution is the same as the multivariate Cauchy distribution (here $I_p$ denotes the $p \times p$ identity matrix).
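The Wishart and multivariate $t$ constructions above can be checked by simulation. The following is a minimal sketch in plain Python (standard library only), not the package's Mathematica interface; the names cholesky, multinormal_zero_mean, wishart_draw, and multivariate_t_draw are hypothetical helpers for illustration. It assumes an integer $m$, so that $S$ can be built as a sum of $m$ squared standard normal variates.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def multinormal_zero_mean(L, rng):
    """One draw from N_p(0, L L^T): apply L to p independent N(0, 1) variates."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]

def wishart_draw(sigma, m, rng):
    """X^T X, where X has m independent N_p(0, sigma) rows: one W_p(sigma, m) draw."""
    L = cholesky(sigma)
    rows = [multinormal_zero_mean(L, rng) for _ in range(m)]
    p = len(sigma)
    return [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]

def multivariate_t_draw(R, m, rng):
    """Y / sqrt(S/m): Y ~ N_p(0, R) with R a correlation matrix, and
    S ~ chi-square(m) (integer m) built from m fresh N(0, 1) variates."""
    y = multinormal_zero_mean(cholesky(R), rng)
    s = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))
    return [yi / math.sqrt(s / m) for yi in y]

rng = random.Random(0)
sigma = [[2.0, 0.5], [0.5, 1.0]]
print(wishart_draw(sigma, 5, rng))
print(multivariate_t_draw([[1.0, 0.3], [0.3, 1.0]], 4, rng))
```

Averaging many draws of wishart_draw(sigma, m, rng) should approach $m\Sigma$, the mean of $W_p(\Sigma, m)$.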
The Hotelling $T^2$ distribution is a univariate distribution proportional to the F-ratio distribution. If the vector $d$ and the matrix $M$ are independently distributed $N_p(\mathbf{0}, \Sigma)$ and $W_p(\Sigma, m)$, respectively, then $m\, d^{\mathsf{T}} M^{-1} d$ has the Hotelling $T^2$ distribution with parameters $p$ and $m$, denoted $T^2_{p,m}$; equivalently, $\frac{m-p+1}{mp} T^2_{p,m}$ follows the F-ratio distribution with $p$ and $m-p+1$ degrees of freedom. This distribution is commonly used to describe the sample Mahalanobis distance between two populations.

A quadratic form in a multinormal vector $x$ distributed $N_p(\mu, \Sigma)$ is given by $Q = x^{\mathsf{T}} A x + b^{\mathsf{T}} x + c$, where $A$ is a $p \times p$ symmetric matrix, $b$ is a $p$-vector, and $c$ is a scalar. This univariate distribution can be useful in discriminant analysis of multinormal samples.

Functions of univariate statistical distributions applicable to multivariate distributions.

In this package distributions are represented in symbolic form. Generally, PDF[dist, x] evaluates the density at $x$ if $x$ is a numerical value, vector, or matrix, and otherwise leaves the function in symbolic form. Similarly, CDF[dist, x] gives the cumulative distribution function and CharacteristicFunction[dist, t] gives the characteristic function of the specified distribution. In some cases explicit forms of these expressions are not available. For example, PDF[QuadraticFormDistribution[{A, b, c}, {mu, sigma}], x] does not evaluate, but a Series expansion of the PDF about the lower support point of the domain (for a positive definite quadratic form) does evaluate. The CDF of MultinormalDistribution and MultivariateTDistribution is available for numerical vector arguments, but not for symbolic vector arguments. In the case of MultivariateTDistribution, the CharacteristicFunction is expressed in terms of an integral.

If $\Sigma$ is a diagonal matrix, the closed-form result for CDF[MultinormalDistribution[mu, sigma], x] is computed directly. If $\Sigma$ is not diagonal and has the form $\sigma_{ij} = \sigma_i \sigma_j \lambda_i \lambda_j$ for $i \neq j$ and $\sigma_{ii} = \sigma_i^2$ for $i = j$, where $-1 < \lambda_i < 1$, a method for multivariate normal distributions with product-covariance structures described in Y. L. Tong, The Multivariate Normal Distribution, Springer-Verlag, 1990, is used.
If $\Sigma$ does not have either of these special forms, a general method described in Alan Genz, "Numerical Computation of Multivariate Normal Probabilities," Journal of Computational and Graphical Statistics 1 (1992), pp. 141-149, is used. If the correlation matrix $R$ in CDF[MultivariateTDistribution[r, m], x] is diagonal, a single numeric integration is performed using the closed-form result for the multivariate normal CDF and the relationship between the multivariate normal and multivariate $t$ distributions. Otherwise, general methods based on separation-of-variables techniques in Alan Genz and Frank Bretz, "Comparison of Methods for the Computation of Multivariate t-Probabilities," Journal of Computational and Graphical Statistics 11 (2002), pp. 950-971, are used.

This loads the package.

Here is a symbolic representation of a standardized binormal distribution. A standardized random vector has a zero mean vector and a covariance matrix equal to its correlation matrix.
This gives its probability density function.
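For reference, the density of a standardized binormal vector with correlation $\rho$ is $f(x, y) = \exp\{-(x^2 - 2\rho x y + y^2) / (2(1-\rho^2))\} / (2\pi\sqrt{1-\rho^2})$. A minimal Python sketch of that formula follows; the name binormal_pdf is hypothetical and not part of the package.

```python
import math

def binormal_pdf(x, y, rho):
    """Density of a standardized binormal vector: zero means, unit variances,
    correlation rho (requires -1 < rho < 1)."""
    one_minus_rho2 = 1.0 - rho * rho
    quad = (x * x - 2.0 * rho * x * y + y * y) / one_minus_rho2
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(one_minus_rho2))

# With rho = 0 the density factors into two univariate standard normal densities.
print(binormal_pdf(0.0, 0.0, 0.5))
```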
You can make a plot of the density to observe its distribution.
Here is the probability of the distribution in the region .
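A probability over a region of the form $X \le b$ (componentwise) can also be estimated outside the package. The following is a minimal Python sketch of the separation-of-variables transformation from Genz (1992), cited earlier as the general method for CDF[MultinormalDistribution[mu, sigma], x]. It assumes a zero mean vector (shift $b$ by the mean otherwise) and lower limits of $-\infty$, and it uses plain Monte Carlo with a fixed seed, which need not match the integration scheme the package uses; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf, pdf, inv_cdf

def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def genz_mvn_cdf(b, sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(X_1 <= b_1, ..., X_p <= b_p), X ~ N_p(0, sigma).

    With C = cholesky(sigma): e_1 = Phi(b_1 / c_11); for each later i,
    y_{i-1} = Phi^{-1}(w * e_{i-1}) with w uniform on [0, 1), and
    e_i = Phi((b_i - sum_j c_ij * y_j) / c_ii). The estimate averages e_1 * ... * e_p.
    """
    C = cholesky(sigma)
    p = len(b)
    rng = random.Random(seed)
    e_first = STD_NORMAL.cdf(b[0] / C[0][0])
    total = 0.0
    for _ in range(n_samples):
        e = e_first
        product = e_first
        y = []
        for i in range(1, p):
            # Clamp away from 0 because inv_cdf requires 0 < p < 1.
            y.append(STD_NORMAL.inv_cdf(max(rng.random() * e, 1e-300)))
            shift = sum(C[i][j] * y[j] for j in range(i))
            e = STD_NORMAL.cdf((b[i] - shift) / C[i][i])
            product *= e
        total += product
    return total / n_samples

print(genz_mvn_cdf([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]]))
```

For a bivariate vector with correlation $\rho$, the estimate at the origin should be close to the orthant probability $1/4 + \arcsin(\rho)/(2\pi)$, which is $1/3$ for $\rho = 0.5$.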
This gives the domain of the quadratic form distribution qdist.
The series expansion of the PDF of the quadratic form distribution can be plotted. A 20-term expansion is clearly poor for .
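The product-covariance method credited to Tong (1990) for CDF[MultinormalDistribution[mu, sigma], x] works because such a covariance matrix can be generated from one shared standard normal variate. The following is a minimal Python sketch under that assumption, with a zero mean vector, $\sigma_{ij} = \sigma_i \sigma_j \lambda_i \lambda_j$ for $i \neq j$, and $|\lambda_i| < 1$; the function name and the choice of Simpson's rule are illustrative, not the package's implementation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (cdf, pdf)

def product_correlation_cdf(x, sd, lam, n=2000, zmax=8.0):
    """P(X_1 <= x_1, ..., X_p <= x_p) for zero-mean normal X with
    Var(X_i) = sd[i]**2 and Cov(X_i, X_j) = sd[i]*sd[j]*lam[i]*lam[j] (i != j).

    Writing X_i = sd[i] * (lam[i]*Z0 + sqrt(1 - lam[i]**2) * Z_i) with
    independent standard normals Z0, Z_1, ..., Z_p makes the components
    conditionally independent given Z0 = z, so the p-dimensional integral
    reduces to a 1-D integral over z (composite Simpson's rule on
    [-zmax, zmax]). Requires |lam[i]| < 1.
    """
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = 2.0 * zmax / n
    total = 0.0
    for k in range(n + 1):
        z = -zmax + k * h
        f = STD_NORMAL.pdf(z)
        for xi, si, li in zip(x, sd, lam):
            f *= STD_NORMAL.cdf((xi / si - li * z) / math.sqrt(1.0 - li * li))
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * f
    return total * h / 3.0

# Bivariate example: correlation 0.6 * 0.5 = 0.3, evaluated at the origin.
print(product_correlation_cdf([0.0, 0.0], [2.0, 3.0], [0.6, 0.5]))
```

In the bivariate case at the origin the result should match $1/4 + \arcsin(\rho)/(2\pi)$ with $\rho = \lambda_1 \lambda_2$, and with all $\lambda_i = 0$ it reduces to the diagonal closed form, a product of univariate normal CDFs.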
CDF[MultinormalDistribution[mu, sigma], x] and CDF[MultivariateTDistribution[r, m], x] are computed as multidimensional numeric integrals with the same default options as NIntegrate. If fewer digits of precision are required, quicker results can be obtained by setting a lower value for PrecisionGoal. For large values of m, CDF performs a change of variables to provide accurate results. The change of variables is made for m > 400 if the correlation matrix R is diagonal and for m > 550 if R is not diagonal. For values of m above these thresholds, computations will generally be slower. Also, sharp features near the edges of the integration region may pose additional problems for convergence as precision and accuracy goals are increased. Increasing the value of SingularityDepth or MaxRecursion will often overcome these problems. The following CDF may take several seconds to integrate.
If only 3 digits of precision are required, good results can be obtained in a fraction of a second by use of the PrecisionGoal option.
The change of variables used for large m may slow down the computations with default settings.
Convergence problems may also occur unless options are chosen to improve the integration.
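The single numeric integration used for CDF[MultivariateTDistribution[r, m], x] with a diagonal (that is, identity) correlation matrix can be sketched by conditioning on $U = \sqrt{S}$: given $U = u$, the event $T_i \le x_i$ becomes $Y_i \le x_i u / \sqrt{m}$, whose probability is a product of univariate normal CDFs. The following minimal Python sketch integrates that product against the chi density with Simpson's rule on a truncated range; the truncation point, rule, and function names are assumptions for illustration and need not match the package's integrand or its change of variables for large m.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def chi_pdf(u, m):
    """Density of U = sqrt(S), where S is chi-square with m degrees of freedom."""
    if u < 0.0:
        return 0.0
    if u == 0.0:
        return math.sqrt(2.0 / math.pi) if m == 1 else 0.0
    log_f = ((m - 1) * math.log(u) - 0.5 * u * u
             - (0.5 * m - 1.0) * math.log(2.0) - math.lgamma(0.5 * m))
    return math.exp(log_f)

def mvt_cdf_identity(x, m, n=4000):
    """P(T_1 <= x_1, ..., T_p <= x_p) for T ~ t_p(I_p, m), computed as
    the integral over u of chi_pdf(u, m) * prod_i Phi(x_i * u / sqrt(m)),
    using composite Simpson's rule on [0, sqrt(m) + 12]."""
    upper = math.sqrt(m) + 12.0
    if n % 2:
        n += 1
    h = upper / n
    total = 0.0
    for k in range(n + 1):
        u = k * h
        f = chi_pdf(u, m)
        for xi in x:
            f *= STD_NORMAL.cdf(xi * u / math.sqrt(m))
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * f
    return total * h / 3.0

# p = 1, m = 1 is the Cauchy distribution, whose CDF at 1 is 3/4.
print(mvt_cdf_identity([1.0], 1))
```

Because the components share the same $U$, they are uncorrelated but not independent, which is why the product of normal CDFs sits inside the integral rather than outside it.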
Many of the multivariate distributions have hidden arguments that are evaluated when the distribution is first entered. Random variate generation will be more efficient if these arguments are evaluated only once. This is an inefficient means of computing 1000 multinormal variates because the Cholesky decomposition of the covariance matrix is computed for each variate.
This method of generating 1000 variates is more efficient because the Cholesky decomposition is computed once.
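The same trade-off can be illustrated with a minimal Python sketch. The hypothetical class CachedMultinormal (not part of the package) factors the covariance matrix once in its constructor, playing the role of an argument evaluated when the distribution is first entered, while building a new object per variate repeats the factorization. A class-level counter makes the difference visible.

```python
import math
import random

class CachedMultinormal:
    """Hypothetical multinormal sampler: the Cholesky factor of sigma is
    computed once in __init__ and reused by every call to sample()."""

    factorizations = 0  # class-wide count of Cholesky factorizations

    def __init__(self, mu, sigma):
        self.mu = list(mu)
        self.L = CachedMultinormal._cholesky(sigma)

    @staticmethod
    def _cholesky(a):
        CachedMultinormal.factorizations += 1
        n = len(a)
        L = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1):
                s = sum(L[i][k] * L[j][k] for k in range(j))
                if i == j:
                    L[i][i] = math.sqrt(a[i][i] - s)
                else:
                    L[i][j] = (a[i][j] - s) / L[j][j]
        return L

    def sample(self, rng):
        """One draw mu + L z, with z a vector of independent N(0, 1) variates."""
        z = [rng.gauss(0.0, 1.0) for _ in self.mu]
        return [mi + sum(self.L[i][k] * z[k] for k in range(i + 1))
                for i, mi in enumerate(self.mu)]

mu = [1.0, -2.0]
sigma = [[1.0, 0.8], [0.8, 2.0]]
rng = random.Random(1)

# Efficient: one factorization, 1000 draws.
dist = CachedMultinormal(mu, sigma)
fast = [dist.sample(rng) for _ in range(1000)]

# Inefficient: a new object, and so a new factorization, for every draw.
slow = [CachedMultinormal(mu, sigma).sample(rng) for _ in range(1000)]

print(CachedMultinormal.factorizations)  # 1 + 1000 = 1001
```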
Functions of univariate statistical distributions not applicable to multivariate distributions.

In the multivariate case, it is difficult to define Quantile as the inverse of the CDF, since many values of the random vector (or random matrix) correspond to a single probability value. This package defines Quantile only for the univariate distribution HotellingTSquareDistribution and some minor degenerate cases of the other distributions. The elliptically contoured distributions MultinormalDistribution and MultivariateTDistribution support EllipsoidQuantile and its inverse RegionProbability.

Functions of vector-valued multivariate statistical distributions.

This gives the ellipse centered on the mean that encloses 50% of the probability of the distribution ndist.
This gives the probability of the distribution within the ellipse. Note that the ellipse must correspond to a constant-density contour of the prescribed distribution.
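For $N_2(\mu, \Sigma)$, the ellipse enclosing probability $q$ is $(x-\mu)^{\mathsf{T}}\Sigma^{-1}(x-\mu) = r^2$ with $r^2 = -2\ln(1-q)$, the quantile of the chi-square distribution with 2 degrees of freedom. Conversely, the probability inside the constant-density ellipse through a point is that chi-square CDF evaluated at the point's squared Mahalanobis distance. A minimal Python sketch of this quantile/probability pair for the bivariate case only follows; the helper names are hypothetical, and this is not the package's EllipsoidQuantile or RegionProbability.

```python
import math

def chi2_2_quantile(q):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log1p(-q)

def chi2_2_cdf(r2):
    """CDF of the chi-square distribution with 2 degrees of freedom."""
    return -math.expm1(-0.5 * r2)

def bivariate_normal_ellipse(mu, sigma, q):
    """Center, (major, minor) semi-axis lengths, and major-axis angle in
    radians of the ellipse enclosing probability q of N_2(mu, sigma)."""
    (a, b), (_, d) = sigma
    half_trace = 0.5 * (a + d)
    disc = math.hypot(0.5 * (a - d), b)
    eig_major, eig_minor = half_trace + disc, half_trace - disc
    r2 = chi2_2_quantile(q)
    angle = 0.5 * math.atan2(2.0 * b, a - d)  # direction of the major eigenvector
    return tuple(mu), (math.sqrt(r2 * eig_major), math.sqrt(r2 * eig_minor)), angle

def ellipse_probability(mu, sigma, point):
    """Probability inside the constant-density ellipse through `point`."""
    (a, b), (_, d) = sigma
    det = a * d - b * b
    dx, dy = point[0] - mu[0], point[1] - mu[1]
    r2 = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det  # Mahalanobis^2
    return chi2_2_cdf(r2)

print(bivariate_normal_ellipse((1.0, 2.0), [[2.0, 0.6], [0.6, 1.0]], 0.5))
```

Any point on the returned ellipse, such as the tip of its major axis, should map back to probability $q$ under ellipse_probability, mirroring how RegionProbability inverts EllipsoidQuantile.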
As $m \to \infty$, the elliptical contour of MultivariateTDistribution[r, m] that encloses probability $q$ approaches the elliptical contour enclosing the same probability $q$ for a multinormal distribution with zero mean vector and covariance matrix $R$.
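This limit can be made concrete with a standard fact: for $X \sim t_p(R, m)$, $X^{\mathsf{T}}R^{-1}X / p$ follows the F-ratio distribution with $p$ and $m$ degrees of freedom, while for $N_p(\mathbf{0}, R)$ the quadratic form $X^{\mathsf{T}}R^{-1}X$ is chi-square with $p$ degrees of freedom. The following minimal Python sketch assumes $p = 2$, where both quantiles have closed forms, and compares the squared radii of the probability-$q$ contours; the function names are hypothetical.

```python
import math

def t_contour_radius2(q, m):
    """x^T R^{-1} x on the contour enclosing probability q for a bivariate
    t distribution with m degrees of freedom. x^T R^{-1} x / 2 ~ F(2, m),
    whose CDF 1 - (1 + 2f/m)**(-m/2) inverts in closed form; expm1 and log1p
    keep the result accurate for large m."""
    return m * math.expm1(-2.0 * math.log1p(-q) / m)

def normal_contour_radius2(q):
    """Same quantity for the bivariate normal: the chi-square(2) quantile."""
    return -2.0 * math.log1p(-q)

for m in (1, 10, 100, 1000, 10**6):
    print(m, t_contour_radius2(0.5, m))
print("normal", normal_contour_radius2(0.5))
```

The $t$ contours shrink toward the normal contour as $m$ grows, reflecting the heavier tails of the multivariate $t$ distribution at small $m$.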