
Discriminant analysis is used to predict the probability of belonging to a given class (or category) based on one or multiple predictor variables. It works with continuous and/or categorical predictor variables.

Previously, we described logistic regression for two-class classification problems, that is, when the outcome variable has two possible values (0/1, no/yes, negative/positive).

Compared to logistic regression, discriminant analysis is more suitable for predicting the category of an observation when the outcome variable contains more than two classes. It is also more stable than logistic regression for multi-class classification problems. Note that both logistic regression and discriminant analysis can be used for binary classification tasks.

In this chapter, you'll learn the most widely used discriminant analysis techniques and extensions. Additionally, we'll provide R code to perform the different types of analysis.

The following discriminant analysis methods will be described (a short code preview follows the list):

- Linear discriminant analysis (LDA): Uses linear combinations of predictors to predict the class of a given observation. Assumes that the predictor variables (p) are normally distributed and that the classes have identical variances (for univariate analysis, p = 1) or identical covariance matrices (for multivariate analysis, p > 1).
- Quadratic discriminant analysis (QDA): More flexible than LDA. Here, there is no assumption that the covariance matrix of the classes is the same.
- Mixture discriminant analysis (MDA): Each class is assumed to be a Gaussian mixture of subclasses.
- Flexible discriminant analysis (FDA): Non-linear combinations of predictors are used, such as splines.
- Regularized discriminant analysis (RDA): Regularization (or shrinkage) improves the estimate of the covariance matrices in situations where the number of predictors is larger than the number of samples in the training data. This leads to an improvement of the discriminant analysis.
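As a preview, the sketch below shows one common way to fit each of these models in R. The MASS, mda, and klaR packages are assumptions on our part (the text above does not prescribe specific packages); all five functions share the same formula interface. The built-in iris data, used throughout this chapter, serves as example input.

```r
library(MASS)  # lda(), qda()
library(mda)   # mda(), fda()
library(klaR)  # rda()

# All five functions share the formula interface: class ~ predictors
lda(Species ~ ., data = iris)  # linear discriminant analysis
qda(Species ~ ., data = iris)  # quadratic discriminant analysis
mda(Species ~ ., data = iris)  # mixture discriminant analysis
fda(Species ~ ., data = iris)  # flexible discriminant analysis
rda(Species ~ ., data = iris)  # regularized discriminant analysis
```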
We'll use the iris data set, introduced in Chapter, for predicting iris species based on the predictor variables Sepal.Length, Sepal.Width, Petal.Length and Petal.Width.

Discriminant analysis can be affected by the scale/unit in which predictor variables are measured. It's generally recommended to standardize/normalize continuous predictors before the analysis.

Split the data into training and test set, then standardize it:

```r
library(caret)
# Split the data into training (80%) and test set (20%)
set.seed(123)  # for reproducibility
training.samples <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train.data <- iris[training.samples, ]
test.data <- iris[-training.samples, ]
# Estimate the preprocessing parameters (centering and scaling)
preproc.param <- preProcess(train.data, method = c("center", "scale"))
# Transform the data using the estimated parameters
train.transformed <- predict(preproc.param, train.data)
test.transformed <- predict(preproc.param, test.data)
```
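With the data prepared, here is a minimal sketch of the fit-and-predict workflow, assuming MASS::lda() (one common implementation, not prescribed by the text above) and the train.transformed/test.transformed objects created in the chunk above:

```r
library(MASS)
# Fit LDA on the standardized training data
model <- lda(Species ~ ., data = train.transformed)
# predict() returns the predicted class, the posterior probabilities,
# and the linear discriminant scores for each observation
predictions <- predict(model, test.transformed)
# Model accuracy: fraction of test observations classified correctly
mean(predictions$class == test.transformed$Species)
```

The fitted object's scaling component (model$scaling) holds the coefficients of the linear discriminants discussed next.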
The LDA algorithm starts by finding directions that maximize the separation between classes, then uses these directions to predict the class of individuals. These directions, called linear discriminants, are linear combinations of the predictor variables.

LDA assumes that the predictors are normally distributed (Gaussian distribution) and that the different classes have class-specific means and equal variance/covariance.

Before performing LDA, inspect the univariate distribution of each variable and make sure that it is approximately normally distributed. If not, you can transform the variables, using log and root transformations for exponential distributions and Box-Cox for skewed distributions.
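As a sketch of such a transformation, assuming caret's preProcess() (any Box-Cox implementation would do; note that Box-Cox requires strictly positive values, which the iris measurements satisfy):

```r
library(caret)
# Estimate a Box-Cox transformation for each numeric predictor
bc.param <- preProcess(iris[, 1:4], method = "BoxCox")
iris.transformed <- predict(bc.param, iris[, 1:4])
# Compare a univariate distribution before and after the transformation
hist(iris$Petal.Length, main = "Before Box-Cox")
hist(iris.transformed$Petal.Length, main = "After Box-Cox")
```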