Posted April 2, 2012 by RhoBeta in Analytics

A Brief Encounter with Principal Components Analysis

Principal Component

In Short

Principal Components Analysis (PCA) is a method that reduces the dimensionality of data by analyzing the covariance among the observed variables. As such, it suits data sets with many dimensions, and it is often recommended as an exploratory tool for uncovering unknown trends in the data. PCA is appropriate when you have obtained measures on a number of observed variables and wish to develop a smaller number of artificial variables (called principal components) that will account for most of the variance in the observed variables. A principal component can be defined as a linear combination of optimally weighted observed variables. The principal components may then be used as predictor or criterion variables in subsequent analyses.
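To make "a linear combination of optimally weighted observed variables" concrete, here is a minimal sketch in Python with NumPy. It assumes the common textbook route of taking the eigenvectors of the covariance matrix as the weights; the function name `pca` and the random test data are made up for illustration and are not from any particular package.

```python
import numpy as np

def pca(X, n_components):
    """Illustrative PCA via eigen-decomposition of the covariance matrix.

    X is an (n_samples, n_variables) array.
    Returns (scores, weights, variances).
    """
    # Center each variable so covariance is measured about the mean.
    Xc = X - X.mean(axis=0)
    # Covariance matrix between the observed variables (columns of X).
    cov = np.cov(Xc, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the n_components directions with the largest variance.
    order = np.argsort(eigvals)[::-1][:n_components]
    weights = eigvecs[:, order]    # one column of weights per component
    variances = eigvals[order]     # variance carried by each component
    # Each score is a weighted sum (linear combination) of the variables.
    scores = Xc @ weights
    return scores, weights, variances

# Hypothetical data: 200 observations of 5 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
scores, weights, variances = pca(X, n_components=2)
```

The columns of `scores` are the principal components that could then be passed to a later analysis in place of the original five variables.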


Principal component analysis is a variable reduction procedure. It is useful when you have data on a large number of variables and believe there is some redundancy among them. In this case, redundancy means that some of the variables are correlated with one another, possibly because they are measuring the same construct. Because of this redundancy, it should be possible to reduce the observed variables to a smaller number of principal components (artificial variables) that will account for most of the variance in the observed variables.
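The effect of redundancy can be seen in a small simulation. The sketch below is an assumption-laden toy, not real data: it invents two underlying constructs (`a` and `b`), builds six noisy observed variables from them, and checks how much of the total variance the first few components carry.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Two hypothetical underlying constructs (made up for this example).
a = rng.normal(size=n)
b = rng.normal(size=n)

def noisy(signal):
    """Return the signal plus a small amount of independent measurement noise."""
    return signal + 0.3 * rng.normal(size=n)

# Six observed variables: three measure `a`, three measure `b`,
# so the variables within each group are strongly correlated.
data = np.column_stack([noisy(a), noisy(a), noisy(a),
                        noisy(b), noisy(b), noisy(b)])

# Eigenvalues of the correlation matrix give the variance of each component.
corr = np.corrcoef(data, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]   # sort from largest to smallest
explained = eigvals / eigvals.sum()        # share of total variance
cumulative = np.cumsum(explained)          # running total across components

print(np.round(cumulative, 3))
```

With noise this small, the first two entries of `cumulative` should already cover the large majority of the variance, which is the sense in which six redundant variables reduce to two components.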

Same Same but Different

Principal component analysis is sometimes confused with factor analysis, and this is understandable, because there are many important similarities between the two procedures: both are variable reduction methods that can be used to identify groups of observed variables that tend to hang together empirically. There are, however, conceptual differences between the two that should be understood at the outset. The most important concerns the assumption of an underlying causal structure: factor analysis assumes that the covariation in the observed variables is due to the presence of one or more latent variables (factors) that exert causal influence on these observed variables. Principal component analysis makes no assumption about an underlying causal model. It is simply a variable reduction procedure that (typically) results in a relatively small number of components that account for most of the variance in a set of observed variables.
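The distinction can be written compactly. The notation here is an assumption introduced for illustration and follows the usual textbook form: x is the vector of centered observed variables, f the latent factors, Λ the factor loadings, ε the unique errors, and w_k the weights of the k-th principal component.

```latex
% Factor analysis: the observed variables are modeled as being
% caused by latent factors, plus a unique error for each variable.
\mathbf{x} = \boldsymbol{\Lambda}\,\mathbf{f} + \boldsymbol{\varepsilon}

% Principal component analysis: each component is computed from
% the observed variables as a weighted sum; no causal model is assumed.
z_k = \mathbf{w}_k^{\top}\mathbf{x}, \qquad k = 1, \dots, m
```

In the first equation the arrow of explanation runs from the factors to the observed variables; in the second it runs from the observed variables to the components.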


Expertise in analytics tools, software, and technology.