Principal Component Analysis
Big Data is increasingly becoming the norm and affecting many domains. When there's a lot of data involving many variables, the work of a data scientist gets difficult. Algorithms also take longer to complete. Wouldn't it be sensible to identify and consider only those variables that have the most influence and discard the others?
Principal Component Analysis (PCA) extracts the most important information. This in turn leads to compression since the less important information is discarded. With fewer variables to consider, it becomes simpler to describe and analyze the dataset.
PCA can be seen as a tradeoff: faster computation and less memory consumption at the cost of some information loss. It's considered one of the most useful tools for data analysis.
Discussion

Could you explain PCA with a simple example? We can describe the shape of a fish with two variables: height and width. However, these two variables are not independent of each other. In fact, they have a strong correlation. Given the height, we can probably estimate the width, and vice versa. Thus, we may say that the shape of a fish can be described with a single component.
This doesn't mean that we simply ignore either height or width. Instead, we transform our two original variables into two orthogonal (uncorrelated) components that give a complete alternative description. The first component (blue line) will explain most of the variation in the data. The second component (dotted line) will explain the remaining variation. Note that both components are derived from both height and width.
More intuitively, the first component line can be seen as the best-fit line that minimizes information loss. Alternatively, it can also be seen as the line that maximizes the variation; that is, it tries to explain as much of the variation in the dataset as possible.
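The fish example can be tried out numerically. The following is a minimal sketch using NumPy; the height and width values are synthetic and hypothetical (width is assumed to be roughly half the height, plus noise), so the exact numbers only illustrate the idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fish measurements: width is roughly proportional to height, plus noise.
height = rng.normal(loc=10.0, scale=2.0, size=200)
width = 0.5 * height + rng.normal(loc=0.0, scale=0.3, size=200)
X = np.column_stack([height, width])

# Center the data, then eigendecompose the 2x2 covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order

# Reorder so that the first component has the largest variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Share of the total variance explained by each component.
explained = eigvals / eigvals.sum()
print("first component direction:", eigvecs[:, 0])
print("share of variance per component:", explained)
```

Under these assumptions, the first component mixes both height and width and accounts for nearly all of the variation, while the second component is orthogonal to it and accounts for the small remainder.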

Could you mention some real-world use cases of PCA? PCA has been applied to facial recognition. To capture 90% of the variance, only a third of the components had to be retained. This may be sufficient for Machine Learning applications. The other two-thirds contain most of the image details.
In another study, the consumption of 17 different food types was studied across the 4 countries of the UK. This problem has 17 features and is hence nontrivial to analyze. With PCA, the first component showed that Northern Ireland was unique. People of Northern Ireland consumed fresh potatoes and fresh fruit differently from other populations.
The lower molar teeth of an ancient mammal named Kuehneotherium were studied using nine variables. PCA showed that just two components were enough to explain over 95% of the total variation. When plotted, it was easy to see the clusters and relate them back to the original features. One cluster stood for a species of Kuehneotherium while another broader cluster suggested an unidentified animal.
To detect lactose in lactose-free milk using NIR spectroscopy, where each spectrum contains 601 dimensions, PCA identified distinct clusters with just two principal components.
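Several of these use cases quote how many components are needed to capture a given share of the variance. A minimal sketch of that calculation follows. The helper name `components_for_variance` and the synthetic 30-feature dataset are assumptions made for illustration and are not taken from the cited studies.

```python
import numpy as np

def components_for_variance(X, target=0.90):
    """Smallest number of components whose cumulative variance share reaches target."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional to component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s**2
    cumulative = np.cumsum(var) / var.sum()
    # First index where the cumulative share is >= target, converted to a count.
    k = int(np.searchsorted(cumulative, target) + 1)
    return min(k, len(var)), cumulative

rng = np.random.default_rng(5)
# Hypothetical data: 30 observed features driven by 3 hidden factors plus small noise.
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 30)) + 0.1 * rng.normal(size=(500, 30))

k, cumulative = components_for_variance(X, target=0.90)
print(f"{k} of {X.shape[1]} components reach 90% of the variance")
```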

Isn't PCA similar to Dimensionality Reduction? In a complex data-intensive problem, there are usually many influencing variables. The term variable is equivalent to other commonly used terms: feature or dimension.
The idea of reducing the number of variables or dimensions is called Dimensionality Reduction. This can be done in two ways:
 Feature Elimination: We drop some features that we may consider unimportant. While the approach is simple, we lose useful information present in those dropped features.
 Feature Extraction: We transform the original set of features into another set of features. The idea is to pack the most important information into as few derived features as possible. We can reduce the number of dimensions by dropping some of the derived features. But we don't completely lose the information from any original feature, since each derived feature is a linear combination of all the original features.
PCA is in fact a method for doing feature extraction. In PCA, derived features are also called composite features or principal components. Moreover, these principal components are linearly independent of one another.
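Feature extraction with PCA can be sketched as a projection onto the leading principal directions. The example below uses NumPy's SVD on a synthetic, hypothetical dataset of 5 correlated features. Keeping `k = 2` components is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical dataset: 100 samples, 5 features driven by 2 hidden factors plus noise.
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(100, 5))

mean = X.mean(axis=0)
Xc = X - mean
# Rows of Vt are the principal directions (unit length, mutually orthogonal).
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
W = Vt[:k].T                     # 5 x 2 weights: each column defines one derived feature
scores = Xc @ W                  # derived features: linear combinations of the originals
X_approx = scores @ W.T + mean   # approximate reconstruction from only k components

# Relative reconstruction error measures how much information was dropped.
rel_error = np.linalg.norm(X - X_approx) / np.linalg.norm(Xc)
print("derived feature matrix shape:", scores.shape)
print("relative reconstruction error:", rel_error)
```

Each column of `scores` depends on all five original features, which is why dropping the remaining three derived features loses only a small part of the information in this synthetic case.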

What are the advantages of the PCA technique? PCA minimizes information loss even when fewer principal components are considered for analysis. This is because each principal component is along a direction that maximizes variation, that is, the spread of data. More importantly, the components themselves need not be identified a priori: they are identified by PCA from the dataset. Thus, PCA is an adaptive data analysis technique. In other words, PCA is an unsupervised learning method.
By reducing the number of dimensions, PCA enables easier data visualization. Visualization helps us to identify clusters, patterns and trends more easily. Fewer dimensions mean less computation and can lower the error rate. PCA can also reduce noise and help algorithms work better.
Finding the principal components is really an eigenvalue/eigenvector problem, which has been well studied with lots of algorithms available for practical use.
Although a Gaussian distribution of data is often assumed, as a descriptive tool PCA doesn't need this assumption. It can be used for exploratory analysis on data of any distribution. There are also variations of PCA that cater to different data types and structures.
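To make the eigenvalue view concrete, here is a minimal sketch on synthetic data. It assumes the common convention of a covariance matrix with an n - 1 denominator. It computes the components by eigendecomposition of the covariance matrix and, as a cross-check, by singular value decomposition of the centered data.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: 50 samples with 4 correlated features.
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Route 1: eigenvectors of the covariance matrix are the principal directions,
# and eigenvalues are the variances along them.
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort in descending order

# Route 2: SVD of the centered data gives the same directions (up to sign).
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vals = s**2 / (n - 1)

print("eigenvalues:          ", eigvals)
print("from singular values: ", svd_vals)
```

In practice, libraries often prefer the SVD route because it avoids forming the covariance matrix explicitly.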

What are the drawbacks of the PCA technique? Here are some drawbacks of PCA:
 PCA works well only if the observed variables are linearly correlated. If there's no correlation, PCA will fail to capture adequate variance with fewer components.
 PCA is lossy. Information is lost when we discard insignificant components.
 Scaling of variables can yield different results. Hence, the scaling that you use should be documented. Scaling should not be adjusted to match prior knowledge of data.
 Since each principal component is a linear combination of the original features, visualizations are not easy to interpret or relate to original features.
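The sensitivity to scaling noted in the list above can be seen in a small sketch. The two features and their units (metres and grams) are hypothetical, and `first_component` is a helper defined only for this illustration.

```python
import numpy as np

def first_component(X):
    """Return the first principal direction of X (unit vector, sign is arbitrary)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(3)
# Hypothetical features on very different scales: length in metres, mass in grams.
length_m = rng.normal(1.0, 0.1, size=300)
mass_g = 500.0 * length_m + rng.normal(0.0, 40.0, size=300)
X = np.column_stack([length_m, mass_g])

# Without scaling, the feature with the larger numeric variance dominates.
pc_raw = first_component(X)

# After standardizing each feature to unit variance, both contribute equally.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
pc_std = first_component(X_std)

print("first component, raw data:         ", pc_raw)
print("first component, standardized data:", pc_std)
```

On the raw data the first component points almost entirely along mass, whereas after standardization it weights length and mass equally. Neither result is wrong, which is why the choice of scaling should be stated alongside the results.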
Milestones
Mid-nineteenth century works by Cauchy and Jacobi in classical analytic geometry show that the equations for the principal axes of quadratic forms and surfaces are known.
Francis Galton in his Natural Inheritance connects principal axes for the first time with the correlation ellipsoid.
Karl Pearson invents PCA while working to find the major and minor axes of an ellipse. However, he does not use the term PCA. In his geometric interpretation of the problem, he's trying to find "lines and planes of closest fit to systems of points in space".
Harold Hotelling develops PCA independently and names the technique. His approach is what is familiar to us today, using successive orthogonal linear combinations with maximum variance. The 1930s is also the decade when the development of Factor Analysis starts. This is closely related to PCA.
Around 1960, Malinowski introduces PCA to chemistry. After 1970, many chemical applications of PCA appear in the literature.
How does one determine how many principal components to retain for analysis? In the context of factor analysis, R.B. Cattell proposes a method called the Scree Test. A Scree Plot is used for this purpose. It represents graphically the eigenvalues or the percentages of total variation accounted for by each principal component.
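The quantities shown in a scree plot can be computed as in the following sketch. It uses synthetic, hypothetical data and works on the correlation matrix, which is one common choice. The text bars stand in for a real plot, and `scree_values` is a helper name introduced only here.

```python
import numpy as np

def scree_values(X):
    """Eigenvalues (descending) of the correlation matrix and their percentage shares."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # eigvalsh returns ascending order
    percent = 100.0 * eigvals / eigvals.sum()
    return eigvals, percent

rng = np.random.default_rng(4)
# Hypothetical data: 6 observed variables driven mostly by 2 underlying factors.
factors = rng.normal(size=(400, 2))
X = factors @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(400, 6))

eigvals, percent = scree_values(X)
for i, (ev, pct) in enumerate(zip(eigvals, percent), start=1):
    # One '#' per 2% of total variation, as a rough text version of the plot.
    print(f"PC{i}: eigenvalue={ev:5.2f}  {pct:5.1f}%  " + "#" * int(round(pct / 2)))
```

When reading such output, one looks for the point where the values level off; components beyond that point are candidates for being dropped.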
References
 Abdi, Hervé and Lynne J Williams. 2010. "Principal Component Analysis." WIREs Comp Stat 2010, vol. 2, pp. 433-459, John Wiley & Sons, Inc. Accessed 2019-01-12.
 Brems, Matt. 2017. "A One-Stop Shop for Principal Component Analysis." Towards Data Science, April 17. Accessed 2019-01-12.
 Cavaioni, Mike. 2017. "Machine Learning: Unsupervised Learning — Principal Component Analysis." Machine Learning bites, February 07. Accessed 2019-01-12.
 Galarnyk, Michael. 2017. "Principle Component Analysis (PCA) for Data Visualization." Python_Tutorials, on GitHub, December 02. Accessed 2019-01-12.
 Jolliffe, Ian T. and Jorge Cadima. 2016. "Principal component analysis: a review and recent developments." Philos Trans A Math Phys Eng Sci. vol. 374, no. 2065, Apr 13. Accessed 2019-01-12.
 Lavrenko, Victor and Charles Sutton. 2011. "IAML: Dimensionality Reduction." School of Informatics, University of Edinburgh. Accessed 2019-01-22.
 Ledesma, Rubén, Pedro Valero-Mora, and Guillermo Macbeth. 2015. "The Scree Test and the Number of Factors: a Dynamic Graphics Approach." The Spanish Journal of Psychology, vol. 18, e11, pp. 1-10, June. Accessed 2019-01-12.
 Leeuw, Jan De. 2011. "History and Theory of Nonlinear Principal Component Analysis." UCLA Department of Statistics, February 11. Accessed 2019-01-12.
 Lever, Jake, Martin Krzywinski, and Naomi Altman. 2017. "Principal component analysis." Nature Methods, vol. 14, pp. 641–642. Accessed 2019-01-12.
 Lipp, Jesse. 2015. "PCA – Part 5: Eigenpets." Bioramble, September 01. Accessed 2019-01-12.
 Pellicia, Daniel. 2018. "Classification of NIR spectra using Principal Component Analysis in Python." Instruments & Data Tools Pty Ltd, March 23. Accessed 2019-01-12.
 Powell, Victor and Lewis Lehe. 2015. "Principal Component Analysis: Explained Visually." Accessed 2019-01-12.
 Starmer, Josh. 2018. "Principal Component Analysis (PCA), Step-by-Step." StatQuest, on YouTube, April 02. Accessed 2019-01-12.
 Statistica Help. 2018. "Scree Plot, Scree Test." Statistica Help, TIBCO Software Inc. Accessed 2019-01-12.
 Werner, Steffen, Jochen C Rink and Benjamin Friedrich. 2014. "Shape Mode Analysis Exposes Movement Patterns in Biology: Flagella and Flatworms as Case Studies." PloS one, vol. 9, no. 11, July. Accessed 2019-01-12.
 Wikipedia. 2019. "Principal component analysis." Wikipedia, January 3. Accessed 2019-01-12.
 Wold, Svante, Kim Esbensen, and Paul Geladi. 1987. "Principal Component Analysis." Chemometrics and Intelligent Laboratory Systems, vol. 2, pp. 37-52, Elsevier Science Publishers B.V., Amsterdam. Accessed 2019-01-12.
Further Reading
 Starmer, Josh. 2018. "Principal Component Analysis (PCA), Step-by-Step." StatQuest, on YouTube, April 02. Accessed 2019-01-12.
 Powell, Victor and Lewis Lehe. 2015. "Principal Component Analysis: Explained Visually." Accessed 2019-01-12.
 Williams, Alex. 2016. "Everything you did and didn't know about PCA." Its Neuronal, March 27. Accessed 2019-01-12.
 Wold, Svante, Kim Esbensen, and Paul Geladi. 1987. "Principal Component Analysis." Chemometrics and Intelligent Laboratory Systems, vol. 2, pp. 37-52, Elsevier Science Publishers B.V., Amsterdam. Accessed 2019-01-12.
 Abdi, Hervé and Lynne J Williams. 2010. "Principal Component Analysis." WIREs Comp Stat 2010, vol. 2, pp. 433-459, John Wiley & Sons, Inc. Accessed 2019-01-12.
 Jolliffe, I.T. 2002. "Principal Component Analysis." Second Edition, part of Springer Series in Statistics, Springer-Verlag, New York, Inc. Accessed 2019-01-12.
See Also
 Eigenvalues and Eigenvectors for Data Scientists
 Singular Value Decomposition
 Dimensionality Reduction
 Feature Engineering
 Factor Analysis
 Kernel Principal Component Analysis