Principal Component Analysis
Big Data is increasingly becoming the norm across many domains. When a dataset involves many variables, the work of a data scientist gets harder and algorithms take longer to run. Wouldn't it be sensible to identify the variables that carry the most influence and discard the rest?
Principal Component Analysis (PCA) extracts the most important information from the data. This in turn leads to compression, since the less important information is discarded. With fewer dimensions to consider, it becomes simpler to describe and analyze the dataset.
PCA can be seen as a trade-off between faster computation and lower memory consumption on the one hand, and some loss of information on the other. It's considered one of the most useful tools for data analysis.
Discussion
-
Could you explain PCA with a simple example? We can describe the shape of a fish with two variables: height and width. However, these two variables are not independent of each other. In fact, they have a strong correlation. Given the height, we can probably estimate the width; and vice versa. Thus, we may say that the shape of a fish can be described with a single component.
This doesn't mean that we simply ignore either height or width. Instead, we transform the two original variables into two orthogonal (independent) components that give a complete alternative description. The first component explains most of the variation in the data; the second component explains the remaining variation. Note that both components are derived from both height and width.
More intuitively, the first component can be seen as the best-fit line that minimizes information loss. Alternatively, it can be seen as the line that maximizes variation; that is, it tries to explain as much of the variation in the dataset as possible.
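As a minimal sketch of this intuition, assuming NumPy and scikit-learn are available (the fish measurements below are synthetic, made up only for illustration):

import numpy as np
from sklearn.decomposition import PCA

# Synthetic fish data: width is strongly correlated with height
rng = np.random.default_rng(42)
height = rng.normal(10.0, 2.0, 200)
width = 0.5 * height + rng.normal(0.0, 0.3, 200)
X = np.column_stack([height, width])

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)  # first component explains most of the variance
print(pca.components_[0])             # direction of the best-fit line through the data

The first printed ratio is typically well above 0.9, which is the sense in which a single component describes the shape.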
-
Could you mention some real-world use cases of PCA? PCA has been applied to facial recognition. In one case, only a third of the components had to be retained to capture 90% of the variance. This may be sufficient for Machine Learning applications, even though the discarded two-thirds of the components contain most of the fine image detail.
In another study, the consumption of 17 different food types was compared across the 4 countries of the UK. With 17 features, the problem is non-trivial to analyze directly. With PCA, the first component showed that Northern Ireland was an outlier: its people consumed fresh potatoes and fresh fruit differently from the other populations.
The lower molar teeth of an ancient mammal named Kuehneotherium were measured on nine variables. PCA showed that just two components were enough to explain over 95% of the total variation. When plotted, it was easy to see clusters and relate them back to the original features. One cluster corresponded to a Kuehneotherium species, while another, broader cluster suggested an unidentified animal.
In a study that used NIR spectroscopy to detect lactose in lactose-free milk, each spectrum had 601 dimensions, yet PCA revealed distinct clusters with just two principal components.
-
Isn't PCA similar to Dimensionality Reduction? In a complex data-intensive problem, there are usually many influencing variables. The term variable is equivalent to other commonly used terms: feature or dimension.
The idea of reducing the number of variables or dimensions is called Dimensionality Reduction. This can be done in two ways:
- Feature Elimination: We drop some features that we may consider unimportant. While the approach is simple, we lose useful information present in those dropped features.
- Feature Extraction: We transform the original set of features into another set of features. The idea is to pack the most important information into as few derived features as possible. We can reduce the number of dimensions by dropping some of the derived features. But we don't completely lose the information in the original features: each derived feature is a linear combination of the original features.
PCA is in fact a method for doing feature extraction. In PCA, derived features are also called composite features or principal components. Moreover, these principal components are linearly independent of one another.
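A short sketch of feature extraction with PCA, assuming scikit-learn and its bundled Iris dataset (the choice of two derived features is arbitrary here):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data              # 150 samples with 4 original features
pca = PCA(n_components=2)         # extract 2 principal components
Z = pca.fit_transform(X)          # derived features: shape (150, 2)

# Each principal component is a linear combination of the original features;
# the rows of components_ hold the weights of those combinations.
print(pca.components_)
print(pca.explained_variance_ratio_)  # share of variance retained by each component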
-
What are the advantages of the PCA technique? PCA minimizes information loss even when fewer principal components are considered for analysis. This is because each principal component is along a direction that maximizes variation, that is, the spread of the data. More importantly, the components need not be identified a priori: PCA identifies them from the dataset itself. Thus, PCA is an adaptive data analysis technique. In other words, PCA is an unsupervised learning method.
By reducing the number of dimensions, PCA enables easier data visualization. Visualization helps us identify clusters, patterns and trends more easily. Fewer dimensions mean less computation and a lower error rate. PCA reduces noise and makes algorithms work better.
Finding the principal components is really an eigenvalue/eigenvector problem, which has been well studied with lots of algorithms available for practical use.
Although statistical inference with PCA assumes a Gaussian distribution of the data, as a descriptive tool PCA doesn't need this assumption. It can be used for exploratory analysis on data of any distribution. There are also variations of PCA that cater to different data types and structures.
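To make the eigen-problem concrete, here is a from-scratch sketch in NumPy. It is illustrative only; production libraries usually compute PCA via Singular Value Decomposition for numerical stability:

import numpy as np

def pca_eig(X, k):
    """Return the top-k principal components and the projected data."""
    Xc = X - X.mean(axis=0)                 # centre each variable
    cov = np.cov(Xc, rowvar=False)          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # sort descending by variance
    components = eigvecs[:, order[:k]]      # top-k eigenvectors
    return components, Xc @ components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
components, Z = pca_eig(X, 2)
print(components.shape, Z.shape)            # (5, 2) (100, 2)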
-
What are the drawbacks of the PCA technique? Here are some of them:
- PCA works only if the observed variables are linearly correlated. If there's no correlation, PCA will fail to capture adequate variance with fewer components.
- PCA is lossy. Information is lost when we discard insignificant components.
- Different scalings of the variables can yield different results. Hence, the scaling that you use should be documented, and it should not be adjusted to match prior expectations about the data (see the sketch after this list).
- Since each principal component is a linear combination of the original features, visualizations are not easy to interpret or relate back to the original features.
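A brief sketch of the scaling issue, assuming scikit-learn; when variables are on very different numeric scales, the larger one dominates unless the data is standardized first:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Two unrelated variables on very different scales (e.g. metres vs millimetres)
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])

print(PCA().fit(X).explained_variance_ratio_)
# Without scaling, the large-scale variable dominates: roughly [1.0, 0.0]

X_std = StandardScaler().fit_transform(X)
print(PCA().fit(X_std).explained_variance_ratio_)
# After standardization both variables contribute: roughly [0.5, 0.5]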
Milestones
How does one determine how many principal components to retain for analysis? In the context of factor analysis, R.B. Cattell proposes a method called the Scree Test. A Scree Plot is used for this purpose: it graphically represents the eigenvalues, or the percentages of total variation, accounted for by each principal component.
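A minimal sketch of a Scree Plot, assuming scikit-learn and matplotlib, with the Iris dataset used purely as an example:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

pca = PCA().fit(load_iris().data)       # keep all components
ratios = pca.explained_variance_ratio_

# Scree plot: variance explained by each successive principal component
plt.plot(range(1, len(ratios) + 1), ratios, marker='o')
plt.xlabel('Principal component')
plt.ylabel('Proportion of variance explained')
plt.title('Scree Plot')
plt.show()

One looks for the "elbow" in the plot: components to the left of the elbow are retained, those to the right are discarded.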
Sample Code
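The following is an illustrative end-to-end example, assuming scikit-learn and matplotlib are installed: the Iris measurements are standardized, projected onto two principal components, and plotted to reveal clusters.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)   # standardize the 4 features

pca = PCA(n_components=2)
Z = pca.fit_transform(X)                         # project onto 2 components
print('Variance explained:', pca.explained_variance_ratio_)

# Visualize the samples in the reduced 2-D space, colored by species
plt.scatter(Z[:, 0], Z[:, 1], c=iris.target, cmap='viridis')
plt.xlabel('First principal component')
plt.ylabel('Second principal component')
plt.show()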
Further Reading
- Starmer, Josh. 2018. "Principal Component Analysis (PCA), Step-by-Step." StatQuest, on YouTube, April 02. Accessed 2019-01-12.
- Powell, Victor and Lewis Lehe. 2015. "Principal Component Analysis: Explained Visually." Accessed 2019-01-12.
- Williams, Alex. 2016. "Everything you did and didn't know about PCA." Its Neuronal, March 27. Accessed 2019-01-12.
- Wold, Svante, Kim Esbensen, and Paul Geladi. 1987. "Principal Component Analysis." Chemometrics and Intelligent Laboratory Systems, vol. 2, pp. 37-52, Elsevier Science Publishers B.V., Amsterdam. Accessed 2019-01-12.
- Abdi, Hervé and Lynne J Williams. 2010. "Principal Component Analysis." WIREs Comp Stat 2010, vol. 2, pp. 433-459, John Wiley & Sons, Inc. Accessed 2019-01-12.
- Jolliffe, I.T. 2002. "Principal Component Analysis." Second Edition, part of Springer Series in Statistics, Springer-Verlag, New York, Inc. Accessed 2019-01-12.
See Also
- Eigenvalues and Eigenvectors for Data Scientists
- Singular Value Decomposition
- Dimensionality Reduction
- Feature Engineering
- Factor Analysis
- Kernel Principal Component Analysis