Principal Component Analysis

2 authors have contributed to this article.
Created by raam.raam on 2019-01-06 08:49:47.
Last updated by arvindpdmn on 2019-01-22 06:17:28.

Summary

 image
PCA in a nutshell. Source: Lavrenko and Sutton 2011, slide 13.

Big Data is increasingly becoming the norm and affecting many domains. When there's lots of data involving multiple variables, the work of a data scientist gets difficult. Algorithms also take longer to complete. Wouldn't it be sensible to identify and consider only those variables that have the most influence and discard the others?

Principal Component Analysis (PCA) extracts the most important information. This in turn leads to compression since the less important information is discarded. With fewer variables to consider, it becomes simpler to describe and analyze the dataset.

PCA can be seen as a trade-off: faster computation and lower memory consumption in exchange for some information loss. It's considered one of the most useful tools for data analysis.

Milestones

1850

Mid-nineteenth century works by Cauchy and Jacobi in classical analytic geometry show that the equations for the principal axes of quadratic forms and surfaces are known.

1889

Francis Galton in his Natural Inheritance connects principal axes for the first time with the correlation ellipsoid.

1901

Karl Pearson invents PCA while working to find the major and minor axes of an ellipse. However, he does not use the term PCA. In his geometric interpretation of the problem, he's trying to find "lines and planes of closest fit to systems of points in space".

1930

Harold Hotelling develops PCA independently and names the technique. His approach is the one familiar to us today, using successive orthogonal linear combinations with maximum variance. The 1930s is also the decade when the development of Factor Analysis begins. Factor Analysis is closely related to PCA.

1960

Around 1960, Malinowski introduces PCA to chemistry. After 1970, many chemical applications of PCA appear in literature.

1966
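image
A scree plot based on eigenvalues shows that three factors will explain most of the data. Source: Statistica Help 2018.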

How does one determine how many principal components to retain for analysis? In the context of factor analysis, R.B. Cattell proposes a method called the Scree Test, which relies on a Scree Plot. The plot graphs the eigenvalue of each principal component, or equivalently the percentage of total variation it accounts for.
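
The quantities shown on a scree plot can be computed directly from the eigenvalues. The sketch below is only illustrative: the eigenvalues are made up, and the 90% cutoff is an assumed rule of thumb rather than Cattell's test itself, which relies on visually locating the "elbow" of the curve.

```python
# Hypothetical eigenvalues of a covariance matrix, sorted in descending order.
eigenvalues = [4.2, 2.1, 1.0, 0.4, 0.2, 0.1]


def explained_variance_ratio(values):
    """Fraction of the total variance accounted for by each component."""
    total = sum(values)
    return [v / total for v in values]


def components_needed(values, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    cumulative = 0.0
    for count, ratio in enumerate(explained_variance_ratio(values), start=1):
        cumulative += ratio
        if cumulative >= threshold - 1e-12:  # small slack for rounding error
            return count
    return len(values)


ratios = explained_variance_ratio(eigenvalues)
for index, ratio in enumerate(ratios, start=1):
    # A crude text version of a scree plot: one '#' per 2% of variance.
    print(f"PC{index}: {ratio:6.1%} " + "#" * round(ratio * 50))

print("Components needed for 90% of variance:", components_needed(eigenvalues, 0.90))
```

A fixed threshold is easy to automate, but it can disagree with the elbow a person would pick by eye, so the two are best treated as complementary checks.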

Discussion

  • Could you explain PCA with a simple example?
     image
    Illustration of principal component analysis. Source: Werner and Friedrich 2014, fig. 1.

    We can describe the shape of a fish with two variables: height and width. However, these two variables are not independent of each other. In fact, they have a strong correlation. Given the height, we can probably estimate the width; and vice versa. Thus, we may say that the shape of a fish can be described with a single component.

    This doesn't mean that we simply ignore either height or width. Instead, we transform our two original variables into two orthogonal (uncorrelated) components that give a complete alternative description. The first component (blue line) will explain most of the variation in the data. The second component (dotted line) will explain the remaining variation. Note that both components are derived from both height and width.

    More intuitively, the first component line can be seen as the best-fit line that minimizes information loss. Alternatively, it can also be seen as the line that maximizes the variation; that is, it tries to explain as much of the variation in the dataset as possible.
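
    The fish example can be sketched numerically. The measurements below are invented for illustration, and the helper names are hypothetical. Since there are only two variables, the eigenvalues and the first principal direction of the 2x2 covariance matrix can be written in closed form.

```python
import math

# Hypothetical (height, width) measurements of fish, in centimetres.
fish = [(2.0, 4.1), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1), (6.0, 12.0), (7.0, 13.8)]


def covariance_2d(points):
    """Sample covariance entries (var_x, var_y, cov_xy) of 2D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return var_x, var_y, cov_xy


def pca_2d(points):
    """Return (largest eigenvalue, smallest eigenvalue, first principal direction)."""
    var_x, var_y, cov_xy = covariance_2d(points)
    half_trace = (var_x + var_y) / 2
    spread = math.hypot((var_x - var_y) / 2, cov_xy)
    largest, smallest = half_trace + spread, half_trace - spread
    # Angle of the major axis of the covariance ellipse.
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    return largest, smallest, (math.cos(angle), math.sin(angle))


largest, smallest, direction = pca_2d(fish)
share = largest / (largest + smallest)
print(f"First component explains {share:.1%} of the variance")
print(f"First component direction (height, width): ({direction[0]:.3f}, {direction[1]:.3f})")
```

    Because height and width are strongly correlated in this made-up data, nearly all the variance falls along the first component, which mixes both original variables.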

  • Could you mention some real-world use cases of PCA?
     image
    Use of PCA for facial recognition. Source: Lipp 2015.

    PCA has been applied to facial recognition. To capture 90% of the variance, only a third of the components had to be retained. This may be sufficient for Machine Learning applications. The remaining two-thirds mostly carry the finer image details.

    In another study, the consumption of 17 different food types was studied across the four countries of the UK. With 17 features, the problem is non-trivial to analyze. With PCA, the first component showed that Northern Ireland was unique. People of Northern Ireland consumed fresh potatoes and fresh fruit differently from other populations.

    The lower molar teeth of an ancient mammal named Kuehneotherium were studied using nine variables. PCA showed that just two components are enough to explain over 95% of total variation. When plotted, it was easy to see the clusters and relate them back to the original features. One cluster stood for a species of Kuehneotherium while another broader cluster suggested an unidentified animal.

    In a study to detect lactose in lactose-free milk using NIR spectroscopy, where each spectrum had 601 dimensions, PCA identified distinct clusters with just two principal components.

  • Isn't PCA similar to Dimensionality Reduction?

    In a complex data-intensive problem, there are usually many influencing variables. The term variable is equivalent to other commonly used terms: feature or dimension.

    The idea of reducing the number of variables or dimensions is called Dimensionality Reduction. This can be done in two ways:

    • Feature Elimination: We drop some features that we may consider unimportant. While the approach is simple, we lose useful information present in those dropped features.
    • Feature Extraction: We transform the original set of features into another set of features. The idea is to pack the most important information into as few derived features as possible. We can then reduce the number of dimensions by dropping some of the derived features. Even so, we don't completely lose the information in any original feature, since each derived feature is a linear combination of all the original features.

    PCA is in fact a method for doing feature extraction. In PCA, derived features are also called composite features or principal components. Moreover, these principal components are orthogonal to one another, and hence uncorrelated.
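
    The difference between the two approaches can be illustrated with a small sketch. The dataset is hypothetical, and reconstruction error is used here as an assumed, simple measure of lost information. Elimination drops one feature outright, while extraction keeps a single derived feature built from both.

```python
import math

# Hypothetical two-feature dataset in which the features are strongly correlated.
data = [(2.0, 4.1), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1), (6.0, 12.0), (7.0, 13.8)]

n = len(data)
mean = (sum(x for x, _ in data) / n, sum(y for _, y in data) / n)
centered = [(x - mean[0], y - mean[1]) for x, y in data]

# Entries of the 2x2 sample covariance matrix.
var_x = sum(x * x for x, _ in centered) / (n - 1)
var_y = sum(y * y for _, y in centered) / (n - 1)
cov_xy = sum(x * y for x, y in centered) / (n - 1)

# First principal direction: unit vector along the major axis of the data.
angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
w = (math.cos(angle), math.sin(angle))

# Feature extraction: one derived feature, a linear combination of both originals.
scores = [w[0] * x + w[1] * y for x, y in centered]
rebuilt = [(s * w[0], s * w[1]) for s in scores]
extraction_error = sum(
    (x - rx) ** 2 + (y - ry) ** 2 for (x, y), (rx, ry) in zip(centered, rebuilt)
)

# Feature elimination: drop the second feature, so it can only be guessed
# as its mean (zero after centering).
elimination_error = sum(y ** 2 for _, y in centered)

print(f"Squared error, keep 1 principal component: {extraction_error:.3f}")
print(f"Squared error, keep 1 original feature:    {elimination_error:.3f}")
```

    Both options store one number per observation, but the derived feature lets us rebuild both original features far more accurately on this data.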

  • What are the advantages of the PCA technique?

    PCA keeps information loss small even when only a few principal components are retained for analysis. This is because each principal component lies along a direction that maximizes variation, that is, the spread of data. More importantly, the components themselves need not be identified a priori: they are identified by PCA from the dataset. Thus, PCA is an adaptive data analysis technique. Since it needs no labelled data, PCA is also an unsupervised learning method.

    By reducing the number of dimensions, PCA enables easier data visualization. Visualization helps us to identify clusters, patterns and trends more easily. Fewer dimensions mean less computation and can lower the error rate. By discarding low-variance components, PCA can also reduce noise, which often helps downstream algorithms work better.

    Finding the principal components is really an eigenvalue/eigenvector problem, which has been well studied with lots of algorithms available for practical use.
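
    As one illustration of such an algorithm, the sketch below uses power iteration to approximate the first principal component of a small, made-up dataset. This is only one of many possible methods and not necessarily what production libraries use. The function names and data are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of rows, where each row is one observation."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def power_iteration(matrix, iterations=500, tolerance=1e-12):
    """Approximate the largest eigenvalue of a symmetric matrix and its unit eigenvector."""
    d = len(matrix)
    # Assumption: this starting vector is not orthogonal to the leading eigenvector.
    vector = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:
            break  # zero matrix: there is no direction of variance
        updated = [p / norm for p in product]
        change = max(abs(u - v) for u, v in zip(updated, vector))
        vector = updated
        if change < tolerance:
            break
    # Rayleigh quotient gives the eigenvalue estimate for a unit vector.
    eigenvalue = sum(
        vector[i] * sum(matrix[i][j] * vector[j] for j in range(d)) for i in range(d)
    )
    return eigenvalue, vector


# Hypothetical observations of three correlated variables.
rows = [
    (2.5, 2.4, 1.2), (0.5, 0.7, 0.3), (2.2, 2.9, 1.1), (1.9, 2.2, 0.9), (3.1, 3.0, 1.6),
    (2.3, 2.7, 1.0), (2.0, 1.6, 1.1), (1.0, 1.1, 0.5), (1.5, 1.6, 0.8), (1.1, 0.9, 0.4),
]
cov = covariance_matrix(rows)
variance, direction = power_iteration(cov)
total = sum(cov[i][i] for i in range(len(cov)))
print(f"First component explains {variance / total:.1%} of the variance")
print("Direction:", [round(c, 3) for c in direction])
```

    Power iteration yields only the leading component. Further components would need an extra step such as deflation, or a full eigendecomposition or singular value decomposition.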

    Although the data is often assumed to follow a Gaussian distribution, PCA doesn't need this assumption when used as a descriptive tool. It can be used for exploratory analysis on data of any distribution. There are also variations of PCA that cater to different data types and structures.

  • What are the drawbacks of the PCA technique?
     image
    Examples where PCA may not work well. Source: Lever et al. 2017, fig. 4.

    Here are some drawbacks of PCA:

    • PCA works well only if the observed variables are linearly correlated. If there's no correlation, PCA will fail to capture adequate variance with fewer components.
    • PCA is lossy. Information is lost when we discard insignificant components.
    • Results depend on how the variables are scaled. Hence, the scaling you use should be documented. Scaling should not be adjusted to match prior knowledge of the data.
    • Since each principal component is a linear combination of the original features, visualizations are not easy to interpret or relate back to the original features.
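
    The sensitivity to scaling can be seen in a small sketch. The measurements and units below are invented for illustration, and the helper names are hypothetical. The first principal direction is computed once on raw values and once after standardizing each variable.

```python
import math
import statistics

# Hypothetical measurements: height in millimetres, weight in kilograms.
heights_mm = [1520.0, 1610.0, 1680.0, 1750.0, 1830.0, 1900.0]
weights_kg = [52.0, 61.0, 63.0, 74.0, 79.0, 90.0]


def first_direction(xs, ys):
    """Unit vector of the first principal component for two variables."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    var_x, var_y = statistics.variance(xs), statistics.variance(ys)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    return math.cos(angle), math.sin(angle)


def standardize(values):
    """Rescale values to zero mean and unit sample standard deviation."""
    mean, stdev = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / stdev for v in values]


raw = first_direction(heights_mm, weights_kg)
scaled = first_direction(standardize(heights_mm), standardize(weights_kg))

print(f"Raw units:    height weight {raw[0]:.3f}, weight weight {raw[1]:.3f}")
print(f"Standardized: height weight {scaled[0]:.3f}, weight weight {scaled[1]:.3f}")
```

    On raw values, the variable with the larger numeric variance (height in millimetres) dominates the first component. After standardization, both variables contribute equally. Neither result is wrong, which is why the scaling choice needs to be stated.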

Sample Code

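  • # Source: http://r.789695.n4.nabble.com/How-to-comment-in-R-td882882.html
    # Accessed 2019-01-12.
     
    # Do PCA on the iris dataset and plot the first two principal components
    library(ggfortify)  # provides autoplot() for prcomp objects
    df <- iris[, 1:4]   # keep the four numeric measurements; drop Species
    # prcomp() centers the data by default; pass scale. = TRUE to also standardize
    pca <- prcomp(df)
    autoplot(pca, data = iris, colour = 'Species')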

References

  1. Abdi, Hervé and Lynne J Williams. 2010. "Principal Component Analysis." WIREs Comp Stat 2010, vol. 2, pp. 433-459, John Wiley & Sons, Inc. Accessed 2019-01-12.
  2. Brems, Matt. 2017. "A One-Stop Shop for Principal Component Analysis." Towards Data Science, April 17. Accessed 2019-01-12.
  3. Cavaioni, Mike. 2017. "Machine Learning: Unsupervised Learning — Principal Component Analysis." Machine Learning bites, February 07. Accessed 2019-01-12.
  4. Galarnyk, Michael. 2017. "Principle Component Analysis (PCA) for Data Visualization." Python_Tutorials, on GitHub, December 02. Accessed 2019-01-12.
  5. Jolliffe, Ian T. and Jorge Cadima. 2016. "Principal component analysis: a review and recent developments." Philos Trans A Math Phys Eng Sci. vol. 374, no. 2065, Apr 13. Accessed 2019-01-12.
  6. Lavrenko, Victor and Charles Sutton. 2011. "IAML: Dimensionality Reduction." School of Informatics, University of Edinburgh. Accessed 2019-01-22.
  7. Ledesma, Rubén, Pedro Valero-Mora, and Guillermo Macbeth. 2015. "The Scree Test and the Number of Factors: a Dynamic Graphics Approach." The Spanish Journal of Psychology, vol. 18, e11, pp. 1-10, June. Accessed 2019-01-12.
  8. Leeuw, Jan De. 2011. "History and Theory of Nonlinear Principal Component Analysis." UCLA Department of Statistics, February 11. Accessed 2019-01-12.
  9. Lever, Jake, Martin Krzywinski, and Naomi Altman. 2017. "Principal component analysis." Nature Methods, vol. 14, pp. 641–642. Accessed 2019-01-12.
  10. Lipp, Jesse. 2015. "PCA – Part 5: Eigenpets." Bioramble, September 01. Accessed 2019-01-12.
  11. Pellicia, Daniel. 2018. "Classification of NIR spectra using Principal Component Analysis in Python." Instruments & Data Tools Pty Ltd, March 23. Accessed 2019-01-12.
  12. Powell, Victor and Lewis Lehe. 2015. "Principal Component Analysis: Explained Visually." Accessed 2019-01-12.
  13. Starmer, Josh. 2018. "Principal Component Analysis (PCA), Step-by-Step." StatQuest, on YouTube, April 02. Accessed 2019-01-12.
  14. Statistica Help. 2018. "Scree Plot, Scree Test." Statistica Help, TIBCO Software Inc. Accessed 2019-01-12.
  15. Werner, Steffen, Jochen C Rink, and Benjamin Friedrich. 2014. "Shape Mode Analysis Exposes Movement Patterns in Biology: Flagella and Flatworms as Case Studies." PLoS ONE, vol. 9, no. 11, July. Accessed 2019-01-12.
  16. Wikipedia. 2019. "Principal component analysis." Wikipedia, January 3. Accessed 2019-01-12.
  17. Wold, Svante, Kim Esbensen, and Paul Geladi. 1987. "Principal Component Analysis." Chemometrics and Intelligent Laboratory Systems, vol. 2, pp. 37-52, Elsevier Science Publishers B.V., Amsterdam. Accessed 2019-01-12.

See Also

  • Eigenvalues and Eigenvectors for Data Scientists
  • Singular Value Decomposition
  • Dimensionality Reduction
  • Feature Engineering
  • Factor Analysis
  • Kernel Principal Component Analysis

Further Reading

  1. Starmer, Josh. 2018. "Principal Component Analysis (PCA), Step-by-Step." StatQuest, on YouTube, April 02. Accessed 2019-01-12.
  2. Powell, Victor and Lewis Lehe. 2015. "Principal Component Analysis: Explained Visually." Accessed 2019-01-12.
  3. Williams, Alex. 2016. "Everything you did and didn't know about PCA." Its Neuronal, March 27. Accessed 2019-01-12.
  4. Wold, Svante, Kim Esbensen, and Paul Geladi. 1987. "Principal Component Analysis." Chemometrics and Intelligent Laboratory Systems, vol. 2, pp. 37-52, Elsevier Science Publishers B.V., Amsterdam. Accessed 2019-01-12.
  5. Abdi, Hervé and Lynne J Williams. 2010. "Principal Component Analysis." WIREs Comp Stat 2010, vol. 2, pp. 433-459, John Wiley & Sons, Inc. Accessed 2019-01-12.
  6. Jolliffe, I.T. 2002. "Principal Component Analysis." Second Edition, part of Springer Series in Statistics, Springer-Verlag, New York, Inc. Accessed 2019-01-12.

Cite As

Devopedia. 2019. "Principal Component Analysis." Version 7, January 22. Accessed 2019-06-27. https://devopedia.org/principal-component-analysis