Factor Analysis
 Summary

When analysing data containing many measured variables, it may happen that some of the variables are correlated. This could be because they share an underlying influence or common factor. It would be useful to understand how these variables are correlated and seek an intuitive explanation of what's common among them. This will also simplify further analysis by reducing the dataset into fewer variables or factors. This is what factor analysis tries to achieve.
A good factor is intuitive, easy to interpret, has a simple structure and lacks complex loadings. Factor analysis is in some sense an art. It's been said:
"Factor analysis is not a purely statistical technique; there is always a certain amount of guesswork in it... Factor analysis is certainly a very treacherous tool in inexperienced hands."
Discussion
Could you provide an intuitive explanation of Factor Analysis?
Suppose a village survey is conducted and the questionnaire includes 500 questions. This survey therefore results in a large dataset of 500 variables. However, we may discover that many of the variables are correlated. We can probably put related variables into groups such as income, education, healthcare, cleanliness, etc. These groups are called factors. Our analysis becomes easier because we move from many variables to fewer factors.
Let's say we measure students' abilities in terms of four variables: vocabulary, grammar, arithmetic, and geometry. We can make a hypothesis that vocabulary and grammar abilities are correlated. Likewise, we can hypothesize that arithmetic and geometry abilities are correlated. We can therefore hypothesize two factors: language ability and math ability. Subsequent analysis of the data can either confirm or reject the presence of these factors and show to what extent they relate to the variables. In fact, the factors themselves could be correlated with each other and we might identify a single factor that we can call academic ability.
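The student example can be made concrete with a small simulation. This is only a sketch under assumed numbers: the loading of 0.8, the sample size and the two independent latent abilities are illustrative choices, not values from any real study. It shows that variables sharing a latent factor end up correlated, while variables driven by different factors do not.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(42)
n_students = 2000          # assumed sample size
loading = 0.8              # assumed influence of a factor on its variables
unique_sd = 0.6            # chosen so that each score has unit variance

scores = {"vocabulary": [], "grammar": [], "arithmetic": [], "geometry": []}
for _ in range(n_students):
    language = random.gauss(0, 1)       # latent factor, never observed
    math_ability = random.gauss(0, 1)   # second latent factor
    scores["vocabulary"].append(loading * language + unique_sd * random.gauss(0, 1))
    scores["grammar"].append(loading * language + unique_sd * random.gauss(0, 1))
    scores["arithmetic"].append(loading * math_ability + unique_sd * random.gauss(0, 1))
    scores["geometry"].append(loading * math_ability + unique_sd * random.gauss(0, 1))

names = list(scores)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a:>10} vs {b:<10} r = {pearson(scores[a], scores[b]):+.2f}")
```

In this setup only the measured scores would be available to an analyst; factor analysis works backwards from the correlation pattern to the hidden abilities.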
What are latent variables and factor loadings in Factor Analysis?
What we call factors are in fact latent variables. These are variables that can't be measured directly but that influence the variables we do measure.
In the equation for a measured variable, the coefficient of each latent variable is called a factor loading. In other words, the extent to which a variable is associated with a factor is quantitatively expressed by its factor loading. It's possible that a measured variable is influenced by more than one factor.
To give an example, when income, education and occupation are correlated, the common factor could be "individual socioeconomic status" (F1). On the other hand, house value, neighbourhood crimes and amenities can point to another factor "neighbourhood socioeconomic status" (F2). Consider a loading of 0.65 between income and F1; and a loading of 0.48 between occupation and F1. This implies that F1 influences income more strongly than occupation.
A loading with an absolute value of 0.4 or higher can be considered high.
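One common way to read a loading is to square it. Assuming standardized variables and uncorrelated factors (assumptions not stated in the example above), the squared loading is the share of a variable's variance attributed to that factor. For the loadings quoted above:

$$
b_{\text{income},F1}^2 = 0.65^2 = 0.4225, \qquad b_{\text{occupation},F1}^2 = 0.48^2 = 0.2304, \qquad 0.4^2 = 0.16
$$

Under these assumptions, F1 would account for roughly 42% of the variance in income and 23% of the variance in occupation, and the 0.4 rule of thumb corresponds to a factor explaining at least 16% of a variable's variance.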
What are the main types of Factor Analysis?
The two main types of FA are:
 Exploratory Factor Analysis (EFA): This is used when we wish to summarize data efficiently, when we want to know how many factors are present and their associated factor loadings. EFA is about revealing patterns in the relationships among variables.
 Confirmatory Factor Analysis (CFA): This is used when a researcher starts with one or more hypotheses. Each hypothesis may state the presence of certain factors. Analysis of the measured data must support or reject each hypothesis. A graphical representation of a hypothesis is called a path diagram. CFA produces fit statistics that are used to judge whether the data fits a particular hypothesis.
Structural Equation Modelling (SEM) is similar to CFA but allows us to test complex hypotheses about the structure of variables. SEM may be seen as a method to do CFA.
What are some methods of doing Factor Analysis?
All methods of factor analysis look for correlations among variables. FA is usually done in one of these ways: Principal Component Analysis (PCA), Principal Axis Factoring (PAF), Ordinary or Unweighted Least Squares (ULS), Generalized or Weighted Least Squares (WLS), and Maximum Likelihood (ML). Other methods include Image Factoring (based on ULS) and Alpha Factoring.
PAF is considered the conventional technique. It uses eigenvalue decomposition of a reduced correlation matrix, that is, a correlation matrix whose diagonal holds communality estimates instead of ones. ULS is considered one of the better methods. It produces the Minimum Residual (MinRes) solution. One study that compared some of these methods found that ULS gave accurate results.
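To illustrate the eigenvalue-decomposition idea behind PAF, here is a minimal NumPy sketch of iterated principal axis factoring. It is a teaching sketch, not the implementation used by any particular package: the function name principal_axis, the fixed iteration count and the use of squared multiple correlations as starting communalities are all assumptions made for this example.

```python
import numpy as np

def principal_axis(R, n_factors, n_iter=100):
    """Iterated principal axis factoring on a correlation matrix R (sketch)."""
    R = np.asarray(R, dtype=float)
    # Starting communalities: squared multiple correlations (an assumed choice).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)            # communalities replace the ones
        vals, vecs = np.linalg.eigh(reduced)     # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][:n_factors]
        # Loadings = eigenvectors scaled by the square root of their eigenvalues.
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        h2 = (loadings ** 2).sum(axis=1)         # updated communalities
    return loadings

# Hypothetical one-factor example: build the correlation matrix implied by
# known loadings, then check that the loadings are recovered.
true_b = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(true_b, true_b)
np.fill_diagonal(R, 1.0)

estimated = principal_axis(R, n_factors=1)
print(np.round(np.abs(estimated[:, 0]), 3))
```

The sign of an eigenvector is arbitrary, so the estimated loadings may come out with all signs flipped; the absolute values are what should be compared with true_b.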
What are some assumptions for Factor Analysis?
Variables have to be correlated but there shouldn't be perfect multicollinearity among the variables; that is, no variable should be exactly predictable from the other variables. Data shouldn't have outliers. We assume interval data.
Nonlinearity is not allowed but nonlinear variables can be transformed to linear ones before applying factor analysis. In fact, in the discipline of statistics, factor analysis is considered part of the general linear model (GLM) family.
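One simple way to screen for perfect multicollinearity is to look at the determinant of the correlation matrix, which drops to zero when one variable is an exact linear combination of others. The sketch below uses made-up data and a hypothetical helper corr_determinant; it is one possible check, not a complete test of the assumptions listed above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)

# Healthy case: the third variable is correlated with x1 but not determined by it.
healthy = np.column_stack([x1, x2, x1 + rng.standard_normal(n)])
# Degenerate case: the third variable is an exact linear combination of x1 and x2.
collinear = np.column_stack([x1, x2, x1 + x2])

def corr_determinant(data):
    """Determinant of the correlation matrix of the columns of data."""
    return float(np.linalg.det(np.corrcoef(data, rowvar=False)))

det_healthy = corr_determinant(healthy)
det_collinear = corr_determinant(collinear)
print(f"healthy:   det = {det_healthy:.3f}")
print(f"collinear: det = {det_collinear:.2e}")
```

A determinant that is numerically zero signals that the correlation matrix cannot be inverted, which breaks several extraction methods.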
Why do we do Factor Rotation?
Sometimes we will find that a variable loads highly on more than one factor. This makes it difficult to interpret the factors. Since factor models are not unique, factor rotation allows us to find another factor model that can perhaps be interpreted better.
There are two rotation types:
 Orthogonal: This uses the loading matrix that represents the correlation between variables and factors. In this type, the rotated factors remain orthogonal to (uncorrelated with) one another.
 Oblique: This uses the factor correlation matrix, structure matrix, pattern matrix, and factor coefficient matrix. In this type, the rotated factors are allowed to become correlated. Orthogonal rotation may be considered a special case of oblique rotation: if data clusters are in fact uncorrelated, then an oblique rotation will result in orthogonal factors.
There are different methods to perform these rotations. For orthogonal rotation, we have Quartimax, Varimax and Equamax. For oblique rotation, we have Oblimin and Promax.
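As an illustration of an orthogonal rotation, the following NumPy sketch applies a widely used SVD-based iteration for Varimax. The function name varimax, the stopping rule and the example loadings are assumptions made for this sketch; statistical packages may differ in details such as row normalization.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Varimax rotation via an SVD-based iteration (sketch, no normalization)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Gradient-like matrix of the varimax criterion.
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt                      # nearest orthogonal matrix
        current = s.sum()
        if previous != 0.0 and current / previous < 1.0 + tol:
            break
        previous = current
    return L @ rotation, rotation

# Hypothetical loadings with a simple structure, deliberately mixed by a
# 30 degree rotation so that every variable loads on both factors.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.5]])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)
print("before rotation:\n", np.round(unrotated, 2))
print("after varimax:\n", np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the way it is split between the factors changes. The order and signs of the rotated factors are arbitrary.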
Isn't Factor Analysis the same as Principal Component Analysis?
Both PCA and FA achieve dimensionality reduction while minimizing information loss. Both appear to use similar techniques of extraction, interpretation and rotation to reduce many variables to fewer components or factors. Yet, they are fundamentally different.
PCA extracts maximum variance into the first component, then the maximum remaining variance into the second component, and so on. Factors in FA have no such order. Factors identify common variance among variables. For this reason, FA is also called Common Factor Analysis; this name is sometimes abbreviated CFA, which should not be confused with Confirmatory Factor Analysis. FA models only the variance that variables share and leaves out unique and error variance, whereas PCA considers all the variance.
If variables are uncorrelated, PCA will still find suitable components but EFA will be unable to identify useful factors. It's been noted that as the number of variables increases (at least 40 variables), results from PCA and EFA tend to come closer.
Considering PCA and FA, how are variables related to components or factors?
Factors cause variables. Components are aggregates of variables.
In PCA, components (C) are a linear combination of variables (Y), each with its own weight (w). FA aims to identify latent variables or factors (F). Latent variables themselves can't be measured directly but are seen to cause or influence the measured variables. The extent of influence (b) is called factor loading. While components in PCA explain all of the variance in data, factors may not explain all the variance in a variable, thus resulting in a term that's unique (u) to each measured variable.
Mathematically,
$$
\begin{aligned}
\text{FA:}\quad & Y_1 = b_1 F + u_1;\ Y_2 = b_2 F + u_2;\ Y_3 = b_3 F + u_3;\ Y_4 = b_4 F + u_4 \\
\text{PCA:}\quad & C = w_1 Y_1 + w_2 Y_2 + w_3 Y_3 + w_4 Y_4
\end{aligned}
$$
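A short derivation shows why a shared factor makes the measured variables correlated. It relies on assumptions that are standard in textbooks but not stated above: F has unit variance, and the unique terms are uncorrelated with F and with one another. Under those assumptions, for the FA equations above:

$$
\begin{aligned}
\operatorname{Var}(Y_i) &= b_i^2 + \operatorname{Var}(u_i) \\
\operatorname{Cov}(Y_i, Y_j) &= b_i b_j \quad (i \neq j)
\end{aligned}
$$

All covariance between two variables comes from the common factor, while each variable's own variance also includes its unique part. This is the sense in which FA models common variance and PCA works with total variance.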
What tools are available to perform Factor Analysis?
In the R language (which is free and open source), factor analysis can be performed easily thanks to the psych package. EFA can be performed using your choice of method: MinRes, PAF, ULS, WLS or ML. The scree function can help in determining the number of significant factors. An alternative to this is parallel analysis, which can be done using the fa.parallel function.
An example of a commercial product is JMP from SAS. The Multivariate platform of JMP can do both PCA and FA. SPSS is another commercial product that can do both PCA and EFA.
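The idea behind parallel analysis, mentioned above as an alternative to the scree test, can be sketched independently of any package: compare the eigenvalues of the observed correlation matrix with those obtained from random data of the same shape, and keep factors only while the observed eigenvalue is larger. The NumPy code below is an assumed minimal version of that idea, not a reproduction of what fa.parallel does; the function name, the 95th-percentile threshold and the simulated dataset are choices made for this example.

```python
import numpy as np

def parallel_analysis(data, n_sims=100, percentile=95, seed=0):
    """Suggest a number of factors by comparing eigenvalues with random data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    simulated = np.empty((n_sims, p))
    for s in range(n_sims):
        noise = rng.standard_normal((n, p))     # uncorrelated variables
        eig = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        simulated[s] = np.sort(eig)[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)
    n_keep = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:                          # stop at the first failure
            break
        n_keep += 1
    return n_keep, observed, threshold

# Hypothetical dataset: six variables driven by two independent factors.
rng = np.random.default_rng(1)
n = 500
factors = rng.standard_normal((n, 2))
loadings = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                     [0.0, 0.8], [0.0, 0.8], [0.0, 0.8]])
data = factors @ loadings.T + 0.6 * rng.standard_normal((n, 6))

n_keep, observed, threshold = parallel_analysis(data)
print("observed eigenvalues:", np.round(observed, 2))
print("random-data threshold:", np.round(threshold, 2))
print("suggested number of factors:", n_keep)
```

Plotting the observed eigenvalues against their rank gives the scree plot; the random-data threshold adds a reference line to that plot.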
Milestones
The genesis of factor analysis is in human personality psychology: to identify attributes and then categorize them into a structural model. Francis Galton uses a dictionary to identify terms that describe personality. However, he fails to come up with a model.
Spearman gets interested in the work of Galton. In a paper published in 1904, he uses the terms factor and loadings, although he doesn't describe the methods he used for factorizing.
L.L. Thurstone, considered the father of factor analysis, uses 60 terms across 1300 subjects to arrive at five broad factors. He doesn't pursue his analysis further. R.B. Cattell identifies at least a dozen factors in the 1940s but it's Donald Fiske who shows that they reduce to five factors.
Harry Harman introduces Minimum Residuals (MinRes), an approach to factor analysis via least squares.
To determine how many factors to retain, R.B. Cattell proposes a graphical method called the Scree Plot, a plot of eigenvalues vs. factors.
The term Confirmatory Factor Analysis (CFA) is introduced. Prior to this, factor analysis was exploratory in nature but the "exploratory" prefix was not used.
Following Fiske and other researchers after him, it's in the 1980s that the Big-Five factor structure to describe human personality finally takes shape. This goes to show that finding the correct number of factors, and identifying those factors, is not a trivial problem.
References
 Ainsworth, Andrew. 2014. "Factor Analysis." Via SlideServe, August 14, Psychology 524: Applied Multivariate Statistics, California State University Northridge. Accessed 2019-01-22.
 Castro-Schilo, Laura. 2017. "Principal components or factor analysis?" JMP Blog, April 25. Accessed 2019-01-22.
 Cooper, Belle Beth. 2014. "How The “Big Five” Personality Traits in Science Can Help you Build a More Effective team." Blog, Buffer, February 03. Updated 2014-06-16. Accessed 2019-02-05.
 Coughlin, Kevin Barry. 2013. "An Analysis of Factor Extraction Strategies: A Comparison of the Relative Strengths of Principal Axis, Ordinary Least Squares, and Maximum Likelihood in Research Contexts that Include both Categorical and Continuous Variables." Graduate Theses and Dissertations, University of South Florida, January. Accessed 2019-01-22.
 Dargie, Emma, Ronald R. Holden, and Caroline F Pukall. 2017. "The Vulvar Pain Assessment Questionnaire: Factor Structure, Preliminary Norms, Internal Consistency, and Test-Retest Reliability." Journal of Sexual Medicine, vol. 14, no. 12, pp. 1585-1596. Accessed 2019-01-22.
 Escobar, Manolo Romero. 2016. "Structural Equation Modeling: What is a Latent Variable?" The Analysis Factor, March 07. Accessed 2019-01-22.
 Gaskin, James. 2014. "Exploratory Factor Analysis (conceptual)." YouTube, November 05. Accessed 2019-01-22.
 Goldberg, L. R. 1995. "What the hell took so long? Donald Fiske and the Big-Five factor structure." Chapter 3 in P. E. Shrout & S. T. Fiske (Eds.), Personality research, methods, and theory: A Festschrift honoring Donald W. Fiske, pp. 29-43, Hillsdale, NJ: Erlbaum. Accessed 2019-02-05.
 Grace-Martin, Karen. 2017. "The Fundamental Difference Between Principal Component Analysis and Factor Analysis, The Difference Between Confirmatory and Exploratory Factor Analysis." The Analysis Factor, January 20. Accessed 2019-01-22.
 Haunschmid, Verena. 2015. "[Dimensionality Reduction #2] Understanding Factor Analysis using R." May 16. Accessed 2019-01-22.
 Humbert, Anne Laurie. 2017. "An introduction to exploratory factor analysis in IBM SPSS Statistics." York University, November 10. Accessed 2019-02-05.
 IDRE. 2019. "Principal Components (PCA) and Exploratory Factor Analysis (EFA) with SPSS." Institute for Digital Research and Education, UCLA. Accessed 2019-01-22.
 Jain, Manishika. 2018. "Factor Analysis - Factor Loading, Factor Scoring & Factor Rotation (Research & Statistics)." Examrace, on YouTube, December 08. Accessed 2019-01-22.
 Jöreskog, Karl G. 2003. "Factor Analysis by MINRES." March 13. Accessed 2019-02-05.
 McGrew, Kevin. 2014. "A Gentle, Non-Technical Introduction to Factor Analysis." IQ's Corner, January 13. Accessed 2019-01-22.
 Nelson, Amanda E, Robert F DeVellis, Jordan B Renner, Todd A Schwartz, Philip G Conaghan, Virginia B Kraus and Joanne M Jordan. 2011. "Quantification of the whole-body burden of radiographic osteoarthritis using factor analysis." Arthritis Research & Therapy, 13:R176, October 25. Accessed 2019-02-05.
 PennState. 2019. "Lesson 12: Factor Analysis." STAT 505, Applied Multivariate Statistical Analysis, PennState Eberly College of Science.
 Rahn, Maike. 2012a. "Factor Analysis: A Short Introduction, Part 1." The Analysis Factor, September 10. Accessed 2019-01-22.
 Revelle, William. 2019. "Procedures for Psychological, Psychometric, and Personality Research." psych v1.8.12, R Docs, January 12. Accessed 2019-02-05.
 Rummel, R.J. 2011. "Understanding Factor Analysis." University of Hawaii. Accessed 2019-02-05.
 Statistica Help. 2018. "Scree Plot, Scree Test." Statistica Help, TIBCO Software Inc. Accessed 2019-02-05.
 Statistics Solutions. 2019. "Factor Analysis." Statistics Solutions. Accessed 2019-01-22.
 ttnphns. 2013. "Best factor extraction methods in factor analysis." Cross Validated, StackExchange, February 24. Updated 2018-10-08. Accessed 2019-01-22.
 ttnphns. 2015. "Factor rotation methods (varimax, oblimin, etc.) - what do the names mean and what do the methods do?" Cross Validated, StackExchange, December 06. Updated 2017-04-13. Accessed 2019-02-05.
 Vincent, Douglas F. 1953. "The Origin and Development of Factor Analysis." Journal of the Royal Statistical Society. Series C (Applied Statistics) 2, no. 2, pp. 107-17. Accessed 2019-01-22.
 Warfel, Evan. 2015. "How to do factor analysis." Blog, Domino Data Lab, January 27. Accessed 2019-01-22.
Further Reading
 Waller, Lee Rusty. 2013. "1 Factor Analysis - An Introduction." YouTube, May 23. Accessed 2019-01-22.
 Rahn, Maike. 2012b. "Factor Analysis: A Short Introduction, Part 3 - The Difference Between Confirmatory and Exploratory Factor Analysis." The Analysis Factor, November 02. Accessed 2019-01-22.
 Rummel, R.J. 2011. "Understanding Factor Analysis." University of Hawaii. Accessed 2019-02-05.
 NCSS. 2019. "Chapter 420: Factor Analysis." NCSS Statistical Software. Accessed 2019-01-22.
 PennState. 2019. "Lesson 12: Factor Analysis." STAT 505, Applied Multivariate Statistical Analysis, PennState Eberly College of Science.
 Tryfos, Peter. 1998. "Chapter 14: Factor Analysis." In Methods for Business Analysis and Forecasting: Text & Cases, Wiley. Accessed 2019-01-22.
See Also
 Principal Component Analysis
 Principal Axis Factoring
 Structural Equation Modelling
 Maximum Likelihood Estimation
 Exploratory Data Analysis
 Machine Learning