Factor Analysis

When analysing data containing many measured variables, it may happen that some of the variables are correlated. This could be because they share an underlying influence or common factor. It would be useful to understand how these variables are correlated and seek an intuitive explanation about what's common among them. This will also simplify further analysis by reducing the dataset into fewer variables or factors. This is what factor analysis tries to achieve.

A good factor is intuitive, easy to interpret, has a simple structure and lacks complex loadings. Factor analysis is in some sense an art. It's been said,

Factor analysis is not a purely statistical technique; there is always a certain amount of guesswork in it... Factor analysis is certainly a very treacherous tool in inexperienced hands.

Discussion

  • Could you provide an intuitive explanation of Factor Analysis?
    Four variables influenced by two factors, which in turn could themselves be correlated. Source: McGrew 2014, Fig 4.

    Suppose a village survey is conducted and the questionnaire includes 500 questions. This survey therefore results in a large dataset of 500 variables. However, we may discover that many of the variables are correlated. We can probably put related variables into groups such as income, education, healthcare, cleanliness, etc. These groups are called factors. Our analysis becomes easier because we now work with a few factors instead of many variables.

    Let's say we measure students' abilities in terms of four variables: vocabulary, grammar, arithmetic, and geometry. We can hypothesize that vocabulary and grammar abilities are correlated. Likewise, arithmetic and geometry abilities are likely to be correlated. We can therefore hypothesize two factors: language ability and math ability. Subsequent analysis of the data can either support or reject the presence of these factors and show to what extent they relate to the variables. In fact, the factors themselves could be correlated with each other and we might identify a single higher-level factor that we can call academic ability.
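
    The student example can be simulated to see what "correlated because of a shared factor" looks like in data. The sketch below is illustrative only: the sample size, the loading of 0.8 and the factor correlation of 0.3 are assumed values, not figures from any study. Variables driven by the same factor should correlate strongly, while variables driven by different factors should correlate only weakly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # hypothetical number of students

# Two latent factors that are allowed to correlate (0.3 is an assumed value).
factor_corr = np.array([[1.0, 0.3],
                        [0.3, 1.0]])
factors = rng.multivariate_normal(mean=[0.0, 0.0], cov=factor_corr, size=n)
language, math_ability = factors[:, 0], factors[:, 1]

def measured(factor, loading):
    """Measured score = loading * factor + unique noise, scaled to unit variance."""
    noise = rng.normal(size=factor.shape[0])
    return loading * factor + np.sqrt(1.0 - loading ** 2) * noise

scores = np.column_stack([
    measured(language, 0.8),      # vocabulary
    measured(language, 0.8),      # grammar
    measured(math_ability, 0.8),  # arithmetic
    measured(math_ability, 0.8),  # geometry
])

# Rows/columns: vocabulary, grammar, arithmetic, geometry.
corr = np.corrcoef(scores, rowvar=False)
print(np.round(corr, 2))
```

    The printed matrix shows two blocks of high correlations (vocabulary with grammar, arithmetic with geometry). The smaller cross-block correlations arise only because the two factors are themselves correlated.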

  • What are latent variables and factor loadings in Factor Analysis?
    Vulvar pain can be traced to three underlying factors. Source: Dargie et al. 2017, table 5.

    What we call factors are latent variables. These are variables that can't be measured directly but that influence the variables we do measure.

    The coefficient that relates a latent variable to a measured variable is called a factor loading. In other words, the extent to which a variable is associated with a factor is quantitatively expressed by its factor loading. It's possible that a measured variable is influenced by more than one factor.

    To give an example, when income, education and occupation are correlated, the common factor could be "individual socioeconomic status" (F1). On the other hand, house value, neighbourhood crimes and amenities can point to another factor "neighbourhood socioeconomic status" (F2). Consider a loading of 0.65 between income and F1; and a loading of 0.48 between occupation and F1. This implies that F1 influences income more strongly than occupation.

    An absolute value of 0.4 or higher can be considered as a high loading.
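
    The loadings in the example can be turned into a few numbers that aid interpretation. The sketch below assumes standardized variables and that F1 is the only factor acting on income and occupation. Under those assumptions the squared loading is the share of a variable's variance explained by F1, and the product of two loadings is the correlation the factor implies between the two variables. The variable names are illustrative.

```python
# Loadings on F1 taken from the example in the text.
loadings_f1 = {"income": 0.65, "occupation": 0.48}

# Squared loading = share of each variable's variance explained by F1
# (assumes standardized variables and a single factor).
explained = {name: b ** 2 for name, b in loadings_f1.items()}

# Correlation between income and occupation implied by F1 alone.
implied_corr = loadings_f1["income"] * loadings_f1["occupation"]

# Flag loadings that meet the 0.4 rule of thumb mentioned in the text.
HIGH_LOADING = 0.4
high = [name for name, b in loadings_f1.items() if abs(b) >= HIGH_LOADING]

print(explained)
print(round(implied_corr, 3))
print(high)
```

    Here F1 accounts for about 42% of the variance in income but only about 23% of the variance in occupation, which is what "influences income more strongly" means in quantitative terms.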

  • What are the main types of Factor Analysis?

    The two main types of FA are:

    • Exploratory Factor Analysis (EFA): This is used when we wish to summarize data efficiently, when we want to know how many factors are present and their associated factor loadings. EFA is about revealing patterns in the relationships among variables.
    • Confirmatory Factor Analysis (CFA): This is used when a researcher starts with one or more hypotheses. Each hypothesis may state the presence of certain factors. Analysis of the measured data then supports or rejects each hypothesis. A graphical representation of a hypothesis is called a path diagram. CFA produces fit statistics that indicate how well the data fits a particular hypothesis.

    Structural Equation Modelling (SEM) is similar to CFA but allows us to test more complex hypotheses about the structure of variables. CFA is commonly carried out within the SEM framework.

  • What are some methods of doing Factor Analysis?
    Various methods of doing Factor Analysis. Source: ttnphns 2013.

    All methods of factor analysis are looking for correlations among variables. FA is usually done in one of these ways: Principal Component Analysis (PCA), Principal Axis Factoring (PAF), Ordinary or Unweighted Least Squares (ULS), Generalized or Weighted Least Squares (WLS), Maximum Likelihood (ML). Other methods include Image Factoring (based on ULS) and Alpha Factoring.

    PAF is considered the conventional technique. It uses eigenvalue decomposition of a correlation matrix. ULS is considered one of the better methods. It produces the Minimum Residual (MinRes) solution. One study that compared some of these methods found that ULS gave accurate results.
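
    To make the eigenvalue-decomposition idea concrete, here is a minimal sketch of iterated principal axis factoring written with NumPy. It is an assumption-laden illustration, not the routine used by any particular package: the function name, the use of squared multiple correlations as starting communalities, and the stopping rule are all choices made for this example. The input correlation matrix is constructed from known single-factor loadings so that the result can be compared with the truth.

```python
import numpy as np

def principal_axis_factoring(corr, n_factors, max_iter=500, tol=1e-10):
    """Iterated principal axis factoring on a correlation matrix (sketch)."""
    corr = np.asarray(corr, dtype=float)
    # Starting communalities: squared multiple correlations.
    communalities = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
    for _ in range(max_iter):
        # Replace the unit diagonal with the current communality estimates.
        reduced = corr.copy()
        np.fill_diagonal(reduced, communalities)
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]
        lam = np.clip(eigvals[top], 0.0, None)  # guard against small negatives
        loadings = eigvecs[:, top] * np.sqrt(lam)
        updated = np.sum(loadings ** 2, axis=1)
        converged = np.max(np.abs(updated - communalities)) < tol
        communalities = updated
        if converged:
            break
    # Eigenvector signs are arbitrary; make each column sum positive.
    signs = np.sign(loadings.sum(axis=0))
    signs[signs == 0] = 1.0
    return loadings * signs, communalities

# Correlation matrix implied by one factor with these (assumed) loadings.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
corr = np.outer(true_loadings, true_loadings)
np.fill_diagonal(corr, 1.0)

loadings, communalities = principal_axis_factoring(corr, n_factors=1)
print(np.round(loadings.ravel(), 3))
print(np.round(communalities, 3))
```

    Because the input matrix follows a one-factor model exactly, the estimated loadings should match the assumed ones closely. Real data will not fit this neatly, and the iteration can misbehave when a communality approaches 1.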

  • What are some assumptions for Factor Analysis?

    Variables have to be correlated but there shouldn't be perfect multicollinearity among the variables; that is, no variable should be exactly predictable from the other variables. Data shouldn't have outliers. We assume interval data.

    Relationships among variables are assumed to be linear, but non-linear variables can be transformed to linear ones before applying factor analysis. In fact, in the discipline of statistics, factor analysis is considered a part of the General Linear Model (GLM).
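
    One informal way to screen for the multicollinearity problem is to look at the determinant of the correlation matrix, which approaches zero when one variable is almost a linear combination of the others. The sketch below uses made-up data and a hypothetical helper, collinearity_report. Any cut-off applied to the determinant is a judgment call, not a rule stated in this article.

```python
import numpy as np

def collinearity_report(data):
    """Return the determinant of the correlation matrix and the largest
    absolute off-diagonal correlation for a (samples x variables) array."""
    corr = np.corrcoef(data, rowvar=False)
    off_diag = corr - np.diag(np.diag(corr))
    return np.linalg.det(corr), np.max(np.abs(off_diag))

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)

# Third variable is correlated with x1 but not determined by it.
ok_data = np.column_stack([x1, x2, 0.6 * x1 + 0.8 * rng.normal(size=500)])
# Third variable is an exact sum of the first two: perfect multicollinearity.
bad_data = np.column_stack([x1, x2, x1 + x2])

det_ok, max_ok = collinearity_report(ok_data)
det_bad, max_bad = collinearity_report(bad_data)
print(det_ok, max_ok)
print(det_bad, max_bad)
```

    The second dataset yields a determinant that is zero up to rounding error even though no single pairwise correlation equals 1, which is why the determinant is a more telling check than scanning pairwise correlations.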

  • Why do we do Factor Rotation?
    Illustrating two types of factor rotation. Source: Humbert 2017, slide 27.

    Sometimes we find that a variable loads highly on more than one factor. This makes the factors difficult to interpret. Since factor models are not unique, factor rotation lets us find another factor model, equivalent in fit, that is perhaps easier to interpret.

    There are two rotation types:

    • Orthogonal: This uses the loading matrix that represents the correlation between variables and factors. In this type, the rotated factors remain orthogonal to one another.
    • Oblique: This uses the factor correlation matrix, structure matrix, pattern matrix, and factor coefficient matrix. In this type, the rotated factors are allowed to become correlated. Orthogonal rotation may be considered a special case of oblique rotation: if data clusters are in fact uncorrelated, then an oblique rotation will result in factors that are close to orthogonal.

    There are different methods to perform these rotations. For orthogonal rotation, we have Quartimax, Varimax and Equamax. For oblique rotation, we have Oblimin and Promax.
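
    As an illustration of orthogonal rotation, the sketch below implements a widely circulated SVD-based formulation of Varimax in NumPy. The function name, iteration limit and tolerance are choices made for this example, and the loading matrix is invented: a clean two-factor structure that has been deliberately rotated by 30 degrees so that every variable loads on both factors.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    loadings = np.asarray(loadings, dtype=float)
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.sum(rotated ** 2, axis=0)  # sum of squares per factor
        gradient = loadings.T @ (rotated ** 3 - rotated * col_ss / n_vars)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt                      # nearest orthogonal matrix
        new_criterion = s.sum()
        if new_criterion < criterion * (1.0 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Invented simple structure: two variables per factor.
simple = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.0, 0.6],
                   [0.0, 0.5]])
angle = np.deg2rad(30.0)
mix = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle),  np.cos(angle)]])
unrotated = simple @ mix  # every variable now loads on both factors

rotated, rotation = varimax(unrotated)
print(np.round(unrotated, 2))
print(np.round(rotated, 2))
```

    Rotation changes how the loadings are distributed across factors but not how much of each variable is explained: the row sums of squared loadings (the communalities) are identical before and after. Column order and signs of the rotated factors are arbitrary.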

  • Isn't Factor Analysis the same as Principal Component Analysis?
    Comparing common factor analysis and PCA. Source: IDRE 2019.

    Both PCA and FA achieve dimensionality reduction while minimizing information loss. Both appear to use similar techniques of extraction, interpretation and rotation to reduce many variables to fewer components or factors. Yet, they are fundamentally different.

    PCA extracts maximum variance into the first component, then extracts maximum remaining variance into the second component, and so on. Factors in FA have no such order. Factors identify common variance among variables. For this reason, FA is also called Common Factor Analysis (not to be confused with Confirmatory Factor Analysis, which shares the abbreviation CFA). FA models only the common variance and sets aside error or unique variance, whereas PCA considers all the variance.

    If variables are uncorrelated, PCA will still find suitable components but EFA will be unable to identify useful factors. It's been noted that as the number of variables increases (at least 40 variables), results from PCA and EFA tend to come closer.

  • Considering PCA and FA, how are variables related to components or factors?
    Illustrating relationships between variables and components/factors. Source: Adapted from Grace-Martin 2017.

    Factors cause variables. Components are aggregates of variables.

    In PCA, components (C) are a linear combination of variables (Y). FA aims to identify latent variables or factors (F). Latent variables themselves can't be measured directly but are seen to cause or influence the measured variables. The extent of influence (b) is called factor loading. While components in PCA explain all of the variance in data, factors may not explain all the variance in a variable, thus resulting in a term that's unique (u) to each measured variable.

    Mathematically,

    $$FA:\ Y_1=b_1F+u_1;\ Y_2=b_2F+u_2;\ Y_3=b_3F+u_3;\ Y_4=b_4F+u_4\\PCA:\ C=w_1Y_1+w_2Y_2+w_3Y_3+w_4Y_4$$
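
    The two equations can be exercised on simulated data. In this sketch the loadings b, the sample size and the random seed are assumed values chosen for illustration. Data is generated from the FA equations and a principal component is then formed from the same data as a weighted sum of the standardized variables, with weights w taken from the leading eigenvector of the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
b = np.array([0.9, 0.8, 0.7, 0.6])  # assumed loadings b_1..b_4

# FA view: Y_i = b_i * F + u_i, where u_i is unique to variable i.
F = rng.normal(size=n)
u = rng.normal(size=(n, 4)) * np.sqrt(1.0 - b ** 2)
Y = F[:, None] * b + u

corr = np.corrcoef(Y, rowvar=False)

# PCA view: C = w_1*Y_1 + ... + w_4*Y_4 on standardized variables,
# with w the eigenvector belonging to the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
w = eigvecs[:, -1]
Y_std = (Y - Y.mean(axis=0)) / Y.std(axis=0)
C = Y_std @ w

print("corr(Y1, Y2) implied by the factor:", b[0] * b[1])
print("corr(Y1, Y2) in the sample:", round(corr[0, 1], 3))
print("share of total variance per component:", np.round(eigvals[::-1] / 4, 3))
```

    The factor accounts for the correlations between variables (b_i times b_j) but leaves each unique term u_i unexplained. The components, taken together, account for all of the variance: the eigenvalues sum to the number of variables.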

  • What tools are available to perform Factor Analysis?

    In R language (which is free and open source), factor analysis can be performed easily thanks to the psych package. EFA can be performed using your choice of method: MinRes, PAF, ULS, WLS or ML. The scree function can help in determining the number of significant factors. An alternative to this is parallel analysis that can be done using the fa.parallel function.
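
    For readers who want to see the idea behind parallel analysis without a dedicated package, here is a simplified NumPy sketch. It is not the psych implementation and makes its own assumptions: it compares eigenvalues of the observed correlation matrix with the mean eigenvalues from random normal data of the same shape, and it counts how many leading eigenvalues exceed that baseline. The function name, the number of simulations and the test data are all invented for this example.

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Count leading eigenvalues of the observed correlation matrix that
    exceed the mean eigenvalues obtained from random normal data."""
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    random_eigs = np.empty((n_sims, n_vars))
    for i in range(n_sims):
        noise = rng.normal(size=(n_obs, n_vars))
        noise_corr = np.corrcoef(noise, rowvar=False)
        random_eigs[i] = np.sort(np.linalg.eigvalsh(noise_corr))[::-1]
    threshold = random_eigs.mean(axis=0)
    exceeds = observed > threshold
    # Index of the first eigenvalue that fails to beat the random baseline.
    n_keep = n_vars if exceeds.all() else int(np.argmin(exceeds))
    return n_keep, observed, threshold

# Invented data: six variables, three driven by each of two factors.
rng = np.random.default_rng(3)
n_obs = 1000
pattern = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                    [0.0, 0.8], [0.0, 0.8], [0.0, 0.8]])
latent = rng.normal(size=(n_obs, 2))
unique = rng.normal(size=(n_obs, 6)) * np.sqrt(1.0 - 0.8 ** 2)
data = latent @ pattern.T + unique

n_factors, observed, threshold = parallel_analysis(data)
print(n_factors)
print(np.round(observed, 2))
print(np.round(threshold, 2))
```

    Plotting the observed eigenvalues against their rank gives a scree plot; overlaying the threshold values shows where the observed curve drops below what random data would produce.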

    An example of a commercial product is JMP from SAS. The Multivariate platform of JMP can do both PCA and FA. SPSS is another commercial product that can do both PCA and EFA.

Milestones

1884

The genesis of factor analysis is in human personality psychology: to identify attributes and then categorize them into a structural model. Francis Galton uses a dictionary to identify terms that describe personality. However, he fails to come up with a model.

1901

Spearman gets interested in the work of Galton. In a paper published in 1904, he uses the terms factor and loadings, although he doesn't describe the methods he used for factorizing.

1934

L.L. Thurstone, considered the father of factor analysis, uses 60 terms across 1300 subjects to arrive at five broad factors. He doesn't pursue his analysis further. R.B. Cattell identifies at least a dozen factors in the 1940s but it's Donald Fiske who shows that they reduce to five factors.

1960

Harry Harman introduces Minimum Residuals (MinRes), an approach to factor analysis via least squares.

1966
An example scree plot showing five significant factors. Source: Nelson et al. 2011, fig. 2.

To determine how many factors to retain, R.B. Cattell proposes a graphical method called the Scree Plot, a plot of eigenvalues vs. factors.

1969

The term Confirmatory Factor Analysis (CFA) is introduced. Prior to this, factor analysis was exploratory in nature but the "exploratory" prefix was not used.

1980
Human personality traits can be reduced to just five factors. Source: Cooper 2014.

Following Fiske and other researchers after him, it's in the 1980s that the Big-Five factor structure to describe human personality finally takes shape. This long history shows that finding the correct number of factors, and identifying those factors, is not a trivial problem.

References

  1. Ainsworth, Andrew. 2014. "Factor Analysis." Via SlideServe, August 14, Psychology 524: Applied Multivariate Statistics, California State University Northridge. Accessed 2019-01-22.
  2. Castro-Schilo, Laura. 2017. "Principal components or factor analysis?" JMP Blog, April 25. Accessed 2019-01-22.
  3. Cooper, Belle Beth. 2014. "How The “Big Five” Personality Traits in Science Can Help you Build a More Effective team." Blog, Buffer, February 03. Updated 2014-06-16. Accessed 2019-02-05.
  4. Coughlin, Kevin Barry. 2013. "An Analysis of Factor Extraction Strategies: A Comparison of the Relative Strengths of Principal Axis, Ordinary Least Squares, and Maximum Likelihood in Research Contexts that Include both Categorical and Continuous Variables." Graduate Theses and Dissertations, University of South Florida, January. Accessed 2019-01-22.
  5. Dargie, Emma, Ronald R. Holden, and Caroline F Pukall. 2017. "The Vulvar Pain Assessment Questionnaire: Factor Structure, Preliminary Norms, Internal Consistency, and Test-Retest Reliability." Journal of Sexual Medicine, vol. 14, no. 12, pp. 1585-1596. Accessed 2019-01-22.
  6. Escobar, Manolo Romero. 2016. "Structural Equation Modeling: What is a Latent Variable?" The Analysis Factor, March 07. Accessed 2019-01-22.
  7. Gaskin, James. 2014. "Exploratory Factor Analysis (conceptual)." YouTube, November 05. Accessed 2019-01-22.
  8. Goldberg, L. R. 1995. "What the hell took so long? Donald Fiske and the Big-Five factor structure." Chapter 3 in P. E. Shrout & S. T. Fiske (Eds.), Personality research, methods, and theory: A Festschrift honoring Donald W. Fiske, pp. 29-43, Hillsdale, NJ: Erlbaum. Accessed 2019-02-05.
  9. Grace-Martin, Karen. 2017. "The Fundamental Difference Between Principal Component Analysis and Factor Analysis, The Difference Between Confirmatory and Exploratory Factor Analysis." The Analysis Factor, January 20. Accessed 2019-01-22.
  10. Haunschmid, Verena. 2015. "[Dimensionality Reduction #2] Understanding Factor Analysis using R." May 16. Accessed 2019-01-22.
  11. Humbert, Anne Laurie. 2017. "An introduction to exploratory factor analysis in IBM SPSS Statistics." York University, November 10. Accessed 2019-02-05.
  12. IDRE. 2019. "Principal Components (PCA) and Exploratory Factor Analysis (EFA) with SPSS." Institute for Digital Research and Education, UCLA. Accessed 2019-01-22.
  13. Jain, Manishika. 2018. "Factor Analysis - Factor Loading, Factor Scoring & Factor Rotation (Research & Statistics)." Examrace, on YouTube, December 08. Accessed 2019-01-22.
  14. Jöreskog, Karl G. 2003. "Factor Analysis by MINRES." March 13. Accessed 2019-02-05.
  15. McGrew, Kevin. 2014. "A Gentle, Non-Technical Introduction to Factor Analysis." IQ's Corner, January 13. Accessed 2019-01-22.
  16. Nelson, Amanda E, Robert F DeVellis, Jordan B Renner, Todd A Schwartz, Philip G Conaghan, Virginia B Kraus and Joanne M Jordan. 2011. "Quantification of the whole-body burden of radiographic osteoarthritis using factor analysis." Arthritis Research & Therapy, 13:R176, October 25. Accessed 2019-02-05.
  17. PennState. 2019. "Lesson 12: Factor Analysis." STAT 505, Applied Multivariate Statistical Analysis, PennState Eberly College of Science.
  18. Rahn, Maike. 2012a. "Factor Analysis: A Short Introduction, Part 1." The Analysis Factor, September 10. Accessed 2019-01-22.
  19. Revelle, William. 2019. "Procedures for Psychological, Psychometric, and Personality Research." psych v1.8.12, R Docs, January 12. Accessed 2019-02-05.
  20. Rummel, R.J. 2011. "Understanding Factor Analysis." University of Hawaii. Accessed 2019-02-05.
  21. Statistica Help. 2018. "Scree Plot, Scree Test." Statistica Help, TIBCO Software Inc. Accessed 2019-02-05.
  22. Statistics Solutions. 2019. "Factor Analysis." Statistics Solutions. Accessed 2019-01-22.
  23. Vincent, Douglas F. 1953. "The Origin and Development of Factor Analysis." Journal of the Royal Statistical Society. Series C (Applied Statistics) 2, no. 2, pp. 107-17. Accessed 2019-01-22.
  24. Warfel, Evan. 2015. "How to do factor analysis." Blog, Domino Data Lab, January 27. Accessed 2019-01-22.
  25. ttnphns. 2013. "Best factor extraction methods in factor analysis." Cross Validated, StackExchange, February 24. Updated 2018-10-08. Accessed 2019-01-22.
  26. ttnphns. 2015. "Factor rotation methods (varimax, oblimin, etc.) - what do the names mean and what do the methods do?" Cross Validated, StackExchange, December 06. Updated 2017-04-13. Accessed 2019-02-05.

Further Reading

  1. Waller, Lee Rusty. 2013. "1 Factor Analysis - An Introduction." YouTube, May 23. Accessed 2019-01-22.
  2. Rahn, Maike. 2012b. "Factor Analysis: A Short Introduction, Part 3-The Difference Between Confirmatory and Exploratory Factor Analysis." The Analysis Factor, November 02. Accessed 2019-01-22.
  3. Rummel, R.J. 2011. "Understanding Factor Analysis." University of Hawaii. Accessed 2019-02-05.
  4. NCSS. 2019. "Chapter 420: Factor Analysis." NCSS Statistical Software. Accessed 2019-01-22.
  5. PennState. 2019. "Lesson 12: Factor Analysis." STAT 505, Applied Multivariate Statistical Analysis, PennState Eberly College of Science.
  6. Tryfos, Peter. 1998. "Chapter 14: Factor Analysis." In Methods for Business Analysis and Forecasting: Text & Cases, Wiley. Accessed 2019-01-22.

Cite As

Devopedia. 2022. "Factor Analysis." Version 7, February 15. Accessed 2024-06-25. https://devopedia.org/factor-analysis
Contributed by
3 authors


Last updated on
2022-02-15 11:52:30