Hidden Markov Model

HMM states (X), observations (O) and probabilities (A, B). Source: Stamp 2018, fig. 1.

Consider weather, stock prices, DNA sequences, human speech or words in a sentence. In all these cases, the current state is influenced by one or more previous states. Moreover, we can often observe the effect but not the underlying cause, which remains hidden from the observer. A Hidden Markov Model (HMM) helps us figure out the most probable hidden state given an observation.

In practice, we use a sequence of observations to estimate the sequence of hidden states. In an HMM, the next state depends only on the current state. This makes it well suited to modelling time series data.

We can classify HMM as a generative probabilistic model since a sequence of observed variables is generated by a sequence of hidden states. HMM is also seen as a specific kind of Bayesian network.

Discussion

  • Could you explain HMM with an example?
An example of Hidden Markov Model. Source: Wikipedia 2019.

Suppose Bob tells his friend Alice what he did earlier today. Based on this information, Alice guesses today's weather at Bob's location. In HMM terms, we model the weather as states and Bob's activity as observations.

    To solve this problem, Alice needs to know three things:

    • Transition Probabilities: Probability of moving from one state to another. For example, "If today was sunny, what's the probability that it will rain tomorrow?" If there are N states, this is an NxN matrix.
    • Emission Probabilities: Probability of a particular output given a particular state. For example, "What's the chance that Bob is walking if it's raining?" Given a choice of M possible observation symbols, this is an NxM matrix. These are also called output or observation probabilities.
    • Initial Probabilities: Probability of being in a state at the start, say, yesterday or ten days ago.

    Unlike a typical Markov chain, we can't see the states of an HMM. We can only observe the outputs and then infer the states. The states are hidden, hence the name Hidden Markov Model.
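
    As a rough sketch in R, Alice's three tables can be written down as matrices and a vector. The numbers below are assumed from the Wikipedia weather example and match the Sample Code section later in this article:

      states  <- c("Rainy", "Sunny")
      symbols <- c("Walk", "Shop", "Clean")
      # Transition probabilities A (N x N): rows = today's weather, columns = tomorrow's
      A <- matrix(c(0.7, 0.3,
                    0.4, 0.6),
                  nrow = 2, byrow = TRUE, dimnames = list(states, states))
      # Emission probabilities B (N x M): rows = weather, columns = Bob's activity
      B <- matrix(c(0.1, 0.4, 0.5,
                    0.6, 0.3, 0.1),
                  nrow = 2, byrow = TRUE, dimnames = list(states, symbols))
      # Initial probabilities: where the weather starts
      init <- c(Rainy = 0.6, Sunny = 0.4)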

  • What types of problems can be solved by HMM?
Typical notation used in HMM. Source: Kang 2017.

    Let A, B and π denote the transition matrix, observation matrix and initial state distribution respectively. An HMM can then be represented as λ = (A, B, π). Let the observation sequence be O and the state sequence be Q.

    HMM can be used to solve three types of problems:

    • Likelihood Problem: Given O and λ, find the likelihood P(O|λ). How likely is a particular sequence of observations? The forward algorithm solves this problem.
    • Decoding Problem: Given O and λ, find the best possible Q that explains O. Given the observation sequence, what's the best possible state sequence? The Viterbi algorithm solves this problem.
    • Learning Problem: Given O and the set of possible states, learn λ, typically by maximizing P(O|λ). What model best maps states to observations? The Baum-Welch algorithm, also called the forward-backward algorithm, solves this problem. In the language of machine learning, O is the training data and the number of states N is a hyperparameter of the model.
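
    As a rough sketch, the three problems map onto functions of the R HMM package used in the Sample Code section below. The model values here are assumed from the weather example:

      library(HMM)
      hmm <- initHMM(States = c("Rainy", "Sunny"),
                     Symbols = c("Walk", "Shop", "Clean"),
                     startProbs = c(0.6, 0.4),
                     transProbs = matrix(c(0.7, 0.3, 0.4, 0.6), nrow = 2, byrow = TRUE),
                     emissionProbs = matrix(c(0.1, 0.4, 0.5, 0.6, 0.3, 0.1), nrow = 2, byrow = TRUE))
      obs <- c("Walk", "Shop", "Clean")
      # Likelihood problem: P(O|lambda) from the (log-scale) forward variables
      log_alpha <- forward(hmm, obs)
      sum(exp(log_alpha[, length(obs)]))
      # Decoding problem: most likely state sequence Q
      viterbi(hmm, obs)
      # Learning problem: re-estimate A and B from the observations alone
      baumWelch(hmm, obs)$hmm
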
  • What are some applications where HMM is useful?
Complex birdsong analyzed using HMM. Source: Adapted from Katahira et al. 2011, fig. 4.

    HMM has been applied in many areas including automatic speech recognition, handwriting recognition, gesture recognition, part-of-speech tagging, musical score following, partial discharge analysis and bioinformatics.

    In speech recognition, a spectral analysis of speech gives us suitable observations for HMM. States may be modelled after phonemes or syllables, or their number may be chosen to roughly match the average number of observations in a spoken word. Each word gets its own model.

    To tag words with their parts of speech, the tags are modelled as hidden states and the words are the observations.

    In computer networking, HMMs are used in intrusion detection systems. These come in two flavours: anomaly detection, in which normal behaviour is modelled, and misuse detection, in which a predefined set of attacks is modelled.

    In computer vision, HMM has been used to label human activities from skeleton tracking output. Each activity is modelled with an HMM. By linking multiple HMMs through common states, a compound HMM is formed. The purpose is to make robots aware of human activity.

  • What are the different types of Hidden Markov Models?
Some types of HMMs. Source: Rabiner 1989, fig. 7.

    In the typical model, called the ergodic HMM, the states are fully connected: we can transition to any state from any other state. The left-right HMM is a more constrained model in which state transitions are allowed only from lower-indexed states to higher-indexed ones. Variations and combinations of these two types are possible, such as having two parallel left-right state paths.

    HMM started with observations of discrete symbols governed by discrete probabilities. If the observations are continuous signals, we use continuous observation densities instead.

    There are also domain-specific variations of HMM. For example, in biological sequence analysis, there are at least three types including profile-HMMs, pair-HMMs, and context-sensitive HMMs.
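
    As an illustrative sketch in R, with assumed 4-state example values, the difference between ergodic and left-right HMMs shows up in the shape of the transition matrix:

      # Ergodic HMM: every state can be reached from every other state.
      ergodic <- matrix(1/4, nrow = 4, ncol = 4)
      # Left-right HMM: transitions only to the same or higher-indexed states,
      # so the transition matrix is upper triangular (rows still sum to 1).
      left_right <- matrix(c(0.5, 0.3, 0.2, 0.0,
                             0.0, 0.5, 0.3, 0.2,
                             0.0, 0.0, 0.6, 0.4,
                             0.0, 0.0, 0.0, 1.0),
                           nrow = 4, byrow = TRUE)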

  • Could you explain forward algorithm and backward algorithm?
Trellis diagrams showing forward and backward algorithms. Source: Adapted from Jana 2019b.

    Every state sequence has some probability of producing a given sequence of observations. Given T observations and N states, there are \(N^T\) possible state sequences. Thus, the complexity of calculating the probability of a given sequence of observations by brute force is \(O(N^{T}T)\). Both forward and backward algorithms bring this down to \(O(N^{2}T)\) through dynamic programming.

    In the forward algorithm, we consider the probability of being in a state at the current time step. Then we consider the transition probabilities to calculate the state probabilities for the next step. Thus, at each time step we have considered all state sequences preceding it. The algorithm is more efficient since it reuses calculations from earlier steps. Instead of keeping all path sequences, paths are folded into a forward trellis.

    Backward algorithm is similar except that we start from the last time step and calculate in reverse. We're finding the probability that from a given state, the model will generate the output sequence that follows.

    A combination of both algorithms, called forward-backward algorithm, is used to solve the learning problem.
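
    Below is a minimal base-R sketch of both algorithms (the function and variable names are assumptions for illustration, not from any source): A is the N x N transition matrix, B the N x M emission matrix, init the initial distribution and obs a vector of observation indices. Each function fills a trellis column by column and returns the likelihood P(O|λ):

      forward_prob <- function(A, B, init, obs) {
        N <- nrow(A); Tlen <- length(obs)
        alpha <- matrix(0, N, Tlen)
        alpha[, 1] <- init * B[, obs[1]]               # initialization
        for (t in seq_len(Tlen - 1) + 1) {             # induction: reuse the previous column
          alpha[, t] <- (t(A) %*% alpha[, t - 1]) * B[, obs[t]]
        }
        sum(alpha[, Tlen])                             # P(O | lambda)
      }

      backward_prob <- function(A, B, init, obs) {
        N <- nrow(A); Tlen <- length(obs)
        beta <- matrix(0, N, Tlen)
        beta[, Tlen] <- 1                              # start from the last time step
        for (t in rev(seq_len(Tlen - 1))) {            # work backwards through the trellis
          beta[, t] <- A %*% (B[, obs[t + 1]] * beta[, t + 1])
        }
        sum(init * B[, obs[1]] * beta[, 1])            # same P(O | lambda)
      }

    For example, with the weather matrices defined earlier, forward_prob(A, B, init, obs = c(1, 2, 3)) and backward_prob(A, B, init, obs = c(1, 2, 3)) return the same likelihood, computed in \(O(N^{2}T)\) steps.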

  • What's the algorithm for solving HMM's decoding problem?
    A simple explanation of Viterbi algorithm. Source: Chugg 2017.

    The Viterbi algorithm solves HMM's decoding problem. It's similar to the forward algorithm except that instead of summing the probabilities of all paths leading to a state, we retain only the single path of maximum probability. Thus, at every time step, given that we have N states, we retain only N paths, the most likely path ending in each state. For the next iteration, we extend the most likely paths of the current iteration and repeat the process.

    When we reach the end of the sequence, we'll have N most likely paths, each ending in a unique state. We then select the most likely end state. Once this selection is made, we backtrack to read the state sequence, that is, how we got to the end state. This state sequence is now the most likely sequence given our sequence of observations.
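
    A minimal base-R sketch of the Viterbi algorithm, under the same assumed conventions as the forward algorithm sketch above (A, B, init, and obs as observation indices). It keeps one best path per state and backtracks at the end:

      viterbi_path <- function(A, B, init, obs) {
        N <- nrow(A); Tlen <- length(obs)
        delta <- matrix(0, N, Tlen)                 # best path probability ending in each state
        psi   <- matrix(0, N, Tlen)                 # back-pointers
        delta[, 1] <- init * B[, obs[1]]
        for (t in seq_len(Tlen - 1) + 1) {
          for (j in 1:N) {
            probs <- delta[, t - 1] * A[, j]        # all ways of reaching state j
            psi[j, t]   <- which.max(probs)         # remember the best predecessor
            delta[j, t] <- max(probs) * B[j, obs[t]]
          }
        }
        path <- integer(Tlen)
        path[Tlen] <- which.max(delta[, Tlen])      # most likely end state
        for (t in rev(seq_len(Tlen - 1)))           # backtrack to read the state sequence
          path[t] <- psi[path[t + 1], t + 1]
        path                                        # indices of the most likely states
      }

    In practice, implementations usually work with log probabilities to avoid numerical underflow on long observation sequences.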

  • How can we solve the learning problem of HMM?

    In HMM's learning problem, we are required to learn the transition (A) and observation (B) probabilities when given a sequence of observations and the vocabulary of hidden states. The forward-backward algorithm solves this problem. It's an iterative algorithm. It starts with an initial estimate of the probabilities and improves these estimates with each iteration.

    The algorithm consists of two steps:

    • Expectation or E-step: We compute the expected state occupancy count and the expected state transition count based on current probabilities A and B.
    • Maximization or M-step: We use the expected counts from the E-step to recompute A and B.

    While this algorithm is unsupervised, in practice initial conditions matter a great deal. For this reason, extra information is often given to the algorithm. For example, in speech recognition, the HMM structure is set manually and training is used to estimate the probabilities.
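
    The sketch below uses the baumWelch() function of the R HMM package, starting from a rough, assumed initial guess of the model. The guess deliberately breaks the symmetry between the two states, since a perfectly uniform guess gives the re-estimation no reason to distinguish them:

      library(HMM)
      # Rough initial guess of the model (assumed values; rows sum to 1)
      guess <- initHMM(States = c("Rainy", "Sunny"),
                       Symbols = c("Walk", "Shop", "Clean"),
                       startProbs = c(0.5, 0.5),
                       transProbs = matrix(c(0.6, 0.4, 0.3, 0.7), nrow = 2, byrow = TRUE),
                       emissionProbs = matrix(c(0.2, 0.3, 0.5, 0.4, 0.4, 0.2), nrow = 2, byrow = TRUE))
      # Training data: an observation sequence only, no state labels
      obs <- c("Walk", "Shop", "Clean", "Clean", "Walk", "Shop", "Walk", "Clean")
      trained <- baumWelch(guess, obs, maxIterations = 100)
      trained$hmm$transProbs       # re-estimated A
      trained$hmm$emissionProbs    # re-estimated B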

  • Could you describe some tools for doing HMM?

    In Python, the hmmlearn package implements HMM. Three models are available: hmm.GaussianHMM, hmm.GMMHMM and hmm.MultinomialHMM. HMM support was earlier part of Scikit-learn but has been removed since v0.17. Stephen Marsland has shared Python code in NumPy and Pandas that implements many essential HMM algorithms.

    In R, the HMM package implements HMM, with functions for the forward, backward, Viterbi and Baum-Welch algorithms. Another package, depmixS4, implements dependent mixture models that can be used to fit HMMs to observed data. R-bloggers has an example use of depmixS4.

Milestones

1913

Russian mathematician A. A. Markov recognizes that in a sequence of random variables, one variable may not be independent of the previous one. For example, two successive coin tosses are independent, but today's weather might depend on yesterday's weather. He models this as a chain of linked events with a probability assigned to each link. The technique is later named the Markov chain.

1966

Baum and Petrie at the Institute for Defense Analyses, Princeton, introduce the Hidden Markov Model (HMM), though this name is not yet used. They state the problem of estimating transition and emission probabilities from observations, and use maximum likelihood estimation.

1967

Andrew Viterbi publishes an algorithm to decode information at the receiver in a communication system. Later named Viterbi algorithm, it's directly applicable to the decoding problem in HMM. Vintsyuk first applies this algorithm to speech and language processing in 1968.

1970

The Baum-Welch algorithm is proposed to solve the learning problem in HMM. This algorithm is a special case of the Expectation-Maximization (EM) algorithm. However, the name HMM is not used in the paper and mathematicians refer to HMM as "probabilistic functions of Markov chains".

1975

James Baker at CMU applies HMM to speech recognition in the DRAGON speech understanding system. This is one of the earliest engineering applications of HMM. HMM is further applied to speech recognition through the 1970s and 1980s by Jelinek, Bahl and Mercer at IBM.

1989

Lawrence Rabiner publishes a tutorial on HMM covering theory, practice and applications. He notes that HMM originated in mathematics and was not widely read by engineers. Even when it was applied to speech processing in the 1970s, there were no tutorials to help translate theory into practice.

2003

HMM is typically used when the number of states is small but one research team applies it to large scale web traffic analysis. This involves hundreds of states and tens of millions of observations.

Sample Code

  • # Source: https://stats.stackexchange.com/questions/31746/what-is-the-difference-between-the-forward-backward-and-viterbi-algorithms
    # Accessed: 2019-09-05
    
    library(HMM)
    # Weather example:
    # hidden states: Rainy and Sunny
    # observations: Walk, Shop, Clean
    
    # state transition probabilities (rows sum to 1)
    P <- as.matrix(rbind(c(0.7, 0.3),
                         c(0.4, 0.6)))
    
    # emission probabilities (rows sum to 1)
    R <- as.matrix(rbind(c(0.1, 0.4, 0.5),
                         c(0.6, 0.3, 0.1)))
    
    hmm <- initHMM(States=c("Rainy","Sunny"),
                   Symbols=c("Walk","Shop","Clean"),
                   startProbs=c(0.6,0.4),
                   transProbs=P,
                   emissionProbs=R)
    hmm
    
    obs <- c("Walk","Shop","Walk","Clean")
    print(posterior(hmm, obs))  # state probabilities for each observation
    print(viterbi(hmm, obs))    # most likely sequence of states
    

References

  1. Baker, J. 1975. "The DRAGON system--An overview." IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 23, no. 1, pp. 24-29, February. Accessed 2019-09-04.
  2. Baum, Leonard E. and Ted Petrie. 1966. "Statistical Inference for Probabilistic Functions of Finite State Markov Chains." Ann. Math. Statist., vol. 37, no. 6, pp. 1554-1563. Accessed 2019-09-04.
  3. Baum, Leonard E., Ted Petrie, George Soules and Norman Weiss. 1970. "A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains." The Annals of Mathematical Statistics, vol. 41, no. 1, pp. 164-171, February. Accessed 2019-09-04.
  4. Bioinformatics Algorithms. 2015. "The Viterbi Algorithm." Bioinformatics Algorithms: An Active Learning Approach, via YouTube, July 26. Accessed 2019-09-06.
  5. Chugg, Keith. 2017. "Viterbi Algorithm." University of Southern California, via YouTube, March. Accessed 2019-09-06.
  6. Felzenszwalb, Pedro F., Daniel P. Huttenlocher, and Jon M. Kleinberg. 2003. "Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis." Advances in Neural Information Processing Systems (NIPS) 16. Accessed 2019-09-04.
  7. Figueroa-Angulo, Jose Israel, Jesus Savage, Ernesto Bribiesca, Boris Escalante, and Luis Enrique Sucar. 2015. "Compound Hidden Markov Model for Activity Labelling." Int. J. of Intelligence Science, Scientific Research, vol. 5, no. 5, pp. 177-195, October. Accessed 2019-09-04.
  8. Gales, Mark and Steve Young. 2007. "The Application of Hidden Markov Models in Speech Recognition." Foundations and Trends in Signal Processing, Now, vol. 1, no. 3, pp. 195–304. Accessed 2019-09-04.
  9. Ghahramani, Z. 2001. "An Introduction to Hidden Markov Models and Bayesian Networks." Int. J. of Pattern Recognition and Artificial Intelligence, vol. 15, no. 1, pp. 9-42. Accessed 2019-09-04.
  10. Hayes, Brian. 2017. "First Links in the Markov Chain." American Scientist, February 06. Updated 2018-09-06. Accessed 2019-09-04.
  11. Himmelmann, Lin. 2015. "HMM - Hidden Markov Models." CRAN, February 19. Accessed 2019-09-04.
  12. Jana, Abhisek. 2019a. "Introduction to Hidden Markov Model." A Developer Diary, February 13. Accessed 2019-09-04.
  13. Jana, Abhisek. 2019b. "Forward and Backward Algorithm in Hidden Markov Model." A Developer Diary, February 17. Accessed 2019-09-04.
  14. Jurafsky, Daniel and James H. Martin. 2018. "Speech and Language Processing." Third Edition draft, September 23. Accessed 2019-09-04.
  15. Kang, Eugine. 2017. "Hidden Markov Model." Medium, September 01. Accessed 2019-09-04.
  16. Katahira, Kentaro, Kenta Suzuki, Kazuo Okanoya, and Masato Okada. 2011. "Complex Sequencing Rules of Birdsong Can be Explained by Simple Hidden Markov Processes." PLoS ONE, vol. 6, no. 9, e24516, September 07. Accessed 2019-09-04.
  17. Khreich, Wael, Eric Granger, Ali Miri, and Robert Sabourin. 2012. "A survey of techniques for incremental learning of HMM parameters." Information Sciences, Elsevier, vol. 197, pp. 105–130. Accessed 2019-09-04.
  18. Kouemou, Guy Leonard. 2011. "History and Theoretical Basics of Hidden Markov Models." IntechOpen. Accessed 2019-09-04.
  19. Rabiner, L.R. 1989. "A tutorial on hidden Markov models and selected applications in speech recognition." Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, February. Accessed 2019-09-04.
  20. Scikit-learn. 2015. "Hidden Markov Models." v0.16.1, April 14. Accessed 2019-09-04.
  21. Smith, Adam. 2017. "The Viterbi Algorithm at 50." Viterbi School of Engineering, USC, March 16. Accessed 2019-09-04.
  22. Stamp, Mark. 2018. "A Revealing Introduction to Hidden Markov Models." San Jose State University, October 17. Accessed 2019-09-04.
  23. Visser, Ingmar. 2019. "depmixS4: Dependent Mixture Models - Hidden Markov Models of GLMs and Other Distributions in S4." Package depmixS4, v1.4-0, CRAN, July 10. Accessed 2019-09-04.
  24. Wikipedia. 2019. "Hidden Markov model." Wikipedia, August 27. Accessed 2019-09-04.
  25. Yoon, Byung-Jun. 2009. "Hidden Markov Models and their Applications in Biological Sequence Analysis." Current Genomics, vol. 10, no. 6, pp. 402-15, September. Accessed 2019-09-05.
  26. hmmlearn. 2019. "Tutorial." hmmlearn, June 24. Accessed 2019-09-04.

Further Reading

  1. Batista, David S. 2017. "Hidden Markov Model and Naive Bayes relationship." Blog, November 11. Accessed 2019-09-04.
  2. Kang, Eugine. 2017. "Hidden Markov Model." Medium, September 01. Accessed 2019-09-04.
  3. Rabiner, L.R. 1989. "A tutorial on hidden Markov models and selected applications in speech recognition." Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, February. Accessed 2019-09-04.
  4. Bilmes, Jeff A. 1998. "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models." International Computer Science Institute, TR-97-021, April. Accessed 2019-09-04.

Cite As

Devopedia. 2021. "Hidden Markov Model." Version 5, June 28. Accessed 2023-11-12. https://devopedia.org/hidden-markov-model