The Viterbi Algorithm at 50 (2017) (usc.edu)
112 points by akhayam on May 14, 2023 | 25 comments



After graduating with my BSEE and getting my first professional job, one of my very first tasks was to implement this algorithm on a PDP-11, in assembly. I first implemented it in Fortran (C had just come out and I didn't know it yet) to run on a VAX-11/780. Once it was working, it was amazing to see how powerful it was at correcting noisy data, even with rate 1/2 coding and short constraint lengths. Consider that the Voyager spacecraft, now nearly 24 billion kilometers from Earth, uses it, and it is still possible to decode its data. In my mind, Viterbi's algorithm is as important to communications as the FFT.


Voyager also uses Golay codes, which are pretty neat too. The original paper is just one page.

https://www.lama.univ-savoie.fr/pagesmembres/hyvernat/Enseig...


Fun fact: the Golay codes are the only nontrivial perfect codes besides the Hamming codes. Perfect codes are interesting in that the Hamming bound becomes an equality, effectively saying that the Hamming balls of radius t = (d - 1)/2 around the codewords tile the whole space exactly, with no room left over.
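
You can check the equality numerically for the (23, 12, 7) binary Golay code (a quick sketch in Python; t = 3 here):

    # Hamming bound with equality for the (23, 12, 7) binary Golay code:
    # sum_{i=0}^{t} C(n, i) * 2^k == 2^n, with t = (d - 1) // 2 = 3.
    from math import comb

    n, k, d = 23, 12, 7
    t = (d - 1) // 2
    ball = sum(comb(n, i) for i in range(t + 1))  # points in a radius-t Hamming ball
    assert ball * 2**k == 2**n  # 2048 * 4096 == 2**23: the balls tile the space exactly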


Indeed, Hamming and Golay codes are absolute things of beauty. For folks who wanna read more about the Hamming bound, Kevin Boone does a good job of starting from the basics and explaining the intuition behind the proof:

https://kevinboone.me/hamming_bound.html?i=1


Another fun fact: you could use one of the Roman dodecahedrons mentioned in another of today's articles to encode or decode Golay codes by hand...


I would argue the FFT is much more important in communication systems than the Viterbi algorithm. There are plenty of powerful forward error correction schemes that do not use convolutional codes (and thus cannot use the Viterbi algorithm). However, every single system with a reasonably long channel equalizer makes use of a real-time FFT.


What is often overlooked is the philanthropic and entrepreneurial impact that Andrew Viterbi had in this world.

In 2004, he donated $52 million to the University of Southern California (USC) School of Engineering:

https://en.wikipedia.org/wiki/USC_Viterbi_School_of_Engineer...

He was also a co-founder of Qualcomm, which has a market cap of ~$115B today:

https://www.google.com/finance/quote/QCOM:NASDAQ

https://en.wikipedia.org/wiki/Qualcomm


Plus, he's a really nice, down-to-earth guy. I met him once, a few years ago, at a soccer game our grandchildren were playing in, and spent 20 minutes just chatting with him.


The Viterbi Algorithm can be used to split multi-word #hashtags or any wordssmushedtogetherlikethis with surprising efficiency and accuracy.

https://stackoverflow.com/a/481773

https://github.com/jchook/wordseg
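
For the curious, here's a minimal sketch of the idea, with a toy unigram model and made-up frequencies (the implementations linked above use proper corpus counts):

    # Viterbi-style word segmentation: best[i] = highest log-probability of any
    # segmentation of text[:i]. Toy unigram frequencies, purely for illustration.
    from math import log

    FREQ = {"words": 30, "smushed": 2, "together": 25, "like": 50, "this": 60}
    TOTAL = sum(FREQ.values())

    def word_logp(w):
        # Unseen words get a penalty that grows with their length.
        return log(FREQ.get(w, 0.01 / 10 ** len(w)) / TOTAL)

    def segment(text):
        best = [0.0] + [float("-inf")] * len(text)
        back = [0] * (len(text) + 1)  # back[i]: where the last word of text[:i] starts
        for i in range(1, len(text) + 1):
            for j in range(max(0, i - 12), i):  # cap candidate word length at 12
                score = best[j] + word_logp(text[j:i])
                if score > best[i]:
                    best[i], back[i] = score, j
        words, i = [], len(text)
        while i > 0:  # follow the backpointers
            words.append(text[back[i]:i])
            i = back[i]
        return words[::-1]

    print(segment("wordssmushedtogetherlikethis"))
    # -> ['words', 'smushed', 'together', 'like', 'this']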



That's one of those Wikipedia pages that I find very difficult to get anything out of. The math has little explanation, the pseudo-code lacks definitions (where did s_z_j come from?), and while actual code helps with implementation (except the Talk page mentions it's not correct), it can be quite difficult to grasp the essence of the algorithm.

I found this video [1] to give me a much better overview of the algorithm, and this follow-up video [2] looks promising in terms of practical application. edit: [2] is actually the prequel; he made [1] to go into the details.

[1]: https://www.youtube.com/watch?v=xxpBHCkypS4 Viterbi Algorithm Explained with an Example

[2]: https://www.youtube.com/watch?v=IJE94FhyygM Decoding Convolutional Codes: The Viterbi Algorithm Explained


Essentially, the Viterbi algorithm computes the shortest path through a DAG. If you explicitly model each transition/emission event as a single node in a graph, and you then convert all probabilities p to -log p, then any shortest path algorithm will find the same result as Viterbi, which is the path of maximum likelihood.

This is easy to prove: for probabilities p_1, p_2, p_3 ..., the path which maximizes p_1 * p_2 * p_3 ... also maximizes log(p_1 * p_2 * p_3 ...) = log(p_1) + log(p_2) + log(p_3) + ... and thus minimizes -(log(p_1) + log(p_2) + log(p_3) + ...) = -log(p_1) - log(p_2) - log(p_3) .... Because the probs are all 0 <= p <= 1, their log is <= 0, and choosing -log(p) as an edge weight gives non-negative weights. So you can use any standard shortest path algorithm to minimize -log(p_1) - log(p_2) - log(p_3) - ... and thus maximize p_1 * p_2 * p_3 * ...

The Viterbi algorithm uses the same "trick" as the standard algorithm for finding the shortest path through a DAG with better complexity than Dijkstra's: you can just process the nodes in topological order (which can be found in linear time), instead of processing them based on their position in a priority queue (as in Dijkstra's algorithm). Even better, in an HMM, the topological ordering of the graph is already part of the input: it's just the sequence of observations.
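
Here is a minimal sketch of that view in Python, on a toy two-state HMM (all the probabilities are made up for illustration): each observation is one trellis layer, and the layers are relaxed in topological order.

    # Viterbi as a shortest path over the trellis, with arc weights -log(p).
    # Toy 2-state HMM; all numbers are made up for illustration.
    from math import log

    states = ("Rainy", "Sunny")
    start = {"Rainy": 0.6, "Sunny": 0.4}
    trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
             "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
    emit  = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
             "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

    def viterbi(obs):
        # dist[s] = shortest -log-probability of any path ending in state s
        dist = {s: -log(start[s]) - log(emit[s][obs[0]]) for s in states}
        back = []
        for o in obs[1:]:  # each observation is one trellis layer, in topological order
            prev, new = {}, {}
            for s in states:
                w, r = min((dist[r] - log(trans[r][s]), r) for r in states)
                new[s], prev[s] = w - log(emit[s][o]), r
            back.append(prev)
            dist = new
        best = min(states, key=dist.get)  # shortest path = most likely state sequence
        path = [best]
        for prev in reversed(back):
            path.append(prev[path[-1]])
        return path[::-1]

    print(viterbi(["walk", "shop", "clean"]))  # ['Sunny', 'Rainy', 'Rainy']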


There was a pretty funny, to me and my buddy, moment in our Digital Communications course when we were learning this. We were both dual EE/CS students while the rest of the class was straight EE. The professor is drawing out the trellis diagrams and going through this elaborate description of things and most people (including me and my buddy) were completely lost. At some point, though, he and I both had the same lightbulb moment. We turned to each other and simultaneously just whispered “it’s a dynamic programming problem”. And then it was all easy from there.


Doesn't the algorithm work on directed graphs, rather than only DAGs, i.e., also on cyclic ones? That's a difference from Dijkstra, where cycles are excluded from the solution. Plus, it takes the graph and a sequence of transitions as input.


> Doesn't the algorithm work on directed graphs, rather than only DAGs, i.e., also on cyclic ones? That's a difference from Dijkstra, where cycles are excluded from the solution

With the Viterbi algorithm, you're decoding a sequence of observations, indexed by observation time. At each timestep there's an array of probabilities indexed by the hidden states. State s at time t has some probability of transitioning to state s' at time t+1, defining an arc from the node (s, t) to the node (s', t+1). That defines the "trellis" structure. Since time doesn't loop back on itself, an arc can never point back to an earlier time, so there aren't any cycles in the trellis.

As lqet explains, it's essentially computing a shortest path through the DAG.

The Viterbi algorithm could be reimplemented as Dijkstra's algorithm. But that wouldn't gain anything, and it'd make it harder to compute efficiently -- with Dijkstra's algorithm there's a sequential dependency: you pop a minimal-length path off the priority queue and expand it before you can pop and start processing the next path (as the path you're expanding might push the next minimal-length path onto the queue). The bottom-up Viterbi computation is closer to a BFS, where you could process the whole layer of arcs corresponding to timestep t in parallel -- apart from the argmin over input arcs at each node (s, t+1) in the next layer.

Another small difference is that Viterbi's algorithm deals with node-weights as well as arc-weights, corresponding to the probability of emitting the observation y_t at timestep t, supposing the system had been in the hidden state s_i at time t. I think that can be dealt with by absorbing the node weights into the arc weights -- each weight is a log probability, and log probabilities can be added.
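
Concretely, the absorption just adds the emission term of the head node onto every arc entering it. A tiny sketch with illustrative numbers:

    # Folding the node weight (emission log-prob) into each incoming arc weight,
    # so a generic shortest-path routine sees only arc weights. Numbers are made up.
    from math import log

    def arc_weight(trans_p, emit_p):
        # weight of the arc (s, t) -> (s', t+1); the emission happens at the head node
        return -log(trans_p) - log(emit_p)

    # e.g. an arc into ("Sunny", t+1) when "walk" is observed there:
    print(round(arc_weight(trans_p=0.3, emit_p=0.6), 3))  # 1.715 == -log(0.18)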


This is a great explanation. Thank you for posting it on HN!


I took a crack at explaining the Viterbi algorithm after learning about HMMs from a bioinformatics book I was reading. You may find it helpful, but that's assuming my interpretation of it is correct.

https://offbynull.com/data/learn/Bioinformatics/output/outpu...



see also: https://courses.grainger.illinois.edu/ece448/sp2021/slides/l...

There's some good baseline material on HMMs in Russell & Norvig [1] (the chapter on "Probabilistic Reasoning over Time"), as well as Rabiner's HMM tutorial [2].

[1] https://aima.cs.berkeley.edu/

[2] https://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf


So, today I learned this algorithm with the help of some youtube videos.

It seems to me the primary advantage is avoiding the combinatorial explosion of having to compute the Hamming distance for every possible original message, while still getting the correct answer.
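
Right -- for a binary convolutional code with constraint length K, the trellis has 2^(K-1) states, so the work is linear in the message length instead of exponential. A rough count, with illustrative numbers:

    # For a binary convolutional code with constraint length K, the trellis has
    # S = 2**(K - 1) states and each state has 2 incoming branches per step.
    K, T = 7, 100           # illustrative: K = 7 as in the Voyager code, T message bits
    S = 2**(K - 1)
    print(2**T)             # brute force: compare against all 2**T candidate messages
    print(T * S * 2)        # Viterbi: 12800 branch-metric evaluations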

If you combined this technique with Feynman diagrams, which always have three lines meeting at a vertex, maybe there are ways to solve quantum problems without the combinatorial explosion?!


The short answer is very likely no, at least not without other major breakthroughs. The reason Viterbi is fast is that the underlying probabilistic model is rather simple, and that does not hold for quantum systems in general.


This came up recently for a hobby project of mine where I had to tokenize Japanese sentences for processing. Ultimately, I implemented a workaround, because I don't have a signal processing background and implementing the algorithm by understanding the theory was just too complicated.


Check out the film Mr. Nobody (2009) for a movie (unintentionally?) structured like the Viterbi algorithm.


Seems like a lot of folks have tried starting from the theory and getting stuck in their tracks.

Would anyone be interested in building a low-code implementation of the Viterbi algorithm? I'm happy to collaborate on it as long as we commit to releasing it as open source.


From the horse's mouth -- Viterbi describing the Viterbi algorithm, and related developments:

http://www.youtube.com/watch?v=UgW4p7-jpUs



