Whether or not you canonicalize your prefix code is orthogonal to whether you think of it as a tree. (And any prefix code can be viewed as a tree.) In fact the very article you linked says:
> The advantage of a canonical Huffman tree is that it can be encoded in fewer bits than an arbitrary tree.
To take the example from the Wikipedia article you linked, canonicalizing
B = 0
A = 11
C = 101
D = 100
into
B = 0
A = 10
C = 110
D = 111
is conceptually the same as canonicalizing the tree (treating "0" as "left" and "1" as "right"):
┌------┐
┌-----┤(root)├-----┐
│ └------┘ │
┌-┴-┐ ┌-┴-┐
│ B │ ┌-┤ ├-┐
└---┘ │ └---┘ │
┌-┴-┐ ┌-┴-┐
┌-┤ ├-┐ │ A │
│ └---┘ │ └---┘
┌-┴-┐ ┌-┴-┐
│ D │ │ C │
└---┘ └---┘
into
┌------┐
┌-----┤(root)├-----┐
│ └------┘ │
┌-┴-┐ ┌-┴-┐
│ B │ ┌-┤ ├-┐
└---┘ │ └---┘ │
┌-┴-┐ ┌-┴-┐
│ A │ ┌-┤ ├-┐
└---┘ │ └---┘ │
┌-┴-┐ ┌-┴-┐
│ C │ │ D │
└---┘ └---┘
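To make the canonicalization concrete, here is a sketch of the standard construction: sort symbols by (code length, symbol) and assign consecutive codes, shifting left whenever the length grows. The function name `canonical_codes` is my own; only the code lengths are needed as input, which is exactly why the canonical form can be transmitted in fewer bits.

```python
def canonical_codes(lengths):
    """Assign canonical codes given only the code lengths.

    `lengths` maps symbol -> code length in bits. Symbols are
    processed in order of (length, symbol); since a decoder can
    redo this sort itself, only the lengths need to be stored.
    """
    codes = {}
    code = 0
    prev_len = 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len   # grow the code when the length increases
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes

# The example above: only the lengths {B:1, A:2, C:3, D:3} are needed.
print(canonical_codes({"B": 1, "A": 2, "C": 3, "D": 3}))
# → {'B': '0', 'A': '10', 'C': '110', 'D': '111'}
```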
So the trees aren't merely a "distraction" IMO: apart from being useful conceptually (e.g. in proving the optimality of Huffman coding—this is how Knuth does it in TAOCP Vol 1, 2.3.4.5), in certain applications of Huffman's algorithm other than compression the tree structure also arises naturally (Knuth gives the example of choosing the optimal order for pairwise merging of N sorted lists of different lengths).
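Knuth's merging example can be sketched in a few lines: repeatedly merge the two shortest lists, since merging two lists costs (roughly) the sum of their lengths. This is exactly Huffman's greedy step, and the sequence of merges forms the Huffman tree. The function name `optimal_merge_cost` is mine, for illustration.

```python
import heapq

def optimal_merge_cost(list_lengths):
    """Total cost of pairwise-merging sorted lists of the given lengths,
    using Huffman's greedy rule: always merge the two shortest first."""
    heap = list(list_lengths)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b            # merging two lists touches every element once
        heapq.heappush(heap, a + b)
    return total

print(optimal_merge_cost([20, 30, 10, 5]))  # → 115
```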
Sure, after using trees for understanding, you don't need to actually represent a tree in your data structures / source code / encoding, but that's another matter.
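As a small illustration of that last point, decoding doesn't require materializing any tree: the prefix property alone lets you match bits against a plain code table. (A toy sketch, not how production decoders do it—real ones use length-indexed tables built from the canonical lengths.)

```python
def decode(bits, codebook):
    """Decode a bit string using only a symbol -> code map; no tree
    structure is ever built."""
    inverse = {code: sym for sym, code in codebook.items()}
    out, prefix = [], ""
    for b in bits:
        prefix += b
        if prefix in inverse:      # prefix-freeness guarantees a unique match
            out.append(inverse[prefix])
            prefix = ""
    return "".join(out)

codes = {"B": "0", "A": "10", "C": "110", "D": "111"}
print(decode("010110111", codes))  # → "BACD"
```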
Additionally, after posting the comment to HN from desktop, I happened to notice within the 2-hour edit window that it looked awful on mobile (as does the comment I copied from: probably a bug where the default font stack used by Chrome on Android doesn't use monospace versions of the box-drawing characters), so I edited to replace "─" with an ASCII "-". It looks less polished on desktop but aligns better on mobile, so it's a reasonable compromise.