>In practice there are many such inputs that will overflow huffman_tables
This looks like the generalized version of the problem...
In other words, you have Software A, which generates a lookup table B that is then used to process an input stream of data.
Now the responsibility shifts to you as a software developer (if you care even a little bit about security/correctness of your code) -- to assert either that:
A) The software is written in such a way that there are NO possible cases of input data misusing/failing the lookup table, or
B) The software will only be used in a controlled environment (e.g., point-to-point communication where both parties are trusted) such that the stream is guaranteed never to contain data that misuses/fails/causes anomalies with the lookup table.
Since B is all-but-impossible for anything other than a small group or office, that is, truly impossible on the Internet scale, that leaves only A).
Thus, the generalized "best practice" for present or future Software Engineering can be summarized as follows:
If a lookup table is used in someone's software for whatever reason -- then the responsibility goes to the software developer(s) to assert that the lookup table functions correctly for all types of data, OR that the software detects and appropriately handles erroneous data BEFORE it gets to the lookup table...
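To make the "detect erroneous data BEFORE it gets to the lookup table" half concrete, here is a minimal sketch in Python. It is not the fix from any real decoder; `validate_code_lengths`, `build_table`, `InvalidCodeLengths` and the `MAX_CODE_LENGTH` limit are all hypothetical names I made up for illustration. The idea is to check that the untrusted Huffman code lengths form a complete prefix code (Kraft sum exactly 1) before sizing or writing the table, so the table size is implied by input that has already been validated.

```python
from fractions import Fraction

MAX_CODE_LENGTH = 15  # assumed limit for this sketch, not taken from any real format


class InvalidCodeLengths(ValueError):
    """Untrusted code lengths that cannot form a complete prefix code."""


def validate_code_lengths(lengths, max_len=MAX_CODE_LENGTH):
    """Reject bad input BEFORE any table is sized or written."""
    used = [n for n in lengths if n != 0]  # 0 means "symbol unused"
    if not used:
        raise InvalidCodeLengths("no symbol has a code")
    if any(n < 1 or n > max_len for n in used):
        raise InvalidCodeLengths("code length out of range")
    # Kraft sum: exactly 1 for a complete prefix code, >1 if oversubscribed,
    # <1 if incomplete. (Real formats often special-case a single symbol.)
    kraft = sum(Fraction(1, 2 ** n) for n in used)
    if kraft != 1:
        raise InvalidCodeLengths(f"Kraft sum is {kraft}, expected 1")


def build_table(lengths, max_len=MAX_CODE_LENGTH):
    """Build a flat table indexed by the next max(lengths) bits, MSB first."""
    validate_code_lengths(lengths, max_len)
    root_bits = max(lengths)
    table = [None] * (1 << root_bits)
    code = 0
    prev_len = 0
    # Canonical assignment: shorter codes first, ties broken by symbol index.
    for length, symbol in sorted((n, s) for s, n in enumerate(lengths) if n):
        code <<= length - prev_len
        start = code << (root_bits - length)
        span = 1 << (root_bits - length)
        for i in range(start, start + span):
            table[i] = (symbol, length)
        code += 1
        prev_len = length
    return table
```

With this, `build_table([1, 2, 3, 3])` fills all 8 slots, while something like `build_table([1, 1, 1])` (Kraft sum 3/2) is rejected before a single table entry is written. In a memory-unsafe language the same check would sit in front of the fixed-size buffer.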
In fact, if I were a serious security researcher and had the time -- I'd collect a list of ALL reported security vulnerabilities in the past that had to do, one way or another, with lookup tables...
Then I'd read through them, one by one, and compare them for generalities.
I'm guessing (but don't know) -- that there is a pattern there...
Then I'd go through all software that used lookup tables on streams of data in one way or another -- and audit ALL of them for security vulnerabilities.
Now, clearly this is not a task for one man in one lifetime...
This is a "team sport"...
But if I were a serious security vulnerability researcher -- that is the generalized path that I would take...