The point of VLIW is to eliminate the hazard-detection and scheduling logic that a conventional superscalar needs, but SMT would require adding much of that logic back, so VLIW can't really use SMT. VLIW could use FGMT (e.g. a barrel processor) or SoEMT, but neither of those can fill the empty issue slots caused by a bad instruction mix.
The studies I've seen show that the FU mix is heavily skewed and changes from one basic block to the next, particularly once you offload the DLP to a more efficient vector/SIMD unit.
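To put a number on how much a skewed mix costs (made-up figures, just to illustrate the bound): on a 4-wide machine with 2 ALU, 1 MEM, and 1 FPU slots, a stream that is 90% ALU and 10% MEM can never average more than min(2/0.9, 1/0.1) ≈ 2.2 instructions per cycle, i.e. about 56% slot utilization, no matter how clever the compiler is. In general the ceiling is min over classes of (slots of that class) / (fraction of the stream in that class).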
I'm not sure I see how MT would solve the mix problem: if each thread gets its own issue cycle, and each thread itself has a bad mix, the slots stay empty either way. A quick simulation makes the point (see below).
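Here's a back-of-the-envelope toy model I wrote, not from any real design or from the studies above: a 4-slot VLIW under barrel-style FGMT, where each cycle exactly one thread greedily packs a bundle from its own in-order stream. The slot layout, the 90/10 mix, and the greedy packing rule are all assumptions for illustration.

```python
import random

SLOTS = {"ALU": 2, "MEM": 1, "FPU": 1}          # assumed issue slots per class
WIDTH = sum(SLOTS.values())

def make_stream(mix, n=50_000, seed=0):
    """Random instruction stream with the given class probabilities."""
    rng = random.Random(seed)
    classes, weights = zip(*mix.items())
    return rng.choices(classes, weights=weights, k=n)

def simulate(threads, cycles=10_000):
    """Barrel FGMT: each cycle, one thread packs one VLIW bundle."""
    pos = [0] * len(threads)                     # per-thread stream cursor
    issued = 0
    for c in range(cycles):
        t = c % len(threads)                     # round-robin thread rotation
        stream, i = threads[t], pos[t]
        free = dict(SLOTS)
        # Pack in program order; stop at the first instruction whose
        # slot class is exhausted (in-order issue can't skip ahead).
        while i < len(stream) and free[stream[i]] > 0:
            free[stream[i]] -= 1
            issued += 1
            i += 1
        pos[t] = i
    return issued / (cycles * WIDTH)             # slot utilization

# Every thread has the same skewed, ALU-heavy mix (made-up numbers).
skewed = {"ALU": 0.9, "MEM": 0.1, "FPU": 0.0}
for n in (1, 2, 4, 8):
    threads = [make_stream(skewed, seed=s) for s in range(n)]
    print(f"{n} thread(s): slot utilization = {simulate(threads):.0%}")
```

Utilization comes out flat (~55-60%) regardless of thread count: each cycle still belongs to exactly one thread, so each bundle can only be as full as that thread's own mix allows. More threads hide latency, but they don't fill the mismatched slots; co-issuing from multiple threads in the same cycle could, but that's exactly the cross-thread arbitration logic VLIW was trying to avoid.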
Even GPUs, which arguably have far more predictable instruction mixes, struggled to get useful utilization out of VLIW architectures – AMD tried two different ones (VLIW5, then VLIW4) from 2006 to 2011 before finally giving up on the concept and moving to the kind of RISC architecture Nvidia had been using all along.
Has anyone ever done a study or experiment on a VLIW with multiple hardware threads, and how that would affect the need for an even instruction mix?