They (reasonably) claim that the joint model doesn't have to be as big. Still, it would be interesting to see, for example, an ensemble of two separately trained models: a wide model the same size as the wide half of the joint model, and a hierarchical model the same size as the hierarchical half.
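
To make the suggested baseline concrete, here is a minimal sketch. It assumes both models output per-class probabilities and that the ensemble simply averages them. The function name `ensemble_predict` and the averaging rule are my own assumptions for illustration, not something taken from the paper.

```python
from typing import List, Sequence


def ensemble_predict(wide_probs: Sequence[float],
                     hier_probs: Sequence[float]) -> List[float]:
    """Average per-class probabilities from two independently trained models.

    wide_probs: output of a standalone wide model (hypothetical).
    hier_probs: output of a standalone hierarchical model (hypothetical).
    Unweighted averaging is an assumed combination rule.
    """
    if len(wide_probs) != len(hier_probs):
        raise ValueError("both models must score the same set of classes")
    return [(w + h) / 2.0 for w, h in zip(wide_probs, hier_probs)]
```

Comparing this against the joint model at a matched parameter count would show how much of the gain comes from joint training rather than from merely combining the two architectures.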