Not really. DALL-E 1 generated low-resolution, blurry images without classifier-free guidance (CFG), so to get something relevant you had to generate ~512 candidates and rerank them with CLIP. The first Midjourney model wasn't even conditioned on text; it was an unconditional diffusion model, probably guided by CLIP, so it wasn't particularly coherent. People don't seem to remember how bad text-to-image was only a few months ago.
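For anyone unfamiliar, the CLIP-reranking trick worked roughly like this: embed the prompt and every candidate image with CLIP, then keep the candidates whose image embeddings have the highest cosine similarity to the prompt embedding. A minimal sketch of the ranking step, with random dummy vectors standing in for real CLIP embeddings:

```python
import numpy as np

def rerank_by_clip_score(image_embs, text_emb, top_k=8):
    """Rank candidates by cosine similarity to the text embedding.

    image_embs: (n_candidates, dim) array of image embeddings.
    text_emb:   (dim,) prompt embedding.
    Returns the indices and scores of the top_k best matches.
    """
    # Normalize so the dot product is cosine similarity.
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb)
    scores = image_embs @ text_emb
    order = np.argsort(-scores)  # descending similarity
    return order[:top_k], scores[order[:top_k]]

# Dummy stand-ins: 512 candidate embeddings, one prompt embedding.
rng = np.random.default_rng(0)
candidates = rng.normal(size=(512, 768))
prompt_emb = rng.normal(size=768)
best_idx, best_scores = rerank_by_clip_score(candidates, prompt_emb)
```

With a real model you would get `candidates` and `prompt_emb` from CLIP's image and text encoders; the sort itself is all the "relevance" DALL-E 1 demos had.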