I think this is the most common response. AVIF, being based on a video codec's intra-frame coding, works best at very low bitrates. JPEG XL is considerably better at high bitrates.
I'm guessing the reason is that, when predicting video frames, hallucinating detail is undesirable, so you would rather remove detail than add detail that isn't there. AVIF also seems to have some kind of deblocking filter, which JXL lacks, to my surprise.
AVIF's deblocking filter works one axis at a time, whereas JPEG XL uses an axis-non-separable filter that makes the 2D selection at once. It is not clear that AVIF can be parameterised to do filtering similar to JPEG XL's; at least it hasn't been done yet.
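To make the "one axis at a time" versus "axis-non-separable" distinction concrete, here is a toy sketch in Python. It is not either codec's actual filter: the kernels, the test image, and the helper names (`filter2d`, `separable`, `is_separable`) are all made up for illustration. It only shows that two 1D passes are equivalent to a single rank-1 2D kernel, and that some 2D kernels (here a plus-shaped one) cannot be written that way.

```python
# Toy illustration only; these are NOT the real AVIF or JPEG XL filters.

def filter2d(img, k):
    """Apply a 2D kernel to img, replicating edge pixels at the borders."""
    h, w = len(img), len(img[0])
    kh, kw = len(k), len(k[0])
    oy, ox = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy = min(max(y + j - oy, 0), h - 1)
                    xx = min(max(x + i - ox, 0), w - 1)
                    acc += k[j][i] * img[yy][xx]
            out[y][x] = acc
    return out

def separable(img, kx, ky):
    """Filter one axis at a time: a horizontal pass, then a vertical pass."""
    horiz = filter2d(img, [kx])                 # 1 x n kernel
    return filter2d(horiz, [[v] for v in ky])   # n x 1 kernel

def is_separable(k, eps=1e-12):
    """A 2D kernel splits into two 1D passes iff every 2x2 minor is zero
    (i.e. the kernel matrix has rank at most 1)."""
    rows, cols = len(k), len(k[0])
    for r1 in range(rows):
        for r2 in range(r1 + 1, rows):
            for c1 in range(cols):
                for c2 in range(c1 + 1, cols):
                    minor = k[r1][c1] * k[r2][c2] - k[r1][c2] * k[r2][c1]
                    if abs(minor) > eps:
                        return False
    return True

# Hypothetical 1D smoothing taps and the 2D kernel they imply.
kx = ky = [0.25, 0.5, 0.25]
outer = [[a * b for b in kx] for a in ky]

# A plus-shaped kernel: a genuinely 2D choice of neighbours.
plus = [[0.0,   0.125, 0.0],
        [0.125, 0.5,   0.125],
        [0.0,   0.125, 0.0]]

# Small test image with a diagonal-ish edge.
img = [[0, 0, 8, 8],
       [0, 0, 8, 8],
       [0, 8, 8, 8],
       [8, 8, 8, 8]]

two_pass = separable(img, kx, ky)
one_pass = filter2d(img, outer)
```

Here `two_pass` and `one_pass` agree up to floating-point rounding, while `is_separable(plus)` is false, so no pair of per-axis passes reproduces the plus-shaped kernel. Whether a per-axis design could be tuned to approximate a non-separable one well enough is exactly the open question above.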