XGen-7B is probably the superior 7B model, it's trained on more tokens and a longer default sequence length (although both presumably can adopt SuperHOT (Position Interpolation) to extend context), but larger models still probably perform better on an absolute basis.
XGen-7B is probably the superior 7B model, it's trained on more tokens and a longer default sequence length (although both presumably can adopt SuperHOT (Position Interpolation) to extend context), but larger models still probably perform better on an absolute basis.