There's also OpenLLaMA, which has a 13B version as well and is a straight drop-in replacement (except for code generation, since its tokenizer merges consecutive spaces: https://github.com/openlm-research/open_llama#update-0615202...).
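
A quick way to see the tokenizer issue yourself (a minimal sketch; the Hugging Face checkpoint id openlm-research/open_llama_13b is assumed based on the linked repo, and isn't stated in the comment):

  from transformers import AutoTokenizer

  # Checkpoint id assumed from the openlm-research GitHub org.
  tok = AutoTokenizer.from_pretrained("openlm-research/open_llama_13b")

  code = "def f():\n    return 1"  # four-space indentation
  roundtrip = tok.decode(tok.encode(code), skip_special_tokens=True)

  # Consecutive spaces may be collapsed on round-trip, which mangles
  # indentation-sensitive code even though natural-language text is fine.
  print(repr(roundtrip))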

XGen-7B is probably the superior 7B model: it's trained on more tokens and with a longer default sequence length. Both can presumably adopt SuperHOT-style Position Interpolation to extend context, but larger models still probably perform better on an absolute basis.
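
For context, the core of Position Interpolation is just rescaling RoPE position indices so a longer window maps back into the range the model was trained on. A minimal sketch (the function name and the 2048-to-8192 extension factor are illustrative, not from either model's release):

  import torch

  def rope_frequencies(dim, max_pos, base=10000.0, scale=1.0):
      # Standard RoPE inverse frequencies for a head dimension `dim`.
      inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
      # Position Interpolation: divide positions by `scale` so that
      # `max_pos` positions are squeezed into the original trained window,
      # instead of extrapolating past it.
      positions = torch.arange(max_pos).float() / scale
      angles = torch.outer(positions, inv_freq)
      return torch.cos(angles), torch.sin(angles)

  # e.g. extend a model trained at 2048 tokens to 8192 via a 4x interpolation
  cos, sin = rope_frequencies(dim=128, max_pos=8192, scale=4.0)

In practice this is usually paired with a small amount of fine-tuning at the longer length, which is what the SuperHOT LoRAs do.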



