You might want to tune the sampler, for example by lowering the temperature. The 4-bit RTN (round-to-nearest) quantisation also seems to be degrading the model; GPTQ quantisation may work much better.
```sh
./main -m ./models/7B/ggml-model-q4_0.bin \
  --top_p 2 --top_k 40 \
  --repeat_penalty 1.176 \
  --temp 0.7 \
  -p 'To seduce a woman, you first have to'
```
Output:

```
import numpy as np
from scipy.linalg import norm, LinAlgError
np.random.seed(10)
x = -2*norm(LinAlgError())[0] # error message is too long for command line use
print x [end of text]
```
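To see what lowering `--temp` does, here is a minimal sketch of temperature scaling followed by top-k and top-p (nucleus) filtering. It is a generic illustration, not llama.cpp's actual sampler code; the function names are hypothetical and the repeat penalty is left out. Dividing the logits by a temperature below 1 sharpens the distribution toward the most likely token. Note that in this sketch a `top_p` of 1 or more never triggers the cutoff, so every top-k candidate survives.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix whose
    cumulative probability reaches top_p; renormalise what is left."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:  # never true when top_p >= 1
            break
    total = sum(p for _, p in kept)
    return [(t, p / total) for t, p in kept]


def sample_token(logits, temperature=0.7, top_k=40, top_p=0.95, rng=random):
    """Draw one token id from the filtered, temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    candidates = filter_top_k_top_p(probs, top_k, top_p)
    ids = [t for t, _ in candidates]
    weights = [p for _, p in candidates]
    return rng.choices(ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    for t in (1.0, 0.7, 0.3):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

Running it prints the probability of the top token growing as the temperature drops, which is why a lower `--temp` tends to give less erratic output.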
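For context on the quantisation remark, the sketch below shows plain blockwise 4-bit round-to-nearest quantisation: each block of weights gets one float scale, and every weight is rounded independently to an integer code in [-7, 7]. The block size of 32 and the symmetric [-7, 7] range are assumptions chosen for illustration; this is not the exact ggml `q4_0` storage layout. Because RTN rounds each weight on its own, the per-weight error can reach half a scale step. GPTQ instead quantises weights sequentially and adjusts the remaining ones to compensate for the error, which is why it usually degrades the model less.

```python
def quantize_rtn_4bit(weights, block_size=32):
    """Blockwise symmetric 4-bit round-to-nearest quantisation (illustrative).

    Returns one float scale per block and integer codes in [-7, 7].
    """
    scales, codes = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        amax = max(abs(w) for w in block)
        scale = amax / 7 if amax > 0 else 1.0  # map the largest magnitude to +/-7
        scales.append(scale)
        # Round each weight independently; the clamp guards against
        # floating-point overshoot just past +/-7.
        codes.append([max(-7, min(7, round(w / scale))) for w in block])
    return scales, codes


def dequantize(scales, codes):
    """Reconstruct approximate weights as code * scale for each block."""
    return [q * s for s, block in zip(scales, codes) for q in block]


if __name__ == "__main__":
    weights = [0.1 * i - 1.6 for i in range(64)]
    scales, codes = quantize_rtn_4bit(weights)
    restored = dequantize(scales, codes)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("scales:", [round(s, 4) for s in scales])
    print("max abs rounding error:", round(worst, 4))
```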