Thank you! I was looking for how to do this. The example in the issue above shows how to increase the context size in ollama:
$ ollama run llama3.2
>>> /set parameter num_ctx 32768
Set parameter 'num_ctx' to '32768'
>>> /save llama3.2-32k
Created new model 'llama3.2-32k'
>>> /bye
$ ollama run llama3.2-32k "Summarize this file: $(cat README.md)"
...
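If you want the same thing outside an interactive session, ollama also supports baking the parameter into a Modelfile. A minimal sketch (the model name and file name here are just examples):

$ cat > Modelfile <<'EOF'
# Base the new model on stock llama3.2 and bake in a larger context window
FROM llama3.2
PARAMETER num_ctx 32768
EOF
$ ollama create llama3.2-32k -f Modelfile
$ ollama run llama3.2-32k "Summarize this file: $(cat README.md)"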
The table in the reddit post above also shows context size vs. memory requirements for 01-ai/Yi-34B-200K (Params: 34.395B, Mode: infer).
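The memory growth with context length is dominated by the KV cache, which scales linearly with the number of tokens. A rough back-of-the-envelope check, assuming Yi-34B's published config (60 layers, 8 KV heads via GQA, head dim 128) and an fp16 cache; treat these numbers as assumptions, not a definitive accounting:

$ # bytes per token = 2 (K and V) * layers * kv_heads * head_dim * 2 bytes (fp16)
$ echo $(( 2 * 60 * 8 * 128 * 2 ))
245760
$ # GiB at the full 200K context (integer division, so slightly rounded down)
$ echo $(( 245760 * 200000 / 1024 / 1024 / 1024 ))
45

That ~46 GiB is for the cache alone, on top of the model weights, which is why the memory column in that table climbs so steeply with context size.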
Not my field, but from this[1] blog post, which references this[2] paper, it would seem so. Note that the optimal approaches differ somewhat between training and inference. Also note that several of the approaches rely on batching multiple requests (prompts) to exploit the parallelism, so you won't see the same gains if you feed it only a single prompt at a time.
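To make the batching point concrete with ollama itself: the server can batch concurrent requests if you let it. A sketch only; OLLAMA_NUM_PARALLEL is a real ollama server setting, but the prompt strings and the count of 4 here are made up for illustration:

$ # Server side: allow the scheduler to handle up to 4 requests in parallel
$ OLLAMA_NUM_PARALLEL=4 ollama serve &
$ # Client side: fire several prompts concurrently so they can share a batch
$ for p in "prompt one" "prompt two" "prompt three" "prompt four"; do
>   curl -s http://localhost:11434/api/generate \
>     -d "{\"model\": \"llama3.2\", \"prompt\": \"$p\", \"stream\": false}" &
> done; wait

Sending the same four prompts one at a time would leave most of that parallel hardware idle, which is the gap the paper's batching-based approaches are exploiting.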