A Markov chain with a large context is still literally a Markov chain.
Maybe you are used to Markov chains being shitty at language, so you are confused about how an LLM can be a Markov chain even though it's good at language and has some amazing emergent cognitive capabilities. That's a problem with your conception of Markov chains; it's not an argument that LLMs aren't Markov chains.
Finally, a Markov chain whose context space cannot be practically iterated over (e.g. all possible 10k-token contexts) can still be useful in the same ways that smaller Markov chains are, although it would still be a Markov chain even if that weren't true. For example, you can greedily generate tokens from it, calculate likelihoods, do some beam search, select multiple-choice tokens, etc.
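To make those operations concrete, here is a minimal sketch of an order-k Markov chain over tokens. Everything in it is an illustrative assumption: the function names (`fit`, `next_dist`, `greedy_generate`, `log_likelihood`, `pick_choice`) are hypothetical, and the count-based lookup table stands in for whatever actually computes the next-token distribution. The point is only that each operation touches the chain through "last k tokens in, next-token distribution out", so nothing requires enumerating the context space.

```python
import math
from collections import Counter, defaultdict

def fit(tokens, k):
    """Build a count-based order-k chain: k-token context -> next-token probabilities."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - k):
        counts[tuple(tokens[i:i + k])][tokens[i + k]] += 1
    return {
        ctx: {tok: n / sum(c.values()) for tok, n in c.items()}
        for ctx, c in counts.items()
    }

def next_dist(model, history, k):
    """Markov property: the distribution depends only on the last k tokens."""
    return model.get(tuple(history[-k:]), {})

def greedy_generate(model, prompt, k, steps):
    """Repeatedly append the most probable next token; stop at an unseen context."""
    out = list(prompt)
    for _ in range(steps):
        dist = next_dist(model, out, k)
        if not dist:
            break
        out.append(max(dist, key=dist.get))
    return out

def log_likelihood(model, tokens, k):
    """Sum of log P(token | previous k tokens), skipping the first k tokens."""
    total = 0.0
    for i in range(k, len(tokens)):
        p = next_dist(model, tokens[:i], k).get(tokens[i], 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total

def pick_choice(model, history, k, choices):
    """Multiple choice: return the candidate token with the highest probability."""
    dist = next_dist(model, history, k)
    return max(choices, key=lambda c: dist.get(c, 0.0))
```

For example, `fit("the cat sat on the mat the cat ate".split(), 2)` gives a chain you can pass to any of the other functions. Swapping the lookup table for a function that runs a neural network over the last k tokens would leave the callers unchanged, which is the sense in which the small and large cases share these uses.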