The name "per-layer embeddings" is all we have to go on, and there are currently no published papers (that I'm aware of) using any similar mechanism, so, yes, it's a huge leap from a paper that doesn't mention per-layer anything.
It's fine to speculate based on the name, but don't pretend that it's a known technique when it clearly isn't.
Someone [1] inspected the dimensions of the model's embedding component, and it seems GP was on the right track. If I understood [2] correctly, it does seem to be an embedding of the input tokens that is passed directly into each layer.
I have not looked at the model myself, but since the per-layer embedding dimension of 256 seems quite small (for reference, according to [3] the old Gemma 1B had a 1152-dimension input embedding), I'm guessing that this is not done _in lieu_ of the main input embedding to the first layer, but in addition to it. A rough sketch of what that could look like is below.
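To make the speculation concrete, here is a purely illustrative PyTorch sketch of one way "per-layer embeddings added on top of the main embedding" could work. All class/parameter names and the dimensions (vocab size, 1152 model width, 256 per-layer width, layer count) are assumptions for illustration, not the actual Gemma architecture.

```python
# Hypothetical sketch: a small token embedding injected at every layer,
# in addition to the usual input embedding fed to the first layer.
import torch
import torch.nn as nn

class HypotheticalPerLayerEmbeddingModel(nn.Module):
    def __init__(self, vocab_size=256_000, d_model=1152, d_per_layer=256, n_layers=26):
        super().__init__()
        # Main input embedding: the standard path into the first layer.
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        # One small extra embedding table per layer (the speculated 256-dim part).
        self.per_layer_embed = nn.ModuleList(
            nn.Embedding(vocab_size, d_per_layer) for _ in range(n_layers)
        )
        # Project the small per-layer embedding up to d_model so it can be
        # added to the hidden state entering each layer.
        self.per_layer_proj = nn.ModuleList(
            nn.Linear(d_per_layer, d_model, bias=False) for _ in range(n_layers)
        )
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            for _ in range(n_layers)
        )

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        h = self.tok_embed(token_ids)  # usual embedding into layer 0
        for layer, emb, proj in zip(self.layers, self.per_layer_embed, self.per_layer_proj):
            # Inject a token-dependent per-layer embedding at every layer,
            # in addition to (not instead of) the running hidden state.
            h = layer(h + proj(emb(token_ids)))
        return h

model = HypotheticalPerLayerEmbeddingModel()
out = model(torch.randint(0, 256_000, (1, 8)))  # (1, 8, 1152)
```

Again, this is just one plausible reading of the name; whether the real model projects, concatenates, or gates the per-layer embeddings is unknown from the weights alone.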