Presumably, it would mean "when anything other than that secondary processor comes on": that is, when the device stops throwing away the buffer containing your speech at the hardware level and instead starts feeding it through its local parsers and on to the cloud, in a way that could result in information from your speech being captured.

That only holds, though, if the device isn't buffering the last N seconds of audio to reprocess once that processor wakes up. Do any or all of the modern smart-speaker devices do that? If so, then when you see the light, you have to assume you've potentially leaked any secrets you said in the last N seconds as well. Less like a reporter coming in and asking to speak to you; more like an eavesdropper coming in and telling you they heard what you were just saying through the door.
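To make the worry concrete, here is a minimal sketch of what such a pre-roll buffer could look like. Everything in it is hypothetical: the `WakeWordFrontEnd` class, the `preroll_frames` parameter, and the `uploaded` list are names made up for illustration, and I'm not claiming any real device works this way. Frames are plain strings standing in for chunks of audio.

```python
from collections import deque


class WakeWordFrontEnd:
    """Hypothetical model of a smart-speaker front end with a pre-roll buffer."""

    def __init__(self, preroll_frames):
        # Ring buffer: once it holds preroll_frames items, the oldest is dropped
        # on every append. This models "the last N seconds" of audio.
        self.preroll = deque(maxlen=preroll_frames)
        self.awake = False   # the "light is on" state
        self.uploaded = []   # stand-in for whatever leaves the low-power chip

    def on_frame(self, frame, is_wake_word=False):
        if self.awake:
            # After wake-up, everything is passed along.
            self.uploaded.append(frame)
            return
        self.preroll.append(frame)
        if is_wake_word:
            self.awake = True
            # The assumption under discussion: the buffered audio is
            # reprocessed too, not just what is said after the light comes on.
            self.uploaded.extend(self.preroll)
            self.preroll.clear()


dev = WakeWordFrontEnd(preroll_frames=3)
for word in ["my", "password", "is", "hunter2"]:
    dev.on_frame(word)
dev.on_frame("hey-speaker", is_wake_word=True)
dev.on_frame("what's the weather")
print(dev.uploaded)
# ['is', 'hunter2', 'hey-speaker', "what's the weather"]
```

In this toy model, "hunter2" was said before the light came on and still ends up in `uploaded`. With `preroll_frames=0` nothing said before wake-up survives, which is the behavior the light would need to guarantee for it to be a trustworthy signal.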