Speech recognition is quite compute-intensive, and the load of running it on the device would be obviously detectable: not necessarily by random users, but definitely by any security researcher who cared to look.
Also, storing the data until the phone is plugged in would require an unusually large amount of storage, and that would be detectable.
I was replying to a commenter who proposed inspecting what was being sent over the network, if you'll just read what they said.
That aside, speech recognition isn't that heavy a process these days if all you're looking to do is extract keywords. We used to do industry-leading large-vocabulary continuous speech recognition on a Pentium 133... today's phones are way beyond that and could do it without breaking a sweat. Detectable? Sure. But remember, I was talking about this person's plan to look at network data.
Furthermore, you don't need to store all the data. You can store audio only when the phone is hearing something, as determined by a super lightweight measure of magnitude (loudness) that does no speech recognition whatsoever. Is the storage detectable? Sure. But again, what was I responding to? Network traffic monitoring.
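A minimal sketch, in Python, of the kind of lightweight magnitude gate described above, assuming 16-bit PCM audio already split into short frames. The threshold value and function names are hypothetical, not taken from any real phone; the point is only that deciding whether to keep a frame needs one pass over the samples, a square root, and a comparison, with no speech recognition involved.

```python
import math
from typing import Iterable, List, Sequence

# Hypothetical threshold: RMS level on a 16-bit scale (max 32767) above which
# a frame counts as "hearing something". A real device would tune this value.
RMS_THRESHOLD = 500.0


def frame_rms(frame: Sequence[int]) -> float:
    """Root-mean-square magnitude of one frame of 16-bit PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def frames_to_keep(frames: Iterable[Sequence[int]],
                   threshold: float = RMS_THRESHOLD) -> List[Sequence[int]]:
    """Keep only frames loud enough to pass the gate; quiet frames are dropped.

    No speech recognition happens here: each frame costs a sum of squares,
    one square root, and one comparison.
    """
    return [f for f in frames if frame_rms(f) >= threshold]


if __name__ == "__main__":
    silence = [0] * 160           # 10 ms of silence at an assumed 16 kHz rate
    quiet_hiss = [20, -20] * 80   # low-level background noise
    loud = [3000, -3000] * 80     # loud signal standing in for speech
    kept = frames_to_keep([silence, quiet_hiss, loud, silence])
    print(f"kept {len(kept)} of 4 frames")  # prints "kept 1 of 4 frames"
```

Only frames that pass the gate would ever be written out, which is why storage grows with how much the phone actually hears rather than with wall-clock time.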