My main issue with doing anything voice-related is that, the last time I looked into using Pocketsphinx, I needed to define the terms/dictionaries for it to parse from.

I'd love to mix and match NLP libraries, voice synthesis, voice identification, and speech recognition to make a comfortable "User Interface" to some systems in my house.

I think it'd be a fun project, but nothing seems to be able to take an arbitrary audio stream and give me both a "User identification" based on voice patterns and the arbitrary spoken text.

I know, yes, this is a VERY tall order, but it's something that should be possible. At the very least, the identification part isn't needed. It's just important that it works offline and provides a text stream.