Likely not. You can probably even host your friends too without much server load. Scaling is a concern when you have many users; just a handful of users won't tend to create much load.
Yes, but FB doesn't need to specifically listen for words and determine whether they say anything; they just need speech-to-text, and to capture speech only when the volume reaches a particular level.
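To make the "only capture above a certain volume" idea concrete, here is a rough sketch of a volume gate. It is purely illustrative and says nothing about what any real app does: the names `rms` and `gate_frames`, the 0.1 threshold, and the sample values are all made up for the example, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math

def rms(frame):
    """Root-mean-square level of one frame of samples (assumed in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def gate_frames(frames, threshold=0.1):
    """Keep only the frames whose RMS level meets the (hypothetical) threshold."""
    return [f for f in frames if rms(f) >= threshold]

# Made-up sample data: one quiet frame (background noise) and one loud frame.
quiet = [0.01, -0.02, 0.01, -0.01]
loud = [0.5, -0.6, 0.4, -0.5]

# Only the loud frame passes the gate; only such frames would go on to speech-to-text.
kept = gate_frames([quiet, loud, quiet])
```

The point is just that a level check like this is cheap compared to recognizing words, so anything quieter than the threshold could be dropped before any transcription happens.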