If a human ends up reading your conversation whenever the algorithm decides you look suspicious, I don't think I'd call that system "privacy-preserving". What would the non-privacy-preserving version of this system even be? A human browsing through every conversation?