This is very tricky to do. Let's say that it's the iPhone4 launch day and every story is about that phone. Some users might want all those duplicate posts, to get different perspectives on the story. Others might just be annoyed and want all those duplicates skipped.
I think filtering is probably the way to go in this case. Instead of trying to detect duplicates, the reader should let you filter out all stories tagged "iPhone4".
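To make the idea concrete, here is a rough sketch of tag-based filtering in Python. The `Story` class, the `filter_stories` function, and the sample stories are all made up for illustration; a real reader would have its own data model and might match tags differently.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    # Hypothetical story record: a title plus a set of tags.
    title: str
    tags: set[str] = field(default_factory=set)


def filter_stories(stories, muted_tags):
    """Return the stories that carry none of the muted tags (case-insensitive)."""
    muted = {tag.lower() for tag in muted_tags}
    return [
        story
        for story in stories
        if not ({tag.lower() for tag in story.tags} & muted)
    ]


# Made-up sample data for a launch day.
stories = [
    Story("iPhone4 review roundup", {"iPhone4", "Apple"}),
    Story("Antenna issues explained", {"iPhone4"}),
    Story("New Python release", {"Python"}),
]

# A user who is tired of launch coverage mutes the tag.
visible = filter_stories(stories, {"iphone4"})
print([story.title for story in visible])
```

A user who wants every perspective simply leaves the muted set empty, so the same mechanism serves both groups.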
I really dislike the term "scripting languages". While you could describe Python and Ruby this way, it does them a huge disservice by implying that they aren't capable of large-scale application development. I think the term should really be reserved for Bash et al.
I read "scripting language" as a legitimate warning about tradeoffs that make such a language a poor fit for high-volume use: the code is permitted to change so much runtime behavior that most known optimizations (and other kinds of static analysis!) are ruled out. For example, Twitter's Ruby-to-Scala migration made news, so there's value in being able to say broadly what's different between those kinds of languages.
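As a small illustration of the kind of runtime change I mean, here is a Python sketch (the `Account` class and `total_fees` function are invented for the example). Nothing in `total_fees` changes, yet its result does, because a method is rebound while the program is running.

```python
class Account:
    # Hypothetical class, used only to show runtime rebinding.
    def __init__(self, balance):
        self.balance = balance

    def fee(self):
        return 1


def total_fees(accounts):
    # A tool reading this function cannot know ahead of time which
    # `fee` will run: the lookup happens on every call.
    return sum(account.fee() for account in accounts)


accounts = [Account(100), Account(200)]
before = total_fees(accounts)

# Any code, anywhere in the program, may replace the method on the class.
Account.fee = lambda self: self.balance // 100

after = total_fees(accounts)
print(before, after)
```

Because `Account.fee` can be swapped out at any moment, a compiler working ahead of time can't safely inline the call or assume its return type without extra runtime checks, which is the tradeoff I'm pointing at.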