What did you use to compute P(D|Ci)? This uses your notation from the Dr Dobbs a...

jgrahamc · on March 19, 2010

I just used whitespace-separated words after stripping punctuation.
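
For anyone wanting to try this, here is a minimal sketch of that kind of tokenisation. It is an illustration, not the code actually used: the function name `tokenize` is made up, it only strips ASCII punctuation, and it leaves case untouched because the comment above does not say whether words were lowercased.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def tokenize(text):
    """Strip ASCII punctuation, then split on runs of whitespace.

    Hypothetical helper: case is preserved, since the original
    description does not mention lowercasing.
    """
    return text.translate(_STRIP_PUNCT).split()

if __name__ == "__main__":
    print(tokenize("Hello, world!  It's a pure-ad play."))
```

Note that stripping punctuation this way joins the two halves of a hyphenated word and drops apostrophes, so "pure-ad" becomes "puread" and "It's" becomes "Its".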

pbhjpbhj · on March 19, 2010

I'd have thought that capitalisation and punctuation were key elements in any textual analysis. In the subject text there is a very unusual hyphenation, "pure-ad", for example.

jfarmer · on March 19, 2010

In authorship identification punctuation can be very important, as can non-grammatical features like the distribution of the number of syllables per word.
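
As a rough sketch of what such features might look like in code, the following counts punctuation marks and builds a syllables-per-word histogram. All names here are hypothetical, and the syllable counter is a deliberately crude assumption (it counts runs of vowels, treating "y" as a vowel), not a real syllabifier; serious stylometry would use a pronunciation dictionary or a better heuristic.

```python
import re
import string
from collections import Counter

def punctuation_counts(text):
    """Count how often each ASCII punctuation character occurs."""
    return Counter(ch for ch in text if ch in string.punctuation)

def estimate_syllables(word):
    """Crude estimate: number of vowel runs, with a minimum of one.

    Assumption for illustration only; it miscounts words with a
    silent final 'e' (for example, "pure" comes out as two).
    """
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def syllable_distribution(text):
    """Histogram mapping estimated syllable count to number of words."""
    words = re.findall(r"[A-Za-z]+", text)
    return Counter(estimate_syllables(w) for w in words)

if __name__ == "__main__":
    sample = "Well, well; a banana tree."
    print(punctuation_counts(sample))
    print(syllable_distribution(sample))
```

Comparing these counters between a disputed text and each candidate author's known writing gives features that a words-only model throws away.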