> Can you give an example of text in the wild that uses interlinear ruby annotation codepoints?
Sadly, no :( You may have luck scraping Wikibooks or some other source of PDFs or plaintext. In general you won't find interlinear annotations on the web, because HTML has a better way of dealing with ruby (the `<ruby>` element). This is also why they're on the "shitlist": that list (from UTR #20, "Unicode in XML and other Markup Languages") is for characters that are expressly not supposed to be used in markup languages.
Another way to get a good answer is to ask on the Unicode mailing list; people there tend to be helpful with questions like this. I only know these characters are used because I've heard so; I have no first-hand experience with them. This isn't a very satisfying answer, I know, but I can't give a better one.
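If you do go scraping, a small filter helps you spot candidate text quickly. The sketch below (Python, standard library only) flags any text containing the three annotation code points and, to illustrate the HTML alternative, rewrites simple annotated spans as `<ruby>` markup. It assumes the plain, non-nested form ANCHOR, base text, SEPARATOR, annotation, TERMINATOR; `contains_annotations` and `to_html_ruby` are hypothetical helper names, not an existing API.

```python
import html
import re

# U+FFF9 ANCHOR, U+FFFA SEPARATOR, U+FFFB TERMINATOR
ANCHOR, SEPARATOR, TERMINATOR = "\uFFF9", "\uFFFA", "\uFFFB"

# Assumption: one base and one annotation per block, no nesting.
ANNOTATION_RE = re.compile(r"\uFFF9([^\uFFF9-\uFFFB]*)\uFFFA([^\uFFF9-\uFFFB]*)\uFFFB")


def contains_annotations(text: str) -> bool:
    """Cheap filter for scraped text: does it use any annotation code point?"""
    return any(ch in text for ch in (ANCHOR, SEPARATOR, TERMINATOR))


def to_html_ruby(text: str) -> str:
    """Rewrite each annotated span as <ruby>, HTML-escaping all other text."""
    out = []
    pos = 0
    for m in ANNOTATION_RE.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        base, annotation = m.group(1), m.group(2)
        out.append(f"<ruby>{html.escape(base)}<rt>{html.escape(annotation)}</rt></ruby>")
        pos = m.end()
    out.append(html.escape(text[pos:]))  # trailing unannotated text
    return "".join(out)
```

Malformed blocks are left unconverted (their characters pass through `html.escape` untouched), which seems safer than guessing at the author's intent.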
> Do they count as graphemes?
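The annotation characters themselves? Under UAX 29 they do: U+FFF9..U+FFFB are format characters (General_Category Cf), which puts them in Grapheme_Cluster_Break=Control, so the default rules break on both sides of each one and every annotation character ends up as a grapheme cluster of its own. UAX 29 doesn't try to handle many corner cases like these (it explicitly asks you to tailor the algorithm if you care about such specifics). ICU might deal with them better. The same goes for word segmentation: UAX 29 will not correctly word-segment Thai text, but ICU will if you ask it to. I haven't tried any of this, but it should be easy enough to check.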
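A quick way to see where these characters fall is Python's `unicodedata`, which reports their names and General_Category. Format characters (Cf) other than ZWNJ, ZWJ and the prepended concatenation marks get Grapheme_Cluster_Break=Control, and UAX 29 breaks around Control on both sides. The `looks_like_gcb_control` helper below is hypothetical and only approximates that property from General_Category (it ignores the Prepended_Concatenation_Mark exception and unassigned default-ignorables); for real segmentation, use ICU or another full UAX 29 implementation.

```python
import unicodedata

# The three interlinear annotation characters (U+FFF9..U+FFFB).
ANNOTATION_CHARS = ["\uFFF9", "\uFFFA", "\uFFFB"]


def looks_like_gcb_control(ch: str) -> bool:
    """Rough approximation of Grapheme_Cluster_Break=Control (not exhaustive)."""
    if ch in ("\r", "\n", "\u200c", "\u200d"):
        return False  # CR, LF, ZWNJ and ZWJ have their own break classes
    return unicodedata.category(ch) in ("Cc", "Cf", "Zl", "Zp")


for ch in ANNOTATION_CHARS:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"{unicodedata.category(ch)} {looks_like_gcb_control(ch)}")
# U+FFF9 INTERLINEAR ANNOTATION ANCHOR Cf True
# U+FFFA INTERLINEAR ANNOTATION SEPARATOR Cf True
# U+FFFB INTERLINEAR ANNOTATION TERMINATOR Cf True
```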
I guess a lot of this depends on what kind of processing you're doing. For NLP, ignoring the annotation sounds like the way to go, since it's ultimately an _annotation_ (a parallel channel of information that isn't essential to the text). That certainly applies when the annotations are used for ruby, though they can be used for other things too. Interlinear annotations were almost used for the Vedic samasvara letter combiners, but in the end new combining characters were encoded instead, since the set of annotations needed was very restricted.
They're rarely used, so the best way forward is probably just to ignore them. They're a niche feature that never really took off.
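For the NLP case, ignoring annotations can be as simple as keeping the base text and dropping what follows the separator. A minimal sketch, assuming well-formed, non-nested blocks; `strip_annotations` is a hypothetical name, any extra separators before the terminator are treated as part of the annotation, and stray annotation characters from malformed input are simply deleted.

```python
import re

# ANCHOR (U+FFF9) base SEPARATOR (U+FFFA) annotation TERMINATOR (U+FFFB).
# Everything after the first separator, up to the terminator, is discarded.
ANNOTATION_RE = re.compile(r"\uFFF9([^\uFFF9\uFFFA\uFFFB]*)\uFFFA[^\uFFF9\uFFFB]*\uFFFB")

# Leftover annotation characters that did not form a complete block.
STRAY_RE = re.compile(r"[\uFFF9-\uFFFB]")


def strip_annotations(text: str) -> str:
    """Keep the annotated base text and drop the annotation itself."""
    text = ANNOTATION_RE.sub(r"\1", text)
    return STRAY_RE.sub("", text)
```

Deleting stray characters rather than raising keeps a tokenizer pipeline running on messy scraped input; swap in an exception if you'd rather hear about malformed text.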