Wikimedia FontCDN – an anonymizing, privacy-first reverse proxy to Google Fonts (toolforge.org)
87 points by supercucumber on July 9, 2020 | 53 comments



As much as I hate to post a pessimistic "Just do X instead" comment... why not just serve the fonts locally?

It's no more difficult than serving any other asset. And if you're worried about privacy, then serving files locally removes any dependency on a third-party server at all (or two in this case).


There used to be an advantage to using global CDNs: you would only need to download each font file once.

But now, browsers don’t share caches between origins for privacy reasons (“hmm, judging by how fast these ten resources loaded (and how slowly these other thirty resources loaded), you had these ten in your cache; and such-and-such a sensitive site just happens to load those ten resources and none of the rest…”), so that reason has turned from a positive into a probable negative, because it’s having to look up another domain name and open a new TLS connection—though if you’re serving the resources with HTTP/1.1 it might still be faster coming from a different origin, if you’re loading enough resources.
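
To make the cache-partitioning idea concrete, here is a toy model (not any browser's actual implementation). The class name `HttpCache` and the example URLs are made up for illustration, and real cache keys may include more than the top-level site.

```python
class HttpCache:
    """Toy HTTP cache; `partitioned` toggles per-site cache keys (illustrative only)."""

    def __init__(self, partitioned):
        self.partitioned = partitioned
        self._entries = set()

    def _key(self, top_level_site, url):
        # A shared cache keys on the resource URL alone; a partitioned cache
        # also includes the site the user is visiting.
        return (top_level_site, url) if self.partitioned else url

    def fetch(self, top_level_site, url):
        """Return "hit" if the resource is cached for this key, else store it and return "miss"."""
        key = self._key(top_level_site, url)
        if key in self._entries:
            return "hit"
        self._entries.add(key)
        return "miss"
```

With a shared cache, a font fetched while visiting one site is a hit on the next site. With a partitioned cache, every site pays for its own download, which is why the CDN's caching advantage goes away.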

After that, the main advantage of Google Fonts doing the CSS serving is that it varies the CSS it serves by user-agent, to give you what your browser will cope with best, whether that be EOT, TTF, WOFF or WOFF2. Maybe they vary the response in other ways as well; I’m not sure. In practice, I think that benefit has run its course: I recommend that people don’t even bother with the bulletproof web fonts formula for supporting all the formats, but just serve WOFF2: fonts are fundamentally supposed to be optional (icon fonts are generally bad), and users of such ancient browsers as IE, EdgeHTML < 14, Firefox < 39, Chrome < 36 and Safari < 10/12 don’t need the fonts anyway.

So then, I say that the only thing that remains is the neat packaging of the font files, subsetting, &c. And I say copying the files and serving them yourself is overall probably a very wise idea.


There is also another advantage: font subsetting. Google Fonts automatically subsets large fonts. This doesn't really matter for Latin, but for CJK, where an entire font can be >10MB, Google Fonts automatically subsets it into >100 files, partitioned by general usage frequency.

I serve my own fonts, but subsetting those fonts isn't a trivial matter. Personally I just grabbed the unicode ranges that Google Fonts uses and generated my own subsets, but it is not that trivial.

(The reason I serve my own fonts is actually that some fonts on Google Fonts are not up to date, and I also abuse unicode-range to use different fonts for different scripts.)
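
As a rough sketch of how unicode-range partitioning decides which subset files get used: the parser below handles the three forms of the CSS `unicode-range` descriptor (single code point, interval, `?` wildcard). The function names and the subset names and ranges in the usage note are made up for illustration; they are not Google Fonts' actual partitions.

```python
def parse_unicode_range(descriptor):
    """Parse a CSS unicode-range value into a list of (start, end) code point pairs."""
    ranges = []
    for token in descriptor.split(","):
        token = token.strip()
        if not token:
            continue
        if token[:2].upper() != "U+":
            raise ValueError(f"not a unicode-range token: {token!r}")
        body = token[2:]
        if "?" in body:
            # Wildcard form, e.g. U+30?? covers U+3000 through U+30FF.
            start = int(body.replace("?", "0"), 16)
            end = int(body.replace("?", "F"), 16)
        elif "-" in body:
            low, high = body.split("-", 1)
            start, end = int(low, 16), int(high, 16)
        else:
            start = end = int(body, 16)
        ranges.append((start, end))
    return ranges


def subsets_needed(text, subsets):
    """Return the names of the subsets whose range covers at least one character of `text`.

    `subsets` maps a (hypothetical) subset name to its unicode-range descriptor.
    """
    needed = []
    for name, descriptor in subsets.items():
        ranges = parse_unicode_range(descriptor)
        if any(start <= ord(ch) <= end for ch in text for start, end in ranges):
            needed.append(name)
    return needed
```

For example, with `{"latin": "U+0000-00FF", "kana": "U+30??"}`, a page containing only English text would need just the `latin` file, which is the whole point of splitting a large font this way.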


I was talking specifically of the advantages of global CDNs. Font subsetting is something Google Fonts covers and handles very well for the most common use cases, but it has nothing to do with the global CDN.


> But now, browsers don’t share caches between origins for privacy reasons

I hadn't heard about this until today. Are you sure that it is already implemented in all major browsers? I found this source [1], but from what I can tell, at the moment this feature is behind a flag in Chromium and Firefox (maybe it is already implemented in Safari).

[1] https://www.jefftk.com/p/shared-cache-is-going-away



You are correct and I was wrong; I had thought it had been shipped, but it shipped behind a flag.


Oh wow, I wasn’t aware that cross-site caching isn’t a thing anymore. That’s how I learned it years ago, back in the jQuery days.


Good comment. That elaborates on the reasons I chose to self-host fonts.

And because caches are no longer shared (unfortunately), I've started subsetting fonts myself to trim them down when possible.


How much time do you spend on this, roughly? Also, do you do it for icon fonts like Font Awesome? Thx!


Most of the time investment was setting up the prerequisites for the various tools. Notable requirements were python, node+npm, Microsoft Build Tools, and Google's Brotli.

I used glyphhanger[1] to apply the actual subsetting. Use the --spider flag to find a list of unicode ranges used on your site. Then you can generate files with something like:

  glyphhanger --whitelist="U+20,U+21,U+26-29,U+2C-3B,U+3F-57,U+59,U+61-7A,U+2013,U+2019,U+201C,U+201D,U+2026" --subset=SourceSansPro-Regular.ttf --formats=woff2,woff --css

Then you would add that same unicode-range to your CSS.

I haven't tried this on icon fonts. I tend towards SVGs instead.

[1] https://github.com/filamentgroup/glyphhanger
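
If you want to see roughly how a range list like the one in that whitelist can be derived from text, here is a small sketch. It is not glyphhanger's own algorithm, and the function names are made up for illustration; it just collapses runs of consecutive code points into the `U+XX-YY` form.

```python
def _format_run(start, end):
    """Format one run of consecutive code points as a unicode-range token."""
    if start == end:
        return f"U+{start:X}"
    return f"U+{start:X}-{end:X}"


def to_unicode_ranges(text):
    """Return a comma-separated unicode-range list covering every character in `text`."""
    codepoints = sorted({ord(ch) for ch in text})
    if not codepoints:
        return ""
    parts = []
    start = prev = codepoints[0]
    for cp in codepoints[1:]:
        if cp == prev + 1:
            # Still inside a run of consecutive code points.
            prev = cp
            continue
        parts.append(_format_run(start, prev))
        start = prev = cp
    parts.append(_format_run(start, prev))
    return ",".join(parts)
```

Calling `to_unicode_ranges` on all of a site's text gives a string that could be used as a whitelist or pasted into a `unicode-range` declaration. Two-character runs come out as `U+20-21` rather than `U+20,U+21`; both spellings describe the same characters.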


I do really simple subsetting on my site: rather than trying to figure out which characters are used in which fonts, I just dump all of the site’s contents, and sort out all the characters in it.

A very slightly simplified version of my Makefile, which depends on the original font files found in $(PATH_TO_FONTS):

  .font-subset: $(call rwildcard,,%.html %.md)
   find . \( -name '*.md' -or -name '*.html' \) -exec cat {} + | grep -o . | sort -u | tr -d '\n' > .font-subset

  define FONT =
  static/$(1).woff2: .font-subset
   pyftsubset "$(PATH_TO_FONTS)/$(2)/OpenType/$(3).otf" --text-file=.font-subset --output-file=static/$(1).woff2 $(4) --flavor=woff2

  fonts: static/$(1).woff2
  endef

  $(eval $(call FONT,eta,Equity,Equity Text A Regular))
  $(eval $(call FONT,etab,Equity,Equity Text A Bold))
  $(eval $(call FONT,etabi,Equity,Equity Text A Bold Italic))
  $(eval $(call FONT,etai,Equity,Equity Text A Italic))

  TRIPLICATE_FONT_FEATURES := --layout-features+=ss01,ss02
  $(eval $(call FONT,tt4,Triplicate,Triplicate T4 Regular,$(TRIPLICATE_FONT_FEATURES)))
  $(eval $(call FONT,tt4i,Triplicate,Triplicate T4 Italic,$(TRIPLICATE_FONT_FEATURES)))
  $(eval $(call FONT,tt7,Triplicate,Triplicate T4 Bold,$(TRIPLICATE_FONT_FEATURES)))
  $(eval $(call FONT,tt7i,Triplicate,Triplicate T4 Bold Italic,$(TRIPLICATE_FONT_FEATURES)))

And with that, `make fonts` generates a new version of the fonts, trimming out all the unnecessary glyphs and features, while retaining ss01 and ss02 for Triplicate. On Arch Linux, this depends on the python-fonttools package for pyftsubset, and the python-brotli package for --flavor=woff2.

It would be possible to do much better: to identify which characters are rendered in which fonts, which sequences of characters are employed (so that you can trim kerning and ligature tables), things like that; but this does a good enough job for me. (We’re talking about differences of probably less than half a kilobyte in a <20KB file.) I use only English text on my site and I control all the content, so I don’t need to worry about unicode-range splitting.
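
For anyone who would rather not do the character collection in shell, here is a minimal Python sketch of the same step, assuming UTF-8 content. The function name and the default suffixes are made up for illustration; the output is meant to be written to a file and passed to pyftsubset via --text-file, as in the Makefile above.

```python
from pathlib import Path


def collect_characters(root, suffixes=(".md", ".html")):
    """Return every distinct character used in matching files under `root`, sorted."""
    chars = set()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            chars.update(path.read_text(encoding="utf-8"))
    # Newlines are dropped, mirroring the `tr -d '\n'` step of the shell pipeline.
    chars.discard("\n")
    return "".join(sorted(chars))
```

Something like `Path(".font-subset").write_text(collect_characters("."), encoding="utf-8")` would then produce the same kind of file the Makefile rule generates.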

Concerning icon fonts: this technique would work for them, but I refuse to use icon fonts myself because they’re fundamentally moderately bad: you can’t trust fonts to load, at least in part because quite a few users simply have them disabled for performance or accessibility. There do exist icon fonts that have an almost tolerable fallback, where they use ligatures so that the sequence of letters “envelope” becomes an envelope, “twitter” becomes a Twitter logo, &c. so that screen readers will read the name of the icon without you needing to worry about aria-label and other related properties, but the icon name is normally not the text you should have there, so it’s kind of a waste after all that. Your options are better with something like inline SVG icons or the inline SVG sprite technique. (See https://icons.getbootstrap.com/ for an example.) Also, avoid using just icons with no labels; humans perform enormously better when there are labels on their buttons.


Picking the font based on the browser – surely there's an open source plugin for NGINX or something that could replicate this?

This would be a perfect fit for a Caddy plugin too. Or a CDN provider could provide this out of the box.

The CloudFlare people could do this in a weekend probably and I'd happily sign up because I'm less concerned about my privacy with CF than with Google.

But is this even an issue? If you declare the 4 different formats, won't the browser know to request the best one automatically?


Sure, it’s called the bulletproof @font-face syntax. But that means you’ve got to get the fonts in all of the formats, for starters. I don’t think it’s worth it at all, for the stated reason. Three years ago maybe, but definitely not now.
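
For what it's worth, the selection the browser does with a multi-format `src` list is simple: it walks the list in order and uses the first entry whose format it supports. A toy sketch of that rule follows; the function name and file names are made up, while the format strings are the usual CSS `format()` hints.

```python
def pick_source(sources, supported_formats):
    """Return the URL of the first (url, format) entry the browser supports, or None."""
    for url, fmt in sources:
        if fmt in supported_formats:
            return url
    return None


# Hypothetical src list, ordered from most to least preferred format.
SOURCES = [
    ("font.woff2", "woff2"),
    ("font.woff", "woff"),
    ("font.ttf", "truetype"),
    ("font.eot", "embedded-opentype"),
]
```

A modern browser would stop at `font.woff2` and never request the rest, so the extra formats only cost you the work of producing and hosting them.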


The other advantage of a font CDN is updating.


I don’t know if Google Fonts has changed its policies, but historically it pretty much just hasn’t updated its fonts. Even when repeatedly asked to by the original font’s author. Take as an example Crimson from https://github.com/skosch/Crimson: Google took Crimson Text in 2010 and put it onto Google Fonts, but mangled it in the process, messing up its line-height badly so that the regular and bold weights didn’t match, and that text using the web font would not line up with text using the original font. In 2012, the upstream font made various improvements, and the author attempted upon multiple occasions then and since to get Google Fonts to update the font (or even just to fix the bugs they introduced, for starters!). Others also tried. Well, it’s still broken. In the end a new version of the font (with variable weight) was commissioned in 2018, and Crimson Pro now is. Allegedly the problems in Crimson Text were supposed to be fixed up after Crimson Pro was released: https://github.com/google/fonts/issues/2395#issuecomment-631....

I am aware of them updating fonts that they commissioned. But I know of multiple cases where Google Fonts has been serving versions of fonts that are five or more years out of date, sometimes even broken by Google Fonts. Crimson Text I can kinda understand them not fixing completely, because fixing it would have changed character positioning (thus breaking people’s careful alignment to work around the Google-Fonts-introduced bugs). But there have been other cases where that didn’t apply: metrics were the same, but they just wouldn’t update the font. Might have been PT Serif? Lato? I can’t remember, it’s years since I last cared about Google Fonts.

In short: updating the fonts is not all it’s cracked up to be; Google mostly just doesn’t, and half the time you actually wouldn’t want the font updates anyway—depends on the nature of the update.


Updating is mostly relevant for unicode completionist families and emoji.


Like all art, fonts are rarely completed and merely published. Hinting especially seems like a fine art that is never done, there's always more things to hint, and more hints to tweak towards perfection of the typographic art.

Some of my favorite fonts have regular releases. As with most art, software enables a more regular release cadence than past such systems. (Such as the days when font foundry was literal and fonts were published to metal and distributed in giant physical cases.)


Another issue is copyright, which for typefaces and fonts varies considerably by jurisdiction. Using Google seems to offload liability: as long as you use the fonts within their terms, Google should at least be the leading co-party if a rights owner sued.


> seems to

Do you have a source for this? IANAL but if a Google font was disallowed for a jurisdiction, Google would be in legal trouble for advertising/hosting it on their own site, but that would be a separate case. I seriously doubt they'd share any of the liability for you using it on your site.


No source, just a feel for it as I've studied quite a few copyright cases.

I guess it's like me having an embedded video except that for the fonts Google have uploaded the media, not the public. If I watch a video on YouTube that was a copyright infringing upload, strictly (in my jurisdiction), I would have also committed infringement. But the courts are exceedingly unlikely to punish me, especially if the uploader warranted the video for my use.

Google say the fonts they have are free-libre for website use: https://developers.google.com/fonts/faq.

That's an implied warranty, I should do due diligence, but if they have doubts they should temper what they say too (they probably attempt to disclaim the implied warranty in their Terms). If they're offering fonts that are not "open source" they are in the wrong; I might also be wrong but I should - when using the fonts as they direct - be able to sue them in turn if I'm sued for using the fonts they offer.

It's not joint liability, but in general the law accounts for people being deceived. It's not negligent infringement on my part, IMO, as I checked the Google info for the license terms and have no reason to suspect they're wrong.

If the BBC put on a TV show they don't have copyright permission to air, and I suggest in my magazine article that you watch the show, I've directed you towards infringing material - which may be contributory infringement (in UK) - but the BBC would be the principal offender. If the BBC make statements saying the work is free-libre, and I rely on that, then to me they appear severally liable, and liable for deceiving me.


Totally agree.

Hosting assets like fonts or JavaScript on the same host is not only better for privacy, it's also more secure (no third party can mess with your content) and, contrary to popular belief, also faster in a modern web environment.


At last I see someone else recommending this, I was starting to think I'm the only one opposed to 3rd party hosted "free" fonts. I chose to host the fonts myself for my 20-things.com side project. It was a hassle to set up, but I wouldn't have it any other way.


It's primarily _meant_ to be used locally, by all the other tools hosted on Wikimedia Toolforge, a service that allows community members to host their own tools for working with and editing Wikimedia projects (it just recently moved to toolforge.org).


Bandwidth is a concern I assume.

What percentage of the data transferred is a font or JavaScript library?

If it's like 30% (I don't know, just guessing) then that's a portion of bandwidth that could have been used to serve users the site content.


This is on toolforge, so while it's technically on Wikimedia run infrastructure (Wikimedia cloud services), the tool itself is maintained by a community member. If you use this outside of the Wikimedia cloud services environment (toolforge and Cloud VPS), then you're asking for trouble.

My guess is someone added this tool for other tools to use. There's no way this is being used on the production wikis.


> My guess is someone added this tool for other tools to use.

That's precisely what it's for; and there's a cdnjs mirror as well, for the same reason.


My idea: a browser extension that downloads a mirror of the most popular fonts and intercepts requests to Google, inserting device-local CSS instead. Might not be desirable for mobile devices, but with ample storage you get faster page loading and no telemetry sent to anyone.


All browser font systems already prefer locally installed fonts to CSS @font-face lookups.

The tool SkyFonts (from the Monotype foundry and recommended by stores like MyFonts.com) is a "cloud font updater" that, as one of several features, includes a "download the top X Google Fonts" option. It's an easy way to get a faster page load experience on the web.

The problem to watch out for, and the reason it's not generally recommended, is that font loading times are already a privacy issue in the wild (there are fingerprinting tools out there that try to download fonts and draw them in an off-screen CANVAS, using "too fast" as a deanonymization vector).

The best bet to generally help the web at large would be for a browser or OS vendor to start installing the Top X fonts from Google Fonts out of the box.


That sounds exactly like what Decentraleyes does: https://decentraleyes.org/


AFAIK it only caches js files, not fonts.


But this is too slow for me: 400ms vs 23ms. I'm in Asia and it redirects me to New Zealand. Other wiki sites redirect me to a US server.

Where can I find the wiki CDN network map?


This has a map of the Wikipedia colocation sites: https://wikitech.wikimedia.org/wiki/Clusters

I'm wondering a bit about your New Zealand redirect, as there is no colocation site in New Zealand on that map at least.


My bad, that's the Netherlands.


This isn't going through the Wikimedia CDN network. It's on Wikimedia Cloud Services, which is a free computing infrastructure (OpenStack/K8s) for community members to build community-maintained tooling for Wikimedia.


Still think this is a nice idea in theory. In practice I often vendor stuff like this, meaning I just don't use CDNs and deploy "local" copies. I could write a few pages about why this is a bad idea, but there are also some advantages.


Why are people adding external fonts everywhere? Aren't there enough fonts built in browsers to make most people happy?


There are exactly 0 fonts built in to browsers. You could try using fonts installed on the system, but that tends to devolve into a cross-platform compatibility mess.


There is a set of 7 fonts widely distributed enough to be considered "web safe" if not built in to browsers (and at one point most browsers installed them if they were not already supported by the OS, but today almost all OSes support them directly so most browsers no longer ship them directly), but it's not a particularly great set: Verdana, Trebuchet, Arial, Comic Sans, Georgia, Times New Roman, and Courier New.

Even if those were somehow the 7 "best" fonts in all of the world, there's still a need to support external fonts because fonts are a tool for creative expression. Creative expression might not always be what you want from the web, but as a 90s Web fan, a web without creative expression would be a terrible web.


>need to support external fonts because fonts are a tool for creative expression

Maybe, but using google fonts is as uncreative as you can get.


There are currently 993 font families on Google Fonts from a very wide variety of open license amenable font foundries, in a wide variety of styles (at least for Latin scripts). Are you implying it is uncreative to sift through such a large gallery of possible fonts and find ones that speak to you or your project? Or are you implying it is uncreative to use open licensed fonts and people should pay for their fonts, perhaps confusing creativity with capitalism? Or are you simply making some sort of stand that to truly create something like an apple pie, first you must create the universe and thus real creatives make all their fonts from scratch one kern at a time?

Google Fonts has problems like privacy concerns definitely, but it seems to facilitate a lot of creativity in web page design that otherwise would seem impossible. (Self-hosting fonts is not fun, and that's assuming you are capable of handling the complex slalom of font licenses, web-capable font licenses, font to webfont conversion tools, etc. There are more options beyond Google Fonts, including for commercial fonts, but Google Fonts is still the most accessible for the wild, free/open source creative parts of the web on low or no budget, the parts most like the 90s web.)


Notably that set of fonts excludes most linux installs, given their licenses.


Fair point, many distros don't bundle the fonts by default because they disagree with the free-as-in-beer but not free-as-in-speech nature of the fonts.

Though many distros still make them available for those that want them. For instance, in Ubuntu they are included in the "Restricted Extras" meta-package, or specifically in the ttf-mscorefonts-installer package.


YC uses font-family:Verdana, Geneva, sans-serif; and it works fine.


Why is that? Why can't browser makers agree on a set of fonts that are good enough for most, and ship them?


You mean Roboto everywhere? Maybe I have a preference for Lora in the headlines because it goes well with another font in the body text? Fonts are commonly underrated, but take a look at some corporate identities ... you recognize Mercedes every time you see a truck with Mercedes ads on it because of what? You guessed it right ... fonts. This wouldn't be possible if everyone used Arial, Verdana and Times New Roman or even Roboto.


If ads are the only use case for fonts, all the more reasons to kill them with fire.


I hate ads as much as everyone else, but branding goes beyond ads and is valuable.


Rasterize your content or serve SVG?


No? There are no fonts built into browsers.


The same objections that apply to Cloudflare in various other “anonymization via CDN” posts apply here as well: it’s only anonymizing the data Google sees, but the operator of the proxy can still harvest and profit from non-anonymous data. Be sure you trust the operator of this proxy^ to act in your best interest when evaluating whether to use this.

^ https://news.ycombinator.com/item?id=23780853


I just set browser.display.use_document_fonts=0 and don't worry about the privacy implications of font loading anymore.


I do it primarily because google fonts are so hideous, it's impossible to read.



