I was wondering how much it costs to open a company in the UK to do something like that, and it seems to be really cheap (and quick):
a) Incorporate directly via Companies House
The standard registration fee to set up a company is just £12 for the ‘standard’ Companies House web incorporation service, which takes up to 24 hours to turnaround. You can pay via credit card, debit card or PayPal.
I'm disappointed that the discussion seems more about debating whether that person acted in good faith or that the law regarding acceptable characters in company names should be changed, as opposed to the bigger concern of why were they not sanitizing company names? Even without intent to insert HTML, characters such as < or > would still break their pages.
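For what it's worth, escaping at output time is close to a one-liner in most languages. A minimal sketch in Python, assuming the name is interpolated into an HTML page; `render_company_name` is a hypothetical helper for illustration, not anything Companies House or the affected sites actually use:

```python
from html import escape


def render_company_name(name: str) -> str:
    """Return an HTML heading with the company name safely escaped.

    html.escape replaces &, <, > and (with quote=True) both quote
    characters, so the name can no longer close a tag or attribute.
    """
    return f"<h1>{escape(name, quote=True)}</h1>"


if __name__ == "__main__":
    print(render_company_name('"><SCRIPT SRC=MJT.XSS.HT></SCRIPT> LTD'))
```

The stored name stays untouched; only the rendered copy is escaped, so a perfectly legal name containing < or > still displays correctly.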
Seems to have a bit. Cut and paste from the guy who set up "><SCRIPT SRC=MJT.XSS.HT></SCRIPT> LTD
...
>I am in the process of contacting every website that has triggered my script which has a readily available contact for submitting security issues, or a hackerone account or similar. Alas, the sort of websites that have XSS problems rarely list IT security contacts.
The authority section (which contains the host domain) must begin with "//" whether there's a scheme prefix or not. Otherwise it's just part of the path (or query or fragment). IIRC, these semantics are also fixed by HTML such that any attribute like HREF or SRC is parsed as-if using the canonical regex (but after entity substitution and whitespace trimming). Browsers might have implemented this differently many years ago, but I doubt it as it would conflict with being able to use a bare path atom (e.g. foo.html).
[1] I normally eschew using regular expressions for proper parsing, but for URLs the canonical expression is both adequate and advisable for correctness.
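To make the point concrete, here is a small sketch that assumes the "canonical expression" is the one given in RFC 3986, Appendix B. The helper name `split_uri` is made up for illustration:

```python
import re

# Regular expression from RFC 3986, Appendix B. Every group is optional,
# so it matches any input string.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(reference: str) -> dict:
    """Split a URI reference into its five generic components."""
    m = URI_RE.match(reference)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    for ref in ("MJT.XSS.HT", "//MJT.XSS.HT", "https://mjt.xss.ht/a.js?x=1#f"):
        print(ref, split_uri(ref))
```

Without the leading "//", the whole string lands in `path` and `authority` is `None`, i.e. it is treated as a relative reference like foo.html.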
On a related topic, the name of the company I work for starts with a colon. The rest of the name is a common adjective. As you can imagine, it is virtually ungooglable at the moment. Any thoughts on how to get around this?
Just change the name. It'll be easier and more cost effective in the long run.
I've started working with some software called STACK recently, and it's almost impossible to find anything by searching (go ahead and try!). If it was a commercial product they would be sunk.
Those people made it so that one-letter identifier names and junk like ‘fmt’ and ‘Fprintln(w)’ are again okay. So the unusable name fits the spirit quite well.
Searching for BASIC programming may, ironically, return better results than searching for something about a more mainstream current language, e.g. Python. I often find the first few results are search engine spam (tutorialspoint, geeksforgeeks, etc.) when a link to the API would be the logical first result. (Usually the first link to the API is for 3.4 or some other random version, too.)
I've often wondered if any metallurgists have tried to run computer simulations of the annealing process. How would you find their research if they had?
Actually yes :) At least the optimization crowd don't use the phrase 'heat treatment', which helps somewhat. But who I really feel bad for is the recruiters trying to hire a chemist who specialises in the element lead.
I don't know whether they actually do it but it seems really easy to treat "BASIC" as a distinct idiomatic token from "basic" when the searcher bothers to get the casing right.
It really is amazing how big a difference this makes.
I've started using Apple's Aperture software recently (I'm well aware it's been discontinued). I really like it, but my biggest frustration is that it's difficult to learn how to do new things, because "aperture" is a generic word in photography. I can't search for the name and get results about the software.
One of my favorite mobile games is Antiyoy, by Yiotro (https://github.com/yiotro/Antiyoy) who also created other games like Vodobanka, Achikaps, and Bleentoro.
The creator mentioned that he picked the names because they were pronounceable, unique, memorable, and searchable. That misses out on meaningfulness and familiarity, but those are expensive - by dropping those requirements, you gain easy SEO, trademarks, domains, etc. A big company knowing they're going to sell millions of copies can spend 5 figures on a domain and 6 figures on SEO, but I don't think it's worth it for most startups.
Huh, I play these and I didn't know that's why they had these names. I assumed they were compound words in some language I didn't know. This is like a reverse "XKCD" naming convention.
Limiting the dates to indexes before 2016 might help (at least with Google). You can usually train Google to get you what you want after a few searches. This was initially a problem with the Elixir programming language, but once it learned what I actually wanted, it started letting me just type in the term elixir without specifying it was a programming language. On other computers not associated with that account, it does revert to the not-so-useful results.
> but once it learned what I actually wanted, it started letting me just type in the term elixir without specifying it was a programming language
Oh, you know what, this might be largely my own fault. I purposefully use Startpage.com as my search engine in order to avoid getting customized results (while still using Google's index).
I worry that customized results put me in a filter bubble—but they certainly have their advantages!
No lies detected, but because they aren't professional software I don't have to search for stuff as often. And the other "Professional" Apple app I use is Final Cut Pro, which doesn't similarly have this problem.
Back when I worked at a comparison shopping engine, I had a bit of a laugh when I saw that the indexing pipeline was generating error messages because the "clean" function returned empty for some products in the feed from Amazon, because they had names like "++++++".
It was usually musical albums that liked to have names that made it impossible for fans to find the music.
However, they benefitted greatly in the early ‘00s. If you had them in your Apple Music library, iTunes always put them at the top of your alphabetical music library, keeping them top of mind, since ! comes before A. There might have been a similar iTunes Store benefit too.
In 1997 Torsten Pröfrock released a highly sought-after dub techno album on the legendary Chain Reaction label under the name "Various Artists". It's a quintessential record in the Basic Channel genre. You can listen to it here: https://youtu.be/3165Sf-q8dY
SiriusXM truncates a "The" prefix from artist names (so "The Cure" and "The Who" become "Cure" and "Who"). I always wondered how it would display The The. Would it be "The The" (special case), "The" (default removal of "The"), or an empty string "" (in the unlikely case the algorithm recursively removed "The" prefixes)? Eventually they played a The The song and the answer is "The The".
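The three outcomes can be sketched as three tiny functions. This is purely illustrative: nothing here is SiriusXM's actual code, and all three function names are invented:

```python
PREFIX = "The "


def strip_once(name: str) -> str:
    """Default removal: drop a single leading 'The '."""
    return name[len(PREFIX):] if name.startswith(PREFIX) else name


def strip_recursive(name: str) -> str:
    """Keep removing 'The' until none is left, even a bare 'The'."""
    while name == "The" or name.startswith(PREFIX):
        name = name[len(PREFIX):]
    return name


def strip_guarded(name: str) -> str:
    """Special case: keep the original if stripping leaves 'The' or nothing."""
    stripped = strip_once(name)
    return name if stripped in ("The", "") else stripped


if __name__ == "__main__":
    for artist in ("The Cure", "The Who", "The The"):
        print(artist, "->", repr(strip_once(artist)),
              repr(strip_recursive(artist)), repr(strip_guarded(artist)))
```

Only the guarded variant reproduces what was observed on air ("The The"), though a hand-maintained exception list would behave the same way.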
There’s a video on YouTube with three full-width exclamation points as the title. I watched it once, and although it wasn’t particularly interesting, it bugs me that I cannot find it again.
Convince the powers-that-be in your company to invest in contracting with a marketing/SEO person or team to help come up with a new name. You want someone with marketing chops so that it's a good name, but you also want someone who knows about SEO so you don't end up on the second page for your own name search.
I was curious what googling only a negative query would do and for this, -"Yahoo" returns just the dictionary definition of the word "yahoo" and no search results.
<blink>Blink</blink> -- It's a deprecated tag now, and most browsers don't support it, but I would have loved to have seen that as a company... <marquee> is still supported though....
FWIW for most of these sorts of things you can scrub it via passing the text through an NFKC or NFKD transform. I'd hope that a screen reader can be updated to handle this case.
These are Unicode characters intended for use in mathematical formulas, not text, so they break all sorts of things. It might make some sense to use them in mathematical Python code (where they do seem to work), but they're hard to type.
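A minimal sketch of the NFKC scrub, using Python's standard `unicodedata` module. The function name `fold_compat` is just for illustration, and the inputs are written as escapes so the example survives copy and paste:

```python
import unicodedata


def fold_compat(text: str) -> str:
    """Apply NFKC, which maps compatibility characters (mathematical
    alphanumerics, full-width forms, etc.) to their plain equivalents."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # "Hello" spelled with MATHEMATICAL BOLD letters (U+1D407, U+1D41E, ...)
    bold_hello = "\U0001D407\U0001D41E\U0001D425\U0001D425\U0001D428"
    # Three FULLWIDTH EXCLAMATION MARK characters (U+FF01)
    wide_bangs = "\uFF01\uFF01\uFF01"
    print(fold_compat(bold_hello))
    print(fold_compat(wide_bangs))
```

NFKC is lossy by design (it also folds things like ligatures and superscripts), so it is better suited to building a search or screen-reader copy of the text than to overwriting what the user typed.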
I giggled at the previous company name being redacted as “[NAME AVAILABLE ON REQUEST FROM COMPANIES HOUSE]” too. Not sure if that’s a common thing to do or if they made an exception for these shenanigans so they didn’t have to display the XSS.
That would be a trade mark rather than a company name in this instance, I think. In the UK, registered trade marks are standard typewritten letters for word marks. If they contain symbols then they're figurative marks, and it's an image of the mark that is registered.
On that point though, searching the UK trade mark registry, it looks like it just strips non-alphanumeric symbols. A search for "Moz://a" returns "moza".
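A guess at what that normalisation amounts to, as a sketch. The registry's real implementation is unknown to me; `normalize_mark` is a hypothetical name, and the lowercase-then-strip order is an assumption based only on the "moza" result:

```python
import re


def normalize_mark(query: str) -> str:
    """Lowercase the query and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", query.lower())


if __name__ == "__main__":
    print(normalize_mark("Moz://a"))
    print(repr(normalize_mark("++++++")))
```

Note that a name made only of symbols collapses to an empty string under this scheme, which is the same failure mode as the "clean" function and the "++++++" product names mentioned earlier in the thread.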
Is Companies House's website not done by GDS or something? I worked on a few GDS projects for DFT, we had to have independent pen testers test our services before they moved between phases.
No, I mean the fact that this was possible on their website. XSS is one of the simplest things to test; in fact it was one of the standard tests UI testers would do on new screens.
Solidarity to all of the folks who have had to work with elected officials. I got ripped a new one in the mid-2000s because I recommended we disable a PHP project, a hay bale reporting app (it reported counts of hay bales on farms), due to an RCE bug. Within a few hours of the app being disabled there was drama from a politician who got a phone call from a prominent farmer...
Relatedly, several years ago I scraped all companies on the old companies house webcheck site. There were two that interrupted my scraper: both contained '<' in the company name, and both seemed to take the webcheck service offline for a few seconds whenever I requested their pages. I can't say for sure - it might have been a temporary IP block I suppose - but it amused me nonetheless.
What was the name? Its former name is listed as “[NAME AVAILABLE ON REQUEST FROM COMPANIES HOUSE]” and its current name doesn’t contain any HTML tags (it’s literally the same as the headline).
Those are both funny and confusing names, but they don’t warrant comparison to sql-injections, so I am guessing there’s another name with actual HTML tags.
Relevant discussion on the Companies House Developer Forum:
https://forum.aws.chdev.org/t/cross-site-scripting-xss-softw...