I've seen people suggest using pi as a "database." In this instance you could map ranges of pi to characters. Then you search pi until you find the right range. The URL would contain the range(s) needed.
This is terrible and so computationally wasteful. But it's interesting to think about.
Seems technically possible - example.com/12_3456 would calculate out pi, read 12 digits starting at position 3456, then convert those digits to ASCII, and you'd have your result. Given how far into the digits of pi you'd have to go to find the exact sequence that matches, e.g., https://google.com, though, I suspect the resulting position number would be longer than the original URL!
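Just to make the decode step concrete, here's a toy sketch in JS. The hardcoded digit string and the two-digits-per-character mapping are arbitrary choices of mine, not anything standard, and a real version would need far more digits than this:

  // First 100 decimal digits of pi (after the "3."), hardcoded for the toy.
  const PI_DIGITS =
    "14159265358979323846264338327950288419716939937510" +
    "58209749445923078164062862089986280348253421170679";

  // Decode a path like "12_34": read `len` digits starting at `pos`, then turn
  // each pair of digits into a character (the +32 offset is an arbitrary choice).
  function decode(path) {
    const [len, pos] = path.split("_").map(Number);
    const digits = PI_DIGITS.slice(pos, pos + len);
    let out = "";
    for (let i = 0; i + 1 < digits.length; i += 2) {
      out += String.fromCharCode(32 + Number(digits.slice(i, i + 2)));
    }
    return out;
  }

  console.log(decode("6_0")); // reads "141592" -> "./|"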
Also, conceptually, it seems like you're not so much using pi as a database as you are using it as an encoding mechanism: turning "abc.com" into a string of numbers whose length is relative to the original string. At that point you might as well pick an easier encoding, like base64. :)
You could add a short, one- or two-character identifier that selects among several digits-of-pi-to-ASCII maps, then pick the mapping that results in the lowest index. In that case, part of the "database" would exist within the URL shortener itself.
"X in pi" is an interesting thought experiment, but practically speaking, the least efficient way to do anything, since finding a string of N digits in pi will require a search space exponentially much larger than N on average, so it ends up just being an extremely inefficient encoding. A huffman coding + base32/base64 is a better "databaseless" short URL system.
In a similar vein, almost every algorithm can be converted into an O(1)-space version simply by turning it into the dumbest possible search over the solution space: just sample from the solution space randomly (with a seeded PRNG or something) until you get the right answer.
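A throwaway sketch of that idea in JS, using sorting as the "search the solution space at random" example (Math.random stands in for the seeded PRNG, which is exactly where the caveat in the next reply bites):

  function isSorted(a) {
    for (let i = 1; i < a.length; i++) if (a[i - 1] > a[i]) return false;
    return true;
  }

  // Sample a random permutation in place (Fisher-Yates).
  function shuffle(a) {
    for (let i = a.length - 1; i > 0; i--) {
      const j = Math.floor(Math.random() * (i + 1));
      [a[i], a[j]] = [a[j], a[i]];
    }
  }

  // "Dumbest possible search": keep sampling until we stumble on the answer.
  function bogosort(a) {
    while (!isSorted(a)) shuffle(a);
    return a;
  }

  console.log(bogosort([3, 1, 2])); // [1, 2, 3]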
Interestingly, bogosort cannot actually be implemented in constant space -- at least, not if you're talking about a deterministic algorithm. In order for your PRNG to have a large enough state space to cover all possible permutations, it needs O(n log n) bits.
In the Turing machine model of computation, algorithms that use O(1) space are theoretically equivalent to finite state machines (potentially with a very large number of states). Logarithmic space is where things start to get interesting, because it's enough to store a constant number of pointers into an arbitrarily large input.
Haha this is fantastic. Are the IDs still coming from that single counter lambda? Or put another way, do you now have 1,750,785 lambdas in your AWS account? :)
Yeeeeeeep. :) Someone automated spinning up new IDs at some point and flooded this service with urls. I wrote a script to delete old ones, but it takes a few days to run and I haven't gotten around to letting it finish.
Essentially this is the promise of the blockchain. Imagine one massive global database where anyone can store anything in encrypted form. Free, forever.
Paying puts your transaction near the head of the queue. This matters more when there's a backlog. If you're willing to wait for things to clear up, you don't need to pay. https://bitcoinfees.net/
I was always fixated with URL shorteners and always wanted to write one myself. While it would be easy to write one that works with a server, I loved the idea of hosting and running one for free.
Thus, I wrote this URL shortener which does not need a backend to work at all. It uses GitHub issues as a "database", and unlike most other URL shorteners that do not need backends, this one doesn't need a # prefixed to the alias. (i.e. short URLs look clean like this: nlsn.cf/1 instead of looking like this: nlsn.cf/#1)
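For anyone wondering how the trick works, this is roughly the shape of it (a simplified sketch, not the exact code in the repo; OWNER/REPO and the assumption that the target URL sits on the first line of the issue body are placeholders):

  // Runs on the GitHub Pages 404 page: any unknown path like nlsn.cf/1 lands
  // here, and the path is treated as an issue number.
  const issueNumber = window.location.pathname.replace(/[^0-9]/g, "");

  fetch(`https://api.github.com/repos/OWNER/REPO/issues/${issueNumber}`)
    .then((res) => res.json())
    .then((issue) => {
      const target = (issue.body || "").trim().split("\n")[0];
      if (/^https?:\/\//.test(target)) {
        window.location.replace(target); // client-side redirect
      } else {
        window.location.replace("/"); // bad or missing link
      }
    });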
I haven't had the time to work on it more but do let me know what you guys think! Thank you and hope yall enjoy it!
"I was always fixated with URL shorteners and always wanted to write one myself. While it would be easy to write one that works with a server, I loved the idea of hosting and running one for free."
I felt exactly the same way and did, in fact, build one.[1]
It was pretty simple. There's an interesting twist to the particular one I built - it's not specifically a URL shortener, it's a "Universal Shortener".[2] Also no tracking or cookies or third party connections, which I find pleasing.
The disappointing part is seeing the malicious spam and phishing URLs that get shortened. We have had to manually flag-hold-approve a lot of strings and IIRC we outright block anything with 'paypal' in it.
We also gave a kill-function to our upstream ISP so that they could immediately act on phishing complaints without our having to be involved in real-time.
My biggest problem with this is the domain name. If you say "oh by", a vanishingly small number of people will translate this to "0x". And you can't say "oh ex dot co", because the URL has both a 0 and an O in it, which will always be ambiguous when written down. So you have to say something like "zero ex dot co", which I suppose is fine, but it has a lot of extra syllables.
I've spent almost zero time on "Oh By" in the last 24 months as things at rsync.net have been very busy. So, we haven't made a sale in months due to lack of publicizing it.
However, I think there are unknown, emergent use-cases out there for "Oh By" that I hope people will discover. I am going to pivot back to "Oh By" next year ...
Oh, that is cool! It does remind me of https://telegra.ph/ by Telegram! (Even though they weren't really advertising it as a universal shortener.) Just a quick question: how did you promote your platform?
Looks nice and simple! Bit strange to claim "there is no database" and then add "GitHub Issues is the database"; maybe say "no traditional RDBMS/document database needed" or something similar.
You might want to detect infinite loops as well :) https://nlsn.cf/6
Last note, the status code seems to be incorrectly set to 404 instead of any of the redirection status code. If you're hosting it on GitHub pages, I'm not sure you can actually control it. If the status code was correctly set + the Location header, I think all browsers would be able to detect the infinite loop automatically and prevent it.
HAHA, so that was you. That was a good one! Yes, and that title is definitely slightly misleading; I kinda didn't want it to get too long and wanted that slight clickbait effect.
Yep, I actually cannot control the status code at all. Part of what made this a fun and hacky project was the fact that I was actually redirecting to the actual URLs by catching the 404s. So the only way I could think of to catch the infinite loop was to check against the domain. That being said, I do appreciate the feedback and it's always great to learn more!
I think you could catch the infinite loop client-side as well, by comparing the URL to redirect to with the current URL; if they are the same, prevent the redirect. Although you would still be able to create loops by using two issues instead. Maybe just compare the full hostname? If it's nlsn.cf, don't redirect.
If you use it personally and you know what you are doing, I think this idea is good. For example, I have my own domain, and when I share a long link with friends I can use it.
By the way, your feedback helps the author improve. Thank you.
I reckon it'll be possible to "simplify" it even more for an end-user, by making a static site generator generate these redirect HTML files from markdown or a CSV.
Some guy shared his project in this post that does exactly that! Yes, while that is possible and can get to a high level of automation via GitHub Actions, the point of my URL shortener was more that people could post a link easily with a simple interface (like the interface for adding a new issue). But if I were operating with a fixed set of short links, then this would have been a good method!
The best hack on GitHub issues I know is the trending-repos repo. Just subscribe to the issues for the technologies you care about and you'll receive a daily/weekly email with the most starred repos for that period.
Think of this as the software version of graffiti.
It is an artistic and creative way to shorten urls, but likely won’t exist forever. As long as it does though, we can appreciate the art for what it is.
Indeed, especially with the recursive link created by someone else (nlsn.cf/6)
GitHub does have a 60-calls-per-hour limit for unauthenticated API requests, so I don't think it affects them as much(?) That being said, this project is meant to be for fun and definitely not recommended for production purposes HAHA
I've seen quite a few uses of the issues page as a discussion forum too. I certainly wouldn't consider any of it "abuse", however.
I wonder if anyone has made this observation before in regards to such things: "Anything that can be used to store arbitrary data, will be used to store arbitrary data."
It uses a tab-delimited nginx map file as a config. To make changes, I wrote a tiny shell wrapper which opens the config in Vim, checks it in, and reloads nginx: https://statico.link/how
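For anyone wanting to replicate it, a minimal version of that setup might look something like this (my own sketch, not the config behind the link):

  # shortlinks.map -- tab-delimited "alias<TAB>destination;" lines, e.g.:
  #   /gh    https://github.com/someuser;
  #   /hn    https://news.ycombinator.com/;
  map $uri $shortlink_target {
      default "";
      include /etc/nginx/shortlinks.map;
  }

  server {
      listen 80;
      server_name statico.link;

      location / {
          if ($shortlink_target != "") {
              return 301 $shortlink_target;
          }
          return 404;
      }
  }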
I always wondered if relying so much on external libraries in the long run is more hassle to maintain than doing it by yourself. Security updates will be provided for X major version for a reasonable amount of time, but afterwards you will have to update, check that nothing breaks and update your code as well. Also some of the deps will break regardless of SemVer and you will need to update.
I figured that since the dependencies were only used in the build step, it's not really a concern. That being said, I didn't know ES2015 had `.then()` and was kinda trying out Babel, since I always used CRA and never knew what was going on behind the scenes. The dependencies were definitely not needed, but they don't make the production build any less minimal.
My fault, I didn't know that ES2015 actually supported `.then()`. Anyway, this was also an exercise for me to learn about Babel (I use CRA all the time so have no idea what is going on behind the scenes) and this was a nice intro!
Better be ready for the malicious links. I'm the developer of https://t.ly/ link shortener. I spend most of my time adding in protection to prevent people from using the service for malicious links.
Thank you for the advice! I love that domain though, have always been trying to find a 4 character URL shortener since GoDaddy killed x.co. Will definitely use your service more than I use my own HAHA. That being said, my project is not meant to be seriously used, just kinda a cool hack that is not meant to work forever/work reliably. Appreciate your comments and all the best for your service! You just gained one more (non) malicious user!
Thanks! Your project is really neat. Curious how long GitHub would let it go and if they have a max number of issues. In a short time, T.LY has almost 5 million short links. I have a lot of users using the shortener from the extension https://t.ly/extension
Prolly be fine unless for some reason a famous person used it to post a link that got millions of hits in a short time. Then, I would assume it'd get shut down pretty quickly if GitHub tracks API use per key. Or maybe it'd just be throttled.
Congrats! I really love this concept for two reasons.
First I'm always fascinated by smart solutions that make use of existing free (as in beer) infrastructures to provide a service with open-source software. In this way we truly get FOSS.
Second, I really like the idea that the information on where the shortened URL redirects to is publicly available. I know that solutions like bitly.com do provide a way to preview the shortened URL, but I think this is just more transparent, although admittedly a little bit less obvious.
You raise a point that strikes me as odd, but perhaps I am not up to date on what is considered FOSS nowadays.
In what way will this lead to `truly` FOSS? Sure, the URL shortener project is open source, but neither GitHub Pages nor GitHub Issues are free-as-in-libre open-source projects. What is being showcased here is a neat idea, but ultimately the dependency on GitHub Pages and Issues will make it difficult to port to other systems, so you are not really free to do with it as you please.
So, while the URL shortener program itself is FOSS with all the right licenses, doesn't the direction you are proposing lead to more capture by these non-free platforms, as the value they offer for free-as-in-beer becomes more and more difficult to ignore?
This is really cool. I think it might even be possible to improve it further so opening an issue isn't even needed.
My thinking is you could store the links and the corresponding shortened URL lookups as an element in the HTML source itself. You would have a submit input on the page. User enters the url they'd like to shorten (or is pulled from a param). The JS reads that value and then looks it up to see if it already exists in the element you created for storage. If not, it creates a hash as the shortened 'link' and adds it to a hashmap / element. Now this is the crucial part: have JS call the Github Actions API to trigger a workflow (https://docs.github.com/en/free-pro-team@latest/rest/referen...) that puts the updated hashmap (which is a payload for a custom parameter you defined for the GH Action) into a file e.g. `links.json` or even outputted directly into the HTML source. Then have the action commit this file back into the repo. Thus, the flat file in the repo acts as the storage mechanism for the links. From a UX standpoint, this is a little smoother and more cohesive (same site to input & redirect) than having to open a GH issue.
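The trigger call itself is small. A sketch (the repo, workflow file name, and token are placeholders, and note that this endpoint needs an authenticated token with repo scope, which you can't safely ship to the browser, so in practice that part has to live somewhere secret):

  async function triggerLinkUpdate(links, token) {
    const res = await fetch(
      "https://api.github.com/repos/OWNER/REPO/actions/workflows/update-links.yml/dispatches",
      {
        method: "POST",
        headers: {
          Accept: "application/vnd.github.v3+json",
          Authorization: `token ${token}`,
        },
        // workflow_dispatch inputs must be strings
        body: JSON.stringify({ ref: "main", inputs: { links: JSON.stringify(links) } }),
      }
    );
    if (!res.ok) throw new Error("Dispatch failed: " + res.status);
  }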
With this approach, you can still have the nlsn.cf short domain redirect as long as you pass the URL params and get them in JS from the location, which it looks like you already do. I might try whipping up a version of this tonight. Thanks for the inspiration!
Hmm having a GitHub action redeploy the site is not instantaneous and it does take up to 1 minute for the site (and consequently the new short link) to be updated. Nevertheless, that is a sweet idea, and there are other replies that suggested using a static site generator to generate one html file per short link, with each html file containing a "redirect" tag in <head>. This idea can definitely be combined with a GitHub action. Personally I feel that is overkill, because if I were doing something so elaborate, I would have opted for a simpler and more scalable option like Google Cloud Firestore or Firebase URLs
I just found a "bug" that can be abused (in chrome at very least).
URL('http:53') returns http://0.0.0.53 (it casts the number to an IPv4 address, don't ask why). But window.location.replace behaves differently and treats it as a relative URL.
But this has a chance of not working since you are doing the redirection client-side. Some addons block 3rd party resources. Looks like you're using the github API and sending the request from the user's browser.
If you're using GitHub Pages, you should just structure Jekyll such that you can edit `_data/links.yml` instead of having thousands of users wasting compute power on the GitHub API when it's not necessary.
I would suppose doing a push every time a new URL is added would be quite expensive for GitHub too. Plus, GitHub Pages is not updated immediately upon push, and it takes some time for new links to work. That being said, this is definitely not the most realistic implementation of a URL shortener. It is meant to be cool and hacky and definitely not used for production!
I was expecting it to be a way of taking a URL and converting it to a base64 type of format that is stored in the URL itself and decoded by a simple web page. I have wanted to try this for a while, but never sat down to look for an appropriate encoding that somehow also compresses.
I helped someone design a regular URL shortener in PHP before, they're not complicated to implement with a simple MySQL (or even SQLite) database and any language you're familiar with. I have not taken a stab at writing a client-side shortener yet on the other hand.
Base64 would probably be longer than the original URL. I can't really see how short pieces of text (like URLs) could be compressed to become smaller. That being said it's definitely a cool concept! I once worked on a code playground kinda thing for my school and there was a function to share the code written in a URL. How we did it was just to include the whole piece of code as a URL param in base64!
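Roughly like this, if anyone wants the same trick (a sketch; playground.example is a stand-in domain, and note btoa/atob only handle Latin-1, so arbitrary text needs an extra encoding step):

  const code = 'console.log("hello")';
  const shareLink = "https://playground.example/#" + btoa(code);

  // On the receiving page, decode whatever comes after the '#'.
  const restored = atob(new URL(shareLink).hash.slice(1));
  console.log(restored === code); // true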
Well, that's why I said "type of", but yeah, I can't see it either, unfortunately. Of course, as long as the URL you're trying to shorten isn't too insanely long, it might be workable to do some sort of encoding on it.
I realize I misspoke and updated my message about why we used a database. I helped someone do a regular URL shortener, but have always been curious about doing a databaseless shortener with no I/O on any server whatsoever.
Bad for so many reasons (including abuse of GitHub issues, spam, and the ability to change the URL the short link points to)
If you want your own private shortener, I have a better idea:
Use GitHub Pages and create a `/1/index.html` (where 1 is the short link name) and add a “refresh” meta tag in it. Completely static and no build/JS necessary.
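For example, /1/index.html could be as small as this (a sketch; the destination is a placeholder):

  <!doctype html>
  <meta charset="utf-8">
  <title>Redirecting…</title>
  <!-- "0" means redirect immediately; the url= part is the long destination. -->
  <meta http-equiv="refresh" content="0; url=https://example.com/some/very/long/path">
  <a href="https://example.com/some/very/long/path">Click here if you are not redirected.</a>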
This could be opened up to others by combining it with issues and GitHub Actions, but who wants that?
Another better idea for a private URL shortener is hardcoding a list of URLs in a CloudFlare Worker. Much faster and with real 301 redirects.
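Something along these lines, using the classic service-worker-style Workers syntax (a sketch; the hardcoded table is whatever links you want):

  // Hardcoded short-link table, baked into the Worker at deploy time.
  const LINKS = {
    "/gh": "https://github.com/someuser",
    "/hn": "https://news.ycombinator.com/",
  };

  addEventListener("fetch", (event) => {
    const { pathname } = new URL(event.request.url);
    const target = LINKS[pathname];
    event.respondWith(
      target
        ? Response.redirect(target, 301) // a real HTTP redirect, no client-side JS
        : new Response("Not found", { status: 404 })
    );
  });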
Such a fun little hack. The killer feature is that you found a way for anyone to create a new short URL without a pull request or any changes to the original repository. Nice work.
There's a lot of useful stuff to learn from this tiny example, providing you look at it more as a learning exercise than a production enterprise solution!
The github api does not care about your origin so that wasn't something I had to deal with! If I had to deal with something like that, my best option would be to use a proxy server, which would arguably negate the whole point of this "serverless" hack.
Your browser should, though. I would expect the fetch() request from nlsn.cf to github.com to throw a security error — presumably, the way custom domains are handled on github pages includes the addition of the custom domain as a trusted source at the github end.
It should, in the right circumstances. By default, CORS dictates that cross-origin requests should not be allowed.
But sometimes you want that to be possible, so we have headers available to us with which we can signal which origins are allowed to access a resource from another origin.
In the case of the GitHub API, they (GitHub) are setting these headers to allow any origin to access the GitHub API from any other origin, that's why your browser doesn't throw a security error. Check out the various "access-control-*" headers the GitHub API returns as response headers when you use it.
Ah, thanks. I thought they might've had a dynamic list of domains that they add each custom domain to; I guess that would be enormous, even if it sounds a lot more secure than "Access-Control-Allow-Origin: *"!
What I meant was that GitHub's API does not check for the Host header, and the API allows connections from any source. Thus, CORS isn't an issue at all.
No. For a cross-origin request that isn't a "simple" GET or POST, the browser first makes a "pre-flight" request (without the payload) using the OPTIONS method to see what CORS-related response headers come back; if they allow it to proceed, the browser then makes whatever request the script asked for. For simple requests there is no pre-flight, but the browser still checks the Access-Control-* response headers before exposing the response to the script.
The network tab of developer tools should reveal all of this.
I know perfectly fine what you mean here, but in the name of security, it's important to be precise.
All websites and browsers have CORS, one way or another, as CORS is the general concept. By default, only "same-origin" requests are allowed and "cross-origin" requests are disabled. But CORS is still there nonetheless.
What GitHub has done in this case, is add support for "cross-origin" requests.
Nitpicky maybe, but thought it'd be useful to add to avoid any confusion.
That is quite interesting. The only issue is relying on GitHub Issues for it (it might get messy with too many issues opened).
I had this same ~dream of hosting a URL shortener statically, which led me to create URLZap. The difference is that it relies on a config file and it can be used on any code repository service: https://github.com/brunoluiz/urlzap
Cool project! Thanks for sharing! Yea, the reason why I chose GitHub Issues as a "database" was because it is simple to add a short link that way. It definitely may get messy, and there are other shortcomings, like not being able to specify or reuse aliases. The other way to run a URL shortener service with low effort would be to use Firebase. Your idea is a cool one, love it, especially since it uses GitHub Actions too!
Neat project. What were your ideas for storing the content in git version history? (i.e., as files themselves?)
Shameless plug for a similar project that I did a few years back: a (partial) static site generator based on GitHub issues :)
https://github.com/geekodour/gitpushblog
This could be implemented on top of any collaborative tool such as Wikipedia, or anywhere a user can enter some text, like a comment section. For example, instead of creating a GitHub issue every time, one could just reply to an existing issue with the desired URL and have the tool redirect to it based on a unique ID.
Well, I clicked this link and got disappointed immediately. Apparently the idea is neither original nor useful.
Then I got a little annoyed at myself. Information theory states clearly that universal compression is impossible; how come I didn't recall that?
I set up YOURLS for a client several years back; it worked quite well. I was always a little concerned about privacy and the potential for abuse (being self-hosted), but we never had those issues due to the limited use & exposure.
I would find it better if he'd add something like an auto-accept pull request bot that allows you to append a single URL (with validation checks + rate limiting) to a file that is read by the URL shortener, plus auto-deploy on each new master commit. That would be truly databaseless.
Well, I would believe that even a `json` file or any other form of storage written to disk is considered a database, but your suggestion definitely brings it further away from an actual database!
I would hope that the most they’d do is limit/throttle the number of new and updated issues. As long as the limit is very generous for a normal project, this sort of thing would still be supported at reasonable usage levels.
Hopefully it wouldn’t be considered abuse just by its nature (as opposed to any traffic spikes it might cause). Quite the reverse, this shows off the flexibility of the service!
This is definitely just a fun hack and not meant for production! For production ready short links with low overheads I would personally recommend using Firebase URLs or cloud firestore.
This is brilliant! Especially to distribute branded shortened URLs of the kind {mydomain}/{id} without having to maintain pretty much anything and giving the end user a chance to verify what the URL actually leads to. Thank you!
I made a URL shortener for the organization I work at using Amazon S3. I created a bucket that is hosted as a website, to which I upload empty files with the "Redirect" metadata field to indicate the destination.
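The per-link step is just a zero-byte object with the redirect metadata set; with the AWS CLI it looks roughly like this (bucket, key, and destination are placeholders):

  aws s3api put-object \
    --bucket my-shortener-bucket \
    --key abc123 \
    --website-redirect-location "https://example.com/some/very/long/path"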
Indeed and especially with many shorteners not being maintained and going down. That being said, this one is not meant to be an actual URL shortener, but rather, a cool hack and it is not meant to be used widely or used in production.
Fair enough, I'm all for neat hacks. Link shorteners in general make me concerned since I've encountered more than a few dead ones. It's frustrating to know that the content you're trying to access may very well still be alive, but the only links you have to it are dead because somebody got bored of running the shortener used.
This currently uses about 100KB of JavaScript (over 99KiB of polyfills, about 3KiB of actual code), loading the HTML file, then three JavaScript files, then making an API call to GitHub, then finally doing the actual redirect, by a substantially inferior JavaScript technique. Also, it doesn’t work on IE because the polyfills were insufficient.
Yes, this is an interesting technical demonstration of a concept, but it also has some rather serious problems, so that I would strongly recommend that no one actually do things this way.
----
Firstly, the size of the code: I just golfed it for fun, I think this should do for the full document (though I haven’t tested it):
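(A reconstruction sketch in that spirit rather than the exact golfed bytes; OWNER/REPO and the assumption that the issue body's first line holds the URL are placeholders, so this version won't land at exactly the size quoted below:)

  <!doctype html>
  <meta charset=utf-8>
  <title>Redirecting…</title>
  <script>
  addEventListener("load", function () {
    var x = new XMLHttpRequest();
    x.open("GET", "https://api.github.com/repos/OWNER/REPO/issues/" +
      location.pathname.replace(/[^0-9]/g, ""));
    x.onload = function () {
      // First line of the issue body is assumed to hold the destination URL;
      // a plain regex test stands in for `new URL(...)`, which IE never had.
      var url = (JSON.parse(x.responseText).body || "").split("\n")[0];
      location.replace(/^https?:\/\//.test(url) ? url : "/");
    };
    x.send();
  });
  </script>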
100KB in four requests, down to 412 bytes in one request (and that 20 bytes of <meta charset=utf-8> isn’t even particularly valuable). And it restores support for IE, by using simple regular expression testing instead of `new URL(…)`, which IE never supported. As written, I believe it should work down to IE9, and older could be supported if you replaced the load event handler with an onreadystatechange incantation.
(Other general remarks on the HTML:
1. In <meta charset="utf-8" />, the trailing slash is completely ignored by the HTML parser; in fact, I reckon using trailing slashes like this is actively slightly harmful, because it misleads you into thinking it closes elements, as in XML, but it doesn’t.
2. On the script elements, type="text/javascript" is the default and thus not needed.
3. <html>, <head> and <body> wrappings can all be omitted from the source. This one is a bit more subject to taste; I know some like their source to match the parsed document tree, even down to things like writing the <tbody> out even if there is no thead or tfoot. But I myself always omit the end tags on these, and omit the start tags unless they have attributes, which in practice means I always have the <html> for its lang attribute.
)
----
Secondly, the redirection technique. There are good reasons why you should use HTTP redirects and not JavaScript location.replace for something like this:
1. If you use JavaScript, any users unable to run the JavaScript are left high and dry. This includes people like me that disable JavaScript by default (I because it makes the web so much faster and less annoying; others for privacy reasons and similar), but it would also help people on poor-quality connections, especially if the JavaScript is loaded in a different connection (which will probably not be the case here), because it’s surprisingly common for parts of pages’ resources to simply fail to load sometimes on poor-quality connections.
2. Various tooling like link checkers likewise won’t be able to work with this. As it stands, even link checkers that ran JavaScript would fail on this because you’re also using location.replace to show the “bad link” error. (Another of my pet peeves: doing a redirect to a “not found” page, rather than serving the “not found” page at the original URL with status code 404.)
3. It’s perfectly feasible for the connection to https://nlsn.cf to have succeeded, but the connection to https://api.github.com to fail. When this happens, you’re left trying to decide what to do; and location.replace("/") is a very bad solution, because you’ve now trashed the URL and I can’t even just try reloading the page to see if it works second time round. If you use HTTP redirects, everything will work perfectly in all cases, regardless of which connections succeed or fail.
4. Browsers have redirect loop detection, but you’ve opted out of that. In this thread someone pointed out simple loops and you’ve now tried to work around this by telling it “don’t redirect to the same domain”, but this really isn’t enough: you can easily have A → B → A. If this was done with HTTP redirection, the browser would twig and show an error and stop; but because half of the loop is done with location.replace, this won’t be caught. (Also, there are perfectly legitimate cases for a short URL redirecting to another short URL which redirects to a long URL, which this has now broken.)
----
A more amusing concept that occurs to me is implementing the URL shortener in a service worker, so that you can issue proper HTTP redirects, while doing it only on the client side.
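A sketch of what that might look like (the lookup here is a placeholder; resolve the mapping however you like):

  // sw.js -- answer short-path navigations with a real HTTP redirect.
  self.addEventListener("fetch", (event) => {
    const url = new URL(event.request.url);
    if (event.request.mode !== "navigate" || !/^\/\d+$/.test(url.pathname)) return;

    event.respondWith(
      lookUpTarget(url.pathname).then((target) =>
        target ? Response.redirect(target, 301) : fetch(event.request)
      )
    );
  });

  // Placeholder: resolve the mapping however you like (GitHub API, Cache API, ...).
  function lookUpTarget(pathname) {
    return fetch("https://api.github.com/repos/OWNER/REPO/issues/" + pathname.slice(1))
      .then((res) => (res.ok ? res.json() : null))
      .then((issue) => {
        const line = issue && (issue.body || "").trim().split("\n")[0];
        return line && /^https?:\/\//.test(line) ? line : null;
      });
  }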
Thank you for the reply! I definitely learnt a lot from your comprehensive replies. Personally, I am just getting started in web development, and I have traditionally written everything with the help of frameworks like ReactJS; the abstractions these frameworks provide mean I actually don't know the nitty-gritty of how things work. Cross-browser compatibility and efficiency were definitely not things I considered, and they are very valid considerations. And HTTP redirects via service workers sound damn interesting! I will go read up on those now. Thank you for the comment, I did learn a lot and will work these suggestions into the code base when I have the time :')
On the topic of service workers, doing it that way is an amusing concept, but service workers are strictly only for optional enhancements—you can’t guarantee the service worker will be run. The best-case scenario is that the second and subsequent links people open could be handled by the service worker.
Since I am running a URL shortener service https://blanq.io, my two cents:
1. I looked at https://github.com/nelsontky/gh-pages-url-shortener/blob/mai... and it is a basic script that does nothing more than redirection. A lot of URL shorteners on the web do a lot more. Premium ones can track users and route links based on client.
2. Most of the URL-shortening audience is non-technical, and they usually like to pay and forget about it. Hosting and running your own service is an overhead.
3. Handling traffic at scale is challenging. So, if you have a lot of hits, I would not advise spending time maintaining the service.
How come I couldn't find your service when I was looking for alternatives to bitly? The only results I got via Google, Product Hunt ... were services that were overpriced and very limiting (number of teammates).
Yes I agree with all of the above. And with my fixation on URL shorteners, I did have a dream of creating a full fledged URL shortening service like yours. I wasn't technically skilled enough for that then and thus didn't do it but your site does look great!
Thanks. I started this since it had a low barrier to entry, but going from basic to a mature product was quite challenging and a great learning experience personally.
It does need a database, though. It just uses someone else's database.