Medium tries to prevent people reading deleted articles on the Wayback Machine? (selectedintelligence.com)
432 points by scandox on May 1, 2018 | 306 comments



Over the last year I've come to trust effectively nothing on the internet. I've had so many Spotify playlists I was following disappear, whole artists, websites, articles, etc.

I'm slowly moving to offline-first versions of all the information I care about. Edit: This change also lends itself to the 'slow web' (or just slow $whatever) movement, which I'm a fan of.


One of the first things I do with any article that I've enjoyed is download an HTML copy and create a PDF. Too many times I've bookmarked something only to come back a year later and it's gone.
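As a rough illustration of the "keep a local HTML copy" habit described above, here is a minimal Python sketch. The function names (`snapshot_path`, `save_snapshot`) and the file-naming scheme are hypothetical choices made for this example, not something the commenter described; the sketch assumes the page's HTML has already been downloaded and does not cover the PDF step.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def snapshot_path(root: Path, url: str, fetched_at: datetime) -> Path:
    """Build a file path for one saved copy of `url` (hypothetical naming scheme)."""
    # A short hash of the URL keeps file names filesystem-safe.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    stamp = fetched_at.strftime("%Y%m%dT%H%M%SZ")
    return root / f"{stamp}-{digest}.html"


def save_snapshot(root: Path, url: str, html: str,
                  fetched_at: Optional[datetime] = None) -> Path:
    """Write `html` under `root` and return the path of the saved copy."""
    fetched_at = fetched_at or datetime.now(timezone.utc)
    path = snapshot_path(root, url, fetched_at)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Record the source URL in the file so the copy is self-describing.
    path.write_text(f"<!-- saved from {url} -->\n{html}", encoding="utf-8")
    return path
```

Including the timestamp in the file name means saving the same URL again later produces a second file instead of overwriting the first.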


I used the Firefox extension Scrapbook, until the change to WebExtensions killed it.

I feel changes like this are incrementally making the Web "theirs" and not "ours".

Separately, someone replied on here to me, a few months ago, that archive.org's policy for respecting -- or not -- robots.txt was in the process of changing.

I don't think that putting up a robots.txt policy should be able to retroactively remove from archive what was previously public. All the more so when the domain in question has changed hands.

But I expect this nonsense to continue. So, I only trust local copies.

Unfortunately, for me, killing the Scrapbook extension made them less convenient to collect.


Do you know about Web ScrapBook, a WebExtensions successor to ScrapBook X?

https://github.com/danny0838/webscrapbook


Thank you. I will look at this.

I looked for alternatives, around the time Firefox stable transitioned. I recall that ScrapbookX was going to try to transition. There was language about writing to browser local storage, but that appeared to me to be too constrained for my use, as well as not yet being in place.

Since then, I've taken more cursory looks but not found a suitable extension. I recently learned about Wallabag, but it did not appear to have the same amount of facility; nonetheless, I want to set it up and give it a go.

I've been a little distracted, this winter. So, I've not problem solved like I should.


What are the intentions of Mozilla? Is it "let's take away control from users", or is it "we don't care about what our users want"? What a fail..


Me too, and I don’t rely on cloud or streaming services. If I really like a song or article or book, I buy it in some form and then download (if necessary through alternative channels) a copy. I bought a bunch of audio books from Audible years ago, and taking the time to convert them to MP3 has made my life much easier.

Streaming, clouds and DRM protection are fine for discovery, but if you love something, archive it. Storage is so cheap after all!


I use [Pinboard](https://pinboard.in/) to bookmark interesting pages. It archives a copy of the page for private use, which I can access later if the page is deleted.


I started using Firefox for Android partly because of its excellent Save page as PDF feature.


Just fyi, on Chrome on Android you can Share -> Print -> Save as PDF.


Me too. Practically every video I have seen since ~2012 went through youtube-dl, along with 100x more that I didn't watch. The memory hole is real, I plan to post my kit sometime, there's a few pieces on github (until?) already. I suspect the majority of the material I deliberately archived will be gone on YT, it's somewhere around 20% and it's going to hit 50. A big chunk is historically significant.

There is a serious push to criminalize memory. The GDPR is the latest attack. It's a huge problem for power centers that people can so effectively look up the past themselves, independent of what was deleted. Archive.org is a gold canary and they know it. They must partner with other people willing to accept copies; it's too easy and valuable to attack them otherwise.

btw (unrelated to video): https://github.com/bup/bup is awesome.


Youtube is another big one too! youtube-dl is a godsend.

> There is a serious push to criminalize memory.

I hadn't thought about it that way, but it seems so true. And thanks for sharing bup!


The problem is, it's getting harder every day to find DRM-free content. For audio, CDs are getting out of fashion, and piracy channels lack the seeders because everybody is on Spotify.


Yes. The debate is gone. Young people have a much different view of 'ownership' (I struggled for a better word) than older people. For example, I remember the copyright, file sharing, music piracy arguments and debates from the 90s (Metallica, Napster! Hah) and 00s. But when I talk about this stuff now with people in their early 20s there seems to be less awareness. DRM & 'Stream everything' are the way it is, as if it's some kind of inevitability. The concept of actually owning, or possessing, something (even if it's a byte stream on a physical hard drive in your house) seems to be disappearing. It's interesting to watch.

I think the most interesting part is the lack of discussion.


As someone who used to have a lot of CDs and MP3s and basically got rid of all of them for Spotify, I can cite a number of reasons why I switched:

1. Convenience (I never download or upload anything, and my playlists work and are automatically updated on the devices I care about)

2. Breadth of music (it doesn't have everything I want but it has a surprising amount of breadth in things I'd never care enough to amass deep collections in)

3. Easily accessible playlists from other people (I really appreciate the "This Is <band name>" playlists especially from Spotify)

4. Seeing what my friends are listening to all the time (I get a lot of new music this way)

Yeah, stuff goes away on the service. Yeah, certain less-popular genres are patchy and incompletely represented (and are we ever going to get Tool?). Yeah, the personal library limits are a bummer (although as someone who never uses this feature, I don't care myself). Yeah, the UI is terrible for certain things (classical music is especially bad, and I really hate that single-song repeat gets turned off in so many ways). Yeah, some of their clients are worse than others (why is the PS4 client's sound quality so bad and not changeable?) Yeah, there's no lossless versions of anything (I think).

And yet, for all that, Spotify has transformed my music listening, and I've been listening to a huge array of music for almost 25 years now. I listen to so many more new and interesting artists and songs on Spotify than I ever would have otherwise. I'll never go back, personally.


I have never used Spotify, so a genuine question: would Spotify play The Dark Side of the Moon from start to finish?

EDIT: Thanks for the answers folks, and yes, playing it out of order, or with interruptions would ruin its sonic beauty.


If you have Spotify premium then yes. I've had premium for some time now, so I don't know if it's changed, but without premium shuffle is always toggled on. So without premium you could listen to The Dark Side of the Moon, but it'd be out of order.


Shuffle? I've never encountered forced shuffle on my free account. Is this some kind of 'the mobile version is worse' thing?


Yup, for some reason you can select individual songs on the desktop client even using a free account. Not the case on mobile.


Yes. I have done this with Spotify on many car trips :)

clarifying edit: This is for the paid version of Spotify. I've never used the free version but I believe that it plays ads in between songs.


In a science fiction novel I wrote, the main character has trouble understanding the concept of data that has a physical location; to her, it's something that's just there. While I don't know if the real world will be that extreme, I don't think it's at all impossible. I think over the long term -- and maybe not even all that long -- it's probably inevitable.

It does require some mind-shifts on all sides, including those of content creators/providers, though. I don't know that I need to "own" any of my media in an everything-available-all-the-time world, but that requires, well, everything to be available all the time. If content availability comes and goes like the tide based on contracts and deals that I'm not a party to, it makes me a lot more skeptical of the implicit everything-everywhere promise.


>While I don't know if the real world will be that extreme, I don't think it's at all impossible. I think over the long term -- and maybe not even all that long -- it's probably inevitable.

I think that we're already there. This "cloud" generation seems to think that everything that exists (or at least is worthwhile) just sits on that magical Internet to be streamed to them whenever they want (and pay for it).


> In a science fiction novel I wrote

care to link it?


Sure. The novel is Kismet, which is at the top of this page:

https://coyotetracks.org/for-sale/

And, a direct link to Amazon:

https://www.amazon.com/Kismet-Watts-Martin-ebook/dp/B01MY02O...


That's because 99% of the time the streaming service for music is better than trying to build your own library. It also seems to have a lot less attribution than video stream right now so people don't have to pay attention to where they can stream a specific song, they can use just about any app and get what they are looking for.


> That's because 99% of the time the streaming service for music is better than trying to build your own library.

What good is "better" if songs disappear every now and then?


That's the 1% of the time that it's not better. Using a streaming service doesn't stop you from buying albums or songs that you really want to keep around for a long time.


One major drawback to me is the recurring cost. My feeling is that building an offline library that you truly own is much cheaper than using some streaming service with monthly recurring costs that inflate over time.


Only if you never listen to new music. CDs cost around $10, which is your Spotify sub cost per month. Imagine only listening to one new album every month.


After ten years maybe, I have too much of an eclectic musical taste to get away with that.


The streaming service is superior for discovery, but inferior for long term use. It's not a good library when an artist can remotely disable a song. I can always add new discoveries from Spotify to my personal archive. I have the best of both worlds.


Depends on your tastes. If you only really care about, say, 100-ish or fewer CDs' worth of music, then you can easily hit that and save money vs a streaming service. So for someone like me they are a complete waste of money.

Remember, 70 years * 12 months * 10$ / month = 8,400$ for music over a lifetime. By comparison, you can easily buy, say, every piece of music by Bob Marley and you're done; no need to ever do so again.

Sure, if you really care about music then a service is great.


Even then, your figure of $8400 is very optimistic. Assuming 1% inflation per year people will pay $20 per month in 70 years.


I believe that the parent comment is calculating the cost in 2018 dollars.
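The figures in this exchange can be checked with a few lines of Python. This is only a sketch of the arithmetic under the assumptions stated in the comments ($10 per month, 70 years, 1% yearly price increases); the function names are made up for the example, and the price is assumed to step up once per year.

```python
def lifetime_cost(monthly_price: float, years: int,
                  annual_inflation: float = 0.0) -> float:
    """Total paid over `years`, with the price raised once per year."""
    total = 0.0
    for year in range(years):
        # Twelve payments at that year's price.
        total += 12 * monthly_price * (1 + annual_inflation) ** year
    return total


def price_after(monthly_price: float, years: int,
                annual_inflation: float) -> float:
    """Monthly price after `years` of compounding increases."""
    return monthly_price * (1 + annual_inflation) ** years


flat = lifetime_cost(10, 70)            # constant (2018) dollars
nominal = lifetime_cost(10, 70, 0.01)   # with 1% yearly increases
final_price = price_after(10, 70, 0.01)
print(f"{flat:.0f} {nominal:.0f} {final_price:.2f}")
```

With no inflation the total is the $8,400 quoted above; with 1% yearly increases the monthly price ends up at roughly $20 after 70 years and the nominal total comes to roughly $12,000.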


I see it as kind of akin to how so few people carry cash on them. Credit cards and streaming are great for the day-to-day things and make things much easier. It's important to have a backup for the things that you care about. If the power goes out, how are you paying for lunch? If your cloud photo site shuts down or has a data failure, you just lost those baby pictures.


Yup, it's becoming a sad state to be able to own things and not be dependent upon the whims of others.

I went from buying all my music (physical and digital copies), and sometimes had to remove DRM from the digital copies, to stopping that and just using Spotify.

I'm slowly planning to stop paying for Spotify, but it would be very expensive to buy all the music I listen to. I think this will lead to at least 2 things (that used to be true, for me). 1) I'll listen to more local/new music that I can buy from BandCamp, etc. and 2) I'll have higher value to the music I do listen to, because I'm not as worried about glutting myself on a million new bands through Spotify. I'm okay with these 2 things.


I used to save tracks by recording them through the "analog hole", using the following commandline in Linux:

    parec | sox -t raw -r 44100 -Lb 16 -c 2 -e signed-integer - -t wav raw.wav

It still requires a lot of manual work, such as cutting the tracks, and making sure there are no "skips" (which somehow can happen). Skips could theoretically be removed using a consensus algorithm (using multiple recordings).

I just wish someone would develop a fully automated workflow for converting playlists to audio files.


I'm not familiar with the 'parec' command. Are you able to filter out other audio streams (e.g. Slack alerts or accidentally opening a page with an auto-play video)? I was under the impression that one of the big advantages of PulseAudio was its ability to separate multiple streams.

Interestingly enough, I had similar scripts about 15 years ago pre-Napster. The earliest mp3 sharing sites tended to push full albums instead of breaking things up by track. I had a lot of fun using some of the earlier mp3 tools to break up and tag tracks. I still have a lot of those mp3s on various HDs, and I know it because my splits weren't perfect for certain tracks that don't have 2 second gaps.


In theory PulseAudio should be able to separate the streams, or at least turn off audio for selected applications. However, this was never really a problem anyway because I usually ran the command at night :)

Yes, audio processing in the early days was fun, though it's easier now because of better tools and especially bigger harddrives and faster CPUs :)


What do you mean? Google and Amazon both sell DRM free music. (Apple might too, I've just been out of that ecosystem for so long that I don't really remember)


I feel like music is one of the few pieces of media that's very easy to buy DRM free. Movies, TV shows, books, and most other things are heavily encumbered after you "buy" them.


It's a pretty shitty state of affairs, but for ebooks what I've done is purchase an old model kindle and buy ebooks for it. I then crack that DRM using Calibre (easy to do with the old model kindles, you only need to enter the serial number). This has worked to archive all the books I've purchased so far but there may come a time when Amazon will only deliver ebooks to kindles with stronger DRM.

I don't feel bad about doing this because I'm still paying for the books and I'm not distributing the backups I make. I'm not clear on whether personal backups are a legal exception or not, but I don't really care.


I do the same with audiobooks from Audible, though the cracking process is a bit more complicated. I have no moral qualms about ensuring I will always have a copy of something I bought even if I leave the platform I bought it on or that platform goes under.


I buy almost all of my music through Bandcamp these days. The nice thing there is that I can buy digital, vinyls, and sometimes even CDs or cassette tapes.


iTunes is now DRM-free except for streaming.


I just use Usenet. I tend to purchase media or use a streaming service, but I no longer pay attention to copyright. I have legal and non-legal options. Whatever is easiest is all I care about. DRM/copyright has tilted so far I simply don't care anymore.


Unless something's changed, you can still buy DRM-free high bitrate tracks from iTunes.


I don't typically visit Wikipedia, but when I do, I read something and then try to visit the source(s) for that something. More often than not, the source URLs are dead.


Archive.is doesn't seem to be affected by the redirect strategy.

article: http://archive.is/gPcBW


Archive.is goes to a lot of work in order to make archiving pages on widely-used sites actually work, from what I can tell.


Archive.is is awesome. It just works.


Archive.is is awesome but I’ve seen multiple really big sites archive poorly. Some of them because the version of Chrome that archive.is is using is getting quite old.

They are running Chrome/41.0, which was released in the beginning of 2015.

https://archive.is/3PxoF

> It is very tricky to run, it depends on an exact version of Chrome, which binary also must be patched in order to reduce security (to allow saving content of frames, etc).

https://blog.archive.is/post/45984102073/can-the-archived-pa...


Who pays for it?


I've always asked the same question myself... I don't think they are a nonprofit like the IA. Their FAQ says "It is privately funded" and wrt ads "I cannot make a promise that it will not". Years ago there was an archiving service that displayed ads, but unfortunately I don't remember which one it was... I vaguely remember it could have been archive.is, but I'm not sure.


> So it looks like Medium has embedded a method to frustrate the casual user of Wayback Machine from seeing articles that their authors have removed from the original site.

It strikes me as less likely that Medium is doing something intentional to prevent reading deleted articles, and more likely that the author of this post is making assumptions.

Besides, archive.org has a policy of respecting copyright. All you have to do is ask them to not re-publish, and they will. No need to engineer wacky redirects that don’t work anyway.

http://archive.org/about/faqs.php#20


You might like this:

root@localhost:~# links -dump 'https://web.archive.org/web/20160826003417/https://medium.co... | nc seashells.io 1337

Results at https://pastebin.com/SMGBscz2


I used w3m -dump https://web.archive.org/web/20160826003417/https://medium.co...

Produces a fairly nice, readable version also.


Actually it's not necessary to use text-mode browsers as the trick used here is JS-related, so it's enough to switch JS off.

Actually most nuisances on the web today are JS-related so I have a button for quickly disabling it. It works like a charm, also for this case.
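The text-mode browsers mentioned above (`links -dump`, `w3m -dump`) effectively ignore scripts and print the text. A rough Python equivalent, using only the standard library's `html.parser`, might look like the sketch below. It assumes the readable text is present in the archived HTML and that only a script gets in the way; the thread does not describe how the redirect is implemented beyond it being JS-related. The class and function names are invented for this example.

```python
from html.parser import HTMLParser


class TextDump(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def dump_text(html: str) -> str:
    """Return the text content of `html`, one chunk per line."""
    parser = TextDump()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Because the script body is never executed, only discarded, a redirect embedded in it has no effect on the output.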


Hey, seashells is neat. I can't get a non-http port out of corporate nannywall though.


When I delete an article from my blog, it's because I don't want anyone to be able to read it anymore. Be it shame, inaccuracy, change of mood, etc... I think there is something fundamentally wrong with wanting to have EVERYTHING backed up at all times against the creators' will.


> When I delete an article from my blog

It's not yours anymore once it's public. It's like saying "when I do something crazy in public I want to be able to make everyone forget about it later on". That's not how things are supposed to work.

> I don't want anyone to be able to read it anymore

People can still have local copies so you have absolutely NO CONTROL.


Publishing used to be a really heavy-weight process, because printing and selling access carried a great deal of expense and ceremony. Trying to keep your book out of the local library to enhance sales revenue was a no-no.

Interesting how the landscape and conversation have shifted. Sadly, it doesn't appear possible to even have a definition of publishing that both maintains the right of the public to access information, and allows for individual privacy.

Options like disabling crawling are insufficient; essentially, web servers have to read authors' minds to divine their intent in order to not screw up. Don't crawl certain kinds of content and you might be accused of discrimination, providing services to one group and not others.

Ownership is a weird thing.


> People can still have local copies

Yes, but under current copyright laws they are not allowed to distribute them without your consent.


Depending on the license. If you have a CC-BY-SA or public domain at the bottom, then you're allowing people a legal right to keep a copy.

In practice, the nature of the web allows everyone to keep copies of everything if they want. But if someone republishes it, then you need to use the DMCA process (for American websites) to take that content down if it violates copyright.

Going back to print, what if you print something you don't want out there anymore? Well if people have already bought your book or magazine, they have bought a right to that physical copy. They can even sell the book or magazine to anyone else (granted that the content isn't illegal).


But they can't make copies of their bought copy and distribute those, which is akin to what the Internet Archive does.


That's not the point. The OP mentioned he wanted to make sure nobody can read it anymore. Private copies prevent that scenario.


Except they do in Europe under the 'right to be forgotten'.


That's what they think. Fortunately for the Internet, there are lots of unaffected servers outside it.


And unfortunately for Joy Reid.


> Except they do in Europe under the 'right to be forgotten'.

Just because they say it exists doesn't make it valid in practice.


I could say that "black is white" and yet it won't be so. The EU could say it, and afterwards black would still be black and white would still be white.


> It's like saying "when I do something crazy in public I want to be able to make everyone forget about it later on". That's not how things are supposed to work.

Says who?


How would you go about erasing actual memories?


In real life, things done when a person is young are often forgotten with time, and the way the brain works is that time lessens the importance of a memory, so even if it's not completely forgotten, the memory is fuzzy and considered inconsequential.

On the internet, everything is as fresh as this morning, forever. No longer do we have "do you remember when so-and-so used to say that stupid stuff, boy have they changed!" Now it's "look, here's a link to so-and-so saying stupid stuff, now we know what they really think!"

The idea that once something is in the public it should remain there firmly embedded, forever, makes sense on the surface, but definitely seems to break down when examined closely, in my opinion.


The passage of time is the typical method.


What if someone records their memories on paper?


I really like this post:

http://archive.li/fi5Xn

It's been deleted by its author and archive sites are the only places where I can find copies. I've saved a copy for myself just in case. If you use an article like this as a source, it'd be nice if there were a copy somewhere.

This is one of my favourite independent movies:

https://www.imdb.com/title/tt1527628

I bought a DRM-free copy from the writer/director back when they offered it on their website. The main website is still there, but the whole purchase/download system is broken and those domains seem to have expired/been purchased by someone else. I almost lost my copy of this movie, but luckily I found it on one of my off-site backups.

People talk about how much content there is being created, but there's an incredible amount of content that's being lost forever. Even if it's still out there, search monoculture (today we have Google/Bing/DDG where once we had Lycos, HotBot, AltaVista, etc. etc.) can effectively keep content from being accessible. There might be something nice about the ephemeral nature of that content, but there's also something sad there as well.

To go back to your point, if someone publishes an article, it is nice to be able to see it again in the future. If they don't want it backed up, there are procedures like DMCA (if the author didn't publish the content to the public domain and the archiver is based in the US).

As a side note, we've already seen on here that the Right to be Forgotten is more about censorship than anything else.

Eventually we'll all go extinct, our sun will burn out, and everything that ever was and is will be lost. So preservation efforts really only go so far, and this brings up some more deep philosophical ideals about the ephemeral nature of what we produce more than anything else.


> if someone publishes an article, it is nice to be able to see it again in the future.

I think there's an important point here about the difference between access and attribution. People talking about the right to be forgotten are generally opposed to attribution - someone like the top-level poster wants to be able to un-claim a blog post. But people talking about archiving are split between attribution and access - wanting to simply be able to see content, regardless of where it came from.

Two of my favorite bloggers have deleted large swathes of their work, both for reasons I think are inapplicable to me. In one case, they got a job in medicine and removed lots of content that might be unoffensive generally, but could upset a hospital HR department. In the other case, I believe she was worried about the impact her work might have on suicidal people.

In each case, the author wanted to stop having a comprehensive, owned body of their writing, while I simply wanted access to the text. I could give a damn if they accept ownership of that writing - it had interesting ideas and I simply want to be able to read it again.

This isn't a distinction I see made often; work is either in its source location or archived in an attributed way. But there are some cases where I'd be quite happy to get un-attributed access to the actual content someone created.


This is an interesting article; if you liked it, I would heavily recommend taking a look at Nick Bostrom's Superintelligence. Even though the book has its own share of problems, it is an interesting approach to Artificial General Intelligence.

On the other points, I would believe this is related to the fact that I personally (and, I take it, most here) have spent way too much time on the internet over the past decade(s). I find it amusing how often I remember something I've seen or read in the past and how hard it can be to find it again. Or the number of broken links among old blog posts and articles.

Lastly, our lives are way too goddamn short. Thinking that far into the future is hardly productive in my opinion, even if you consider it can be very enlightening.


> It's been deleted by its author

That's what AI wants you to think. ;)


Thank you for the article link. If anything, it has only become more relevant by now. Do you know why the original has been deleted?


As I mentioned in a top-level comment, you can use robots.txt to disable the Internet Archive's (Wayback Machine) archiving of your site. If you dislike your site being archived, I suggest you do this. Their intention is not to be a monster of the Internet that embarrasses creators against their will.
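For illustration, here is a sketch of what such a robots.txt rule could look like and how a well-behaved crawler would interpret it, using Python's standard `urllib.robotparser`. The user-agent token `ia_archiver` is an assumption here (it is the token commonly associated with the Internet Archive's exclusion rules), and, as noted elsewhere in this thread, the Archive's robots.txt policy was reportedly changing, so this shows only the mechanism, not a guarantee of how the Wayback Machine behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "ia_archiver", "https://example.com/post"))
print(allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/post"))
```

Note that robots.txt is purely advisory: it only has an effect if the crawler chooses to honor it.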


This is a major issue with the internet archive because it allows subsequent owners of a domain to blackhole the previous content.


One solution to this might be to maintain separate archives when a domain changes ownership, based on WHOIS information. Previous owners could still request deletion via email as is currently possible.


Yes, I think that's pretty obvious. This is an issue of the public's interest vs the author's interest. Of course anyone would like to rewrite history to suit themselves. It does a disservice to everyone else however.

"We were always at war with East Asia" and all that.


When you published your article though, you wanted people to read it. You can't blame the people around you for changing your mind. And "nothing gets deleted on the internet".


I won't blame people for storing a private copy, but I will blame people for sharing such a copy if it is against my will, especially if they're claiming it is their right and thus my integrity has to suffer.


You are breaking the social contract of copyright when trying to use it as a tool of censorship. The whole underlying rationale for modern copyright was spelled out in the first country to adopt it:

>To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries.

We, the people, trade our natural right of copying and selling anything we see for the benefit of increased dissemination of art and science.

Your integrity is kept perfectly pristine if your work only stays in your drawer. The second you publish your work it is no longer yours.


What social contract? Technically, archive.org is in violation of copyright. It does not matter if robots.txt allows it or not. Pointing people to "content" like a search engine is fine. Displaying "content" on another website is not.

Even worse is nyud.net, which essentially steals credit.


> and thus my integrity has to suffer.

What sort of integrity is it, that can be adjusted by erasing your past actions, to hide the mistakes you've made and only present the successes?


That was put well enough to make me wince on reading. Ever thought of going into politics?


Nah, I think Google has documented my integrity well enough to make that a non-starter :)


I just looked on the news. Apparently none of that matters.


It is the right of all librarians.


What about a situation where say someone who has a perfect memory, reads it, and regurgitates it word for word or even close enough to get the same meaning across?

The problem I see, once you've published something publicly (whether you've deleted it or not), is whether people who later reference it in part or in whole, especially if it's older, reach out to ask if you have any comments or updates since writing it, so that they can include them.

Perhaps your knowledge wasn't as evolved, therefore your understanding wasn't to conclusion - which is what the process of learning is all about. Perhaps you were going through a difficult time and you revealed things you're embarrassed about, and you have a fear of ridicule or other. That's where understanding and compassion then hopefully kicks in with the reader. If you can understand these processes yourself, and be able to forgive yourself and therefore others - or forgive others and therefore yourself - then great. If worry is strong and remains strong, then perhaps the solution is developing self-awareness to understand the nitty-gritty and nuances of emotion, to help understand the worry, where it comes from - and developing the tools and skills to help them settle.

Edit to add: You'd be surprised at what people deal with on a day-to-day basis or what they had to deal with in the past. Suffering, especially emotional suffering including discomfort and fear, is a strong teacher - it's something you have to develop to be open to - otherwise the ego mind will learn and want to logically control a situation, vs. managing it - which comes with developing self-awareness, therefore better self-control, and better self-regulation.

Where I'm at in my life, on my path, I currently fluctuate between having difficulty coping with chronic pain that I have to manage, which affects my executive function and decision making, and being very suicidal. Most infuriating is that I have had success with healing some of the pain with stem cell injections; however, because of an unreasonable/irrational decision that was made, I've been blocked from getting more from the doctor who first did them, and that has cascaded into making it challenging to find another doctor to continue them. I've come to the conclusion that our health-"care" system is very broken. I could go into the nuances of that from my experiences, but I won't in this comment.

Yes, there will be assholes who will judge me for sharing that and think negatively of me for it, and they are people who haven't developed those skills and understanding or compassion - at least not yet, or perhaps never; however, we can forgive them for that - their genetics and life path, the environment they were born into, weren't up to them.


I'm not sure you understand that putting something on an http server is publication. It is legally no different than standing on a street corner handing out flyers.

You are suggesting that the author of a flyer should be able to demand that everyone who took one burn it immediately, and have the force of law to ensure it happens.

That is not how copyright works. Copyright is intended to encourage the creation of new works, by granting for a limited time the exclusive right for an author to reproduce and copy their own works. Once those copies exist, and pass out of the author's possession, copyright does not grant any further control of them, other than to forbid those copies to be used to produce additional copies.

This, of course, raises the question of whether serving a digital document on an http archive server is violating the author's copyrights. A library may keep a copy of a pamphlet and allow patrons to view it without violating copyright. But web servers work by stamping out a perfect copy of the document and sending it out over the network. There is no physical embodiment of the document. If the archive were to display the document on a monitor, and then serve a video from a camera, pointed at that monitor, that would be analogous to viewing the physical copy of the pamphlet, but then that pointless fiction could be dispensed with by removing the monitor and camera and transmitting the digital rendering. So we have to fall back on the intent of the copyright act.

Clearly, the copyright act is intended to expand the amount of available works, by granting a temporarily profitable monopoly. Does the prevention of archiving further this purpose? Hell no. Archiving is essential for those works to eventually enter the public domain. When the original creator of a work has abandoned their attempt to monetize their efforts, to the point where they are now trying to destroy their work, it should escheat to the public domain immediately. If you didn't want it out there, the only remedy would have been to never publish it. You cannot erase prior publications by abusing copyrights. The law should not protect book-burners.


> It is legally no different than standing on a street corner handing out flyers.

It's not legally different, but it's still different, which is something that I think people on both sides of this debate sometimes selectively forget. Before the web, those "street flyers" were pretty unlikely to go viral and be seen by millions of people. They were pretty unlikely to get, well, much farther than that street corner. And there certainly wasn't a widely-known and shared infrastructure dedicated to capturing copies of the flyer and preserving them indefinitely.

I don't know that there should be a "right to forget," but in the pre-digital era, things had limited circulation. They went out of print. You couldn't control what happened to copies after they were printed, no, but nobody could say, "You know what, I don't want this thing to go out of print, so I'm going to put it back into print whether the author likes it or not." To use your flyer example: I can't legally demand people who have my flyer burn it, but I can legally demand that people don't make copies of my flyer and hand it out on street corners of their choice for the next thirty years.

The closest thing the web has to the concept of "out of print" is, well, taking things offline. And a lot of things that go offline undoubtedly should be preserved. I use the Internet Archive all the time. But at the same time, I'm not convinced that the answer to someone saying, "Hey, this thing I put online 20 years ago and took down 10 years ago is something I'd really like to keep out of print" must always and forever be, "well, you should never put anything online that you'll ever reconsider at any point in your entire life, you fool."


I think the "out of print" argument is a cop-out. That's allowing the economic constraints of physical copy production to override the ideals behind the law.

Things went out of print because the unit cost of producing one extra copy was much higher than producing 10000 extra copies. As demand for copies tends to taper off over time, you eventually reach a point where you simply cannot produce just one extra copy at a cost lower than the price the next customer would be willing to pay for it.

No such pressure exists for digital reproduction. Every additional copy costs the same low, low amount. The author then has no reasonable argument for refusing to make an additional copy.

And yes, there was infrastructure for capturing and preserving copies of print flyers. It wasn't all-encompassing, and didn't catch everything, but there are many museums of ephemera now that have extensive collections of published material that was of limited circulation (and limited literary value). For those items that were expected to get thrown away or used as toilet paper, there was always the possibility that someone might have saved it, and it could still be around in some form 200 years later.

It is entirely reasonable for an author to demand that no one else make and distribute copies of their work. But in my opinion, if you can find the author, and make them a reasonable offer for a new copy of their copyrighted work, and they refuse to make one and sell it to you (or to license the right to make your own) then they have essentially abrogated their copyright. You would then be morally (but not legally) justified in copying that work from another source.

When something is published, the genie is out of the bottle. No law can stuff it back in. And copyright was intended to protect the livelihoods of creators, not to give them the ability to more easily destroy what they have wrought. Thus, whenever there is any confusion or ambiguity, I always personally interpret a copyright situation with the test "is there any way this might lessen the creator's ability to sell (or otherwise monetize) one more copy of this work?"

If the creator is no longer attempting to make money from a work, screw their copyrights. We granted them that limited monopoly to make enough money so that the effort of creation would be worthwhile to them. If they don't care to sell, I don't care to protect their ability to sell exclusively.


Why not publish a retraction and list the reasons why you no longer think the post is correct instead of trying to hide from it? Maybe someone will learn something that way...


Why not just erase it.


Anything you have ever published is no longer entirely yours and the act of doing so is also a relinquishment of control. Only secrets can ever really be regarded as possessions. Archivists should not require permission to store that which has been put out into the world.


> Anything you have ever published is no longer entirely yours and the act of doing so is also a relinquishment of control.

This is an incredibly bold and huge presumption which absolutely does not mesh with the Berne Convention to begin with.


It meshes perfectly fine. You don't give up all control but you do give up a lot of control.

When you publish and sell a book, you no longer control who that book is later given or sold to. You retain only rights over the ability to make copies, and even then, you can't block people from making copies for personal use.


It meshes well with reality though.


The main problem with any service is privacy. There's no privacy on the internet.

If you have a blog, you have to reckon with the fact that an article will be on the internet forever.

My advice is this: write an article today but publish it tomorrow (that gives you time to think about it).


And I'd add to that advice: Never be afraid to add corrections and show them.


Suck it up. Once you go out of your way to make something public...it will remain public


Not if you delete it.


If I downloaded it...it's not getting deleted regardless of the author's wishes. If you put it on the internet it's not going away ever.


Sounds like you need the GDPR's "Right to be Forgotten" clause to be in effect.


The GDPR only cares about personally identifying information. So you might be able to ask me to remove your name from the archived blog post, but not the post itself (as long as it does not contain other PII). This is very important, as otherwise someone could vandalize Wikipedia or Github by requiring them to delete all his edits/commits.


A simple solution: don't publish things you don't want people to read.

This is no different to burning books that the authors no longer support the views in.


That's actually pretty different. If you purchase a book, you pay for a copy with the expectation of both parties that you own that copy.

Putting someone else's work on a different website and serving it to others would be similar to printing copies of someone else's books. If the author changes his views, it wouldn't be acceptable for someone else to print new copies of the book. Making a copy of a website for your own personal use would be much closer. I don't think anyone disagrees that you should be able to do that.


>If the author changes his views, it wouldn't be acceptable for someone else to print new copies of the book.

When the author burns their own book and forbids new ones from being made is the only time that it's acceptable to print new copies.

Need I remind you of the purpose of copyright?

>To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries.


>When the author burns their own book and forbids new ones from being made is the only time that it's acceptable to print new copies.

Do you mean acceptable from a legal point of view? Because I seriously doubt that it is.


Without that view, we wouldn't have most of the works by Kafka.


No, the only issue raised by Kafka's request to burn his papers is whether or not the copyright passed to someone else on his death. All works eventually become public domain, regardless of the author's wishes, since people die and copyright has a time limit.


I don't understand your point. "securing for limited times to authors and inventors the exclusive right" means others cannot print copies of your books.


> for limited times

The author's exclusive rights are only intended to be temporary. Once they're no longer secured, the work can be used by anyone for any purpose.


>If the author changes his views, it wouldn't be acceptable for someone else to print new copies of the book.

That depends entirely on the publishing contract's rights reversion stipulations. An author deciding that they do not want their book to continue being printed, purely because they have changed their views, will often find that they have very little legal right to do so.


I both agree and disagree.

Yes, definitely think before publishing, even if it's meant to be "private", say a self-destructing private message, because anything can be saved somehow.

On the other side, we change. Things that were relevant to me, things I genuinely thought to be true 16-18 years ago (I've had a blog for a while now), I see a bit differently today, and I might like my current, toned-down views to be read, not the ancient ones from a disappointed teenager.


So you're incentivizing people not to write. That's bad, we should encourage people to write. It's OK to make mistakes.

I see this problem as an even bigger problem with kids. Kids put everything they do online, and they will probably be ashamed of a lot of these things later in life.

Comparing that to Facebook: I guess I shouldn't upload pictures on Facebook because it's against my right to want to see one of my old pictures disappear later?


People should be encouraged to create, but with the knowledge that anything published may be retained by others, and that it can have consequences. No technological measure will prevent people from making personal backups or gaining access to data published under the presumption of secrecy or time-limited availability — even if all the layers of DRM work, the analogue hole will always be there.

Rather, invest in teaching kids how to safely publish under pseudonyms or anonymously if they wish to publish their angst-ridden teenage vampire poetry. You can always abandon your connection to that work that way — even if the work lives on for all the world to see.

> Comparing that to Facebook: […]

You should indeed never upload anything that you might wish to expunge at a later date. You have the right to see an old picture gone from Facebook, but you don't have the means to enforce removing it from your cousin's private backup on their own computer.


> It's OK to make mistakes.

A sentiment that is harder to sell if people erase all evidence of their mistakes.


More like incentivizing people to accept responsibility for their actions.


> I guess I shouldn't upload pictures on Facebook because it's against my right to want to see one of my old pictures disappear later?

If you think that you will ever want real control over the pictures, then yes, you should avoid posting them to Facebook. I'd think that that's fairly obvious.


If no one can read a book, what good is it?

You are purposefully muddying the issue of access. If you upload your pictures on a password protected ftp server no one will know or care when you delete them. If you upload them on a publicly accessible website, despite what outdated laws on copyright say, you won't be able to withdraw them and that's a feature.

Facebook is a monstrosity that pretends to be a password protected server but leaks your data by design. Do not try to equate someone unpublishing a work after they wrote, uploaded and publicized it with someone removing a drunken status update that should have never been public in the first place.


> If no one can read a book, what good is it?

Plenty of good. As a writer myself, sometimes just having these thoughts put out into form is a very gratifying process. When the inspiration hits you, nothing feels worse than not being able to express these thoughts and feelings in a way that feels appropriate.

Likewise, sometimes people write dumb things in a fit of passion. They write something that is a blemish on their otherwise fine history or that no longer reflects what they believe anymore.

While I'm on the side of the Internet Archive here, I can definitely appreciate that it's not an open and shut case. Sometimes the yearning for information to be free is at odds with our want for privacy. Tools like Medium, Twitter, Facebook, and other social media are like the gun in the house to someone suicidal (to use a very bad analogy) - an easy and convenient tool that allows for a very bad spur of the moment decision to be made.

I know I've done stupid things on games and on message boards in the past, and I'm only so lucky that this data likely isn't available anymore. Some of it was me being a dumb kid. Some of it was me just being an angry kid, but I am 100% grateful that this information is only remembered by me and a handful of others. A very different past me had a very stringent set of beliefs which I now have come to accept were very bad beliefs, and I did bad things in general as a result of them.

Is it really fair that I have the benefit that time forgot all these dumb things I've done simply because I was born before a time when Twitter/Facebook were commonplace? Before data permanence was really possible on a global scale? I'm open to the idea of some review on archiving data like this; I want the Internet Archive to be able to archive this stuff, but it would be really nice if there was a way to vault it for a reasonable period of time as well by request. Otherwise, you end up in a position where no one wants to write or produce or do anything in a fit of passion as a result of knowing that everything is permanently preserved.

I don't have an answer aside from "vaulting" the data, and I don't think that's a good answer. But I also don't think it's black and white like you're trying to make it.


I agree with the sentiment, but I'm not sure what can be done about it. Anyone can archive things privately. If you take away public archives, this means that only obsessive lunatics and the wealthy will be able to wield archived material as an instrument of power.

Our culture definitely needs to evolve a little and become more forgiving of youthful ignorance. That's really the only long-term solution.

Perhaps this is also a good use case for services like Hermit[1], which allows limited sharing among friends and other writers. The notion of "trying out" an idea on a platform with enormous public reach seems foolish at best. Comics test their new material in hole-in-the-wall dive bars for a good reason!

[1] https://gethermit.com


I think you're right, that ultimately society needs to mature and realize that everyone did something stupid in their past and at some point has thought/said/done stupid things. We should hold people accountable when they refuse to change, but right now we're missing the rehabilitation aspect of all this and just focusing on the punishment.

Again, I really don't have an answer. I would err on the side of caution and say IA should continue to back stuff up and it's wrong to set up the time-bombs that also affect IA. But I do feel we need a way to accommodate some privacy still without silencing people outright for just plain dumb opinions and ideas.


I wrote a book and never published it, the only person who's read it is me.


Of course, but you shouldn't be stopped by people who want to back up your content at all costs against your will.


I think a difference can be made about the individual "right to be forgotten" and the transparency/accountability of a public company.


People like you and me can write on Medium.


> deleting something from the internet


Writing is not the same as publishing.


>So you're incentivizing people not to write. That's bad, we should encourage people to write.

The publicizing of things said by then Presidential candidate Donald Trump leading up to the 2016 election would incentivize people to not talk in private. Should we have banned any details about the incident from being spread if the one who said it didn't agree with knowledge of what was said spreading?

If being able to undo what you wrote so that it won't be held against you is a good thing, why wouldn't being able to undo what you said so that it won't be held against you also be a good thing?


>It's OK to make mistakes.

We should punish mistakes. Especially stupid mistakes.


I wonder how the Wayback Machine will work after GDPR? I can't imagine they can just show content that the authors deleted from primary sources?


GDPR is not relevant to this. It’s not about enforcing copyright on people’s work.

It’s about ensuring that companies only store and process privacy-sensitive information about people which they are given consent to store and only used for the purposes the consent was given.

There is nothing privacy related wrt the author in a public article published worldwide for everyone to read. Clearly outside the domain of GDPR.

It’s not hard people, just common sense. Just treating user-data with respect. Let’s not fool ourselves into thinking it’s harder than it actually is.


However, from experience, GDPR will also be abused by jobsworths to avoid doing something, either through laziness or for more suspect reasons.

Just like H&S and the Data Protection Act are abused today.


I assume they'll continue as now: take down copies on request. It's not like they haven't had to deal with content people didn't want archived before.


I doubt it. GDPR is the obvious next step in the war on GPC (memory specifically). archive.org exists to fix that. If they fall because some other country has no 1st, they fail. Other people have copies.

Who exits next?


GPC?


I assume they mean General Purpose Computing.


I haven't read GDPR in detail, but isn't GDPR concerned only with personal/private data? The Wayback Machine only archives public pages as far as I can tell...


Yes, but the article itself is user-provided content to Medium that the author has a right to ask to be deleted (under GDPR), presumably? So perhaps it will simply be a matter of the Wayback Machine having to have a policy to delete things if requested?


No! GDPR is about personal data, which is well defined in the regulations and does not include blog posts. The right to delete data (or "be forgotten") is nothing to do with GDPR. If the original post contained personal data, it is a different issue but if that was put out into the public domain, it is a hard problem to solve.


What if the blog post contains personal data?


Most pages have an author section already.


So add some "personal data" to the end of anything you might want to demand someone forget later.


Or you could send a good ol' DMCA takedown request.

I'm not sure where this idea that nothing could be forced off the web before the GDPR came from.


No, if you intentionally made that data public then it's done. GDPR doesn't, say, force you to remove political views of Theresa May from newspapers, despite that being covered by personal data, because Theresa May made those views public.


So if I was subject to the GDPR, and published my nginx logs in real time, I could stop worrying about scrubbing "personal" data from them on request?


The Wayback Machine has always had a policy to delete things if requested, so there's no real change there. The most common way site owners do that is by changing robots.txt. In line with the Oakland Archive Policy [1], the Internet Archive respects robots.txt retroactively, so a site owner can get archived versions deleted just by excluding them in the robots file. Besides that, they respond to DMCA takedowns, one-off removal requests [2], etc.

[1] http://www2.sims.berkeley.edu/research/conferences/aps/remov...

[2] http://archive.org/about/faqs.php#2
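
To make the robots.txt mechanism concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It only shows how an exclusion rule is evaluated for a given crawler; it is not the Internet Archive's actual code. The `ia_archiver` user-agent string, the `example.com` URL, and the helper name `is_excluded` are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_excluded(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url.

    Hypothetical helper: an archive that honors robots.txt retroactively
    could run a check like this before serving a stored snapshot.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Assumed exclusion rule a site owner might publish (user-agent name is an assumption).
ROBOTS = """\
User-agent: ia_archiver
Disallow: /
"""

if __name__ == "__main__":
    url = "https://example.com/old-post"
    print(is_excluded(ROBOTS, "ia_archiver", url))   # the named crawler is blocked
    print(is_excluded(ROBOTS, "SomeOtherBot", url))  # other agents are unaffected
```

Note that a check like this only decides whether a snapshot is shown; whether the stored copy is actually deleted is a separate policy question.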


Changing robots.txt does not delete content from their archives. If you remove the robots.txt file, the content becomes viewable again.

There's no scenario where they can respond to the vast scale of GDPR violations that their archive likely represents, when it comes to manually removing content. There are only three possibilities: avoid the EU as much as possible, dump the archives and start over with an entirely different approach, or shut down. Besides that, these laws are going to get a lot more strict and difficult to comply with, not less strict, over time. This is merely the beginning of aggressive regulation of the Internet. Regulation of the Internet will only move one direction from here, in the direction of increasing burden and ever greater regulation. It's hard to imagine Archive.org's archives surviving what's coming.


> There's no scenario where they can respond to the vast scale of GDPR violations that their archive likely represents, when it comes to manually removing content.

"GDPR violations". What's that, exactly? As far as I know, you only have to remove personal data upon request, not preemptively. So I don't see how they are "violations".

Will a lot of people make these requests? Possibly, but where's the evidence of that? People have been able to use copyright takedown requests (e.g. under the DMCA) forever, yet the Archive is still around.


Actually the recommended data handling says you should specifically state the purpose for needing the data, and that it should be reasonably limited to that need; i.e. if you don't need it any more you should pro-actively delete it.[0]

[0] https://ico.org.uk/media/for-organisations/documents/1475/de... Pages 4-6


They do have a legitimate interest (in the sense of article 6(1) of the GDPR), namely providing an internet archive.


I would agree in the case of the wayback machine they have a very strong case under article 6(1).


Could conflict with “right to be forgotten” however


If a public page can be linked back to you, then it can be considered personal information and therefore be subject to GDPR.


No. That’s not how it works.

Read the law before posting wildly misleading comments like this.

If you explicitly make something public, you can’t later come and claim that this information is actually crucial to your privacy. If so, you yourself were the one who violated that privacy, not the company later archiving/caching/processing your public article.

GDPR is all about decency and common sense wrt. user data and privacy.

No need to spread FUD about something that simple. SV proved tech companies can’t be trusted to act ethically, so here comes the regulation. Deal.


Given the immense scale of Archive.org, there must be a truly incredible number of sites & pages with personal data & content in the pages. Millions upon millions of pages, due to the repeat archiving.

Comments with usernames. Comments with IP addresses (sometimes old comment systems would allow you to comment without registering but they'd show all or part of your IP address). Comments with personal information in the messages. Comments with email addresses. Blog posts with all sorts of personal details from the author. Personal user account pages, such as the kind you see on sites like Ask.fm or similar, with vast amounts of user information and personal details that can't be deleted. And on it goes. Archive.org is storing all of that and does not allow it to be deleted. Further, it would be nearly impossible to figure out what content is compliant and what is not within the archives. It's a giant GDPR violation system. Their only sane bet is to stay away from the EU, jurisdiction-wise, as much as possible, or shut down.


Why would GDPR apply to the internet archive? It's a US based nonprofit. As far as I can tell they don't do anything that even remotely hints at them providing services to EU residents (like offering their site in European languages, having the €-symbol somewhere on their donations page or any of the other more subtle things mentioned in GDPR).


Is English not a European language?


Doesn't matter. Using the English language doesn't imply that you're offering goods or services to individuals in the EU.


No more than Spanish, and you wouldn't be making this argument wrt a Mexican company.


Not since England left the EU.


Britain is still in the EU and GDPR will apply there.


And Ireland?


Ireland's language is actually Gaelic.

It's the only country in the EU that does not require its language in translations because even in Ireland nearly no one speaks it.


Ireland has two official languages defined in the constitution – Irish and English.


I think he refers to the fact that each EU country notifies one official language and only the UK picked English. But I guess they have found, or will find, a way to maintain things as they are.

http://eltelawjournal.hu/what-language-for-europe/


Because the data they have might have been produced by EU citizens.


Who cares? Me posting my (very German) name and address on somebody else's website (blog comment, forum or whatever) doesn't magically make that person have to comply with GDPR.

According to [1] the law applies to:

1.) a company or entity which processes personal data as part of the activities of one of its branches established in the EU, regardless of where the data is processed; or

2.) a company established outside the EU offering goods/services (paid or for free) or monitoring the behaviour of individuals in the EU.

The internet archive doesn't offer goods or services in the EU (if you want to know how that's defined you have to read the actual law I'm afraid) and they're certainly not "monitoring the behaviour of individuals in the EU".

[1]: https://ec.europa.eu/info/law/law-topic/data-protection/refo...


So what? The EU does not get to make laws for other people. That's why we have countries.


You are missing the point. The EU makes laws that govern, and at least try to protect, its citizens. If a document on the archive is created by a European citizen, then it is under EU law. That's why every company in the world right now that deals with European citizens is working on supporting GDPR. That also applies here.


Not quite. The EU might want that but it gets into jurisdiction.

The EU cannot enforce its law on entities that are entirely US based. It can only enforce it on non-EU sites if that site has some sort of business that’s within the EU (like offices or employees).


The EU can say that some businesses are so non-compliant with GDPR that they're not able to be used by EU companies. It seems weird to choose to limit your market just because you don't want to protect user data.

See also changes to "safe harbour".


"protect user data" is an odd way to say "force you to forget something"

We are merging with our creations, and will laugh at this "AI" thing when we realize it's us. The WOGPC is really a struggle for self.


Sorry, but the EU has no teeth here to enforce its laws on entities that exist outside the EU.

When the first purely US based company is successfully fined or shut down by the EU I’ll believe in their ability to enforce GDPR.


Not GDPR-specific, but France.com had its web domain seized by France recently[1]. It was a private US-based business (not a squatter) that the government of France had actually cooperated with for years, until they suddenly decided it violated French trademark law and seized the domain. The domain itself had been in one person's possession since 1994.

Enforcement of national laws is very much a thing across borders, so private businesses outside the EU are right to be apprehensive about what is going to happen as GDPR enforcement ramps up.

[1] https://arstechnica.com/tech-policy/2018/04/france-seizes-fr...


Any domain can be seized, it is not your property.


Clearly. Can we name the real DNS owner? It changed hands kinda recently...


But it's not personal data that's being handled, and the EU is quite happy with the German imprimatur and not keen on anonymous publishing in general.


Nitpick: EU residents, not citizens, as far as I understand it (the text says people "in the Union", and doesn't mention citizens at all).


yeah, but GDPR doesn't affect copyright.


I wish you were right.


GDPR doesn't change anything in this respect. Copyright law applies.

Why would the owner of the copyright make a complaint to the data protection authorities of the EU who might choose to do nothing when they could directly file a copyright infringement case? I suppose you could add insult to injury, but the data protection agency is likely to rule that the issue is one of copyright infringement.


I post a letter on a pole. You take a picture of it. I sue you for sharing that pic, and pretend you still have free speech.


I'm not sure what you are trying to insinuate...

There is plenty of nuance to copyright, such as public forums, public domain, fair use, etc. You have those defenses available to you. However, a person can bring a civil action against you anytime you make a copy of their protected work. The case may or may not be meritorious, but defending against a lawsuit is already a penalty. Archival facilities are already exposed to this risk and they actively lobby for protecting their activities as fair use, with varying degrees of success.

A person cannot bring civil action against you under the GDPR. They may make a complaint to the data protection authorities who may bring action against you if your use of the data is unlawful. Therefore, there is no way to force a person to have to even face a trial under GDPR. If your use of the data is unlawful, you certainly have no license, so you are not protected less under the GDPR than under copyright law. If your copy counts as fair use, then it will count as being lawful under Article 6.1 (e) and no action can occur under the GDPR.


I'm not sure what you think free speech is, because the definition I'm aware of does not apply in any way to that situation.


> pretend you still have free speech.

Free speech is the right to speak about any topic without the gov't attempting to punish or censor you.

Free speech isn't the right to speak at any (private) forum, nor is it about having the right to be heard.


It absolutely is the right to speak at any private forum that allows you to do so, AND it is the right to be heard by people who are purposefully choosing to listen.

Yes, if a private forum chooses to not let you speak, you can't force them to accept you. But if they DO choose to let you speak, then you do have that right.

Also, this IS about the government attempting to censor people.

It is the Internet Archive's freaking website, that they own!


You have copyright over that letter, and can definitely sue someone for sharing a picture of it.


I thought free speech was the ability to make commentary on the letter on the pole, but not to reproduce it.

It's the equivalent of it being technically illegal to take a photo of the Eiffel tower at night, because the light show is a copyrighted artistic display.


GDPR shouldn't have a major impact here.

They already take down pages on request and retroactively apply robots.txt rules, so that solves "right to be forgotten" or other circumstances where PII is present and shouldn't be.

They have sufficiently defensible reason to keep and present the archived information otherwise.

Their key problem will remain copyright and publishing rights arguments, not matters of personal data, at least not more so than currently.

(caveat: while I have an understanding of the regulation due to it very much having an effect on our clients and to a lesser extent on us directly, I am not a lawyer by any definition so don't take my interpretation as gospel in any way)


They can't, and I would like to know what Europe wants to do about it. Block the Wayback Machine in Europe? Well, I can still access it with a VPN if I want. Also, I want to know what they will do about git and GitHub, or even blockchain projects (how do you delete something from a blockchain?).

The problem is that GDPR is stupid legislation written by incompetent people who don't understand the subject, and imposed with no possibility of choice on member states, like all the regulations from the EU (the cookie banner law, for example).

And of course GDPR doesn't much affect the companies it aims to fight, like Facebook, Google, etc. They have teams of lawyers paid millions with the sole purpose of finding ways to circumvent these regulations; they will just update their terms of service and be done. The ones that will be more affected are small companies, startups, and personal side projects: people who don't have money to spend on a lawyer for a project that doesn't make them any revenue.

I think that in Europe it's no longer possible to do anything. If you have a good and innovative idea and you want to realize it, better take a flight to the US...


The law is meant to allow me to delete my account from your cool SV startup, and delete meaning actually delete the data and not deactivate the account but continue using or selling my data.

The cookie law is a problem because lazy web developers did not implement it right. You probably also complain about the "don't spam me" law because it adds a bit of extra work to add the unsubscribe link and implement the requirements.

The laws are made for the good of society, not to help a minority implement some "move fast, break things, pivot and try again" scheme.


There is a big difference in law and regulation between intention and real-world effects. For instance, making marijuana illegal has the intention of decreasing drug addiction and dependence, but has the effect of disproportionately incarcerating youth (a.k.a. "criminals" under the new law) for drug consumption, thus limiting their opportunities in the socio-economic system.

If you look into it, I think the parent is most likely correct in his predictions, since they are easily verifiable, i.e. big corps do have massive teams and monetary funds to deal with this legislation, startups and one-man shops do not. This completely ignores the deontological question of what should be the case, where I think most would be in agreement.


> big corps do have massive teams and monetary funds to deal with this legislation, startups and one-man shops do not.

That applies to literally every piece of legislation. Yet we don't decide that small restaurants should be exempt from food hygiene laws, or that small construction teams should be exempt from health and safety laws.


Conflating information with feeding someone poison food is so par.


Your objection is not helpful.

Being careless with personal data has harmful consequences.

Being careless with food safety has harmful consequences.

This is why these things are related.


What personal data is being exposed? The EU isn't keen on anonymous publishing in general. E.g. in the UK, anything published by a political party during elections etc. must carry both the printer's name and their agent's; the penalties are quite severe.


I'm really not sure what your point is?


We haven't seen how they're going to enforce it yet, people throwing tantrums about it doesn't help.

If it's the same as the cookie law or spam rules, they'll come in and say "we've had a complaint, you're doing this wrong, fix it". Then if you don't fix it, they'll fine you.

Not only that, but many of the regulatory enforcers responsible for this in the EU are not particularly well funded and why would they spend the limited resources they have investigating one man bands?


> For instance, making marijuana illegal has the intention of decreasing drug addiction and dependence

That’s not right, is it?

Didn’t that Nixon aide admit the drug war was a ruse?[1]

1. https://www.vice.com/en_au/article/xd7jkn/a-former-nixon-aid...


But it is presented to the public as if its purpose were to curb drug addiction. It could be the same with GDPR: great intentions, but the true reason is to entrench big corporations and introduce more barriers to entry for the small guy, which is in line with a socialist agenda.


Are you from the US? In the EU we have had many similar rules that were in the public's favor and hit the big corporations. The one I am thinking of now is roaming phone charges: big companies lost a lot of profit from this, so you can see that these big companies do not yet have the power to change the laws for their own profit.

But I see a lot of anti-EU sentiment here on HN. Anything the EU does is painted as anti-American or anti-startup, when from inside the EU we see it as for the people/society.


> from inside EU we see it as for the people/society

No we don't. Some of us do and some of us do not. You are self-admittedly in the former group, I am not.

Also, just because something cost big companies money on one front does not mean it doesn't increase the monopolistic power of said companies and even increase revenues on another. Let me use your own example as a hypothesis we will be able to observationally falsify or not in the coming years. By eliminating roaming charges many smaller companies in the space will have to compensate for the loss of funds and will therefore either have to reduce their current plans, drop service offerings outside of the current country, or eventually collapse entirely. Regardless of the outcome, the total market competition has decreased and ultimately the mega corporations stand to win through decreased overall competition in the space. Additionally, due to lack of monetary incentives, I would expect the rate of innovation in large-scale roaming technology and infrastructure to decrease compared to countries which do not have such legislation.

Socio-economic systems are complex and nonlinear in nature. Unfortunately, we (i.e. humans) have not evolved to think well about nonlinearities, nor have we built ourselves sufficient tooling to augment our prediction capabilities for such systems. IMHO, this is the wellspring of the difference between intentions and outcomes in regulatory policy.


I think we should not shy away from making laws and rules out of fear of unintended consequences. If we get such side effects, we can update the law.

Your point amounts to saying that we should not have made seat belts mandatory in cars because there could be a side effect somewhere, like a person not being able to evacuate in time. The idea is to weigh the benefits against the drawbacks, and if the benefits are much larger, we make the law and update it later.

I am sorry if a small telecom company can't adapt and compete without the roaming charges, but we should not pay billions to the big companies just so this small company also survives. We can make laws to help small companies, like preventing abuses by big companies.


We need this law. There are enough terms in it, and enough latitude in how it will be applied, that the little guys won't have much trouble if their intention is to comply.

I know that laws get abused but do you see the OP asking to remove laws that are in his favor like copyright law or patents law?


And the effect is quite the opposite, as drug consumption and addiction are much more prevalent than in the places where drugs are legal. So regulation could have good intentions, and a lot of people believe in the intentions, but the effects are the opposite.


Where are drugs generally legal?


You just have to look at the effects of the 18th amendment to see the point the OP is making.

In the USA it caused massive increase in organized crime and corruption - the effects of which are still with the US today


But you do not know the effects if all drugs were legal and cheap


> because lazy web developers did not implement it right

No, because the legislators fundamentally misunderstood cookies. Almost any website needs to have some basic tracking of users for fraud detection, bot detection, and yes, basic analytics.

Instead of writing out a thoughtful approach, we get a mandatory nag screen right up there with "This product is known to cause cancer in the state of California" on anything sold ever. Users ignore them because the information isn't useful - infinite noise, no signal.

This is the opposite of the CAN SPAM law which did have thoughtful requirements - allowing exceptions for account related emails, requiring one-click unsubscribe but also giving systems a period to obey that to handle mail already in transit.

GDPR has so far been grossly in the cookie nag screen category, except instead of a tiny bar on visiting a page I get a multi-select based dialog of doom. The answer most companies are going to take is simply not market services to folks in the EU, and those that do will implement annoying nag screens.

More rules blindly applied rarely solve problems.


For the love of god, please stop spreading these misinformed views.

1. Nobody in Europe will be blocking anything.

2. The Wayback machine will continue to operate.

3. GDPR is generally pretty well-written legislation, based on extensive experience by privacy regulators across Europe.

There are some questions about exactly how the rules will evolve in practice. The thing to bear in mind is that privacy regulators are interested in compliance, not in punishment.


I don't disagree (nor agree), but trusting Europe to not block anything is wrong. There are EU countries that are blocking something right now. Porn in the UK, foreign gambling sites in the Czech Republic and of course Telegram in Russia.

Edit: I of course know that Russia is not in the EU, lol. Parent said "Europe" and I added Telegram as a fun remark after two serious examples (and there are more). Calm down with the downvotes.


Russia is not part of the European Union, and FWIW porn is not blocked in the UK right now.


Russia is not in the EU.

The porn block in the UK (or opt-in block, more like) is a voluntary measure taken by ISPs.


Certain types of porn are illegal in the UK (for example depicting female squirting or face sitting). In April they were supposed to introduce age checks for all porn websites willing to operate in the UK, which essentially means every website that has porn (for example Reddit) would have to be behind a paywall, as the age check is supposed to be done via credit card transaction. Now this has been delayed, but I don't think they are backing out of it. Furthermore, this is going to be a huge problem, because: 1) Payment processors frown upon the idea of servicing porn websites. That means it is going to be very expensive to implement unless the government figures out a different way. 2) Companies will have to store more personal data about their viewers, and users will be forced to give up that data. 3) That poses a huge risk in case of a data breach, as someone's sexual preferences are sensitive data that would be forced to be collected.


There's a new one coming into effect that will force websites to verify people's age using their ID (possibly by an external provider). Or I guess be blocked? Still a fucking stupid idea.


Russia isn't EU.


> cookie banner law, for example

The 'cookie law' is actually subtle genius.

If your site only uses cookies for operational reasons, such as enabling login or maintaining a basket, you don't need to inform the user.

So any time you see a cookie banner, that indicates the site is doing something additional with cookies, like tracking for ad networks. It's a yellow flag.


Except every single website just has a banner anyway, because it's easier to cover your back than to get legal involved every time you tweak something. So it has no meaning at all, in any way, except that it's very confusing to some users.


Sorry, can you explain to me what's stupid about it?

Not allowing information about people to be kept ad-infinitum (and sold ad-infinitum)?

Not allowing data breaches caused by sheer incompetence to go unchecked?

As much as I worry about its consequences, companies saw it coming.


The average computer user has difficulty searching their email or scheduling a meeting. You expect them to complete a nag screen about how their personal information is going to be used, with sliders for opt-in versus opt-out, every time they visit a new website?

Like the cookie nag, users are going to blindly click through until the confusing nag screen goes away and then be upset that it wasted their time.


Like many aspects of culture, we may have to rely on pirate outfits to archive and preserve things, until the original parties are no longer interested in fighting about it either way, or a long enough time passes that the archived history increases in value and decreases in personal stakes.


What a large number of people fail to realise is that the GDPR applies to any person (natural or legal; a data controller and/or data processor) that holds personal data on an EU citizen or EU resident, regardless of where the data controller (or data processor) is. Obviously EU law can only be enforced in the EU, but if you are a business then any funds in the EU that belong to the data controller can be frozen or used to pay court-levied fines. Or if an infringing data controller travelled to the EU (or a country with an extradition treaty and similar criminal code), they could potentially be held if a court decides that the behaviour was criminal in nature (some EU jurisdictions are stricter than others).

The only way to completely avoid the GDPR is to not hold personal data of EU citizens or EU residents.


> the GDPR applies to any person (natural or legal; a data controller and/or data processor) that holds personal data on an EU citizen or EU resident

Funny, because the GDPR explicitly says this is not the case.

Art. 2, paragraph 2

This Regulation does not apply to the processing of personal data:

   c) by a natural person in the course of a purely personal or household activity;


I was talking in the context of the Medium post and businesses, which is what I was replying to. I don't think I have seen anyone complaining about household activities. A red herring.


Nitpick, but as far as I understand it, it's only EU residents (regardless of their citizenship). The specific text says "data subjects who are in the Union", and citizen never appears in it.

(This is for foreign businesses. EU businesses have to apply it to everyone, regardless of their location or citizenship.)


This means it should be applied to everyone, because how do you check that someone is an EU resident? Should websites display a page requesting visitors to upload their residency certificate in order to be compliant?


No, it means they have to apply to people connecting from the EU.


Have you any document from a data protection authority or lawyer that makes this claim? or even close to it?


He's not a lawyer, but as a "GDPR implementation leader" I bet he talked to some:

https://www.linkedin.com/pulse/gdpr-does-apply-eu-citizens-g...

But really, it's plain from the text.


How will it work with copyright law, or defamation law?


I guess they could vest it in some corporation that has no feet down within the EU. Aside from actually cordoning off a section of the Internet there's not much they could do otherwise.

Though now that I think of it, perhaps blocking [the archive.org crawler] could then become mandatory for GDPR compliance ...


This seems to have annoyed a few people. I didn’t mean this as an actual practical strategy, or facetiously; it was meant more as a commentary on modern global corporatisation, and a thought experiment on the limits to which the EU can enforce itself online.


I think this is more likely to be unintentional. As another comment mentioned, archive.is isn't affected. If you want to remove things from the Internet Archive, you can do so using your robots.txt:

https://archive.org/about/faqs.php#14

https://www.fightcyberstalking.org/how-to-block-your-website...


robots.txt is really only supposed to be used for blocking the Internet Archive's first snapshot, and not to remove existing snapshots – and even this might not be the case in the future as they try to preserve most snapshots. They made a few policy changes last year[1] to how they handle robots.txt files, to handle cases where a domain is sold and a new robots.txt file would result in deleting old data, among other things.

[1]: https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...


Hmm, that may be what it's meant for, but pretty sure it can currently be used to block things retroactively too. IA may still have it in the archive, but won't let viewers view it.

As happened in this case: https://news.ycombinator.com/item?id=16919017

No? The article you linked says they've stopped paying attention to robots.txt for US government and military sites, but it looks like it still retroactively removes visibility for everything else.

I guess IA could change their practices. If Medium or people like them start actively using robots.txt to try to retroactively remove things from visibility in the archive, perhaps IA will change their practices/policy. I would welcome it.


Interesting. I wasn't aware that it no longer applies retroactively. Even so, medium.com's robots.txt still doesn't try to block new crawling by the Internet Archive:

https://medium.com/robots.txt

Or via WBM for posterity: https://web.archive.org/web/20180430183503/https://medium.co...

It seems unlikely to me that they would deliberately go to this length to prevent archival, yet not attempt to prevent it happening to begin with. Furthermore, as mentioned in your link, they still accept removal requests via email.
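
For anyone comparing snapshots like the one linked above: a minimal sketch of pulling the capture time and the original address out of a Wayback Machine snapshot URL. It assumes the usual `/web/<14-digit timestamp>/<original URL>` layout and that the timestamp is UTC (my understanding, not something confirmed in this thread). `parse_snapshot_url` is a made-up helper name, and the example.com target is a placeholder.

```python
import re
from datetime import datetime, timezone

# Assumed layout: https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot_url(url):
    """Return (capture_time, original_url) for a Wayback snapshot URL."""
    match = SNAPSHOT_RE.match(url)
    if match is None:
        raise ValueError(f"not a recognised snapshot URL: {url}")
    stamp, original = match.groups()
    # Assumption: the 14-digit stamp is expressed in UTC.
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, original


captured, original = parse_snapshot_url(
    "https://web.archive.org/web/20180430183503/https://example.com/robots.txt"
)
print(captured.isoformat(), original)
```

That makes it easy to check when a given robots.txt was captured relative to when an article was deleted.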


This information is outdated, and ia_archiver now disobeys robots.txt (see https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...). You can still get your website removed from Internet Archive but you have to contact them.


One thing I learned about archive.org and robots.txt is that they never actually delete anything. If you accidentally block their bots, then in a month or so, your old content is available again. I've blocked their bots by mistake a few times, and each time my old web content is back. Not a big deal for me, I just feel silly for my old geocities style sites. :-)


Is a certain browser required to reproduce the problem? I had no trouble accessing the archived page.

https://web.archive.org/web/20160826003417/https://medium.co... ----------------------------------------------------------------------

Interviewing my mother, a mainframe COBOL programmer

My 1 year older brother (to the left), my mother and me (to the right)

My mother has been working for one of the largest banks in the EU since before I was born and I've always been fascinated by her line of work, especially these last years since I've become a programmer myself. I've been asked to interview her plenty of times, and finally decided to do so.

   * * *


I don't really see why this is a big deal. If I decide I want to delete the content I put on Medium, I very well want them to delete it everywhere else as well.

One could read that as "Medium tries to protect their users' content".


Completely agree with your comment, not sure why you're being downvoted. A blog is the same thing as a Facebook account. Would you like it if you couldn't delete your pictures on Facebook?


A blog is not like a Facebook post. It's more like writing an essay, publishing it in a book, and then asking libraries to burn their copy of the book.


That seems like an arbitrary distinction, especially since Facebook can be used as a personal blog via its posting feature.


That's your definition, mine differs.


I don't disagree or agree really. I was just interested in the fact they had any kind of policy on it at all.


Uhh, is the first Medium article meant not to load? It worked fine for me, but I read it as though it was meant to redirect to the homepage.

If I did read it wrong, I was able to load the second one fine as well


What are some ways around it as is mentioned in the article?


Either disable JavaScript before visiting the Wayback Machine, or stop the loading of the page just after it has loaded the text but before it performs the redirection (a bit tricky, you have to stop it at the right time).
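
A third option is to skip the browser entirely. Here is a minimal sketch, assuming the redirect is triggered by script on the archived page (nobody here has confirmed the exact mechanism): parse the saved HTML and keep only the visible text, so no script ever runs. `TextOnly`, `visible_text` and the sample markup are all made up for illustration; they are not Medium's or the archive's actual code.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical stand-in for an archived page carrying a redirect script.
sample = """<html><body>
<h1>Interviewing my mother</h1>
<p>Article text survives.</p>
<script>window.location = "https://example.com/";</script>
</body></html>"""

print(visible_text(sample))
```

Since nothing executes the script, the redirect never fires and the article text is all that comes out.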


You can't use the wayback machine without JS though, can you?


Given the link to the archive, it works without JavaScript for me. I disabled scripts from archive.org with uMatrix and the archived page loaded just fine. The only difference from usual is that the top bar archive.org normally displays isn't present.

However you do seem to need javascript enabled to query the wayback machine from web.archive.org: "The Wayback Machine requires your browser to support JavaScript, please email info@archive.org if you have any questions about this. "


Probably disable JavaScript


If the page loads completely then yes it redirects to medium.com. But you can read the article by interrupting the redirect and clicking the stop button.


This is true. It took me five goes, but I am over 40 and my mouse finger is not what it was.


Why mouse? Click the link and press escape. You youngsters with your mouses.


And there you go, it turns out that ‘mouses’ is an accepted pleural for a computer mouse.

Less horrible than ‘Magic Trackpad 2 — Silvers’ I suppose.


Yep, unlike "pleural", which is not yet an accepted spelling of "plural" :p


That’s just excellent. It stays. Guess which field I work in.


Magic Trackpads 2 — Silver


For future reference, you can rate limit bandwidth with dev tools in Chrome (and probably Firefox), which makes it a lot easier to dodge redirects in situations like this.


Just hitting Esc on the keyboard should also work to stop the redirect/load.


It never even happens if you don't have javascript enabled. Unfortunately as of last year archive.org changed their site and wayback machine design so that it is no longer possible to use without javascript.


RequestPolicy on Firefox stops redirects cold.


This also works for a lot of paywalls. Combine with Reader view in Safari and you’re in business


Medium generally prevents people from reading articles anyways due to their insanely shitty UI. I generally avoid that site now


Hmm, all they have to do is serve a dynamic robots.txt that forbids the Wayback Machine from the deleted articles, and that would remove even the workaround, yes?


Once it's stored I imagine they don't need to even scrape the page again, so robots.txt wouldn't do anything.


The Internet Archive does rescrape periodically, and it removes archived pages based on the current robots.txt. This is documented behavior of the archive that goes beyond the normal conventions of robots.txt.


I would add that the content itself is not removed. They only stop displaying it whilst the robots.txt says not to. If they cannot reach your robots.txt, the content comes back, as I have experienced multiple times.


Sure, but they'd have to create a list of every deleted article which seems like it would be pretty long.


I assume their software is quite capable of automatically generating such a list to include in robots.txt.
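
A minimal sketch of what that generation could look like, assuming the site keeps a list of deleted article paths and that the crawler identifies itself as `ia_archiver` (the name used elsewhere in this thread; whether it still honors robots.txt is disputed above). `build_robots_txt` and the example paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(deleted_paths, user_agent="ia_archiver"):
    """Render one Disallow rule per deleted article for a single crawler."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in sorted(set(deleted_paths))]
    return "\n".join(lines) + "\n"


# Hypothetical list of deleted article paths.
deleted = ["/alice/old-post-123", "/bob/retracted-essay-456"]
robots_txt = build_robots_txt(deleted)
print(robots_txt)

# Check the generated rules with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
blocked = parser.can_fetch("ia_archiver", "https://example.com/alice/old-post-123")
allowed = parser.can_fetch("ia_archiver", "https://example.com/carol/live-post-789")
print(blocked, allowed)
```

The list would grow by one line per deleted article, so length is the main practical concern, not the difficulty of generating it.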


This is why I like archive.is, it tends to avoid falling for any of that kind of BS. Something gets archived and it's there forever.


Alternatively, they can just tell the Wayback Machine to delete the articles or forbid it from archiving them, right?


Just use Telegram Instant View.


> Medium tries to prevent people reading deleted articles on The Wayback Machine

Please use the article headline. However, the automatic redirect makes this pretty close to the truth.


Yes probably should have used my own title. It's just difficult for me to accept a headline so lacking in pith.

Edit: have updated title


Awesome. I am stepping through the page execution and trying to figure out what is embedded in the page that would cause a redirect if the article is deleted. Did you find it?


No. I took the view that a much more intelligent HN user would do that for me.


Expect a promotion soon.


I’ve already been promoted to the level of my incompetence.


Maybe I'm out of touch with HN, but kudos to Medium. I delete the article for a reason; why should someone read it years later (regardless of the website or format)? It's my article.


What would credibility mean in a world where lies, mistakes, and fits of emotion could be erased completely?


I'm not sure I understand the question. As it is now, the vast majority of my transgressions in real life disappear into a memory hole. What makes the internet so special in that regard? You want there to be a permanent digital record of people's mistakes, why? So that those best at hiding their defects can get ahead? Why is that desirable?


For roughly the same reason schools administer tests and keep permanent records of grades.

An author's track record for honesty and accuracy is (was?) the foundation of credibility.


And yet the same community that insists the internet act as a permanent, immutable and irrevocable platform for "judging honesty" tends to object when intelligence agencies, law enforcement and employers use their data for just that purpose.

Even if it were the case that one could only judge a person's honesty and accuracy from the "track record" of content published to the web that can be traced to their identity, this assumes that all such content is unbiased and factual, and that any interpretation of that content would also be unbiased and factual, but that isn't true.


The combination of the right to be forgotten and GDPR practically mandates that this be the case and, as others have pointed out, in the pre-internet era there would often be no record, unless it had hit the print media. Most criminal offences are expunged (at least in the UK) from your criminal record after a number of years, so why should the internet be allowed to keep a permanent record after your debt to society has been paid?


Aren't we talking about authors here, not criminals? When articles were in print, authors had a "permanent record".


For sure, although we definitely strayed into moral territory. Even limiting ourselves to authors though, that's a concept whose meaning is radically different nowadays (when it means anyone who can type - including on a touchscreen - and knows how to hit the publish button on Medium/Wordpress/Tumblr or wherever) to what it was in the past, when it meant having to go through at least some amount of gatekeeping at the relevant publication.

Would you really want some sort of naively politicised rant that you wrote as a teen and no longer believe to affect your ability to get a job in your mid-thirties? Would that even be fair? Of course not. Yet that's absolutely realistic in a digital world that forgets nothing and where permanent and total erasure isn't an option.

(To be honest, age isn't particularly a factor here: over a period of a couple of decades anyone's beliefs can change quite substantially, and plenty do. Granted, plenty also don't, but entrenchment is a choice, even if one made passively).


When you write a piece, if you want to delete it so that no one else can read it, it should be your right. Unless of course you're a journalist and you have a duty to integrity or something like that, but I'm talking about people like you and me who might decide one day that they do not like what they wrote anymore.


> When you write a piece, if you want to delete it so that no one else can read it, it should be your right.

Right up until you publish it in a public journal or newspaper, at which point you have no right to demand that every other reader destroy their copy.


So if you publish a book, and then pull it out of stores, people who have the copies, and all libraries, should burn their copies, right?


Nope because they bought the book. It's their property. My article is MY property.


Your property is the copy on your computer(s) and the right to control who can make more copies. Once you choose to make a copy and transmit it to someone else (perhaps as a response to a HTTP GET request), that copy becomes their property. Your property rights end at the first sale[1]. If you don't want someone to own a copy of your article, don't give it to them or get them to agree to a contract[2].

[1] https://en.wikipedia.org/wiki/First-sale_doctrine

[2] A TOS is not a contract. If you want to use a contract, make your offer, wait for someone to understand and accept it before you send them a copy of your article.


>>Once you choose to make a copy and transmit it to someone else (perhaps as a response to a HTTP GET request), that copy becomes their property.

Has anyone tried your argument in a copyright case? Say I access a NYT article, and according to your reasoning it becomes mine the moment their site shows it to me. If it's mine I can publish and monetize it.


> If it's mine I can publish and monetize it.

Yes, you can[1][2] monetize (resell) your copy. You do not have the right to make new copies. This ability is granted explicitly[3] in 17 U.S. Code § 109 (a):

>> [...] the owner of a particular copy or phonorecord lawfully made under this title, or any person authorized by such owner, is entitled, without the authority of the copyright owner, to sell or otherwise dispose of the possession of that copy or phonorecord. [...]

[1] In some situations there may be additional limitations of your rights. (e.g. performance of a copyright-protected work, which technically creates a new derivative work)

[2] I am not a lawyer, this is not legal advice. Consult a real lawyer for actual legal advice.

[3] https://www.law.cornell.edu/uscode/text/17/109


You are not a lawyer and that's obvious, too bad you try to act like one with citations and outlandish statements.

If I post an article on mysite.org on "free trade" and you think you can archive it and copy and re-distribute it for eternity, you'd better think again. The implied license is to read the article from my site, for as long as it is posted there, on MY site. Now, if I become a US President and someone has a copy from 45 years ago, that's different.


> you think you can archive it and copy and re-distribute it for eternity

Did you read my post? I specifically said you couldn't.

>> You do not have the right to make new copies.

Copy rights are separate from your property rights related to your copy of a work.

> You are not a lawyer

Yes, although I have been studying copyright issues since the early 1990s. I'm not saying anything remotely controversial in current interpretations of copyright law.

Incidentally, a new interpretation that is slowly being accepted by courts is misuse of copyright[1]. Based on the older patent misuse doctrine, the idea is that having a copyright only grants rights related to creating new copies of a work. Claiming that copyright somehow also grants you other, totally unrelated rights can result in the court preventing the copyright holder from enforcing their copyright[2]. Relying on misinformation about copyright can have serious consequences.

> too bad you try to act like one

You're seeing what you want to see. Providing references to what you're talking about is encouraged on HN.

[1] Lasercomb America, Inc. v. Reynolds

[2] https://apps.americanbar.org/litigation/litigationnews/pract...


Is this an April Fools' joke?

Too much first person.

Also, Wayback Machine "is frequently used by journalists and citizens to review dead websites".

This isn't some fucking standard; it's Wayback Machine's responsibility to archive websites, not the other way around.



