Introducing the 4chan API (4chan.org)
188 points by nthitz on Sept 5, 2012 | 79 comments



I feel kinda silly just having spent the long weekend writing a threaded 4chan scraper. This is a super welcome change though. Even if you don't visit 4chan regularly you can't ignore the VAST amount of content people upload there. I imagine some interesting statistics will come of this (I know I plan to).


Even if you don't visit 4chan regularly you can't ignore the VAST amount of content people upload there. I imagine some interesting statistics will come of this.

That was my reaction too. Beyond statistics, this will make it easier to develop all sorts of user-facing and machine-to-machine applications -- for sharing, grouping, ranking, and linking items, and even for 'overlaying' the content on top of other social networks.

I'd expect the API to grow and mature over time, and am curious to see what comes out of this experiment.


I don't know. Getting an API to 4chan is like getting a parking pass to the local garbage dump. What statistics are you actually mining here?


Coming from the country where needing a parking permit to the garbage dump is the norm (i.e. only way to get rid of your trash) and can actually be a desirable activity (I can't even tell you how many books, monitors, speakers, etc. I've either restored or recovered in perfect condition for no cost)... I can't wait!


4chan represents a social cauldron unlike others. The first and foremost question I had hoped to find out was "What do people talk about when they don't have an identity to think of?" There is almost zero consequence in the case of failure. If you say something and no one responds, or everyone insults you, it is completely forgotten within a few minutes. You can say almost anything you want without fear of retribution. (Where else in life does this exist?) So, with no rules, what do people want to talk about?


I understand what you're saying.

BUT, be careful with the concept of 4chan and "no retribution". There have been countless examples of 4chan "reacting" to threads, I'm sure I don't need to go into detail here.


There is plenty of excellent content (consequently, statistics), you just have to look a little harder. I would hazard a guess that you've taken a quick glance and dismissed the entire community. Feel free to correct me if I'm wrong.


Let me know when, say, /tg/ stops casually tossing around f-g and f-ggot while pretending that's not at all homophobic.

Garbage dump is entirely appropriate. I have no idea how anyone older than 20 can tolerate 4chan.


While you're talking about the homophobia, let's not ignore the blatant racism on 4chan. The word n-gger is not only casually thrown around on those boards, at this point it's ingrained racism that's influencing a lot of young teens visiting the site and is subtly encouraged by the moderators. It's disgusting and even though I'm a strong advocate for the freedom of speech, I fear the power of effective propaganda.


You do know that 4chan has a pretty large gay population right? In fact I would say roughly 70% of that site's viewers are bisexual.


Just a little bit of confirmation bias there, from my own experience. That notwithstanding, 4chan most definitely has a higher rate of homosexuals than the rest of the internet.


While you clearly pulled that percentage out of nowhere I would agree it's probably not too far off. While it's fair to call it bisexuality, it might be more accurate to simply label it sexual opportunism.


If you really think the use of fag and faggot reflects homophobia, you need to take a step back and look at where you are - the internet. Do you think that people posting gore do it because they enjoy the content? It's all shock value.


(Not really interested in whether 4chan's use of "fag" reflects homophobia. I am responding to your next sentence.)

>Do you think that people posting gore do it because they enjoy the content?

I think there are those who post gore because they get pleasure from grossing people out, outraging people or otherwise eliciting a strong emotional reaction (which is probably what you mean by "shock value") but I do not think that is all of it.

I think a significant fraction of the people posting gore do it because it will induce others to post gory pictures that the OP has not seen yet. And I think that they want that because they derive pleasure from seeing people and animals being harmed.

One of the reasons I believe that is a book I read called Among the Thugs, in which a reporter spent some time hanging out with British football hooligans. He reported that after running with the hooligans a while, engaging in violence and contemplating engaging in violence became pleasurable. This and other things suggest to me that many people are capable of deriving pleasure from seeing people get fucked up once they've acquired a taste for it.

The main reservation I have about 4chan is that it seems to be enabling many to acquire a taste for it (and for other things like harassing people on Facebook).


At least when I still browsed /b/, gore was almost entirely to dissuade people from browsing. The logic being that if you couldn't get over it, you weren't all that welcome.


So explain somethingawful.com and the many other sites dedicated to the posting of gore and gruesome deaths?


I don't think you know what SA is. You going on the domain name alone?


> If you really think the use of fag and faggot reflects homophobia, [...]

Yes, it does.

> you need to take a step back and look at where you are - the internet. Well...

I suggest reading this introduction to the subject of "why second-degree and ironic gay bashing works only if there is real homophobia somewhere": http://www.queerty.com/does-calling-someone-a-fag-really-mea... I have better articles about sexism and homophobia, but they are written in French.


People really like to dismiss that by calling it transgressive humor, but it's really not? You never truly get made fun of for being part of the majority.


Normalizing the "ironic" use of slurs normalizes the unironic use of them as well, and further entrenches privilege as a norm.

It doesn't take spending a lot of time in any nerd culture group to see that the racism, sexism, and homophobia are not ironic at all. Case in point, the enormous administrator/moderator-led backlash on SA in the past ~year against anyone daring to not openly welcome the death of anyone who is not a white cis male.


fag and faggot don't mean “homosexual” in 4chanspeak.

However, it does mean that people like you don't go there, so it's a pretty good thing that they have that slang.


Garbage dump? /b/ is bad, but 4chan as a whole has some pretty good content.


What kinds of things people throw away.


The first thing that jumps to mind with this is... "Oh $#!*." I don't know why, but it scares me what could come out of this.


Came to post this. The only redeemable value in 4chan, in my opinion, is that the fact that posts aren't archived makes for a very interesting social experiment. An API firehose pretty much puts an end to that.


Selective archives (users vote to archive a thread) have been around since at least 2007:

http://4chanarchive.org/

http://suptg.thisisnotatrueending.com/archive.html

Full board archives have been around for quite some time as well:

http://easymodo.net/ (the original complete archiver, now dead)

http://archive.foolz.us/

https://archive.installgentoo.net/

etc.

I'd say the fact that your posts are most likely to be forgotten, even if it is archived, is much more of a negative aspect of the site than a positive. How many times have I spent 30 minutes on a post, only for no one to respond to it, or worse, realize that the thread 404'd? It makes you look at yourself and wonder why you bothered.

Forced anonymity is the interesting part of imageboards -- the text BBS equivalents to anonymous imageboards, based off the original 2chan, manage to maintain a very similar flavor while featuring permanent archival of all posts, and enjoy longer-form discussion as a result.


> I'd say the fact that your posts are most likely to be forgotten, even if it is archived, is much more of a negative aspect of the site than a positive.

This is the most magical aspect of 4chan, which is why I don't care for archives.


> This is the most magical aspect of 4chan, which is why I don't care for archives.

The written word allows us to lend ideas (memes, concepts, what have you) a sense of permanence that they never would have had otherwise. But at the same time, it prevents them from evolving in a way they otherwise might have, if their exact origins were not so easily recorded & referenced.

I don't think it's a coincidence that 4chan, which lacks this permanence, is the origin of so many of the top memes of the past decade (and by 'meme', I don't just mean things like LOLcats).

(Gleick argues this same point in the first few chapters of The Information, for those who are interested).

EDIT: Just realized who I was replying to - if I may ask, are you concerned at all that an official API might detract from 4chan (by making said content more traceable)?


>The written word allows us to lend ideas (memes, concepts, what have you) a sense of permanence that they never would have had otherwise.

I agree, and that's why it was horrible for 4chan. In the beginning, there were new memes, concepts, and what have you every other day, and old stuff was forgotten (or rather used to show you'd been there for a while.) Now, it's just a constant recycling of the first few years of the site.

IMHO it was caused by archives and meme dictionaries. No need to lurk moar anymoar. Also, very little reason to laugh.


I totally agree. /k/ommando here. Thanks for keeping it relatively the same. I've thought about creating a bookmarklet or something to add some features I originally thought would be nice (threading/grouping linked comments, or alerting you when you get a reply) but I realized that features like these could fragment each thread and distort the flow of the conversation.

Two features I still think wouldn't conflict with the site are buttons to expand images inline in a thread and turning the text links clickable (copying and pasting on a tablet sucks). I know there are bookmarklets which do this but I can't display my bookmarks bar on chrome on my tablet.

I understand that 4chan isn't very/at all profitable. I think there really is opportunity for you to branch out on some boards rather than just links to that jlist site. Have you considered doing more contextually aware ads or even relevant Amazon affiliate links in threads to boost your revenues? There will always be detractors but I think most users really appreciate 4chan and would love to see you better compensated for it as long as it doesn't ruin the site in the process.

One more thing: this API will no doubt be used by people to create their own sites which add the features they want to 4chan. Do you consider the API a potential source of revenue, perhaps by charging for faster versions of it?


You've mentioned before that you visit 4chan every day. Do you visit any text boards?

I guess it's my own fault for fighting against the nature of the site, but sometimes I try to go the extra mile and put some effort into a post, and feel like no one even notices when I do. It's very discouraging for a conversation to have long since moved on by the time you've posted, or for a person you were trying to help to have given up already and left their thread. While I like 4chan for what it is, I'm still a little bit sad that the textboards never really took off as much, and I wonder why they didn't -- anonymous somewhat-long-form discussion sounds appealing to me. I'd be right at home in a text board with a fraction of the userbase of a 4chan board and a slightly slower pace, but most of the ones I know of are practically dead at this point.


That really sounds like a job for a niche subreddit with a throwaway account, not a 4chan BBS.


I guess you kind of answered the question I had in mind. Most "western" people seem to see anonymity or even pseudoanonymity as something you use when you have something to hide, something you don't want traced back to you. Everything else, they don't seem to mind having their real name (or a pseudonym that they make little attempt to disguise) attached to. Or, on a different level, people believe that you need to have some kind of identity that you care about, a reputation that you want to uphold, to keep discussion civil and meaningful, and that this should be the key element differentiating an online community.

On the other hand, the default in my mind is anonymous discussion. You only don a pseudonym or reveal your true identity when it's actually relevant to the conversation, and immediately stop when it isn't -- people usually don't care about who I am, but they might care about what I have to say. As an example of this, I've browsed Hacker News daily for over a year now, and just recently got around to creating an account. I still feel uneasy about it, even though I'm posting with nothing but a pseudonym, and not posting about anything that I would particularly care about having traced back to me. I doubt Hacker News would have the same culture if it had allowed anonymous posting, but I certainly would have started contributing much earlier if I didn't have to create a pseudo-identity to do so.

It's important to note that while it might be true that Reddit serves as the West's 2chan, the two deliver markedly different experiences. 2chan being as enormous as it is (millions of posts per day) should indicate to you that there is some itch that a giant collection of anonymous textboards can scratch that Reddit can't.


Do you mean 2ch/2channel instead of 2chan/Futaba? They have similar names but are completely different sites.


Yeah, I'm aware. The problem stems from the fact that you can't type the unambiguous name "ni-channeru" without looking like an insufferable dork/weeaboo. 2ch and 2chan (.net) are the domain names of "ni-channeru" and Futaba Channel, respectively, but it's obviously very confusing to refer to them by their domains. Therefore, in English discussion, we tend to say 2chan or 2channel when we mean "ni-channeru", and Futaba when we mean Futaba Channel.


I've never had a problem with the distinction between 2ch and 2chan.


There are actually plenty of third party archives for 4chan boards, some that have been running for 4+ years.


I'm a new graduate student in an American university. As part of my Data Mining/NLP project, I'm wondering if I can do something cool with this fresh API. Any ideas?


create a markov chain 4chan slang generator.

track usages of phrases over time. (thinking of the recent evolution of "rustled my jimmies" derivatives)

See what topics are trending

Fuck maybe I should build some of this...
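
For anyone who wants to try the Markov chain idea, here is a minimal sketch in Python. It is only an illustration: `build_chain`, `generate` and the sample posts are made up for this example, and a real version would feed in comment text pulled from the thread JSON instead.

```python
import random
from collections import defaultdict


def build_chain(posts, order=1):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    chain = defaultdict(list)
    for text in posts:
        words = text.split()
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return chain


def generate(chain, length=12, seed=None):
    """Random-walk the chain from a randomly chosen starting state."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:  # dead end: nothing ever followed this state
            break
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)


# Hypothetical sample posts, standing in for scraped comment text.
sample_posts = [
    "that really rustled my jimmies",
    "my jimmies remain unrustled",
    "that really made my day",
]
chain = build_chain(sample_posts)
print(generate(chain, seed=0))
```

Repeated followers are kept in the list on purpose, so common continuations get picked more often. Raising `order` to 2 or 3 should give more coherent output but needs a lot more text.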


do you have some more ideas? these are really good


um... make a 4chan app that lets users up/downvote threads, then builds a naive Bayesian model of what keywords (e.g. "toasting epic bread") are correlated with the kind of threads you like. Netflix-like. A sort of automatic cream-extractor.

de-anonymizer based on posting times, writing style, what baits them to respond, etc. The thread-local unique IDs would help, giving you more stuff that you knew came from the same user. Don't know if this one is practical. Kind of scraping the bottom here...

That's it for now.
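
A rough sketch of the up/downvote idea as a word-level naive Bayes scorer, in Python. Everything here is an assumption for illustration: the `(text, liked)` vote format, the function names and the sample votes are invented, and the code expects at least one liked and one disliked thread in the training data.

```python
import math
from collections import Counter


def train(threads):
    """threads: iterable of (text, liked) pairs, where liked is True or False."""
    word_counts = {True: Counter(), False: Counter()}
    label_counts = Counter()
    for text, liked in threads:
        label_counts[liked] += 1
        word_counts[liked].update(text.lower().split())
    return word_counts, label_counts


def log_odds_liked(text, word_counts, label_counts):
    """Log-odds that a thread is liked, using add-one (Laplace) smoothing."""
    vocab = set(word_counts[True]) | set(word_counts[False])
    # Prior: ratio of liked to disliked threads (both must be non-zero).
    score = math.log(label_counts[True] / label_counts[False])
    for label, sign in ((True, 1), (False, -1)):
        total = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += sign * math.log((word_counts[label][word] + 1) / total)
    return score


# Hypothetical votes; a real app would collect these from its users.
votes = [
    ("toasting epic bread", True),
    ("epic bread thread", True),
    ("generic spam link", False),
    ("spam spam link", False),
]
model = train(votes)
print(log_odds_liked("epic bread", *model) > 0)
print(log_odds_liked("spam link", *model) < 0)
```

A positive score means the words look more like the threads you upvoted; sorting new threads by this score would be the "cream-extractor" part.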


"... The decision to release an API was partially out of necessity, but also because I'm curious to see how people will use it. ..."

And who. The API just made a group of intelligence hackers very happy indeed.


It still requires scraping to discover the thread ids though, does it not?


We'll have indexes and a catalog view soon.


That's great news. No hope for a posting API I presume?


Probably not. Since we don't have user accounts, and already have a bit of a spam problem, I'd be pretty worried about what a real POST API might bring.


What if you charged for it? (Not saying this is a good idea, just that it's an idea.)


Charging for spam just means spammers buy accounts to post spam...


Could end up as a profitable honey pot.


I figured as much. Just wishful thinking on my part--it'd be nice to be able to build a client wholly atop the API.


It's not like the html form does anything special. Forms don't have funny markup you have to scrape or produce. It's just a POST.

I guess there's mime/url encoding or whatever, but that's hardly an issue.


There's a captcha


Are some of the threads not rendered to JSON yet? i.e., do we only get it for new threads?


All of the old threads were rebuilt, so everything should have a JSON representation.


How long did that take? Just the current active threads across the boards, right?


Just a few minutes, since we only store active threads in the database. Was only ~9,000 threads.


What is the index URL format


Doesn't exist yet. I'll update the API GitHub page once we add it.


Couple things:

Pipes/YQL seems not to have been banned yet, so getting 4chan data back wrapped in a callback for nothing-but-front-end hacks seems doable. Also along those lines: it'd be super-awesome if the API would take ifModifiedSince as a query parameter and not just in headers.
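
For the header route that already seems to work, a conditional GET might look roughly like this in Python. It is a sketch under assumptions: the helper names are invented, and it assumes the server answers 304 Not Modified when nothing changed, which is standard HTTP behaviour but not something confirmed here for this API.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_request(url, last_seen_epoch):
    """Build a GET request that asks the server to skip unchanged content."""
    req = urllib.request.Request(url)
    # HTTP dates are expressed in GMT, e.g. "Wed, 05 Sep 2012 00:00:00 GMT".
    req.add_header("If-Modified-Since", formatdate(last_seen_epoch, usegmt=True))
    return req


def fetch_if_changed(url, last_seen_epoch):
    """Return the body as bytes, or None if the server replies 304."""
    try:
        with urllib.request.urlopen(conditional_request(url, last_seen_epoch)) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib surfaces 304 as an HTTPError
            return None
        raise
```

A poller would remember the time of its last successful fetch and pass it as `last_seen_epoch` on the next call, so unchanged threads cost almost nothing to re-check.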


What I'm taking away from the comments below is:

"Everything that could be done with this API has already been done using HTML parsing. This development will simply make those applications faster."

Truth?


Yeah, and there have been Python scripts anyone interested passes around and shares too, so you haven't even had to write it yourself...


true


Could someone explain to me how this could be leveraged (or if it could be) to gather a sort of stream of messages, a la the Twitter streaming API or reddit.com/r/all/comments.json?

I'd be interested in doing some language statistics and comparing them to the aforementioned networks.


Elsewhere in the comments here, moot said, "We'll have indexes and a catalog view soon." So for now, you need the thread id.


Sadly read-only, though it's not much work parsing the HTML and faking a submit through a Post request. Good luck submitting a 4chan app to Apple's app store though :)


They have already existed. They all got pulled recently.


Forgive me if this is a noob question, but does 4chan restrict embedding of images.4chan.org images from external urls? I was just playing around with the API and it seems all the images are rendered as the placeholder image that says "4chan.org".

If this is true, I don't know how to utilize this API to make something valuable since all I can do is get the url or text. Somebody please enlighten me. Thanks!


This sort of protection is usually done by checking the referrer header, which is trivial to set when retrieving something programmatically or when using standard tools like wget. The API seems focused on reducing the processing costs of browser extensions that let the user view the page, but add extra features to the page, anyway. Those would probably still seem like a normal browser view of the image to the site by default even if browser plugins can't perform the trivial client sent header change (not sure if the browser plugin API exposes it).


Why would you hotlink to 4chan-pictures? These get deleted with their thread once the thread hits page 10, anyway, which can happen in under 5 minutes (on the more active boards like /b/)


This could transform 4chan as mobile and desktop clients are created. God I hate the web interface.


The mobile adapted web interface is pretty good now.


There are already mobile clients for Android and, IIRC, there were clients for iOS, but they were banned from the App Store due to some kind of infringement (I think it was adult content).

So I don't think a lot of stuff is going to change, excluding the diminishing server load that happened with old clients/extensions.


There already were plenty of ("native") mobile and desktop clients, although they worked by parsing HTML.


Can anyone repost the info for those of us who can't (or prefer not to) visit 4chan at work?


Every thread will be available as JSON.


Regarding financial sustainability, have you thought about charging for the new API?


I'm sure this is about trying to improve site performance, and charging for it would inevitably cause everyone to continue scraping the HTML, thus defeating the point.


This is interesting. Would surely give it a try and integrate with our app


Something I just cobbled together:

   curl http://api.4chan.org/b/res/423418552.json | python -mjson.tool
   curl http://api.4chan.org/b/res/423418552.json | json_pp
Example for grabbing a thread and prettyprinting the JSON of it.

Because, you know, we need more 4chan in the house.

(EDIT: brief skimming of the comments indicates it may be semi-offensive, so be warned. We're skimming /b/, after all.)
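
The same pretty-printing can be done from inside a script once the thread JSON has been fetched. A small Python sketch follows; note that the payload below is made up, and the "posts", "no" and "com" field names are assumptions about the response shape, not something shown in this thread.

```python
import json


def pretty(raw):
    """Re-indent a JSON document, roughly what `python -mjson.tool` prints."""
    return json.dumps(json.loads(raw), indent=4)


# Made-up payload for illustration; the field names are assumed.
raw = '{"posts": [{"no": 1, "com": "first"}, {"no": 2, "com": "second"}]}'
print(pretty(raw))

# Walk the posts and print a post number plus its comment text, if any.
for post in json.loads(raw)["posts"]:
    print(post["no"], post.get("com", ""))
```

Using `post.get("com", "")` keeps the loop from failing on posts that carry no comment text.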



