You can do an awful, awful lot for your business by taking this idea one or two iterations further:
1) Identify data source
2) Extract value from data source
3) Spit out templated content pieces extracted from data source
4) Farm out templated articles to freelancers for thickening up, Demand Media style
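Steps 1–3 can be sketched in a few lines. This is an illustrative Python sketch with made-up data and a made-up template, not anyone's actual pipeline:

```python
import csv
import io

# Step 1: an illustrative CSV "data source" -- in practice this would be
# scraped or downloaded, not inlined.
DATA = """fund,ticker,expense_ratio,five_year_return
Vanguard 500 Index,VFIAX,0.04,10.8
Fidelity Contrafund,FCNTX,0.39,12.1
"""

# Step 3: a sentence template a freelancer could later "thicken up".
TEMPLATE = ("{fund} ({ticker}) charges an expense ratio of {expense_ratio}% "
            "and returned {five_year_return}% annually over the last five years.")

def generate_articles(raw_csv, template):
    """Steps 2-3: extract rows from the data source, emit one templated piece per row."""
    return [template.format(**row) for row in csv.DictReader(io.StringIO(raw_csv))]

for article in generate_articles(DATA, TEMPLATE):
    print(article)
```

Step 4 is the human part: each generated stub goes to a freelancer with instructions to expand it into something readable.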
My client from this summer who paid me to do it for the average value of particular college degrees is launching sometime in the next week or so. I'll happily play show-and-tell with the non-proprietary parts if folks want, after it launches.
I tried to do almost exactly what you're describing about a year ago -- scraping structured data about mutual funds and constructing articles about each fund, which I submitted to Associated Content.
Unfortunately, I should have spent more time "thickening them up" -- after the first half dozen or so, AC began rejecting them for being too similar to each other.
I'm curious: why did you submit them to Associated Content instead of building your own site, where you'd have total control and keep most of the value you created? A deep backbench of semi-automated articles about funds plus relatively fewer pillar content pieces for linkability strikes me as a potentially very viable business in an industry which is quite literally awash in cash to spend on marketing.
At the time, AC was paying about $3 up front, plus pay-per-view incentives. And it has a good Google PR. So, I figured I could either be lazy and submit to AC or build my own site and spend a lot of time trying to get a decent PageRank. I chose lazy.
Just in the spirit of introducing you to other options: there exist people who already have profitable affiliate sites in the space who you could pitch on the idea of "bolt this onto your site and get X more inventory which will rank on the strength of your existing brand/trust/etc." I'd be thinking more in the five figure range than the $3 up front range.
Personally, I'm not a fan. For generic content like this, I'd rather read it in a table or chart. The data is being encoded into natural language, and then when we read it we have to parse the important information back out.
This is a weird example, but look at Groupon. One of the main reasons it's so big is the custom, humorous descriptions that go along with each item.
If newspapers want to survive, they shouldn't be automating their content -- it just makes it more generic and forgettable. Nobody wants to read an article that a computer wrote.
> One of the main reasons it's so big is the custom, humorous descriptions that go along with each item.
Personally, I never read those. I skim the headline, and look at the deal details if the topic interests me and is a really good deal.
There's a difference in content though; with deals I just want the hard facts. With many news articles, I want some well-written copy to add context to otherwise meaningless/bland data.
For the Powerball results, the sentence format is a bit more engaging and I can skim it as fast as I would skim a table. If nothing else, it makes me feel like the publisher cares more about the reader.
You can get staggering productivity wins by automating enough of the right 1, 5, 15 minute tasks, especially when you consider how terrible people's schedulers are. If your computer lost an hour of productive work every time it context switched, you'd figure out ways to eliminate its list of small recurring chores, too. Happily, your computer has very, very efficient context switching relative to you.
I've been keeping a running count of time spent on various activities this month, for giggles. Total support time for BCC in the last two weeks: eight minutes. The machine has been humming so efficiently I burned some time yesterday just to check that the whole thing hadn't been hit with a meteor or something.
A second point would be that the time you spend automating tasks has other payoffs, in the form of learning and inspiration.
For instance, I suspect that part of the inspiration for Appointment Reminder came from Patrick's own realization of how valuable the automation of small tasks can be (chasing down missed appointments, in this case).
---
To go off on another tangent, this is akin to eliminating technical debt from your workflow. By taking time to "refactor" certain tasks by doing them in a more efficient way, you get a net savings going forward. You can increase your ability to take on new tasks by increasing the efficiency of existing ones.
In keeping with the refactoring theme, if you have tasks that don't scale well, you may need to spend more time on them in a crisis. For example, in the event of a site outage you might suddenly have a deluge of support emails. Having support tickets be automatically created would save you n*60 seconds of copy/pasting.
I remember interviewing for a position at a company I was already at, moving from one department to another.
I'd gotten a new manager in the old department, and I don't think I'd had enough time to get over an initial bad impression. I was the night shift guy, on a help desk, and the call volume was generally low. To help better justify the use of my time, I was tasked with some additional duties, like processing account delete requests and other tedium.
My manager at the time found fault with the fact that I had written a series of scripts to completely automate the additional tasks I'd been assigned, and considered me lazy.
In the interview, I was asked the question "How do you respond to the accusation that you sometimes cut corners?"
My answer was something like "I think cutting corners is a good thing. If there's not a requirement for square corners, and the rounded corners don't impact the quality or integrity of the work being performed, and if the corners aren't irreparably damaged in the process, I think that taking the more direct path of least resistance is not a bad thing."
I got the job, and I was later told that the manager was very impressed with the answer.
Uhhh, that was a roundabout way of saying "I agree."
Nice article. It could be improved if the author collected numerous, diverse lottery result articles and used these to create a script that outputs randomized articles.
This task screams for DSLs, especially on the generation end. Doing this in PHP directly (or most other general purpose languages) encourages too little variation because adding alternatives is relatively heavyweight. Writing a good DSL that makes it easy to offer more alternatives in a single template will make it much easier to produce something even less distinguishable from a human report.
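One shape such a DSL could take -- and this is a sketch of the idea, not what the author wrote -- is a spintax-style grammar where `{a|b|c}` marks interchangeable alternatives. Adding a new phrasing becomes a one-character edit to the template instead of a new code path:

```python
import random
import re

# Matches the innermost {a|b|c} group (no braces inside), so nested
# alternatives are expanded from the inside out.
GROUP = re.compile(r"\{([^{}]*)\}")

def spin(template, rng=random):
    """Repeatedly replace the innermost {a|b|c} group with one random option."""
    while True:
        m = GROUP.search(template)
        if not m:
            return template
        choice = rng.choice(m.group(1).split("|"))
        template = template[:m.start()] + choice + template[m.end():]

template = ("{The winning|Saturday's} Powerball numbers "
            "{were|came up} 5, 12, 23, 38 and 41{.| -- check your tickets.}")
print(spin(template))
```

With enough alternatives per slot, the number of distinct outputs grows multiplicatively, which is exactly what you want when near-duplicate articles get rejected.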
I think you are missing the point. What's remarkable is that this person is combining basic hacking skills with a completely different career, not the cleverness of the design. Not everyone has to be a poet but amazing societal changes happen when everyone can read and write.
Further thought: Actually, not being a programmer reinforces my original DSL point, not contradicts it. Part of the purpose of a DSL is to reduce as much as possible the "programming" part of the task so the domain expert can concentrate on what needs to be done.
I never said "this guy should have written a DSL instead" -- that would have been an asshole thing to say. I said that this task screams for a DSL, and that's only more true if this guy isn't a programmer.
Don't worry; some people don't understand threaded conversation, and think that every reply to something has to "continue the conversation" of it.
EDIT: Okay, I'll just repost a comment from about a month ago here, that was upvoted instead of downvoted and yet made exactly the same point:
This is a threaded comment system. We can have as many discussions about something (post or other comment) as we want: go off on wild tangents, point out the spelling, have a pun thread, mention patterns of blogging/commenting the parent fits into, reply to the author on a separate subject, share anecdotes related to the subject of the post, and actually talk about the content of a post or comment, all at the same time, without breaking anything. That's what's so neat about threaded discussion: it doesn't require the "comparative notability" that a linear conversation needs in order to function.
In this case, we can have a discussion about combining hacking skills with a completely separate career, and then have a tangential discussion about using DSLs to generate text, with neither conversation interfering with the other. No one has to be "missing the point," and in fact jerf could be contributing elsewhere in the discussion alongside his creation of this tangent.
Hey, I'm the blog author. You're right -- I'm just so used to using cURL for more complicated requests that the simpler solution skipped my mind. I'll update.
You may know this already, but a lesser-known feature of PHP is that you can pass a [stream context](http://php.net/manual/en/function.stream-context-create.php) as an optional argument to most file operations. This gives you fine-grained HTTP control (POST, headers, etc.) while still using `file_get_contents` and friends.
This is a classic SEO content-generation move, called "mad lib sites". You create a templated article, with variables for each piece of dynamic content. Usually, you will also create "spun" content so that each article created with the madlib template is even more unique.
Then, you can scrape or find large databases of consistent information and deploy very large sites.
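The madlib pattern combines both tricks described above: fill `[variables]` from a scraped record, then resolve `{a|b}` spin groups so that pages built from the same template still differ. A minimal sketch (the template, field names, and data are all invented for illustration):

```python
import random
import re

# Only brace groups containing '|' are treated as spin choices.
GROUP = re.compile(r"\{([^{}|]*\|[^{}]*)\}")

MADLIB = ("{Wondering about|Curious about|Researching} the [fund] fund? "
          "Its expense ratio {sits at|is} [ratio]%.")

def render(template, row, rng=random):
    # First, fill [variables] from the scraped record...
    for key, value in row.items():
        template = template.replace(f"[{key}]", str(value))
    # ...then resolve each {a|b} spin group so copies of the page differ.
    while (m := GROUP.search(template)):
        template = (template[:m.start()]
                    + rng.choice(m.group(1).split("|"))
                    + template[m.end():])
    return template

print(render(MADLIB, {"fund": "VFIAX", "ratio": 0.04}))
```

Run once per row of the database and you have a "very large site" in minutes; whether the pages rank is the separate problem discussed next.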
The trouble is getting Google to fully index these sites. It requires a good amount of link building, both to the madlib pages and to the home page, to get enough juice for the crawlers to spend time on the site and get things indexed.
They can be very useful sites to build for a variety of reasons, and can actually add some value, depending on the data you're publishing.
Microsoft Word can already do this kind of document generation from an Excel spreadsheet (mail merge). Of course, it's more complicated than a purpose-built app would be, but probably also more powerful.
Here's a description of how MS Word and MS Excel were actually used to create 4,600 parameterized web pages, complete with samples of the Excel and Word documents used.