Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: Is it possible to have forum type community running without a database?
36 points by escapologybb on July 20, 2014 | hide | past | favorite | 65 comments
This question came about as I've been asked to set up a very small forum type community running on a Raspberry Pi, and the thought occurred to me that statically generated websites can handle reasonable amounts of traffic whilst using relatively small amounts of resources.

So I wondered if it is possible to run a forum type community - something along the lines of [elgg][1] or [Friendica][2] - but without the MySQL or other type of database backend?

Now I may be about to get laughed at for asking such a daft question, but I would genuinely like to know the answer. So, does HN think something like this is possible? And if not, could you please explain why not? And if it is possible, have you got any recommendations!

Thanks in advance. E

[1]: http://elgg.org/ [2]: http://friendica.com/




File systems are databases of some sort: there's a way of storing and retrieving data. Using the FS won't necessarily be simpler or less resource-intensive than using sqlite or something similar, because you _will_ store and retrieve data, and the helper layers you build on top of the FS will probably be less optimized than existing lightweight databases.


Sure. HN itself has no database, only the filesystem (or rather the filesystem is the DB).


Really? Wow. Would you mind expanding a little on how that works? I mean, is HN just stored as a series of files on a server somewhere?


Ask PG: Database, flat files or other for YC News? https://news.ycombinator.com/item?id=99092


I was about to say that, myself. The irony of asking this question on HN made me smile.

Note that they are moving (or did? or might be?) to a DB format in the end, but for at least the first several years of Startup/Hacker News, it was S-expressions written to files.


I hate it when people change the question so they can give the answer they want to give, but I'm going to do it anyway, as I think it's a relevant data point: I think you're dismissing the ability to run a database too quickly.

I have done almost exactly what you're talking about with a Raspberry Pi, using Ruby on Rails (3.0.x era) and SQLite. The RasPi (Model B) had enough horsepower to run in development mode on WEBrick and handle ~5 requests per second with page rendering and database calls, and that was in development mode! When running in production, Rails does less reloading of resources, so it should be even more efficient.


  So I wondered if it is possible to run a forum type 
  community - something along the lines of [elgg][1] or
  [Friendica][2] - but without the MySQL or other type of
  database backend?
You're attempting to optimize the wrong part of the architecture, I think.

A simple read-only query to SQLite or a NoSQL database is fast - generally just a few milliseconds and often less than one millisecond. The odds of you doing it much faster yourself are low.

Rendering pages is the slow part, because each forum page you render is going to involve multiple queries to your storage engine (whether you go with SQL or roll your own) and a ton of string manipulation/concatenation.

So what I'd do is....

- Use SQLite for my storage engine. The 256 or 512MB of RAM on a RaspberryPi is plenty for SQLite.

- Cache rendered pages (and/or page fragments) to disk, rather than writing my own storage engine.

I'd use a lazy caching/prerendering strategy. Suppose a discussion thread has 50 pages. One of your mods deletes a post on page 1. Now all 50 pages need to be re-rendered. You have two choices. You can either re-render all 50 pages right away, or you can simply delete all 50 pages right away and re-render & re-cache them as needed, when a user actually requests one of them. I'd do the latter.

I've used this strategy myself. It was a very common paradigm back in the 90s and early 2000s when web servers commonly had hardware specs (700mhz, 256/512MB RAM) that was quite similar to what a RaspberryPi has today. The hardest part of this strategy is getting cache invalidation correct. Every time your code writes to the database, it has to also be aware of which cached pages/fragments it needs to blow away.
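A minimal sketch of the lazy invalidation strategy described above (all names here, like `render_page` and `CACHE_DIR`, are hypothetical stand-ins, not from any real forum codebase):

```python
# Lazy caching: serve rendered pages from disk, render on a miss,
# and on any write just delete the affected cached pages.
import os
import tempfile

CACHE_DIR = tempfile.mkdtemp()

def cache_path(thread_id, page):
    return os.path.join(CACHE_DIR, f"thread-{thread_id}-page-{page}.html")

def render_page(thread_id, page):
    # Stand-in for the expensive template-rendering step.
    return f"<html>thread {thread_id}, page {page}</html>"

def get_page(thread_id, page):
    """Serve from cache, rendering lazily on a miss."""
    path = cache_path(thread_id, page)
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    html = render_page(thread_id, page)
    with open(path, "w") as f:
        f.write(html)
    return html

def invalidate_thread(thread_id, num_pages):
    """On any write (e.g. a mod deletes a post), just delete the cached
    pages; each is re-rendered only when a user next requests it."""
    for page in range(1, num_pages + 1):
        try:
            os.remove(cache_path(thread_id, page))
        except FileNotFoundError:
            pass
```

The point of the lazy variant is that deleting 50 small files is nearly free, while re-rendering 50 pages up front would not be.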

(You know the old joke: "There are only two hard things in Computer Science: cache invalidation and naming things.")


Actually there are four hard things: cache invalidation, naming things and off-by-one errors.


I don't know if I've ever seen this joke not answered in this way; it gets me every time though.


I think this is the best answer right now. Just build your forum the way you would do it normally and aggressively cache.

One addition I would make. Don't cache HTML, cache JSON responses, and build your forum as a one page app. This will allow you to have every response cached while still having customized pages per user (i.e. you could have a user profile on the page that shows a thread).


Good idea. Caching JSON responses as opposed to rendered HTML makes a lot of sense when the server has little RAM.


You're probably overthinking this. While the Pi is limited in processing power and I/O, if it's really a small community it will run a forum just fine.

Here's what to look out for:

1) Make sure you run NginX, it's fast and memory-efficient.

2) Install PHP-FPM and have NginX connect to it using a socket file. Keep the number of FPM workers small.

2a) Zend Opcache comes standard with PHP now, but there are scenarios where you might want to look into using APC instead, especially if you want to hand-tweak some caching into the forum app. But see if it's necessary first.

3) I had good experiences with MariaDB, a MySQL fork, which comes with the Aria table type. If possible, change the forum tables to that type; it's very fast. The worst table type on a Pi is InnoDB, it's pretty much unusable.

4) Beware of flash medium write wear, buy a larger-than-needed SD card or place files that update frequently on an external USB hard drive.


"While the Pi is limited in processing power and I/O"

This line is the key to the question. Simply roll back time until the specs of the Pi would be a "decent" or at least "cheap" webserver for that time, the kind of server you'd run a small forum upon at that time. OR if not decent for bare metal hardware, decent for a virtualized image on a bigger server.

It's not like 2014 is the first year in human history with webservers or web-based forums.

So people were running forums on tiny virtualized servers with about those specs on Linode just a couple of years ago. Perhaps even today, either on very tiny Linodes or on competitors.

The only real problem is the software. So do you use old software that's open to individual vulnerabilities and whole classes of vulnerabilities that were fixed 5 years ago, or take an obese beast of a modern forum and give it a liposuction?

Look into puppet and run the master somewhere else and automate it enough that you can go from a bare metal Pi to a live host in ten minutes or so. You'll be doing it again sooner or later at a time not of your choosing, so may as well get it right now.

Also, implement a backup strategy and automate its restore process (and integrate it with the above).


Depends on your goal.

If you want to learn a lot about development and write a lot of code, you can use flat files and treat them as a small database. Implement simple SQL analogues of WHERE, etc. It could definitely be fun to build! There's a lot of educational value: you would learn more about designing APIs consumed by other applications, etc.

But if you just want something that will run on an R-Pi, you would be better off using a standard database. Some of them have been around for over a decade, written in C, with thousands of man-hours spent on optimizing them. As long as you are not doing something a bit crazy (e.g. 3 fields in a WHERE clause, none of them indexed), you will be just fine.


Yes, it's totally possible to write a blog, forum, or pretty much any "Web 2.0" application without using a database. You just need to be very, very careful with how you design your filesystem layout and file format.

As for the file format, it would be best to stick with standard formats like JSON, XML, or YAML. Or the standard serialization method for your favorite language, such as pickle for Python and serialize() for PHP. Try not to invent a brand-new format; it's going to be error-prone and generally slower than using a standard format.

A more difficult task is to maintain an index of some sort, separate from all the individual files, so that you won't have to read and parse every single file in order to generate the forum listing. Whenever a reply is posted, you'll need to bump a thread to the top of the listing. Think about how you can implement this without modifying or renaming several files at a time.
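One hedged sketch of such an index, assuming a single append-only "bump log" (all names hypothetical): whenever a reply is posted you append one line, and the listing is derived by scanning the log backwards, so no existing file is ever modified or renamed.

```python
# An append-only "bump log": posting a reply appends one line; the
# thread listing reads the log in reverse, last occurrence wins.
import os
import tempfile

LOG_PATH = os.path.join(tempfile.mkdtemp(), "bumps.log")

def bump(thread_id):
    """Called on every reply: one atomic append, no renames, no rewrites."""
    with open(LOG_PATH, "a") as f:
        f.write(thread_id + "\n")

def listing():
    """Most recently bumped threads first."""
    if not os.path.exists(LOG_PATH):
        return []
    with open(LOG_PATH) as f:
        ids = f.read().split()
    seen, order = set(), []
    for tid in reversed(ids):
        if tid not in seen:
            seen.add(tid)
            order.append(tid)
    return order
```

The trade-off: reads get slower as the log grows, so you would periodically compact it; that compaction is exactly the kind of maintenance a database would otherwise do for you.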

And then you'll need to ask yourself how you're going to prevent inconsistencies in your data over the next few years. What if you decide to add a new field to the JSON schema and all the old files don't have that field? What if you delete a thread but an error occurs halfway through and some of the individual posts still remain? You'll need to write logic to handle such edge cases as well.

SQLite solves a lot of these problems while offering the same sort of performance, if not better, on resource-constrained environments. As far as SQLite is concerned, a Raspberry Pi is a very powerful machine. There's no reason why you shouldn't be able to enjoy all the benefits of flat files together with all the benefits of a relational database.
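To give a sense of how little it takes, here is a rough sketch of a SQLite-backed forum core using Python's built-in sqlite3 module (the schema and function names are illustrative, not from any real forum):

```python
# Minimal forum storage on SQLite: two tables and an index are enough
# to get atomic writes, ordering, and consistency checks for free.
import sqlite3

db = sqlite3.connect(":memory:")  # on a Pi this would be a file on disk
db.executescript("""
    CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE posts (
        id        INTEGER PRIMARY KEY,
        thread_id INTEGER NOT NULL REFERENCES threads(id),
        body      TEXT NOT NULL,
        posted_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE INDEX posts_by_thread ON posts(thread_id, id);
""")

def new_thread(title, body):
    cur = db.execute("INSERT INTO threads (title) VALUES (?)", (title,))
    tid = cur.lastrowid
    db.execute("INSERT INTO posts (thread_id, body) VALUES (?, ?)",
               (tid, body))
    db.commit()
    return tid

def thread_posts(tid):
    rows = db.execute(
        "SELECT body FROM posts WHERE thread_id = ? ORDER BY id", (tid,))
    return [r[0] for r in rows]
```

Everything the flat-file approach has to hand-roll (indexes, partial-delete recovery, schema migration) comes built in here.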


  As far as SQLite is concerned, a Raspberry Pi is a very
  powerful machine. There's no reason why you shouldn't be
  able to enjoy all the benefits of flat files together
  with all the benefits of a relational database.
Yes. There are reasons to avoid databases, but saving resources usually isn't one of them. Definitely not on a RaspberryPi where you have 256MB or 512MB of memory which is plenty for a lightweight database like SQLite.

By the time you're done reinventing all the things a storage engine like SQLite or one of the NoSQL storage engines would give you, you'd be hard-pressed to do it more efficiently than they do.


Once upon a time, when I was playing around with Perl on Tripod, I wrote a very very basic forum with a flat-file "database". Essentially each "post" was a single delimiter-separated line (yes - posts were oneliners because everything was done through GET queries and I didn't know any better) in a text file holding an id, parent id and the post content.

As an actual implementation, of course, it was horrifying and terrible but it's quite possible. I think the forum script from Matt's Script Archive (which still exists) basically edited the html thread files directly, and didn't use a database connection either.


You are correct about the Matt's Script Archive forum. I ran a few 10,000+ message forums back in the late '90s that were based on it. They ran mostly without issues; only occasionally would you find that a parent node's tree was not saved correctly and had been truncated. In most cases this could be repaired manually.

The biggest issue was archiving as the main file would need to be manually edited to remove older threads.

I had moved on to other things by the time phpBB was released.


Oh wow... and I even found a few of those Matt's Script Archive forums still running, too. So much nostalgia..


The Golden Age of Webmasters!


Hi there, you ask interesting questions. I have two things for you that aren't exactly what you're looking for, but they are definitely relevant.

First of all, a while ago I made a serverless chat that runs in your browser. It's just a proof of concept, but it does work. There will be nobody there, but you can test it by opening two browser tabs or using a second device. It works the same way Skype works: peer to peer. It's cool, because it is in the browser. http://codepen.io/Azeirah/pen/BHnbz

Second of all, you know Bitcoin, right? It has this blockchain thing. People are starting to realize the potential of the blockchain. I've seen multiple initiatives and examples of applications using the blockchain to store data without a central database. Note, however, that the data is still stored somewhere: on the users' PCs!

This is not a database-less application, it is decentralized. It's still very cool however, and may have a huge influence on the internet in the future if it stabilizes.

https://eris.projectdouglas.org/ (and their github: https://github.com/project-douglas/eris)


all hail eris! all hail discordia! kallisti.

the goddess looks at this creation and she is happy.


You can run a community of sorts, but the feature list will be sparse. The only viable system sans-database that comes to mind is plain-text.

I'm a member of a forum that, in the beginning, ran an old system with a plain-text "database" for a very long time. Performance was fairly reasonable at 5000 visitors per day, if the daily hit counter was to be trusted. They since moved it to SQLite, which is very reliable in its own right, and now it's Postgres. It was only then that we got searching as a "feature".

The system was written originally in PHP 4.x, but they did move it to 5.x. I'm not sure what version they use now. I do know they were running ancient hardware and was up solely due to the charity of the admin.

I think there was an index file that stored new topic summaries when topics were created, as there was no field for titles.

They used microtime() as the ID, but the topics were stored by splitting it into 3-digit directories.

Seconds as 1234567890 and decimal 0.12345618 were combined to create /123/456/789/012/345618.html

There's no registration system or other way to identify the user so I believe the posts were separated by some kind of entry separator in a single file.

I think the pagination system created an array of sequential IDs and checked to see if files existed in those directories. There were no numbered links for pagination, only next/previous and if you reached the end, the next page would be blank.

There's no reason you should ever get laughed at for being curious.


Thanks, I appreciate the nice answer but it's amazing how often newbies get laughed at for being curious! Glad it hasn't happened today :-)

So on to your answer: I'm going to be running this community at way less than 5000 visitors per day! So from what I understand, the site was run using PHP, and every time somebody made a new entry a new file was created on disk, with a reference to that file in the main index page. The name of the file was based on the time it was written to disk, with some checking to avoid duplicate names. I assume the main index file was the front page of the site?

Have I got that right? And if so, where do you think I should take my studying to set something like this up. I mean, are there any frameworks that you know of like Jekyll for instance or would I need to learn PHP before I could set it up?


There are lots of samples for PHP, but you don't need to study that. As d33n suggests, there are examples that already use flat files as a database and you can browse Github for code examples.

Our forum was set up so that one thread = one HTML file. Any new replies to the thread were appended to the bottom of the file, right before the footer. The added benefit of this is that it was very quick to read new posts, as there was no processing taking place after a post was created.

Ex: If a new thread is being created, it will generate an ID, say 140585637400000000. This gets translated into a path: /140/585/637/400/000000.html, and a file is created there.

The body of the new thread is added to the HTML file. Any subsequent replies are added to that same file right below the previous one.

So when a visitor requests example.com/topic/140585637400000000, the script takes the last part, turns it into a file path, adds a template - which has the reply form with that ID - and sends it to the user. No additional processing needed. When a new reply is made, the script builds the path again and appends the reply to the bottom of the same HTML file. And so on...
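The id-to-path translation described above might look something like this (a sketch inferring the splitting rule from the example id, in Python rather than the forum's PHP):

```python
# Turn an 18-digit microtime-style id into a sharded file path:
# four 3-digit directory levels plus a 6-digit file name, so no
# single directory ever holds more than ~1000 entries.
def thread_path(thread_id):
    s = str(thread_id).zfill(18)
    dirs = [s[i:i + 3] for i in range(0, 12, 3)]  # "140", "585", "637", "400"
    return "/" + "/".join(dirs) + "/" + s[12:] + ".html"
```

The sharding matters on a Pi's SD card: most filesystems slow down badly when one directory accumulates tens of thousands of files.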

I think your biggest hurdle is the initial planning. Try to carefully plan this out as much as you can, but obviously you won't know what future circumstances will bring. If you build it in such a way so that the storage mechanism doesn't need to change much or at all, you should be most of the way there.

Edit: Maybe I can convince the admin to publish some of the source. You're not the only one to be interested in something similar and I think there's a real demand for lo-fi community software.


There are a lot of open source products you can use, I would advise start at http://en.wikipedia.org/wiki/List_of_content_management_syst... to find a Wiki/CMS in the language you like most, with a flat-file database. Then look for plugins/extensions that provide the community features you need.


Yes, it's possible. There is a blogging platform called Greymatter which used to be popular that you can draw inspiration from.

https://en.wikipedia.org/wiki/Greymatter_%28software%29

When you'd add a comment to a blog post on Greymatter, it would write that content to its own flatfile database, then trigger a rendering update which would read the content, transform it with a template, and produce a new .html file, overwriting the old one. Bam! Dynamic site based on (mostly) static pages.

I don't see why a forum or social platform couldn't be built in a similar manner. You won't be able to do much in the way of per-user authorization, but as long as all your content is supposed to be viewable to everyone, you should be OK.

One problem with Greymatter was that if a thread or site grew large, sometimes generating new files from templates would take so long that your connection to the CGI would time out before they were finished. Another problem was that two updates to the site posted at about the same time could stomp on each other during the rendering phase.

'Cause hey, flatfiles are hard. That's why I wouldn't forgo a real database in favor of flatfiles. If you don't want to run MySQL or Postgres on an RPi, then use SQLite. There is, after all, a reason that these products exist. Let SQLite or MySQL deal with the flatfiles for you instead of re-inventing yet another wheel.

I also wouldn't update the .html files the moment someone posted. I'd have the scripts just commit the post to the database, and then regularly re-render the files from a cron script. This avoids having your rendering stage stomp on each other, provided that the rendering time is less than your rendering interval.


A file database like sqlite or leveldb might be a good compromise between the features of a database and the ease of a flat filesystem


Have a look at 4chan - once the thread is done, it's gone, no need for db for that and the community is still there.


I don't know about now since I don't visit 4chan anymore, but the imageboards very clearly used a DB - during periods of high load one of the most common error messages was "MySQL connection error" which became a sort of minor meme for a while (this was several years ago.)

On the other hand the textboards do appear to not use a traditional RDBMS - they were running a modified version of Shiichan (http://wakaba.c3.cx/shii/shiichan ); Kareha (http://wakaba.c3.cx/s/web/wakaba_kareha ) is another popular textboard script that doesn't need a database.


I remember reading that 4chan saved each page as just that: static pages. The server rewrites the relevant page when a new comment is posted and serves up pages based on last-modified times.


AFAIK sites like 4chan store post data in a database then use that to rebuild the static html files whenever someone posts or deletes.


I sure do think it is possible. I am not sure how soon you will actually run into problems, but I have been using Dokuwiki, which uses text files rather than a database for all storage purposes. Although it is not in itself a good example for a community-centric software, it shows what can be done, the contents are versioned, there are access control lists and so on.

If you can build a wiki based on files, why would you not be able to do so with some kind of forum community? Maybe Dokuwiki is already a good solution for you.

I feel that the other readers' reactions show that you could elaborate on your motivation some more. A database can be a quick and useful solution and solves many problems, such as distributed access, central management and administration, security and backup.

http://en.wikipedia.org/wiki/DokuWiki


It is sort of possible to do this statically. The benefit of static websites is not only fast loading, but also that GitHub Pages offer very high-quality free static hosting.

For a forum:

1) You'll need a server with an API for submitting posts. You can use the filesystem as the DB, if you'd like.

2) This server will update the static website (i.e. regenerate it, and push it to GitHub Pages) on a regular interval (which could be as small as one minute).

So it's definitely doable, and offers huge benefits. For one, static pages can be served very efficiently. If you use GitHub Pages, you can leverage its global Content Delivery Network (CDN) for free. And you are guaranteed nearly zero downtime for free by GitHub.

The only "drawback" is the insignificant one minute delay before a new post goes "live". Shoot me an email if you'd like to collaborate on a static forum project!


I would look into running a Citadel BBS[0] if it's something on the order of a personal project. Essentially, it's a BBS with Telnet, API, web, and email interfaces.

[0] http://www.citadel.org/


Back around 1999-2000, there was a popular free forum software that did exactly that: it generated new static pages every time someone made a post or commented on the thread. Reviewing my personal timeline, that was before I learned PHP, so it was probably something that ran on Perl.

It may have been YaBB (http://sourceforge.net/projects/yabb/) (http://www.yabbforum.com) or some predecessor with a long-forgotten name. The earliest commit on that sourceforge site was 2003, but the screenshots look similar to what I remember.


Several people have made use of a lightweight webserver like Lighttpd in combination with PHP/SQLite in order to set up boards or wordpress installations on Raspberry Pis. Would that be something for you?


Another lightweight stack would be OpenResty - scripting nginx with Lua. Both are blazing fast. Lapis might be an appropriate web framework to start developing with.


I'm not sure if this counts as a database, but you can try out things like Firebase or Parse, which allow you to fetch data from a cloud server, and render it on your server. So your application will be just a dumb static client containing templates and application logic in HTML/CSS/JS, while the data would be stored on the firebase servers, directly fetched from there by your visitors on page-load, making your work pretty easy.


Dokuwiki is a flat-file based wiki, and there is a discussion plugin for it that might give you some insight:

https://www.dokuwiki.org/plugin:discussion

Though I think you could keep the messages in a flat file (heck, BBSs have done that since the '70s), having a small DB or some sort of linked lists for user access, message indexing and such would probably add to the speed of the system.


A long time ago (~7 years ago, while in high school), I used PmWiki[1] for a small lab team (~10-ish people) to collaborate on our research projects :-) We eventually moved to tarballs and emails, but the wiki was really easy to set up, had a lot of plugins and didn't require a database (all data is plain text). Maybe it would suit you.

[1] http://www.pmwiki.org/


If the only requirement is to run it on a raspberry pi, and you find that a RDBMS will be too resource intensive, you can always use one of the cloud data storage services out there. Amazon has an RDS offering as part of AWS, MySQL, Postgres, etc. Then your app only needs to talk to the database over the wire with zero local resource consumption by the database.


Wait a minute, not sure I follow. Do you mean that I can run something like one of the forum communities I mentioned on the Raspberry Pi, but then offload all of the grunt work to Amazon?

Obviously that would mean that the Raspberry Pi would always need to be connected to the Internet, which wouldn't be much of a problem, but aren't there fairly significant privacy concerns with doing something like that? Or do you think it's possible to mitigate those concerns by encrypting the link to Amazon? I am only speculating wildly here while I furiously Google all of the acronyms you mentioned. :-)

Thanks!


You can indeed run a forum community but offload the work to Amazon. That said, you might as well run everything on Amazon, because the latency between your webserver and the database might become a problem.

Privacy-wise, Amazon and Azure and the like are probably way better protected than your own installation. Of course, privacy with Amazon and Azure is all clearly spelled out in their terms of use and privacy statements. And NSA-wise: if they can get into Amazon (either politically or technically), they will easily be able to hack your own installation too.


If offloading things to Amazon is an option, why use a Raspberry Pi at all? Wasn't the sport of it to get it running on a Pi? If it only has to look like it's run on a Raspberry Pi you might as well let it run Nginx as a reverse proxy to a discourse.org forum.


One of the main points of static sites is that they don't update terribly often. Forums, on the other hand, update constantly. You want something with good caching (server side and client side), and that's about it. Otherwise, you are going to be regenerating the pages constantly on every post/update.


"good caching" needs to be "stores in cache forever, unless an update is required, at which point, only the content that actually changes is updated in cache" to be as good as static files. In my experience, caching is never that good. What's the problem with writing to the fs on every POST? The number of GETs will still outweigh the number of POSTs significantly.


It's my understanding that forums like HN and Reddit don't regenerate the front page for every post or view; rather, it is regenerated every second or two and cached in RAM.

Of course, it's unlikely you'd run a forum as big as HN or Reddit on a Raspberry Pi.


Wouldn't Russian Doll caching tick that box?


A static forum website seems like a contradiction in terms. Static means unchanging and forums must change by definition. I guess the author means the site will serve .html files?

The site could not be completely static, because something must be live to handle POST requests.


If the point of getting a forum site running on a Raspberry Pi is just for novelty & challenge, ignore this. But I think the most practical solution here is to forget the Raspberry Pi altogether or to set up the majority of processing and storage on a separate machine and have the Raspberry Pi server talk to it.


Yes.

People (including me) are thinking about and working on personally owned and hosted data. Your forum posts live in _your_ cloud, in a way that makes features like the ones brickcap describes achievable.

See https://news.ycombinator.com/item?id=8049890


The first forum software I wrote way back in 1998 didn't use a db.

Basically I just marked up the HTML file with some data to tell me where the posts were, and when someone posted a new comment the Perl script would just insert it into the HTML.

Serving the forum was just pulling the html files from the disk.
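A minimal sketch of that insert-into-the-HTML approach (in Python rather than the original Perl; the `<!-- posts-end -->` marker is made up, and the file lock is my own addition to guard against two concurrent posts corrupting the file):

```python
# Append a new post into a thread's HTML file at a marker comment,
# holding an exclusive lock so only one writer touches the file at a time.
import fcntl

def add_post(path, post_html):
    with open(path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # block until we are the sole writer
        html = f.read()
        html = html.replace("<!-- posts-end -->",
                            post_html + "\n<!-- posts-end -->", 1)
        f.seek(0)
        f.write(html)
        f.truncate()
        fcntl.flock(f, fcntl.LOCK_UN)
```

Serving a thread is then just a plain static-file read; only the rare POST pays the read-modify-write cost.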


I assume that you just handled the concurrent write situation with a file lock?


I've used SimpleForum for that purpose a long time ago:

http://freecode.com/projects/simpleforum

Unfortunately it doesn't seem to be in active development any more.


This is a fun exercise :) All right, let us first analyze what the requirements should be. Obviously I am assuming a lot of things; this is how I would go about building a (minimal) forum on a Raspberry Pi:

1. There should be a couple of predefined categories.

2. There should be a way to authenticate/authorize users.

3. There should be a way to search the previous posts.

4. There should be a facility to moderate.

Static forums

Very neat idea. All we need is a simple webserver - I am thinking nginx - that transforms and renders our posts to a user. Our first requirement, that of predefined categories, can easily be met by creating subdirectories inside the directory that will be served. And we can avoid duplicating pages between a subdirectory (i.e. a tag) and the main directory by using symbolic links[1] and configuring nginx to serve the symlinks[2]. Moderation of posts can be done by moving a post in and out of a `moderate` folder that is not served by the application. But that obviously limits moderation to the webmaster. The trouble arises in two cases:

a) When we need a facility to manage sessions. There is no way we can do that in static pages.

b) It will be quite difficult to create a search index for our application.

Now nginx is a good choice as it is pretty lightweight, and we will keep it. But what if we could have another application that can easily talk with nginx, manage user authentication for us, allow us to create search indexes, take care of our data and still be lightweight? Others have already suggested SQLite. How about CouchDB?[3]

First, CouchDB has built-in authentication and authorization support. It will take you no more than 3 HTTP requests to implement your own register-login-logout scenario[4]. Second, it is pretty easy to create simple search indexes with CouchDB[5]. You can serve your forum directly from CouchDB. It does not satisfy your original requirement of no database, but it certainly gives you a way to work without any application layer. But perhaps most importantly, it is very low on memory consumption. Not as low as SQLite, but still quite low.

[1](http://stackoverflow.com/questions/1951742/how-to-symlink-a-...)

[2](http://nginx.org/en/docs/http/ngx_http_core_module.html#disa...)

[3](http://couchdb.apache.org/)

[4](http://www.staticshin.com/programming/easy-user-accounts-man...)

[5](http://wiki.apache.org/couchdb/View_collation#String_Ranges)


I haven't seen anyone mentioning it yet: what about a mailing list? You could even offload its presentation layer to a third-party NNTP reader.


Check out NoNonsense Forum (camendesign [.] com/code/nononsense_forum). It works without a database and stores everything as RSS.


You could build it on text files, but it's still a database... so I think the answer is a pretty clear no.


Sure it's possible (plenty of people I work with use files instead of a database), but why would you want to?


I don't see why the Pi would be unable to run a database. In any case, sqlite would work.


You can write it to disk.

You can also hold all the data in memory, but when memory fills up you have to get rid of old threads, and if your server app restarts you lose all threads.

You could also use a message queue, something like Firebase.


I'm a beginner, so forgive me if this is stupid, but consider using Disqus - https://disqus.com/.


Disqus isn't really appropriate for a forum. It's a simple turnkey commenting system, intended for blogs and whatnot. It doesn't have the capabilities that you'd want for a more involved forum.


Git or Mercurial can be used to track content; furthermore, the libgit2 API may also be useful.



