
That's not the only limit, btw. If any node (say "/users/") crosses some number of subnodes ("/users/a", "/users/b", ...), then you cannot run any queries on the node itself. I cannot even get the IDs of the subnodes ("a", "b", ...).

I also got similar advice to shard my users or something.

So right now, we have crossed that limit and are unable to know how many users are on our system. Their server just fails and takes down the DB for 10 or so mins if I do that query.

Firebase is good for MVPs and prototypes but not at all scalable.




> Their server just fails and takes down the DB for 10 or so mins if I do that query.

I don't know if you are aware, but their DB can't handle more than 1000 requests/sec, so if you are iterating through a list of nodes and requesting data for each one, you can hit that limit (not good practice, but sometimes you have to). Additionally, once you hit that limit the DB slows down but keeps accepting requests, meaning that if you keep hitting it, even at a slower rate, you make the backlog worse. Seriously, be very, very careful not to go over that limit; we found out the hard way.
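If you do have to iterate like that, a client-side throttle at least keeps you safely under the ceiling. A minimal sketch (the 1000 req/s figure is just what we observed, and `fetch_node` is a hypothetical request helper, not a Firebase API):

```python
import time

class Throttle:
    """Enforce a minimum interval between calls (simple client-side rate limit)."""
    def __init__(self, max_per_sec):
        self.interval = 1.0 / max_per_sec
        self.last = 0.0

    def wait(self):
        # Sleep just long enough that calls never exceed max_per_sec.
        now = time.monotonic()
        delay = self.last + self.interval - now
        if delay > 0:
            time.sleep(delay)
        self.last = time.monotonic()

# Stay well under the observed ~1000 req/s ceiling, e.g. 500 req/s:
throttle = Throttle(500)
# for node_id in node_ids:
#     throttle.wait()
#     data = fetch_node(node_id)  # hypothetical request helper
```

The point is that backing off *before* the server starts queueing is much cheaper than digging out of the backlog afterwards.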


Iterating would require me to first get at least a 'shallow' list of keys for that node. But even that one REST query for a shallow list of all keys crashes their server instance.

I am not even sure how I should get keys for all my user nodes anymore.

I tried doing it from an offline JSON backup they generate. But that one giant 100GB+ JSON is impossible to parse with any available tools.


Hmm. Congratulations, you've nerd-sniped me this morning!

This sounds like a classic use case for a streaming parser. The data is a mile wide and an inch deep, so at any point the memory requirements should not be too high.

What do you want to do when you've parsed it? Insert it into a real database? Iterate over it? Would simply turning it into a list of user IDs one per line suffice?


Hello. The JSON dump looks something like this:

  {
    "users": {
      "userid1": {...},
      ...
    },

    ...
  }

I have tried the jq stream parser to split the big dump into files like:

  - users.json
  - chatrooms.json
  - ...

so I can then work on individual nodes.

But jq fails silently after 12-24 hours of processing. I am still researching this in my free time.

If I can just get the keys (like "userid1") I can do the rest from firebase itself.
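Meanwhile, since the data is shallow, a dumb stdlib-only character scanner can pull out just the keys without ever materializing the tree. A sketch, assuming the dump is valid JSON with "users" as a top-level key (not battle-tested, and escape sequences in keys are not fully decoded):

```python
def iter_keys(chunks, parent="users"):
    """Yield the immediate child keys of one top-level object ("users" here)
    by scanning the raw text, so a 100GB+ dump never has to fit in memory."""
    depth = 0                 # current {...} nesting depth
    in_string = escaped = False
    buf = []                  # characters of the string literal being read
    last_string = None        # most recently completed string literal
    pending = False           # just saw `"users":`; the next { opens the parent
    parent_depth = None       # depth of the parent object once entered
    for chunk in chunks:
        for ch in chunk:
            if in_string:
                if escaped:
                    buf.append(ch)   # keep the escaped char raw (no decoding)
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
                    last_string = "".join(buf)
                else:
                    buf.append(ch)
            elif ch == '"':
                in_string = True
                buf = []
            elif ch == "{":
                depth += 1
                if pending:
                    parent_depth = depth
                    pending = False
            elif ch == "}":
                if depth == parent_depth:
                    return    # parent object closed; we're done
                depth -= 1
            elif ch == ":":
                if parent_depth is None and depth == 1 and last_string == parent:
                    pending = True
                elif depth == parent_depth:
                    yield last_string   # a key directly under the parent
```

With the real file you'd feed it chunks, e.g. `iter_keys(iter(lambda: f.read(1 << 20), ""))`, and it never holds more than one string literal in memory.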


I would think something like this would work:

  cat input.json | jq -c --stream '. as $in | select(length == 2 and $in[0][0] == "users") | {}|setpath($in[0][1:]; $in[1])' > users.jsonlines
Output would be a file that looks like this:

  {"userid1":{"name":"user1"}}
  {"userid2":{"name":"user2"}}


I think I have tried something similar. I will try this one too. Thanks a lot!


The parser would have to parse the whole thing before being able to split it.

You could do the splitting yourself (it's just plain text) and create multiple files whose contents are just an array in this format:

```
[
  {},
  ...
]
```

Then you can use JSONStream to load each of those files individually and map/reduce on the contents.


You could loop through the file counting braces, storing line numbers, then split the file along those line numbers. The smaller files might not have exactly the formatting you need to run them through a parser, but you should be able to adjust that manually, hopefully.
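Something like this sketch, assuming the dump is pretty-printed, one-value-per-line valid JSON (braces inside string literals are skipped; brackets aren't tracked since the top-level sections here are objects):

```python
def split_points(lines):
    """Return the 1-based line numbers where the brace depth falls back to 1,
    i.e. where one top-level section ("users", "chatrooms", ...) ends.
    Split the file after each of those lines."""
    depth = 0
    in_string = escaped = False
    points = []
    for lineno, line in enumerate(lines, 1):
        for ch in line:
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 1:
                    points.append(lineno)
    return points
```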


It's not impossible. You can do it pretty simply using https://pypi.python.org/pypi/ijson


Wow.

Running a query takes down the db?

That sounds like a major problem. How can that happen?

Maybe returning incorrect or incomplete results due to sharding... But taking the db down? That's very... Unexpected.


Never used Firebase, but for what it's worth, you can take down most databases with a bad enough query.


In the RDBMS world, you can take down pretty much any database by giving someone in accounting a copy of Crystal Reports :)


How, specifically? Something like a very complex query joining too many tables, or maybe a full Cartesian product of n > 2 tables?


You are right. But with a normal DB we usually have multiple ways to query our data. There are not many ways to do that with the limited Firebase API.


Nope. A clear server crash with "Internal Server Error" and the DB being totally unavailable for 10-15 mins. Apparently it's 'normal'.



