
Apparently last year's major failure was caused by their Oracle data warehouse going down hard:

http://foia.state.gov/_docs/PIA/ConsularConsolidatedDatabase...

"The Consular Consolidated Database (CCD) is one of the largest Oracle based data warehouses in the world that holds current and archived data from the Consular Affairs (CA) domestic and post databases around the world. As of December 2009, it contains over 100 million visa cases and 75 million photographs, utilizing billions of rows of data, and has a current growth rate of approximately 35 thousand visa cases every day"

Unclear what's gone wrong this time, but the mention of "biometrics" (like photographs) makes me suspect it's the same system.




An interview with the director responsible for the system, props to @kalleboo for finding this:

http://fcw.com/Articles/2014/10/20/State-Department-database...

Typical quote: "We knew we could run it on one node. We needed to have one very powerful node."


> Typical quote: "We knew we could run it on one node. We needed to have one very powerful node."

As someone who has done government contracting... it's always alarming when this path is suggested. The last time, it was a beefy server with about 50TB of RAM to keep all data in memory, with multiple hard drives to keep backups. Ugh.


Mind talking more about what the challenges are in running a 50TB relational database?

I, and I'm sure many others, would be very interested in a blog post or long-ish comment with anecdotes and lessons learnt...


> Mind talking more about what the challenges are in running a 50TB relational database?

I wish I could but it wasn't my direct project so I wasn't part of the team implementing it. I'm not even sure it was successfully delivered.


The WOPR? How about a nice game of chess?


One of the problems with proprietary databases is that licensing is yet another barrier to building proper clusters (and to keeping identical environments across development, testing and production).


I'd assume Oracle would still have a century-long contract to ensure they keep their monopoly.


They did mention it's not the same thing this time.

"This is not the same problem we had with the CCD last year, which was a problem with the database caused by a software patch. This is a hardware failure, and we are working to restore system functions."


If you read that carefully, that does not preclude it from being a different problem with the CCD.


A visa application system doesn't strike me as an obvious candidate for a relational database. I would rather store all the information related to a visa application in a document store, with a relational database only used as a sort of index. I wonder whether lots of systems are built on relational databases just out of habit, and then bump into these problems.
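
For illustration, a minimal sketch of that layout, assuming an in-memory dict as a stand-in for the document store and SQLite as the relational index. The field names (`app_id`, `passport_no`, `status`) and both helper functions are hypothetical and are not taken from the actual system.

```python
import json
import sqlite3

# Relational side: only the few fields you want to search on (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE application_index ("
    "app_id TEXT PRIMARY KEY, passport_no TEXT, status TEXT)"
)

# Document side: a plain dict standing in for a real document store.
documents = {}


def save_application(app):
    """Store the full application as a JSON document and index its key fields."""
    documents[app["app_id"]] = json.dumps(app)
    conn.execute(
        "INSERT OR REPLACE INTO application_index VALUES (?, ?, ?)",
        (app["app_id"], app["passport_no"], app["status"]),
    )


def find_by_passport(passport_no):
    """Look up ids in the relational index, then fetch the full documents."""
    cur = conn.execute(
        "SELECT app_id FROM application_index WHERE passport_no = ? ORDER BY app_id",
        (passport_no,),
    )
    return [json.loads(documents[app_id]) for (app_id,) in cur]


save_application({"app_id": "A1", "passport_no": "X123", "status": "pending",
                  "notes": ["photo attached"]})
save_application({"app_id": "A2", "passport_no": "X123", "status": "issued"})
print([doc["app_id"] for doc in find_by_passport("X123")])
```

Note that the two writes in `save_application` are not atomic, so in a real deployment the index and the documents could drift apart unless something reconciles them.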


Say you're the tech lead for the project, built before 2001 [1], and you need to hire a vendor that can provide a cluster that won't break with 100TB+ and guarantee long term support, possibly for decades.

Who would you call?

[1] http://www.law.umaryland.edu/marshall/crsreports/crsdocument...


I don't remember the exact timeframes, but I suspect Digital and Sun Microsystems were likely-looking choices around that time. (Sun was the default choice for telco billing systems around then...)


Why do you need a cluster?


Back in 2001, 100TB was massive. The first 1 TB HDD didn't roll out until 2007.


It's my understanding that most of this is pictures and not "actual rows"; the database without the pictures should be much smaller. You could put the pictures on a SAN.


To a conservative government purchaser working with a conservative software vendor on a system that doesn't do anything remotely fancy or magic, a relational database (Oracle, specifically) was most certainly the right choice 10 years ago, only moderately less so today.

And just to be clear, that's (mostly) a good thing.


> I would rather store all the information related to a visa application in a document store with a relational database only used as a sort of index.

Now you have two problems.


I assume the technical foundations predate the time when document databases became popular in the enterprise and reliable at scale.


Why not? There should only be at most a few billion rows (since you can't have more visa applications than humans).


Never bumped into the "beautiful flexibility" of ERD databases - where the app devs assume all responsibility over things that database designers _really_ should be saying "Hell no!" to?

  +------------+------------+--------------------------------+
  | asset_id   | asset_name | asset_data                     |
  +------------+------------+--------------------------------+
  | 2147483646 | firstName  | Joseph                         |
  | 2147483647 | lastName   | Bloggs                         |
  |-2147483647 | phoneNumber| +1 415 555 1234                |
  |-2147483646 | email      | joebloggs@gmal.com             |
  +------------+------------+--------------------------------+
(I think I still have brain damage from trying to get "too smart" with a Magento eCommerce site once...)
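
To make the pain concrete, here is a small sketch of that key/value layout in SQLite. The `entity_id` column is my own addition (the table above doesn't show how rows are grouped into one person), and `load_entity` is a hypothetical helper, not anything from Magento.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Every attribute of every entity lives in the same three generic columns.
conn.execute(
    "CREATE TABLE asset (entity_id INTEGER, asset_name TEXT, asset_data TEXT)"
)
conn.executemany(
    "INSERT INTO asset VALUES (?, ?, ?)",
    [
        (1, "firstName", "Joseph"),
        (1, "lastName", "Bloggs"),
        (1, "phoneNumber", "+1 415 555 1234"),
        (1, "email", "joebloggs@gmal.com"),
    ],
)


def load_entity(conn, entity_id):
    """Reassemble one logical record from its attribute rows."""
    cur = conn.execute(
        "SELECT asset_name, asset_data FROM asset WHERE entity_id = ?",
        (entity_id,),
    )
    return dict(cur.fetchall())


person = load_entity(conn, 1)
print(person["firstName"], person["lastName"])

# The schema cannot reject a misspelled attribute or a nonsense value:
# every name and every value is just text as far as the database knows.
conn.execute("INSERT INTO asset VALUES (1, 'phoneNumbr', 'not a phone number')")
print(sorted(load_entity(conn, 1)))
```

The last insert succeeds without complaint, which is the point: column types, NOT NULL and foreign keys all become the application's job.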


The flexibility typically comes from transactions and joins, something document store proponents typically shy away from. And yes, if you give stupid devs powerful tools they will shoot themselves.


It's not only flexibility, but also the discipline and the removal of choices (for bad modelling) that come from following the relational normal forms.


Of course you can, you might need to apply for a visa multiple times if you want to visit multiple times. In fact, most tourists have to do that. It's still about the same order of magnitude though.


Yes, that's what I meant.
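
A back-of-the-envelope check using only the figures quoted from the PIA document (100 million cases as of December 2009, about 35 thousand new cases per day). The constant growth rate is an assumption, and the function names are just for illustration.

```python
# Figures quoted in the PIA excerpt at the top of the thread.
CASES_DEC_2009 = 100_000_000
CASES_PER_DAY = 35_000

# Assumption: the daily rate stays constant and applies 365 days a year.
CASES_PER_YEAR = CASES_PER_DAY * 365


def projected_cases(years_after_2009):
    """Visa cases on file N years after December 2009 at a constant rate."""
    return CASES_DEC_2009 + CASES_PER_YEAR * years_after_2009


def years_until(target_cases):
    """Years after December 2009 until the case count reaches target_cases."""
    return (target_cases - CASES_DEC_2009) / CASES_PER_YEAR


print(f"new cases per year: {CASES_PER_YEAR:,}")
print(f"cases five years on: {projected_cases(5):,}")
print(f"years to reach one billion cases: {years_until(1_000_000_000):.1f}")
```

Under that assumption the case count stays in the hundreds of millions for decades, so "at most a few billion rows" for the cases themselves looks like a safe upper bound.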


The database holds way more than visa applications; it also centralizes a bunch of other records.


What happened to the "No one was fired for using Oracle or Microsoft" meme?


The meme doesn't imply that those products are good or never break, just that you won't get fired for choosing them. We'll have to wait and see if anybody involved in selecting Oracle gets fired over this.


I think it was "no one was fired for choosing IBM" :p


and IBM as well, true :)



