Does anyone else think that, all conspiracy theory aside, the information published is only partly true?
If it's a purely hw issue, say a piece of highly specialized hardware (hw crypto, piece of old mainframe, etc), it can take a very long time to source it (although a SPOF is surprising in such a critical infrastructure), but it does not take 100 people to work on it, 24/7. It requires a dozen guys making angry phone calls every 2 hours to some suppliers...
If it really takes 100 people 24/7, the most likely explanation to me is that it's software related and they have to rewrite a critical piece of sw in record time.
Causes:
-bugfix (which may be entirely unrelated to security)
-emergency migration from one hw vendor to another for sourcing reasons, which entails a rewrite of part of the system.
I think you're taking that a bit too literally: I interpreted that as "we've got 100 tech guys and we've told them to fix this ASAP". Which, this being government, likely means 99 of them frantically trying to cover their asses and 1 guy doing the work.
Let me describe the structure of a project I witnessed but thankfully was never part of (and see if I can get the formatting right)
Subcontractor which did the work:
- 2 devs
- 1 accountant
- 1 project manager
Contracted under:
- 1 project lead
- 1 dev lead
- 4 QA engineers
- 1 lead QA engineer
- 3 accountants
- 1 lead accountant
- 2 document reviewers
- 1 lead document reviewer
The actual people doing the work spent 1 month writing documentation and then changed ...10 lines of code. This software was not mission critical, no lives were at stake, and if there was a bug the danger was negligible. That's government in action.
I wholeheartedly believe that there are 100 people at work here.
That sounds typical for any large legacy codebase serving hundreds of customers. It's not a government thing, it's just the nature of large projects. It's also not as simple as saying that it's waste. Changing 10 lines of code in an app that has been live for years can be the result of months of work and careful ruling out of 10,000 lines of code you decided not to change.
100% that project alone. Remember, qa people in government contracts often are concerned less with qa'ing the code and more qa'ing the process within which the code is developed.
"100 people to work on it, 24/7" sounds like manual data entry to me - who wants to bet against a loss of a db without usable backups available, and a bunch of people now hand entering hardcopy forms and data out of emails back into some critical database?
It could also be manual verification. Say they had two or more database nodes, a hardware failure on one could cause them to go out of sync and then they need to verify all the divergent changes.
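To make that concrete, here is a minimal sketch of finding the divergent rows between two nodes. Everything in it is an assumption for illustration: the snapshots are plain dicts keyed by a hypothetical case id, and the names `diff_replicas`, `node_a` and `node_b` are made up. A real system would pull these rows from the actual databases.

```python
import hashlib

def row_digest(row):
    """Stable fingerprint of one row (a tuple of column values)."""
    return hashlib.sha256(repr(row).encode("utf-8")).hexdigest()

def diff_replicas(primary, secondary):
    """Compare two {primary_key: row} snapshots.

    Returns keys missing on either side and keys whose contents differ.
    """
    only_primary = sorted(primary.keys() - secondary.keys())
    only_secondary = sorted(secondary.keys() - primary.keys())
    conflicting = sorted(
        key for key in primary.keys() & secondary.keys()
        if row_digest(primary[key]) != row_digest(secondary[key])
    )
    return only_primary, only_secondary, conflicting

# Hypothetical data: case id -> (applicant, status)
node_a = {1: ("Ana", "approved"), 2: ("Ben", "pending"), 3: ("Cho", "pending")}
node_b = {1: ("Ana", "approved"), 2: ("Ben", "refused"), 4: ("Dee", "pending")}

only_a, only_b, conflicts = diff_replicas(node_a, node_b)
print(only_a, only_b, conflicts)
```

The script only finds the differences. Deciding which side is right for each conflicting row is the part that would plausibly need a room full of people.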
I'm guessing a backup technically existed (to satisfy any requirements or boss's-orders to the "letter of the law"), but nobody bothered giving any thought to restoring everything from that backup.
Of course, a backup that can't be restored (or nobody knows how to restore) is more or less equivalent to not having a backup, so this distinction probably doesn't matter.
Indeed - "usable" backups - they are ones you've tested restoring from (recently enough to be assured they still work on the latest version of your system).
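A minimal sketch of what "tested restoring from" can look like, using SQLite from the Python standard library as a stand-in (the system under discussion is reportedly Oracle, so this is an analogy, not their procedure). The function name `backup_and_verify`, the `cases` table and the file names are all hypothetical.

```python
import os
import sqlite3
import tempfile

def backup_and_verify(live_path, backup_path, table):
    """Copy a SQLite database, then prove the copy can be opened and read.

    `table` is interpolated into SQL, so it must be a trusted name.
    """
    src = sqlite3.connect(live_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)  # online backup API from the standard library
        expected = src.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    finally:
        dst.close()
        src.close()

    # The restore drill: reopen the backup from disk as if the live DB were gone.
    restored = sqlite3.connect(backup_path)
    try:
        integrity = restored.execute("PRAGMA integrity_check").fetchone()[0]
        actual = restored.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    finally:
        restored.close()
    return integrity == "ok" and actual == expected

# Build a small throwaway database to exercise the drill.
workdir = tempfile.mkdtemp()
live = os.path.join(workdir, "live.db")
conn = sqlite3.connect(live)
conn.execute("CREATE TABLE cases (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO cases (status) VALUES (?)", [("pending",)] * 5)
conn.commit()
conn.close()

usable = backup_and_verify(live, os.path.join(workdir, "backup.db"), "cases")
print("backup usable:", usable)
```

A row count and an integrity check are a very low bar. The point is only that the check runs against the restored copy, not against the live system.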
I _hope_ they don't have someone saying "we had a backup - it was on a RAID set!".
It would surprise me less to discover the reported migration from Oracle on Windows to Oracle on Linux was still only partly done, and they were taking solid, reliable, usable backups - of the old not-yet-decommissioned Windows db servers...
(For the record, I've made both of those mistakes (and more) in my career... Fortunately neither represented weeks of 24x7 remedial work by 100s of people.)
Damn I wish this would happen at the IRS. Tired of paying taxes to do nothing to help. Spend hundreds of millions on computer systems but they can't figure out how to do a backup.
"More than 100 experts across the country are working on this problem 24/7" is PR-speak.
There's a pile of folks between a few contractor firms, and the staff in the offices that manage those contracts, whose current top priority is fixing whatever the issue is.
They said specifically that embassies and consulates were having problems doing biometric checks, which I would imagine requires a State system to talk to a number of other federal/intel/defense systems to do records look-ups. If we do some further supposing, that web of interconnected systems may have had some underlying issues and I could definitely see it taking some time to get each firm involved in building or maintaining those systems together to figure out where the issues are and to figure out how to fix it.
Not necessarily; replacing the physical hardware is just one step of the process of getting a system back to normal operation. They could have a spare already, but they haven't been keeping up their disaster recovery drills and now have to figure out how to set it up again from scratch. It doesn't even have to be custom hardware, it could just be a hard drive failure on a commodity server that was just never backed up properly.
Or the system has a long and complex work flow and some parts of that involve human labour. When any part of the system has a backlog, the laborious part needs people thrown at it to clear the backlog.
This certainly is the case with the UK Passport office when it has issues.
I'm betting on the "we told all our IT workers to solve it" explanation, but their phrasing could still be literally true.
Imagine that software depends on complex hardware, but it's not manufactured anymore, or the gov cannot legally buy it anymore (contract expired). If it fails, the software must be ported ASAP, which can take that kind of work.
I've never seen something like this happen, but I've seen enough instances of it being possible to imagine it would happen once in a while.
Hmm, given they already had 'billions of rows' [1] (though probably distributed across many tables), they might have been wise enough to plan for this.
"The Consular Consolidated Database (CCD) is one of the largest Oracle based data warehouses in the world that holds current and archived data from the Consular Affairs (CA) domestic and post databases around the world. As of December 2009, it contains over 100 million visa cases and 75 million photographs, utilizing billions of rows of data, and has a current growth rate of approximately 35 thousand visa cases every day"
Unclear what's gone wrong this time, but the mention of "biometrics" (like photographs) makes me suspect it's the same system.
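Back-of-the-envelope on the quoted figures, as a rough sketch. The only inputs taken from the quote are the 100 million cases and the 35 thousand cases per day. The assumption that the rate stayed constant after December 2009, and the 5.5-year horizon, are mine.

```python
CASES_DEC_2009 = 100_000_000   # "over 100 million visa cases" (from the quote)
CASES_PER_DAY = 35_000         # "approximately 35 thousand visa cases every day"

def projected_cases(years_elapsed, per_day=CASES_PER_DAY, start=CASES_DEC_2009):
    """Linear projection; assumes the 2009 growth rate never changed."""
    return start + round(per_day * 365 * years_elapsed)

per_year = CASES_PER_DAY * 365
print(f"{per_year:,} new cases per year")
print(f"{projected_cases(5.5):,} cases after 5.5 years")
```

So on those assumptions the case count would have grown by well over half since the quote was written, before counting photographs or anything else.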
> Typical quote: "We knew we could run it on one node. We needed to have one very powerful node."
As someone who has done government contracting... it's always alarming when this path is suggested. The last time was a beefy server with about 50TB of RAM to keep all data in memory, with multiple hard drives to keep backups. Ugh.
One of the problems with proprietary databases is that licensing issues are another barrier to creating proper clusters (and to have equal environments on development, testing and production).
They did mention it's not the same thing this time.
"This is not the same problem we had with the CCD last year, which was a problem with the database caused by a software patch. This is a hardware failure, and we are working to restore system functions."
A visa application system doesn't strike me as an obvious candidate for a relational database. I would rather store all the information related to a visa application in a document store, with a relational database only used as a sort of index. I wonder if lots of systems are built on relational databases just out of habit, and then bump into these problems.
Say you're the tech lead for the project, built before 2001 [1], and you need to hire a vendor that can provide a cluster that won't break with 100TB+ and guarantee long term support, possibly for decades.
I don't remember the exact timeframes, but I suspect Digital and Sun Microsystems were likely-looking choices around that time. (Sun was the default choice for Telco billing systems around then...)
It's my understanding that most of this is pictures and not "actual rows", the database without the pictures should be much smaller. You could put the pictures on a SAN.
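Roughly what I mean, as a sketch under stated assumptions: a temp directory stands in for the SAN mount, an in-memory SQLite database stands in for the relational store, and the `photos` table, `store_photo` and `load_photo` are invented names. The database row keeps only a path and a checksum, not the image bytes.

```python
import hashlib
import os
import sqlite3
import tempfile

def store_photo(db, blob_dir, case_id, data):
    """Write the image to blob_dir and record only its path and checksum."""
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(blob_dir, digest + ".jpg")
    with open(path, "wb") as fh:
        fh.write(data)
    db.execute(
        "INSERT INTO photos (case_id, path, sha256) VALUES (?, ?, ?)",
        (case_id, path, digest),
    )
    db.commit()
    return path

def load_photo(db, case_id):
    """Fetch the image via the index row and verify it was not corrupted."""
    path, digest = db.execute(
        "SELECT path, sha256 FROM photos WHERE case_id = ?", (case_id,)
    ).fetchone()
    with open(path, "rb") as fh:
        data = fh.read()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError(f"checksum mismatch for case {case_id}")
    return data

blob_dir = tempfile.mkdtemp()          # stand-in for a SAN mount point
db = sqlite3.connect(":memory:")       # stand-in for the relational database
db.execute("CREATE TABLE photos (case_id INTEGER PRIMARY KEY, path TEXT, sha256 TEXT)")

store_photo(db, blob_dir, 42, b"fake image bytes")
roundtrip = load_photo(db, 42)
print(len(roundtrip), "bytes read back")
```

The trade-off is that the database and the file store now have to be backed up and restored consistently with each other, which is its own way to get into trouble.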
To a conservative government purchaser working with a conservative software vendor on a system that doesn't do anything remotely fancy or magic, a relational database (Oracle, specifically) was most certainly the right choice 10 years ago, only moderately less so today.
And just to be clear, that's (mostly) a good thing.
Never bumped into the "beautiful flexibility" of ERD databases - where the app devs assume all responsibility over things that database designers _really_ should be saying "Hell no!" to?
The flexibility typically comes from transactions and joins, something document store proponents typically shy away from. And yes, if you give stupid devs powerful tools they will shoot themselves.
Of course you can, you might need to apply for a visa multiple times if you want to visit multiple times. In fact, most tourists have to do that. It's still about the same order of magnitude though.
The meme doesn't imply that those products are good or never break, just that you won't get fired for choosing them. We'll have to wait and see if anybody involved in selecting Oracle gets fired over this.
June 4th - OPM breach announced.
June 12th - OPM confirms security clearance records exposed.
I'm glad I don't have an urgent need for a US visa or passport or any form of vaguely federal security or residency or travel related paperwork to go through.
So they're saying "We're terrible and we don't care that much." Wow.
If this were a business, they'd be losing customers... but thankfully, they're a government apparatus that holds people over a barrel and doesn't have to provide enough decent service to offer enough visas to meet the demand, hence 12-30 million undocumented immigrants.
Most good companies have an extensive interview process to hire new employees - this is the most direct comparison with getting new immigrants to the country. Having 12-30 million illegals in the US is comparable to Caltrain deciding not to eject drunk hobos living under an overpass for trespassing. Instead of having immigrants compete on education/skills/job experience/employee referrals (aka relatives) like Canada, Australia or NZ, the US allows unprofitable, illiterate and low skill immigrants into the country.
The US has a process for highly skilled workers as well, not an easy one but it exists.
I don't have the numbers, but looking at the amount of illegal immigration into Europe, and at the "jobs" held by most of the semi-legal arrivals (refugees, asylum seekers etc.), who also outnumber the high-skill labor coming into Europe, I wouldn't say this is a US-specific problem.
I would also suspect that Canada, with its points-based system, has the same issue as well. Their immigration system is just more publicized and was given priority, especially during the late 90's and early 2000's, since they felt like they were losing competitiveness to US-based companies.
Heck, I've been to Canada 3 times in the past 5-6 years and the amount of what I would assume is "illegal" immigration in some of the cities there also seems to be quite high: quite a high percentage of workers of Asian and African descent who don't speak English or French and seem to be very wary of people in general.
The keystone service is cartel enforcer. That's probably not something that should ever take the human out of the decision-making loop. When the service is essentially, "I hold the same gun to everyone's head to make sure that everyone follows the same rules," you really don't want that automated.
Leaving aside the issue of whether such a service needs to be centralized or to exist at all, if you substitute human evaluation for a set of algorithmic rules, a systems cracker can corrupt the entire cartel more easily than someone individually subverting possibly thousands of independent human actors.
In terms of many of the other services typically provided by government, yes, those could potentially be replaced by software.
Technically, sure (assuming you also build the robot hardware) but we're quite far from having software that can negotiate treaties, figure out how to regulate water rights in California, etc...
Well of course I don't mean to completely replace the government.
But how about making it more lean? How about modernizing with the times? How about making it all much more transparent and accountable? Couldn't open source software do all of this?
We could start with systems that are used by smaller, poorer countries & grow from there.
I've been trying to get marriage visa things completed since May.
http://nvc.state.gov/
The payment system was down for almost the entire month of May. After finally being able to get through step 2, I've been stopped constantly. The entire social security website was down one night this month. I currently need my tax transcripts from the IRS. The online system for printing them out is down indefinitely, and even the form to request they be sent by mail was constantly throwing up a "technical error has occurred" message.
I know that pain. Took me and my wife almost 5 years, near $2,000 and three interviews to get her green card. And that was with her already being in the states when we met and got married.
Keep at it. The feeling when it's finally over is great.
These stories always fascinate me. At that point why even bother validating the grounds for the visa - anyone who would put up with a process like that deserves it!
I immigrated to Japan - a country often held up as an example of xenophobia and resistance to immigration, and it only took a week for my spouse visa paperwork to be examined and approved. The only cost was for translations of some paperwork from home ($40 at the embassy) and then $20 for the residence card once I was approved.
My wife and I are trying to do the same thing. We're very close to applying for Adjustment of Status (for her), and are scrambling to make sure we don't miss any nitty-gritty details, which is proving to be quite the headache. Can you elaborate on why it took as long as it did?
On a slightly related note, the USA doesn't support airport transfers without a visa - which is taken for granted in the rest of the world. That basically makes Latin America an island, since most flights there are routed via the USA, its gatekeeper.
We always knew that the FBI surveillance project was a complete and utter waste of money designed simply to make it appear that the government is indeed doing something.
Different departments. The NSA and FBI don't talk to each other; NSA has all of the technological innovations for monitoring wide spreads of internet bandwidth.
I'm waiting for an advance parole visa (moved to the US in January) and can't get a visa in time to travel internationally to my best friend's wedding in the UK. Applied in Feb. Got told last week they need another 2 months to process it - I wonder if this is why?
Schedule a personal meeting at your closest USCIS (or whatever the immigration office is called these days) and explain it to the officer. They can give you a temporary "paper pass" you can use to travel until paperwork completes if they find your reason sufficient.
Best friend's wedding might not qualify as a good enough reason, but combined with the long delay it might - definitely worth your time talking to them.
I know someone who got this kind of pass to visit her sick father.
Took 14 months with Congressional intervention just to get my partner an immigrant visa -- which is presumably the easiest.
America has been so obsessed with thinking of itself as the "best country in the world!(TM)" for so long, it has meanwhile regressed into a failed state.
American Exceptionalism, etc. bother me too (an American), but it seems a bit incongruous to call it a failed state while simultaneously relating the story of someone desiring to immigrate to it so badly that they'd wait 14 months and get Congress to intervene. If it's so failed, your partner can leave and I'm sure someone else will be happy to come.
Of course I am a hypocrite. My partner wants to visit the US; I would prefer to live elsewhere. But, sometimes compromise is needed in relationships.
'failed state' is certainly hyperbole here -- there are large elements of society which continue to function, despite the problems at the national government level. It would truly be a failed state if, for example, the national government's performance level were propagated to the whole society.
As much as the US has inefficiencies, the hyperbole is a bit strong. There's quite a difference between being the "best country in the world" and a "failed state."
Certainly the mindset/passive prejudice that America is the "best country in the world (TM)" did not become a thing because of our once-fast Visa processing times.
Unless they're NEXUS pass holding Canadians, in which case they have been fingerprinted. I just got fingerprinted this weekend for a NEXUS pass; my fingerprints are apparently "perfect". The US CBP agent taking them advised me not to take up a life of crime, as I would be caught very quickly. Or at least to wear gloves.
No they do not. I was never fingerprinted as a Canadian until I got a work visa for the USA. Many Canadians who visit the USA for tourism or similar will never have their fingerprints in a Canadian or US database under current laws.
"The Bureau of Consular Affairs reports that the database responsible for handling biometric clearances has been rebuilt and is being tested. 39 posts, representing more than two-thirds of our normal capacity, are now online and issuing visas. We are working to restore full biometric data processing."
It is natural for issues like this to happen from time to time. If it's not technology, it's something else. Australia's visa offices (DIBP) are on industrial action at the moment.
On a technical level, it's possible; on a practical level it's not. You can't get 100% uptime due to the unforeseen problems that will hit once in a while.
If the downtime is anticipated, it can always be averted. Even a Fortune 500 company can expect some downtime over a long enough period.
Take Sony earlier this year. The US government itself the year before.
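On the 100% uptime point, a quick sketch of the usual redundancy arithmetic. The big assumption, labelled here and in the code, is that node failures are independent, which real outages often violate (shared power, shared bugs, shared operators). The 99% per-node figure is made up for illustration.

```python
def combined_availability(node_availability, nodes):
    """Probability that at least one of `nodes` replicas is up.

    Assumes failures are independent, which is optimistic in practice.
    """
    return 1 - (1 - node_availability) ** nodes

HOURS_PER_YEAR = 24 * 365

for n in (1, 2, 3):
    a = combined_availability(0.99, n)
    print(f"{n} node(s): {a:.6f} available, ~{(1 - a) * HOURS_PER_YEAR:.2f} h down/year")
```

Each extra node shrinks the expected downtime, but the result only approaches 1 and never reaches it.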
On 18 June 2015 the Lufthansa site looked like this %templateSomethingVar/somethingVar1% and lots of variables like that (not sure what template system they were using). Now it's okay.
But Polish airlines is down too (unrelated systems I guess)... but could this be something coordinated?
(Edit: Sheesh, what's with all the downvotes? Hardware failures are a route-aroundable thing that should never cause downtime... as long as you don't adopt an outdated "here sits the one source of truth, and its name is [fat server manually configured]" psychology. HA clustering is at the point where it's a generic, drop-in thing for arbitrary services. If you fail to recognize the above, go do some learning.)
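For anyone wondering what routing around a failed node means at its simplest, here is a toy sketch. The node names and handler functions are hypothetical, and it deliberately ignores the hard part of real HA clustering, which is keeping the replicas' data in sync.

```python
class NodeDown(Exception):
    """Raised by a node that cannot serve the request."""

def query_with_failover(nodes, request):
    """Try each replica in order and return the first successful answer."""
    errors = []
    for name, handler in nodes:
        try:
            return name, handler(request)
        except NodeDown as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all nodes failed: " + "; ".join(errors))

def broken_node(request):
    # Simulates the machine with the hardware failure.
    raise NodeDown("disk failure")

def healthy_node(request):
    return f"record for {request}"

cluster = [("db-1", broken_node), ("db-2", healthy_node)]
served_by, answer = query_with_failover(cluster, "case 42")
print(served_by, answer)
```

The caller never sees the failure of the first node, provided a second node with current data actually exists.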