
This is a lesson learned well from open-ended systems. An open-ended system is one where input is received to process, but the input is not well defined. The more unknown input the system accepts, the more open its rules must be to process it. The results of processing are:

* expected output from a known input (intention)

* unexpected output from a known input (defect)

* expected output from an unknown input (serendipity)

* unexpected output from an unknown input (unintentional)
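As a rough sketch, the four outcomes above can be read as a two-by-two table over "was the input known?" and "was the output expected?". The names `Outcome` and `classify` are made up for illustration and are not from any real tool.

```python
from enum import Enum


class Outcome(Enum):
    """The four results of processing listed above (labels are illustrative)."""
    INTENTION = "expected output from a known input"
    DEFECT = "unexpected output from a known input"
    SERENDIPITY = "expected output from an unknown input"
    UNINTENTIONAL = "unexpected output from an unknown input"


def classify(input_known: bool, output_expected: bool) -> Outcome:
    """Map the two yes/no questions onto one of the four outcomes."""
    if input_known:
        return Outcome.INTENTION if output_expected else Outcome.DEFECT
    return Outcome.SERENDIPITY if output_expected else Outcome.UNINTENTIONAL
```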

For example, I maintain a parser and beautifier for many different languages and many different grammars of those languages. In some cases these languages are really multiple languages (or grammars) imposed upon each other, and so the application code must recursively switch to different parsing schemes in the middle of the given input.

The more decisions you make in your application code the more complex it becomes and predicting complexity is hard. Since you cannot know of every combination of decisions necessary for every combination of input you do your best to impose super-isolation of tiny internal algorithms. This means you attempt to isolate decision criteria into separated atomic units and those separated atomic units must impose their decision criteria without regard for the various other atomic decision units. Provided well reasoned data structures this is less challenging than it sounds.
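To make the isolation idea concrete, here is a minimal sketch, not the actual beautifier. It assumes a hypothetical `Token` record with three made-up kinds ("open", "close", "word"). Each decision unit answers one question about one token and never consults the other units; the driver only combines their answers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    text: str
    kind: str  # hypothetical kinds: "open", "close", "word"


# Atomic decision units: each looks at a single token and nothing else.
def dedent_before(token: Token) -> int:
    return 1 if token.kind == "close" else 0


def indent_after(token: Token) -> int:
    return 1 if token.kind == "open" else 0


def ends_line(token: Token) -> bool:
    return token.kind in ("open", "close") or token.text.endswith(";")


def beautify(tokens: List[Token]) -> str:
    """Driver: combines the units' answers without making decisions itself."""
    lines: List[str] = []
    current: List[str] = []
    depth = 0
    line_depth = 0
    for tok in tokens:
        depth = max(0, depth - dedent_before(tok))
        if not current:
            line_depth = depth  # indentation is fixed by the line's first token
        current.append(tok.text)
        if ends_line(tok):
            lines.append("  " * line_depth + " ".join(current))
            current = []
        depth += indent_after(tok)
    if current:
        lines.append("  " * line_depth + " ".join(current))
    return "\n".join(lines)
```

Because each unit is a pure function of one token, each can be tested on its own, and a wrong answer from one does not change how the others decide.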

The goal in all of this is to eliminate unintentional results (see the fourth bullet point above). It is okay to be wrong, as wrong is a subjective quality, provided each atomic decision unit is operating correctly. When that is not enough you add further logic to reduce the interference of the various decision units upon each other. In the case of various external factors imposing interference you must ensure your application is isolated and testable apart from those external factors so that when such defects arise you can eliminate as many known criteria as rapidly as possible.

You will never be sure your open-ended system works as intended 100% of the time, but with enough test samples you can build confidence against a variety of unknown combinations.




How are 2 and 3 from above different from 4?

An unknown input producing correct results is still a problem - the unknown input is the problem.

Therefore, I postulate that any time an unknown input is possible, the software is defective.


My application is a code beautifier that receives code samples as input. I cannot know of every possible combination of code samples. This does not necessarily mean the application is defective. A defective state is unwanted output.

Another way to think about this is that the more open a system is the more useful and risky it is. It is useful because it can do more while tolerating less terse requirements upon the user. It increases risk because there is more to test and less certainty the user will get what they want. Part of that risk is that it's hard to guess at what users want, as sometimes the users aren't even sure of what they want.


Say you write some movement software.

You expect that people will move forward, left, or right.

You didn't expect people to try moving backward.

People start moving backward, but the software happens to do the right thing due to how it was written.

Is the software defective because of your missed expectation?


Yes, I would still classify the software as defective. It did not reject backwards explicitly. Just because it works by accident (i.e., the consumer of the output doesn't care) doesn't mean the defect has gone away.

To be not defective, the software has to explicitly reject input that it was not designed to handle.
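A minimal sketch of what explicit rejection could look like for the movement example. The `SUPPORTED` table, the coordinate convention, and the `move` function are all made up for illustration.

```python
# Directions the software was designed for, as (dx, dy) steps.
# Hypothetical convention: "forward" increases y.
SUPPORTED = {
    "forward": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
}


def move(position, direction):
    """Return the new (x, y) position, rejecting anything not designed for."""
    if direction not in SUPPORTED:
        # Explicit rejection: "backward" fails loudly instead of
        # working (or breaking) by accident.
        raise ValueError(f"unsupported direction: {direction!r}")
    dx, dy = SUPPORTED[direction]
    return (position[0] + dx, position[1] + dy)
```

With this shape, supporting backward movement later is a deliberate change to `SUPPORTED` rather than a side effect of how the code happened to be written.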

Imagine the software is updated with some changes, and the unknown input now produces an incorrect output. Was the defect introduced with the changes? Or was the defect always there?


> To be not defective, the software has to explicitly reject input that it was not designed to handle.

In some cases that breaks forward compatibility, e.g. the case where there is an unknown XML tag. You could reject the tag or the message, but you'll end up rejecting all future changes to the input.

If the whitelist of acceptable items is large, it may be acceptable to have a blacklist instead; however, if the above holds, you don't know what you don't know.


The middle ground may be to explicitly flag/indicate/log that an unknown situation has been encountered, and 'handle' that by doing something useful (continuing to work without crashing, preventing known "unknown" data from being processed silently, etc). It may not help with forward compat entirely, but it would be explicitly known (and I'd think would be somewhat easier to modify/extend for known unknowns in the future).
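For the XML case, a rough sketch of that middle ground using Python's standard `xml.etree.ElementTree` and `logging` modules. The `KNOWN_TAGS` schema and the `read_record` function are hypothetical.

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger("record_reader")

# Hypothetical schema: the only child tags this version was designed for.
KNOWN_TAGS = {"name", "email"}


def read_record(xml_text):
    """Parse one record; log and skip unknown tags instead of rejecting the message."""
    root = ET.fromstring(xml_text)
    record = {}
    unknown = []
    for child in root:
        if child.tag in KNOWN_TAGS:
            record[child.tag] = child.text
        else:
            # The unknown tag is not processed silently and does not crash the run.
            logger.warning("unknown tag <%s> skipped", child.tag)
            unknown.append(child.tag)
    return record, unknown
```

Returning the list of skipped tags alongside the record lets the caller decide whether an unknown tag is tolerable or should escalate to a rejection.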


I've been there, painfully. On my last day on a job, some code I wrote went into production, and the databases started filling up rapidly, to the point where the whole system would halt in a few hours.

Turned out the bug had been latent in the code for 5+ years, predating me. Its data consumption had never been observed before because it was swamped by other data consumption. Changing the architecture to remove that other data brought it to the foreground.

(fwiw, the bug was caused by the difference between 0 and null as a C pointer!)


What would "reject input that it was not designed to handle" look like for an automated car?


When you come around a curve near sunrise or sunset, you may suddenly encounter visual input that overwhelms your sensors. The sun is blinding you. It might overwhelm infrared sensors, too.

If you have alternate sensors, you should trust them more, and camera systems less.

If you have a sunshade, you should deploy that.

If it is raining, or partially cloudy, the situation may change rapidly.

And perhaps you should slow down, but if you slow down too fast, other vehicles might not be able to avoid you.


You could also argue that your expectations are defective. It is possible to accidentally solve a problem in a correct manner.


Not reliably.

It's not professional to design systems that rely on luck.

"Let's ignore this edge case and hope we get lucky" is not something you want to see in a software specification.


Or simply acknowledge that your initial specs didn't cover enough, update the specs, test the "new" functionality, and call it a feature in the release notes.


Getting lucky is not the same as relying on luck.


Where do you fall on autonomous cars? It’s okay to be anti. I’m just curious


Well, it's harder to tell something's wrong when the output looks right.


The entire world of AI relies on dealing with "unknown" input?


I would say yes.

There's a saying that when people figure out how to make a computer do something well, it's no longer in the field of AI. I'd say there's some truth in this, in that for many problems we have solved well (e.g. playing chess), the intelligence is not that of the machine, but of the programmer.

I think that in order for a machine to genuinely be intelligent, it must be capable of original thought, and thus unknown input. Known doesn't necessarily mean specifically considered, but that it could be captured by a known definition. As an example, we can easily define all valid chess moves and checkmates, but we can't define the set of images that look like faces.


No, it doesn't. It relies on explicit rules (old school) or statistical inference (new school).

There's a difference between "breaking" unknown input - i.e. non-computable within the system as it stands - and "working" unknown input, which is within expected working parameters.

The latter is business as usual for computing systems.

The former should have a handler that tries to minimise the costs of making a mistake - either by ignoring the input, or failing safe, or with some other controlled response.

It may not do this perfectly, but not attempting to do it at all is a serious design failure.
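A small sketch of the distinction, using a made-up speed controller. `target_speed`, `SAFE_SPEED`, and the constants are hypothetical, not taken from any real vehicle system.

```python
import math

SAFE_SPEED = 0.0   # hypothetical fail-safe response: stop
MAX_SPEED = 30.0   # hypothetical cap


def target_speed(distance_m):
    """Pick a speed from the free distance ahead, failing safe on bad input."""
    # "Breaking" unknown input: not computable within the system as it stands,
    # so return a controlled, safe response instead of guessing.
    if (
        not isinstance(distance_m, (int, float))
        or math.isnan(distance_m)
        or distance_m < 0
    ):
        return SAFE_SPEED
    # "Working" unknown input: a value never seen before, but within
    # expected working parameters. Business as usual.
    return min(distance_m * 0.5, MAX_SPEED)
```

Whether stopping is actually the safe response depends on context (see the point above about slowing down too fast), so the fail-safe value itself is a design decision.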



