Do neural networks dream of electric sheep? (aiweirdness.com)
139 points by sp332 on March 2, 2018 | 43 comments



This example, and others like it, point to the central weakness of neural networks for image recognition: no matter how much data you feed them, they never really develop concepts or abstractions of what the objects they are classifying really represent or mean. The weights and biases that get fine-tuned by gradient descent are no more than a highly complex function mapping the input pixels to discrete classes. While this may well represent how the visual cortex works at the lowest level, what appears to be missing are the higher levels of abstraction and meaning. Perhaps machine learning needs to be coupled with some of the older paradigms of AI, which included modeling, logic, and reasoning, to achieve understanding. As of right now, a well trained convolutional neural network is no more than a mechanical pattern matching algorithm on steroids.
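To make the "highly complex function" framing concrete, here is a minimal sketch (Keras assumed; the architecture and sizes are made up): the whole classifier is literally a parameterized map from a fixed-size pixel array to a probability vector over discrete classes, and training only tunes the numbers inside that map.

```python
# Minimal sketch (hypothetical architecture): a CNN classifier is just a
# parameterized function mapping pixel arrays to class probabilities.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 10  # assumption: e.g. 10 object categories

model = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),          # fixed input resolution
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(num_classes, activation="softmax"),  # discrete classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# "Understanding" here is nothing more than: pixels in, probabilities out.
fake_image = np.random.rand(1, 64, 64, 3)
print(model.predict(fake_image))  # a vector of class probabilities summing to 1
```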


I think this might also be due to the fact that the compute for neural nets and the complexity of the networks are still in their infancy. The neural nets in these cases are all simple classifiers, working on a fixed image resolution with fixed training data. What do you expect? You can't train an intelligent machine if your architecture is dumb to begin with.

If you showed me an image of a green field with a bunch of fur balls on it, I'd go "Oh look! Floofy sheep!" But then maybe upon closer inspection, I'd go "Heeyyy... that's actually a herd of cats!" A neural net isn't designed to make decisions, to say "hey, maybe I should investigate further," etc. It's just a black box that spits out probabilities over classes. I think if we want to get more sophisticated judgements and something nearing more realistic intelligence, we would need something like nets of neural nets, and ways to interconnect them. Like: here is a model for sheep, and it has interconnections with the environment; here is another model for a sheep's facial features; etc. And maybe a net for decision making, or for asking questions when confidence is lacking or ambiguous.
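One very crude way to bolt the "maybe I should investigate further" step onto an existing classifier is to threshold its own output confidence. A toy sketch (the probabilities are assumed to come from some softmax classifier; the names and thresholds are made up):

```python
import numpy as np

def classify_with_doubt(probs, labels, threshold=0.75):
    """Sketch: flag low-confidence or ambiguous predictions for a 'closer look'.
    `probs` is assumed to be a softmax output over `labels`."""
    probs = np.asarray(probs)
    best = int(np.argmax(probs))
    runner_up = float(np.sort(probs)[-2])
    if probs[best] < threshold or probs[best] - runner_up < 0.1:
        return f"maybe {labels[best]}? investigate further"
    return labels[best]

print(classify_with_doubt([0.55, 0.43, 0.02], ["sheep", "cat", "other"]))
# -> "maybe sheep? investigate further"
```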

I can see a toddler going "oooh, sheep!" as well, and then a parent going, "no, look closer, those are kittens!" And then the kid learns: oh, maybe I shouldn't be so quick to conclude! Sometimes I may be deceived!


> I think this might also be due to the fact that the compute for neural nets and the complexity of the networks are still in their infancy.

Well, neural nets may be just starting out, but I think one can say their approximation process is not conceptually complex. They are complex only in the sense of having many layers and many pseudo-neurons on each layer.

What's happening is that the networks map images into a high-dimensional "feature space" and then draw a dividing line in that space between matching and non-matching images. It is a vastly complicated but heuristic process. Essentially, the divisions between image types are based on both meaningful and meaningless differences between the images. The examples classified as "a boy holding a dog" (when it was a goat) and "a herd of giraffes in trees" (when it was goats that had climbed trees) happened to have more random characteristics in common with those classifications than real qualities.

The thing is, the method can be made relatively better, but for absolute improvement you'd want not just more approximation but a way to get rid of garbage approximations, garbage conclusions, and so forth. I suspect that would imply both different algorithms and a different training cycle.
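Roughly what that "dividing line in feature space" looks like in code (a sketch only: it assumes a pretrained torchvision backbone as the feature extractor and scikit-learn for the linear boundary, with random tensors standing in for real sheep/goat images):

```python
# Sketch: map images into a high-dimensional feature space with a pretrained
# CNN, then fit a linear "dividing line" between two classes in that space.
import numpy as np
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

backbone = models.resnet18(pretrained=True)
backbone.fc = torch.nn.Identity()      # drop the classifier head, keep features
backbone.eval()

def features(batch):                   # batch: (N, 3, 224, 224), preprocessed
    with torch.no_grad():
        return backbone(batch).numpy() # (N, 512) feature vectors

# Stand-in data: random tensors in place of preprocessed sheep/goat photos.
sheep_images = torch.randn(8, 3, 224, 224)
goat_images = torch.randn(8, 3, 224, 224)
X = np.vstack([features(sheep_images), features(goat_images)])
y = np.array([0] * 8 + [1] * 8)

# The linear boundary separates the classes on whatever directions in feature
# space happen to differ -- meaningful or not ("grass + white fluff = sheep").
boundary = LogisticRegression(max_iter=1000).fit(X, y)
print(boundary.score(X, y))
```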


> This example, and others like it, point to the central weakness of neural networks for image recognition: no matter how much data you feed them, they never really develop concepts or abstractions of what the objects they are classifying really represent or mean.

This is an excellent point, but it begs for an answer to the question "what does 'really mean' mean?" What are all the ways a human can determine what a picture "really means", and which of those methods can be applied to a given picture?

We know dogs have certain shapes and goats have certain shapes. Other entities have different characteristics. We can explain how we think we reach conclusions. How we actually reach those conclusions is likely different, and may or may not involve "pattern matching on steroids" in a given case. What's more definite is that we try to reconcile our conclusions across examples so that they form a single consistent picture of the world. Is that what determining "what a picture really means" amounts to?


Meaning exists in relationships, and it's clear that the current generation of AI learns those. An example is word2vec, which can learn that king - man + woman = queen, and simultaneously king - man + boy = prince, etc.
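For reference, this is the classic analogy query against pretrained vectors. A sketch assuming gensim and its downloadable "word2vec-google-news-300" model (swap in whatever vectors you have locally):

```python
# Sketch: "king - man + woman = queen" as a nearest-neighbour query in
# embedding space. Assumes gensim and its downloadable pretrained vectors.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")  # large download

print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# typically [('queen', ...)] -- meaning captured as directions in vector space
print(vectors.most_similar(positive=["king", "boy"], negative=["man"], topn=1))
# the "prince" analogy from the comment above; results are model-dependent
```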

The current generation of image recognition is really missing an understanding of physics and 3d space. There's no understanding of what would happen if a dog moves its head around.

The next generation of algorithms might fix this. Some people are excited about "capsule networks", which are supposed to learn features that are able to be rotated significantly without breaking.


I haven't seen any follow-ups on capsule networks since their big splash half a year ago. I'm guessing follow-up projects have a research latency of a year.


We're going to get general AI the same way we get I: multi-sensory agents existing with agency, instincts, and guides in real 3D space. I cannot conceive of any other way to understand things deeply. Babies run experiments. How does AI play with a cat? How does it ever understand the concept of a cat's mind without ever playing with one? If we want our AI to have conceptualization as we understand it, we need AI to have similar sensory inputs and similar arrays of potential actions. And sure, we could copy the code from one AI to the next to have identical minds at t0, but I struggle with the ethics of that, and really I'd rather have diversity in AIs than a bunch of clones running around thinking with the same types of thought patterns.

The problem I have once I think about it is that this line of thinking leads me to be much less sure of the nature of my own existence. Do we first let the mind of an AI develop to appreciate humanity before letting it know that it is an AI? Seems like it would solve a lot of possible problems, since Gandhi wouldn't take the murder pill.

http://lesswrong.com/lw/2vj/gandhi_murder_pills_and_mental_i...


That’s all true. Sometimes I describe NNs as fancy least squares. Interesting things are bound to happen when you have a few hundred million parameters.

Just to play devil's advocate, we don't know yet if the model is bad or if we're just feeding it the wrong kind of data, right? These optimization algorithms are good at interpolating; they do well with new data points that land inside the multidimensional convex hull of the training data. They can fail spectacularly when the data presented at inference time falls outside that boundary. But humans aren't that great at extrapolation either. Maybe NNs will be good enough when we show them everything there is, maybe what we think of as conceptual understanding is just as much simple interpolation of our experiences as our neural networks...?
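The interpolation-vs-extrapolation point is easy to demonstrate even with plain least squares (the "fancy least squares" mentioned above, minus the fancy). A toy sketch with made-up data:

```python
# Toy sketch: a least-squares polynomial fit interpolates nicely inside the
# training range and falls apart as soon as you ask it to extrapolate.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(-1, 1, 50)
y_train = np.sin(3 * x_train) + 0.05 * rng.normal(size=x_train.size)

coeffs = np.polyfit(x_train, y_train, deg=9)      # fit inside [-1, 1] only

print(np.polyval(coeffs, 0.5), np.sin(3 * 0.5))   # inside the hull: close match
print(np.polyval(coeffs, 3.0), np.sin(3 * 3.0))   # outside: typically far off
```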


>maybe what we think of as conceptual understanding is just as much simple interpolation of our experiences as our neural networks...?

Possibly - we don't know. Sure, if we could train one of these ImageNet-type classifiers with a million or a billion images - close to all known objects in the universe - it may well be able to "recognize" everything. But that still doesn't solve the problem of abstraction or meaning, much less of intuition and generalization. Humans are able to generalize to new domains based on internal models of the world around us. The models used in RNNs for encoding word embeddings seem to be a bit closer to representing meaning. I agree though that NNs are evolving - we are in very early days, and who knows, the NNs of the future may reveal that what we consider understanding is no more than simple interpolation, as you suggest.


I think it's pretty clear that neural networks are developing concepts/abstractions of the objects they're classifying.

Check https://distill.pub/2017/feature-visualization/appendix/

I think some of the filters are pretty clearly developing "higher levels of abstractions".
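Those visualizations come from activation maximization: run gradient ascent on the input image itself until some chosen unit lights up. A rough sketch of the idea (PyTorch assumed; the layer index, channel, and step count are arbitrary choices, not what distill.pub used):

```python
# Rough sketch of feature visualization via activation maximization:
# optimize the *input image* to maximize one channel's activation.
import torch
import torchvision.models as models

cnn = models.vgg16(pretrained=True).features.eval()
for p in cnn.parameters():
    p.requires_grad_(False)

layer_idx, channel = 10, 42          # arbitrary layer/channel to visualize
captured = {}
cnn[layer_idx].register_forward_hook(
    lambda module, inputs, output: captured.update(act=output))

image = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for _ in range(200):
    optimizer.zero_grad()
    cnn(image)                                   # hook fills captured["act"]
    loss = -captured["act"][0, channel].mean()   # ascent = minimize the negative
    loss.backward()
    optimizer.step()

# `image` now roughly shows the texture/pattern that channel responds to.
```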


It's not an interesting question whether networks learn abstractions. It's almost tautological - an image classification network will by definition (attempt to) distill an image into a distribution over categories. So when people criticize abstraction I think they are really criticizing the quality of the abstraction...

Because the key phrase in the grandparent post is "really represent or mean." Grass + white strands = sheep is a hierarchical abstraction, but it's a bogus one.

Feature visualizations do not answer this more relevant question.


It's not entirely clear to me that all of cognition isn't simply pattern matching with learned responses. It seems the human brain may just be doing more nuanced matching / responding.


I remember reading (don't remember where) that the current state of machine learning is a lot like the state of bridges a few thousand years ago. People basically try different approaches using intuition and sometimes if the results are good they'll tell other people about it, nowadays in the form of research papers getting published. However, the generalized equations that facilitate modern civil engineering were far from existing. Similarly, we don't yet have general equations to derive or optimize a machine learning algorithm for a given problem as input, we just choose an algorithm, choose parameters that we think make sense, go through a lot of trial and error, then let the algorithm run.

There's a lot that's still not understood about the brain. Even at the cellular level, microglia have gained attention recently for their contribution to human learning [1]. The NN algorithm was modeled after human neurons for having n inputs and 1 binary output. It's hard to say how cells that regulate the cellular environment and communicate with each other via cytokines would affect the abstraction of the brain to an algorithm or mathematical model. At least for this example, there's a book called "the other brain" about how glia, which make up ~85% of the brain, perform a myriad of operations beyond just keeping neurons together [2].

But you could also look at the brain through a genetic lens. A lot of the nervous system is simply hardwired. The knee-jerk response is a reflex arc, which means the signal goes directly from the sensory nerve to the motor nerve (causing you to kick) before it even gets to your brain. That reflex response has been hardcoded into your DNA. How much of the rest of the brain is learned vs. structured in a predetermined way?

Cognition can probably be reduced to just pattern matching with learned responses, but it's a bit like an ancient Roman looking at New York City skyscrapers and saying "these are just built using arches and a lot of nuance".

[1] https://www.sciencedirect.com/science/article/pii/S107474271...

[2] http://www.simonandschuster.com/books/The-Other-Brain/R-Doug...


> a mechanical pattern matching algorithm on steroids.

Firstly I would like to point out that there are no (good) "mechanical pattern matching algorithms". I would love for you to point out some, but as far as I know, outside of AI no such algorithms exist.

As for the entire argument, the problem with this reasoning is that it only works at the lowest level. And even then, it sort of works for fully connected and CNN based image classification. But autoencoders certainly have what I'd consider "concepts". Not in a language we understand, but they do. They have a signaling mechanism "explaining" high-level features to other neural networks. RNNs have concepts. RL policy networks don't just have concepts, they have strategies. They have lies, truths, and even political lies: truths explicitly designed to make it really really easy to believe something that's not actually happening. Usually they even exhibit meta-lying: systematically not deceiving with just one (but important) deception in an unpredictable location.

(and I would like to add that the CNN features, looking at the numbers, look VERY similar to concepts in more abstract neural networks. Perhaps the "difference" is merely one of those philosophical differences we keep hitting). GAE networks have concepts (of course they are autoencoders usually).
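To make the autoencoder point concrete, the usual minimal sketch looks like this (Keras assumed, sizes arbitrary, random stand-in data): the low-dimensional bottleneck is the "signaling mechanism", a compressed code in the network's own vocabulary rather than ours.

```python
# Minimal autoencoder sketch (arbitrary sizes): the bottleneck vector is the
# network's internal description of the input -- a code, just not in our language.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,))                      # e.g. flattened 28x28 images
code = layers.Dense(32, activation="relu")(inputs)      # the bottleneck "concepts"
outputs = layers.Dense(784, activation="sigmoid")(code)

autoencoder = keras.Model(inputs, outputs)
encoder = keras.Model(inputs, code)                     # exposes the learned code
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

x = np.random.rand(256, 784)                            # stand-in data
autoencoder.fit(x, x, epochs=1, verbose=0)              # learn to reconstruct input
print(encoder.predict(x[:1]))                           # a 32-number "description"
```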

So in truth this is a matter of need: neural networks that have no use for concepts develop only an incomplete notion of concepts. Neural networks that have to "teach" or otherwise interact at a high level with either humans or other neural nets very much do have concepts.

And I would like to say, this is yet another "humans are magical because X" argument. None of those arguments has ever stood the test of time. This one won't either. AGI is coming, sorry to disappoint you, and the current theories of neural networks will be shown to be "insect-level" (or whatever level) AGI.


Yes, RNNs and autoencoders, etc. do seem to be encoding concepts via embedding vectors in higher dimensions. But see the argument made by Doug Hofstadter about language translation using RNNs [2]. We have also seen how CNN layers can "visualize" higher levels of features in images as we go deeper in the network. And many think that some variation of this basic approach done repeatedly, with more data, computing power etc. will be sufficient to lead us to AGI in the future.

However, many of the leading figures in AI are skeptical. Geoffrey Hinton, the father of deep learning, is very skeptical of the approach to AI that he pioneered; he recently stated, "My view is throw it all away and start again." [0]

Francois Chollet - the author of the deep learning framework Keras, has said: "For all the progress made, it seems like almost all important questions in AI remain unanswered. Many have not even been properly asked yet." [1]

And of course there's Doug Hofstadter, who thinks it is going to take a lot more to come close to human-level intelligence and understanding, even when you consider the most advanced RNNs of the day - those that run Google Translate.

[0] https://cacm.acm.org/news/221108-artificial-intelligence-pio... [1] https://twitter.com/fchollet/status/837188765500071937 [2] https://www.theatlantic.com/technology/archive/2018/01/the-s...


The counterpoint to Douglas Hofstadter's argument is pretty simple: he is defending horse-drawn carriages against cars.

Take a great horse-drawn carriage, and compare it to the first cars. Oh my god. Those cars SUCK! They are bumpy (no suspension). They rarely get from one city to the next without doing repairs [1]. Getting them to start at all requires an engineering degree AND more than average muscle. Despite many cities having maybe 20 cars total, one blew up every 10 days or so. And the top speed of horses is actually faster than cars! And the fuel is WAY more expensive than horse feed.

Compared to the AVERAGE horse drawn carriage, which also had no suspension and constantly needed repair ... well they started off being about even, perhaps a little worse, and after a few years they were so much better it isn't even funny.

Likewise, compared to a PhD-level expert human translator in that specific language pair, Google Translate sucks. Of course... there are a few tens of thousands of expert human translators, and they might know 3 languages, and perhaps there are 100 that know 4 languages. But as can be plainly seen on the Indian channel in Australia, Google Translate outperforms the translators used for that... by a wide, wide margin. Google Translate can translate between any pair of languages, with either language chosen out of over 100 languages. Let's face facts here: on most metrics Google Translate wipes the floor with even those PhD-level translators, though not quite yet on every last metric. Compared to the average human translator, Google Translate wins on every last metric.

Aside from that, Google translate is always available (even mostly available offline, with no Google involved beyond a binary upload to your phone), it's cheap, and frankly it translates from Chinese to English better than a Chinese person following an English course for 2 years can express themselves in English.

The truth is Douglas Hofstadter is ... wrong. Okay, you might argue he's just 80% wrong, and the remaining 20% is shrinking, that's fair. And of course, Google translate is not AGI.

You know, there are social scientists, in fact quite a few of them, that claim "AGI" is simply an AI solving 2 problems:

1) any problem, like survival in groups

2) explaining the actions they took to achieve (1) to other (usually new) members of their group

That's AGI. They're a pretty abstract/advanced form of auto-encoders. We don't know that for sure, but ... it's not far off the truth.

But yes, you can find a few exercises where humans still outperform Google Translate. They're mostly unfair (humans outperform the AI because they have side-channel information available, e.g. what events happened outside of the content of the actual translation. A good test would exclude that and then humans are 100% left behind, but in the press ...)

A lot of disciplines are currently like that. Humans are utterly beaten by AI in just about every last thing that was used to "prove" humans have "intelligence" and machines "don't" just 10 years ago. Even the most human of things: AIs actually outperform humans at chatting up humans; can you think of anything you can do on a computer that's more human than that?

And now we're at the point where it's becoming more and more obvious that while AIs don't beat the best experts in specific fields, they do beat the average human. It's getting ridiculous. AI robots are better at navigating forest terrain than humans, to take an example I recently saw. AI is not just better, but has error rates solidly and consistently below humans on expert-level medical analysis. Expert-level mathematics and physics without AIs doing most (or all) of the work has been dead for 2 decades, and in the 2 decades before that, I would argue, forms of AI made particular researchers far more successful than their peers already.

Where exactly is the point when "but they can't yet X" gets the answer "oh yeah. Hold my beer"? Is it really that far off?

So here's Mr. Hofstadter, and with all due respect, he's merely moving the goalposts again. He had best go home and dive into his books, urgently, because in 2 years we'll have crushed even the most expert human translator using AIs, and he'll need a new place to put those goalposts. I look forward to where he'll put them this time; it'll be fun!

How did we historically "prove" computers "aren't intelligent"? "Don't have a soul"? Well, they can't analyze a problem, can they! (And then we had expert systems.) But they can't strategize! Take chess (and then we had Deep Blue). But they'll never recognize unstructured data like images, will they? (Oops.) Okay, but never video and reading people's intentions! (Oops.) Okay, but at least humans are better at voice recognition (AIs win consistently now)! And translation (90% there)! Okay, sure, but they'll never control a robot in the real world! (And now pretty much every research robot... and of course there's self-driving cars.) Okay, but they'll never deal as well with dynamically stable robots as humans will (that one's a TODO at the moment). Okay, but they'll never deal as well with partial information games and bluffing (Poker - humans beat. Starcraft ... TODO)

Hinton might very well be right. There are 5 major chapters in an introduction to ML course, and Hinton is a big name in 3 of them. Frankly there ought to be a 6th (Hebbian learning). When it comes to exploring deeply, we have only really done that for one of those chapters, the symbolic reasoning chapter. We're getting deeper into the neural network chapter, but symbolic reasoning got a head start of a millennium or four or five, which I would say we've not quite matched yet. We are very far from out of ideas to get the field to progress further, so I wouldn't worry about that yet.

He also does have a good point: the overwhelming majority of current AI research is focusing on too narrow a slice of the field.

I would like to point out that Hinton is a theoretical researcher into AI. That he believes theoretical advances are necessary to advance the field is almost a tautology: he wouldn't have become Geoffrey Hinton if he believed otherwise. I mean, this argument has its place, but it's a statement of faith. A very successful statement of faith, by a very impressive researcher, but ultimately it's about as informative as finding out that Mr. Nadal likes tennis.

[1] https://en.wikipedia.org/wiki/Bertha_Benz


> But yes, you can find a few exercises where humans still outperform Google Translate. They're mostly unfair (humans outperform the AI because they have side-channel information available, e.g. what events happened outside of the content of the actual translation. A good test would exclude that and then humans are 100% left behind, but in the press ...)

That's a very reductionist view of translation. I'm of the opposite opinion: it requires human-level intelligence to translate anything but the simplest and driest texts. Translators of literary texts are no less authors than the actual writers.

> Poker - humans beat.

AIs beat humans only in the simplest variant of poker, heads-up (two players). In the more complex ones, AIs are nowhere near humans.


My 3 year old makes similar mistakes. I expect he'll improve in large part because he isn't just a set of layers but has subcortical areas, fears and motivations, and a body to explore the world with.

This suggests that these large ML models do need complementary components to improve.

For example, if they were capable of active exploration of an image, they may have said there are sheep and then upon searching for the sheep, realize they aren’t there.


> As of right now, a well trained convolutional neural network is no more than a mechanical pattern matching algorithm on steroids.

And computers are at best mechanical pattern matching machines. This isn't something that's subject to empirical evidence, and appeals to ignorance or the limits of our current understanding will not do. Computers cannot be any more than that by definition (and arguably the word "matching" is being used in an analogous fashion; rather, a computer configured with an algorithm is such that given an initial state, it will lead to a final state that, when interpreted by a human being, can be interpreted consistently in the desired manner, i.e., a final state of 0 means that the initial configuration encoded two states that match, and a final state of 1 means that they did not). Abstraction is, by definition, NOT reducible to a mechanical process like this. The human interpreter is the one possessing abstract concepts and who interprets by means of meanings conventionally assigned to symbols or machine states. A machine may contain a state that we call an image, but no process can, in principle, abstract anything from this aggregation of states. To claim otherwise is to completely misunderstand what symbol manipulation is. (Even a human being couldn't abstract anything from an image in a mode analogous to the way in which a computer operates. By analogy, given a matrix of RGB values, could you tell me what's in the image? Or could you at best compute, say, a value that, when looked up in an already given table of values, gives you a label such as "sheep"?)

However, that does not mean that AI cannot perform well, at least within narrow constraints. It may very well be possible to improve AI techniques to such a degree that it can assign the label "sheep" correctly with high accuracy. There simply is an in-principle difference between AI and actual intelligence.


When you talk about mapping (patterns of) the input pixels to discrete classes, I don't think that's entirely what people do. We have the ability to make "distributed representations" of concepts, e.g. word2vec, GloVe, etc., which contain the idea that a sheep is pretty similar to a dog. The classes are far from discrete.

I'm pretty sure people train image recognizers to output these representations, which can include states like 51% certainty dog, 48% certainty sheep, 1% other, and if you aren't sure, take the best choice.
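In other words, the raw output is a distribution over labels, and the hard label is just the argmax. A trivial sketch with made-up logits:

```python
# Trivial sketch: the classifier's output is a distribution over labels,
# "51% dog, 48% sheep, 1% other", and the hard label is just the argmax.
import numpy as np

def softmax(logits):
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

labels = ["dog", "sheep", "other"]
probs = softmax(np.array([2.3, 2.24, -1.6]))   # made-up logits
print(dict(zip(labels, np.round(probs, 2))))   # roughly {dog: 0.51, sheep: 0.48, other: 0.01}
print(labels[int(np.argmax(probs))])           # the "best choice": dog
```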

It's such an intuitive idea to combine these things that if it hasn't happened in the literature yet (I only looked for 1 minute), it's because 1000 people have tried it, failed to improve the state of the art, and didn't publish.

On the other hand, we're generally pretty bad at inferring and generalizing 3d structure out of images, so I tend to blame that.


If you look at the fast.ai lesson 11 YouTube video, the very beginning has an amazing example of this with fish!


Though I agree with the spirit of what you are saying, I would add that it is perhaps not a weakness of neural networks per se, but a weakness of the current architectures and training paradigms (supervised, hence verbalized). I think we could do much more to improve things if we stopped pushing for extra percentage points on benchmarks and instead rethought the problems we'd like to solve and approached them from a new angle.


@picdescbot is a Twitter bot which uses the same Microsoft Azure Computer Vision API endpoint for image captioning: https://twitter.com/picdescbot

It's more accurate than you'd think, and as this article notes, the mistakes are indeed funny: https://twitter.com/picdescbot/status/968561437126938625


It's okay, humans make these kinds of mistakes too.

A friend of mine has a young son (~2-3 years old), and a cat named Mono.

Her son knows the cat is named Mono - he plays with her everyday.

But when they go out for a walk, any 4 legged animal he sees is also a "Mono".

Fortunately for her son, his developing, extremely plastic brain will soon know how to differentiate Mono the cat from a random dog on the street (unlike neural networks which we will need to entirely redesign, rebuild, and retrain to get similar progress).


It could be that your friend's son has an overly generous sense of what Mono is. It could be that he knows precisely which individual entity is Mono but misunderstands what "Mono" denotes/refers to. It could be somewhere in between.

If a neural network had a representation of entities in the world apart from the language referring to those entities, that would be awesome. I'm guessing we're not there yet though.


It could also just be limited vocabulary and that his outside use of "Mono" means something like "a creature which is somewhat similar to Mono".


It could be he knows who Mono is and he knows "Mono" just refers to Mono, but there are a lot of other things in the world that he wants to talk about, and he knows the grownups will be able to piece things together. I do that often enough: "Fred or whoever, that guy, did that thing..." If neural networks were doing that, that would be more awesome still.


Your friend's son is over-extending "Mono" to other similar animals. This is actually a normal part of a child's linguistic development.

My son also does the same thing with "dodo" for dinosaur. At first, he applied it only to big, scary dinosaurs that made large sounds—later on, it applied to any animal that had some large body and jaws, e.g. a shark. Finally, he learned to differentiate between the majority of animals (save for a few big, scary-looking animals that are still "dodo") and can name sharks, dinosaurs, and bears separately!


So in other words, "dodo" is (slowly) going extinct from his vocabulary?


Yes. I don't know what the exact term for the process is, though, since I'm not knowledgeable about linguistics and childhood development beyond what I've learned from raising my son.

It's very exciting to watch and hear the progress!


I don't actually know the terminology, either; I was just making a really bad pun. ;p


>"unlike neural networks which we will need to entirely redesign, rebuild, and retrain to get similar progress"

I don't think you would need to redesign or rebuild anything for that. You would need to train the network on additional examples, which I suppose you could call retraining (although it need not be from scratch), but that is the same in the case of the child.


Yeah but it's good to keep in mind what kinds of errors a neural network can make. It will help in training and debugging them. Also it helps to keep a little skepticism now that "deep learning" gets waved at everything like a magic wand.


> unlike neural networks which we will need to entirely redesign, rebuild, and retrain to get similar progress

You could argue that the child's neural network will be redesigned and rebuilt as his brain matures.


They're not the same type of mistake. I don't remember the details, but when I was reading papers on caption generation, the part about producing coherent sentences seemed more like a hack that happens to kinda work (usually) rather than a robust solution.


That left me a bit disappointed about the current capabilities of neural networks for object identification - I assume the tested detectors are at least somewhat state of the art. But thinking about it also makes it somewhat obvious that they are flawed in this way.

Humans identify objects by looking at how different parts are geometrically located and connected, possibly in a hierarchical fashion, and what basic shapes, colors and textures those parts have. A sheep is a body, four legs, hoofs, a tail, a head with its characteristic shape, the ears, mouth, nose and all those come with characteristic textures and colors.

And because there are so many features and relations between them, it is quite hard to fool humans; you can hide or change quite a few of them. We also have a lot of background knowledge: a bright orange sheep might be unusual, but we also have a pretty good idea of how hard it is to change the color of a sheep.

I naively expected neural networks to also learn those features, but there is just no pressure for them to do so. They mostly see common objects in common situations, and there, just looking for a patch of wool-colored, wool-textured fur might be enough to identify the presence of sheep correctly almost all the time. Or if sheep are mostly depicted in a characteristic environment, it might be good enough to just identify landscape features and ignore the sheep altogether.

I would guess that it is in general not really feasible to come up with enough contrived, adversarial examples to force neural networks to learn the important parts and relations of different objects just by staring at many images. I think one would have to hard-wire some knowledge about space, spatial relations, occlusion, shapes and the like into a system to really get it to learn what a sheep is in a similar way to humans, without heavily increasing the risk of over-fitting.


Also, our grasp of sheep morphology is based on an abstract idea that the sheep is qualitatively different to and more significant than the surrounding fields and rocks which are roughly the same size, shape and colour as sheep. Unlike the ML process, this concept of sheep or more unfamiliar mammals as animate creatures which might be friend, foe or food exists independently of how many sheep I've seen before and the language used to classify them, and is probably instinctual rather than learned.

Show me a couple of images of fields full of sheep classified as oveja or Schafe and I might make the same learning error as the ML process and think the word refers to the [general pattern of the] surrounding field or hills. But show me a further image of oveja outside a field - even a close-up of an oveja that doesn't resemble those in the field photos in any way - and I'll grasp the meaning of the term straight away. Needless to say, I'm also less likely to stumble over conceptual links between the names of animals and tastes of food, types of clothing, etc., which are independent of the living animal's morphology.


> Humans identify objects by looking at how different parts are geometrically located and connected, possibly in a hierarchical fashion

FWIW this is what capsule networks are designed to do.


Maybe I'm oversimplifying but this seems like an obviously simple* problem to solve: as humans, before we can identify what objects are, we start out with depth perception and object detection. If NNs were trained on a dataset of imagery where autodetected object outlines were tagged, rather than simply tagging or describing an entire image, and then run with built in object detection and depth perception, I suspect the results would be pretty good.

I know the likes of Google and Facebook are already doing precisely this with human faces, but we'd need a more generalised object detection algorithm before the examples of sheep in the article would be reliably identified.

* I used the word "simple", as distinct from "easy": for example, creating a training dataset might be a challenge.


Like most obvious things, scientists have tried it; it's known in the field as "image segmentation". If you google it, you'll see a bunch of demos and papers.

I haven't seen anything in the literature about incredibly effective implementations of 3D understanding and/or depth perception at the level you'd perhaps hope, but there is some progress being made.


Categorical divisions and their names are being refined at that age, which is why the most familiar name is allocated to the entire group of similar entities. Later on, the categories become refined, and so do their names.


Reminds me of Philip K. Dick's book Do Androids Dream of Electric Sheep?, which was made into the cult movie Blade Runner.


This was a great read. Thank you!




