A couple weeks ago I was asked to meet with two actuaries. Their company (a big one, I'm working for them on contract) is trying to promote the use of Machine Learning and when they heard I had just completed post-grad in ML, they were eager to pick my brains.
In a 60 minute meeting, we spent 5 minutes discussing how ML worked and 55 minutes going in circles. They had lots of data, but no problems to solve. And there are only so many ways to explain that a large dataset is not equivalent to a business problem.
As with many lines of work, the cool part may be a very small part of their day.
With insurance, there's probably an already entrenched way to think about the data that's been there for decades, as well as regulations that help to entrench it.
It might also be the case that the business problem isn't risk analysis at all, as one might expect. It may be that finding investments for the float that provide a sensible return within the regulatory remit is harder than figuring out how much to charge for dinging someone's car.
Correct on all points. There is also the issue that the pricing models themselves need to be able to stand up to some form of regulatory scrutiny (depending on the state and the nature of the insurance product).
I’ve been noodling a little with ML for infosec and if I spent an hour with a pro I might wind up doing the same thing.
The problem is likely to be one of a cacophony of choice. It’s not that they don’t have any problems, it’s that they have a million of them and have no solid footing from which to frame their answer in the form of a question.
I bet if you picked one field in one dataset and started drilling down into how they could possibly use it, the use cases will start to melt out of their frozen brains.
There are a bunch. Just about any well-instrumented platform presents many opportunities to do basically side-channel detection of compromise (e.g. CPU utilization, network utilization, network flows, syslog data, process trees, etc.)
While I get what you're saying, isn't that a very "supervised ML" way of thinking?
In a world where unsupervised learning was easier to apply, it might be possible to start with data, compress it to a low-dimensional manifold, and use that understanding as inspiration for better directions/questions to pose.
So I guess the real problem is that humans expect anything branded as "intelligence" to have the ability to learn in an unsupervised manner, while breakthroughs are closer to the regime where the goals are supervised while the solutions are discovered. Of course, the amount of supervision probably lies on a continuum, between everything being completely specified (classical programming) and everything unsupervised ("true" "human" intelligence).
Lots of data does nothing to help when the person you are talking to is not given a) knobs to turn and b) a goal to achieve. You also have to have loads of control over some part of the system. And even then, you can't expect ML to help all on its own.
So the worst are folks who think that just because you have full trace logs of your system, it should be trivial to apply anomaly detection. At face value, yes. Yes you can. But also at face value, your trace logs aren't free, and it isn't like folks are keen on letting their detection system cut off anomalous transactions without some sort of kill switch.
I think my point may not have gotten across clearly, so let me clarify. I agree with most of what you said, which IMHO falls under the spirit of the supervised machine learning paradigm, the most advanced tool we pragmatically have available today.
Looking beyond, if we find a model+learner which can discover a low dimensional latent space (through certain indirectly specified biases) then that low dimensional formulation of the domain can guide us towards interesting questions worth asking. To have a factorized low-dimensional formulation is roughly what it means to "understand" a subject, so such a toolkit would be enormously useful.
Many people (including some experts, rightly or wrongly) believe that neural networks might be that model class, and (clever tweaks of) gradient descent might be an acceptable learner.
All I'm saying is that the current hype about AI fails to separate the potential of the latter class from the currently available successful tools of the former class.
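To make that concrete, here is a minimal sketch of the kind of model+learner I have in mind: a tiny autoencoder whose 2-d bottleneck plays the role of the latent space. The data, layer sizes, and epoch count are all made up for illustration.

    # Minimal sketch: an autoencoder whose 2-d bottleneck is the "low
    # dimensional latent space". Synthetic data; layer sizes are arbitrary.
    import numpy as np
    import tensorflow as tf

    X = np.random.rand(1000, 20).astype("float32")    # stand-in for real records

    encoder = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(2),                      # the latent space
    ])
    decoder = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu", input_shape=(2,)),
        tf.keras.layers.Dense(20),
    ])
    autoencoder = tf.keras.Sequential([encoder, decoder])
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(X, X, epochs=5, verbose=0)

    Z = encoder.predict(X)    # 2-d coordinates you can inspect or plot

Whether those two coordinates mean anything interpretable is exactly the open question.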
> Looking beyond, if we find a model+learner which can discover a low dimensional latent space (through certain indirectly specified biases) then that low dimensional formulation of the domain can guide us towards interesting questions worth asking. To have a factorized low-dimensional formulation is roughly what it means to "understand" a subject, so such a toolkit would be enormously useful.
In my experience, at least, the value in this kind of activity is relatively low. Sometimes you find something, usually you just find a low-rank version of nothing in particular. Unless some significant developments appear in this space (and they very well might), the best low-dimensional approximations will be the means, variances, and correlations computed by business analysts, which are tremendously valuable quantities to keep in mind at all times.
> In my experience, at least, the value in this kind of activity is relatively low. Sometimes you find something, usually you just find a low-rank version of nothing in particular. Unless some significant developments appear in this space (and they very well might), the best low-dimensional approximations will be the means, variances, and correlations computed by business analysts, which are tremendously valuable quantities to keep in mind at all times.
Definitely. Humans find and reject "low rank" correlations all the time. For example, North America and South America are both part of the "new world", but no sociologist would use that to make general predictions about the people of either continent. People like both steak and cookies, so the two would conceivably fall into a low-rank "humans like this" category, but they couldn't be more dissimilar in many respects.
Conversely, some doctors have an encyclopedic knowledge of various ailments, and either memorize or infer from experience a particular ailment from a huge collection of possible ailments. If the number of possible ailments is large, their "understanding" is not low rank, it's the opposite.
Low rank does not necessarily mean "understanding", it's just a local minimum in some function of a random variable.
> Humans find and reject "low rank" correlations all the time ... steak and cookies ...
Sure, but the appropriateness of a low rank approximation depends on what you want to predict. Eg: for predicting basketball success, {height, wingspan} might be a very useful "low rank" description, while for predicting obesity the relevant low rank description might be the body mass index (BMI) or something like it. Wingspan = chest width + 2 * arm length and BMI = weight/height^2 could be composite features "discovered" by such a model+inference toolkit.
> Conversely, some doctors have an encyclopedic knowledge of various ailments, and either memorize or infer from experience a particular ailment from a huge collection of possible ailments. If the number of possible ailments is large, their "understanding" is not low rank, it's the opposite.
That's exactly why I would not call that understanding, especially if they have to memorize a book full of ailments and symptoms. "Understanding" would entail a (causal) model of underlying physiological problems and the symptoms they give rise to, with a recipe for inferring in reverse.
> Low rank does not necessarily mean "understanding", it's just a local minimum in some function of a random variable.
I don't understand that statement at all.
PS: I don't have any particular affinity to neural networks over other ML models, and don't mean to escalate the hype :-)
> composite features "discovered" by such a model+inference toolkit.
This kind of "feature discovery" is sort of what random forests, gradient boosting, and neural networks already do. You can even do a neutered version of it with full quadratic interactions of all the inputs in a (regularized) linear regression model.
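That neutered version is only a few lines, for what it's worth -- a sketch on synthetic data (everything below is made up):

    # Sketch of the "neutered" version: all quadratic interactions of the
    # inputs fed into a regularized linear model. Data is synthetic.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))
    y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=500)   # a hidden interaction

    model = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))
    model.fit(X, y)
    # the x0*x1 coefficient stands out among the expanded features
    print(dict(zip(model[0].get_feature_names_out(), model[1].coef_.round(2))))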
For a linear version of this task as an unsupervised problem, you will want to look at PCA. The problem right now is not that we don't have enough unsupervised data analysis techniques, it's that they are hard to extract useful information out of. Interpreting t-SNE, for example, is notoriously difficult. This is why I said advances in the space are necessary. As it stands, you can run all sorts of low dimensional mappings and embeddings on a given data set, and spend an afternoon finding a whole lot of nothing that your business analysts didn't already know.
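And the PCA version of "run a low-dimensional mapping on your data" is about as short, which is part of why it's so tempting (again, synthetic data standing in for a wide business table):

    # Unsupervised low-dimensional mapping with PCA. The easy part is below;
    # the hard part is deciding whether the result means anything.
    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(1000, 50)            # stand-in for a wide business table
    Z = PCA(n_components=2).fit_transform(X)
    print(Z.shape)                           # (1000, 2)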
Interesting aside about doctors though. Many people in alternative medicine would argue that there are obvious associations between various ailments, causes, and/or treatments that are under-recognized and under-studied.
Maybe if a machine produced those same associations instead of an alternative medicine practitioner, they might be more interesting to medical researchers.
You have to be careful. If you only feed in data that has the associations, both humans and machines will find them. This is the entire premise behind introducing double-blind studies. Turns out you have to actively try to falsify claims if you want solid results. Not just try to confirm them. :)
The biggest counter-example is the "hacks" against AI. If you can change two pixels in an image, in ways imperceptible to a human, and an AI chooses "cat" instead of "dog", it shows there is something terribly different between how an AI operates and how a human does.
Perhaps "bugs" like that can be fixed with more/less neurons, training data, or neuron organization, but it doesn't change the fact that fundamentally, a neural network's definition of understanding is quite different from our own.
Based on this, I suspect that even if neural networks are the long-term answer, we're nowhere close to any sort of general AI that mimics human understanding.
That said, we can certainly make many useful tools in the interim.
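For what it's worth, the mechanics of that "hack" aren't exotic. Here's a toy gradient-sign sketch against a plain logistic model on a synthetic "image" -- not a real attack on a real network, just the shape of the idea:

    # Toy adversarial perturbation (FGSM-style) against a logistic model on a
    # synthetic "image": nudge the input along the sign of the loss gradient.
    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=784)                  # pretend these are trained weights
    x = rng.random(784)                       # a flattened 28x28 "dog" image
    y = 1.0                                   # true label: dog

    p = 1 / (1 + np.exp(-w @ x))              # model's confidence it's a dog
    grad = (p - y) * w                        # d(loss)/d(input) for logistic loss
    x_adv = np.clip(x + 0.05 * np.sign(grad), 0, 1)   # tiny, structured nudge

    p_adv = 1 / (1 + np.exp(-w @ x_adv))
    print(round(p, 3), round(p_adv, 3))       # confidence can swing dramatically

Deep nets are hit by the same kind of structured nudge, just in a much higher-dimensional and less intuitive way.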
Meh, look at how many people see visions in shadows and whatnot. The very constellations can almost be seen this way.
That is, yes, you can fool machines. But you can also fool people. That you fool the two in different ways isn't that surprising. Nor is the fact that you would want to. Just look at camouflage. Basically the same exact thing, just in the physical world. (And, notably, not something monopolized by humans.)
Ah, I think I'm arguing a different point, then. I fully grant they are thinking in different ways. For that matter, though, so do different humans. There is a great Feynman story about how he and a colleague both counted "in their heads" using completely different methods, with surprising results.
So, I am not arguing that they think the same way. I just don't know if that really matters. Regardless of if they think the same or another way, there will be ways to fool them. They could still be "thinking", though.
This is why I mentioned anomaly detection on live data. My understanding is that is one of the prototypical unsupervised learning fields. I see how I framed it such that I could have meant a labeled dataset to train a detector. My apologies.
I do think there is something possibly there. But it is just that, a possibility. My hunch is it is a low one that will require a lot of cost to reach. Think of it as akin to making money off of interest rates. If you have a ton of capital, it can work. For most people, it won't.
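To be concrete about the flavour of thing I mean, here's a toy unsupervised detector over made-up host metrics. The columns, numbers, and threshold are invented; real telemetry is messier and far more expensive to collect:

    # Toy unsupervised anomaly detection over host metrics
    # (cpu_util, net_bytes, n_procs). Everything here is synthetic.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    # pretend each row is a 1-minute sample of (cpu_util, net_bytes, n_procs)
    normal = rng.normal(loc=[0.3, 1e6, 120], scale=[0.1, 2e5, 10], size=(1000, 3))

    model = IsolationForest(contamination=0.01, random_state=0).fit(normal)

    sample = np.array([[0.95, 8e6, 300]])    # a suspiciously busy minute
    print(model.predict(sample))             # -1 flags it as anomalous

Getting from that to something an ops team will trust with a kill switch is the expensive part.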
> They address a class of questions that were previously ‘hard for computers and easy for people’, or, perhaps more usefully, ‘hard for people to describe to computers’.
Those aren't the only problems. ML can also solve problems that were previously 'hard for people and with no good algorithm for computers'. These are problems where there is a good labeled dataset, but no good algorithm to map from data to label. For example, the work determining sexual orientation from images (https://osf.io/fk3xr/).
The problem with this approach is that you get predictive ability but no insight. It can still be of great value, though, and potentially great danger too.
To be clear, it's not obvious that the paper you linked is actually accurate. A lot of researchers consider that paper to be deeply flawed, and to show something other than what it claims to.
Absolutely. People are using ML for plenty of difficult problems. In operations research, a lot of time is spent coming up with heuristic solutions to hard optimization problems. This problem is plenty hard for people to do. There has been some recent work in using ML to create solution methods, like the paper Learning Combinatorial Optimization Algorithms over Graphs (https://arxiv.org/abs/1704.01665).
> For example, the work determining sexual orientation from images (https://osf.io/fk3xr/).
> The problem with this approach is you get predictive ability, but no insight.
Is it not possible to take one specimen and tweak it just a little at a time until it classifies as a different category, and that way find the border between categories?
Yes. It always bothers me when people claim you can't get insight out of a nonlinear black-box model, because you absolutely can. It's just not right in front of you, and it's not always as clear-cut as what you might find in a linear regression model. But even a linear regression model with quadratic interactions is already pushing the limits of interpretability, so it's not a problem that's unique to neural networks. It is, however, limited by computational ability.
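The parent's suggestion is basically boundary probing, and a crude black-box version of it is just a bisection between two examples the model labels differently. The model and data below are stand-ins; any classifier with a predict method would do:

    # Crude black-box boundary probing: bisect along the segment between an
    # input labelled A and one labelled B until the prediction flips.
    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(500, noise=0.2, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                        random_state=0).fit(X, y)

    a, b = X[y == 0][0], X[y == 1][0]         # one point from each class
    for _ in range(30):
        mid = (a + b) / 2
        if clf.predict([mid])[0] == clf.predict([a])[0]:
            a = mid                           # still on a's side; move a up
        else:
            b = mid                           # crossed the border; move b back
    print(mid)                                # a point near the decision boundary

It's laborious compared to reading regression coefficients, but it's not nothing.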
My mental model: Neural networks are for gut decisions. This fits their application domain quite well imho.
Gut decisions are quick. You look at a picture and decide it's a dog quicker than you can say it. Executing a neural net is feasible even on mobile devices on battery.
Good gut decisions require experience/training. You must train extensively to be good, and your training data must be good. An example from "Thinking, Fast and Slow": experienced stock traders often claim to have a gut feeling, but it is bogus because their training data is bogus (good decisions lead to bad outcomes and vice versa). In contrast, experienced firefighters have a gut feeling for whether it is safe to enter a burning house. This works in practice. They observe things about the environment reliably without being able to point them out consciously.
Gut decisions are not about planning or knowledge. Those require different AI techniques than neural nets, which intuitively shows their shortcomings.
It's tough to know just what machine learning covers, includes, consists of. From what I've been able to see, currently in practice, apparently 90+% of machine learning is curve fitting and nearly all of that is some form of classic linear regression.
Linear regression and curve fitting more generally have been around, available, and used going way back in electronic digital computing and well before. E.g., for software we've long had the IBM Scientific Subroutine Package (SSP), SPSS (Statistical Package for the Social Sciences), SAS (Statistical Analysis System), ..., R. There are stacks of polished textbooks in statistics and specialized to some fields, e.g., econometrics, time series analysis, etc. There has been some usage, but headlines have been rare for decades.
But there's a lot more to applied math than anything much like that curve fitting. E.g., maybe take the "learning" to mean the ability to do statistical estimation -- well, there's a lot more to statistical estimation than curve fitting.
There's also the field of optimization -- linear programming, integer linear programming, network linear programming, dynamic programming (discrete time, continuous time, deterministic, under uncertainty), quadratic programming, other cases of non-linear programming, and more.
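E.g., the linear programming case is a few lines with an off-the-shelf solver (toy numbers, nothing domain-specific):

    # Toy linear program with an off-the-shelf solver, just to make the point
    # that this corner of applied math is well served outside "ML" toolkits.
    from scipy.optimize import linprog

    # maximize 3x + 2y  subject to  x + y <= 4,  x <= 2,  x, y >= 0
    res = linprog(c=[-3, -2],                 # linprog minimizes, so negate
                  A_ub=[[1, 1], [1, 0]],
                  b_ub=[4, 2],
                  bounds=[(0, None), (0, None)])
    print(res.x, -res.fun)                    # optimum at x=2, y=2, value 10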
And a lot more can be done in applied probability and stochastic processes.
IMHO it would give a better picture of progress and of the current state to discuss both the applications and the solution techniques in more detail.
E.g., the crucial core of my startup is some applied math I derived (with theorems and proofs) based mostly on some advanced pure math prerequisites. I wrote the corresponding software. Then from 50,000 feet up or from the point of view of a user of the work, what I did can look like a machine that learns a lot very quickly, is very smart, and puts out really intelligent stuff. Still, my math is not covered by any of the machine learning I've heard of.
Point: There's a LOT of applied math that can be done and, really, has been done far from current descriptions of machine learning.
I have the hard technical work done, e.g., the crucial core applied math and the corresponding code apparently ready for production, but I have to do some routine work, e.g., add more data, pick a company name and get trademark protection, get a static IP address, pick and register a domain name, get a tax ID, find out what the ad networks want so that I can run and get paid for ads, tell my county that I'm "doing business as ...", maybe set up an LLC, get a business checking account, get an e-mail address just for the business and another one for Web site feedback, do some more testing internally, tweak some of the software, tweak my code for my Web site session state server to get a better Web site log server, kick back and give a critical appraisal of the effort and make some tweaks, announce an alpha test here on HN and elsewhere, then a beta test, then get some publicity by some of the usual ways, then maybe have a business.
The startup is supposed to be the first good solution for a problem pressing for nearly every user on the Internet around the world, smartphone to ... workstation.
The potential of the business as currently envisioned would be, on average, about three sessions per week, 30 minutes of eyeball time per session, for over 50% of the users of the Internet in the world. The site will be able to do some relatively good ad targeting while having some of the best protection of user privacy, e.g., no use of cookies, logins, Web browser user agent strings, or third party tracking. So, make some assumptions about ad rates, multiply, and get an estimate of a good business.
The problem: Given an interest of a person, typically a narrow interest, maybe a short term, recent, or new interest, that interest treated as unique in all the world, find the Internet content with the meaning that person wants for their interest.
So, part of the project is addressing the meaning of Internet content.
The content might be in any of the common data types of text, still images, videos, music, Web cams, podcast audio, etc. The content might be in Web pages, PDF files, YouTube videos, Instagram images, art gallery images, etc.
The interests might be narrow topics in crafts, politics, skills, academic subjects, art, social, interior decorating, travel, intersections of those, etc.
So, really an interest can be essentially anything, and the content can be essentially anything on the Internet. And again the main criterion is meaning.
To the users, the Web site is just how to find content with the meaning they want for each of their interests.
So, the site is a new form of engine for search, discovery, recommendation, custom curation, etc. To heck with these categories: The site is for the users to find the content with the meaning they want.
Well, characterizing meaning with just keywords and phrases usually ranges from difficult to impossible in practice. So, my work makes no use of keywords or phrases. So, really, my site is not direct competition for Google, Bing, etc.
If a user (A) knows what content they want, (B) knows that the content exists, and (C) has keywords and/or phrases that accurately characterize that content, then there is a good chance that Google, Bing, etc. will do well for them, and my work will rarely do better.
But for nearly all people, interests, meaning, and Internet content, (A)-(C) is asking way too much. E.g., (A)-(C) often works poorly for meaning, even when the content is based on just simple text. For meaning of audio, video, still images, etc., (A)-(C) and keywords/phrases are still less effective. E.g., tough to use keywords/phrases to characterize accurately the meaning of most art.
So, that's my startup.
Again, the code appears to be ready for production, say, to a few dozen new users per second. On average, each user will see, in about 30 minutes, a few dozen ads. Then multiply out and get an estimate of a significant business. The way the internals work, a lot of scaling will be possible just from simple sharding.
The key to the work, the enabling crucial core, is some original applied math I derived (theorems and proofs) based on some advanced math prerequisites I got before, during, and after my applied math Ph.D. From all I can see, my applied math has nothing in common with current computer science, data science, machine learning, or artificial intelligence -- I certainly have intended no such connections. But the users will have no sense of anything mathematical.
Initially, I'm borrowing some from Paul Graham's advice:
"5. Better to make a few users love you than a lot ambivalent."
So, initially my site will be focused on some users and on some of their more likely interests and will not really be equally good for all interests of all users, i.e., will not be comprehensive.
If then the site is successful, it will grow. In an important sense, the site will grow automatically, organically, to please the users.
> 1) Machine learning may well deliver better results for questions you're already asking about data you already have, simply as an analytic or optimization technique. For example, our portfolio company Instacart built a system to optimize the routing of its personal shoppers through grocery stores that delivered a 50% improvement (this was built by just three engineers, using Google's open-source tools Keras and Tensorflow).
> 2) Machine learning lets you ask new questions of the data you already have. For example, a lawyer doing discovery might search for 'angry’ emails, or 'anxious’ or anomalous threads or clusters of documents, as well as doing keyword searches.
> 3) Third, machine learning opens up new data types to analysis - computers could not really read audio, images or video before and now, increasingly, that will be possible.
Are there any online resources to learn and implement step-by-step some ML to solve one of these use cases (or other ones)? Like a GitHub repo with data samples, a programming environment and then a step-by-step guide to solve trivial but real business problems?
Tensorflow sprays a lot of calculus. The idea is that all your known quantities ("features") become terms in an n-dimensional polynomial. The act of "training the model" is finding minima by traversing the negative gradient (the vector of partial derivatives evaluated at the current point).
I'm glad I had calc 3, even if that was only 3-dimensional.
Technically, it's not a polynomial as you can have nonlinear activation functions, such as the ReLU function. There's no polynomial that's equal to a network with ReLU activations (although, of course, a sufficiently large polynomial could come arbitrarily close).
I would state that a neural network is a large, complicated, differentiable function, and the beauty of deep learning is that it turns out that by doing optimization that's derived from basic calculus, you can optimize this complicated function to do surprisingly useful things.
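That "optimization derived from basic calculus" loop is short enough to write out. A minimal sketch, fitting a single weight to toy data with hand-rolled gradient descent (TensorFlow's autodiff supplies the derivative; everything else is toy):

    # Minimal "traverse the negative gradient": fit y = 3x with one weight.
    import tensorflow as tf

    x = tf.constant([1.0, 2.0, 3.0, 4.0])
    y = 3.0 * x                               # the target relationship
    w = tf.Variable(0.0)

    for _ in range(200):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean((w * x - y) ** 2)
        grad = tape.gradient(loss, w)
        w.assign_sub(0.1 * grad)              # step against the gradient

    print(float(w))                           # close to 3.0

A real network just swaps in millions of weights and a much more complicated (but still differentiable) function.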
You might find http://www.fast.ai/ useful. Depending on your learning style, their courses can either be amazing or somewhat annoying. Their library includes jupyter notebooks so that you can work through the examples.
Hey, I'm involved with a startup and we're aiming to do just that -- provide a zero-to-job-ready experience in machine learning. We're more of a paid, integrated experience than a repo with notebooks, though.
Wow, good write-up. I just made it a few paragraphs in but had to comment. Especially after just reading through the IBM Watson 'AI' bullshizz, this is refreshing and reminds me that there are folks who are trying to think about this stuff in new ways rather than just hyping/selling the heck out of 'AI' so your new BigCo can burn through tens/hundreds of millions on some shiny project =)
That's the point - to explain how people in the field are thinking about this stuff. Most people outside SV don't know what the experts in the field are saying. Indeed, many people in SV don't know this ;) If you already know all of this, you're not the target audience.
That's fair enough. It's a running joke with applied ML people that half of your job, if you have any sense of professionalism, is convincing possible clients that no, they probably don't need to use a 41 layer DNN to send a follow up email.
There's an XKCD [0] that hasn't aged well at all about how non-programmers fail to understand that some things that seem easy are actually really hard for a computer.
Modern ML research moved a big chunk of these tasks into the "easy for a computer" category, which is very exciting for programmers. However, as the comic points out, most of these things are stuff that normal people sort of felt a computer could do already.
I think this is the reason for some of the backlash/disappointment.
This article is more realistic than most ML posts, but it’s clear the author is not a practitioner.
> More handwriting data will make a hand-writing recognizer better, and more gas turbine data will also make a system that predicts failures in gas turbines better, but the one doesn't help with the other. Data isn’t fungible.
Transfer learning is one of the most interesting areas of machine learning. The focus is on taking learnings from one task and applying them directly to another. More directly, Jeff Dean from Google gave a fascinating talk about using these techniques to create a single super-model that combines learnings from thousands of tasks to accomplish new things quickly. [1]
> This article is more realistic than most ML posts, but it’s clear the author is not a practitioner.
Not so fast.
Genuine question: how close is the research on 'transfer learning' to something that can be readily used to solve business problems today?
If you can't fire up tensorflow and the like and use it to solve a real problem, or if only the likes of Google are able to successfully apply it, then the author would be correct.
It's not that the author is incorrect; it's just that the argument he used was a straw man, i.e. that they are unrelated domains. However, the metrics used to measure each model's performance on its respective problem domain may be useful for research. For example, we know that deep neural nets are currently state of the art for image recognition, so if our problem involves a similar image recognition task, we might be wise to start off with a neural net. We don't have to start from scratch; we can use "transfer learning" to get some base weights (bottleneck features) going and refine our model from there.
The point is, there is no panacea, no "ultimate algorithm" for any and every problem, and yet the author demands this of machine learning in that section of his writing.
You _can_ fire up TF to solve real problems without being Google.
Transfer learning is _the_ way to do image classification for most kinds of images in 2018, and is covered heavily in most classes. In the fast.ai class, you use transfer learning in the very first lesson to build a dog/cat classifier. Takes less than an hour to get to 97+% accuracy with no prior knowledge of deep learning.
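If you want to see the moving parts outside the fast.ai wrapper, the core pattern is a few lines of Keras. The dataset path, class count, and hyperparameters below are placeholders, and input preprocessing is omitted for brevity:

    # Bare-bones transfer learning: reuse ImageNet weights, freeze them, and
    # train a small head on your own classes. Paths/classes are placeholders.
    import tensorflow as tf

    base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                          pooling="avg")
    base.trainable = False                    # keep the pretrained features fixed

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(2, activation="softmax"),   # e.g. dog vs cat
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    train = tf.keras.utils.image_dataset_from_directory(
        "data/train", image_size=(224, 224))              # placeholder path
    model.fit(train, epochs=3)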
It sounds like you’re saying that transfer learning is helpful for image classification, which seems like an uncontentious position.
Are you really arguing that you think transfer learning would be useful from handwriting models to turbine failure models?
Using techniques that are successful with image classification as an example and generalizing to other domains that don’t look much like imaging seems like a stretch to me.
But perhaps I’ve missed some more convincing examples of the state of the art in transfer learning.
That's a good point; as far as I know there are no examples of cross-domain learning. There's new work in NLP on cross-task transfer learning, but that's as close as it gets at the moment.
It's hard to imagine there's anything to learn from handwriting images that could apply to turbine failure; that's a much broader kind of multi-task model than anything we'll see for a while.
The argument is still false. You can very well get an advantage from vast amounts of data in similar domains, and more importantly, you can get ML insights that aren't possible without it. What if ImageNet had not been open to the public? Would we have gotten the AlexNet breakthrough?
But, the transfer takes place on a network that has already been trained with lots of dogs and cats and has been taught to differentiate different kinds of dogs from lots of other objects and different kinds of cats from other objects.
Getting a useful dog/cat classifier out of something that has been trained to differentiate between different kinds of boats instead of different kinds of mammals would be closer to what the OP aimed at.
I agree that using an ImageNet-trained model to classify a new set of subclasses should be easy, and is. Subsequent lessons show how to adapt the same approach to distinguishing dog breeds (more specific), and for identifying types of terrain in satellite images, which bear much less resemblance to anything in ImageNet.
That last one sounds pretty similar to your second sentence. Given what we know about transfer learning and CNNs, if we had a massive boat dataset, I bet it could be repurposed to do pretty well at cat/dog.
Nothing we know how to do can transfer knowledge from handwriting data to improve gas turbine failure detection, or vice versa. ImageNet fine-tuning isn't at all relevant here.
This article is pretty great and I agree with the framework it sets up around automation and future jobs.
One thing I felt was dismissed too easily was the "Google has all the data" line. Sure, nobody said "Oracle has all the database", but people ARE saying the former. Why should we ignore the piles of collected data when thinking about how the landscape will look in the future?
There are probably plenty of problems that do not require that much data to solve (e.g. in reinforcement learning). And Google doesn't have all the kinds of data - they didn't own medical imaging data, they had to ask for it. There are some problems where they have an advantage (speech/translation), and they've probably already solved those, but their data advantage is not as important as they make it out to be.
What's more worrying IMO is that they're hoarding all the researchers.
Sophisticated machine vision has been around a heck of a lot longer than the author suggests. It's only because the technology has become so accessible that we now argue about whose role it is to apply it (and ML in general) in a responsible or sensible way.
> In a 60 minute meeting, we spent 5 minutes discussing how ML worked and 55 minutes going in circles. They had lots of data, but no problems to solve. And there are only so many ways to explain that a large dataset is not equivalent to a business problem.
I'll be sending this on to them.