Hacker News
DeepText: Facebook's text understanding engine (facebook.com)
407 points by adwmayer on June 1, 2016 | 136 comments



Slightly contrarian, or maybe not, but is anyone else sort of fatigued with social media trying its hardest to do it all for you? Algorithmic timelines telling me what happened while I was gone (trust me, Twitter, I would have gotten to it eventually), prioritizing what shows up (trust me, Facebook, I do a very good job hiding content on my own), telling me what I ought to buy (trust me, Amazon, if I wanted it I'd either have bought it already, or I'll buy it later when I have the disposable funds), and now arranging transportation when I hint at going out and about (trust me, Facebook Messenger, I live seventeen steps from a bus stop that carries me right into the heart of downtown; I'm good).

I have no doubt that these features are probably loved by some, maybe even most. But more and more, while I don't want to disconnect FULLY from social media (and while I much love the laconic, brevity-first design of Twitter, the fact that the 140-character limit is going away has me sighing heavily), I do sometimes find myself wishing I could opt out and take a bit more control over the content I'm ostensibly subscribing to.

Has anyone else felt similarly, or could maybe phrase the phenomenon better than I have?


When all you have is a machine learning classifier, everything looks like a classification problem.


Heh yeah I wrote about this a while back -

http://techcrunch.com/2013/01/27/facebooks-categorial-impera...

But you add human intelligence to this kind of thing and it starts getting creepily accurate much of the time if you're looking to promote certain behaviors, expose others, or what have you.


On yesterday's episode of Person of Interest, a character said that the Machine (an AI) has to contain a simulation of a person in order to understand what that person will do.


That idea is beautifully explored in this episode of Black Mirror: https://en.wikipedia.org/wiki/White_Christmas_%28Black_Mirro...


Fuuuuuuuuck that amazing show. An ounce of compassion in any of those situations and the outcome would be rather completely different...

Edit: Also, goddamn dickheads wasting human upload/duplication....


At the highest level this is true. Any prediction must be based on some model, and that model can be considered a simulation of the subject.


I think AlphaGo had a pretty good simulation of its human opponent, even though strategy and intuition are hard.


AlphaGo didn't simulate its opponent, it used the entire recorded history of Go games to extrapolate what is the best move to play in a particular situation (falling back to Monte-Carlo simulations of possible games after reducing the problem).


Game AI uses minimax decisions, so it does simulate the opponent to pick _their_ best move, then it picks the move that removes their best options. That's why it's so unsportsmanlike when losing and why it doesn't go for points when winning.

It doesn't have a specific profile of the opponent, though, instead it thinks they're another AlphaGo. That's an area they could work on.


AG doesn't actually use minimax, it uses Monte Carlo tree search, which doesn't even try to simulate a good opponent; it just plays lots of games very fast.


Actually, the cool thing about AG is that it doesn't just use MCTS, but that it adds in deep learning and reinforcement learning :-) I was surprised by how readable their paper was!

Here's a link: http://www.nature.com/nature/journal/v529/n7587/full/nature1...

I think if you search on the title, you can probably find a non-paywalled version.


Yup! Their additions to the algorithm are actually super simple, which was really surprising to me; it's just very computationally intensive on the front end.


It's even more than that, of course... the network has to poke its fingers in somewhere; I'd be guessing it prunes those trees. So it is a bit more than playing lots of games really fast.


AG does use minimax in the same sense that, for example, stockfish does (bounded depth and with pruning).


But that isn't minimax; it's pruning by some evaluation function, where in this case the evaluation function is a black-box deep neural net trained on many games.


In that case, stockfish doesn't use minimax either, right? Because it does pruning as well.


No, stockfish uses minimax with pruning.

Minimax and MCTS are just different algorithms.

MCTS simulates games randomly and creates a distribution of expected value for each one based on some cool math. It estimates the value of each state based on a sample. Minimax actually calculates them all (or all of them except for some pruned ones).

That's the whole reason MCTS can work on Go and minimax doesn't. Minimax actually needs to look at all possible branches all the way down. MCTS can just say "well, it's been 100ms, let's use the estimates I have".


> No, stockfish uses minimax with pruning... Minimax actually needs to look at all possible branches all the way down.

No, I assure you that stockfish definitely does not look "all the way down (ie to states where the game is over)" in most positions, even with pruning - that would take too long. It uses a static evaluation function to evaluate positions without searching down the tree further.

> MCTS simulates games randomly and creates a distribution of expected value for each one based on some cool math.

You described the "Monte Carlo" part of MCTS but not the "Tree Search" part. It's right in the name! See for example the first paragraph on page 20 of https://gogameguru.com/i/2016/03/deepmind-mastering-go.pdf, which clearly describes AG constructing a search tree.

Minimax obviously will produce perfect play in both chess and Go, but we can't use it because it takes too long. Hence we prune and truncate the search tree. When we truncate it we use an approximation of the value of the node (the static evaluation function); when we prune we sometimes use the static evaluation function as well. This is true in both MCTS and in Stockfish.
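The truncate-and-evaluate scheme described here can be sketched in a few lines. This is a toy number game with a made-up `evaluate` and move generator, not Stockfish's or AG's actual machinery; it only illustrates replacing a full search with a static approximation at the depth limit.

```python
# Toy sketch of depth-limited minimax with a static evaluation
# function. The "game" here is invented: a state is just a number,
# and `evaluate` is a stand-in for an engine's hand-tuned or
# learned evaluation function.

def evaluate(state):
    # Static heuristic: higher is better for the maximizing player.
    return state

def children(state):
    # Hypothetical move generator for the toy game.
    return [state + 1, state - 1, state * 2]

def minimax(state, depth, maximizing=True):
    if depth == 0:
        # Truncate the search: approximate this node's value
        # instead of searching to the end of the game.
        return evaluate(state)
    values = [minimax(c, depth - 1, not maximizing) for c in children(state)]
    return max(values) if maximizing else min(values)
```

The same skeleton fits both a hand-written heuristic and a neural-net evaluator; only `evaluate` changes.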


>No, I assure you that stockfish definitely does not look "all the way down (ie to states where the game is over)" in most positions, even with pruning - that would take too long. It uses a static evaluation function to evaluate positions without searching down the tree further

Ehh alright, it calculates the value of a node exactly (or exactly assuming its evaluation function). MCTS does not.

To be clear, what I meant (and what you pulled out of context) is that minimax requires an exact calculation of the value of each child before choosing the best one and returning it; this requires either simulating every possible game, or an evaluation function (depth pruning). MCTS uses Monte Carlo methods to estimate the best paths instead of trying all of them naively. At t=infinity, MCTS is equivalent to minimax, because it will have tried all nodes. But it runs better in time-constrained systems because it picks a much smarter order to run in.

>You described the "Monte Carlo" part of MCTS but not the "Tree Search" part. It's right in the name! See for example the first paragraph on page 20 of https://gogameguru.com/i/2016/03/deepmind-mastering-go.pdf, which clearly describes AG constructing a search tree.

Yes. That has nothing to do with minimax, though.

Which is what I said. AlphaGo uses pruning. AlphaGo doesn't use minimax with or without pruning, it uses MCTS (with pruning).
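For contrast with plain minimax, a minimal MCTS/UCT loop might look like the sketch below. The game is an invented one-player toy (each move adds 1 or 2; the reward of a playout is the final state), nothing like AlphaGo's two-player setup or its learned networks; it only shows selection by UCT score, expansion, random rollouts, and backpropagation.

```python
import math
import random

MOVES = (1, 2)  # toy action set

class Node:
    def __init__(self, state):
        self.state = state
        self.visits = 0
        self.value = 0.0      # sum of playout rewards seen through this node
        self.children = {}

def rollout(state, steps):
    # Random playout: estimate a state's value by sampling,
    # instead of exhaustively searching like minimax would.
    for _ in range(steps):
        state += random.choice(MOVES)
    return state

def mcts(root_state, iters=300, horizon=5, c=1.4):
    root = Node(root_state)
    for _ in range(iters):
        node, depth, path = root, horizon, [root]
        # Selection: descend by UCT score while children exist.
        while node.children and depth > 0:
            parent = node
            node = max(parent.children.values(), key=lambda ch:
                       ch.value / (ch.visits + 1e-9)
                       + c * math.sqrt(math.log(parent.visits + 1) / (ch.visits + 1e-9)))
            path.append(node)
            depth -= 1
        # Expansion: add children and step into one at random.
        if depth > 0 and not node.children:
            node.children = {m: Node(node.state + m) for m in MOVES}
            node = random.choice(list(node.children.values()))
            path.append(node)
            depth -= 1
        # Simulation + backpropagation.
        reward = rollout(node.state, depth)
        for n in path:
            n.visits += 1
            n.value += reward
    # Recommend the most-visited move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)
```

The anytime property the thread describes falls out directly: stop the loop whenever the clock runs out and return the current visit counts.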


And it becomes especially problematic when prediction becomes conflated with classification.


> is anyone else sort of fatigued with social media trying its hardest to do it all for you

Yes, because they're not doing it for us, but for themselves, investors and customers (advertisers) fueled by our data.


Could Facebook be doing so well because this is actually a win-win situation for everyone involved?

That includes the people you mentioned yes, but also users who get to use a valuable tool for free.

The whole "if it's free, you are the product" party line may resonate in HN circles, but you have to realize the public as a whole doesn't think it's a bad deal at all.


People in the general public don't really have a clear idea about what Facebook, Google, and Amazon are doing. Still, most people I know are vaguely aware that there's something creepy going on. They aren't really comfortable with Facebook, Inc owning their digital social life, but they rely on it because of the obvious network effect.

Try asking someone, "hey, did you know Facebook uses a lot of state-of-the-art technology to algorithmically understand the meaning of everything you post and write, feeding this massive data set into their huge artificial intelligence program?" I don't think it's likely that many people will say "cool, that sounds like a great deal, because I love it when I get very specifically targeted ads."


>People in the general public don't really have a clear idea about what Facebook, Google, and Amazon are doing.

But neither do you, really, right?

Sure, you may have a better idea than them of what's technically feasible, but the general HN consensus on the motives, values and internal decision-making process at these companies ("always assume the worst from them, they're optimizing for money!") is in my experience (of working at FB a few years ago) a simplistic view that's way, way off the mark.

>"hey, did you know Facebook uses a lot of state-of-the-art technology to algorithmically understand the meaning of everything you post and write, feeding this massive data set into their huge artificial intelligence program?" I don't think it's likely that many people will say "cool, that sounds like a great deal, because I love it when I get very specifically targeted ads."

I think this is again where nerd-bias is misleading you: 1) Many people would answer your question with "Yeah? Is this the thing that enables translations and picture captions for the blind, and suggests events I might like? Please, bring it on!" 2) People actually do prefer very specifically targeted ads to the annoying, irrelevant pop-up banners of the past. This is why FB is making a killing.


True, neither do I... because what they do with content is secret.

I don't think it's nerd-bias... it's more that many of my friends and acquaintances who aren't nerds are still instinctively skeptical of large corporations and the power structures they create with respect to privacy.

As for what people in general think, we really should ask around. I heard about a consumer trust survey where Facebook's trust was very low, and based on how I see people relate to Facebook, I'm not surprised at all.

I think we ought to be suspicious and critical of corporations that collect large amounts of private and personal data, just as we ought to be so of government surveillance (of course they're inseparable).


Or maybe Facebook is still doing fine because there is no good enough alternative. It seems like Facebook is getting worse all the time, but since everybody's there and it's familiar, people still use it. Indicators are that they are losing young people, who tend to adapt faster and have not yet built a habit of use, and that more and more people are using other services for particular activities.

Still, there is nothing that could totally replace Facebook, so old users are staying and baiting in new users. It still has a unique offer for a large user base and very useful features, so even if it's getting worse it can stand. People might be clicking through their feed more, but that's because they are so bored with the content that they try to see if they can find something more interesting behind the links. Or they are scrolling down trying to find something interesting, something different, which is hard since the algorithms just push the same boring content over and over again. But the statistics look good.

What's the magic of Facebook? If you could answer that and build the service, people would flock away from Facebook.


The problem with your argument is that you admit "the statistics look good" but then present a bunch of unsubstantiated personal opinions ("getting worse all the time", "boring") that are just not supported by the available data.


The magic of Facebook is the billion+ users on it. Do those people use Facebook because it offers some unique service which no one else can or does, or because it went viral and everyone jumped on the bandwagon?


If those features had a negative impact on their bottom line, they would never offer them for free to everyone. So yes, they're doing it for themselves, and it just so happens that some people find them useful.


This is the real problem. Zuck and co are a lot of things. Words like benevolent, disinterested, and unbiased aren't on that list of things.


Know those little emails that Facebook sends you throughout the day? I've turned most of mine off, but I still like to know when people mention me, or get a page update.

Well, I figured out today that when Facebook has these, it doesn't necessarily send them to you right away. Instead it waits until it looks like you've bailed off the site to do something else. Then you get an email notification -- drawing you back into Facebook.

Yep. Pretty tired of it. Thousands of really smart people using the latest in tech to try to play me like a musical instrument.


I don't use facebook, but couldn't that also be an optimization? Like don't send the user email notifications if they're already on the site where they can see the # of notifications?


But that's not what they're doing. I'll get mentioned in a comment while I'm on the site. I'll click over and read the comment and reply to it.

Then, once I'm away from the computer for 30-60 seconds or so (I'm sure they have a system trained for this), I get the email notification -- even if the original back-and-forth occurred an hour ago. The timing of when I get the email has nothing at all to do with when the event occurred. It's completely dependent on my interaction with Facebook. Many times it takes so long that I'm thinking "Wow! I wonder what else X had to say?" but I click over and it's the same damned thing I already interacted with.


This sounds like a bug / coincidence.

I've never got a later email about something that happened live while I was browsing the site, and was already notified about.


A couple of times is a coincidence. I stopped counting after seven.

So you're saying that there's a bug. And the bug waits for me to stop using Facebook, then emails me things from Facebook.

I find this quite difficult to believe. At the least, it's a very convenient bug.


They might be doing this to only a subset of their users, for example as part of an A/B test.


The bug isn't that you get emailed about things only after you've left, that's the feature.

The bug would be to get emailed about things that happened while you were using the site and were notified in real time about already.


From a revenue perspective, the behavior that the GP is describing IS an optimization.


What scares me about this is the "bubble" it creates, see [1]

[1] https://en.wikipedia.org/wiki/Filter_bubble

(even mentions Facebook's news-stream)


In my opinion the bigger problem is when people are not aware that they're in a bubble. Bubbles existed even before social media and internet searches. E.g., when I was a kid/teen, I got most of my information about the world from one local newspaper and one evening news show on TV. Yes, their "newsfeed" wasn't generated by algorithms, but still, it was someone deciding for me what was relevant and what wasn't. Nowadays I have access to many more sources and different points of view, as long as I am aware that they exist and know it's up to me to look them up. It's a similar thing if you're part of a strongly opinionated subculture. It is hard to even realize how limited your horizon might be.

So I try to embrace the bubble when it is useful for me. I appreciate that when I type "something something python" into Google, the top results are relevant to my work and not about snakes or comedians. But when I want to form a political opinion, I use DuckDuckGo and actively look for contrary positions.


> So I try to embrace the bubble when it is useful for me. I appreciate that when I type "something something python" into Google, the top results are relevant to my work and not about snakes or comedians. But when I want to form a political opinion, I use DuckDuckGo and actively look for contrary positions.

I have a hard time feeling like it's helping me when I get filter-bubbled; I feel like I'm more productive and on point when I know where I am in relation to the rest of the world. The feeling when you learn what to search to get the desired specialized results instead of other general usages, by adding keywords or going to sites that are more specialized themselves (e.g. Stack Overflow, where you can search [r] to find posts tagged with "r" in particular), is wonderful. You opt in to a bubble of a particular sort, versus not knowing what else is out there. I tend to feel a bit... disgusted with Google deciding what's good for me during a given search.


Common man is brought up within the bubble of their parent's beliefs. Intelligent man breaks out of that bubble, once he find it too constraining. Unintelligent man does not. Unintelligent man also have trouble identifying bubbles.

Earth provide opportunities for success for both men. In the long run, one of them will dominate the other.


Darwin speak better. Darwin know grammar.


This is a slight tangent, but I don't even like the fact that what I see when I type something into Google may not be what others see. I think the inconvenience of having to type "order pizza in $city" rather than just "order pizza" is more than outweighed by not kind of destroying "consensus reality" as far as it pertains to search results, not to mention the potential for really creepy things down the line. It's not something that bothers me terribly by itself, it may not be important yet, I just don't like the precedent and idea of it.

And Facebook just can't seem to remember that I never am interested in whatever it considers "top stories", but always, every time I log on, switch it to "most recent". Not that I know that I actually get all the most recent stuff, but on top of that Facebook is clearly telling me it would prefer me to see its own curation of it, even though I obviously have no interest in that. That this option doesn't stick seems like the intended behaviour, and I consider that yet another hostile aspect of it. So I'll assume whatever they're now doing instead of a UI that deserves the title, it will understand text just fine when it suits advertisers, and will be useless otherwise. Not a claim, but a guess.


I read a paper on this recently in class, and I think their findings were that most Google results are the same, other than geographically specific ones like you mentioned. For instance, if we both type "trump", most of the results will be the same unless there is a Trump Tower near you, in which case it'll direct you to it. The ordering of results, however, can change.


Except you're diminishing the importance of search results ordering. Some research has shown that ~50% of clicks go to the top two search results and ~90% of clicks go to the top ten search results.

Further reading:

https://aeon.co/essays/how-the-internet-flips-elections-and-...



This seems like reasonable behaviour when it comes to providing search results in the language of your choice, however.


Filter bubble might be addressable using ideas related to "information quotient": If FB decides it's worth trying to "show you the content that will most effectively challenge your views", then we pop the filter bubble, promote good content, etc etc.

Note - I don't just mean, "content that disagrees with you", because 95% of that is shit (Sturgeon's Law), but, content that effectively disagrees with you. That actually has a chance of changing your mind.

Information Quotient, as I understand it, is the idea of "What information will teach me the most?"


There's also a YouTube video on this - https://www.youtube.com/watch?v=B8ofWFx525s


Nobody wants these features, but Facebook doesn't make them for the users. They make them for the advertisers / suppliers to increase sales.


> I have no doubt that these features are probably loved by some, maybe even most, but more and more

Count me in the camp that loves them.

I don't want to spend hours a day keeping up with social activity. Facebook does a great job of giving me a quick summary of everything going on in my friends' lives by sorting through all the drivel algorithmically.


In my experience, Facebook primarily shows me boring memes (probably because they get liked/shared a lot) and often buries generally interesting news (births, etc.).

By the way, I believe that some filtering might be necessary. It seems that when I joined Facebook there were far fewer memes and 'folklore' posts (we have a nice word for that in Dutch: 'tegeltjeswijsheid', roughly 'tile wisdom'). But that would be a user-trained filter, which is not interesting to Facebook.

So, I agree with both camps: we need filtering, but Facebook's filtering is rarely good, because it's not aligned with my interests.


> Facebook does a great job of giving me a quick summary of everything going on in my friends' lives

The only way you can actually know that is to compare the summary with the raw data. The only way to remain sure of it is to keep doing that.


The "raw data" in this case is my friends' lives. Based on my (off-Facebook) interactions with them I remain confident that Facebook does a good job of keeping me up to date on their lives when I use it.


I can tell you first-hand that it does an awful job with my collection of Facebook friends. It doesn't prioritise the people I care about most or even the things nearest to me... and more than once it has absolutely polluted my feed with someone to the point that I considered phoning them to explain why I was about to unfriend them.

I'm not saying your experience is untrue... just that Facebook hasn't done a great job of this for everyone.


"...summary of everything going on in my friends' lives by sorting through all the drivel algorithmically."

Sounds like you really care a lot [0].

[0] https://www.youtube.com/watch?v=d1g9PFtSCKw


Well, do you filter your own spam too?

The problem to me comes only when the big corporate AI systems no longer are fully on your side, like spam filters which let the "right" spam through.


You aren't alone - I don't take well to unsolicited advice, unless it's about avoiding traffic :)


More often than not, Google Maps (I assume you're referring to) does about as good a guesstimate as any human in the area would.

I'm starting to feel like some good old brains would serve us well sometimes. Take Google Maps' traffic again: it can tell me the traffic now, and it can tell me the expected traffic tomorrow. But it doesn't understand that if I depart at 8am and arrive at 10am, the traffic jam I hit at 9am may be worse than it was when I departed. I'm better at planning this route myself for that reason. Kudos to the people who built the model (almost certainly using some sophisticated algorithms) that automatically detects traffic when there's enough data, probably correlating it with their map data and such, but this is an elementary feature that apparently nobody there thought to implement. It's not all just training sets and enough CPU power; some human intelligence to give the system the concept of time would be smart, too.
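The time-aware estimate being asked for here is cheap to sketch: look up each road segment's congestion at the hour you're predicted to reach it, not at departure time. The segment times below are invented for illustration.

```python
# Toy time-dependent ETA: traversal time of each segment depends on
# the hour-of-day at which you actually reach it. Segment timings
# are made up: segment 1 jams 8-10am, segment 2 jams 9-11am.

SEGMENT_TIME = [
    lambda h: 600 if 8 <= h < 10 else 300,   # segment 1: morning rush
    lambda h: 1200 if 9 <= h < 11 else 400,  # segment 2: jam peaks later
    lambda h: 500,                            # segment 3: always the same
]

def eta_seconds(depart_hour: float) -> float:
    t = depart_hour * 3600
    for seg in SEGMENT_TIME:
        hour = (t / 3600) % 24        # time at which we reach this segment
        t += seg(hour)
    return t - depart_hour * 3600
```

Departing at 8:54am, a naive lookup at departure time would miss that segment 2's jam has started by the time you reach it; the time-shifted estimate catches it.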

Now this might seem like one silly example, but in many cases it's like this. People promote machine learning and neural nets and all sorts of automated training in situations where some research and an if statement or five would have worked as well.

Another simple example: we had a security monitoring class, and one of the topics was machine learning. Let your intrusion detection system learn normal traffic patterns and alert on, say, traffic spikes on Sunday nights when there should be few if any people working. I feel like looking at the traffic data and writing some if statements would be more useful, especially since you can then integrate it with holidays and events and whatnot. Indeed, one of the warnings in the course was that "during events this will go off as well". Yeah, that's why we shouldn't use it here in the first place. This is just a hammer-and-screw issue.
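The "if statement or five" alternative described above might look like the following. All thresholds, the holiday list, and the event calendar are made up for illustration.

```python
from datetime import datetime

# Hypothetical rule-based traffic alert. Thresholds and the
# holiday/event calendars are invented; the point is that the
# rules are legible and can consult a calendar directly.

HOLIDAYS = {(1, 1), (12, 25)}   # (month, day) pairs
EVENT_DAYS = set()              # maintained by the ops team

def is_quiet_period(ts: datetime) -> bool:
    if (ts.month, ts.day) in HOLIDAYS or (ts.month, ts.day) in EVENT_DAYS:
        return False            # expected to be unusual; don't apply quiet rule
    # Weekends and nights are expected to be quiet.
    return ts.weekday() >= 5 or ts.hour < 6 or ts.hour >= 22

def should_alert(ts: datetime, requests_per_min: int,
                 quiet_threshold: int = 50, busy_threshold: int = 5000) -> bool:
    # Alert on an extreme spike any time, or a modest spike in a quiet period.
    if requests_per_min > busy_threshold:
        return True
    return is_quiet_period(ts) and requests_per_min > quiet_threshold
```

Unlike a learned baseline, the event-day exception is one line, and "why did it fire?" has a readable answer.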


The other thing that Google Maps could add, while we are talking about unsolicited advice and the human element of making traffic decisions, would be sports games. My commute at around 7PM is usually clear, unless there is a sports game, in which case it is a nightmare for a very specific period of time before the game. Further, a big team generates a sold-out game, which generates more traffic than a mid-week game against an out-of-division team. It would be fun to program, although rather complex.


> People promote machine learning and neural nets and all sorts of automated training in situations where some research and an if statement or five would have worked as well.

As a Machine Learning Engineer / Data Scientist I would always evaluate my model against a baseline (in this case the if statements) and only put the model in production if it outperforms the baseline.
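That discipline is only a few lines. Everything here is a placeholder: `model` stands in for any trained classifier, `baseline` for the hand-written rules, and the data and metric are toy examples.

```python
# Sketch of model-vs-baseline evaluation as described above.
# `predict` is any callable mapping an input to a label; examples
# are (input, label) pairs held out from training.

def accuracy(predict, examples):
    return sum(predict(x) == y for x, y in examples) / len(examples)

def pick_production_model(model, baseline, holdout):
    # Ship the learned model only if it beats the simple rules
    # on held-out data; otherwise keep the baseline.
    if accuracy(model, holdout) > accuracy(baseline, holdout):
        return model
    return baseline
```

In practice the metric would match the business problem (precision/recall, cost-weighted error, etc.), but the gate is the same shape.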


> algorithmic timelines telling me what happened while I was gone (trust me, Twitter, I would have gotten to it eventually), prioritizing what shows up (trust me, Facebook, I do a very good job hiding content on my own)

Absolutely! At least give me an option (even if it is buried nineteen pages beneath about:config) to switch to a simplified reverse-chronological stream. Until this happens, I've unfollowed everyone on Facebook. If you didn't send something specifically to me, then no, I definitely didn't see it. (:


Recommendations, algorithmic timelines and so on are indeed a bit of a nuisance. Similarly, Google Now is annoying me a lot at the moment but I can't use my watch fully without turning it on.


I find Google Now to be far more useful than e.g. Facebook "recommendations", which are basically ads.


Same, but google now for me is basically a mash of:

1. The weather

2. Stocks that have the same acronyms as things I've searched for

3. Flight information for any flight with the same code as mine

4. A recommendation of a film I might like to see because I searched for actors in them (which might be good except the reason I was looking up those actors was that I had just watched that film)

5. Sports scores for teams I don't follow

6. Possibly updates to sites I visit regularly (equivalent to a highly unreliable RSS client)


> 6. Possibly updates to sites I visit regularly (equivalent to a highly unreliable RSS client)

This is what annoys me the most in Google Now. It's a damn black box. I'm happy that they provide "updates to sites I visit regularly", but even more than those updates I'd like to see the list of those sites. Without that, and any kind of information about how reliable this reporting is, I have no way of trusting I won't miss an "important update". I just don't know, and Google Now does nothing to reassure me.


I don't really rely on them to show me important updates. I see the stories they provide me with as more of a friendly suggestion that they've found something I might like or might have missed. I don't get them very often and I really like the spontaneity of them appearing when I'm not normally looking for them.

The only time they annoyed me was when they picked up a false article about Breaking Bad being renewed. I shared the headline with several people before realizing it was a fake.

http://www.businessinsider.com/breaking-bad-season-6-hoax-20...


I may be assuming ill intent where there is none, but I feel like that's part of the plan of the entities (Google, FaceBook) that automate recommendations: get you to discard that nagging feeling like you're missing something of actual importance, and just get your info from the feed. If I were nefarious and had their real estate, I would certainly like to not be transparent so that I could manipulate things without people being able to tell or complain.


I've turned most of these off. The weather is useful, as is commute information, but site updates and things Google thinks I might like based on my searches are generally poor. There was a recent reminder not to forget to vote, and occasional "are you interested in such-and-such a popular event" cards, but these never seem to have an option to prevent recurrences.

Parking information would be useful if it were both accurate (it isn't, due to taking a park and ride bus) and consistent. Like many Google Now cards it seems difficult to predict whether it will turn up or not.

Another trick it used to do was show me cards with navigation information to somewhere I'd searched for, usually the night before a trip. But, when I got into my car at 6am the next day such information was nowhere to be seen and I'd have to search for it again. Now I save the location in the calendar so at least there's a link to click, causing Google Now to remind me to leave on time after I've already left, sometimes when I'm nearly there.

Overall, it's not a useful product and relies upon data being collected which I'd rather wasn't. The only reason I leave it on is so that I can dictate reminders to my watch, which I find a very useful feature. The requirement to use Google Now for this last feature seems rather arbitrary to me.


This pisses me off to no end. I disable something like web browsing history because I get a nagging feeling that Google knows more about me than it should.

Then on the way out of work I try an "OK Google. Directions home"[0] to get a time estimate or best route estimate and it tells me I have to turn on browser history to continue or some other such bs. I wonder when Google Now will understand "Ok google. F?!k you Google."

[0] correction: that's more likely "ok Google. OK Google! OK GOOGLE! Damnit (click google now, click microphone). Directions home."


This particular point was rather annoying, and involved working around it in this manner:

Me: "OK Google, navigate home."
Google: "You need to turn on web and app activity."
Me: "OK Google, navigate to $NUMBER $STREET, $TOWN."
Google: "No problem."

But, they then decided that web and app activity was essential in order to give me commute information, and the only way I found around this was to turn it back on and start doing almost all my searches via Duckduckgo.


I have done the same thing. My default search engine is now duckduckgo. Changing the default theme to blue/green (upper right corner) made a huge difference. I don't know why duckduckgo uses a black and gray color scheme that essentially does not distinguish between link/site/description. Makes the results appear very muddled when they are, in fact, usually comparable to google.


Google Now kept creepily insisting I was on my way home and kept calculating directions day after day. Even after disabling this directions-to-home feature, it still killed my phone's battery life. So, as omniscient as everyone likes to say Google is, they still had to have my phone do the ML cycles. Bah! Keep it all on your servers, Google! (While the protesters yell 'Take it off your servers, Google!') Heh. :-)


How do you know the battery drain was caused by CPU usage, as opposed to something like network?


I had checked the statistics on the Play Framework service that it ran under. There was abnormally high CPU usage, high battery usage, and it was running all the time.


Lots of apps use the Play Framework, not just Google Now.


I am aware. But consider that I literally disabled/enabled it with intent to discover if it was the problem. It was. Google Now is a huge drain. Googling it turns up thousands of other similar experiences unfortunately. [1]

[1] https://www.google.com/search?q=%22google+now%22+cards+causi...


You also risk exposing yourself to an echo chamber, as everything is automatically self-reinforcing. This is bad stuff -- socially manipulative dark patterns.


This is almost exactly what it feels like FB and much of the web (even some of HN!) has become/is becoming.


I wonder how well these decision systems stack against random suggestions? I notice it feels like I fixate on the few people or subjects that are regularly presented to me to the exclusion of everyone else.


Yup, this is what will cause me to abandon twitter.


Well that's a pretty disappointing post: "Hey, we use our deep learning tools on text. Look at this 12-month-old paper".

Everyone in the field has read that paper. It was good work! But there are lots of intriguing things mentioned in the post which deserve further details.

The most interesting thing to me is "more than 20 languages"!! That's pretty nice - the paper had some early results for Chinese, but if it can perform similarly to the English results across 19 other languages that is probably the state-of-the-art for many of them.


Yesterday I mentioned vodka to my wife in Facebook Messenger and a few minutes later she saw a vodka advertisement on her timeline. Obviously Messenger will never include end to end encryption.


An even better example. I searched for a product on Amazon.de on my laptop, and 1 hour later when browsing Imgur on my phone, I got an ad for the exact product.


That's a totally different ad context.

OP has shown that, and how, FB does in fact use text analysis of messages to show adverts. You have been a re-targeting target: they correlate your browsing habits and other factors. That has been going on for ages.


People probably said the same about Google. And while Allo doesn't do E2E by default, it does include it as an option.


Are you serious? They had to decrypt the text so that your wife could read it, and that is where they started displaying vodka ads.

edit - I meant to say even if they had end-to-end encryption then your text would still need to be decrypted so that other user can read it.


End-to-end encryption means the server doesn't know the content of the message. Only the clients, the end-points, have the decryption key.


And then the endpoints integrate with ad networks after they've distilled it into topics.

I mean, my message contents are personal data. The topics "buy a bike" and "sell a bike" are not (I mean, not by themselves), so those you can transmit to ad companies.

Now of course you need to make the detection lightweight enough that you don't need to talk to the server anymore, but that's just a matter of time. The person you're responding to, who got downvoted, has a point (if this was his point).

Edit: I sounded like I agreed with it, which is not the case. Even if it's stored "private and securely" at Facebook, they would still be able to connect my topics of interest with my identity and contacts, which would "technically" comply with the concept of end to end encryption (only metadata leaks because topics alone are metadata, which always happens because you need to route messages)... but which still doesn't comply with my own definition of proper privacy.
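To make the "lightweight on-device detection" idea concrete, here is a hypothetical sketch: a keyword-lookup topic detector that could run entirely on the client, so only coarse topic labels (never the message text) would leave the endpoint. The topics and keyword groups are made up for illustration; a real system would use a learned model.

```python
import re

# Hypothetical on-device topic detector. A topic fires only when every
# keyword group contributes at least one word from the message, so
# "sell" + "bike" triggers the selling topic but "sell" alone does not.
TOPIC_KEYWORDS = {
    "sell a bike": ({"sell", "selling"}, {"bike", "bicycle"}),
    "buy a bike": ({"buy", "buying", "looking"}, {"bike", "bicycle"}),
}

def detect_topics(message: str) -> set:
    """Return the coarse topic labels detected in a message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return {topic for topic, groups in TOPIC_KEYWORDS.items()
            if all(group & words for group in groups)}
```

Under this scheme, the server (or an ad network) would only ever see labels like "sell a bike", which is exactly the metadata-leak trade-off discussed above.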


I'm not so sure about that. If you were talking about subjects that might be disagreeable to people in your country, maybe it's sex toys or homosexuality, if those topics start coming up in ad networks then that's still going to be a problem.


Oh, shit, I made one fatal mistake writing that post: it now sounds like I agree with it.

This is just how marketing would sell these features to the world (end to end encryption, and yet targeted ads), but disclosing each topic I ever discuss with a given person is still private in my opinion.

I've edited my post's original text slightly, adding a bigger edit at the bottom.


Facebook Messenger doesn't have end-to-end encryption; messages are stored on Facebook's servers.



I wonder if this is in response to Google's open sourcing of Parsey McParseface: http://googleresearch.blogspot.com/2016/05/announcing-syntax...



Facebook already has Wit.ai which is extremely similar to Microsoft's LUIS, and in some ways more advanced.


If this post is a response to anything, it is most similar to Google's recently announced Allo, at least from a product perspective.

It's a little tough to figure out what Facebook's goal with the post was though since it's not very technical.


except this isn't open source.


> Understanding the various ways text is used on Facebook can help us improve people's experiences with our products

You mean show us more "relevant" ads? Last time I checked, Facebook's *product* was advertising. Thanks, this is exactly what I miss in my life - more ads for products and services I don't need.

So excited that Facebook is going to understand everything I'm talking about privately with my "friends" and sell my identity to more ad buyers, who'll design more subliminal ads to squeeze the last millisecond of what's left of my attention span.


Is this the paper they said they based this on: "Text Understanding from Scratch": http://arxiv.org/abs/1502.01710

Here is example code for text classification from character-level using convolutional networks in Torch 7 https://github.com/zhangxiangxiao/Crepe
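The core preprocessing trick in that paper can be sketched in a few lines: each character is one-hot encoded over a fixed alphabet into a fixed-length frame, and the resulting matrix is what the convolutional layers consume. The alphabet and the 1014-character frame length below approximate the paper's setup; treat the details as illustrative.

```python
# Sketch of the character quantization step from "Text Understanding
# from Scratch" (Zhang & LeCun). Characters outside the alphabet
# (and positions past the end of the text) stay all-zero.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+=<>()[]{}"
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}

def quantize(text: str, length: int = 1014) -> list:
    """One-hot encode `text` into a length x |ALPHABET| matrix."""
    frame = [[0] * len(ALPHABET) for _ in range(length)]
    for pos, ch in enumerate(text.lower()[:length]):
        idx = CHAR_INDEX.get(ch)
        if idx is not None:
            frame[pos][idx] = 1
    return frame
```

The appeal of this representation is that it needs no tokenizer or word vocabulary, which is plausibly part of why it transfers across many languages.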


So I mean this is great but are they open sourcing it?

Is there a way for us to play with it? Or are they just bragging?


I think FB has a strategy here that plays along well with their Messenger (and their emphasis on bots). Imagine the example they give, "I would like to sell my old bike for $200, anyone interested?", being matched with another user who said "I am looking for a cheap bike nearby". This plays right into Google's territory, with the ability to organize and match people's intentions.
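A toy version of that intention matching: classify each post as a buy or sell intent plus an item, then pair complementary intents. The hand-written rules below are a hypothetical stand-in for what a learned model like DeepText would infer, not Facebook's actual pipeline.

```python
import re

# Hypothetical rule-based intent extraction: (kind, item) or None.
def extract_intent(post: str):
    text = post.lower()
    if re.search(r"\bsell(ing)?\b", text):
        kind = "sell"
    elif re.search(r"\b(buy|buying|looking for)\b", text):
        kind = "buy"
    else:
        return None
    item = "bike" if "bike" in text else None
    return (kind, item) if item else None

def match(posts):
    """Pair every sell post with each buy post for the same item."""
    intents = [(p, extract_intent(p)) for p in posts]
    sells = [(p, i[1]) for p, i in intents if i and i[0] == "sell"]
    buys = [(p, i[1]) for p, i in intents if i and i[0] == "buy"]
    return [(sp, bp) for sp, s_item in sells
                     for bp, b_item in buys if s_item == b_item]
```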


Everything is deep nowadays. Somebody should do a parody product called DeepBalls


Makes me wonder what the next AI industry buzzword will be. After Big and Deep, maybe comes Fine or Narrow, Tactful, and finally Aware.


Looks like just text classifier, which classifies text snippets by predefined simple intent.


Indeed.


I would be interested to learn more about their research behind it. Will there be any papers released? Specifically: how much data did they train on? How many machines for how long? How did the different NN architectures perform?

I guess those things would be fine to share without revealing the inner workings.


Results like the ones they published today have become pretty standard in academic papers. I was wondering why there weren't any cutting-edge applications of deep learning to text as a product. It's probably too expensive to run the necessary hardware to use the latest and greatest ideas.


Are they actually open-sourcing it? There doesn't seem to be link to the code in sight.


Now we're really well on the way into DeepSh*t.

When these things get just a little bit better, the Five Eyes agencies will suddenly not have a staffing problem any more.


I feel like the more AI gets into the middle of our online interactions, the more AI will try to steer those interactions. There is already plenty of evidence that our online behaviour is strongly influenced by algorithms, and I suspect this is just the beginning. One reason why I log onto FB maybe once every couple months; it just feels like I'm being manipulated.


Eben Moglen:

> Facebook is strip-mining human society. Watching everyone share everything in their social lives and instrumenting the web to surveil everything they read outside the system is inherently unethical.

> But we need no more from Facebook than truth in labelling. We need no rules, no punishments, no guidelines. We need nothing but the truth. Facebook should lean in and tell its users what it does.

> It should say: "We watch you every minute that you're here. We watch every detail of what you do. We have wired the web with 'like' buttons that inform on your reading automatically."

> To every parent Facebook should say: "Your children spend hours every day with us. We spy upon them much more efficiently than you will ever be able to. And we won't tell you what we know about them."

> Only that, just the truth. That will be enough. But the crowd that runs Facebook, that small bunch of rich and powerful people, will never lean in close enough to tell you the truth.


It is useful to hear what architectures they are using (e.g., BRNNs) and I appreciate the pointer to the original paper.

I was hoping for some open source code to read, or more detail on their models. Facebook is pretty good at open sourcing things, so hopefully more papers will be released and open source software as it makes sense for FB to do that.

EDIT: typos


Are they going to open source it so that everyone else can use DeepText?


It is good publicity and good will to release open source code.

They still have their business advantage intact: they have the data and the awesome infrastructure.


It seems like a lot of big companies are open sourcing their deep learning stuff. Part of me wonders if they do it because, as people get scared of AI, the federal govt can't come after lots of small companies using the open-source AI to regulate/fine/shutdown. They could come after just Google or FB once people start losing lots of jobs to AI, if they were the only kids on the playground.


They are mostly releasing their tools though. The raw diamonds are the trained network weights, its architecture and the data itself. We rarely see releases of those parts from private companies.


My Facebook account was for public posts and private messages only, to force myself not to post anything that Facebook might consider public as if it were private. This just made me give up on the company altogether.

Fun that you guys can train a neural net to deduce what I like better than me, but as they said at the CCC conference, everyone has to decide for themselves how close they are with their machines (though that was in the context of taking your laptop to the toilet rather than leaving it alone unattended).


Ya know, I normally don't believe in these "Facebook is listening in on your conversations"-type voices and opinions, because it seems like such an invasion of privacy and the company wouldn't do something like _that_ without permission. However, after a recent episode where I _know_ Facebook was listening to my conversation[1], I'm now much more nervous based on this machine learning news. I've become highly critical and pessimistic when Facebook says things like "Better understanding people's interests" and "New deep neural network architectures" - not so much because of what their intent with the data is, but because of how they receive that data. It's done so sneakily, and this is what I disagree with. I normally don't have a problem with Google reading my emails and suggesting news topics on Google Now, my flights, etc., because they ask if they can, but with Facebook there's an aura of creepiness that they are always listening.

[1]: This past weekend, a close friend was telling me about his friend who works at Google and the following morning, this Google person came up on my "Friends suggestions". It was the most bizarre thing... I also confirmed with my friend if he looked this Google guy up later on Facebook, which may have prompted his Google friend to show up on mine, and he said he didn't touch Facebook after our conversation.

EDIT: For further clarity.


Why did you think Facebook wouldn't do that? In their view, any content you provide to their site becomes their property. Why wouldn't that include conversations. Considering the fact that everything is unencrypted once it reaches Facebook, it's a mistake to assume good faith on the part of a company that makes its living trying to make you spend more of your life on their website.


Any content I provide to their site becomes their property is something I understand. However using my mic without specifically asking permission is something else. Yes, I do have their app, but even then, their app on my iPhone does not make my iPhone their property. In all honesty, I actually thought that this was something that phones would prohibit (I use an iPhone), but I was definitely mistaken and surprised by it.

Having said all this, it was actually in the news[1] yesterday that Facebook will soon be providing end-to-end encryption, though I doubt it'll apply to the listening in on people's conversations.

[1]: https://www.theguardian.com/technology/2016/may/31/facebook-...


Isn't "text understanding engine" a poor description of what is basically a classifier?

I'd expect a text understanding engine to do more than classify, I'd expect it to read a bunch of sentences in context and then be able to answer questions about it:

John was walking down the stairs. John tripped and fell. John lies on the floor.

Describe John's status: Is he hurt or in pain? Is he standing?


Facebook has one of the leading teams doing deep learning question answering. Jason Weston et al. wrote a paper introducing a machine-generated textual dataset[1] similar to what you describe, e.g.:

    Daniel picked up the football.

    Daniel drops the football.

    Daniel got the milk.

    Daniel took the apple.

    How many objects is Daniel holding?
There have been plenty of other papers published by them and others tackling that and similar but harder problems, e.g. [2].

[1]http://arxiv.org/pdf/1502.05698v10.pdf

[2] http://arxiv.org/abs/1511.02301
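For scale: the counting question in the example above yields to a trivial state-tracking baseline like the sketch below, which assumes one actor and that the object is the sentence's last word. Datasets like [1] are designed precisely so that real models have to generalize beyond such hand-written rules.

```python
# Trivial state-tracking baseline for the bAbI-style counting task:
# track a set of held objects, adding on pick-up verbs and removing
# on drop verbs. Verb lists are illustrative, not from the dataset.
def count_held(story: list, person: str) -> int:
    holding = set()
    for sentence in story:
        words = sentence.rstrip(".").lower().split()
        if not words or words[0] != person.lower():
            continue
        obj = words[-1]  # assumes the object ends each simple sentence
        if any(v in words for v in ("picked", "got", "took", "grabbed")):
            holding.add(obj)
        elif any(v in words for v in ("drops", "dropped", "discarded")):
            holding.discard(obj)
    return len(holding)
```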


Can't Daniel just type "inventory"?


Deep learning can do question-answering.


I know that the tech company M.O. is creeping invasiveness, acclimating users to marginally more Orwellian incursions on their privacy over time, but are we already at the point where something like this can gain widespread acceptance? Seems like the willingness of the general populace to prostrate itself to our newfangled corporate overlords is only accelerating.


I kind of think Google will win over Facebook in this particular war, based on what I have seen so far.


And here I was hoping to see links to published papers or God forbid the release of datasets for fellow researchers... Heck, even a trained model would be nice.

Seems like a PR stunt to attract talent.


Perhaps we're all computer simulations whose purpose is to a/b test marketing on real-world people based on their social media posts.


More creepy idealism from the team at Facebook.


Would love to see a compare / contrast between FB/GOOG/MSFT's offerings in this space.





