Interesting that AA frames it as TPG's app stealing confidential customer data, while it is the customer who set up the app and willingly provided their credentials in the first place.
Will AA be able to find a single customer who has a problem with what TPG does? What is their case then exactly? Would they similarly sue the app if customers were copy pasting data into it, rather than accessing it programmatically?
FYI the title is editorialized. This suit has nothing to do with screen scraping, but just data access in general. Services like these nowadays almost always use private APIs (built for mobile clients and SPAs) rather than parse HTML.
Speculating, but TOS generally say that you're not allowed to share your password with anyone. Assuming that's true for the AA site, AA might argue that TPG is encouraging users to break their TOS ("tortious interference").
Though if the app doesn't actually share the password with TPG and just uses it locally, there may well be a question of whether entering your password into a third-party app actually counts as sharing it with that party. How exactly would this be different from logging in from a web browser? It's just a different kind of user agent. Are Google, Mozilla, Microsoft, and Apple guilty of tortious interference simply because the software they release has access to your passwords on your own machine and could report them back? (For that matter, they even store the synced passwords on their servers, though in principle those are supposed to be private.)
Of course there could also be specific terms in the TOS against accessing the service with unapproved user agents, independent of any prohibition on sharing credentials.
1. Breach of Contract
2. Tortious Interference with a Contract
3. Unfair Competition by Misappropriation
4. Trespass
5. Trademark infringement
6. Dilution
7. Dilution under Texas State Law
8. False Designation of Origin
9. Copyright Infringement
10. CFAA
11. Violation of Texas Harmful Access by Computer Act
12. Unjust Enrichment
IMO it would serve the common person well if we changed the way the law works so that if a corporation sues you for a laundry list of things, if a single claim gets thrown out, then they all do. That prevents this insane pile-on where a half-afternoon's work by 4 people in their giant legal department turns into a multi-year nightmare for an average citizen. I think that would be a fair way to rein in the corporate "throw everything at the wall, see what sticks, or at least hope we intimidate this regular Joe into submission" approach that's all too common today.
While I agree with the sentiment, I don't agree with the premise or the method. Corporations aren't the only ones stacking charges. Prosecutors and private citizens (usually operating pro se) are just as capable of frivolous litigation. The best action a judge can and should take is to dismiss the meritless claims before proceeding to the court phase.
Generally no. The reason you see a 12 claim filing is because they don't get unlimited do overs. You get some chances to amend the complaint, but once it starts moving forward, that's it.
That also happens when county prosecutors are up for re-election. Stack charges so that, in case the jury finds one claim weak, there are 5 synonyms for the same underlying crime that the jury is willing to convict on. It also helps with forcing plea bargains.
Curious how Breach of Contract can fly, considering TPG never agreed to a contract with AA to begin with. And AA is obviously not suing their own customers.
Some of these are serious, like the CFAA claim, some are genuinely hilarious. Dilution? Really? Unfair competition? Are they accusing TPG of creating an airline?
> Speculating, but TOS generally say that you're not allowed to share your password with anyone. Assuming that's true for the AA site, AA might argue that TPG is encouraging users to break their TOS ("tortious interference").
They specifically claim it's presenting something that is intentionally confusingly similar to an official AA logon screen using copyrighted and trademarked AA content to harvest customer username and password info, does not prominently note its nonaffiliation with AA, and directly violates the TOS itself, as well as arguing tortious interference, copyright infringement, trademark infringement, trademark dilution, and violation of the CFAA and the Texas equivalent.
They also note a similar dispute settled with a separate firm owned by the same parent before TPG tried to negotiate a deal with AA for permission to access customer data for this purpose and was turned down.
Also, would there be any legal difference if they wrote a browser plug-in/extension that provides additional information while you're logged into AA's site?
What if their plug-in advertises for competitors or defames AA or navigates the user immediately away? It's hard to argue it is still a browser/user-agent if it changes or editorializes somebody else's content beyond a certain point.
That they're stealing AA's opportunity for, ahem, "customer engagement".
They may have a case for tortious interference, I think? It does seem like an uphill climb. The strategy might just be to punish TPG with attorneys' fees and discourage this practice in the future.
Every similar case that has been litigated has been about wholesale copying of a large amount of content (e.g. millions of LinkedIn profiles, Craigslist listings, flight prices).
In this case an individual customer authorized the access and only their data was affected. This is a pretty common use case for a large category of apps – think Mint, Plaid, all wallet apps which organize and track different accounts.
So basically corporations don‘t even pretend like they grasp the idea, that you pwn your data.
It is theirs. You can use it, give it to companies, to the government, but you are not allowed to destroy it (just like currencies in many countries). LOL.
Edit: typo. I wrote „pwn“ instead of „own“. But then I thought about it and somehow it feels appropriate.
Tangent: pwn is always a misspelling of own. Granted, intentional meme usage is almost entirely limited to the "conquer" metaphor, but unintentional usage wouldn't have that same restriction/implication.
The issue isn’t whether or not it’s what the customers wanted. The issue is that TPG wasn’t the party that entered into the agreement with AA when creating the account.
I’m not suggesting it’s right or wrong, but that’s the issue.
If I authorize an agent to act on my behalf in a particular way wouldn't they be considered and authorized as me? If I make a contract with Bob to mow my lawn, it's not like Bob's workers or employees have to sign the contract too. TPG shouldn't need to be party to the contract to be an agent of one of the contracted parties.
But if TPG is an agent of the customer, then AA should be suing the customer. As said above, TPG isn’t subject to the TOS because they aren’t the ones who agreed to them. So, it’s the user that is misusing AA’s data. But they don’t want to sue their customers (and in particular each customer individually, because I’m sure there is a no-class-action clause)…
So here we are. It should be fascinating to see where this goes.
> But if TPG is an agent of the customer, then AA should be suing the customer.
Respondeat superior lets you sue the principal for torts of an agent committed in the scope of their agency, but it doesn't eliminate the liability of the agent for their own torts.
You typically want to sue the principal when the agent is less able to pay, less willing to settle, or one of many easily replaceable agents employed by the principal for the same purpose.
When the first and third of those are reversed and the second is unclear (which seems to be the case here), you want to focus on the agent. You could sue the multitude of principals, too, but that's just a lot more cost for little additional benefit (and possibly sympathy backfire in a jury trial.)
And here's me, all idealistic, thinking that decisions on who to sue should be about the merits of the complaint, rather than "who has the potential to give me the most money".
Civil lawsuits are a means of dealing with a problem, mainly by recovering losses (via damages) but also sometimes by imposing other legal compulsion (equitable relief).
“how does filing a lawsuit against this party improve my expected resolution to whatever problem motivated the lawsuit, and is that worth the added cost of pursuing that party” should be part of your calculus. That's not to say the merits of the case don't figure in, but a meritorious case against someone who even the best possible outcome won't do anything to improve your condition is just a way to waste your money and time, and the courts.
> But if TPG is an agent of the customer, then AA should be suing the customer.
As the great Professor Hecker repeated so often in Business Associations - “The tortfeasor is always liable.” E.g. when you get hit by the Domino's delivery driver, you can sue the delivery driver.
I was just looking at this in the context of the Rogel Aguilera-Mederos (110 year sentence) truck-crash case the other week. IANAL, but as I understand it there's a legal concept called "respondeat superior" and the "McHaffie Rule" that may complicate that assertion. A lawyer could and probably would stomp me on this description, naturally.
I'd even say most leases vs some, especially regarding residential. Buying a house for the purposes of renting requires different financing (although most are being bought for cash) than a typical homeowner's financing which has clauses that says the person receiving the financing will use the address as their primary residence.
That's how a temp agency works, no? You hire the agency because you need a worker for a short term job, and the agency will in turn hire a temp worker to actually do the job.
AA and all other US airlines argue that information about your customer account is part of a data set that belongs to them, so they set the terms by which the data may be accessed. It's in their terms of service. This isn't particularly unique to airlines either.
Someone else gave a decent answer to your question, but I’m not sure that’s this situation. TPG is scraping AA and then displaying the results in TPG’s app. There’s an http request being sent by a TPG computer and IP address to AA. I think that’s a relevant distinction. I also think that’s relevant to whether TPG is a party to the AA TOS.
I don't think that's an important distinction. TPG could be doing everything locally on the user's device (maybe they are?) and AA's complaint would be no different.
They could (conceivably) if Google touted using their browser to access said website despite all this, especially if Google derived some benefit from it. It may well not come down to making available a tool, but to knowingly encouraging the breaking of ToS (assuming it's valid in the first place).
The US was formed under the premises that adults have the right to make their own decisions - good and bad, and that adults are the masters of their own property.
I think these principles have served us pretty well despite the violations of human dignity like not being able to check your AA points from the TPG app.
At the same time, doesn't AA have the right to protect themselves from potential abuse? I don't have a problem at all saying that I'm not supposed to share my credentials with 3rd parties. The user has no control over what the 3rd party might do, and can only make that decision based on what they think they are going to be doing.
> The US was formed under the premises that adults have the right to make their own decisions - good and bad, and that adults are the masters of their own property.
Yes, and AA servers, copyrights, and trademarks are all, under the law, its property, of which it is master.
> Interesting that AA frames it as TPG's app stealing confidential customer data, while it is the customer who set up the app and willingly provided their credentials in the first place.
They think the data belongs to them. They think they own the consumers and all the data they generate. To them, it's their consumer data, something they abuse to get consumers to log in and look at ads, and they'll be damned if they let some random software have access to it no matter what the user wants.
GDPR makes it clear that personal data belongs to the person, these companies are just middle men who are temporarily processing it. Cases like this perfectly illustrate why such a change in perspective is necessary. Access to this data is a privilege and it can be revoked.
Never underestimate the greed of these people. They'd literally put ads under our eyelids if they could. They'd put ads in our dreams. They see our moments of peace as an opportunity to get richer.
As an attorney who focuses in this area, I can say that the most interesting question is first where this case will be decided. Texas, and the Northern District of Texas in particular, has historically been the worst jurisdiction in the country for web scrapers to litigate. Southwest has a long history of litigating successfully there, including two cases from just last year. If TPG is going to win, first they'll need to win the jurisdictional question of whether the case will be decided in Delaware or Texas.
Thank you for sharing the link. I find it amusingly appropriate that viewing this legal guide requires agreeing to Terms & Conditions, but appreciate that it’s one of the shortest Terms & Conditions I’ve ever seen, easy to read and understand within seconds.
Haha! Given that the article is about scraping, I figured some prudence with respect to people copying and pasting my work somewhere else was warranted!
As a non-american reader, it feels bizarre to me that the direction of a case changes so drastically depending on where it will be held, within the same country
> I'm surprised more businesses don't ban all of Texas as customers to prevent any litigation happening there.
Businesses are often the ones structuring agreements so that they can sue customers there and so that customers are forced to sue them there, so...that would seem counterproductive.
Even before considering the size of the market you’d be cutting off.
Businesses sue? Blame customers! In order to save the village we had to destroy the village! There need to be some massive reforms, especially with copyright.
It's funny how regionalism pops up on the internet. Thing is, it's less than 10% of the country (and surely not all of those 29MM are on the internet), and there are other businesses surviving just fine forgoing a TAM of the whole country.
How can you be so sure it's not regionalism that's causing you to dismiss nearly 10% of one of the largest, richest countries in the world as inconsequential?
The top comment is from a legal expert basically saying that TPG's success in this lawsuit depends on avoiding Texas as a jurisdiction, and the product in question is not possible without winning this lawsuit.
It's not that Texas is inconsequential. It's that the choice is "cut off Texas or risk the entire business in the other 49 states".
Oh, it absolutely is regionalism, in the form of "Texas is a corrupt legal shitshow and it can be a smart business move to refuse to do any business there."
And because HN only allows me like 3 comments a day, I'd like to emphasize "can."
Not all users are created equal. This is why many internet businesses cater exclusively to a wealthy country that contains only 6% (that is, excluding 94% of internet users) of the people on the internet.
In this case, it would be TPG banning Texas customers, since Texas is the favorable venue for AA.
That doesn't seem like as big a deal, since TPG doesn't have a giant corporate headquarters to relocate and cutting Texas from the TAM isn't that big of a deal.
Years ago I (and a handful of other folks) had a meeting with some people from American who were thinking about opening up their data via an API. One of the other attendees said something to the effect of, "software developers are very, very good at removing inefficiencies when given data like this." It was delicately phrased but the subtext was clearly a warning: if your business depends on asymmetry, an API can sink you. I guess they took that warning to heart.
When you operate in a commodity business you want to impede the market's price discovery mechanism (in this case by making it harder for prices to be aggregated).
The same issue and battle is playing out for US healthcare as well due to the recent rule forcing hospitals to make prices public.
While some parts of healthcare - like drugs - can be seen as commodity, a lot of it is not. Service differs significantly from doctor to doctor. Not disclosing the prices makes it easier for the providers to have higher prices but it doesn't make the service in question a commodity.
Definitely, but if I'm comparing primary care providers in the same area between Kaiser and Hoag and Sutter or other large chains there isn't going to be much difference in service between the pool of available doctors at each one. Or even routine specialists for non-critical stuff like dermatology, you don't get any better care by going with the best. Where it matters is cutting edge, critical care like cardiology, oncology, and surgery where methods aren't standardized and individual skill (as opposed to drug innovation) plays a large role in outcomes.
I'd guess most healthcare services are more like a commodity than not. (Also, to the extent that there is important variation in quality of service, it's probably the nursing staff that matters for the vast majority of services. I can't remember the last time I interacted with an actual MD for more than a few useless minutes.)
While surgery may not be a commodity, the same mechanisms of price discovery lowering prices across the board apply. Even if a surgeon is more skilled, are they worth 3x the cost?
A surgeon can easily be worth 3x the cost. Some surgeries have a significant risk of death, for instance. Or a low chance of success. Both of those factors may change dramatically with surgeon skill.
And those are no-brainers. A more skilled surgeon on cosmetic surgery with lower recovery times and/or prettier work could easily command a premium because their supply is so low and rich enough people have huge demand.
Opacity in price discovery as an objective in a commodity business is definitely an insightful framing of the issue.
Although the airline industry can be considered a commodity industry, the airline rewards miles industry is less so. What those miles can get you, can essentially change at any time if the airline says so.
This is true. Though one thing I've considered is that the front-end clients to the miles themselves have a balance of usability and inefficiency. They don't want everyone getting maximum dollar for their points. I would maybe say that giving data API access to the points tilts the USD market price of what a point is worth in the favor of the consumer.
At the very least, data obfuscation obfuscates the shenanigans of a constantly asymmetrically redefined value store.
TPG just highlights the shenanigans which puts pressure on airlines to change the value even more frequently. This in turn makes the shenanigans more apparent which might call for regulation.
This lawsuit is attempting to nip this process in the bud before stumbling on regulation. But, fundamentally, the relationship is asymmetric regardless of any data api access.
For some perspective: This isn’t just about friendly apps like TPG helping consumers out. If you have an API (or even just turn a blind eye to scraping) and have a popular service, scores of ill-intentioned business people will descend on your business to suck any value they can out of it.
This ranges from all of those StackOverflow scraping websites in your Google search results to companies that want to scrape Facebook for images and personal info to build a database of everyone.
I wrote a scraper for Air Canada's aeroplan program a few years ago. I wanted to track my points in my own custom native app. I probably had $10,000 worth of points in my account. One day I logged in to find out my account had been deactivated on suspicions of fraud. After several lengthy phone calls with their team (including sending them the node.js script I was using), I was able to get my account restored. For the weeks it took to fix my account it was pretty frustrating. I just don't understand why you can't write a script that acts as your web user agent.
I’ve been on the other side of this, defending against bots.
Basically: Well-behaved and well-intentioned scraping bots are rare. You’d get a lot of users setting update rates to 60 seconds that did a new login every time and creating as much traffic as 1000 users. Then they’d release the script for integration with something popular and suddenly you have 1000 people each creating 1000 times as many login requests as a single user.
Another common problem was forgetting to implement reasonable back off for failures. A lot of newbies write scripts that immediately retry on a tight infinite loop whenever something goes wrong, sending a huge stream of requests to your server if the API changes or when it goes down. Again, multiply this by many users sharing a script and it becomes a problem.
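For readers who have not written one, here is a minimal sketch of the "reasonable back off" the comment above says is usually missing: capped exponential delays with random jitter and a fixed retry budget. The function names, the default delays, and the `fetch` callable are illustrative assumptions, not anything from an actual airline client.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=300.0, rng=random.random):
    """Wait time in seconds for a given retry: capped exponential with full jitter."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def fetch_with_backoff(fetch, max_attempts=6, base=1.0, cap=300.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each failure.

    `fetch` is a hypothetical zero-argument callable that raises on failure.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception as exc:  # a real script would catch narrower errors
            if attempt == max_attempts - 1:
                # Retry budget exhausted: stop instead of hammering the server.
                raise RuntimeError("giving up after repeated failures") from exc
            sleep(backoff_delay(attempt, base, cap))
```

The jitter matters when many people run the same shared script: without it, every copy retries at the same instants after an outage.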
Then of course there are people trying to make a business out of extracting your company’s data, such as putting it in some other website where they can serve ads over your content or whatever (think of all of those StackOverflow scraping websites in Google)
Basically, you can’t investigate the motivations of each individual user. You just block them all.
And unreasonable bot load is a legitimate concern.
What's illegitimate is that "attempting to ban programmatic access" is on the table as a legal redress.
The only way, from a technical moral standpoint, I could see that being remotely reasonable is if there was 1:1 feature and access parity with an API, then being able to legally force agents to use the API.
But critically that's 1:1 feature - if a user can do it, the API offers a method to do it.
And 1:1 access - if an unauthenticated user can do it, then no account may be mandated for API use. And if any user can do it, then any user will be approved for an API key.
Otherwise, it's just ceding more power to companies.
So what we need to do is make a platform for users' bots that completely prevents them from behaving in any way that is un-human-like, then get other platforms to trust this platform, rather than trust individuals and their scripts.
Reminds me of a little battle I got into years ago at work. The thermostat covered like 4 or 5 offices, and they had given us a control to change it on an internal website. It would record your name when you changed it, and then make you wait like 10 minutes to change it again. When I first moved into the office, I noticed that there was a battle between two people that had it doing several degree swings all day long. I sent them both an email and proposed a truce with a temperature in the middle, and they agreed. A few weeks later I noticed the guy who preferred it warmer broke the truce. I do not like it warm. So I wrote a script that would reload the page, check if the temperature was above a certain number, hit the down button, wait 10 minutes, then repeat. Some time after that, it became obvious that the other guy had a script too. But his script had no timeouts in the loop. Eventually the people in charge of the internal site emailed me and asked me to stop. They said they only noticed I was using a script because the other guy's script was breaking the website, and they looked at the logs and saw my responsible script reacting to it all night long. My manager laughed and told me to make the script more human-like. The other guy gave up his temperature tyranny and I let it sit at the truce point again.
> think of all of those StackOverflow scraping websites in Google
You don’t need to scrape stack overflow, you can just download a .zip
That’s one reason why people use it: they can’t just gate you off from the content you’ve created. You and others can (and will/do) have a copy of it all.
You beat me to it, was going to say the same. It's always a few bad actors that try and hammer our servers, gets annoying real fast. I'd honestly block them and move on, I don't have time to investigate every single request. Now to sue someone? That seems like a waste of everyone's time.
For the specific use-case of "badly written scrapers", this might be reasonable, but usually by the point when engineering needs to care about scrapers, other people at the company are involved and just view it as a service theft issue. i.e. "Why waste time and money forcing people to scrape fairly when we can just ban all scrapers?"
Not to mention, actually malicious traffic will find any non-Sybil criterion you use to enforce rate limits and work around it. "Enforce rate limits per User-Agent?" I'm now 10,000 different applications. "Enforce rate limits per IP address?" I'm now 10,000 different compromised residential IP addresses. At some point, distinguishing between well-behaved, buggy-but-legitimate, and outright malicious automated traffic is either impossible or too time-consuming. Upon which point you throw up your hands and say, "Screw it, everyone but Google or a browser is banned."
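The point about non-Sybil criteria can be made concrete with a toy sketch. Below is a hypothetical fixed-window limiter keyed on whatever identifier the client supplies (here a User-Agent string); the class name and the limits are assumptions for illustration only.

```python
from collections import defaultdict


class PerKeyLimiter:
    """Fixed-window limiter: at most `limit` requests per key per window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (key, window index) -> request count

    def allow(self, key, now):
        bucket = (key, int(now // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True


limiter = PerKeyLimiter(limit=5, window_seconds=60)

# A client that keeps one User-Agent is capped at 5 requests in the window.
honest = sum(limiter.allow("my-script/1.0", now=0) for _ in range(100))

# A client that invents a fresh User-Agent per request is never capped.
rotating = sum(limiter.allow(f"fake-agent/{i}", now=0) for i in range(100))
```

The limiter only constrains clients that keep a stable identity, which is exactly the well-behaved ones; the same holds if the key is an IP address and the attacker has many addresses.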
> "Screw it, everyone but Google or a browser is banned."
Thanks! Why don't malicious actors just spoof browsers?
More generally, I would think that any defense that prevented malicious actors would prevent badly written scrapers, simply because malicious actors can do anything a badly written scraper could do, but can also take more active steps to evade defenses.
(These are honest questions; I have very little knowledge about this.)
Why does it need any investigation of motivation? If you can block then you can as well implement backing-off logic on the server's side and serve all traffic. One of the other comments is right. There is some sort of discrimination on bots that probably will never be resolved until laws are put in place to give them certain equal access rights.
There’s no winning here. Sending back a bunch of 429’s is still part of your API. Sure it’s less expensive to do than the operation the client was probably requesting but it’s not free and it’s stateful. For the kinds of bad actors people are talking about in this thread you still want to blackhole them.
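To illustrate why a 429 is "not free and it's stateful", here is a minimal token-bucket sketch, assuming one bucket per client identifier. The class and parameter names are hypothetical; the 429 status and the `Retry-After` header are standard HTTP.

```python
import math


class TokenBucket:
    """Per-client token bucket that answers with an HTTP-style status."""

    def __init__(self, rate_per_second, capacity):
        self.rate = rate_per_second
        self.capacity = capacity
        self.state = {}  # client id -> (tokens left, time last seen)

    def check(self, client, now):
        tokens, last = self.state.get(client, (self.capacity, now))
        # Refill for the time elapsed since this client was last seen.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.state[client] = (tokens - 1, now)
            return 200, {}
        # Even a rejected request reads and writes this client's state.
        self.state[client] = (tokens, now)
        retry_after = math.ceil((1 - tokens) / self.rate)
        return 429, {"Retry-After": str(retry_after)}
```

Every request, accepted or rejected, costs a lookup and a write, and the state table grows with the number of distinct clients. That is cheaper than running the real query, but it is still work the server does for traffic it did not want.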
All you're doing is offloading the response to a piece of network hardware. It seems like what you're looking for is a technological solution for load management which you're forsaking in favor of a kludge, then blaming the user.
Does it make sense? Bots rarely make HTTP requests for images, css, video clips, large JS files, custom fonts, etc. Real people do. A well-written bot just seeking some specific data can often complete its task with less than 1% of the resources that would be sent to a "real" user.
I am not sure what "bots" you are talking about here. When I wrote a web scraper to get my Air Canada point values I used a script that fetches the web page and parses it. It was the only way I could get it to work. I had to steal the session token from the browser cookie in order to make it auth.
I guarantee that, with some effort, you could write a script to emulate every HTTP call -- logging in, accepting the cookie (just a value in a Set-Cookie header), and requesting the point values, making sure that cookie value is in your Cookie header. Just because you could "only get it to work" one way, does not mean there isn't a far more efficient way.
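As a rough sketch of the cookie step described above, the snippet below turns `Set-Cookie` response values into the `Cookie` request header using only Python's standard library. The cookie names and values are made up, and no real airline endpoint or login flow is assumed; a real site may add CSRF tokens or other checks that this does not cover.

```python
from http.cookies import SimpleCookie


def cookie_header_from(set_cookie_values):
    """Build a Cookie request header from a list of Set-Cookie header values."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)  # parses the name=value pair plus attributes like Path
    # Only name=value pairs are sent back; attributes stay with the client.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical values a login response might return.
header = cookie_header_from([
    "session=abc123; Path=/; HttpOnly",
    "csrf=xyz; Path=/",
])
```

The resulting string would go in the `Cookie` header of each later request in the same session.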
Dude, this is my story about my experience. The context provided above is relevant to what I was doing. This has nothing to do with the theoretical possibilities that you are speaking about. You are in the wrong thread.
Those are all static files that are easily (and typically) cached in front of the application. Pulling customer-specific data from an authenticated session taxes the application (and DB) directly.
I think you are over-estimating the use of caching in a lot of industries and a lot of companies. Further, a company that aggressively caches static files should also recognize the benefits of caching their most common database queries. Another replier to my comment mentioned his Air Canada point totals. The original post is about American Airlines. Point balances change infrequently. An airline could easily query every active customer (had a point balance change within the past 6 months) every 6 hours and keep all those values in memory, dramatically reducing individual DB queries. Or not, and instead choose to sue a very popular blogger and builder of a tool used by your best customers, pissing everybody off and looking extremely petty and customer-unfriendly in the process.
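A minimal sketch of the caching idea, with one substitution stated plainly: instead of the periodic batch refresh described above, this uses a read-through cache with a six-hour expiry, which is simpler to show. `load_balance` stands in for a hypothetical database lookup; nothing here reflects how any airline actually stores balances.

```python
import time

SIX_HOURS = 6 * 60 * 60


class BalanceCache:
    """Read-through cache: hit the database at most once per TTL per customer."""

    def __init__(self, load_balance, ttl=SIX_HOURS, clock=time.monotonic):
        self.load_balance = load_balance  # hypothetical DB lookup function
        self.ttl = ttl
        self.clock = clock
        self.entries = {}  # customer id -> (balance, time loaded)

    def get(self, customer_id):
        now = self.clock()
        hit = self.entries.get(customer_id)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]  # served from memory, no DB query
        balance = self.load_balance(customer_id)
        self.entries[customer_id] = (balance, now)
        return balance
```

With this in front of the database, a script polling every 60 seconds causes one query per six hours rather than 360, at the cost of showing a balance that can be up to six hours stale.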
True, although the more you lock down, obfuscate, and hide your data, the more the bot-writer is going to use the heavy guns to penetrate you. Points Guy, and Award Wallet are both attempting to provide a service that people, especially American Airlines's most valuable customers, obviously want -- AA could easily work with them, instead of against them (or provide the same service themselves).
> Then of course there are people trying to make a business out of extracting your company’s data,
If I can do this by hand there's no legal reason I can't do it by machine. You can try to defend against it, I guess, but the second you start impacting your obligations to someone else (like disabling their account after they paid you) you are in the wrong.
Access and privileges given to you do not always extend to agents acting on your behalf.
Having a driver’s license does not grant someone acting on your behalf the right to drive on public roads if they don’t have their own license. And more directly it also doesn’t grant you the ability to use your autonomous driving software on public roads either.
And just because you have a license doesn’t mean you can drive any vehicle on public roads. It has to be street legal and you need a different license for 18 wheelers.
"agents" and "bots" were proposed as a thing of the future back in the 90s. "you'll have all these agents that can buy stuff for you and book travel and ..."
> I guess consumers aren't allowed to access the technology afforded to enterprise employees.
Lots of us built our own stacks and attempt to fly under the radar - things have gotten legally gray over the last two years and corporations have no problem sending their legal team to kick your door down for trivial stuff.
Many moons ago I wrote and published a little app that allowed users of a popular VOIP provider to listen to their voicemail. Previously, voicemail functionality was limited to their awful web-only interface, but the audio files could be easily scraped once the user provided their credentials.
My biggest fear/risk was getting noticed and sued by the company, merely for letting users request HTML and display it differently than the company wished. The app is no longer available, so I guess I survived but the idea of $BIGCO crushing me with lawyers was chilling.
at a previous employer we had a similar issue as you. from the employer side, we had a managed api platform that normalized all our messy data stores into a single api front end. our websites basically all called this api to populate the data in the page requests.
except we didn't write this platform, we paid for it. and it is licensed per 100 queries.
so when a single account starts doing 30k queries every hour, 24/7, the pocketbook was directly hit. and yes, web scraping was in the tos that every account agreed to but never read
"Partly" reminds me of LinkedIn's case against hiQ
> The LinkedIn dispute arose out of hiQ’s use of automated bots to scrape massive amounts of information from publicly available LinkedIn user profiles. Thus far, lower courts have sided with hiQ on grounds that certain information on the site is publicly available and could be accessed by the public without entering a password. [1]
There are similarities, but the context differs: in hiQ's case the information was publicly available, whereas in TPG's case the owner of that data (the AA customer) is giving them access to it. The customer could just as well copy, download, or screenshot the data and transfer it to TPG; obviously most people wouldn't bother. So the core of the argument here should be: is a user allowed to make their data available to a third party? Screen-scraping is a means to an end.
Why the headline editorialization? "Is screen scraping illegal" isn't in the article headline and screen-scraping isn't mentioned by name. And this case doesn't appear to be generic screen-scraping of a public site, as TPG was using customer credentials to log in and retrieve info (with permission). The lawsuit is about breach of site T&C.
I can't see how American Airlines will prevail in preventing screen scraping. At best, they can prevent their point data from being transmitted and stored to TPG's servers.
Screen scraping is essentially interacting with the DOM to extract information. American Airlines can't conceivably attempt to limit programmatic interaction with the DOM, because that is a core component of how the web works, including accessibility tools such as screen readers and browser plugins/extensions.
I wouldn't put it past the court system to make web browsers ("user agents" as we called them back in the day) illegal accidentally. I'd be more surprised if they didn't, honestly.
Just be aware that TPG is often harmful to consumers; they often tout credit card deals for cards with better sign-on bonuses if you go directly to the issuer; they were touting a 60k Amex Gold bonus when you could just go to Amex directly for 75k. They exist to drive CC referral revenue; if they can't get referral revenue for a card, they won't promote it.
I'd happily give TPG the 15k miles worth of referral bonus anytime - their reporting on travel, especially the extra work they've put in for helping people travel in the pandemic, is some of the best on the web. Even in the past 24 hours they shared a new and interesting trans-pacific airline that makes stopovers in Anchorage which would be fun to try (https://thepointsguy.com/news/northern-pacific-airways-boein...)
That's a good point about awareness of sites with affiliate links. Doctor of Credit and https://www.frequentmiler.com are my top two, though there's overlap between them. Doctor of Credit has the best credit card, bank and brokerage sign up bonus lists and Frequent Miler has good point redemption guides.
Plaid and the like now have API agreements with many of the major US banks, and are cutting down on their scraping usage (because it's obviously unsustainable).
I would like to know who "the like" include because as far as I know it's just Plaid and maybe Intuit doing this. Until banks provide APIs directly to developers this just gives Plaid a monopoly. All devs who want to make finance apps securely have to go through Plaid.
The trademark/copyright stuff I get but I don't see how the screen scraping could be illegal.. How is it any different than a person logging in and writing it down manually..
It's not different in any way and definitely not immoral. Essentially it's "illegal" because it hurts corporate interests and they want to make it stop. Corporations have money and will just buy laws with lobbyists if they have to; it makes no sense to take this process seriously.
"Illegal" in the headline is the wrong word here. There are no criminal proceedings, just a civil case. Breach of contract is all that is being alleged here.
I wonder if other services like Mint, Yodlee etc. also do scraping? It seems to be the same model there as TPG/AA - a company has user data, but the user wants the data in some other place, so they authorize a third party to extract the data and represent it in a different place. Most banks now are begrudgingly coming to terms with this being a thing, some going as far as providing OAuth-like read-only APIs to aggregators. Some are trying hard to not let that happen, and some just ignore the issue and let the aggregators scrape. But an actual case decision in this matter could change the picture - and make the financial aggregator business so much harder if it goes the wrong way.
TPG 100%. If AA wins this, it means that _in general_ it is possible to further limit programmatic access to content that you, as a user, should be able to access as you see fit.
Favoritism should have absolutely nothing to do with this. Public stuff on the internet should be exactly that, public. You have no expectation of privacy and it's up to you as an owner/operator to maintain the security of your site, make sure you aren't being dos'd, screen scraped (use strong captchas, validated accounts), etc.
But it's only accessing your private data with your authorization (by providing them your credentials). There is no hacking involved and it's not accessing people's information without consent.
I think you'll find that on page 47 of subparagraph 4 of the t+c you agreed to 27 years ago when you first flew on AA (since amended twice daily), that the data in question is in fact property of AA.
In all seriousness, this is the crux of many of the issues on the internet: Data about you, or generated by your activities, is probably not yours in a legal sense.
In other words AA gives users a limited license to access AA data, and they are mad that you are allowing TPG to use that license. AA sees this as sharing your Spotify password to give them access to music.
I'm just responding to the claim of the parent comment that public stuff on the internet should remain public... This information is expressly NOT public, not sure why the expectation is that everyone should be able to see it.
Honestly, I don't fly enough to have status on any airlines anymore. I used to go to every conference I could, so I was flying internationally several times a year. Frequent flyer programs were entirely mileage-based then, so that made status easy. 100,000 miles was difficult and I only did it with mileage runs. Worthwhile back in the day; system-wide upgrades were easily worth the cost of a trip to Europe for a weekend in the winter :) Plus, visiting random countries was a ton of fun.
I kind of got tired of traveling so much and stopped doing it, though. Traveling for work (just meeting people in other offices) kept my status until about 2017. Then I stopped doing that, and now I'm just a bum with no status on any airline. That's about when the mileage-based gravy train came to an end anyway, so I'm not that upset about it. It is good to be marked in the system as someone to pay attention to (got many free upgrades on Cathay Pacific for no reason back in the day), but you are still treated well when you buy a first or business class ticket even without status.
Pretty nice isn't how I would describe great service. That's my point, the bar is low when it comes to airlines. In your case they should be treating you like a legend.
Not sure why you are doubling down on insisting all airlines suck, when not everyone agrees. When I was Platinum Plus (not even the highest level) AA held up a mostly-full 737 for about 25 minutes because my incoming flight was late and that was the last flight to our destination that day. I was the only person on the jetway, and they closed the cabin door before I even sat down -- in my upgraded-to-first-class seat.
Sure sometimes flying entails hassles and inconveniences but not every airline is awful and certainly not every employee lacks compassion.
One time I lost my keys and missed my flight, and they bumped some paying customer so I could get home sooner. Still feel a little bad about that one.
Another time, I was walking out of the Admirals Club at DFW to board my flight. One of the agents came running out to tell me I was walking in the wrong direction.
But, it's not always great. I had an award ticket in 1st class out of LHR, and they didn't clean the plane at all. The tray table had sticky soda all over it, and the plane was filthy. Miserable flight.
Another time, I had an upgrade to 1st class on a domestic flight, but the TSA didn't like my bags and did an extensive check (I must have been on some list back then, it happened a lot). I arrived way late to the gate, and lost my seat. Ended up flying back home in a middle seat in the last row. It was miserable.
So, it's a mixed bag, but generally good. It's hard for such a large company to offer consistency.
LOL, that other person was probably me, doh! It's fair, air travel comes with a lot of anxiety and heightened emotions because most people are going somewhere important or just trying to get home.
The original promise of client/server services was that the server would provide data on a universal open data format, and the USER AGENT (initially a web browser, but other kinds were expected) would process it in a format to the liking of the user, and satisfying their needs.
Compare this to the current situation where the industry standard is that the servers do indeed provide data through somewhat standardized APIs, but the browser or native app is developed by the same vendor and serves their commercial interests, not those of the user as a customer. The only standard customization granted to users is light theme / dark theme, and it only started a few months ago.
The market is not the problem. The misaligned setup of the market, where the interests of most consumers are misaligned with the interests of those who produce or pay for the content, is the problem. Imagine a grocery store which is actually a money laundering front for the mafia, so it's much more interested in looking like a grocery store than actually selling any groceries. Do you think they'd sell high quality goods? Would they have the best prices? Would their customer service be excellent? Now imagine the mafia bought all (or almost all) grocery stores in town and turned them into money laundering outlets. How is your grocery shopping experience now? That's what we're having with ads.
The reality is that every company wants to have things presented in the same way they're prepared, for the purposes of marketing. That's why web browsers with css are a thing at all - to allow a website to look exactly like what the developer/company behind it wants it to look like.
Why would any company provide data in a standard API so that users can use that data in a standard way? If there was an API for banking, Bank of America wouldn't be able to showcase a professional-esque design and Sofi wouldn't be able to showcase a cartoonish modern design to strengthen their image and attract customers. How do they then attract customers? Only by features and lower margins, which is the opposite of what would make them money.
Remember that the original vision for the web was not made for companies, but for academic or government institutions.
The idea was to have computers support users, not customers. These words have become synonyms but they have quite different implications. The stated goal was to augment the human intellect, for which you need a well organized corpus of knowledge (think Wikipedia, whose spectacular growth and supporting community came from not being a commercial initiative).
But nowadays the whole industry is focused around building products and services that can be packaged and sold, to the point that its professionals can't even think of any other possibilities when discussing the characteristics of the ecosystem.
Incentives are completely different; it's no wonder that interests of industry are misaligned with actual needs of the final users.
I can't seem to find a reason why "screen scraping" is important here, as opposed to just scraping?
My understanding is that screen scraping is taking a picture of the rendered website and using OCR or some other sort of recognition tool to extract the data.
If it is just scraping - it should be perfectly legal right?
It's against their ToS. The court is trying to decide what kind of a contract a ToS is. This will have implications for anyone writing software covered under a ToS... and anyone using it.
What's the "screen" part of this, though? I guess if you have to render the page fully (execute the JS as well), rather than just parse the returned HTML, that to me is more "screen scraping" vs. "scraping".
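To make that distinction concrete, here is a minimal sketch of the "just parse the returned HTML" kind of scraping, using only Python's standard library. Nothing is rendered and no JavaScript runs. The markup and the `miles-balance` element id are invented for illustration and are not taken from any real airline page.

```python
from html.parser import HTMLParser

# Hypothetical markup; a real account page would look different.
SAMPLE_HTML = (
    "<html><body><h1>Account</h1>"
    '<span id="miles-balance">12,345</span> miles'
    "</body></html>"
)


class BalanceParser(HTMLParser):
    """Collects the text of the element whose id is 'miles-balance'.

    Assumes the target element contains only text (no nested tags).
    """

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if dict(attrs).get("id") == "miles-balance":
            self._inside = True

    def handle_endtag(self, tag):
        # Any closing tag ends capture; fine for a text-only element.
        self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.text += data


def extract_balance(html: str) -> int:
    """Parse the HTML string and return the balance as an integer."""
    parser = BalanceParser()
    parser.feed(html)
    parser.close()
    return int(parser.text.replace(",", "").strip())


if __name__ == "__main__":
    print(extract_balance(SAMPLE_HTML))
```

If the balance only shows up after client-side JavaScript runs, this approach finds nothing, and you would need a headless browser (or the OCR route described above) to get at the rendered page. That is arguably where the "screen" in "screen scraping" comes in.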
It's so funny you post that today! I just got this settlement email about Plaid:
> A Settlement has been proposed in class action litigation against Plaid Inc. (“Plaid”). Plaid enables connections between a user’s financial account(s) and approximately 5,000 mobile and web-based applications (“apps”). This class action alleges Plaid took certain improper actions in connection with this process. The allegations include that Plaid: (1) obtained more financial data than was needed by a user's app, and (2) obtained log-in credentials (username and password) through its interface, known as Plaid Link, which the litigation alleges had the look and feel of the user’s own bank account login screen, when users were actually providing their login credentials directly to Plaid. Plaid denies these allegations and any wrongdoing and maintains that it adequately disclosed and maintained transparency about its practices to consumers.
This question about whether a consumer has a right to enjoy access to information or already paid for services without being advertised to is being tested in businesses everywhere.
Streaming services insert ads for other original content when you hit play.
The grocery store has audio and video advertisements for prepared meals playing on a loop that you cannot avoid if you walk past the meat department.
The problem is a race to the bottom on the pricing consumers perceive, and then recovering that money by squeezing every possible touch point.
People have to have enough options to choose a company that doesn’t have to make these compromises.
Plaid could file a friend-of-the-court brief here, since they have (presumably!) strong legal grounds to assert that they are legally within their rights to scrape bank websites, as they're doing so as an authorized user-agent, and since browsers are just user-agents, etc.
1. Breach of Contract
2. Tortious Interference with a Contract
3. Unfair Competition by Misappropriation
4. Trespass
5. Trademark infringement
6. Dilution
7. Dilution under Texas State Law
8. False Designation of Origin
9. Copyright Infringement
10. CFAA
11. Violation of Texas Harmful Access by Computer Act
12. Unjust Enrichment
Capital One has a tool called "Paribus" that, if you connect your email, will monitor and get refunds if prices drop or the item doesn't arrive on time. They used to do it with Amazon; they'd tell you something didn't come in the Prime two day shipping window and give you a pre-prepared complaint to get a free month of Prime as compensation.
I'm surprised that an affiliate marketing company is willing to get in such a fight. Like I would have thought everything they did was blessed and even paid for by airlines and credit cards.
There may be a tension between AA and Citi, who provides the cards and likely shoulders the costs of the various signon bonuses and benefits out of their card revenue.
ex-RV employee - it's a powerhouse of data for every significant moment in your life - from health to utilities to mortgage and everything in between. I personally don't see anything wrong with this specifically.
> The interest was monetization of customer eyeballs, an American Airlines source shared that they wanted customers checking accounts at AA.com where they could be marketed to.
It seems like so many problems, annoyances and inconvenience in modern society are artificially created/maintained just to enable this disgusting industry. Imagine how more efficient things could be if this cancer was eradicated once and for all.
Yeah. Advertising ruins everything. Everything. They are directly and indirectly responsible for the poor state of consumer technology today. Privacy violations? Advertising. Borderline unusable websites? Advertising. Deliberately addictive social media? Advertising. Almost everything that's wrong with computing today is fueled by this industry's entitlement to our attention.
> Imagine how more efficient things could be if this cancer was eradicated once and for all.
Yes!! What I wouldn't give to see this entire industry banned from existence. It would solve so many problems it's ridiculous.
Well first you're going to have to convince people to pay for the things they use. History has shown us they have practically zero interest in doing so. People would rather trade their attention for free stuff than their money for priced goods and services.
The industry has grown out of this human behavior, not vv.
[edit] Humans have a limited capacity for making purchasing decisions. This is why in 20+ years nobody has made a successful micropayments based product or platform. Further, simply paywalling software or services means 99% of people will not use it. People also are uncomfortable with a system that just withdraws money from their account to pay people - and a universal subscription model isn't particularly tenable due to the competing interests of market participants.
If you can solve this problem with something other than ads, I will be your first investor because you're going to be the richest human on earth.
It's easy to say "ads bad" - ok, but the real question is what are we going to replace them with?
> It's easy to say "ads bad" - ok, but the real question is what are we going to replace them with?
I don't know and at this point I don't think it even matters. Every single time I discuss ads here on HN I learn new reasons to hate it. Some days ago I read a story about a politician who deliberately slowed down traffic so people would be forced to "contemplate" this garbage. The sheer audacity of these people never fails to impress me.
It's irredeemable and the fact that there's currently no alternative is no reason not to get rid of it. We really should just do it and let the chips fall where they may. People will figure out a way to make money. They have to, because the alternative is to go bankrupt. Maybe they'll make less money and that's absolutely fine.
> People would rather trade their attention for free stuff than their money for priced goods and services.
How would you explain the fact that people paid for cable and premium cable channels (e.g. HBO) in the past when free network TV has always been an option?
> How would you explain the fact that people paid for cable and premium cable channels (e.g. HBO) in the past when free network TV has always been an option?
The issue you're failing to separate out is that you're talking about entertainment with television. Websites are largely not comparable in entertainment value, and most are not entertaining at all. People will pay a lot for entertainment. That's why blogs are worth $... today and Netflix is worth $226 billion. If people would pay so much for eg blogs (or, again, any other comparable content), there'd be a $100 billion company extracting that monthly payment for producing volumes of written content on websites. Some blog network would have actually succeeded and become a global content juggernaut.
Most of the content online is not of high quality and people will not pay for it, or they'll pay so little for it as to be a sad joke.
Which website compares to the joy and value people extracted from Friends, Seinfeld, MASH, Fresh Prince, I Love Lucy, and dozens of other prominent TV shows from the past ~50 years. Much less the even higher production shows like Sopranos or Game of Thrones. There may be a select few and they're billion dollar services like Reddit with huge volumes of low value content. Do millions of people still talk lovingly about some websites from 2006 like they do decades later about I Love Lucy? Hell no they don't, only a tiny niche group of people does that.
People go back and watch movies over and over again for decades. They listen to the same songs/bands/albums regularly for decades.
Does the average person go back and dig up long dead websites and go through them start to finish on Archive.org, like they do old TV shows they enjoyed (Quantum Leap, ALF, Golden Girls, whatever). Hell no they don't, again, only a very very tiny group of people would do such a thing.
Most online text content is not very entertaining, even in the best case scenarios, that's the difference between the concepts.
Is there lots of funny, amusing, entertaining text content on eg Reddit? You bet. And people will pay pennies for it - if at all - because it's of low value compared to high quality, higher production value entertainment. They'll pay what it's worth, a pittance.
Easy - they paid for cable because it was an orange to OTA's apples. You'd only get a few OTA channels, and even then, not all of them would come in clearly. People paid for cable because it offered more channels. Most channels would still show commercials. Relatively few paid for premium channels, but it wasn't because they were commercial free (they were), it was because they offered better content (newest movies), in a time when piracy wasn't nearly as accessible.
Cable channels were also subsidized by advertising. You pay a decent amount of money per month to get access to exclusive content, but that cost still doesn't cover the costs of producing the media. If cable bills were double or triple the price, even without ads, many people probably would have been pushed to "free" TV.
While I agree that killing advertising is effectively impossible I'm not sure I agree that the problem is that people aren't willing to pay for things.
I think advertising dollars paying for services that users use is a convenient excuse that companies use to justify it, but it isn't an imperative for why it should exist.
If you can't get people to pay for your service when they know the real price maybe your service shouldn't exist, or shouldn't exist at the scale it does now.
That being said your main point that the industry grew organically based on human behavior certainly seems true. Banning advertising (like banning lobbying or corporate contributions to elected officials) is an impossible game of whack-a-mole. It can't be defined tightly enough to be outright banned (are you going to ban telling people you sell something they might like?) and the value derived from it is large enough that people will get creative when you ban the outcomes.
That being said I do think the most egregious versions should be regulated. Ads to kids, deceptive mail or email ads that look like invoices or bills, ads that are obviously lying about features or benefits, but enforcement is incredibly difficult. I'm not especially optimistic.
You can effectively kill it or curtail its noxious effects by banning common negative externalities of it.
The GDPR, though having a massive enforcement problem, disallows targeted ads and all the privacy violations typically associated with them.
You could make websites liable for the ads they display, automatically killing the "bottom of the barrel" of advertising such as chumboxes (Outbrain/Taboola/etc) and similar low-quality trash because the cost to review & audit these ads would outweigh any potential profits.
You could revoke Section 230 protection for social media companies that manipulate the reach of content to increase engagement (which is why every social platform switched from a chronological feed to an algorithmic one) as to discourage the practice and prevent them from profiting off intentionally pushing harmful or even illegal content to increase engagement.
> Well first you're going to have to convince people to pay for the things they use
> People would rather trade their attention for free stuff than their money for priced goods and services.
No, I don’t think this is true. I think this is something people in adtech tell themselves to sleep better at night. There’s lots of money in ads and free stuff, but there are many companies doing quite well by having customers pay for stuff.
Netflix is worth $200B plus, more than the ad tv networks. Those networks said for years that customers won’t pay and need ads. They were wrong. And I’d also add that they wanted more. They charged for cable channels and still sold ads.
People want choices and don’t mind paying for value.
I will say, I find it interesting that newspapers never developed an equivalent to "the daily paper."
Even in the 2000s I remember newspapers not costing more than a dollar or two. I would personally prefer a one-off payment of some small amount to get the website for the day or X amount of articles instead of some god-awful subscription that has to be cancelled via some shitty call center process where they try to stop you twenty times.
But what happens when nobody sees the ads anymore? I run enough ad blockers that I never see ads on the Internet. I don’t watch broadcast/commercial TV. I don’t listen to broadcast radio. In my area, public advertising like billboards is very rare. It really isn’t difficult to live a life with very few ads.
people keep saying things like this but really it's a question of what's available
music piracy is basically over thanks to spotify, apple music, youtube, bandcamp, etc
hardly anyone i know torrents movies anymore due to netflix, hbo, hulu
everyone has a phone plan instead of hopping around on free wifi constantly
people pay for lyft/uber instead of just skipping fare on the train
there exist social networks and communities that are effectively fee-supported. somethingawful and metafilter are some siloed examples, but there's the fediverse and stuff like mastodon where essentially people either join a specific community/clique that's funded by members or host their own instance.
i'm convinced what keeps most people on the main advertiser-controlled networks is they are run by huge corporations with service monopolies and network effect. remember, whatsapp got big on a dollar per year subscription
Don’t forget fake news and talk show hosts willing to say anything that gets people talking all for the purpose of selling crap products at exorbitant markups.
Democracy be damned, there’s money to be made on dietary supplements!
Devil's advocate: Advertising allows people to invest large amounts of money into products that don't charge end users. It allows largely equitable access to hugely powerful and complex information systems like Google search at no direct monetary cost to users.
Most people put the value of Google search on the order of thousands of dollars per year [1]. Imagine Google was a paid service: think of the academic disparity between wealthy students who had access to Google and poorer students who did not.
Google? You mean the billion dollar company that subverts web standards and gets away with it due to their browser monopoly? The one that nearly kills websites the second some advertiser becomes "concerned" over some content? The one that demonetizes actual creators for arbitrary reasons?
They are very much part of the problem. Google search? It's bad and gets worse every day because of advertising -- Google has stopped trying to fight SEO spammers.
Exactly! We ban spam, why can't we ban these advertisers? Makes no sense to me.
Advertising is mind hacking. Mind malware. It's about infecting people's minds with ideas nobody asked for. Why is it bad when some spammer does it but fine when advertisers turn our societies into a literal cyberpunk hell? Why do their "commercial interests" somehow legitimize their industry? Nobody consented to it. Why do they think they can violate our minds without our consent?
Unfortunately, like real cancer, it has too many forms and too many causes to have a simple way of eradicating it. And unlike real cancer, nobody is seriously investing in fixing it. Maybe some kind of paradigm shift is required to make the approach of "we hold data about you that you need hostage so we could sell you stuff and control your behavior" no longer acceptable. After all, you wouldn't accept it if the car dealer were the only one who had access to your car mileage or maintenance records, and would only tell you any information about it after you listened to the sales pitch. People would be grabbing torches and pitchforks if that happened. But for other businesses the same behavior is seen as OK.
>car dealer would be the only one who had access to your car mileage or maintenance records
So yea, I have the next billion dollar idea. The gas gauge app. $3.99 a month to know how much farther you can travel. Also, we remove the gauge from inside the car.
Any car that has a screen could have a built-in diagnostic tool displaying all the trouble codes, live data from sensors, etc. The computer behind the screen already has access to said data as it's connected to the various data buses.
But yet even in 2022 the best we can do is "service engine soon" and getting more information than that requires an expensive visit to the dealership.
I think there are apps that can read OBD-II codes? Though usually they're kinda cryptic, and fixing the underlying problems is often beyond what an average motorist could do.
Ooh lord, this reminds me of Mint and their useless email alerts. Every alert is along the lines of “You’ve been charged a foreign transaction fee! Log in to see what / who / when / how much!”. This is obviously to drive engagement at the expense of UX and, you know, actually making a useful service. I absolutely abhor this practice and I’ve written a few feedback emails to explain as much. As if mining all my transaction data wasn’t payment enough, I must also endure irrelevant marketing for yet another credit card or home loan I don’t need. It’s further annoying that despite having a complete view of my finances, they can still fail so stupendously at showing me any ad with even the slightest relevance. I personally feel that although we live in this age of unprecedented consumer data, the actual useful insights leveraged by companies to sell things are still in their infancy. I basically see all consumer data collection companies as ripping off a few parties: advertisers they’re selling the data to, and in turn, companies paying to run “targeted” ads, and of course, consumers. There is a huge consumer appetite for RELEVANT products that could make their lives better or solve even the smallest inconvenience, and yet that appetite is met instead with a deluge of poorly-designed, ill-conceived, ready-for-the-dumpster products. Some people are starting to push back, but it’s a massive problem with no easy solution. Related: https://www.pinkbike.com/news/mechanics-petition-for-repaira...
Or at least offer an option to pay to remove the annoyances.
HBO was a great example of this in the pre-streaming days.
"Want to see great shows and movies with no commercials? Pay the extra for cable + HBO and you won't have to see them again!"
This default, "it's all free BUT you have to see ads and there are no alternatives" seems to be the problem. The apps described in the article are effectively recognizing this consumer surplus/need and acting on it.
The problem is that once you segment your customers into paying and non-paying, the non-paying customers become much less valuable to advertisers and the paying customers become much more valuable.
So the service needs to charge even more, which makes paying customers even more valuable in a feedback loop. The usual result is that eventually the service can’t help themselves and starts showing paying customers “a few” ads. Then the definition of a few gets larger and larger.
Outside of a very few markets that prey on desperate people's last dollars (payday loans, etc), most advertising's objective is to get people to buy something - advertising to those who can't afford it is useless.
Advertisers want the attention of people with disposable income. If someone's paying for something as superfluous as a streaming service, then they must have more than enough money to spend on whatever they want to sell.
The more you pay in general, the more they want you to pay them specifically.
It doesn't work. Paying to avoid ads makes you more valuable to advertisers. They will offer more money for your attention and some executives will eventually think they're leaving money on the table by not doing it.
The problem with advertising is that the price of an ad view directly correlates with how wealthy the viewer is. Being able to opt-out would essentially make the advertising platform worthless as the only audience would be broke people who can't afford to pay to opt-out and are thus very unlikely to be able to afford your product.
The only way would be if regulation either mandates a reasonably-priced ad-free tier (priced at the average revenue from an ad-viewing user) or other restrictions (GDPR but actually enforced, the website being liable for the ads it shows, etc) that would make advertising completely unprofitable.
> The apps described in the article are effectively recognizing this consumer surplus/need and acting on it.
Presumably, AA is pissed off because the people that value their time enough and have the skills to set up and use an alternative are people they'd very much want looking at their ads, much more so than the plebs who already use the website and see the ads.
"Industry could not benefit from its increased productivity without a substantial increase in consumer spending. This contributed to the development of mass marketing designed to influence the population's economic behavior on a larger scale.[24] "
That's a fairly bizarre and Byzantine approach to legalizing screen scraping. If we want to make screen scraping affirmatively legal, we can simply do that.
We don't need to anthropomorphize HTML/JSON parsing as equivalent to human vision, and then declare any efforts to restrict that vision to be a violation of the Americans with Disabilities Act. Lol.
We should just make it legal, but this seems like a convenient "backdoor" way to do it - either the company site is fully screen-reader enabled 100% or scraping is allowed of any kind.
It isn't bizarre; screen scraping and re-presenting information to the user in a way that is convenient for them is absolutely essential to many assistive technologies.
The airlines are already legally required to make their websites accessible in the United States, including to screen readers, by virtue of the Air Carrier Access Act. They already all comply with this.
Unrelated to screen readers, American Airlines and Red Ventures are involved in a dispute about the _legality_ of screen scraping. The dispute involves lawsuits, but does not involve technological measures to obfuscate or otherwise obstruct screen readers, or even screen scraping.
Your comment seemed to suggest that we should make _screen scraping_ legal by considering it the same as vision, and that any efforts -- including legal efforts -- to prevent screen scraping should run afoul of the ADA.
That was my understanding, and that's indeed a bizarre view.
If your point was limited simply to making screen readers accessible, then I think your point is perfectly fine, but kind of a non sequitur, because this dispute is not about technological measures to control screen scraping, it's about legal measures.
Is a website considered a private space (maybe not, but you agree to a TOS when you fly AA)? If so, you should have the right to say how your space is used by others, regardless of if we believe marketing is ethical or not. If AA doesn't want their data being used anywhere else, sounds like they should have the right to do that, just like we have the right to remove people from our private spaces if they do things against our rules. You also have the choice not to fly AA.
Their servers are their own private spaces. That's the line. I will not do anything that subverts their control of their own computers. I will not make their servers execute my code. Their website, however, is just data they sent to my computer.
They have exactly zero right to dictate what I do on my computer. Their HTML is being rendered on my computer and I absolutely reserve the right to delete or modify elements in any way and for any reason. Their javascript is executing on my computer and I absolutely reserve the right to delete and modify functions if I deem necessary.
The issue with this sentiment, in my opinion, is that users have access to this data by logging in and they've provided the app with that access by providing their credentials. Unless a court is going to go down the road of preventing users from sharing their own information, I don't see how AA can do anything about this. If you can view this info in a web browser, you can scrape it.
They'll invent some "acceptable use policy" document that says you can only talk to their servers using their software which of course does what they want, namely monetize your attention by showing you ads. Then they'll trick and coerce people into accepting or signing it.
Complete bullshit but it's totally something these judges would understand. One would think these people would actually protect consumers from corporate abuse...
Isn't the whole point of that bit that we've become spoiled and indifferent, to the point where we no longer appreciate how amazing and futuristic our world has become?
I think you're right, but it's hard to argue that flying is both an amazing human achievement and is also just the worst.
Edit: Wow, re-listening to the bit, my memory definitely failed me. Basically a rant on how we should be grateful that airline companies are even able to get a plane in the air. I think we can do better, personally.
> Imagine how more efficient things could be if this cancer was eradicated once and for all.
People don't like paying and they don't mind ads. Ads enable a lot of products and services that would otherwise not exist. Ads might be a net negative, but they certainly have at least some positives.
One thing that has changed in the airline industry since The Points Guy started is that airline frequent-flier miles are worth more than the actual airlines themselves, making the airlines essentially banks.
"Unless otherwise noted, all information, AAdvantage® account information, articles, data, images, passwords, Personal Identification Numbers ("PINs"), screens, text, user names, Web pages, or other materials (collectively "Content") appearing on the Site are the exclusive property of American Airlines Group, Inc., or American Airlines, Inc., or their subsidiaries and affiliates"
"You may not copy, display, distribute, download, license, modify, publish, re-post, reproduce, reuse, sell, transmit, use to create a derivative work, or otherwise use the content of the Site for public or commercial purposes. Nothing on the Site shall be construed to confer any grant or license of any intellectual property rights, whether by estoppel, by implication, or otherwise."
But just because AA says so on their website does not make it so. Is there any evidence that they actually agreed to the TOS you linked? That they even read or were presented with a copy of it? And even if they agreed to the terms, what is the appropriate remedy for breaching the agreement?
The relevant word is "you". In this case, TPG (who provided an app) is not "you".
That would instead be the AAdvantage (AA's reward program) member, who agreed to the TOS originally, and who provided their login information to the TPG app so that it can scrape information about rewards etc.
So... the lawsuit from AA's side seems pretty bizarre, if the facts as presented in this article are true. If AA wanted to stop this, presumably they should sue their own rewards members who use the TPG app. But obviously that won't happen.
So fundamentally, this seems a case of whether the toolmaker is liable for an individual using their tool in a TOS-violating way.
Which seems pretty insane, if AA wins. If I pull open Chrome developer tools after logging into a website that requires me not to inspect its source, why would Google be liable?
---
And as a side note, "Because privacy and security" is quickly becoming the corporate anti-interoperability equivalent of "Think of the children."
The default should be that scraping is allowed.
If companies actually care about privacy and security, then they can offer an API and encourage access through it. But limiting scraping and not offering API access (or intentionally crippling it) is bullshit.
When I scrape, I attach a header that says roughly “By responding to this request, the provider allows me to use the response for my own personal use; and accepts that this overrides any terms stated on the web page.”
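For anyone curious what that looks like in practice, here is a minimal sketch in Python using only the standard library. The header name `X-Client-Terms`, the shortened wording, and the `build_request` helper are all made up for illustration; nothing here is a standard HTTP header, and whether such a header has any legal effect is exactly the open question.

```python
import urllib.request

# Hypothetical header name and wording; this is not a standard HTTP header.
TERMS_HEADER = "X-Client-Terms"
TERMS_TEXT = (
    "By responding to this request, the provider allows me to use the "
    "response for my own personal use."
)


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries the client's own 'terms' header."""
    return urllib.request.Request(url, headers={TERMS_HEADER: TERMS_TEXT})


# The request is only constructed here; nothing is sent over the network.
req = build_request("https://example.com/account")
```

Passing `req` to `urllib.request.urlopen` would actually send it, header included. The server is of course free to ignore the header entirely.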
Cut and dried? A lawyer friend told me, informally, that if it went to court a younger judge would throw out my “contract” because it’s silly, but a more senior judge might well take the view that both contracts, my version and that of the web site, are equally silly, and both fall short of the “meeting of minds” standard.
Disclaimer: Founder of a company in the travel space that relies heavily on scraping.
What's interesting here is that there's conflicting precedent... and fundamentally that is what matters. hiQ vs LinkedIn is a great example of accessing data via a scraper that potentially violates the Terms of Service agreement, but found that Microsoft/LinkedIn violated antitrust laws. EF Cultural Travel vs Explorica is another example favoring scrapers. Against that, you have Facebook vs Power.com. Speaking personally, I'd like for clear and explicit rules about what is kosher to scrape and what isn't. Ticket bots are clearly problematic and deserve to burn in hell. Overly aggressive scrapers that incur load shouldn't get a free ride, but stuff like this that is initiated at the client's request and accessing solely the client's data... I personally believe this should be fair use and would like to see that show up in the law somewhere.
A fair test would be that if you are able to delegate manually retrieving the information to a friend or colleague, you should be allowed to delegate it to a machine.
> You may not copy, display, distribute, download, license, modify, publish, re-post, reproduce, reuse, sell, transmit, use to create a derivative work, or otherwise use the content of the Site for public or commercial purposes. [emphasis mine]
Seems like any web browser by a for-profit company would immediately be in breach.
> Seems like any web browser by a for-profit company would immediately be in breach
A web browser cannot be in breach, because a web browser is not a legal entity capable of being a party to an agreement. The entity in breach would be the person using the browser, if they were using it in a way that was against the TOS.
BTW, that TOS is ambiguous. I see two ways it can be parsed. First,
> You may not (copy, display, distribute, download, license, modify, publish, re-post, reproduce, reuse, sell, transmit, use to create a derivative work, or (otherwise use the content of the Site for public or commercial purposes))
I.e., "otherwise use the content of the Site for public or commercial purposes" is one item in the list of prohibited things. Second,
> You may not (copy, display, distribute, download, license, modify, publish, re-post, reproduce, reuse, sell, transmit, use to create a derivative work, or otherwise use the content of the Site) (for public or commercial purposes)
I.e., "otherwise use the content of the Site" is one of the list items, and "for public or commercial purposes" modifies the whole list.
If it is the latter it is saying you can do what you want if it is not for public or commercial purposes.
If it is the former, it is saying you may not do any of the explicitly listed things, and you can't do anything not listed if you are doing that thing for public or commercial purposes. You can only do things that are not explicitly listed and then only if they are private and non-commercial.
I'd guess they meant the latter, because under the former it is hard to see any way to use the site at all without violating the TOS. If that is the case, they should have written it as "You may not for public or commercial purposes <list of things>".
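To make the difference concrete, here is a rough sketch of the two readings as Python predicates. The function names and the abbreviated verb list are my own, purely for illustration; this models the grammar of the clause, not how a court would read it.

```python
# A few of the verbs the clause lists explicitly (abbreviated for the sketch).
LISTED = {"copy", "display", "download", "reproduce"}


def prohibited_former(action: str, public_or_commercial: bool) -> bool:
    """Former reading: listed verbs are always banned; any other use is
    banned only when done for public or commercial purposes."""
    return action in LISTED or public_or_commercial


def prohibited_latter(action: str, public_or_commercial: bool) -> bool:
    """Latter reading: 'for public or commercial purposes' modifies the
    whole list, so only public or commercial use is banned."""
    return public_or_commercial


# Privately downloading a page is where the two readings disagree.
private_download = ("download", False)
```

Under the former reading `prohibited_former(*private_download)` is true, which is why it is hard to use the site at all; under the latter reading the same call to `prohibited_latter` is false.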
On the other hand, it wouldn't actually be all that surprising for a big company to write a TOS that technically prohibits their users from actually using the site, so who knows?
It's going to be similar to this suit [0] with LinkedIn, which they lost. The only difference is that the data is behind an auth wall, which LinkedIn's content wasn't.
I thought the fact that LinkedIn's content was _not_ behind a paywall was the main point. E.g. it's fine to scrape whatever they did from LinkedIn because it was public to all.
I mean, is that legally possible to do? If they make information freely available on their website how are they legally able to control how that information is used? That seems kind of ridiculous to me.
They can say whatever they want. It is a completely different thing whether that is legal or not.
I can say in a written agreement with my workers that they are my slaves, or that they can not work anywhere else ever. They can even accept those terms, but that does not make it legal.
There are always fair use clauses that copyright law accepts. That data about a customer is the exclusive property of a company is not really true. In some way it is actually the property of the customer.
> I can say in a written agreement with my workers that they are my slaves, or that they can not work anywhere else ever. They can even accept those terms, but that does not make it legal.
True, but there isn't a long standing practice of binding people to the terms and conditions of a service regardless of fastidiousness on the part of the user (whether that's warranted / reasonable or not).
If I hang a book on a public street and put on the first page "if you read from this book you owe me money and if you use the information contained within I will sue you because it's all mine", it's not at all cut and dried. Indeed, I think a lot of people would note it sounds like bullshit.
An additional observation is that they're claiming ownership of, and exclusive rights over, things which:
a) as broken and dystopian as our intellectual property laws are, it's not immediately apparent you can claim exclusive ownership of. Can you claim property rights on a PIN? On your customer's name? On your customer's phone number? Is data even ownable?
b) as above, even if you could, it's not apparent that the airline is the one with the greatest claim to that ownership. Does the airline own my name if I fly with them?
c) the issue with screen scraping may just be scale, automation and commercial value, and it's not apparent you can just wilfully ban competitors from that because you say so. Indeed, is it a violation if an individual uses the information within without screen scraping? Because a lot of those exclusions and terms would seem to restrict the use of the website and the information for its actual intended purpose on an individual level, screen scraping or not.
d) what are we doing when we read a website but biological screen scraping?
" American Airlines also reserves the right in its sole and unfettered discretion to deny you access to the Site at any time. "
"You agree that this Agreement is made and entered into in Tarrant County, Texas. You agree that Texas law governs this Agreement's interpretation and/or any dispute arising from your access to, dealings with, or use of the Site, without regard to conflicts of law principles. Any lawsuit brought by you related to your access to, dealings with, or use of the Site must be brought in the state or federal courts of Tarrant County, Texas. You agree and understand that you will not bring against American Airlines Group, Inc., American Airlines, Inc., or any of its affiliated entities, agents, directors, employees, and/or officers any class action lawsuit related to your access to, dealings with, or use of the Site."
If I understand correctly, the enforceability of these TOS just added to the bottom of the page is uncertain. After all, I wasn’t forced to read and agree to the TOS on AA’s website like I am for say, Twitter or Facebook.
The question is: sue you for what? How do they quantify the damages? It would be an interesting lawsuit because if AA wins, then it means they are deliberately admitting that customers who regularly track their points cause financial harm to AA.
It obviously causes them harm. The points are often redeemed at a discount to their regular tickets... which are often loss leaders for their points business.
Most airlines lose money on every ticket, and make it up in their rewards programs, only to lose money again on redemption. They really only make money on some of the inefficiencies.
Right, I know this to be true. But it's one thing for someone to infer that it causes harm, but it's another thing if they legally say this is the case.
It reminds me of card counting - it's just playing the game, but in a way that's qualitatively different from typical play. Just like card counting, the house doesn't like people who win too much.
The more relevant sections are a little further down:
Your account information is owned by and proprietary to American Airlines. While you may access your account information through the Site, you may not give access to your account to any person or entity other than a member of your household or a person that you directly supervise as part of your career or employment. You may not give access to your account to any third party on-line service, including, but not limited to any mileage management service, mileage tracking service, or mileage aggregation service.
You must access your account information directly through the Site and not through a third party Website, including but not limited to any mileage management service, mileage tracking service, or mileage aggregation service. You also violate this Agreement if you enable an AAdvantage member to access account information without visiting the Site.
> "You may not copy, display, distribute, download, license, modify, publish, re-post, reproduce, reuse, sell, transmit, use to create a derivative work, or otherwise use the content of the Site for public or commercial purposes. Nothing on the Site shall be construed to confer any grant or license of any intellectual property rights, whether by estoppel, by implication, or otherwise."
> Seems pretty cut-and-dry to me.
EULA for my hn comments: If you're reading this, you must grant me 'Droit du seigneur' and name your firstborn 'decebalus1'
If I'm understanding the situation correctly (which I may not be) it is AA rewards members who are using TPG's app to access the site. These people give the TPG app their AA login information so it can login to their AA account to get their information.
Arguably it is these AA rewards members who are scraping the site. TPG just supplied the tool those members are using. It would then be the AA rewards members who are the ones who have a contract with the site, not TPG.
Putting something on the site does not establish a legal right. I can make a site saying anybody who looks at it owes me a million dollars, that wouldn't mean they actually do. And I'm not sure my information stored at AA account is actually an exclusive property of AA. At the minimum, it's not cut-and-dry at all, and probably depends on current legislation and caselaw.
I would take "for commercial purposes" to mean aggregating the data for wholesale, or offering some kind of analytics service across users.
Whereas showing ads in the same client where the user is also viewing their own data is incidental. The purpose is to provide a useful intermediary service, and the ads help pay for that service, in much the same way a search engine can reproduce data from websites and show ads at the same time.