A megabyte a day still seems excessive.

dgrin91 · on Jan 13, 2024

If you agree to some form of anonymous tracking for diagnostics I can see 1mb being reasonable. This would be periodic update on things like usage levels, part quality, etc.

Most likely that tracking acceptance is buried in some 500 page eula, but thats a separate issue.

solardev · on Jan 13, 2024

Why do you need an entire megabyte for that? Even if you did laundry five times a day, it shouldn't take more than a few bytes to store a few metrics.

Even if you're lazy and uploaded an uncompressed JSON array of objects, that shouldn't be more than a few kB. Way less if you compress it.

A megabyte is a LOT of data.

crazygringo · on Jan 13, 2024

I could easily see it measuring the forces and weight on the drum every 5 seconds (or even every 10 ms) during the whole wash, to be able to produce charts of vibration patterns, that engineers could use to correlate with failure. Remember -- when you're spinning at high speed to wring out the water, it's actually some pretty crazy strong forces.

Or other things measured every ~second, like stuff related to the motor, temperatues, humidity, etc. and other diagnostics.

Seems really easy to generate a megabyte if you consider time series. Even easier if it's in XML or JSON rather than a CSV.

Spooky23 · on Jan 14, 2024

When I worked in a storage group years ago, the system that controls swipe card access for a building generated something like 3TB of Java exceptions a month.

Because of the criticality, it was on high tier reliable SAN storage, replicated to a second site. IIRC, storage was like $80 Gb/mo.

phanimahesh · on Jan 14, 2024

Love how you said exceptions, rather than just logs. I am unfortunately (painfully) aware of exactly what you mean. The storage costs were the cherry on the top. But seriously, $240k and nobody raised a stink?

Spooky23 · on Jan 14, 2024

It’s one of those things that cloud storage helps with.

Because it was on prem, the chargeback model was associated with the business unit and not super granular. It got lumped in with another business function because it should be a trivial workload.

I found it when I was doing estimates for a new platform and the app’s growth numbers didn’t add up! Even at $240k, it wasn’t an obvious outlier.

solardev · on Jan 13, 2024

You really think they're going to measure all that, upload it, send it to some expensive engineer, have them try to physically model the error, and then... what?

That might make sense during development, but there's no way they do that in a consumer product. If a part breaks they're just going to send out a replacement or the repair guy is gonna get some third party part. Recording that much detail would just be noise.

Even if they had specific parts sensors (doubt it, for costs), they could just process that locally and send up an error code, not the whole log.

I find that all pretty hard to believe, but if anyone has evidence to the contrary, I'd be glad to be proven wrong. I had a LG washing machine bought new a few years ago promising all sorts of bells and whistles and app integrations. But it was super janky and cheaply made, the app integration was terrible, the on board memory would lose its configured settings, the entire LCD broke after a few weeks... it was not what I would consider well-engineered at all. If it was sending a megabyte a day I'd just assume it was yet another bug, not some forward thinking QA.

tayo42 · on Jan 13, 2024

I would suspect they measure everything and use almost none of it. Like most web services. This was a common complaint and low hanging fruit optimization, that people were storing metrics that never got read. Just in case.

voidfunc · on Jan 14, 2024

It's cute you think there is some intelligence to this design... it's just whatever some PM/exec dreamed up and some low-level engineer implemented based on requirements. The data is likely just sitting there collecting digital dust.

refulgentis · on Jan 13, 2024

I'm not really sure what you're asking:

Is he sure they're sending < 1 MB a day? Yeah.

Is he sure it's plausible it's measuring time series data? Yeah.

Is it plausible they measure vibration patterns? Yeah.

It's not about like "oh we'll send an electrical engineer to fix your specific vibration pattern", it's "we can collect data from the field to make forward-looking decisions": ex. maybe we switch supplier mix for replacement motors, and a data scientist ends up finding 6 months later that something changed in June 2023 where serviced washers in New York report dramatically more intense vibrations and now we know to go talk to the supplier who gained mix.

It's error logs for non-tech. YMMV on individual team quality if they actually follow-up. Conceptually, Big Data(tm) is something CEOs have been hearing regularly since, what, 2014? So definitely plausible.

nwallin · on Jan 14, 2024

I would imagine that they just log everything. Serial number, temperature, which cycle is used, time of day, how long it takes to fill the washer, how long it takes to drain the washer. Everything. Put all data in a great big database. When something needs to be fixed and is covered by the warranty, mark that the failed part is associated with that serial number.

Then do some sort of a regression to discover what logged parameters are associated with what failure modes/broken parts. If washers that take less time to fill up have higher than normal failure rates for some elbow joint, that probably means that high water pressure causes the elbow joint to fail. If a certain elbow join's failure rate is simply correlated with the number of cycles, that tells you something different. If a certain elbow join has a high failure rate that's not associated with anything, that probably just means it's a shitty part. But you learn something.

By logging everything and running a regression analysis, when you develop next year's model, you know where to improve. Now when you tell an expensive engineer, "This elbow join failed on 1000 units of revision F. Make it fail on 100 or less units of revision G." you can also give them a starting point to work with.

I'm a software guy. If I get 10 crash dumps, and you don't tell me anything, I don't necessarily know what to work with. If you give me those same 10 crash dumps and tell me that 9 of them had the language set to Arabic or Hebrew I know it's probably a BOM bug. Same thing.

Or you just sell the data to ad companies and let them figure out how to get value from it.

moritzwarhier · on Jan 14, 2024

Well if you claim warranty, I'd expect them to want to have that data.

Maybe they also just sell it together with your advertising id, why not use washing patterns for deanonimyzation ...

pstuart · on Jan 13, 2024

If they're not doing that they should, albeit finding a way to make the additional cost minor.

Collecting all that data for analysis would be incredibly valuable, especially considering the wealth of analysis tools today.

hutzlibu · on Jan 13, 2024

"Even easier if it's in XML or JSON rather than a CSV."

Yes indeed, but compression algorithms are not that new.

crazygringo · on Jan 14, 2024

Sure, but it's also easy to imagine an engineer just forgetting to or not bothering because it wasn't in the spec.

mortenjorck · on Jan 13, 2024

> Even if you're lazy and uploaded an uncompressed JSON array of objects, that shouldn't be more than a few kB.

See, that’s just one lazy engineer writing one telemetry solution. Multiply that by several engineers across several teams, each cobbling together a different telemetry solution for a different product manager’s initiative using a different stack of JavaScript libraries, throw in some poorly-rolled-out infrastructure changes a few years later resulting in some unanticipated retry loops, and I think you can hit that megabyte per day easily enough.

geodel · on Jan 14, 2024

> A megabyte is a LOT of data.

Well it is, but in this Cloud Native world the clueless management and IT engineers have been convinced that single micro service running on 50 kubernetes pods and generating 20 MB trace logs for single transaction is normal.

Now once we have built this inefficiency industry wide nobody is there to wake the management up about huge wastage of resources. They are floating in this lurid dream of "ultra smart" machines generating gigabytes of precious intelligence about customer behavior for target ads

gberger · on Jan 13, 2024

It's 12 bytes per second, or less than 1kB per minute. Doesn't seem like much.

poisonborz · on Jan 13, 2024

For telemetry on a washing machine, it is enormous.

jethro_tell · on Jan 13, 2024

That's WiFi/Bluetooth signal strength mapping amounts of data.

Aeolun · on Jan 14, 2024

It uploads a novel a day. That’s a lot!

NikkiA · on Jan 13, 2024

It's probably a single probe packet once a minute.

tim-- · on Jan 14, 2024

Which is likely what is happening here. The LG "ThinQ" washing machines do allow for remote starts: https://github.com/ollo69/ha-smartthinq-sensors/issues/234

> After 10 minutes (IIRC) in remote start mode without starting the machine goes to sleep. You must use the smartthinq_sensors.wake_up command to wake it up, then the remote_start command to start it.

penneyd · on Jan 14, 2024

Just checked mine which is used all the time and it's about 1.5mb per week.

jstummbillig · on Jan 13, 2024

It seems completely inconsequential

Brian_K_White · on Jan 13, 2024

It seems completely inexplicable.

I don't care if it's a small percentage of my symmetric gigabit fiber, I only care why they supposedly need it and where it ends up.

A phone number or a timestamp is a tiny amount of data.

In the quaint olden days, you had to go out of your way to volunteer to be a part of some study to have any aspect of your activity recorded every few seconds 24/7 to be collected and analysed like that.

It also doesn't matter that my washing machine usage might not seem like sensitive info. It's wrong by default rather than OK by default. You need a compelling need to justify it, not the other way around. It needs to be necessary to prevent one dead baby every day or something, not the other way around. No one needs to produce any convincing example of harm for that kind of collection to be wrong.

But even so, it's easy to point out how literally any data can end up being sensitive. Washing machine usage could lead to all kinds of assumptions like, this person does not actually live where they say they do (which impacts all kinds of things), or this person is running an illegal business, or housing too many people for the rating of the dwelling or allowed by the lease, etc, or just another data point tracking your movements in general. Or even merely to get out of 10% of warranty claims.

Uvix · on Jan 14, 2024

The users did go out of their way to volunteer, by hooking the washing machine up to their network.

Brian_K_White · on Jan 14, 2024

They did not. They went out of their way to buy a washing machine and maybe use some monitoring or alerting feature it offers. I decline to believe you do not know this.

jwalton · on Jan 14, 2024

War and Peace is 3mb as uncompressed plaintext[1]. 1mb a day is a lot.

1: https://gutenberg.org/ebooks/2600

lostlogin · on Jan 14, 2024

Would you prefer to read War and Peace, or the (shorter) washing machine logs?

It’s touch and go for me. The variables names in washing machine code would likely have be less easily confused.