So if you get fewer HTTP bytes than expected, it's an HTTP response error and you throw the whole thing away. For example, this sort of situation happens when streaming over HTTP. The server first has to send the response headers, which would be a simple 200/206, then the data, which could have a much more complicated code path. If there is an error in that data code path, all you can do is close the connection and trigger an HTTP error, since fewer bytes were delivered than advertised. The client needs to detect this and retry. While this may seem uncommon, it's well-understood behavior for HTTP systems.
Or more likely for a range download, you use the bytes you got and keep making further range requests, to get the whole resource in however many tries it takes. And the 403 would come through as soon as you hit an uncached part of the resource.
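Roughly, that retry loop could look like this (a minimal sketch: Python stdlib only, a server that honors Range requests, made-up chunk size; not any particular client's implementation):

```python
# Resume a download with Range requests, keeping whatever bytes arrive before
# a truncated response instead of throwing them away.
import http.client
import urllib.request

def range_download(url, total_size, chunk=8 << 20):  # 8 MiB per request (assumed)
    data = bytearray()
    while len(data) < total_size:
        end = min(len(data) + chunk, total_size) - 1
        req = urllib.request.Request(url, headers={"Range": f"bytes={len(data)}-{end}"})
        try:
            # A 403 on an uncached part of the resource surfaces here as an HTTPError.
            with urllib.request.urlopen(req) as resp:
                data += resp.read()
        except http.client.IncompleteRead as e:
            # Fewer bytes delivered than advertised: keep them and re-request the rest.
            data += e.partial
    return bytes(data)
```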
The gist is that nomadic Romani people settled around Syria and wrote it. The language and writing are a blend of several languages and cultures. The evidence in the videos backs this up pretty well.
Thanks for sharing. I don't have the expertise to evaluate the claims, but it's certainly an interesting theory and a compelling story. Just wish he'd continued the presentation of his work.
A lot of people were speculating he would make his findings official and didn't want to overshare.
I do see a comment about his theory being debunked. That would be expected: the language used was a mashup of several existing languages, so it's possible a lot of what was written is copy-pasta gibberish. However, the video points out a lot of cultural aspects of the book which support a Romani origin.
I watched some of the videos in the link given by nyc_pizzadev, and recall seeing them when they were first released. He's continuing the work of Stephen Bax. Romani is convenient as it allows him to pick and choose words borrowed from various languages into Romani over a wide area, so if Farsi doesn't fit, maybe Bulgarian or Uzbek might, whichever is the most convenient. But until he translates some of the VMS (a few pages in different parts of the manuscript would suffice), and his translation isn't nonsensical, he hasn't solved it.
Check out Jason Ladanye. He’s a magician who uses shuffle math to place cards exactly where he wants them to be in the deck. Both impressive and scary.
Go see him in person. He's now doing in-person shows. I had to see it live in order to really believe he was doing what he showed on his social media. Absolutely mind-blowing. He's my favorite.
Ha, came here to post about Ladanye. I've been watching all his 2-minute shorts and I still can't wrap my brain around the mental/mathematical skill, or the astounding precision of his fingers.
Another takeaway is the jaw-dropping amount of practice he’s put into it, and his total dedication to perfection (zero mistakes allowed, ever). He speaks at length about this in interviews, and in fact part of his act is to explain that it is not magic at all, but his full commitment to mastering the craft. Something we can all reflect on.
High-quality VOD (i.e. streaming a 4K movie). HTTP-backed block file systems, where each block needs to be streamed reliably to fulfill the read() call plus read-ahead.
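A minimal sketch of the block-filesystem idea (block size, helper names, and the synchronous fetching are assumptions for illustration, not any real implementation):

```python
# Map a POSIX-style read(offset, size) onto block-aligned HTTP Range requests.
import urllib.request

BLOCK = 4 << 20  # assumed 4 MiB block size

def fetch_block(url, idx):
    hdr = {"Range": f"bytes={idx * BLOCK}-{(idx + 1) * BLOCK - 1}"}
    with urllib.request.urlopen(urllib.request.Request(url, headers=hdr)) as resp:
        return resp.read()  # each block must arrive completely, or the read() fails

def read(url, offset, size, cache={}):
    first, last = offset // BLOCK, (offset + size - 1) // BLOCK
    for idx in range(first, last + 1):
        if idx not in cache:
            cache[idx] = fetch_block(url, idx)
    # A real implementation would also prefetch block last+1 asynchronously
    # (read-ahead) so sequential 4K playback doesn't stall on the network.
    buf = b"".join(cache[i] for i in range(first, last + 1))
    start = offset - first * BLOCK
    return buf[start:start + size]
```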
For VOD, can I just open a connection, send a single message, and then have the stream continue forever? HTTP is a message-oriented protocol; I can't just send an infinite-length HTTP message that gets processed as it arrives, or can I? Meaning, can I upload something not that small, like a terabyte of video data, over HTTP?
One thing not mentioned often is that a lot of networks will drop UDP packets first when encountering congestion. The thinking is that those packets will not re-transmit, so it’s an effective means to shed excess traffic. Given we now have protocols that aggressively re-transmit on UDP, I wonder how that has changed things. I do seem to remember QUIC having re-transmit issues (vs HTTP1/2) years ago because of this.
Looks like a marketing piece publicizing that CERN is using WD HDD products at scale with no technical details. To make matters worse, the WD product links don’t even work!
CERN publicizing their HGST Ultrastar use [0] in 2013 was what got me started buying their drives, and I never had issues with them. HGST is now part of Western Digital [1].
The last time I had to buy disks I switched to Seagate Exos X and figured I'd continue buying them. I think it was one of the Backblaze Drive Stats reports that made me buy them. I like the drives.
So CERN is now doing this kind of advertising campaign again:
> When Bonfillou shared the requirements from the next generation collider, the team suggested testing the company’s new series of JBODs (Just a Bunch of Drives), the Ultrastar hybrid storage platforms.
Since the project is expected to start in 2029, add at least 5 more years for CERN to collect data on the drive stats; that's a long time to wait.
Does anyone here know if WD's Ultrastar drives are still as good as back then, when HGST was HGST? Was it just a brand change, with everything else (R&D team, design, production) staying the same and separate from WD?
These codes aren't talking HTTP. They are talking POSIX to a real filesystem. The problem is that cloud-based FUSE mounts are never as reliable as a real filesystem (either a local POSIX one or NFS or SMB): they will "just hang" at random times, and you need some sort of external timeout to kill the process and restart the job, and possibly the host.
I've used all the main FUSE cloud FS (gcsfuse, s3-fuse, rclone, etc) and they all end up falling over in prod.
I think a better approach would be to port all the important science codes to work with file formats like parquet and use user-space access libraries linked into the application, and both the access library and the user code handle errors robustly. This is how systems like mapreduce work, and in my experience they work far more reliably than FUSE-mounts when dealing with 10s to 100s of TBs.
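A minimal sketch of that pattern, assuming pyarrow with its GCS filesystem support (bucket path and retry policy are made up):

```python
# Read Parquet directly from object storage via a user-space library and
# handle transient errors in the application, instead of relying on a FUSE mount.
import pyarrow.dataset as ds
from pyarrow import fs

def load_table(path, retries=3):
    gcs = fs.GcsFileSystem()  # or fs.S3FileSystem() for S3
    for attempt in range(retries):
        try:
            return ds.dataset(path, filesystem=gcs, format="parquet").to_table()
        except OSError:
            # Transient storage errors surface here; retry (a real version would
            # back off) rather than hanging forever like a wedged FUSE mount.
            if attempt == retries - 1:
                raise

table = load_table("my-bucket/experiment/run-42/")  # hypothetical path
```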
Does anyone have any experience on how this works at scale?
Let’s say I have a directory tree with 100MM files in a nested structure, where the average file is 4+ directories deep. When I `ls` the top few directories, is it fast? How long until I discover updates?
Reading the docs, it looks like it’s using this API for traversal [0]?
What about metadata like creation times, permissions, owner, group?
Hi, Brandon from GCS here. If you're looking for all of the guarantees of a real, POSIX filesystem, you want to do fast top level directory listing for 100MM+ nested files, and POSIX permissions/owner/group and other file metadata are important to you, Gcsfuse is probably not what you're after. You might want something more like Filestore: https://cloud.google.com/filestore
Gcsfuse is a great way to mount Cloud Storage buckets and view them like they're in a filesystem. It scales quite well for all sorts of uses. However, Cloud Storage itself is a flat namespace with no built-in directory support. Listing the few top level directories of a bucket with 100MM files more or less requires scanning over your entire list of objects, which means it's not going to be very fast. Listing objects in a leaf directory will be much faster, though.
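To make the flat-namespace point concrete, a "directory listing" is roughly a prefix/delimiter query over object names (sketch assuming the google-cloud-storage Python client; bucket and prefix are hypothetical):

```python
# In Cloud Storage, "directories" are just prefix/delimiter queries over a flat
# object namespace, so listing near the top of a deep tree means walking names.
from google.cloud import storage

client = storage.Client()
blobs = client.list_blobs("my-bucket", prefix="top-dir/", delimiter="/")
for blob in blobs:
    print(blob.name)            # objects directly under the prefix
for prefix in blobs.prefixes:   # synthetic "subdirectories", populated after iterating
    print(prefix)
```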
Our theoretical use case is 10+ PB and we need multiple TB/s of read throughput (maybe a fraction of that for writing). So I don't think Filestore fits this scale, right?
As for the directory traversals, I guess caching might help here? Top level changes aren’t as frequent as leaf additions.
That being said, I don’t see any (caching) proxy support anywhere other than the Google CDN.
Brandon, I know why this was built, and I agree with your list of viable uses; that said, it strikes me as extremely likely to lead to gnarly support load, grumpy customers, and system instability when it is inevitably misused. What steps across all of the user interfaces is GCP taking to warn users who may not understand their workload characteristics at all as to the narrow utility of this feature?
If you really expect a file system experience over GCS, please try JuiceFS [1], which scales to 10 billion files pretty well with TiKV or FoundationDB as the meta engine.
Blobstores are O(n) for a directory operation. You are forced to serialize/lock while these expensive operations happen in order to maintain consistency, which limits the maximum size.
Worth noting that the sample used did not produce a Meissner effect, but was still superconductive at ~100K.
> To further verify the superconducting properties, we conducted magnetic measurements on the sample, but unfortunately, no obvious Meissner signal was observed, indicating that the superconducting volume fraction of the sample may be very small. The preparation of high-purity samples are still a challenging task.
This is one of my key takeaways as well. It's a shame they didn't include their magnetization measurements, even if they didn't confirm superconductivity. The first conclusive paper will have to demonstrate a superconducting transition in both electrical resistivity and magnetization.