My guess is that Windows is in a loop of reading bytes from the source and writing them to the destination. As munin mentions, the appropriate transaction size of writes to a device will be related to how much data the CacheManager chooses to allow to be buffered and not flushed to the actual device, which is huge for a hard disk and very small for a USB drive, since Windows expects it might be pulled from the port at any point.
Now, consider that calling read() on a socket that's really an SMB share is a complicated, multi-round trip action. The read must involve some locking in case someone else tries to write to the file in between, and probably a ton of other stuff. In my experience, even doing something like opening a 5k php file in vim from a samba mount can take a few seconds. That overhead is getting paid far more often in the network -> USB case than it is in the network to HD case.
In short, windows uses smaller packets for USB, necessitating many more network roundtrips.
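To put rough numbers on that hypothesis, here is a back-of-the-envelope model. Everything in it is an assumption for illustration: the 5 ms round trip, the 10 MB/s link, the chunk sizes, and the idea that each chunk costs exactly one round trip. It is not a measurement of what Windows actually does.

```python
def estimated_copy_seconds(total_bytes, chunk_bytes, round_trip_s, bytes_per_s):
    """Toy model: each chunk costs one network round trip plus raw transfer time."""
    round_trips = -(-total_bytes // chunk_bytes)  # ceiling division
    return round_trips * round_trip_s + total_bytes / bytes_per_s

total = 3 * 1024**3  # 3 GiB, the size mentioned in the original complaint
estimates = {}
for chunk in (4 * 1024, 64 * 1024, 1024 * 1024):
    # 5 ms per round trip and 10 MiB/s are made-up illustrative values
    estimates[chunk] = estimated_copy_seconds(total, chunk, 0.005, 10 * 1024**2)
    print(f"{chunk // 1024:>5} KiB chunks: {estimates[chunk] / 60:.1f} min")
```

With the transfer time held constant, the only thing that changes between rows is the number of round trips, which is the whole point of the guess above.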
That seems like a very reasonable hypothesis. I'm not sure how MS could solve it without using arbitrary amounts of memory or caching on the HDD, which would be a really bad solution, IMO.
It really could be 7 hours - I spent way too long investigating how to copy to USB quickly, and discovered it's a combination of:
* Variability in flash drive speeds.
* Variability of USB implementations. For instance, a 2008 MacBook is less than half as fast as a 2010 MacBook Pro, and both were slower than Linux running on a Dell.
* Most USB drives use awful filesystems like FAT16 and FAT32 that are inefficient when it comes to any complexity, as far as both space needed and speed.
That said, I was working with millions of files, and eventually a coworker developed a single-file storage. But for people with that horrible task, the trick is mainly to make a USB drive image with the filesystem intact, and flash the drives blindly.
If they can set up openID servers (you need a real server to perform authentication), why exactly wouldn't they be able to set up an email server to receive the confirmations?
If you want a nice "OMG!?" moment drag the contents of a zipped file to a folder on a network share. Hell freezes over.
There is something seriously wonky in the file access, network and (!) audio stack within Windows.
A good example is the automated wireless network search that runs every 30 seconds or so. If you have somewhat flaky (like, standard Atheros) wi-fi drivers, there's no way you will be able to listen to a song without interruptions and distortions. There are tools that can disable the wi-fi search, but good luck finding this solution if you are a newbie.
The same goes with copying files over the network while playing an mp3 file at the same time.
I'm talking about Windows 7... There are still issues, it's better than Vista tho. It's funny, I actually remember that article from way back when I first started investigating this thing.
If I'm understanding your scenario correctly, then it's something I do all the time with reasonable results. My file server runs Windows Server 2008, and I frequently pull up shared folders with zip/rar files and extract them from my Windows desktop. Obviously, this is a bit slower than remoting into the server and extracting, but with gigabit ethernet it's not bad at all for small to medium sized compressed archives.
I suspect that's not Windows' fault this time, but maybe your USB key's IO driver. OK, so maybe it's the fault of Windows' drivers ;)
To sum up, flash disks can only access data in big chunks of, say, 32KB. So if you have 32 files of 1KB each, the (dumb and unoptimized) flash disk driver will overwrite the 32KB block where the 32 files are stored 32 times. In this example, 1024KB (32 x 32KB) of data is written instead of just 32KB. So it slows you down 32 times in this case.
As you can see, the bigger the block and the smaller the files, the slower the transfer.
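The arithmetic above can be checked with a few lines. This is a sketch of the naive model described in the comment (every file write rewrites each block it touches, nothing is coalesced); the function name is made up and real flash controllers are more complicated.

```python
def bytes_physically_written(file_sizes, block_size):
    """Naive driver model: each file rewrites every block it touches, no coalescing."""
    total = 0
    for size in file_sizes:
        blocks = max(1, -(-size // block_size))  # ceiling division, at least one block
        total += blocks * block_size
    return total

KB = 1024
files = [1 * KB] * 32                      # 32 files of 1KB each
written = bytes_physically_written(files, 32 * KB)
amplification = written / sum(files)

print(written // KB, "KB written for", sum(files) // KB, "KB of data")
# prints: 1024 KB written for 32 KB of data
print("amplification:", amplification)
# prints: amplification: 32.0
```

Try a 128KB block with the same files and the factor goes to 128, which is the "bigger block, smaller files, slower transfer" effect.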
Such write patterns can be avoided by buffering writes by only a few milliseconds if you're clever. It's not like this isn't a long-known problem.
If you just want to make the blinking light on the drive accurately show when it's safe to unplug you can easily delay writes for at least a quarter of a second.
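A minimal sketch of what that kind of short-lived buffering could look like. The class and its names are hypothetical, `device_write` is just a stand-in for whatever actually talks to the drive, and the timer is left out: `flush()` is what you would call when the short delay runs out.

```python
class CoalescingWriter:
    """Collect small writes and hand them to the device one full block at a time."""

    def __init__(self, device_write, block_size=32 * 1024):
        self.device_write = device_write  # callable taking bytes; stands in for the driver
        self.block_size = block_size
        self.pending = bytearray()

    def write(self, data):
        self.pending += data
        # Emit every complete block as soon as it is available.
        while len(self.pending) >= self.block_size:
            self.device_write(bytes(self.pending[:self.block_size]))
            del self.pending[:self.block_size]

    def flush(self):
        """Call when the short delay expires so nothing lingers in memory."""
        if self.pending:
            self.device_write(bytes(self.pending))
            self.pending.clear()

device_writes = []
writer = CoalescingWriter(device_writes.append)
for _ in range(32):
    writer.write(b"x" * 1024)  # 32 small 1KB writes
writer.flush()
print(len(device_writes), "device write(s)")
# prints: 1 device write(s)
```

This ignores filesystem metadata updates entirely, so treat it as an illustration of the idea rather than a claim about what a real driver could get away with.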
I just copied about 800 megs worth of data (a handful of files) from a network share directly to an attached USB flash drive in less than a minute on Windows 7 x64.
Another folder on the same network share is 5.96 GB worth of data in 2795 files across 190 directories. According to the detailed dialog, it is estimating "1 day" to finish copying despite the fact I'm getting sustained transfer rates greater than 5MB/s and it looks to be about 20% complete after several minutes.
If I were to make a guess, you might be able to shave off some of the time it takes by enabling write-caching for the usb-device.
I believe that by default, windows disables write-caching for USB devices. Somewhere in the device manager, there should be a choice between "optimize for performance" (enable write caching) or "optimize for safe removal" (disable write caching).
This would be my guess too. By default, Windows optimizes removable devices for filesystem consistency. For copying (or worse, moving) lots of files, that means updating filesystem metadata after every file moved, and flushing buffers. That kind of sync-interspersed random access will kill performance on hardware set up for sequential writes.
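You can imitate that flush-after-every-file pattern from user space to get a feel for the cost. This is only an analogy: `os.fsync` after each file is not how the Windows removal policy is implemented, and the file count and sizes are arbitrary. Point the destination at a flash drive instead of a temp directory to see a more interesting difference.

```python
import os
import shutil
import tempfile
import time

def copy_files(sources, dest_dir, sync_each_file):
    """Copy files; optionally force each one to the device before starting the next."""
    for src in sources:
        dst = os.path.join(dest_dir, os.path.basename(src))
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
            if sync_each_file:
                fout.flush()             # push Python's buffer to the OS
                os.fsync(fout.fileno())  # ask the OS to push it to the device

with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, "src")
    os.mkdir(src_dir)
    sources = []
    for i in range(50):                  # 50 small files of 1KB, arbitrary numbers
        path = os.path.join(src_dir, f"file{i}.txt")
        with open(path, "wb") as f:
            f.write(b"x" * 1024)
        sources.append(path)

    results = {}
    for sync in (False, True):
        dest = os.path.join(tmp, f"dest_sync_{sync}")
        os.mkdir(dest)
        start = time.perf_counter()
        copy_files(sources, dest, sync_each_file=sync)
        results[sync] = (time.perf_counter() - start, len(os.listdir(dest)))

for sync, (seconds, count) in results.items():
    print(f"sync_each_file={sync}: {count} files in {seconds:.3f} s")
```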
this would be interesting behavior to really dig-in and investigate (in all of my spare time, of course).
NT is a lot like linux (no, really), external hard drives and flash drives are represented as SCSI drives exactly like local SATA disks (think /dev/sda for your root and /dev/sdd for your thumb drive).
the big difference is in the cache manager. windows recognizes that "surprise removals" of thumb drives is a lot more likely than "surprise removals" of fixed disks and it adjusts the aggressiveness of the cache manager accordingly (this is controlled via "device properties->policies" from the device manager).
in the client case (copying something onto your flash drive from a CIFS server) there shouldn't be any difference to the FS/IO stack than copying it from your desktop via explorer. the server case is a little different, srv.sys does writes a little differently than usermode clients, but...
I quickly tested this, and got different results. Copied 90MB file to my Android phone (appearing as mass storage device) both from hard drive, and from network share. Both cases took about 30 seconds to finish. My OS is Windows 7, and the network share was hosted on Windows XP machine.
There's another popular utility whose name escapes me at the moment. However, its author has insisted on continuing to use a package format that some anti-virus programs flag due to high use of the format by mal-ware authors. I've also found it to behave incorrectly in some circumstances, in the past.
Windows is shit at copying data anyway! Windows Vista and 7 miss folders...
It's such a pain when copying customers documents and you find out it's copied 2GB instead of about 25GB because it's missed the MAIN folders out. FFS Microsoft, sort it out!
It's a VERY BASIC function for an operating system...!
I wish Windows would queue multiple transfers occurring between the same two drives, rather than executing simultaneous transfers (in order to reduce the time spent seeking all over the place).
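For what it's worth, the queueing idea is easy to sketch in user space. This is a hypothetical helper, not anything Explorer offers: one worker thread drains a queue so only one copy runs at a time. A real version would presumably keep one queue per source/destination drive pair.

```python
import queue
import shutil
import threading

class TransferQueue:
    """Run copy jobs one at a time instead of letting them compete for the same disks."""

    def __init__(self, copy_func=shutil.copyfile):
        self.jobs = queue.Queue()
        self.copy_func = copy_func
        self.errors = []
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def submit(self, src, dst):
        self.jobs.put((src, dst))

    def _run(self):
        while True:
            src, dst = self.jobs.get()
            try:
                self.copy_func(src, dst)
            except OSError as exc:
                self.errors.append((src, dst, exc))  # keep going after a failed copy
            finally:
                self.jobs.task_done()

    def wait(self):
        self.jobs.join()

# Demo with a fake copy function so nothing touches the disk.
order = []
transfers = TransferQueue(copy_func=lambda src, dst: order.append((src, dst)))
transfers.submit("a.bin", "E:/a.bin")
transfers.submit("b.bin", "E:/b.bin")
transfers.wait()
print(order)
```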
Here's why Windows feels a lot slower in a lot of areas when compared to Linux, Mac OS X and other OSes:
the process scheduler has a bit of a high latency, the IO system is pretty bad and the entire hardware blocks a whole lot more than it should, there is no apparent control over how much IO bandwidth a process is allowed to eat up, the entire OS has to do a lot of system calls and a lot of calls to the registry every second, aero is also making the entire OS slower due to a poor design
This is why windows always feels laggier than other OSes. Windows 7 also increased the processing latency of the OS.
Windows does have per-process I/O priority (at least since Windows Vista). I think async I/O design is baked in to the OS probably more pervasively and with a better model (e.g. no polling) than most other OSes; however, fairly few applications outside of servers use async I/O.
What does "the entire OS has to do a lot of system calls" even mean? I don't think you're talking about the kernel, because the point of avoiding system calls is avoiding a user/kernel transition (something Windows' I/O completion ports is good at, IMO the reason its async I/O implementation is fairly solid). But it leaves open the question of what it is you are talking about; is the problem Explorer? The shell? The Win32 layer?
Windows, for me, does not feel laggier than other OSes. In particular, Linux has always felt substantially more laggy for mouse input. Linux has long given me the impression of having an egalitarian scheduler, giving equal weight to user interactive apps and background processes when they have the same priority; but Windows bakes interactiveness into the scheduler algorithm as another input alongside priority. I haven't used OS X long enough to give a solid opinion on it, but it seems about as responsive as Windows and almost certainly better than Linux.
Here are a few issues with it:
event 500 / 501 with aero at 1920x1080 screen resolution - bad design & inefficient use of GPU functions and hardware for desktop composition & aero
While this may be normal behaviour for pre-DX10.1/11 cards due to software emulation of some functions, this is not ok for DX10.1/11 cards which have those functions.
laggy video - I couldn't get Windows 7 to play any kind of video smoothly on a GT 450 and 9800 GT at 720/1080p on an i7 with w7 64bit; you think you can play it smoothly, but then I play the same file on linux box and it's so smooth you'd think there's no such thing as frames
if you kill ALL your apps (including services and background apps which are part of your apps) and you check how many requests per second to the registry there are, you will have a shock; I did this back in the XP era and it had like 50-100 registry requests / second.
if you leave a W7 machine on for more than 24 hours and you have more than 6-8GB of RAM, you will have degraded performance even if you close the apps (memory leaks and/or memory fragmentation, maybe other issues)
As for Linux, it's been quite laggy and buggy before in the aspect of responsiveness, but it seems to have improved in those areas recently.
Mac OS X is doing fine too so far.
edit: Let's not forget game input lag. Also, some CPU intensive games perform better when you set the priority of the processes which are 100% idle in 99.99% of the time to idle. I am talking about things like print spooler, app notifiers & tray area apps, updater apps and others like these.
If linux seems slower in the graphics department, keep in mind it's got an X server, apps which render in that X server and that brings quite some overhead. Even though this is how things are, Linux is pretty close to "being there", pretty much like OS X is.
Windows also lets processes starve the entire OS completely.
It's not ok to have resources be eaten up completely so that the user might have to wait even hours until a task is done so that CPU usage drops below 100% across 4 or 6 cores of a system. This isn't normal.
Also, Windows can give you high latency and bad performance even when there are more than enough resources: CPU & memory.
Therefore, I'd say that the Windows scheduler is closer to the treat all processes equally and / or sink more time into idle processes with more idle threads than into processes with fewer threads and more active threads model.
Linux hasn't left me with this impression recently and OS X hasn't left me with this impression since Leopard / Snow Leopard.
Event 500 and 501 seem like false alarms (I had never heard about them until today). I have a fairly high spec video card (AMD 5870) and when I drill into the log where those events show up, I can see there's been a few of them every day - but I've never actually noticed any performance degradation. I would never have known that they were being produced until I looked into them, and on the basis of the lack of actual degradation, I'm confident I can forget about them.
Video is fine and clear for me across a much larger range of codecs than I've ever seen on Linux (or Mac, for that matter). I normally get no more than 2% CPU utilization for video playback; any HD content I have is hardware accelerated, I expect.
50/100 requests per second is meaningless without knowing the cost of those requests. My machine, if you ignore superscalar architecture and hyperthreading, is capable of over 12 billion operations per second. Unless these registry reads have fairly substantial costs, they're completely unnoticeable.
My Windows 7 machine here currently has an uptime of 48 days. I seldom reboot until the Windows Update nag gets the better of me, and I frequently turn off the update service to get rid of the nag. I have 12GB of RAM, and I've never seen this degraded performance you speak of.
I wrote my own print spooler (it actually runs on a different machine) and have a bunch of tray applications: CrashPlan, DropBox, Process Explorer, corporate VPN, occasionally Skype and Steam. Again, I've not noticed any performance drain from these things. Not even games these days are CPU constrained; with 4 cores (or 8 if you include hyperthreading) the only thing that taxes my CPU is video transcoding. There's plenty of spare CPU capacity in games; they seldom get above 30 or 40% utilization.
If you really think Windows is odd in starving the system for non-low-priority runaway processes, try running a forkbomb without ulimits in place on your Linux or Mac machines. But I have no difficulty running Handbrake or SuperPI or similar utilizing 100% CPU but still having responsiveness enough to shut them down etc.
It's extraordinarily rare for me to find myself in front of a Windows machine where the UI is completely unresponsive except in cases of driver conflict or flaky hardware (e.g. a paging error from a bad hard drive will do horrible things). I don't know why you've soured so much on Windows, or what bad experience it was that pushed you over the edge, but when you bring up things like registry traffic and this "event 500/501", I get the impression that you are actively looking for things you think are worth complaining about, rather than pointing out things that are actually hampering your interaction. It hurts your credibility in my eyes, frankly.
(FWIW, I was a computer technician before I became a software engineer, and I've been working with Windows of all varieties since about 1993, including the early (3.1 and 3.51) versions of NT (almost 20 years, scary!). I've dealt with thousands of machines running Windows. And for MS OSes based on NT, problems have overwhelmingly been because of hardware or driver issues, almost never because of OS issues.)
really? can you provide some analysis here? I've done a bit of hacking on I/O layer stuff in both linux and NT and I haven't seen anything that would lead to that conclusion ...
Take one app which wants to do heavy IO and it will stall the OS more often than not.
Take a machine which boots the OS and check how much stuff gets on the screen and if you can interact with it while the HDD led is at full brightness or blinking heavily.
Take an i7 with 8GB of RAM and one 64MB WD RE4 drive and see how much that stalls compared to OS X on the same hardware...
How come the cost of accessing the disk for reads / writes is greater for performance when compared to OS X? Why is Linux able to get stuff done and displayed on the screen while doing disk IO? I am guessing it's trying not to block as much due to starving other threads which want to read / write
Notes: I've been using Windows since 96, Linux since 2001 and Mac OS X since 2004. I don't claim to be an expert or anything, nor a fan of any OS, I just made some observations over the time and wish that all the OSes improve in one or many areas.
Did he time this on a Mac or Linux? Just curious how bad Windows is. I've experienced this many times but never thought of hopping through the HDD as a big hassle.
Unless he tried to copy the same file to the same USB stick on a machine with a different OS, the question should be "Can anyone explain why copying files to a USB stick takes so long?"
The most likely answer is that the USB stick itself is pretty slow at writing incoming data into its internal flash modules. He also ought to test writing to a USB hard drive (with the physical spinning spindle inside) and I'd be surprised if his throughput would be less than 100Mbps even with a crappy drive.
Why? The issue is the source, not the destination.
If he is getting radically different times for Network->USB vs. Harddrive->USB, then it's probably not a USB issue.
As for the different OS... I mean, he isn't posting this to HN, so he may never have even used a different OS in person (most "normal" people I know haven't. Yes, including OSX).
I haven't seen a USB stick that was slower than 2-6 megabytes per second on large-file transfers. Dropping an order of magnitude of speed on smaller files is entirely windows' fault.
Windows uses two modes for USB devices: cached and uncached. When you plug in a USB stick, it is uncached by default. It was done this way because most users do not bother with disconnecting the device and just pull the stick from the USB port (and then they are surprised that their files are not there).
You can change the defaults in device properties, if you want caching on USB stick.
I can show you a USB stick that is slower than 2 MB/s. :(
I am aware of the distinction windows makes there. But it's artificial to make such a thing binary. If it cached but only for a fraction of a second, with queue reordering to speed things up, it would still be safe to unplug right after a copy completes.
You can't do time-based caching like this. A block is either written or it is not. When you flush your cache, you don't know how long it will take, even if your cache is only a fraction of a second old.
That's why there used to be performance problems in apps with fsync().
What I'm saying is that you make it so as soon as the light stops blinking it's safe to remove. That means that you can have a cache as long as you flush it almost immediately.
I've been bitten a couple times before by pulling drives on linux that weren't being written to, but still had a cache waiting to be written. I learned my lesson about not dismounting on an OS configuration where it isn't safe, but disabling caches is not the only way to be safe.
It might take 10 seconds to flush the cache. That's okay. The key is that you never go more than a quarter second without writing to the disk until the cache is empty.
Everybody seems to be failing at reading comprehension today...
His two tests look like this:
1. Network -> USB
> I'm copying files from a my networked hard drive to a USB stick.
> It tells me it's going to take about 7 hours. For 3 Gig of data.
2. Network -> Local HDD -> USB
> I copy it from the networked drive to my C drive. This takes about 4 minutes.
> I then start copying it to the USB stick. This is varying between 8 minutes and 45 minutes, depending on whether it's copying a big file (fast) or lots of little files (slow).
Test #1, network directly to USB, is showing a 7-hour time estimate.
Test #2, network to HDD to USB, is showing an 8-45 minute time estimate (truth will be somewhere in between).
This is not a generic complaint that "Windows I/O is slow", it's a complaint that the obvious method (straight from network to desired USB device) is many times slower than the less-obvious method of hopping through the local HDD for no good reason.
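To see how big the gap is, here is the arithmetic on the quoted figures. Keep in mind the 7 hours is an estimate shown by the copy dialog, not a measured time, so the "direct" number is only as good as that estimate.

```python
def mb_per_s(gigabytes, minutes):
    """Average throughput in MB/s for a transfer of the given size and duration."""
    return gigabytes * 1024 / (minutes * 60)

size_gb = 3
direct_min = 7 * 60              # test 1: estimated, not measured
via_hdd_min = (4 + 8, 4 + 45)    # test 2: network -> C: plus C: -> USB, best and worst case

print(f"direct to USB:      {mb_per_s(size_gb, direct_min):.2f} MB/s")
print(f"via HDD, best case:  {mb_per_s(size_gb, via_hdd_min[0]):.2f} MB/s")
print(f"via HDD, worst case: {mb_per_s(size_gb, via_hdd_min[1]):.2f} MB/s")
print(f"direct is {direct_min / via_hdd_min[1]:.1f}x to "
      f"{direct_min / via_hdd_min[0]:.1f}x slower")
```

Even against the worst case of the two-hop route, the direct estimate comes out more than 8 times slower.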
I've actually tested this, so I have the exact answers! The author just saw the estimate for test 1; it seems he hasn't measured it. The estimate by Windows Explorer was wrong in my test, but the actual difference wasn't so significant.
In my tests, the time for "a lot of small files" is dominated by the slowness of the USB stick (800 files totaling 5 MB took 80 sec from the HDD and 90 sec from the network share), while "copying a big file" gets two times slower when the sizes of the write blocks sent to the USB stick are not "convenient", which happens in the "network to USB" case (130 MB in 7 sec from the HDD, 14 sec from the network share).
Note that Windows by default doesn't cache writes to the USB stick; specifically, the scenario "pull out the stick without unmounting" is supported, whereas the last time I checked I had to unmount USB sticks on Linux to be sure that everything was properly written.
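Turning those timings into throughput makes the small-file penalty easier to see. The figures are the ones quoted above, taking the small-file set as 5 MB; the dictionary labels are just for display.

```python
measurements = {
    # label: (megabytes, seconds)
    "small files, HDD -> USB": (5, 80),
    "small files, network -> USB": (5, 90),
    "big file, HDD -> USB": (130, 7),
    "big file, network -> USB": (130, 14),
}

rates = {label: mb / seconds for label, (mb, seconds) in measurements.items()}
for label, rate in rates.items():
    print(f"{label:<28} {rate:6.2f} MB/s")

ratio = rates["big file, HDD -> USB"] / rates["small files, HDD -> USB"]
print(f"big file vs small files (both from HDD): about {ratio:.0f}x")
```

Per byte, the big file from the HDD moves roughly 300 times faster than the pile of small files, while the network only adds about 12% to the small-file case.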
So once you don't cache writes you're at mercy of the USB stick, which is reasonably fast only for one specific block size (specific multiple, actually). When you don't get the best block size over the network, you'll get slower writes to the USB stick.
What I see in my tests (on Windows XP) is that the copying is done by CopyFileEx API, which uses blocks of 65536 bytes (64KB) from the local HDD but uses blocks of 61440 bytes (that's exactly one 4K block less than 64KB) over the network, which is then slower on the USB stick.
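Here is a toy model of why the odd block size might hurt. It assumes the stick works in 32KB blocks (an assumption, not something measured here) and simply counts how many blocks each sequential write touches; a write that straddles a block boundary touches a block that the next write has to touch again.

```python
def block_touches(total_bytes, write_size, block_size=32 * 1024):
    """Count (write, flash block) pairs when writing total_bytes sequentially."""
    touches = 0
    offset = 0
    while offset < total_bytes:
        end = min(offset + write_size, total_bytes)
        first_block = offset // block_size
        last_block = (end - 1) // block_size
        touches += last_block - first_block + 1
        offset = end
    return touches

total = 15 * 65536                      # 960KB, a whole number of both write sizes
aligned = block_touches(total, 65536)   # writes as issued from the local HDD
unaligned = block_touches(total, 61440) # writes as issued from the network share

print(65536 - 61440, "bytes short of 64KB per write")
print("64KB writes touch", aligned, "blocks;", "60KB writes touch", unaligned)
# prints: 64KB writes touch 30 blocks; 60KB writes touch 44
```

That is about 1.5 times the block traffic in this model, which is in the same direction as the measured slowdown but should not be read as an explanation of the exact factor.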
Conclusions:
a) Don't copy small files to the USB stick on Windows if you need speed. It's hundreds of times faster (when the bytes are counted) to copy an archive of the files than to copy file by file.
b) If you want efficient copying of bigger files from the network share to the USB stick, avoid the CopyFileEx API function, at least on Windows XP, and use software that writes in 32K multiples.
It would be neat to hack up a simple copy program that uses more USB friendly buffer sizes and see what happens. I don't have an XP machine handy so I can't easily test this.
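Something along these lines would do for a first experiment. It is a sketch in Python rather than against the Win32 API, the function name is made up, and there are caveats: a read from a network share can return fewer bytes than requested, and the OS is free to split or merge writes underneath, so the buffer size is a request rather than a guarantee.

```python
import os
import tempfile

def copy_with_buffer(src_path, dst_path, buffer_size=64 * 1024):
    """Copy a file using fixed-size reads and writes; returns the bytes copied."""
    copied = 0
    # buffering=0 keeps Python from adding its own buffer layer on top of ours.
    with open(src_path, "rb", buffering=0) as src, \
         open(dst_path, "wb", buffering=0) as dst:
        while True:
            chunk = src.read(buffer_size)
            if not chunk:
                break
            view = memoryview(chunk)
            while len(view) > 0:          # unbuffered writes may be partial
                written = dst.write(view)
                view = view[written:]
            copied += len(chunk)
    return copied

# Small self-check on temporary files; swap in real paths to experiment.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.bin")
    dst = os.path.join(tmp, "dst.bin")
    with open(src, "wb") as f:
        f.write(os.urandom(200_000))
    copied = copy_with_buffer(src, dst, buffer_size=32 * 1024)
    with open(src, "rb") as a, open(dst, "rb") as b:
        identical = a.read() == b.read()

print(copied, "bytes copied, identical:", identical)
```

Running it with 32KB, 60KB and 64KB buffers against the same stick would be the interesting comparison.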
I agree with his conclusions and have experienced it myself while working with clients. Trying to copy a code project (with a lot of assets) of about 300 MB says it will take 3 hours to go from my laptop over wifi on their LAN to an external USB drive hooked up to their server. When I have to copy something large I always request to either unplug their external hard drive and plug it into my laptop, or copy it from my laptop to one of their desktops and then copy it to the external drive. I do think there's something fishy about going over a network to a USB on Windows.
Show me the same effect with actual timed runs. If it still shows that effect then it'll be interesting.
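A timed run does not need much. Below is a minimal harness, assuming plain `shutil.copyfile` as the copy method and temporary files as stand-ins for the real source and destination. Two caveats: after the first run the source is probably served from the OS read cache, and with write caching enabled the copy can return before the data is on the stick.

```python
import os
import shutil
import tempfile
import time

def time_copy(src, dst, runs=3):
    """Return the wall-clock seconds taken by each run of copying src to dst."""
    timings = []
    for _ in range(runs):
        if os.path.exists(dst):
            os.remove(dst)               # start every run from the same state
        start = time.perf_counter()
        shutil.copyfile(src, dst)
        timings.append(time.perf_counter() - start)
    return timings

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.bin")
    dst = os.path.join(tmp, "dst.bin")
    size = 4 * 1024 * 1024               # 4 MB test file, arbitrary
    with open(src, "wb") as f:
        f.write(os.urandom(size))
    timings = time_copy(src, dst)
    copied_size = os.path.getsize(dst)

for i, seconds in enumerate(timings, 1):
    print(f"run {i}: {seconds:.4f} s")
```

Replace the temp paths with a file on the share and a path on the stick, then repeat with the local HDD as the source, and you have the comparison being asked for.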