The Grind a Day: thousands of Apple II floppy disks archived (textfiles.com)
175 points by pabs3 on March 7, 2023 | 38 comments



Regarding the copy protection process itself, there's an interesting interview at one of the links:

> The most common protection schemes were the ones that were productized and resold to hundreds of publishers. This was coordinated through the disk duplication houses, who offered copy protection as a “value add” on top of mastering the disks themselves. Publishers got the benefit of the latest and greatest copy protection without needing to play the cat-and-mouse game themselves.

> The E7 bitstream, a.k.a. “generic bit slip protection,” was the most common. It was a sequence of 1s and 0s, specially crafted so the first half could be read “in phase,” then the code would intentionally skip half a byte and read the second half “out of phase.” Bit copiers would drop bits due to hardware limitations, and the out-of-phase values would be wrong. It was brilliant.

https://paleotronic.com/2018/06/15/confessions-of-a-disk-cra...
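Here is a toy model of why reading "out of phase" changes the values. The Disk II read latch ignores leading 0 bits and completes a byte once it has 8 bits starting with a 1, so where you start reading decides how the bits get grouped. This is not the actual E7 sequence, because the interview doesn't give the bits. The input is just the familiar D5 AA 96 prologue, and read_nibbles is a simplified, hypothetical helper rather than real firmware.

```python
def bits_from(s):
    """Turn a string like '1101 0101' into a list of ints, ignoring spaces."""
    return [int(c) for c in s if c in "01"]

def read_nibbles(bits, start=0):
    """Simplified Disk II-style latch (assumption, not cycle-accurate):
    skip 0 bits until a 1 arrives, then take that 1 plus the next 7 bits
    as one nibble, so every nibble has its high bit set."""
    nibbles = []
    i = start
    while i + 8 <= len(bits):
        if bits[i] == 0:
            i += 1  # leading zeros never start a nibble
            continue
        value = 0
        for b in bits[i:i + 8]:
            value = (value << 1) | b
        nibbles.append(value)
        i += 8
    return nibbles

stream = bits_from("11010101 10101010 10010110")
in_phase = read_nibbles(stream)         # start on a nibble boundary
out_of_phase = read_nibbles(stream, 4)  # "skip half a byte" first
print([hex(n) for n in in_phase])       # ['0xd5', '0xaa', '0x96']
print([hex(n) for n in out_of_phase])   # ['0xb5', '0xa5']
```

If a copier drops a single bit, for example `stream[:10] + stream[11:]`, the in-phase bytes before the drop survive but the out-of-phase read no longer matches. That mismatch is what the protection check looks for.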


4am (along with qkumba) is also responsible for Total Replay, a single disk image containing hundreds of old Apple ][ arcade-style games you can run from a single, beautiful launcher app.

See:

https://archive.org/details/TotalReplay

https://github.com/a2-4am/4cade

https://www.youtube.com/watch?v=pki7rsGNxXs


> Applesauce pushes out three general types of disk images in its work. Fluxes, which are to-the-bit accurate portrayals of the magnetic flux of the floppy disks.

This is a distressingly common misperception. In fact, the flux stream that you get out of a reader like a Greaseweazle is the TTL-level signal from the drive. This is the output of a differential amplifier and a zero-crossing detector. In theory, the raw drive head will show zero volts when nothing is changing under the read head, a very small positive voltage when the magnetic flux reverses in one direction, and a symmetric negative voltage when it reverses the other way.

What you get out of the amplifier is a digital (!) pulse of ~1us when the signal shifts in either direction. And due to the way the amplifier works, when "nothing is changing" it tends to just amplify noise in the environment and invent fictitious transitions. Proper disk encodings limit the distance between transitions (generally to no more than three bit cells) to deal with this.
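The toy version below shows the point above. The drive turns every flux reversal, in either direction, into the same pulse, so only the spacing survives. The 3-cell limit is the "generally" figure from the comment and is used here as an assumption. The function names are illustrative only.

```python
# Toy model: magnetization per bit cell (+1 or -1). The read chain only
# reports *that* a reversal happened, not which direction it went.

def pulse_positions(magnetization):
    """Cell indices where the magnetization flips, in either direction.

    Both +1 -> -1 and -1 -> +1 yield the same kind of pulse, so the
    polarity is discarded here, as it is in the drive electronics."""
    return [
        i for i in range(1, len(magnetization))
        if magnetization[i] != magnetization[i - 1]
    ]

def intervals(positions):
    """Distances, in bit cells, between consecutive pulses."""
    return [b - a for a, b in zip(positions, positions[1:])]

def within_run_length_limit(magnetization, max_cells=3):
    """True if no two pulses are farther apart than max_cells.

    Longer gaps are where a real amplifier starts reporting noise as pulses."""
    return all(gap <= max_cells for gap in intervals(pulse_positions(magnetization)))

good = [1, -1, -1, 1, 1, 1, -1]
bad = [1, -1, -1, -1, -1, -1, 1]
print(pulse_positions(good), intervals(pulse_positions(good)))  # [1, 3, 6] [2, 3]
print(within_run_length_limit(bad))  # False: a 5-cell gap
```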

This behavior was actually a core copy protection trick. The custom drive writing the commercial software would gate the transitions in the middle of a sector, leading to a region that would read different noise every time. Stock consumer hardware isn't able to do that.

And that's why the flux files are so large: they store multiple revolutions of every track, precisely so that tools like the WOZ (Apple) or ATX (Atari) converters can infer the style of copy protection. (There seems to be no clean equivalent in the Commodore world; last I checked, their "archived" copy-protected media tended to be reverse-engineered instead.)
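This sketch shows roughly what a converter has to do with those extra revolutions: line them up and see which bit positions refuse to agree. It assumes the revolutions are already decoded to bits and aligned to a common start, which is the hard part in practice. unstable_positions and group_runs are made-up names, not functions from the WOZ or ATX tools.

```python
def unstable_positions(revolutions):
    """Bit positions where aligned revolutions of one track disagree."""
    length = min(len(r) for r in revolutions)
    return [i for i in range(length) if len({r[i] for r in revolutions}) > 1]

def group_runs(positions):
    """Collapse sorted positions into inclusive (start, end) runs."""
    runs = []
    for p in positions:
        if runs and p == runs[-1][1] + 1:
            runs[-1][1] = p  # extend the current run
        else:
            runs.append([p, p])  # start a new run
    return [tuple(r) for r in runs]

# Three made-up reads of the same short stretch of track.
revs = ["1101001011", "1101110011", "1101011011"]
print(group_runs(unstable_positions(revs)))  # [(4, 6)]
```

A single revolution would hide that the region around positions 4 to 6 is unstable. That is why a one-pass capture can't tell a weak-bit region from ordinary data.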

A fun project I'd love to see someone try some day would be to wire up a proper analog ADC chain (it doesn't need to be more than ~8 bits at 2 MHz or so) to the unamplified drive head output and see if a cleaner representation of the flux can be read.
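Here is a back-of-the-envelope storage estimate for that experiment. It assumes a 300 RPM drive (0.2 s per revolution), the ~8 bits at 2 MHz suggested above, and no compression. capture_size_bytes is a hypothetical helper.

```python
def capture_size_bytes(sample_rate_hz=2_000_000, bits_per_sample=8,
                       rpm=300, revolutions=1, tracks=1):
    """Raw size of an analog capture, using integer math to avoid rounding."""
    samples = sample_rate_hz * 60 * revolutions * tracks // rpm
    return samples * bits_per_sample // 8

print(capture_size_bytes())           # 400000 bytes per revolution
print(capture_size_bytes(tracks=35))  # 14000000 bytes: 35 tracks, one revolution each
```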


> This behavior was actually a core copy protection trick. The custom drive writing the commercial software would gate the transitions in the middle of a sector, leading to a region that would read different noise every time.

Can you explain that a bit more? On the software side, how could you detect if you had a pirated disk or not? Read a few times the same data and make sure it'd always be different?


>Read a few times the same data and make sure it'd always be different?

As EvanAnderso and Meic said, yes. Think of it as the Schrodinger's cat of copy protection; as long as the bad sector is in a state of quantum superposition, the game works. It's when the superposition collapses into a definitive state that the game fails to load.
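In code, the "read it a few times" check boils down to something like the sketch below. read_weak_region stands in for whatever routine the loader used to read the protected sector, and the two disks are simulated. None of this is real disk access.

```python
import random
from typing import Callable

def looks_original(read_weak_region: Callable[[], bytes], attempts: int = 4) -> bool:
    """Read the same weak region several times. An original disk should disagree
    with itself at least once; a copy returns identical data every time."""
    reads = {read_weak_region() for _ in range(attempts)}
    return len(reads) > 1

# Simulated media for illustration only.
_rng = random.Random(42)

def original_disk() -> bytes:
    return bytes(_rng.randrange(256) for _ in range(16))  # noise differs per read

def copied_disk() -> bytes:
    return bytes(16)  # the copier "froze" one read of the noise

print(looks_original(original_disk), looks_original(copied_disk))  # True False
```

Requiring just one disagreement is deliberately lenient. A stricter check, such as requiring every read to differ, risks rejecting a genuine disk on an unlucky run.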


This reminds me of the copy protection I once developed for the Atari ST. The Atari ST had a proper floppy disk controller from Western Digital. They never changed anything about the WD controller, so you could rely on its characteristics. My Schrodinger's cat of copy protection for the Atari ST started from a new, never-formatted disk, which returns different data every time you read a track. I generated a few hundred pseudo-random bytes, carefully avoiding the magic sync bit sequence that initializes the controller's read circuit. After the pseudo-random bytes came exactly one controller sync sequence followed by a 32-bit serial number. Then I wrote them as track data. But I precisely timed the Write Track command and aborted it midway, so the rest of the floppy track kept its random state. When you then read the track, you got different data every time, except for that 32-bit serial number. It worked very reliably. Even if a floppy disk had read errors and you could no longer read the directory or the program data, the copy protection still worked reliably. It was the only floppy disk copy protection for the Atari ST that could never be copied. :)
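The scheme above can be modelled roughly like this. It's a simulation, not Atari ST code. MARKER is a placeholder standing in for the WD controller's sync sequence, and simulate_read is a hypothetical stand-in for reading the half-written track: everything before the sync comes back as fresh noise, while the serial after it is stable.

```python
import random
from typing import List, Optional

MARKER = 0xA1  # placeholder for the controller's sync sequence (assumption)
_NOISE_BYTES = [b for b in range(256) if b != MARKER]  # noise never mimics the marker

def simulate_read(serial: int, rng: random.Random, noise_len: int = 64) -> bytes:
    """Hypothetical model of one read: fresh noise, one marker, then a 32-bit serial."""
    noise = bytes(rng.choice(_NOISE_BYTES) for _ in range(noise_len))
    return noise + bytes([MARKER]) + serial.to_bytes(4, "big")

def extract_serial(track: bytes) -> Optional[int]:
    """Return the 32-bit serial that follows the first marker, or None."""
    i = track.find(MARKER)
    if i < 0 or i + 5 > len(track):
        return None
    return int.from_bytes(track[i + 1:i + 5], "big")

def disk_is_original(reads: List[bytes]) -> bool:
    """Original disk: same serial on every read, but the noise before it changes."""
    serials = {extract_serial(r) for r in reads}
    if None in serials or len(serials) != 1:
        return False
    noise = {r[:r.find(MARKER)] for r in reads}
    return len(noise) > 1

_rng = random.Random(7)
reads = [simulate_read(0x1234ABCD, _rng) for _ in range(3)]
print(hex(extract_serial(reads[0])), disk_is_original(reads))  # 0x1234abcd True
```

A copier that captures one read and writes it back reproduces the serial but freezes the noise, so it fails the second condition.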


Thank you for sharing that. Did Dungeon Master use your copy protection? My understanding is that its protection was so tough that people gave up and actually bought the game.


My copy protection was made for a niche, high-quality Atari ST graphics program that only sold in very small numbers. The developer (my customer) wanted to make sure he really got paid for every copy. And I put a lot of effort into randomizing my copy protection detection software. I wanted to make sure nobody could develop an automatic cracking program for my copy protection. But I left an easy-to-find exit point where a cracker could easily remove my copy protection detection software. Manually. That worked. Everything can be cracked. But my copy protection was never analysed in depth, because I made that really, really hard to do.


The developer (my customer) of the Atari ST graphics program was wicked :) When he detected (at runtime, at random places) that the program was cracked, he added an invisible watermark to the output of the graphics program. When someone produced public, commercial work with a cracked version of his software, he could prove it was done with an unpaid, cracked copy. He was able to "convince" several users to pay him retroactively.


Sometimes called a "weak sector".


I think "gate" here is being used to mean "gate off", or "inhibit".

The writer would simply refrain from writing anything for a moment, leaving the original bulk-erased magnetic nothingness in the middle of a sector.


Exactly that.


I'm confused. You're saying it's a portrayal of the magnetic flux of the floppy disks, but you were distressed when someone else said it? What part of Jason Scott's description did you take exception to?

Here's Applesauce's author's description, FWIW:

> At the lowest level, they are all capturing the amount of time between magnetic flux transitions on the media. The real differentiator comes from what you do with all of these flux timings.

https://applesaucefdc.com/what-is-applesauce/
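This sketch shows what "doing something with the flux timings" means at the very first step: quantizing intervals into bits. It assumes a nominal 4 µs Apple II bit cell. flux_to_bits is a made-up helper, and real decoders track drive speed drift instead of using a fixed cell width.

```python
def flux_to_bits(intervals_us, cell_us=4.0):
    """Quantize flux-transition intervals into bits.

    An interval of n cells becomes (n - 1) zeros followed by a one, since a
    one means "a transition happened in this cell"."""
    bits = []
    for t in intervals_us:
        cells = max(1, round(t / cell_us))  # never fewer than one cell
        bits.extend([0] * (cells - 1))
        bits.append(1)
    return bits

print(flux_to_bits([4.1, 7.8, 12.3, 3.9]))  # [1, 0, 1, 0, 0, 1, 1]
```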


The Applesauce is at the mercy of the analog board in the drive. Artifacts of the signal processing done there are going to be reflected in the data stream the Applesauce receives from the drive.

A more authentic representation of the flux on the disk could be made by amplifying and sampling the analog waveform straight off the drive heads. A variety of signal processing functions could be applied to that waveform vs. what the electronics on the drive will do.
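Keeping the polarity of each peak is one example of processing you could do on the raw waveform that the drive's pulse output can't. Successive flux reversals on the media alternate direction, so two same-polarity peaks in a row hint that one of them is noise. This is a minimal sketch with an arbitrary amplitude threshold. find_peaks and polarities_alternate are illustrative names, not functions from any capture tool.

```python
def find_peaks(samples, threshold):
    """Return (index, polarity) for local extrema with |amplitude| >= threshold.

    polarity is +1 for a positive peak and -1 for a negative one."""
    peaks = []
    for i in range(1, len(samples) - 1):
        s = samples[i]
        if abs(s) < threshold:
            continue  # too small: treat as background noise
        if s > 0 and s >= samples[i - 1] and s > samples[i + 1]:
            peaks.append((i, 1))
        elif s < 0 and s <= samples[i - 1] and s < samples[i + 1]:
            peaks.append((i, -1))
    return peaks

def polarities_alternate(peaks):
    """True if consecutive peaks switch sign, as real flux reversals should."""
    return all(a[1] != b[1] for a, b in zip(peaks, peaks[1:]))

wave = [0, 2, 9, 3, 0, -1, -8, -2, 0, 1, 0]
print(find_peaks(wave, threshold=5))  # [(2, 1), (6, -1)]
```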


Yep, I've always wondered why Applesauce did it the way it did, but TBH I didn't read all of their quite comprehensive documentation...


An SCP file is NOT a direct measurement or representative capture of the flux transitions on the media. It's the output of a machine that imperfectly tries to measure them and inserts noise and delays into the data instead.

And not only in a meaningless "not as accurate" sense -- the drive loses exactly the information that's critical to implementing a working copy. That's why you have to sample a track multiple times for the benefit of downstream analysis/RE tools, and why a naive capture can't be written back to a disk to get the same results.

It's not a good situation. A drive reader that *could* measure the actual flux transitions wouldn't have these problems, but we don't have such a device yet.

My point was that a true capture is (or should be) possible, but not with the hardware as implemented by Greaseweazle et al.


[ Willy Wonka puts his head on his fist and looks back and forth excitedly ]


yes, my impression is i'm really ignorant on the subject, or someone is hoping someone is ignorant while sounding smart on their pet-peeve soap box. either answer is perfectly acceptable to me, but that's a pretty good description of what my brain was going through reading all of that


You're aware you can ask questions instead of just pointing and laughing at the ridiculous nerds, right? Is there something you're confused about regarding floppy drives?


who is pointing and laughing? i gave an honest description of how i felt after reading that jumble of words that looked like an answer to something. it read as if it came from ChatGPT. obviously, more than one person was left confused after reading it. clear as mud one might say.

>Is there something you're confused about regarding floppy drives?

after reading all of that, i'm so confused i no longer even know what a floppy drive is or what it was meant to do


This is just bullying. Please stop.


I am happy for you, as it is clear you've never experienced actual bullying.


Yes, these flux files are yuge (given the amount of actual data they contain). It seems to me they should compress well, but I can't seem to remember if I tried to do so the last time I fiddled with flux files, during the pandemmy when I thought the world was ending and it was important that my 8th grade book reports be preserved.
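Python's zlib gives a quick way to test that hunch on your own captures. The input below is a synthetic, perfectly repetitive pattern, so it flatters the result. Real captures contain timing jitter and will compress less well. compression_ratio is a hypothetical helper name.

```python
import zlib

def compression_ratio(data: bytes, level: int = 9) -> float:
    """Compressed size / original size; smaller means it compresses better."""
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)

synthetic = bytes([32, 64, 32, 96] * 10_000)  # toy stand-in for flux timing bytes
print(f"{compression_ratio(synthetic):.3f}")
```

On a real file, something like `compression_ratio(Path("capture.scp").read_bytes())` (with `from pathlib import Path` and your own filename) gives the number that actually matters.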


There was a talk at VCF[0] a couple of years ago discussing exactly what you're describing in the context of tapes. It was very interesting, and some of it applies well to floppy disk preservation.

[0] https://www.youtube.com/watch?v=sKvwjYwvN2U


Since I can't edit anymore:

This article re: imaging floppies in the analog domain made the rounds on HN a couple years ago:

https://scarybeastsecurity.blogspot.com/2021/05/recovering-l...

The comments have some good links and background:

https://news.ycombinator.com/item?id=27187435


I had an Apple II as a kid. By the time I got it, it was an old computer, but the person who sold it to us included hundreds of floppy disks. Some were games, some were utilities, and some I never understood what they were supposed to do. It was an amazing learning device that could be booted to a BASIC prompt for easy programming.

Considering how shitty most software targeted towards kids is today, I am nostalgic for my old apple. I don't have any devices that fit the bill. Basically these requirements:

- easy selection of programs: for a 7-year-old, putting a disk in and hitting the power button is as easy as it gets. Diskettes are more easily discoverable than things embedded in a launcher or CLI. All I had to do was flip through the physical disks.

- no spyware / in app purchases / monetization attempts. You either had a full version of a program or shareware, but you knew what you were getting

- no internet. I'd be comfortable with something connected but lacking a web browser. Browsers have distractions, and it's too easy to accidentally navigate away from the page.

We got our daughters CD players because they fit the same bill better than mp3s. A non-literate kid can easily pick out their favorite physical media and start playing. Media can be easily shared or swapped.

Is there a way to boot directly to an apple emulator and just select from a list of roms? So it feels like the experience I had as a kid?
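If you'd rather not use a browser, a bare-bones menu run at login is a short script. This is a sketch. EMULATOR_CMD is a placeholder for whatever Apple II emulator you install, so check its documentation for how to pass a disk image. The filename-to-label rule is an assumption that only helps if the images are named sensibly.

```python
import subprocess
from pathlib import Path

# Placeholder: replace with your emulator's actual command line.
# "{image}" is substituted with the chosen disk image path.
EMULATOR_CMD = ["your-apple2-emulator", "{image}"]
IMAGE_SUFFIXES = {".dsk", ".do", ".po", ".woz"}

def build_menu(filenames):
    """(label, filename) pairs for disk images, sorted by label."""
    items = []
    for name in filenames:
        path = Path(name)
        if path.suffix.lower() in IMAGE_SUFFIXES:
            items.append((path.stem.replace("_", " ").title(), name))
    return sorted(items, key=lambda item: item[0])

def launch_command(image):
    """Fill the chosen image into the placeholder command line."""
    return [arg.replace("{image}", str(image)) for arg in EMULATOR_CMD]

def main(folder):
    """Print a numbered list of games, then launch the one picked."""
    folder = Path(folder)
    menu = build_menu(p.name for p in folder.iterdir())
    for number, (label, _) in enumerate(menu, start=1):
        print(f"{number}. {label}")
    choice = int(input("Pick a game: "))
    subprocess.run(launch_command(folder / menu[choice - 1][1]), check=False)
```

Calling `main("/path/to/disks")` from a login script gets close to "flip through the disks." It won't solve the multi-disk swapping problem described in the reply below, though.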


In the article, the archive.org page is listed.

https://archive.org/details/wozaday?&sort=-week&page=16

About 1,500 titles. Click and play in the browser. There are no ads or spyware on archive.org. Thumbnails and proper labelling.

Point a browser at this site, put it in kiosk mode so they can't go anywhere else, and you're golden.

I think you will find all dedicated emulators for the Apple II involve more fussing about with poorly-labeled disk images than a kid will want to do.

"ramaadv1a.img? Sure, that sounds like fun! Oh, I have to swap the disk... I'll press F6, scan down the list and figure out that 'flip the disk' in this case means I want ramadva1b.img, even though on other titles the convention was b1a. Oh, I guess this is just a text-mode cooking tutorial program from 1985."


Thank you, but doing it within a browser is not what I've been trying to do.


My daughter used an Apple IIgs with a FloppyEmu[0] when she was younger (and still uses it some). Getting a real Apple II may be beyond your means (they are getting ridiculously expensive for as many as were made) but, if you do, the Floppy Emu is a real improvement to quality of life.

[0] https://www.bigmessowires.com/floppy-emu/


I'm doing something like that with a MiSTer FPGA. I set it up with an Amiga core and an Atari ST core and a list of disk images my son can use.

It's then set up as a desktop for him to use as his personal computer.


So "4am" is a person? And "they" is the pronoun referring to that person? I couldn't parse the first two paragraphs after reading them three or four times over. Then I started skipping down the page trying to figure out if I (who was around in the eight-bit days, and am interested in hacker-history and nerd-archaeology type stuff, but not personally involved in any related "scene") would be able to make sense of it, or if it was purely written for "insiders". Eventually I figured out enough context that, yes, the grammar checked out and I was able to read it and make sense of it. Only then did I come back to "But 4am is an engineer", which strongly implied (but didn't entirely confirm) that we are talking about a person here and not a time of day or an abstract concept.

I'm not knocking people for choosing unusual names/handles, after all it's always been a part of hacker culture, or for choosing "they" as a pronoun, which is somewhat inevitable these days... but a little more careful writing of the article intro would make it soooo much easier for the casual reader to just cosy up and enjoy the story. Which was great, by the way. As hoped for - nothing I can act upon, but it tickles my nerd-gills and makes me feel enlightened.


"Ah," I said, writing about my friend 4am and their Apple II cracking towards discussing project management. "It's going to be a little confusing when this gets to the front page of Hackernews, with all those busy people skimming the post, but I hope they'll not be scared by a modern pronoun usage and a link set to the projects to determine what I'm talking about."


Took me a while, and maybe it's implied by the kinda niche theme of the post, but 4am is an Apple II floppy hacker/cracker/tinkerer

https://mastodon.social/@a2_4am


I had the same experience. Read the first part more than once. Confused for a while until I had enough context and understanding that there were two developers. Then I went back and re-read the beginning. Native English speaker here.


I was having a similar problem. Glad I wasn't alone!


[flagged]


This is the dictionary definition of an unfunded mandate.

I'll just say that I'm not seeking a glorped together description of the games in question, but a semantic listing of the contents of original articles, ads, discussions and mentions related to the titles. LLMs do not do this, and if you're training the LLMs (at cost) to do this, you're already having to do the very same searching out of materials within the corpus related to what you want. It'd be like gathering every TV guide in a stack to then ask it to describe things that are like All in the Family, instead of just going through them as you gather them to indicate what each issue has information about.


> LLMs do not do this, and if you're training the LLMs (at cost) to do this, you're already having to do the very same searching out of materials within the corpus related to what you want.

A more reasonable suggestion would be not training an LLM (which one doesn't want to do anyway) but treating it as a retrieval+summarization task: search the corpus for mentions and similar-by-embedding documents, and summarize. LLMs are good at abstractive summarization with minimal hallucination or error. This can serve as an 'annotated bibliography', as a first pass for a human writing it themselves, or the collective summaries can be fed into the LLM for a summary.
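The sketch below covers the retrieval half of that idea, using plain bag-of-words cosine similarity as a stand-in for real embeddings. The corpus entries are invented placeholders. The summarization step is left out rather than tied to any particular LLM API.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (a crude stand-in for an embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, corpus, k=2):
    """IDs of the k documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(corpus, key=lambda doc_id: cosine(q, vectorize(corpus[doc_id])),
                    reverse=True)
    return ranked[:k]

# Invented placeholder documents, not real scans.
corpus = {
    "ad": "Choplifter now on Apple II disk",
    "review": "review of Choplifter for the Apple II",
    "recipe": "text mode cooking tutorial",
}
print(retrieve("Choplifter Apple II", corpus))  # ['ad', 'review']
```

The retrieved IDs are what a human (or a summarizer) would then read, which is the "annotated bibliography" step.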

The main problem here is I guess that most of the relevant texts have poor or no OCR, so one can't do that in the first place. But there's a good chance that that will mostly stop being an issue in a few years as 'text' LLMs move to images (see eg PIXEL https://arxiv.org/abs/2207.06991 or Kosmos https://arxiv.org/abs/2302.14045 or https://arxiv.org/abs/2010.10648#google https://arxiv.org/abs/2012.14271 https://arxiv.org/abs/2209.14156 ) and they will either OCR, embed, or just process images of complex text directly. So, something to keep an eye on, perhaps: there's never going to be enough humans to do all this archiving properly, but perhaps there may eventually be enough GPUs to do it...


[flagged]


You've modified your original comment (which I didn't know Hackernews allowed people to do). So now I look completely weird because it "seems" you only typed one word and I wrote something.

You wrote, essentially:

Why not train a LLM AI on the magazines, newsletters, documentation, etc, and have it generate the descriptions based on prompts?

...and I call this an "unfunded mandate", which is where someone comes along to a project, and tells them a bunch of things they should take on and do, instead of either offering to do something themselves, or pointing to a resource or project that fulfils the need that is being asked for.

The article is about me finding out that certain kinds of activities are jammed in a Catch-22 of necessity but lack of potential funding (time or money). Your idea shifts no needle and makes things more complicated, to no greater chance of success, but plenty of chance (contemporarily) of hallucination and misleading results.

Feel free to delete, I guess, even more.



