Prepping for the Transfer of 25,000 Manuals (textfiles.com)
148 points by r721 on Aug 16, 2015 | 48 comments



Context/Background link: http://ascii.textfiles.com/archives/4683

tl;dr: A technical manual reseller in Finksburg, MD is throwing out 25,000 high-quality paper manuals from as far back as the 1930s this week and Jason Scott & friends are driving there with "$900 of banker's boxes" to save whatever they can.


I know many people may be thinking "just throw it out". But you don't understand: you may be faced with a system that is covered in one of those manuals, and some previous genius decided to toss that manual when they cleaned out their office.

Or you may even come into possession of one of those systems and have no idea how to use it. If Google hadn't crawled the manuals for certain products, I would have thrown away a number of electronic paperweights.


Those are manuals for 40-to-80-year-old technology, something a second-year EE student could explain to you in their sleep. The only value they possess is historical.


This is incredibly naive. Many second year EE students today will not even know what a vacuum tube is, much less be able to explain how a device using them works.

Even if you're used to looking at discrete transistor circuits (and even that is getting rare these days), a device with tubes can look like magic of the highest order.


Seconded. Another factor is that looking at a device won't tell you why it was designed the way it was. A good technical manual will be invaluable for telling you things like valid ranges, service or environmental limits, etc., which would otherwise need to be reverse-engineered. If you're trying to replace an old system or studying its history, knowing what does and doesn't matter can save a ton of time.

(This goes double if the manual was annotated by a good operator)


Tubes are history, so you pretty much agree with me? :) I didn't say those manuals are unimportant and should go into the landfill; I pointed out that they are historical in nature, not something you would actually use today.


The Thunderstrike exploit for EFI required knowledge of the option ROM, which is a little bit of legacy tech left over from the original IBM PCs. Check out this[0] overview. Hacking a modern machine using, in part, information taken from the Intel 8088 architecture reference manual!

Also, my EE studies largely skipped over the analog world entirely. There were two courses on linear circuits, but they talked very little about analog (not a single mention of a vacuum tube to be found). Your assumption that a second-year EE student could explain this to you is no longer correct in 2015, as most programs are similar to what I went through.

[0]: https://trmm.net/Thunderstrike_31c3


Apples and monkeys. Option ROM is extensively documented and part of every BIOS hacking writeup. Example: https://books.google.pl/books/about/BIOS_Disassembly_Ninjuts...

Author's blog: http://bioshacking.blogspot.com/

Free book download: http://www.lejabeach.com/sisubb/BIOS_Disassembly_Ninjutsu_Un...

It isn't some arcane knowledge that only dying-out greybeards would know about.


No, it isn't arcane -- in part because those old manuals are around.


:/ It's like saying we know how to program in x86 assembly only because of some 34-year-old books.


That wasn't what I meant at all in my original post, and apologies for not expressing myself clearly. But it's good that those manuals are preserved, as they capture the original context of the system's use.

I have another post on this thread where I talk about some uses I've had for old manuals. That's the sort of thing I'm getting at. The GE engine manuals, for instance, aren't arcane in the context of CH-46 maintenance, but in the broader context of printed material they are extremely arcane.


Why does he say the Linear Book Scanner (https://code.google.com/p/linear-book-scanner/) destroys books? I thought the idea of the Linear Book Scanner was to automatically scan books without cutting the binding.

Anyhow, cool project.


The FAQ on the linear book scanner site says the following:

"Prototype 1 could scan the majority of books without damage, but may tear one or two pages in some books. Out of 50 books tested, 45% had one or two of their pages either torn or folded. This is a very early prototype and there are many areas for improvement in the design."


Ok, thanks. I read that, but I thought there might be some other, more dire reports of its use. I guess we just differ on what it means to destroy a book. When I read Jason's comment, it sounded like he was saying that the linear book scanner rendered the book unusable after scanning.


Look into commercial book scanners that use vacuums. This is a solved problem. Kirtas is one of those manufacturers.

https://www.youtube.com/watch?v=ds63ZBXFdLM http://www.kirtas.com/

50 pages per hour with no damage to the books.

You're looking at $50k-$90k for the equipment plus $8k/year service contract, though. So you need to figure out whether book scanning is something that the Internet Archive is interested in, beyond this project.


I was over there yesterday morning helping sort and it seemed like the vast vast majority of the books were either ring binders or spiral bound and could easily be taken out of the binding to be scanned in a standard scanner.


I think that page-turning method is clever, but far too much like a reciprocating slicer/guillotine. Unless the pages are perfect, catch one in the gap and it'll get torn off or folded. Cameras are also much faster than traditional scanner-type sensors.


Having gone through the gut-wrenching task of choosing which data books to throw out on more than one occasion, I fully understand the value and sentiment of this project.

As a young hobbyist and later engineer I learned TONS out of data books, application guides and equipment manuals. I'd spend hours paging through data books, learning about the various chips, going through the application notes, building circuits, testing them and studying schematics when equipment actually came with schematics.

Anyone who was "all in" in electronics did exactly the same.

To this day I've kept my National Linear Applications books and a few others. eBooks have yet to capture the speed and convenience of holding a 500-page book in your hands that you can page through and explore. Worse yet, they can't match having five or six such books spread across your workbench as you work on a design.

That said, having the ability to search books or, better yet, your entire library, is useful. I don't buy programming books in paper form any more. And I still prefer PDF to any other eBook format. For me it tends to be a far better experience across platforms.

This thread has made me think about the idea of digitizing my physical books. I find myself thinking about this every few months. I have both engineering and business books that will never be available in electronic form, and I would definitely like to preserve them and make them searchable.

Is there a service or a device one could use for this purpose? The linear book scanner seems interesting, yet it is apparently known to damage books. A service could be interesting, but it would have to be comparable to buying a book, meaning $20 per book or thereabouts, not $500 (or whatever). This would mean they'd need a slick, low-cost means to digitize books, or a way to monetize the process beyond charging for digitizing.

Building a scanner could be interesting, of course. I'm thinking about bringing this up as a project for the FIRST FRC robotics team I mentor. You never know what the kids might come up with.

Any resources on this front?


When I looked into book scanning a few years ago, the Kirtas (mentioned elsewhere in this thread) was, as far as I could tell from Net sources, the reference method of fast, high-volume, non-destructive, high-quality scanning, and it remains so. Even so, many libraries with Kirtas units still employ someone to stand watch over the page turning and ensure only one page at a time is flipped. Perfect page turning is apparently not a completely solved problem yet.

If procuring a Kirtas (and paying nearly $10K USD per year in maintenance fees) through a hacker collective or maker space is infeasible in your area, the community at www.diybookscanner.org has a workable solution for a much smaller subset of what the Kirtas units address, so you could look into that as a modest workaround for the time being. (Though I wonder what results they got for dewarping by simply taking pictures from all sides of the scanning target to synthetically construct a 3D volume, as perfect dewarping continues to be an open and unsolved problem.)


> Even so, many libraries with Kirtas units still employ someone to stand watch over the page turning and ensure only one page at a time is flipped

Most books have page numbers; couldn't they use that along with OCR to detect and retry skipped pages? Maybe even a state that shakes the pages more than usual in an attempt to separate ones stuck together. It doesn't sound too difficult to do (perhaps you'd have to tell it where the page number is), given what the Kirtas machine costs.
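A minimal sketch of the check suggested above, in Python, assuming the page number on each captured image has already been extracted by OCR (that step, including where on the page to look, is the hard part and is not shown). The function name and input format are hypothetical. It flags every page number missing from the run between the lowest and highest number seen, which is the gap a double flip would leave behind.

```python
from typing import List, Optional


def find_skipped_pages(ocr_numbers: List[Optional[int]]) -> List[int]:
    """Return page numbers missing from the captured sequence.

    ocr_numbers has one entry per captured image, in capture order;
    None means OCR could not read a page number on that image
    (hypothetical input format).
    """
    present = {n for n in ocr_numbers if n is not None}
    if not present:
        return []
    # Every number between the lowest and highest page seen should appear;
    # a gap suggests two pages were turned at once and need a re-shoot.
    return [n for n in range(min(present), max(present) + 1) if n not in present]


if __name__ == "__main__":
    captured = [1, 2, 3, None, 6, 7]  # the 4th image had an unreadable number
    print(find_skipped_pages(captured))  # [4, 5]
```

Note that an image whose number simply failed to OCR (the `None` above) can make a page look missing, so in practice the flagged pages would be candidates for an operator to confirm rather than automatic re-shoots. Unnumbered front matter and plates are also invisible to this check.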


The challenge seems to be the OCR takes place in a post-processing phase instead of real-time, and the desire is to catch the improper page flip before putting away the book. Perhaps with one or more gigabit pipes, the image processing can take place in the cloud in near real-time.

The Kirtas units seem highly-regarded by conservators; they might have lots of objections to even gentle shaking of their sometimes fragile charges. The impression I get is that the slight vacuum employed by the Kirtas on pages is the most handling that is accepted. There might be recent developments in computer vision and robotic fingers which could see an improved robotic analog to a human page flipper in the future.

My personal hunch is that the popularization and (relative) mass adoption of the slower, lower-tech open source book scanners will eventually outstrip the dedicated scanning throughput of the high-end units and put more digitized content onto the Net, along with a legal fight over content "abandoned" by publishers. When I digitize my content, it goes into my private collection, but I sure wish publishers were more aggressive about digitizing older material, or more lenient about letting that older material go into the public domain if they aren't even chasing the long-long-long tail of that material anymore.


If the sole limiting factor is manual labor, can he hire a bunch of people off of Craigslist? Though I can't be there, I'd gladly donate for another person with a truck to help.


> Why are these even worth anything or worth keeping, tidy your life, lighten up, etc. Either you really understand why 80 years of manuals, instructions and engineering notes related to 20th century electronics are of value both historically, aesthetically and culturally, or you don’t. To try to make the case would be a waste of time for both of us.

I'm not sure I do understand the motivation, but I don't think that I'm beyond understanding it. Is it that some of these systems are still in service? Is it just the history/archeology aspect?


You would be surprised how often old technical manuals are useful. Examples from my own work:

1) I was tasked with instrumenting the T58-GE-16 engines in a CH-46 [0]. So what did I need to inform my sensor placement and selection? Some schematics and technical manuals, all from the late 1960s, all undigitized.

2) I needed to reverse engineer an old test set. The documentation had been lost to time. When I cracked it open, I saw lots of 5400 & 7400 series chips. Now, this is kind of a trite example, because lots of working EEs still have copies of the TTL Data Book at hand. But still, I needed to refer to that old tome when working on this project.

3) When I worked at a NASA contractor, a primary piece of equipment failed. We needed a replacement in a hurry. Fortunately, someone had kept the older version of this system around. It dated from 1959 (!) but the manual was still around, too. A quick read through that manual got us back in business.

Technology never dies [1]. But without the manuals to understand that technology, things become much harder when you need to use that technology again.

[0]: https://en.wikipedia.org/wiki/Boeing_Vertol_CH-46_Sea_Knight

[1]: http://www.npr.org/sections/krulwich/2011/02/04/133188723/to...


Thank you for 0), you may be able to guess why from my user name.


I was working on instrumentation for testing an IR suppression system for the Phrog's exhaust. Here [0] is the exact aircraft (BuNo 152578) sitting at the Pax River NAS museum, with the IR suppression attached (and a nice big bundle of our thermocouple wiring, too). From what I could find in the records, this was either the 4th or 5th time the Navy had tried something like this, and the results were less than promising.

[0]: http://cdn-www.airliners.net/aviation-photos/photos/3/0/8/16...


You'd be surprised how true that is even for computers:

http://www.pcworld.com/article/249951/if_it_aint_broke_dont_...

It's why I keep reposting links like the Bitsavers manuals just in case someone needs them one day.


I personally wouldn't be surprised, as we still carry the legacy of the original IBM PC around.


Yes, we do despite Intel and IBM attempts to rid us of it. Then there's the other side's legacy. ;)

https://queue.acm.org/detail.cfm?id=2349257


> Quality only happens when someone is responsible for it

When our test aircraft were being delivered, the other guys who worked on the instro systems and I would ask, "Who was the asshole?" In other words, who took it upon themselves to be the person who made sure that things came together properly during buildup and checkout? The one aircraft we got where no one had stepped up for instrumentation and been the asshole has proven to be the most problematic of all the aircraft.

Maybe not the most elegant way to put it, but there you have it.


I believe it haha


> I'm not sure I do understand the motivation, but I don't think that I'm beyond understanding it.

From an archaeological perspective, think about it 100 years from now (assuming there is no catastrophe -- a different topic).

Information lost now is lost forever. People in the future trying to reconstruct the past may well need various kinds of information that does not seem valuable today.

It's parallel to any library. Only a small fraction of any library will be of direct interest to any given person, but the collection overall is trying to serve a community, whether individuals see why various parts are useful or not.

> Why are these even worth anything or worth keeping, tidy your life, lighten up, etc.

This is an individual's thinking. The other point of view is about serving the larger community -- and not just now, but with very long term benefits.


"Historically, aesthetically and culturally". This isn't about saving manuals so people can keep running 80-year-old systems that are still in service. This is about history.


And learning from past works of science and engineering is how we keep learning and growing as a human race. So yeah, history - specifically science history, which is arguably even more important than your average history.


Technology and science don't exist in a vacuum.


> This isn't about saving manuals so people can keep running 80-year-old systems that are still in service.

I think it would be more correct to say that it isn't only about that. As somebody above pointed out, "technology never dies". You'd be shocked at what you'll find still running out there if you look in the right places. Forget Silicon Valley for a minute... go find a manufacturing plant in the Midwest or the Southeast somewhere, or even in the Rust Belt. A plant that makes some kind of goofy sub-assembly for producing something, where none of us have even thought about that sub-assembly or would know what it was if we saw it. In that kind of place you'll still find all sorts of seemingly archaic technology... old IBM mainframes with drum hard-disk drives where the drum weighs about 50 lbs and stores 50MB of data. IBM S/36 and S/38 minicomputers, DEC PDP-11s, old VAX machines, you name it, it's out there. Heck, go check some non-profit telephone cooperative somewhere in rural America... I'd bet you'll find more of the same there. And so on, and so on...


It's great that someone is doing this. The person involved is affiliated with the Internet Archive, so they will know about their book scanning capabilities.[1]

Once they have the books in storage, the next step is to take a picture of each cover, and put those on line. With an inventory, people will be able to ask for (and perhaps pay for) digitization.

[1] https://archive.org/details/partnerdocs


Beyond the immediate rescue (which is by no means secured), perhaps someone at the Society of American Archivists, or its student membership, would be interested? I'm seeing discussion here on HN regarding the worth of the collection and how to handle it. SAA have mailing lists[1] and there appear to be a couple of plausible Twitter handles[2].

[1]: http://www2.archivists.org/initiatives/askanarchivist-day-oc...

[2]: https://www.google.co.uk/search?q=ssa+twitter+archivist


Photos and updates on the progress available on Twitter...

https://twitter.com/textfiles


25k sounds like a lot, but from the pictures it looks like most of them are not very thick, and it'd be relatively easy to grab a whole stack of them at once. By my estimate, that is around the size of a small library.

If they can be removed from the shelves and boxed at an average rate of 5 per second, that's 5,000 seconds or <1.5h at the most. Even after adding in trips to the new storage location, packaging, unloading, etc., and considering it's a trivially parallelisable task, it definitely seems doable to move the whole collection of 25k within a few hours.


I doubt you'll get anywhere near 5 per second, even with a large number of people working on it. A single person will likely need 5-10 seconds to grab a manual from a shelf, place it in a banker's box, and move on to the next one. You'd need 25-50 people to get that.

Still, even if it's 100 person-hours, it's an achievable goal; a dozen volunteers can do it over the course of a day.
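The two estimates above can be checked with a quick back-of-the-envelope calculation, sketched here in Python with the thread's own numbers: 25,000 manuals, a combined 5 manuals per second for the bulk-grab approach, and 5 to 10 seconds per manual when each is handled individually. The 8-hour working day is an assumption, and the model ignores sorting out duplicates, truck trips, and unloading.

```python
import math

MANUALS = 25_000  # collection size from the post


def hours_at_rate(items: int, items_per_second: float) -> float:
    """Wall-clock hours if the whole crew sustains a combined rate."""
    return items / items_per_second / 3600


def person_hours(items: int, seconds_per_item: float) -> float:
    """Total labor if each manual is handled individually."""
    return items * seconds_per_item / 3600


def people_for_rate(items_per_second: float, seconds_per_item: float) -> int:
    """People needed in parallel to sustain a combined rate."""
    return math.ceil(items_per_second * seconds_per_item)


def crew_needed(items: int, seconds_per_item: float, shift_hours: float) -> int:
    """Smallest crew that can finish within one shift (assumed length)."""
    return math.ceil(person_hours(items, seconds_per_item) / shift_hours)


if __name__ == "__main__":
    print(f"{hours_at_rate(MANUALS, 5):.2f} h at 5 manuals/s")          # 1.39 h
    print(people_for_rate(5, 5), "to", people_for_rate(5, 10), "people for 5/s")  # 25 to 50
    print(f"{person_hours(MANUALS, 10):.1f} person-hours at 10 s each")  # 69.4
    print(crew_needed(MANUALS, 10, 8), "people for an 8-hour day")       # 9
```

At 10 seconds per manual the job comes to roughly 69 person-hours, under the 100 person-hour figure quoted above, so on these assumptions nine people could in principle finish in a single 8-hour day.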


> A single person will likely need 5-10 seconds to grab a manual from a shelf, place it in a banker's box, and move on to the next one.

If you look at pictures like this one:

http://ascii.textfiles.com/wp-content/uploads/2015/08/IMG_69...

I could probably grab 25 or more of those at a time and set them in a box in 5 seconds, hence 5 per second. Getting the first ones out (because there is little "gap" to stuff hands into on the shelf) will be slower, but once the gap is made the whole pile easily comes out. This isn't about pulling one out at a time, spending another few seconds inspecting it, and then putting it in a box; it's about getting them off the shelves and out of the building ASAP.


> This isn't about pulling one out at a time, spending another few seconds inspecting it, and then putting it in a box;

Actually, it is. Many of the manuals are duplicated and they are only interested in keeping one unique copy of each. The duplicates will be immediately discarded. Obviously they will want to keep the highest quality copy of each manual for digitisation, so the process involves taking the manual from the shelves, checking to see if it is a duplicate, finding the highest quality version among the duplicates, and keeping the best one.

They can't just run through the shelves, grabbing 25 manuals at a time and throwing them into a box.


Keep in mind, you also have to remove just one unique copy of each set, and throw the rest away, being very careful not to accidentally throw out a "duplicate" that is actually a similar-looking, but unique, manual.
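The sorting rule described in the last two comments (keep exactly one copy of each manual, the best-condition one, and never mistake a distinct manual for a duplicate) can be sketched in Python. This is a hypothetical model, not how the volunteers actually worked: it assumes each copy can be identified by a part number plus revision and that someone has given it a condition score. The field names and the sample part numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Copy:
    part_number: str  # hypothetical: the publisher's document/part number
    revision: str     # hypothetical: revision or edition marker
    title: str
    condition: int    # hypothetical: 0 (poor) to 10 (pristine), scored by a sorter


def keep_best(copies: List[Copy]) -> Tuple[List[Copy], List[Copy]]:
    """Split copies into (keepers, discards), keeping one best copy per manual."""
    best: Dict[Tuple[str, str], Copy] = {}
    for c in copies:
        # Key on part number + revision, not title: similar-looking manuals
        # with the same title can still be distinct documents.
        key = (c.part_number, c.revision)
        # Strict ">" means the first copy seen wins a tie.
        if key not in best or c.condition > best[key].condition:
            best[key] = c
    keepers = list(best.values())
    # Compare by identity, since two copies of one manual can be equal by value.
    kept = {id(c) for c in keepers}
    discards = [c for c in copies if id(c) not in kept]
    return keepers, discards


if __name__ == "__main__":
    shelf = [
        Copy("TM-100", "A", "Oscilloscope service manual", 6),
        Copy("TM-100", "A", "Oscilloscope service manual", 9),
        Copy("TM-100", "B", "Oscilloscope service manual", 4),
    ]
    keepers, discards = keep_best(shelf)
    print(len(keepers), "kept,", len(discards), "discarded")  # 2 kept, 1 discarded
```

The revision-B copy survives even though its title matches, which is exactly the "similar-looking, but unique" case; the hard part in a warehouse is reliably reading those identifiers, not the bookkeeping.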


Given the really hard time limit, wouldn't it make more sense to grab everything as quickly as possible and then sort it over a couple of months? That requires more storage space but has the advantage of a bounded, easily calculated maximum time.


Depends how many duplicates there are. If there are eight of each manual, then you are talking about eight times as many boxes. One truckload is now eight truckloads, one storage unit is now eight storage units. And now you have to pay for discarding 7/8 of the manuals, which is not free.


He also mentioned in the previous post that they were going to look through all to make sure they do not save duplicates.


Looks like the place is in Finksburg, Maryland, about 30 minutes from Baltimore and an hour from DC.



