Download all of Wikipedia on your phone

xanderlewis · on Nov 27, 2023

I remember doing this about ten years ago, taking up almost the entirety of my phone’s storage in the process, and feeling unreasonably delighted by it. I’m not sure why, but this idea really appeals to me. It’s all very good having access to information on some server somewhere, but literally having (a very useful amount of) information on almost every topic humans have so far contemplated available literally in your pocket is somehow incredibly thrilling. For the same reason I’m fascinated by the idea of locally-hosted LLMs. It’s hard to explain. Maybe it’s like some sort of digital version of hoarding behaviour.

flashback2199 · on Nov 27, 2023

> Maybe it’s like some sort of digital version of hoarding behaviour

Digital hoarding is a real thing

reddit.com/r/DataHoarder

cm2187 · on Nov 27, 2023

I would bet those who grew up with 56k modems are more affected!

mikestew · on Nov 27, 2023

56K, such luxury. I know what digital scarcity is, and a 1Gbps fiber line to my house isn’t it. Sure, I have my secret stash over in the corner of that NAS in the garage, just in case. But I otherwise can now download what I want in seconds, minutes at worst. Ergo, I don’t hoard nearly like I used to (never was a true hoarder in the first place, so I can probably be ignored).

Not to say that your point isn’t well taken. One does tend to keep stuff after a long overnight download, and old habits die hard. And I do have a local copy of Wikipedia…just in case.

btbuildem · on Nov 27, 2023

Local LLM with the entirety of Wikipedia accessible via RAG (or better, newer technique) is legitimately a little super-powered assistant. It works when the grid goes down, and the searchability is orders of magnitude better - especially when you're not sure what you're looking for.

golergka · on Nov 27, 2023

What better technique are you referring to?

btbuildem · on Nov 27, 2023

Something like this perhaps: https://blog.normalcomputing.ai/posts/2023-09-12-supersizing...

aceazzameen · on Nov 27, 2023

Are local LLMs available on phones yet?

simonw · on Nov 27, 2023

MLC have demos that work on iOS and Android:

- https://llm.mlc.ai/#ios

- https://llm.mlc.ai/#android

CGamesPlay · on Nov 27, 2023

This model came out a few days ago and runs on an iPhone. https://huggingface.co/NousResearch/Obsidian-3B-V0.5

simonw · on Nov 27, 2023

From their README:

    <|im_start|>user
    What is this sign about?\n<image>
    ###
    <|im_start|>assistant
    The sign is about bullying, and it is placed on a black background with a red background.
    ###

Any idea how you fill out that <image> bit?

CGamesPlay · on Nov 27, 2023

No, but the GitHub repo linked from there includes a Gradio Demo UI which has an image field, so it should be possible to reverse engineer.

johnsillings · on Nov 27, 2023

Hitchhiker's Guide vibes

babypuncher · on Nov 27, 2023

An early (pre-App Store) homebrew iOS app I used that did this on my iPod Touch was called "Don't Panic".

Damogran6 · on Nov 27, 2023

I had the same feeling, I imagine, when I paid $2.00 to download the entirety of the LaserDisc game Dragon's Lair onto my iPhone (3GS? 4S?).

The thought that I used to spend $.50 a game and now I can hold it in my hand for 4 times that.

nlh · on Nov 27, 2023

I sometimes daydream about what would happen if I were suddenly transported back in time -- but with everything on my person intact (iPhone included).

Step one in this daydream is always "Crap, no internet for 100 years -- if only I had all of Wikipedia cached".

Check! :)

Step 2 of course is figure out how to fashion some sort of charging system so my magical godlike pocket super abacus continues to function after ~days, but I'm making the bold assumption that somewhere in the annals of Wikipedia I can figure that out.

guntars · on Nov 27, 2023

7 cell copper-zinc wet electrolyte battery - https://en.wikipedia.org/wiki/Penny_battery

Terr_ · on Nov 27, 2023

> If I were suddenly transported back in time

It'd be nice if there was a curated downloadable subset of Wikipedia pages for that kind of purpose. (Time-traveler in past, stranded/post-apocalyptic survival, history pages but only the text parts.) Even if only mundane events occur, it also represents a certain cross-section of user-interest.

That said, one more daydream item to add to the pile: "What if I was somehow lost in space, what useful information could I provide for maybe locating my home-planet again?"

____

Without Wikipedia, the best I can think of are notable conditions that could help them filter existing records, often using ratios to avoid having to define measurement units, ex:

1. It's the 3rd planet from the Sun, the majority of it is covered with water, etc.

2. It has an unusually large single moon of roughly 1/81 its mass (about 1/4 its diameter).

3. The 5th and largest planet in the system has a little over 70% of all the (known) planetary mass.

4. The 5th planet is 5.2x further from the Sun than my 3rd planet.

5. The 6th planet still has notable rings which haven't decayed yet.

With Wikipedia, perhaps I could tell them about pulsar maps or whatever. [0]

[0] https://voyager.jpl.nasa.gov/golden-record/golden-record-cov...

qingcharles · on Dec 9, 2023

I asked GPT4...

Locating the Sun from an unknown position in space, especially if you're many lightyears away, would be a complex task. Here are some strategies you could use:

1. *Spectral Analysis*: Every star has a unique spectrum, and the Sun's spectral signature is well-documented. If you have the capability to perform spectral analysis on nearby stars, you could compare their spectra to that of the Sun. This, however, assumes you have a database of stellar spectra, including the Sun's, for comparison.

2. *Position of Known Stars and Constellations*: By observing the position of known stars and constellations, you might be able to estimate your location. This becomes more difficult the further you are from Earth, as the relative positions of stars change with distance, and familiar constellations may become unrecognizable.

3. *Parallax Measurements*: Parallax is the apparent shift in position of an object when viewed from different perspectives. Measuring the parallax of nearby stars compared to more distant background stars could help you triangulate your position. Knowing your position relative to the Milky Way galaxy would be essential in locating the Sun.

4. *Galactic Landmarks*: Identifying key features of the Milky Way, such as the galactic core, spiral arms, or specific nebulae, could help orient you within the galaxy. The Sun is located in the Orion Arm, about 26,000 lightyears from the galactic center.

5. *Radio Signals*: If your ship can detect radio signals, it might catch broadcasts from Earth. The Sun would be in the general direction of these signals. However, depending on your distance, these signals might be very weak or non-existent.

6. *Automated Stellar Navigation Systems*: Use any advanced stellar navigation systems on board. These systems could automatically identify stars and calculate your position relative to them.

7. *Consult Star Maps and Databases*: If your ship has a database of star maps, you could use these to match up visible stars with known stars and work out your position relative to the Sun.

8. *Communication with Earth*: If communication with Earth is possible, getting assistance from ground-based observatories or space agencies would be invaluable. They could provide information based on your last known position and trajectory.

rokkamokka · on Nov 27, 2023

IIRC the fact that our moon is almost the exact same size as our sun when viewed from the surface is rather rare as well

hgomersall · on Nov 27, 2023

There's some sweet spot where technology is relatively easy to improve upon with Wikipedia, but one can also find wires and generators. Some time in the first half of the 19th century is probably good. It gets easier if you're allowed a solar charger to take with you, then you can improve tech in loads of ways (though probably best to restrict yourself to post enlightenment if you want to keep intact).

agrimonyhal · on Nov 28, 2023

You might enjoy this book: https://www.goodreads.com/book/show/39026990-how-to-invent-e...

dylan604 · on Nov 27, 2023

one of the other apps I would want to have would be an astronomy app. also, something to consider for your "on my person" kit would be a solar charger. otherwise, when the natives come to you after the battery dies, your magic abilities are less godlike.

Nextgrid · on Nov 27, 2023

> Step 2 of course is figure out how to fashion some sort of charging system so my magical godlike pocket super abacus continues to function after ~days

In the good old days of USB 2 I'm sure you can figure out how to get a stable-enough ~5V power supply that would charge the smartphone.

With USB-C you'd need to reinvent IC manufacturing and figure out a needlessly-fragile and overcomplicated protocol before the phone would even accept any power. Yay for progress!

ianburrell · on Nov 27, 2023

USB-C works without USB-PD. Plain USB-C is simpler than USB-A since it supplies 1.5A or 3A based on resistors. The USB-C to USB-A and microUSB adapters are just resistors to use legacy USB. Legacy USB power is fairly complicated with different standards for 500mA, 1A, and 2.4A power. The reason it looks simple is that a cheap chip is in every charger and device.

You can probably provide at least 2.4A 5V on USB-A, just like providing 3A 5V would be fine with USB-C.

lxgr · on Nov 27, 2023

USB-C is a superset of USB-A/B, so you can use exactly the same simple protocol: A short of the D+ and D- pins, per USB Battery Charging.

For a USB-C sink, you don't even need any resistors beyond that, let alone USB-PD. For a USB-C source to be able to charge a legacy USB-B sink, all you need is a single 56 kOhm resistor.
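To put rough numbers on the resistor scheme discussed in this subthread, here is a minimal sketch of the voltage divider on the CC line. It assumes the commonly cited USB Type-C values (a source pull-up Rp of 56 kΩ, 22 kΩ or 10 kΩ to 5 V, and a 5.1 kΩ sink pull-down Rd); the function names are made up for illustration, and the values should be checked against the spec before building anything.

```python
# Sketch: voltage a USB-C sink sees on the CC line for each source pull-up.
# Assumed values (verify against the USB Type-C spec): Rp is pulled up to
# 5 V in the source, Rd = 5.1 kOhm pulls CC down in the sink.
PULLUP_VOLTS = 5.0
RD_OHMS = 5_100

RP_OHMS = {
    "default USB power": 56_000,
    "1.5 A": 22_000,
    "3.0 A": 10_000,
}

def cc_voltage(rp_ohms, rd_ohms=RD_OHMS, pullup_volts=PULLUP_VOLTS):
    """Plain voltage divider: the sink reads this level to learn the advertised current."""
    return pullup_volts * rd_ohms / (rp_ohms + rd_ohms)

def cc_table():
    """CC voltage, rounded to 10 mV, for each advertised current level."""
    return {label: round(cc_voltage(rp), 2) for label, rp in RP_OHMS.items()}

if __name__ == "__main__":
    for label, volts in cc_table().items():
        print(f"{label:>18}: {volts:.2f} V on CC")
```

The point of the sketch is that advertising a current level needs nothing more than one resistor and a 5 V rail, which is the claim being made above.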

_kuvn · on Nov 27, 2023

To my understanding, there are ways of wiring a USB-C cable such that only four contacts are active, and behave as a USB 2 cable? (although you can only do USB 2 charging over it, so no PD)

RockRobotRock · on Nov 27, 2023

are you suggesting we make the USB standard so easy a caveman could do it?

canpan · on Nov 27, 2023

While not necessarily related to Wikipedia, I focus a lot on the offline usage of my phone: I fly a few times per year and I like to travel. Even in 2023, internet is not always working so great. Two important things for me are:

* Offline Map - The times I need a map and I am offline are strongly correlated.

* Stardew Valley - Great game, pay only once, play offline, no ads.

Wikipedia does sound cool though. Maybe I'll trial one of the reduced size versions.

123pie123 · on Nov 27, 2023

I strongly recommend Organic Maps for a free offline map application

FredPret · on Nov 27, 2023

Organic Maps: 10/10

Completely offline, shows more info about the landscape than Google/Apple Maps, especially for hiking, has almost all businesses, has driving directions.

solardev · on Nov 27, 2023

It heavily depends on your area though. It's often way behind Google Maps, in every populated place I've tried it in the USA. (Businesses are years out of date, roads are missing or incorrect, etc.)

I do contribute to OSM, but day to day I find Google Maps + Alltrails + Trailforks to be more practical (all paid, closed source, but high quality data).

Google Maps also has free offline tiles. Those other two have paid offline maps, but it's worth it to not have to always wonder if your map is accurate.

thinkling · on Nov 27, 2023

A tip about offline Google Maps:

When your phone can “hear” a cell tower (and thus thinks it’s online) but doesn’t have enough signal to get any data, apps like Google Maps and Gaia will often hang instead of showing you the downloaded map data that’s on your phone.

Put your phone in airplane mode to fix that behavior.

123pie123 · on Nov 27, 2023

you can also add and change that map - openstreetmap.org

it's very rewarding when you see your changes being used by other people and apps

xanderlewis · on Nov 27, 2023

> Stardew Valley - Great game, pay only once, play offline, no ads

It’s a sad indictment of the current times that this isn’t completely standard.

1970-01-01 · on Nov 27, 2023

This would be great if it was an optional part of the wipe process. Turn old Android phones into offline encyclopedias.

marcod · on Nov 27, 2023

Especially since the mentioned app apparently doesn't run on the newest Android version...

fmajid · on Nov 27, 2023

Hmmm. The full English Wikipedia Zim went from 87GB 1.5 years ago to 103GB now. While 512GB microSD cards are fairly reasonable nowadays, iPhone users will struggle.
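For a sense of scale, the two sizes quoted above imply a growth rate that can be worked out directly. This is a small sketch using only those two data points; treating the growth as a constant yearly rate is an assumption, and the 128 GB target is just an illustrative storage tier.

```python
import math

# The two data points quoted in the comment above.
OLD_GB, NEW_GB, SPAN_YEARS = 87.0, 103.0, 1.5

def annual_growth(old_gb, new_gb, span_years):
    """Compound annual growth rate implied by two sizes span_years apart."""
    return (new_gb / old_gb) ** (1.0 / span_years) - 1.0

def years_until(size_gb, target_gb, rate):
    """Years for size_gb to reach target_gb, ASSUMING the rate stays constant."""
    return math.log(target_gb / size_gb) / math.log(1.0 + rate)

if __name__ == "__main__":
    rate = annual_growth(OLD_GB, NEW_GB, SPAN_YEARS)
    print(f"implied growth: {rate:.1%} per year")
    print(f"years until 128 GB at that rate: {years_until(NEW_GB, 128.0, rate):.1f}")
```

That works out to roughly 12% a year, so a file that fits comfortably today may not fit on the same device in a couple of years.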

speedylight · on Nov 27, 2023

With the iPhone 15 you could easily plug in a USB C SSD and download it onto that—would probably require a custom client to read the data from the external drive though.

fmajid · on Nov 27, 2023

I tried on my iPad Pro M1 using a Crucial X9 Pro drive, and Kiwix works just fine. The non-pro iPhone 15 with its USB2 speed USB-C port won’t be snappy, however.

thinkling · on Nov 27, 2023

Are there cases yet that will take an SD card and provide a pass-through USB-C plug? (Much like the battery hump cases?)

Didn’t find it on a superficial search.

flyinghamster · on Nov 27, 2023

Or any other phone lacking a microSD slot. I'd assume it works that way on an iPhone, but I can at least plug a flash drive (or card reader) into my Pixel 7a's USB-C port. Not quite as nice as having always-ready extra storage, but handier for quick swaps.

isametry · on Nov 27, 2023

Current iPhones are available with up to 1 TB of storage. I’d argue that whoever would even think about downloading all of Wikipedia on their phone is a particular type of user and their choice of storage will be on the northern side.

fmajid · on Nov 27, 2023

I do have the Maxi Wikipedia ZIM on my iPad Pro, which has 1TB.

I upgraded my wife's iPhone X to a 15 non-Pro, maxed to 512GB, but she has so many videos I doubt the ZIM will fit. I myself will keep my iPhone 12 until it either breaks or they no longer provide security updates for it. Not everyone changes to the latest iPhone.

cm2187 · on Nov 27, 2023

Is it compressed? Also I presume the html must gross up the number.

ohlookabird · on Nov 27, 2023

Back in the year 2001/2002 (I think) I remember doing this with a Wikipedia plugin for Rockbox (https://www.rockbox.org/), which I had running on my Archos FM Recorder. It was truly awesome to have all this knowledge in the palm of your hands without omnipresent Internet.

tmikaeld · on Nov 27, 2023

Makes me curious on how useful a tiny AI model based just on wikipedia would be on mobile phones.

Especially if it could also cite sources.

llmnonsense · on Nov 27, 2023

A (tiny) AI model can not cite sources, it can only hallucinate citations. A (tiny) AI model can not create a representation of knowledge or truths based on sources it knows. It can not provide any truths except by random chance.

You would have a hallucinating make-up-some-fake-facts machine on your phone.

simonw · on Nov 27, 2023

"A (tiny) AI model can not cite sources, it can only hallucinate citations."

I don't think that's definitely true, if you build the system around it well. It would look something like this:

1. User asks a question. LLM extracts key concepts from that question to use as search terms.

2. LLM triggers a search of Wikipedia, getting back snippets of pages, each with their page identifier.

3. LLM is fed those snippets along with the user's question and instructions to cite pages that it uses content from.

4. LLM generates a response which includes formatted citations. This response may be complete garbage, but...

5. Your code can at least confirm that the citations correspond to the pages that you fed into the LLM.

I've seen this approach work well with larger models. The open question is if it could work with smaller ones.

The 7B models (Mistral and its variants in particular) are getting VERY effective. I'm confident they could mostly-work for the above sequence... and you can just about run a 7B model on a phone.

The bigger question for me is if you could get this to work with a 3B model, since those are much more mobile-device friendly than 7B.
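The five steps above can be sketched in a few lines. This is only an illustration under loud assumptions: the three-page corpus, the keyword search, the stopword-based term extraction and the `llm` callable are all stand-ins invented here, not any real Wikipedia index or model API. The part that matters is the last function, which checks the citations in code.

```python
import re

# Toy stand-in for a local Wikipedia index: page id -> snippet (made-up data).
PAGES = {
    "Penny_battery": "A penny battery is a voltaic pile built from copper and zinc coins.",
    "Kiwix": "Kiwix is an offline reader for web content such as Wikipedia.",
    "ZIM_(file_format)": "ZIM is a file format that stores wiki content for offline use.",
}
STOPWORDS = {"what", "is", "a", "an", "the", "of", "how", "does", "for"}

def extract_terms(question):
    """Step 1 stand-in: a real system would ask the LLM for search terms."""
    return [w for w in re.findall(r"\w+", question.lower()) if w not in STOPWORDS]

def search(terms, pages=PAGES, limit=2):
    """Step 2: naive keyword search returning (page_id, snippet) pairs."""
    scored = []
    for page_id, text in pages.items():
        words = set(re.findall(r"\w+", text.lower()))
        hits = sum(1 for term in terms if term in words)
        if hits:
            scored.append((hits, page_id, text))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(page_id, text) for _, page_id, text in scored[:limit]]

def build_prompt(question, snippets):
    """Step 3: snippets plus an instruction to cite pages as [page_id]."""
    sources = "\n".join(f"[{pid}] {text}" for pid, text in snippets)
    return ("Answer using only these sources and cite them as [page_id].\n"
            f"{sources}\n\nQuestion: {question}\nAnswer:")

def verify_citations(answer, snippets):
    """Step 5: split cited ids into ones we supplied and ones we did not."""
    supplied = {pid for pid, _ in snippets}
    cited = set(re.findall(r"\[([^\[\]]+)\]", answer))
    return cited & supplied, cited - supplied

def answer_with_checked_citations(question, llm):
    """Run steps 1-5; `llm` is any callable mapping a prompt string to text."""
    snippets = search(extract_terms(question))
    answer = llm(build_prompt(question, snippets))  # step 4
    valid, unknown = verify_citations(answer, snippets)
    return answer, valid, unknown
```

Note that the check only proves a cited page was actually handed to the model. It says nothing about whether the answer reflects that page correctly, which is the "may be complete garbage" caveat in step 4.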

johnsillings · on Nov 27, 2023

agree. sounds doable and awesome.

KMnO4 · on Nov 27, 2023

Just for curiosity, why did you create a new account just for this comment?

llmnonsense · on Nov 27, 2023

I try not to engage in discussions on social media but some days I read something and duty calls.

Erratic6576 · on Nov 27, 2023

I suspect that chatGPT was shy about giving me exact quotes of Latin works for fears of disclosing its training sources and risk copyright infringement lawsuits.

KeplerBoy · on Nov 27, 2023

Meh, you could always embed the relevant section of articles into your prompt and have the model rephrase and cite that.

simonw · on Nov 27, 2023

I think this could work really well.

LLMs on mobile phones can be surprisingly capable. I've been having fun trying out this one on my iPhone: https://llm.mlc.ai/#ios

Getting that to work with a local copy of Wikipedia - where it can essentially run a "search_wikipedia(...)" function any time it needs extra information - would definitely be feasible, using this pattern: https://til.simonwillison.net/llms/python-react-pattern#user...

I think it would need to use regular text search though - you could try calculating embedding vectors for the whole of Wikipedia but even with a small embedding model I doubt you could fit those on the device, they'd take up too much space.
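The storage worry in the last paragraph is easy to put rough numbers on. In this back-of-envelope sketch every input is an assumption picked for illustration (article count, passages per article, embedding width), not a measurement of any particular model or dump.

```python
# Back-of-envelope size of an embedding index. Every number below is an
# assumption chosen for illustration, not a measurement.
ARTICLES = 6_700_000        # assumed English Wikipedia article count
CHUNKS_PER_ARTICLE = 5      # assumed average number of embedded passages
DIMS = 384                  # assumed width of a small embedding model

def embedding_store_gb(n_vectors, dims, bytes_per_value):
    """Raw vector storage in GB (10**9 bytes), ignoring any index overhead."""
    return n_vectors * dims * bytes_per_value / 1e9

def scenarios():
    """A few storage layouts under the assumptions above."""
    n_passages = ARTICLES * CHUNKS_PER_ARTICLE
    return {
        "float32, per passage": embedding_store_gb(n_passages, DIMS, 4),
        "int8, per passage": embedding_store_gb(n_passages, DIMS, 1),
        "float32, one vector per article": embedding_store_gb(ARTICLES, DIMS, 4),
    }

if __name__ == "__main__":
    for label, gb in scenarios().items():
        print(f"{label:>32}: {gb:6.1f} GB")
```

Under these assumptions a full-precision, per-passage index comes to about 51 GB, which supports the concern, while quantizing to one byte per value or embedding one vector per article brings it down to the 10-13 GB range.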

WhitneyLand · on Nov 27, 2023

The hybrid approach looks powerful, however I wonder how it would compare to taking the best ~7B size model and fine tuning it directly on Wikipedia?

Maybe impractical, but seems like it could potentially be more seamlessly integrated. Keeping in mind:

- Many models already include WP in their training, at least parts of it.

- The parts that are missing or a subset of what’s missing could be supplemented via finetuning.

- Data is commonly compressed about 100x during training (it’s lossy) so the total size might still run on a phone.

You’d then have a fully capable LLM, but enhanced to be comprehensively encyclopedic.

toomuchtodo · on Nov 27, 2023

Looking forward to Wikipedia building these models of their corpus and hosting them alongside the content. They could even offer as a mutable torrent cryptographically signed [1] (or rsync if you want to go old school).

[1] https://news.ycombinator.com/item?id=29513547

someotherperson · on Nov 27, 2023

Funny enough, I was doing this just yesterday, on the same app mentioned. I remember seeing offline Wikipedia hardware devices about a decade ago and decided it would be handy to have while on a flight. It was actually quite surprising to see how big Wikipedia has since gotten.

The subdivisions the app provides aren't enough, or at least I was using it wrong. For example, trying to download the lighter versions of the Geography section only seems to offer combinations that ultimately leave you without much usable information (e.g. Paris only providing the introduction text and not the content from the later sections) unless I go for the much larger size.

interloxia · on Nov 28, 2023

I like the wikipedia dumps with Aard 2. Excluding images, it's only about 21gb which was small enough to use years ago when I had spotty connectivity and small enough to leave lying around today.

https://aarddict.org/ (available on F-Droid)

dobladov · on Nov 27, 2023

> This app isn't available for your device because it was made for an older version of Android.

I hope they update it.

linkpuff · on Nov 27, 2023

This reminds me of Aard2, it does the same job as the one mentioned in the article, but the dumps are really small (for example, the English Wikipedia one uses around 20-something GB). No images on it, but the app loads them from the source when needed.

davidgerard · on Nov 27, 2023

the other cool thing about Kiwix is that you can download other MediaWiki sites for it, there's a whole selection: https://library.kiwix.org/

hombre_fatal · on Nov 27, 2023

That language dropdown list is insane. I can't find spanish/español. And since it's not sorted, I can't rule out that it doesn't exist without scanning the whole thing.

lxgr · on Nov 27, 2023

It's obviously right between Southern Sotho and Shqip!

seba_dos1 · on Nov 27, 2023

Made me think of the old https://en.wikipedia.org/wiki/WikiReader from 2009.

bionhoward · on Nov 27, 2023

This is great, I have difficulty with the navigation though, it seems you can get stuck in some part of the library and there’s no button to go back to the beginning.

uzurpator · on Nov 28, 2023

I don't remember the name of the educational project that made knowledge accessible. It fit on a USB drive.

baggy_trough · on Nov 27, 2023

Keep Wikipedia around on your iPad in case the world comes to an end (and have some solar panels for charging).

matthewfelgate · on Nov 27, 2023

Hmm, could you leave out the least visited articles to reduce the size massively?

OJFord · on Nov 27, 2023

Our usage may differ, but I would think I'm probably most interested in some relatively little-visited articles, because I'm there to look up something I don't know about. Whereas, say, 'United Kingdom (country)' is probably quite highly visited in total, but not that likely to be visited by me, if that makes some sort of sense.

Might be better to ignore certain topics you know you're not interested in. (That way if there's no result, you know there's either truly no result, or that you don't care anyway because you're not interested.)

Or perhaps most effective for the size goal would be just to cut images. Perhaps selectively include them depending on category - e.g. maybe you want them for science & technology categorised articles (diagrams) and geography, but can do without otherwise.

hateful · on Nov 27, 2023

From the article:

> Kiwix also allows you to download subdivisions of wikipedia — like the top 1 million articles at 43.53GB

ellisv · on Nov 27, 2023

The frequency of article visits probably resembles a Pareto distribution. So there would be many many articles with very few views, including new articles.

I’d think the good ol’ Page rank would be a better metric for deciding which ones to prune.
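As a rough illustration of ranking by links rather than by views, here is a minimal PageRank by power iteration over a toy link graph. The graph, the page names and the `keep_top` helper are made up for the example; a real run would use the actual Wikipedia link tables.

```python
# Toy link graph (made-up data): page -> pages it links to.
LINKS = {
    "Paris": ["France"],
    "France": ["Paris", "Europe"],
    "Europe": ["France"],
    "Obscure_stub": ["France"],
}

def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; returns page -> score, summing to 1."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

def keep_top(links, k):
    """Hypothetical pruning helper: the k highest-ranked pages."""
    rank = pagerank(links)
    return sorted(rank, key=lambda page: (-rank[page], page))[:k]

if __name__ == "__main__":
    print(keep_top(LINKS, 3))
```

A page nobody links to, like the stub here, ends up at the bottom regardless of how many views it gets, which is a different pruning criterion from raw visit counts.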

hombre_fatal · on Nov 27, 2023

Kiwix lets you download a pack of the top 100,000 articles on your phone which is more than enough for offline browsing.

iandanforth · on Nov 27, 2023

Can this be done for a Kindle?

isuckatcoding · on Nov 27, 2023

Total noob question: Would an LLM approach reduce the total required storage?

_kuvn · on Nov 27, 2023

I'm not quite sure what you mean here - English Wikipedia is ~60GB, and there are definitely models trained on Wikipedia that are less than 60GB (eg: Llama 7B), but that's a lossy store of information - you won't get perfect recital of a Wikipedia page from an LLM

Obscurity4340 · on Nov 27, 2023

Kiwix

Mouthfeel · on Nov 27, 2023

Ice hockey articles are ~.5% of wikipedia?

gwbas1c · on Nov 27, 2023

Yeah, I find that odd too. I wonder if there's some kind of algorithm that downloads what the hockey articles link to, and thus a lot of what's called "Hockey" could be things like the articles about the cities that the teams are in, articles about the physics of water freezing, etc.

DitheringIdiot · on Nov 27, 2023

My guess is that set has higher resolution images, just because it can.