More

MoSal · on May 6, 2017

I wrote cargo-esr[1], an alternative tool for searching crates, with the purpose of narrowing down good choices.

Feedback welcome.

[1] https://github.com/rust-alt/cargo-esr

MoSal · on Jan 11, 2017

What's syntactically obscure about Rust?

I only knew(and liked) C reasonably well before Rust. And nothing felt obscure when I started learning it.

I can only remember not getting what `||` meant (in the context of defining a closure without args). The positional meaning of `||` and `&&` is the only thing, I can recall right now, that can be considered obscure syntactically (for C developers at least) . They should have gone with literal `and`/`or` for the operators IMHO.

Thiez · on Jan 11, 2017

I suppose it can get a little silly when you mix closures without arguments and the logical 'or' operator', e.g.:

    // prints 'true'
    println!("{}", true||(||true)()||(||true)());

Of course you wouldn't actually do this unless you're being intentionally obscure. Another favorite of mine:

    // prints '()'
    println!("{:?}", (||||||||||())()()()()());

But I can't recall ever running into such silly code, in practice the closure syntax and logical or do not lead to confusion (imho, ymmv).

MoSal · on Jan 11, 2017

> in practice the closure syntax and logical or do not lead to confusion (imho, ymmv).

That's true. But put your self in the mind of a C developer looking at Rust code for the first time:

     if a || b {
       println!("True");
     }

Cool.

Then:

     thread::spawn(|| println!("Hello from a thread!"););

What? What is the logical or doing there?

----

IIRC, there are also cases where you have to write `& &` instead of `&&` to not confuse the compiler. That's a design/practical issue.

Both those issues would have been avoided if literal `and`/`or` were used.

I find it interesting how the only thing that momentarily confused me, as a C developer, about Rust syntax, was caused by Rust authors not wanting to syntactically deviate too much from C.

MoSal · on June 7, 2016

Is this a problem where multiple connections wouldn't help?

I can add an option to saldl[1] to use a new connection with each chunk. But I'm not sure there are real world examples where this would help.

[1] https://github.com/saldl/saldl

MoSal · on May 27, 2016

Interesting.

Why is this interpreted as a relative path?

     cd /etc
     curl file://resolv.conf

throwanem · on May 27, 2016

Well, it's ill-formed, so what the UA does with it isn't really specified; cURL presumably treats it as a path relative to . because that's what the developer(s) decided that cURL should do. (For comparison, Firefox transforms it into file:///resolv.conf, then tries to find /resolv.conf on localhost and fails.)

MoSal · on May 6, 2016

Christian Grothoff is an excellent academic. In fact, he is one of the most knowledgeable in the field worldwide.

Unfortunately, that's why ,IMHO, GNUnet didn't succeed. To build a successful product/network, you need to be practical, and you need to make useful features/services available as early as possible (without compromising security of course). Designing with pluggability and forward-compatibility in mind helps in this regard.

Academic perfectionism, however, can delay your product/network launch indefinitely. And that's what seems to have happened with GNUnet.

MoSal · on May 4, 2016

Are there any plans to support archiving Web 2.0 pages?

More and more people are starting to rely on "archive.is" as it handles Web 2.0 content without issue. But I'm concerned about the survivability of that survice, and whether it can handle big growth.

MoSal · on April 30, 2016

If all you intend to provide in that regard is "Better keyboard shortcut support". Then no, Vimperator functionality will not be supported.

MoSal · on April 4, 2016

I'm deeply concerned about popularizing torrent streaming.

A few leechers breaking "rarest-first" might not cause much harm. But if most leechers become streamers, torrents will lose their efficiency in distributing less-popular content.

Transmission implemented a "streaming" feature once, that didn't actually stream. It just stopped fetching pieces "with the same rarity" randomly. They still got too much heat for adding that not very usefull feature. And they reverted the commits soon after.

KMag · on April 4, 2016

Former LimeWire engineer here. I implemented randomized chunk selection for swarmed downloads in LimeWire.

I think the proper swarmed streaming solution is to make the percentage chance of requesting the rarest chunk be a smooth function of the number of replicas of the rarest chunk. If the rarest chunk has only one source, the probability should be 1.0. If there are multiple chunks that are all the most rare, you probably want to randomly select which chunk to request next, with an approximately exponential distribution rather than uniform distribution. Y probability of selecting the first chunk, Y^2 for the second, Y^3 for the third, etc. You'd want to run some simulations to fine-tune the probability function and also the Y percentage.

What I did for LimeWire was (1) if the MIME type wasn't on a streamable whitelist, download all chunks in randomized order. (2) if the file was a streming type and less than 10% complete, start downloading the available chunk closest to the front of the file that isn't currently in progress (3) if the file is a streamable type and 10% to N% complete, randomly select either in-order or randomized selection with probability X. (4) beyond N% complete, always use random chunk selection. I'm pretty sure N was 50 and nearly certain X was 0.5. I originally proposed making X a smooth function of the % downloaded instead of 0, 0.5. 1.0 stair-steps, but the lead developer strongly preferred stair-steps.

The random selection algorithm actually tried to keep the number of ranges of bytes (extents) below 5. So even after 50% downloaded, you still have a 25% chance of getting in-order downloading.

The reason I used randomization instead of rarest-first was that it was my first change to LimeWire, and this was the least invasive change to make. At that time, LimeWire had a global list of verified downloaded chunks and a global list of in-progress chunks, but no global counter for number of replicas.

KMag · on April 4, 2016

Oh, and if the user was idle more than something like 5, 15, or 30 minutes, LimeWire would switch to random chunk requests regardless of MIME type, assuming the user didn't need a streaming download. I hope full-screen media players prevented the user from being counted as idle.

I vaguely seem to remember WMV and ASF also needing some information from footers in the file, and therefor also prioritizing the last MB of the file.

This was all implemented using the Strategy object oriented design pattern, to make it easier to play around with many alternatives and make specialized strategies for specific MIME types.

misingnoglic · on April 4, 2016

> Former LimeWire engineer here

Wow that's an entity I haven't heard in a long time. It would be super interesting to hear more about the engineering behind the software, do you have any blogposts anywhere?

KMag · on April 5, 2016

No blog entries specifically about LimeWire, but I do have a few observations that maybe I'll blog about:

    (1) Merkle trees are tough to get right
      (a) Bittorrent's BEP 30 is vulnerable
      (b) A small tweak would have allowed Gnutella's THEX to carry a proof of file length [0]
      (c) Use the Sakura tree construction [0]
      (d) There was an attack against LW where one could respond quickly with a bogus THEX root for a popular SHA-1
      (e) The THEX root should have been the unique identifier in both DHT and query responses
    (2) Using HTTP for data transfer was definitely the right choice
      (a) It uses X-alts and X-nalts "experimental" HTTP headers for swarm control
      (b) I prototyped an Apache plugin to allow it to transparently participate in Gnutella swarms
      (c) HTTP/2.0 would be ideal now
    (3) Gnutella uses query broadcast
      (a) exponential fan-out means most traffic is in the last hop
      (b) if the fanout is 19:1, 95% of traffic is the last hop
      (c) LW used Bloom Filters to often skip the last hop
      (d) We should have used mulitple hash functions in the Bloom filter
      (e) Adding new hash functions is backward-compatible, at the cost of increased query traffic during transition
    (4) LW connection handshake includes the 32-bit serial number of the latest XML version message
      (a) The message is signed using DSA
      (b) Newly signed XML messages propagate to 95% of the network within 60 seconds
      (c) We accidentally DDoSed our servers by having everyone come for updates at the same time
      (d) So we added user alert time randomization parameters in the XML message
      (e) There was no mechanism to roll over or expand version message serial numbers.
      (f) We could have locked ourselves out of asking users to upgrade by signing an INT_MAX serial XML message.
    (4) We wrote a minimal C++ agent capable of downloading the latest free LW version from LW nodes
      (a) SHA-1 of the free installer is part of the signed XML version message above
      (b) SHA-1 was checked before running the full installer, preventing malware injection
      (c) It was great for saving bandwidth and reducing legacy support
    (5) I misplaced a paren in LimeWire QueryKey crypto code (later fixed)
      (a) QueryKeys prevent turning the LW network into a DDoS botnet
      (b) I knew the code wasn't behaving quite right
      (c) I convinced myself that my reasoning was wrong and the code must be right
    (6) Random seeks are tough on equipment
      (a) Apache would kernel-panic OSX on random HTTP range requests (ca 2006)
      (b) Anecdotally, random block download order wasn't great for hard drive life
      (c) Random download order code tried to minimize number of file extents
         (i) Saves bandwidth in describing what you have
         (ii) Might be better for hard drive life

[0] http://kmagsoftware.blogspot.hk/2016/02/on-content-addressed...

catwell · on April 4, 2016

This could be true, however I think that if all this torrent streaming sticks to same behavior such as that of XBMC torrent [1] (i.e. streaming for the whole duration of the movie) the swarm will end up stronger, not weaker.

[1] https://github.com/steeve/xbmctorrent/blob/e6bcb1037668959e1...

MoSal · on April 4, 2016

Suppose there is only one seeder and 2-3 leechers. All leechers managed to get 50+%. Now, if the seeder disappears, the leechers should be able to finish the download from each other. If all initial leechers are streamers, that download will never finish for them or anyone else.

Supposedly long seed intervals will not help the swarm in that case.

RyanZAG · on April 4, 2016

Streaming isn't going to work there anyway.

In reality, there are 100+ seeders and if you had a constant supply of streamers who spend half their time seeding (assuming they download the whole movie by the 50% mark), then you have a very very healthy swarm.

deepnet · on April 4, 2016

Unfortunately in many juristictions seeding is copyright infringement and leeching is not.

Seeding by default sadly gets a lot of newbs into trouble - especially as publicising IP addresses are part of the protocol.

Porn blackmail companies and MPAA agents know that seeders are low hanging fruit.

Similarly Limewire and ilk using the downloads folder as a default share folder is useful for the health of the network but this has led many to be unwitting uploaders - which is what they got done for.

Jammie Thomas is a case in point, newb music fan (or her kids I recall) but the sharing by default is what she was convicted of - for $30,000,000.

No one has ever been convicted of downloading alone - they don't bother trying.

So seeding by default can be very cruel - sadly.

Even with that caveat, those like Jammie, brought up on Sesame Street were taught to share and don't know how severe tne penalties can be.

That sharing is or can be wrong is now taught at a nursery level.

claudius · on April 4, 2016

> Unfortunately in many juristictions seeding is copyright infringement and leeching is not.

At least in Germany, both leeching and seeding are copyright infringements as soon as you upload any data back into the swarm. Since leeching also does this (though not exclusively), it is also copyright infringement. Pure downloading is not, though, which is why streaming websites (just downloading, no uploads) are fairly popular here.

deepnet · on April 4, 2016

Yes by leech only I meant downloading alone.

Well clarified thanks.

amelius · on April 4, 2016

This is why bittorrent should have included passwords. If everybody uses the same password (e.g., "cyberpunk"), then everybody could download each other's files, while the liability would lie purely with the leecher (because he broke into someone else's machine by guessing the password).

loup-vaillant · on April 4, 2016

> That sharing is or can be wrong is now taught at a nursery level.

Really? This is yet another step closer to "The Right to Read". Such a waste. We should end scarcity, not create it artificially.

maldusiecle · on April 4, 2016

Where are you getting 30 million from? Her initial fine was $220k, a later appeal upped it to 1.2 mil, then back to $54k, and the final ruling was $220k. (Source: wikipedia.)

Which is still a crazy result, but it helps no one to make up numbers that are several orders of magnitude higher than the real ones.

deepnet · on April 4, 2016

Yes of course, thanks for the correction - I'd edit but the hour is up - going back over the case, enraging injustice.

The sum you mention was the 1st amount, on appeal this was raised to $1,920,000, then lowered to $54,000, appealed again to $1,500,000, then appealed to $54,000, appealed finally in 2013 to $220,000.

She denied ever using Kazaa and no such files were recovered from her hard drive.

Conviction was on IP address alone from MediaSentry.

* https://en.m.wikipedia.org/wiki/Capitol_Records,_Inc._v._Tho...

Over six years in court, mess your life up some, threats began in 2005, so 8 years of harrasment over 24 songs.

IMHO the many years she was dragged through the courts already more than sufficient.

Flippin bonkers.

branchless · on April 4, 2016

A regular person hit for 30 mm. USA is insane.

_clhx · on April 4, 2016

> They still got too much heat for adding that not very usefull feature.

I don't get this. It is definitely useful.

I'm not saying don't upload: the client still seeds. It just downloads the contents in order.

MoSal · on April 4, 2016

It wasn't very useful because it didn't actually force streaming. The "rarest-first" rule was still respected.

It wasn't optimal for the swarm. Yet, the user didn't really get what he/she wanted.

amadvance · on April 4, 2016

Can the seeders enforce the rarest-first approach giving priority at lechers that ask for the rarest chunks ?

MoSal · on March 3, 2016

The URL does not work right now. But I tried another one from the same site.

No client can get this right, always. aria2c is not more reliable. It's just choosing to take the filename from the redirect URL. It appears to be the right thing to do in this case. But it would fail if the start URL was actually the one that had the right filename.

Hosts can use the Content-Disposition header if they want to make sure all (capable) clients get the right filename.

In saldl, I implemented `--filename-from-redirect` to handle your use-case. But It's not set by default.

notfoss · on March 4, 2016

Thanks for the explanation. But generally I have found aria2 to be more reliable in such scenarios.

MoSal · on March 3, 2016

Try saldl[1]. It depends on libcurl. So protocol support should be good and reliable.

[1] https://github.com/saldl/saldl

contingencies · on March 3, 2016

https://github.com/saldl/saldl/wiki/saldl_vs._aria2

MoSal · on March 3, 2016

FWIW, this page is incomplete and outdated.

For example, `--mirror-url` was implemented. So, it is now possible to download from two sources concurrently.