More

mkaufmann · on Feb 2, 2015

Yes because the recovery menu won't be accessible than (which is needed to replace the sticky key executable)

The settings can be changed with bcdedit:

    bcdedit /set {default} recoveryenabled No 

    bcdedit /set {default} bootstatuspolicy ignoreallfailures

Additionally booting from USB/... should be disabled in the BIOS/UEFI options and also access to that should be password secured.

Further more because the person has physical access, the computer should be locked away so that the harddrive can't be accessed. Also all cables should be secured so that no sniffer can be plugged in between. This also especially includes the USB ports on the monitor if those are enabled.

arca_vorago · on Feb 2, 2015

Note: Make sure to run bcdedit commands with elevated privileges. I am currently working on adding this into my ansible test AD replacement.

mkaufmann · on Jan 26, 2015

Driving on a german highway with autonomous cars has already been done as already as 1996 by Mercedes[1]. But same as with the google cars they used very special hardware which would be too expensive for production use. The current challenges are to make autonomous driving happening with fewer and cheaper sensors.

Also having a real highway allows testing scenarios like partially blocked street due to construction, testing at night, with rain, with snow, etc. Google's cars are currently not working under such conditions so there are still many problems to solve.

[1] http://wwwlehre.dhbw-stuttgart.de/~reichard/content/person/d... (Sorry no english news)

mkaufmann · on Dec 18, 2014

While I like the construction of the column store and the corresponding API. The claims of the author don't really make sense:

> "... columnarization, a technique from the database community for laying out structured records in a format that is more convenient for serialization than the records themselves."

Column stores in comparison to row stores don't offer any serialization benefit per se. The main benefits are the following, I will be using a record (A,B,C,D,E) as example with all types u32 (4 bytes):

* If you only use some fields you have to load less data from memory/disk into the CPU cache and your working set is more probable to fit into cache. For example when filtering only the records where A=22 and B=45 you only have to actually load x(sizeof(A)+sizeof(B)) = x8 bytes instead of xrecord_size=x20. This can make a very significant difference. * When using compression to reduce the size of data, columns can often be compressed better because they only contain data of the same type and nature and thus probably share similarities. When using such a small record consisting only of integers it probably won't make a difference. But if e.g. some fields are country abbreviations, textual description or others are ids, one could easily imagine that there are gains.

Coming back to the point about serialization, using the same technique as described in the blog post, there won't[1] be a performance difference between column storage and row storage (e.g. using a struct). The method described in the blog post just lets the data array of the original vector be wrapped by a Vec<u8> without even moving the memory, so the method is independent of the data type that is stored in the vectors. Of course it will only work for data types that do not contains references, otherwise we could get illegal memory access after deserialization (which should be guaranteed by the rust type system because only Copy types are allowed).

The only thing this benchmark is testing is how fast a vector can be initialized.

[1] There can be an space improvement of keeping the data in a column layout compared to row layout when using normal structs. Normal structs normally align the total size to the size of the largest field in the struct. A struct containing i64 and i8 would contain 7 bytes of padding. In a column layout this overhead would be avoided. Still there would not be an improvement in this serialization scheme as it does not actually copy any data.

frankmcsherry · on Dec 19, 2014

Hi, author here.

Unless I misunderstand your post, I think you may have missed important parts!

More is going on than "just wrapping the data of the original vector as a Vec<u8>". That is what happens when we are handed a Vec<uint>, but for other inner types (pairs, vectors, etc) there is more to do (and, actual code presented showing what that work is).

1. A Vec<(u8, u64)> definitely ends up as a (Vec<u8>, Vec<u64>), thereby avoiding padding you'd have if you just wrote out the elements as structs. You absolutely end up saving space, and part of doing this is definitely copying data. (responding to "as it does not actually copy any data").

2. The types themselves can be vectors, corresponding to a struct with owned pointers (whose elements can also own pointers, etc). This is not something that just casting the source array will deal with. It's important here that Rust hands you ownership, as otherwise it would be totally inappropriate for us to claim the underlying memory (which the code does). That part, recycling owned memory, is one of the big performance wins (about 2.5x faster for me than invoking the allocator each time I need to mint a new array).

3. "Copy" doesn't appear in the first post. The types don't have to be Copy, and indeed Vec<T> is not Copy, even if T is. Not sure where you got that from. But, no such requirement; yay!

Read part 2 for more about Copy, and how when your type is Copy you get a free implementation that does what you suggest (keeping each struct intact), except you can mix and match with Vecs and Options and stuff like that.

Hope this clears up some of the not-sense-making. Feel free to holler with other questions.

Cheers, Frank

mkaufmann · on Dec 19, 2014

Addressing your points:

1. In my comment I acknowledge that there are space savings with types that include padding. I don't understand why you imply in your answer that I didn't understand this point. Regarding my comment about not copying data: it was based on the benchmark[1] as linked in the blog post. The parts you measure do not copy any of the user data! It only converts your internal vectors of user types to a list of vectors of u8 type. So what you are doing is essentially moving the data. But when using move semantics the size of the user data does not matter any more, so there won't be a difference between your column layout and using the more complex types directly inside the vectors if measured the same way as in this benchmark. In my opinion your benchmark is flawed and does not support the argument you make in your blog post.

2. Ok yes that was bad wording on my side. If you provide adaptors for complex types with pointers like Vec than you can of course also serialize those.

3. I guess this is just about the same argument as point 2) and thus redundant.

I made the effort for you to write a serialization framework which does not do "columnarization" but which simulates row layout and also added an adaptor for vec and ran it with the same benchmark, and also reran the original columnar benchmark on my machine. Both benchmarks were compiled exactly the same way. You can find my code here[2]. Here are the results:

==============================

columnarization <uint> 12.1 GB/s, 742k values/s

columnarization <(uint, (uint, uint))> 5.3 GB/s, 107k values/s

columnarization <vec<uint>> 2.1 GB/s, 54k values/s

columnarization <Option<uint>> 1.58 GB/s, 163k values/s

==============================

row (simple_serialize) <uint> 12.1 GB/s, 730k values/s

row (simple_serialize) <(uint, (uint, uint))> 8.83 GB/s, 164 values/s

row (simple_serialize) <vec<uint>> 2.13 GB/s, 56 values/s

row (simple_serialize) <Option<uint>> 8.82 GB/s, 263k values/s

==============================

You see that columnization does not have a performance benefit in your benchmark and it even is significantly slower for Option<uint> type and pairs.

In your blog post you never mentioned that columnization will only has the potential to bring performance benefits to de-/serialization when using types with large padding overheads. I think this discussion would probably helped the blog post. It would be much better if it would either omit the currently wrong performance argument and just focus on the nicely typed API or if it would use a proper benchmark which would support your argument, which from my understanding would be only possible in a very limited set of use case scenarios.

Here is a list of other valid arguments you could have made instead:

* Format saves space at the cost of performance.

* Better than repr(packed) as it will also work on platforms that don't support unaligned access

[1] https://github.com/frankmcsherry/columnar/blob/master/exampl... [2] https://github.com/mkaufmann/simple_serialize

frankmcsherry · on Dec 19, 2014

With all due respect, your code seems to be a copy/paste of my code, where you've removed the Pair and Option implementation (allowing the alternate default I discussed in my second blog post) and replaced my string `ColumnarVec` with your string `SimpleSerialize`. So, thank you for measuring my code for me? :D

The reason the new numbers look better for Option with the Copy implementation is that you are now writing 16 bytes for each Option. They are a 50-50 mix of Some(0u) and None, which I wrote out in 5 bytes on average (always a 1 byte "present/not", and then 4 bytes of data on average). The Copy implementation is just writing 3x as much data and padding the throughput numbers, rather than reporting something more like goodput.

I sense that this isn't the right place to shake this out, so I'll stop posting. If there is a better way to follow up, let me know.

[edit: I've got you through github, thanks! I'll buy beers when I swing by TU next]

mkaufmann · on Dec 19, 2014

The changes make all the difference. Your argument was that columnization and thus storing each native type in a data record in seperate columns brings a performance benefit. By removing the specialisation of Pair and Option which save the data in seperate columns I switched back to a classical storage which basically stores the data of one record at one place like a row store. So your code simulates a column store and my a row store.

Using your original benchmark I than show that the row layout brings a large performance win in the benchmark. My numbers show this performance win not just in throughput (bytes per sec.) but als in "goodput" (values per sec.). Check my previous comment. I just noticed though that I sometimes forgot the k in the reported numbers, where it is missing you have ti multiply the number times 1000, I can't edit it anymore.

I guess we can agree to disagree and should continue the discussion in another form ;)

PS: I just updated my github information, I am now at the TU Munich

mkaufmann · on Nov 26, 2014

Just for fun I tried to estimate the performance characteristics of this service.

My initial assumption is that the 2.2M pages per ~18h are the main workload. This is also supported by the chart at the bottom, outside of the 18h timespan there is hardly any baseload. The blog additionally gives the following facts: 18 c1.medium instances and ~60% utilization after the optimization (taken from the chart).

Now this allows us to calculate the time per page. First the time for the total workload per day is num_machines(cpu_time_per_machine)=18machines(18h*0.6)=194h of processing per day.

On page level this is than 194h/2.2M=317ms per page.

This feels really slow, and should even be multiplied by two to get the time per cpu core (the machines have two cpu cores)! I would guess that the underlying architecture is probably either node.js or ruby. Based on these performance characteristics the minimum cost for this kind of analysis per day is $25. For customers this means that on average the value per 1k analyzed pages should be at least $1.13. I think this is only possible with very selective and targeted scraping, given that this only includes extracting raw text/fragments from the webpages and does not include further processing.

mkaufmann · on Nov 15, 2014

Concerning your critique of the in your opinion overly hard scientific review process. According to her blog the paper was rejected because: "Reviewers at conferences [...] have been very skeptical, and pressed us multiple times to produce some kind of comparison of basic block versioning against tracing."[1] I think this is a very valid concern. When you describe a new scientific approach (Basic Block Versioning) you should compare to the state of the art. Otherwise it is very hard for the reader to judge the merit of the new approach. However I agree with your sentiment that a new approach should not only be judged by performance numbers (there is actually a nice article going deeper on this topic on databasearchitects[2]). But there should at least be a more thorough theoretical discussion than only 3-4 sentences. Benchmarks comparing to existing approaches in this case can help to show pathological cases which might indicate weaknesses of the approach or to empirically demonstrate the feasibility of the new approach.

[1] http://pointersgonewild.wordpress.com/2014/11/14/the-fastest...

[2] http://databasearchitects.blogspot.de/2014/09/experiments-hu...

EDIT: Many conferences also allow publishing papers without a deep comparison to exisiting research in the industrial session. This allows demonstrations of interesting implementation variants or system choices

EDIT 2: The review critisim in older blog posts "Conference reviewers criticized us for not discussing compilation times, and raised the issue that perhaps basic block versioning could drastically increase compilation times." is also very valid for a jit compiler. Again discussion does not have to mean that you have to be faster than all existing systems. Paper acceptance is always a little bit a random process, but at least in this case the review comments are valid from my point of view and her phd advisor should probably have detected these problems in proofreading before submitting. I really hope that the paper will finally be accepted!

mkaufmann · on Nov 4, 2014

Impressive acceleration time! The new record for accelerating from 0-100 km/h of only 1.785 seconds was just set a few days ago[1]. The car required less than 30 meters to accelerate to this speed. Of course it was only a prototype car but still impressive to see the limits.

[1]https://www.ethz.ch/en/news-and-events/eth-news/news/2014/11...

masklinn · on Nov 4, 2014

The current non-prototype (/production car) record holder is the 918 Spyder with 2.2s (and a slightly higher base MSRP of $845000)

coldpie · on Nov 4, 2014

Although they're already sold out, so you'll have to pick one up on the used market. ;)

giarc · on Nov 4, 2014

And that used price will most definitely be increasing as a 918 recently burnt up.

http://ca.autoblog.com/2014/09/29/porsche-918-spdyer-update-...

m3Lith · on Nov 4, 2014

And, for reference, top fuel dragsters can do 0-100 km/h in like ~0.3s.

masklinn · on Nov 4, 2014

Hence the emphasis on production cars, once you get into special racing or one-offs it's meaningless: you can just strap a soon-to-be-dismantled body on a rocket sled and launch it at 50+G.

mkaufmann · on Feb 4, 2014

On a related note there is a recently published paper exploring how to load CSV data into a database efficiently[1].

The authors get a loading throughput (including index creation!) of over 1.5 GB/S with multithreading. They don't seem to do any dirty tricks when loading the data as they immediately can run hundreds of thousands of queries per second on the data.

There also seem to be more interesting papers on the research project page http://www.hyper-db.de/index.html

[1] "Instant Loading for Main Memory Databases" http://www.vldb.org/pvldb/vol6/p1702-muehlbauer.pdf

mkaufmann · on Jan 30, 2014

Great article, I am really eager to see the solutions in github. I guess there won't be many with c++ solutions.

I only found the challenge two days before the end. My goal with the challenge was not to get the best solution, but rather use it as an exercise to practice the different languages and finding intuitive solutions.

Level 0 as you wrote was really just a two line change.

Level 1 was fun, because I tried to work with the existing bash structure as far as possible and just replaced the hot loop with inline python (something new learned). I found it interesting that this way I could directly inject the variables into the python program without having to load them as env variables (see [1]).

In Level 2 I did too much work, I did not think simple enough. I assumed that the different test cases would differ a lot and that a static divider would not help. My assumption was that the histogram of requests per second across the different IPs would have a bimodal distribution[2], thus I used the excellent "fast-stats" library to get a histogram and ban clients based on that. The library even offers approximate histograms, so even if there would be thousands of requests it would still scale.

To solve Level 3 with the given code required to changes:

-(i) Partitioning the data across the three server. I only loaded a subset of the files based on the server id (which was conveniently already available in the search servers) and

-(ii) Loading the files in memory so that they don't have to be read from disk for every search term. This was enough to pass this level. The code was too long for the gist but I can put it on github if someone wants to take a look at it. No fancy index structure required. I just scanned The files linearly on each request.

Mastering Level 4 was not possible for me, I tried working with the go-raft library and got the integration with unix sockets working, but somehow the system got unstable after a while. Fixing this was really frustrating and in the end I gave up because I could not find the problem and time was running out.

I congratulate all ~200 persons which also passed this last level. This was much more difficult than the other ones!

[1] https://gist.github.com/mkaufmann/8716922 [2] http://en.wikipedia.org/wiki/Bimodal_distribution

recentdarkness · on Jan 30, 2014

well to level3 I can tell you, I just used grep embedded in a nodejs server application :D

Everything boiled down to this: "cd " + data_path + "; grep -rno " + key + " | cut -d: -f1,2 | sort | uniq"

I read the result from stdout and transformed the result into json that was enough to make it. I did not want at that time to spend more time on that level ;)

mkaufmann · on Jan 30, 2014

Damn ok, so I also missed that opportunity. Great find!

recentdarkness · on Jan 30, 2014

I forgot to add: I cached every result for every key which was searched for, because there was quite often that it asked 3-5 times for the same term in a row. So I had an instant result in a simple javascript object :D

fragmede · on Jan 30, 2014

Mind posting your code for level3? I got that partitioning was the main trick but I am too unfamiliar with scala to reassemble the responses properly.

ajtulloch · on Jan 30, 2014

FWIW, It was possible to pass Level 3 with ~5 lines of code changed - https://gist.github.com/ajtulloch/e05f75aaa6ba3b5b0241#file-... for my solution.

brown9-2 · on Jan 30, 2014

Here is how I merged the responses: https://gist.github.com/mattnworb/8717911

necubi · on Jan 31, 2014

Or a bit cleaner using lift-json: https://gist.github.com/mwylde/8724700.

dkhenry · on Jan 30, 2014

I did both all my solutions in C++. I am going to clean them up a little and post them in my github account

github.com/dkhenry

mkaufmann · on Jan 27, 2013

See my comment here [1] why this is an pretty weak argument-

[1] http://news.ycombinator.com/item?id=5125902

tkahn6 · on Jan 27, 2013

Thanks, that's informative. I agree that it weakens my argument

mkaufmann · on Jan 27, 2013

I am not a social media expert, but two points why comparing trending topics in France to the US is very difficult (without any statement on which country is more racist):

1. Twitter is much more mainstream in the US than in France and other European countries. One source I found shows the active users by country [1]. Using those numbers and assuming 320 million population for the US and 65 mil. for France one can see that 7.15% of US americans are active per month while only 3.3% of French are active per month. My interpretation is that the twitter users in France are much less representative than in the US. Also it is much easier for a very small but active community to break into the trending charts. Another hypothesis which I sadly could not find any numbers for is that the average twitter user in the US is posting MUCH more than the average french. Thus it is much more harder for fringe opinions to rise to the top trending.

2. Trending hashtags don't have to mean people have to endorse it. It could be a critical retweet or other kind of dismissal of racism. So just the fact that a racist hashtag is in the trending topic could also be interpreted as that people in france give a much larger response to racists tweets and try to express their dismay and thus circumstantially also push that hash tag. (I don't want to say that this actually happened)

But just these two points alone show why trending hashtags are a very weak indicator.

[1] http://www.mediabistro.com/alltwitter/twitter-countries-acti...