The effect in the data (unless it is just random fluctuation; at the moment I'd go with the assumption that it is a genuine correlation, since the effect is present across all subgroups) might (1) not be monocausal, i.e. a combination of contributing factors like development experience, age of IDEs/tooling, etc. might play a role alongside other aspects, and (2) involve a cause that is not "explained away" just because a control variable was considered. Let me elaborate.

The years-of-experience variable might be taken to explain away the effect of experience, leading to the conclusion that tabs vs. spaces must be due to some effect other than years of experience. But chances are that "years of experience" and "tabs vs. spaces" are both correlated with a common causal property (like "programming proficiency", or whatever you want to call it). Both are then just incomplete reflections of the underlying cause, each capturing its effect only partially.
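
A toy linear model makes the mechanism concrete (purely illustrative; the coefficients a, b > 0 and the hidden "proficiency" variable P are assumptions for the sake of the example):

    years  = a*P + e1
    spaces = b*P + e2
    Cov(years, spaces) = a*b*Var(P) > 0

With noise terms e1 and e2 independent of each other and of P, the two observed variables correlate even though neither causes the other. And because "years" measures P only noisily, conditioning on it removes the influence of P only partially.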

What I am trying to say is: it's complicated, and you probably won't be able to find the one true cause for the effect in the data. If this were physics, one could come up with a predictive theory and put it to the test. In social studies, we just cannot control the parameters well enough.

If you are interested in reading more on this, "Causality" by Judea Pearl is a good (but exhausting) read.


The best discussions are usually marked off-topic, "no good fit", or "not of use for future readers".


And that's OK, because discussions very often drift in different directions, and before you know it you are talking about something completely different. If you stick to the topic and have something really important to add, don't worry, moderators won't get in the way.


Yeah, that's my biggest beef with SO currently, and it's what I thought the OP would say when I asked him.


As a European, I was constantly confused by directions in Canada and the US. America uses street names and cardinal directions ("turn north on I-???, then west on ..."), whereas Europeans think in terms of sequences of towns (to get to Munich I must drive on the Autobahn via Stuttgart, Ulm, Augsburg).

I once travelled from Toronto to Chicago by car and decided to write down my own directions from Google Maps, because I felt the ones provided were useless. Boy, was I lost when I didn't see a road sign for Windsor/Detroit.


Here's a thought: the difference comes from how New World cities were artificially laid out as rectangular grids, whereas Old World cities grew outward in concentric circles.

Go on Google Maps and look at a city like Montreal and you'll see that the streets are very rectangular.

Look at a city like Paris and you'll see a spiderweb instead of rectangles.


In the UK, Google Maps gets road numbers wrong. A4042 should be "ay four o four two", but Maps says the less efficient "ay four thousand and forty-two" (which is also wrong: it's a code, not a number, like calling 0b20 "twenty").

It throws me much more than it should.


0h20! lol ... there are 10 types of people in the world ...


That's funny.


I wonder how different or similar these Romance languages are, compared to the reference frame I have: German dialects. German dialects can be mutually unintelligible. Young Germans typically know Standard German and thus have a "common ground" for communication; they also usually speak a form of their dialect that is considerably closer to the standard "high" language of newspapers and television than what their grandparents or great-grandparents speak or spoke. Sometimes (typically in documentaries), dialect speakers are even subtitled.

So yeah, I wonder whether the classification into languages and dialects differs depending on the context.


Well, there's a notorious adage: "a language is a dialect with an army and a navy".

I think academics shy away from attempting to make the distinction except when extremely obvious, and instead talk directly about quantitative measurements and feature overlaps (isogloss is a search term that may be useful here). Dialect/language lines will often have completely different shapes when you look at different distinctions in lexicon, phonetics, syntax, etc. If I had to generalize though, in a particular language "chain", linguists seem to identify an order of magnitude more separable languages than non-academics do. (Consider the cases of huge macrolanguages like "Chinese" & "Arabic", or even "Italian", whose singular labels by laypeople are pretty universally rejected.)

It doesn't help that people are generally unaware of the incredible political pressure most nations put on presenting a singular linguistic front, when the truth is much much more muddled. As a result, the common parlance distinction between dialect & language often verges on meaningless.


My experience is that it's possible for people to understand each other, but the farther you go, the more difficult it gets. It's mostly the accent and the word endings that change.

My family is from Aveyron/Tarn (near Albi). We can understand texts by Frédéric Mistral, written in Provençal (near Marseille), even though it sounds weird. My uncle says he had some success speaking Occitan in the Italian Piedmont. However, neither my parents nor my uncle understand any of the Catalan spoken in Barcelona. (I do, but I'm fluent in Spanish and not in Occitan...)

I think my parents (born in the 1950s) are the last generation fluent in Occitan. In France, even though it's now being taught as a second language, it's essentially gone. My mom told me she used to be punished for using Occitan at school, whether in the classroom or during recess. I remember when I was a child, the farmers used to speak it among themselves (or more likely to their elders). The same people today only really speak French, even among themselves.


Italian dialects can be mutually unintelligible too. TV and internal migration consolidated standard Italian to the point that local dialects are basically dead in some areas (for example Milan), but there are people in smaller cities who are actively bilingual in their dialect and Italian.

My father remembers that they could tell the town of origin of somebody by little variations of accent and vocabulary, over distances of less than 10 km in a well-populated and well-connected area centered around Milan.


> local dialects are basically dead in some areas (for example Milan)

Uela, you have to consider that the Milanese dialect basically overlapped with modern Italian already, as standardized on the works of Alessandro Manzoni, a writer from Milan. The accents still survive though, and even a few words.

It's incredibly funny to observe language in motion. At one point in the '90s, a few rappers living in the city I come from (Bologna) popularized a bunch of local slang in their songs. Nowadays, youngsters from Milan use that slang natively and strongly believe it originated there.


Do they use it correctly? For example, the Roman "sti c...i" is very often used here with the opposite of its original meaning, that is, to express surprise, probably by guessing.

And by the way, "bagaglio" always surprises people here.

For the non-Italians: among other things, the Milan accent basically swaps the open and closed "e" sounds. I have to change the way I say "spaghetti" when I'm outside the region :-)


> My father remembers that they could tell the town of origin of somebody by little variations of accent and vocabulary.

I have seen this in Ireland, where upon meeting someone new, one Irish person would guess that the other came from a particular small village (of a few hundred people).


This is somewhat true, but many of the accents are not as strong anymore.

The Irish language itself is interesting. Before standardisation, I'm told the Irish of the north of the island was more similar to Scots Gaelic than that of the south. Many of the regional dialects have disappeared now though.


I can tell which village around my home village someone comes from just by hearing them say the word "Eier" (eggs).


Amazingly, up here north of Germany, in the tiny land of Denmark, Danish dialects manage to be mutually incomprehensible. Or at least they did, up until about a generation ago. Going to the northernmost or westernmost regions, I find no shortage of people I simply do not understand. On the other hand, a lot of Norwegian, officially a different language, appears to me like a distinct but unproblematic dialect.

Interestingly, in this small, flat, homogeneous country, linguistic fault lines can still be persistent and razor sharp, clearly reflecting population boundaries from way, way back, the Viking age and earlier. Travel some thirty kilometers between neighbouring major towns, and hear the tone of spoken language change abruptly about midway.


This IS interesting. Some thoughts:

Norway was ruled from Denmark, so the Danish ruling class in Norway probably spoke a dialect similar to yours. (See bokmål, basically Danish-style Norwegian.)

Norway has dialects VERY different from each other; these were used as stock for an attempt at standardizing a non-Danish-inspired language, which they call "nynorsk". Which is confusing, because it's basically a mix of OLDER Norse dialects. :) https://en.wikipedia.org/wiki/Ivar_Aasen

In Norway, the very distinct local dialects make sense, because people were separated by high mountain ridges. (The same story goes for Greek dialects, but I digress.)

So it IS indeed interesting that Denmark, which is very flat, still has these sharp boundaries. :)


Everything you said. And yes, obviously I'm thinking primarily of Norwegian bokmål, although I do comprehend at least some spoken fjeldnorsk. Having had a Faroese girlfriend, and some exposure to Icelandic, does help :)

The rhythm and intonation of spoken Danish shift markedly down towards the southern islands. No difficulty of comprehension whatsoever, but it's clearly a dialectal belt with a history quite different from neighbouring parts of Sjælland (or "Zealand"). I'd love to see a genetic mapping of the local communities. I'm almost certain that lots of corresponding patterns would turn up.


It probably compares very similarly to the German dialect situation in that it is a dialect continuum. The dialects become less intelligible as geographic distance increases.

The article alludes to this at the end:

"Romance linguistics teaches that by walking across the former Roman Empire from Sicily to Normandy, every pair of neighboring villages can understand each other."

"Language" vs. "dialect" is also very tricky because politics often come into play to demarcate the two. The classic saying is that a language has an army and a navy whereas dialects do not (i.e.: languages are associated with nation states).

And the classic example is that of Norwegian, Danish, and Swedish, which due to high mutual intelligibility are often linguistically thought of as dialects of one language. However, each belongs to a nation state whose inhabitants would likely disagree that they speak "a mere dialect".

Edit: Stepped away from the computer for a long while before actually posting, hence the similarity to the answer below.


The line spacing of the Gutenberg LaTeX-typeset version is terribly enlarged. It's a disease... who can read that?


> The name "Ukraine" (Ukrainian: Україна Ukrayina [ukrɑˈjinɑ]) derives from the Slavic words "u", meaning "within", and "kraj", meaning "land" or "border". Together, "u+kraj" means "within the borders" or, more aptly in English, "the heartland".

That's quoting Wikipedia. I.e., from what I read there, it is the other way round: Ukraine would be the heartland, and Moscow on the outskirts.


I started to write a toy compiler in OCaml. I had some previous experience with Haskell, but am in no way an expert, i.e. no category theory background and only shallow exposure to monads.

My "problems" with OCaml started, when I wanted to "map" over a data structure I defined. I ended up having to define custom mapping functions for all container-like data structures I wrote and call them in a non-polymorphic fashion (where I would have just used fmap in Haskell).

Sure, in OCaml I needed to use a parser generator where in Haskell I would have used megaparsec, but that was a tolerable inconvenience.

Trouble started when I needed to track state in the compilation process: I was generating variable names for temporary results and values, and I needed a counter that kept increasing. In the end I used mutable state for it, and it turned out nightmarish in my unit tests.

After a while, I just ported the code base to Haskell and never looked back. The State monad was an easy fix for my mutable-state issues. Parser combinators made the parser much more elegant. And many code paths improved and became much more concise. It is hard to describe, but in direct comparison OCaml felt much more procedural and Haskell much more declarative (and actually easier to read).
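
For illustration, the fresh-name counter in the State monad looks roughly like this (a minimal sketch using the mtl package's Control.Monad.State, not my actual compiler code):

    import Control.Monad (replicateM)
    import Control.Monad.State

    -- The counter is threaded implicitly by the State monad.
    freshName :: State Int String
    freshName = do
      n <- get
      put (n + 1)
      return ("tmp" ++ show n)

    main :: IO ()
    main = print (runState (replicateM 3 freshName) 0)
    -- prints (["tmp0","tmp1","tmp2"],3)

Since runState is a pure function, a unit test just picks a start value and checks the result; there is no mutable fixture to reset between tests.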

The only advantage of OCaml to me is the strict evaluation. I don't think lazy evaluation by default in Haskell is a great idea.


I assume you were just not interested in passing the state around to the functions that needed it, and preferred that the State monad hides that plumbing for you via bind and return. It's worth noting that there are OCaml libraries that provide the same operators, and even a similar do-notation syntax that desugars to bind/return operators (via PPX).

OCaml does tend to be more verbose than Haskell; it's just the nature of the language syntax. E.g., in OCaml one says (fun x -> x+1) vs. (\x -> x+1). Similarly, OCaml is cursed by the excessive "in"s that accompany let bindings: "let .. in let .. in let ..". That can get annoying.

Interestingly, I had the opposite experience with a commercial compiler project. Haskell's syntactic cleverness (monadic syntax, combinator libraries, etc.) eventually got in the way: it became very difficult to understand what a single line of code actually meant, since one had to mentally unpack layers of type abstractions. Migrating to OCaml, the verbosity eventually was more tolerable than the opacity of the equivalent Haskell code once the compiler got sufficiently complex.
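
To give a flavour of that unpacking (a made-up sketch, not the project's code; Compile, Env and env0 are hypothetical names): the point-free runner below is idiomatic Haskell, but every composition dot hides one layer of the transformer stack, which the second version peels off explicitly.

    import Control.Monad.Except (ExceptT, runExceptT)
    import Control.Monad.State (StateT, evalStateT)

    type Env = Int

    newtype Compile a = Compile { unCompile :: StateT Env (ExceptT String IO) a }

    env0 :: Env
    env0 = 0

    -- Dense style: each '.' hides a layer of the stack.
    run :: Compile a -> IO a
    run = (either fail pure =<<) . runExceptT . flip evalStateT env0 . unCompile

    -- The same function with the layers spelled out.
    run' :: Compile a -> IO a
    run' (Compile m) = do
      result <- runExceptT (evalStateT m env0)
      either fail pure result

    main :: IO ()
    main = run (Compile (pure "ok")) >>= putStrLn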

My experience may vary from yours. I've been doing Haskell/OCaml in production for many years, so the pain points I've adapted to are likely different from those of someone working on toy compilers or weekend projects. And no, category theory exposure is not and never has been necessary for understanding Haskell or FP, unless one is a PL researcher (and even then, only a subset of PL researchers are concerned with those areas). And one can be quite productive and prolific in Haskell without a deep understanding of monads and monad transformers; the blogosphere has given you the wrong impression if you believe otherwise.


I've done a lot of production Haskell and I've had a similar experience.

In our case, we dealt with it by keeping relatively bare, boring code. We avoided point-free style, crazy combinators like lenses, and complex monad transformer stacks except in the 'plumbing' part of the application that didn't need to change very much.

This paid off in spades as we had a lot of engineers who only ever had to work in the 'porcelain' parts of the application. They got a lot of great work done using abstractions that matched their intuition exactly.


Thanks for the thorough reply; it sounds like you're quite experienced here. Any chance you could go into more detail about what you do for a living? Do you maintain a compiler for something more mainstream?


I co-founded a company recently that is using code transformation and optimization methods to accelerate data-analytics code on special-purpose hardware. Our compilation toolchain is all OCaml, and the language that is compiled/transformed/optimized is Python. Prior to this venture I did similar work, code analysis and transformation, but in that case largely around high-performance computing for scientific applications. That tooling was mostly OCaml/Haskell, but not production focused; it was mostly research code.


Just want to note that, while not frequently seen, you can use more powerful abstractions (monad transformers, parser combinators, etc.) in OCaml.

As examples, consider the Angstrom[1] parser combinator library and my Pure[2] functional base library.

[1] https://github.com/inhabitedtype/angstrom

[2] https://github.com/rizo/pure


> I'd wager BDFL prefers the CPython implementation because it's probably simpler. Also, it supports tons of targets and is super simple to build.

At a EuroPython keynote, the BDFL mentioned that he hadn't had a closer look at PyPy (he mentioned downloading it and playing with it for a few minutes), i.e. there is a certain disinterest. Also, remember that the "Zen of Python" (https://www.python.org/dev/peps/pep-0020/#id3) was written about the design principles of the Python interpreter, and PyPy is not exactly the Zen of Python.

Personally, I'd love to see Python 4 based entirely on PyPy.


My personal, subjective impression: commits are getting smaller and smaller nowadays. In the Subversion days, many people committed only a few times a day, sometimes not for several days. An SVN commit of course involved a sync with the server (a "push" in git lingo), and thus usually represented a much larger increment with a substantial change to the code base. [X]

With git, it became very common to structure changes to a code base as many very small commits. Rename a variable? Commit. Write some docs? Commit. Of course, the overall changes when developing a feature did not become smaller; they are now just distributed over many more commits. So I'd argue that an SVN commit was often conceptually closer to what we now have with a git pull request.

Why does this matter? Because it is kind of hard, and not helping anyone, if you describe your renaming of a local variable with an extensive docstring.

What I do miss, however, is a good description of the overall change. Nowadays the description in the merge commit is often just the autogenerated message, but this is where I would like people to really take the time and describe the change extensively. This is why I like `--squash` merges: they let people focus on the relevant parts in their description. I know, rewriting history is bad, but overall I'd rather read a history book than 18th-century newspapers.

[X] Not saying that there weren't small one-line-change commits, but overall they were rarer.


> [the merge commit message] is where I would like people to really take the time and describe the change extensively

http://lkml.iu.edu/hypermail/linux/kernel/1702.2/03492.html


Never thought of that usage of merge commits. This is a great place to write the couple of paragraphs that you might have in a pull request; better than squashing, IMO.

I've found that for smaller commits, if you have something long you want to explain in the commit message body... you should probably put it in a code comment!

If you don't think it merits a code comment, it's probably not important enough for people to look up the commit message body either (if only because the commit message body is less likely to be seen).


> What I do miss however, is a good description of the overall change.

https://github.com/ribasushi/dbix-class/commit/1cf609901

Something like that I take it? :)


Wow. That took some commitment!


There's actually a paragraph at the end about that too :)


excellent


Changing _public_ history is bad. I don't see any problem with rewriting your _personal_ history before merging it in.


Changing public history is bad because it makes collaboration harder, for example when two devs are working on one branch.

But I do not see a problem with rewriting history on a branch if (and only if) you know that no one else is pulling the changes. Or, when merging a PR, a rewrite is okay too, if the next feature will be branched off the trunk anyway.

Also, Mercurial's tooling helps with rewritten history (https://www.mercurial-scm.org/wiki/ChangesetEvolution) by making it easier to track history rewrites. Basically, I think this is a path in version control systems worth exploring.


Not only is it not a problem, it's a must in my book, and I'm fairly sure I'm not alone. For me it's a new workflow which I always wanted but never could have without git. A lot of my days now consist of creating many small commits; every couple of hours, when a single 'thing' is finished, I start an interactive rebase and create a storyline which is easy to read, understand and follow. Sometimes this can even be a single commit, if that makes sense. And in repos I manage myself, if the change spans several days, it's usually big, and I might create a separate branch with a merge commit so it's extra clear that all the commits belong to feature/xxx.


I find tons of small commits to be clutter and a waste of time, and I don't see any reason for doing it. On the contrary, I can see a disadvantage: reading and understanding the history later may become a difficult task. After all, what counts is your full chunk of work, reviewed via pull request and merged to master. It should be treated as a whole.

Has it really become so common with git? I don't see such a trend around me.


> On the contrary, I can see a disadvantage: reading and understanding the history later may become a difficult task.

I'm replying to you, but this is directed at everybody who advocates squash merges and discourages small commits.

IMO this is a tooling problem, plain and simple. When I am committing to Git, I am using the "write" components of Git, which are incredibly powerful. I can commit in as small chunks as I want and preserve the richest history of all the small changes I've made, knowing full well that the state of the code at HEAD will not be degraded for doing so. If I make two small independent changes, I can branch them separately and then merge them together to show that they could have been performed in either order.

When you read my history, you are using the "read" components of Git. Unfortunately, these are not as powerful. You can do some nice things; for example, if you want to treat history as a straight line, you can use `git log --first-parent` and you'll see only the merge commits (as if all merges had been squash-rebases).

It would be much better if you were able to collapse or expand any sequence of linear commits to gloss over the lower-level details. But as far as I'm concerned, this is a problem with the "read" components of Git, not the "write" components, so I will continue to use the "write" components to their full power. And the best part is that if I do it this way, we can improve the "read" components and allow the reader to collapse my verbose history; we will never be able to expand pre-collapsed history.


There is "Collapse Linear Branches" action in Intellij's git log viewer (and I guess any Jetbrains IDE) which does pretty much what you describe :-)


The main reason I request commits to be split up is for ease of code review. It's much easier to review three commits that each do one easily comprehensible small thing than one commit that does three things at once. It's also better if you find there's a bug -- you can bisect down to a commit that's fairly small where the bug should be easy to see, rather than one that's enormous and where the bug is hard to find among all the other changes.


I think it is a matter of the definition of "small" and "enormous". If you have a small thing, easily comprehensible, but big enough to be a complete piece of work, then you probably also have a separate task for it, and the change you introduce doesn't break the build. So in the end it's a perfect candidate for a pull request.

But note that the comment above mentioned a commit for a variable rename, or a commit for adding a sentence to a comment. Nano-commits, they are.

Sure, tasks should be small, easy to get, easy to review. But there must be a balance; going to extremes, either way, doesn't do any good.


Indeed, if the commits are individually reviewable, it is nicer. On the other hand, these small commits can often be a bit messy: sometimes you'll find commits that are reverted later on, or fixed up later on. I.e., for commit-level review to work well, it helps if the history has been polished.


Small, incremental commits are an asset with git blame, git bisect and git revert. I find it much easier to deal with too many small ones than with too few large ones. Especially if you keep the convention that master is always "merged into", i.e. "left of the merge", i.e. "parent 1".


I find very small commits tedious and error-prone when bisecting. Sometimes the software doesn't even build, because the developer distributed two not-so-independent changes over two commits (the connection wasn't so obvious). Then you have a failed build and you don't really know whether `git bisect` just beamed you into the middle of a refactoring, or whether there is an actual issue.


> After all what counts is your full chunk of work, reviewed via pull request, and merged to master. It should be treated as a whole.

I find the PR mechanism works great for the view of the whole, whereas the individual commits are great for the pieces. So in my commit history you can read the timeline, and then if you want to see the commits squashed down, you click on the individual PR. On the PR screen (assuming you're using GitHub), there is a nice list of the subject lines of each of the individual commits.


Commits can serve as a supplement to documentation. When you properly commit the different logical steps that led to the current state of the code, it becomes much easier for another team member to understand why and how you implemented things a certain way.


It would be interesting if there were a way to annotate a set of commits, like "commit ???? - ????: refactored A, B, and C", so you'd get the advantage of small commits and clearer messages.


This is what PRs are good for. Also, with my particular approach to commits, I always have at least one issue associated with a commit, and I'm always working on a particular branch associated with the issue. I pick an emoji that captures the issue/branch in a single concept, and I put that in my subject line. This is combined with my git commit template mechanism, and I like it. At a glance, I can see which commits belong together, and if I want to look at the whole, I go to the PR.

E.g. https://github.com/ibgib/ibgib/pull/180


neat


I think you can do that in a merge commit, sort of.

The more I think about it, the stranger a strong aversion to rewriting commit history for clarity seems. In university, if I did some math or physics calculation, I would often just start, and once I got somewhere, make a clean copy of the successful work to have a concise, revised version.


I am a firm believer that it's totally fine to rewrite history when working on a private branch that hasn't been pushed.


Mostly fine to do it on a feature/PR branch also, in my opinion. If those become long-lived with multiple people touching them (where history rewrites become problematic), you are not integrating continuously enough.


Not pushing private branches is risky though - you have no backup if something happens to your machine.


Unfortunately, I'm guilty of the opposite: I rarely, rarely commit. Maybe one commit per point. I have to consciously remind myself to commit more often.


> In the Subversion days, many people committed only a few times a day, sometimes not for several days.

This was often the source of merge hell. Half of what makes git merges easier is the smaller commits that it encourages.


But it was also partly the tooling. Most SVN projects I worked on were trunk-based and thus integrated much more tightly than git feature-branch-based code. Still, the times I did merge Subversion branches, I was fairly sure that Subversion lost some changes.


I mean, aren't pull requests basically the solution to that problem?


Not if the merge commits just say `merged branch ....`.


Funnily enough, there is precedent that a red light can be ignored if it doesn't change for 5 minutes. And I really doubt that there isn't a way to regulate the crossing without a permanent red light.

http://www.stvo.de/info/faq/165-ampel-bleibt-rot (German)


That allows you to assume that the traffic light is defective, in which case the traffic signs would apply and require you to turn right—but that is exactly what is allowed by the green arrow even at a red light.


Ah, you are right; probably there is a blue right-turn-only sign (I was just thinking of the Grünpfeil depicted in the illustration image).

