
To me the beauty of git stems from the fact that it is an implementation of a functional data structure. It’s a tree index that is read-only, and updating it involves creating a complete copy of the tree and giving it a new name. Then the only challenge is to make that copy as cheap as possible - for which the tree lends itself, as only the nodes on the path to the root need to get updated. As a result, you get lock-free transactions (branches) and minimal overhead. And through git’s pointer-to-parent commit you get full lineage. It is so beautiful in fact that when I think about systems that need to maintain long-running state in concurrent environments, my first reaction is “split up the state into files, and maintain it through git(hub)”.
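The "only the nodes on the path to the root need to get updated" idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not git's actual object format: `Tree`, `with_entry` and `update` are hypothetical names, and a tree is modeled as a frozen dataclass holding sorted `(name, child)` pairs, where a child is either file content (`bytes`) or another `Tree`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    # Hypothetical model: sorted tuple of (name, child) pairs; child is bytes or Tree.
    entries: tuple

    def get(self, name):
        for n, child in self.entries:
            if n == name:
                return child
        return None


def with_entry(tree, name, child):
    """Return a NEW Tree with `name` bound to `child`; `tree` itself is never modified."""
    kept = [(n, c) for n, c in tree.entries if n != name]
    kept.append((name, child))
    return Tree(tuple(sorted(kept, key=lambda e: e[0])))


def update(tree, path, content):
    """Set the file at `path` to `content`, rebuilding only the nodes on the path to the root.

    Every sibling subtree off that path is reused by reference (structural sharing).
    """
    head, *rest = path
    if not rest:
        return with_entry(tree, head, content)
    sub = tree.get(head)
    if not isinstance(sub, Tree):
        sub = Tree(())  # create missing intermediate directories
    return with_entry(tree, head, update(sub, rest, content))


# Two "versions" of a small tree; v1 stays valid after v2 is created.
v1 = update(update(Tree(()), ["src", "main.c"], b"int main(){}"), ["docs", "README"], b"hi")
v2 = update(v1, ["src", "main.c"], b"int main(){return 0;}")
assert v1.get("docs") is v2.get("docs")       # untouched subtree is shared, not copied
assert v1.get("src") is not v2.get("src")     # the path to the root was rebuilt
```

Because old versions are never mutated, readers holding `v1` need no locks while a writer builds `v2`, which is the property the comment attributes to git's design.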



". . . unlike every single horror I've ever witnessed when looking closer at SCM products, git actually has a simple design, with stable and reasonably well-documented data structures. In fact, I'm a huge proponent of designing your code around the data, rather than the other way around, and I think it's one of the reasons git has been fairly successful. . . .

"I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

--- Linus Torvalds, https://lwn.net/Articles/193245/


That last comment is absolutely golden. Once upon a time I had the privilege to spend a few years working in Swansea University's compsci department, which punches above its weight in theoretical computer science. One of the moments that made me the programmer I am today (whatever that's worth) came when I was meeting with the head of the department to discuss a book he was writing, and while we were discussing this very point of data vs code, I said to him, realising the importance of choosing the right structure, "so the data is central to the subject" (meaning computer science in general) — to which he replied emphatically that "the data IS the subject". That was a lightbulb moment for me. From then on I saw computer science as the study of how data is represented, and how those representations are transformed and transported — that's it, that basically covers everything. It's served me well.


That's great. It reminds me of a comment by Rich Hickey, the inventor of Clojure:

"Before we had all this high falutin' opinions of ourselves as programmers and computer scientists and stuff like that, programming used to be called data processing.

How many people actually do data processing in their programs? You can raise your hands. We all do, right? This is what most programs do. You take some information in, somebody typed some stuff, somebody sends you a message, you put it somewhere. Later you try to find it. You put it on the screen. You send it to somebody else.

That is what most programs do most of the time. Sure, there is a computational aspect to programs. There is quality of implementation issues to this, but there is nothing wrong with saying: programs process data. Because data is information. Information systems ... this should be what we are doing, right?

We are the stewards of the world's information. And information is just data. It is not a complex thing. It is not an elaborate thing. It is a simple thing, until we programmers start touching it.

So we have data processing. Most programs do this. There are very few programs that do not.

And data is a fundamentally simple thing. Data is just raw immutable information. So that is the first point. Data is immutable. If you make a data structure, you can start messing with that, but actual data is immutable. So if you have a representation for it that is also immutable, you are capturing its essence better than if you start fiddling around.

And that is what happens. Languages fiddle around. They elaborate on data. They add types. They add methods. They make data active. They make data mutable. They make data movable. They turn it into an agent, or some active thing. And at that point they are ruining it. At least, they are moving it away from what it is."

https://github.com/matthiasn/talk-transcripts/blob/master/Hi...
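Hickey's point about immutable representations can be illustrated, loosely, in Python with frozen dataclasses. This is an assumption-laden sketch: Hickey is talking about Clojure's persistent data, not Python, and `Message` is a hypothetical type. "Changing" a value produces a new value and leaves the original intact.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Message:
    # Hypothetical record: once created, its fields cannot be reassigned.
    sender: str
    body: str


original = Message(sender="alice", body="hello")

# An "edit" is a new value derived from the old one; `original` is untouched.
edited = replace(original, body="hello, world")

# Frozen dataclasses reject in-place mutation with FrozenInstanceError.
rejected = False
try:
    original.body = "mutated"
except FrozenInstanceError:
    rejected = True
```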


Many decades ago I was coaxed into signing-up for an APL class by my Physics professor. He was a maverick who had managed to negotiate with the school to create an APL class and another HP-41/RPN class with full credit that you could take instead of FORTRAN and COBOL (yeah, it was a while ago).

One of the things he pounded into everyone's heads back then was "The most important decision you have to make is how to represent the problem. Do that well and programming will be easy. Get it wrong and there isn't a force on this world that will help you write a good solution in any programming language."

In APL data representation is of crucial importance, and you see the effects right away. It turned out he was right on that point regardless of the language one chose to use. The advice is universal.


I also really like this quote, and it has influenced the way I work a lot. When I started working professionally as a programmer I sometimes ended up with quite clunky data structures, a lot of expensive copying (C++ :-)) and difficult maintenance.

But based on this, I always take the greatest care with the data structures. Especially when designing database tables, I keep all the important aspects in mind (normalization/denormalization, complexity of queries on them, ...). It makes writing code so much more pleasurable, and it's also key to making maintenance a non-issue.

Amazing how far-sighted this is, when considering that most Web Apps are basically I/O - i.e. data - bound.
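As a small illustration of the normalization trade-off mentioned above, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The `customer`/`orders` schema and the data are hypothetical. Normalizing means each fact is stored once, so updates are cheap and consistent, at the cost of joins when reading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: customer details live in one place; orders refer to them by id.
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        total_cents INTEGER NOT NULL
    );
""")
conn.execute("INSERT INTO customer (id, name, email) VALUES (1, 'Ada', 'ada@example.com')")
conn.executemany(
    "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
    [(1, 1250), (1, 800)],
)

# Changing the email touches exactly one row, however many orders exist.
conn.execute("UPDATE customer SET email = 'ada@example.org' WHERE id = 1")

# The cost of normalization: a read that needs both sides requires a join.
rows = conn.execute("""
    SELECT c.name, c.email, SUM(o.total_cents)
    FROM customer c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
""").fetchall()
```

A denormalized design would copy `email` into every order row, making this read a single-table scan but turning the email change into a multi-row update that can drift out of sync.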


A mentor often repeated the title of Niklaus Wirth’s 1976 book “Algorithms Plus Data Structures Equals Programs”.

This encapsulates it for me and informs my coding everyday. If I find myself having a hard time with complexity, I revisit the data structures.


I think part of the confusion stems from the word “computer” itself. Ted Nelson makes the point that the word is an accident of history, arising because main funding came from large military computation projects.

But computers don’t “compute”, they don’t do math. Computers are simplifying, integrating machines that manipulate symbols.

Data (and its relationships) is the essential concept in the term “symbolic manipulator”.

Code (ie a function) is the essential concept in the term “compute”.


But what is math, if not symbolic manipulation? Numbers are symbols that convey specific ideas of data, no? And once you go past algebra, the numbers are almost incidental to the more abstract concepts and symbols.

Not trying to start a flamewar, I just found the distinction you drew interesting.


Well, the question of whether there's more to math than symbolic manipulation or not was of course one of the key foundational questions of computer science, thrashed out in the early 20th century before anyone had actually built a general computing machine. Leibniz dreamt of building a machine to which you could feed all human knowledge and from which you could thus derive automatically the answer to any question you asked, and the question of whether that was possible occupied some of the great minds in logic a hundred years ago: how far can you go with symbolic manipulation alone? Answering that question led to the invention of the lambda calculus, and Turing machines, and much else besides, and famously to Gödel's seminal proof which pretty much put the nail in the coffin of Leibniz' dream: the answer is yes, there is more to math than just symbolic manipulation, because purely symbolic systems, purely formal systems, can't even represent basic arithmetic in a way that would allow any question to be answered automatically.

More basically and fundamentally, I'd suggest that no, numbers aren't symbols: numbers are numbers (i.e. they are themselves abstract concepts as you suggest), and symbols are symbols (which are much more concrete, indeed I'd say they exist precisely because we need something concrete in order to talk about the abstract thing we care about). We can use various symbols to represent a given number (say, the character "5" or the word "five" or a Roman numeral "V", or five lines drawn in the sand), but the symbols themselves are not the number, nor vice versa.

This all scales up: a tree is an abstract concept; a stream is an abstract concept, a compiler is an abstract concept — and then our business is finding good concrete representations for those abstractions. Choosing the right representations really matters: I've heard it argued that the Romans, while great engineers, were ultimately limited because their maths just wasn't good enough (their know-how was acquired by trial-and-error, basically), and their maths wasn't good enough because the Roman system is a pig for doing multiplication and division in; once you have Arabic numerals (and having a symbol for zero really helps too BTW!), powerful easy algorithms for multiplication and division arise naturally, and before too long you've invented the calculus, and then you're really cooking with gas...
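To make the representation argument concrete, here is a sketch in Python. `roman_to_int` parses standard Roman numerals (the subtractive forms like IV and XC only; no validation of malformed input), while `long_multiply` is ordinary schoolbook multiplication on decimal digit strings. Both names are hypothetical. The point is that positional notation turns multiplication into a mechanical column procedure, and Roman numerals offer no such columns to work in.

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(s: str) -> int:
    """Parse a standard Roman numeral: a smaller symbol before a larger one is subtracted."""
    total = 0
    for i, ch in enumerate(s):
        value = ROMAN[ch]
        if i + 1 < len(s) and ROMAN[s[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


def long_multiply(a: str, b: str) -> str:
    """Schoolbook multiplication on decimal digit strings.

    In positional notation the digit at index i (counting from the right) is worth
    d * 10**i, so the product of digits i and j always lands in column i + j.
    """
    result = [0] * (len(a) + len(b))
    for i, da in enumerate(reversed(a)):
        for j, db in enumerate(reversed(b)):
            result[i + j] += int(da) * int(db)
    # Propagate carries from the least significant column upward.
    for k in range(len(result) - 1):
        result[k + 1] += result[k] // 10
        result[k] %= 10
    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


# Multiplying two Roman numerals in practice: convert to positional form first.
product = long_multiply(str(roman_to_int("XLII")), str(roman_to_int("XXXVII")))
```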


It involves symbolic manipulation, but it’s more than that. Math is the science of method. Science requires reason.

If one were to say computers do math, they would be saying computers reason. Reason requires free will. Only man can reason; machines cannot reason. (For a full explanation of the relationship between free will and reason, see the book Introduction to Objectivist Epistemology).

Man does math, then creates a machine as a tool to manipulate symbols.


You make some interesting points. There was a time I was intrigued by Objectivism but ultimately it fell flat for me. I sort of had similar ideas before encountering it in the literature, but these days I'm mostly captivated by what, as I learned from "Sapiens", is known as inter-subjective reality, which I also mostly arrived at through my own questioning of Objectivism. I'm not sure we can conceive of any objective reality completely divorced from our own perceptive abilities.

> Reason requires free will

Isn't it still kind of an open question whether humans have free will, or what free will even is? How can we be sure our own brains are not simply very complex (hah, sorry, oxymoron) machines that don't "reason" so much as react to or interpret series of inputs, and transform, associate and store information?

I find the answer to this question often moves into metaphysical, mystical or straight up religious territory. I'm interested to know some more philosophical approaches to this.


Your comment reminds me of the first line from Peikoff’s Objectivism: The Philosophy of Ayn Rand (OPAR): “Philosophy is not a bauble of the intellect, but a power from which no man can abstain.” There are many intellectual exercises that feel interesting, but do they provide you with the means—the conceptual tools—to live the best life?

If objective reality doesn’t exist, we can’t even have this conversation. How can you reason—that is, use logic—in relation to the non-objective? That would be a contradiction. Sense perception is our means of grasping (not just barely scratching or touching) reality (that which exists). If a man does not accept objective reality, then further discussion is impossible and improper.

Any system which rejects objective reality cannot be the foundation of a good life. It leaves man subject to the whim of an unknown and unknowable world.

For a full validation of free will, I would refer you to Chapter 2 of OPAR. That man has free will is knowable through direct experience. Science has nothing to say about whether you have free will—free will is a priori required for science to be a valid concept. If you don’t have free will, again this entire conversation is moot. What would it mean to make an argument or convince someone? If I give you evidence and reason, I am relying on your faculty of free will to consider my argument and judge it—that is, to decide about it. You might decide on it, you might decide to drift and not consider it, you might even decide to shut your mind to it on purpose. But you do decide.


Last idea, stated up front: sorry for the wall of text that follows!

It's not that I reject the idea of objective reality–far from it. However I do not accept that we can 1) perfectly understand it as individuals, and 2) perfectly communicate any understanding, perfect or otherwise, to other individuals. Intersubjectivity is a dynamical system with an ever-shifting set of equilibria, but it's the only place we can talk about objective reality–we're forever confined to it. I see objective reality as the precursor to subjective reality: matter must exist in order to be arranged into brains that may have differences of opinion, but matter itself cannot form opinions or conjectures.

I'll assume that book or other studies of objectivity lay out the case for some of the statements you make, but as far as I can tell, you are arguing for objectivity from purely subjective stances: "good life", "improper discussion"... and you're relying on the subjective judgement of others regarding your points on objectivity. Of course, I'm working from the assumption that the products of our minds exist purely in the subjective realm... if we were all objective, why would so much disagreement exist? Is it really just terminological? I'm not sure. Maybe.

Some other statements strike me as non-sequiturs or circular reasoning, like "That man has free will is knowable through direct experience". Is this basically "I think, therefore I am?" But how do you know what you think is _what you think_? How do you know those ideas were not implanted via others' thoughts/advertisements/etc, via e.g. cryptomnesia? Or are we really in a simulation? Then it becomes something like "I think what others thought, therefore I am them," which, translated back to your wording, sounds to me something like "that man has a free will modulo others' free will, is knowable through shared experience." What is free will then?

"free will is a priori required for science to be a valid concept" sounds like affirming the consequent, because as far as we know, the best way to "prove" to each other that free will exists is via scientific methods. Following your quote in my previous paragraph, it sounds like you're saying "science validates free will validates science [validates free will... ad infinitum]." "A implies B implies A", which, unless I'm falling prey to a syllogistic fallacy, reduces to "A implies A," (or "B implies B") which sounds tautological, or at least not convincing (to me).

I apologize if my responses are rife with mistakes or misinterpretations of your statements or logical laws, and I'm happy to have them pointed out to me. I think philosophical understanding of reality is a hard problem that I don't think humanity has solved, and again I question whether it's solvable/decidable. I think reality is like the real number line, we can keep splitting atoms and things we find inside them forever and never arrive at a truly basic unit: we'll never get to zero by subdividing unity, and even if we could, we'd have zero–nothing, nada, nihil. I am skeptical of people who think they have it all figured out. Even then, it all comes back to "if a tree falls..." What difference does it make if you know the truth, if nobody will listen? Maybe the truth has been discovered over and over again, but... we are mortal, we die, and eventually, so do even the memories of us or our ideas. But, I don't think people have ever figured it all out, except for maybe the Socratic notion that after much learning, you might know one thing: that you know nothing.

Maybe humanity is doing something as described in God's Debris by Scott Adams: assembling itself into a higher order being, where instead of individual free will or knowledge, there is a shared version? That again sounds like intersubjectivity. All our argumentation is maybe just that being's self doubt, and we'll gain more confidence as time goes on, or it'll experience an epiphany. I still don't think it could arrive at a "true" "truth", but at least it could think [it's "correct"], and therefore be ["correct"]. Insofar as it'll be stuck in a local minimum of doubt with nobody left to provide an annealing stimulus.

I will definitely check out that book though, thanks for the recommendation and for your thoughts. I did not expect this conversation going into a post about Git, ha. In the very very end (I promise we're almost at the end of this post) I love learning more while I'm here!


One problem is that, at least for certain actions, you can measure that motor neurons fire (on the order of 100ms) before the part of your brain that thinks it makes executive decisions.

At least for certain actions and situations, the "direct experience" of free will is measurably incorrect.

Doesn't mean free will doesn't exist (or maybe it does), but it's been established that that feeling of "I'm willing these actions to happen" oftentimes happens well after the action has already been set into motion.


Starting at 1:12:35 in this video, there is a discussion of those experiments with an academic neuroscientist. He explains why he believes they do not disprove free will.

https://youtu.be/X6VtwHpZ1BM


Oh, thank you for this :) Because I won't deny, a friend originally came to me with this theory and it has been bugging me :)


There is a lot here. For now, I will simply assert that morality, which means that which helps or harms man’s survival, is objective and knowable.

I’ve enjoyed this discussion. It has been civil beyond what I normally expect from HN. From our limited interaction, I believe you are grappling with these subjects in earnest.

This is a difficult forum to have an extended discussion. If you like, reach out (email is in my profile) and we can discuss the issues further. I’m not a philosopher or expert, but I’d be happy to share what I know and I enjoy the challenge because it helps clarify my own thinking.


Yeah, I expect we're nearing the reply depth limit. Thanks for the thought provoking discussion! Sent you an email. My email should be in my profile, too, if anyone wants to use that method.


In Spanish the preferred name is "ordenador" which would translate to something like "sorter" or "organizer machine".


That's in Spain. In American Spanish computador/a is most often used: http://lema.rae.es/dpd/srv/search?key=computador

There is also informática/computación: both are Spanish words for the same thing, used in Spain and America respectively.

I guess that literally they'd be something like IT and CS.


Good points from both, indeed it's a country thing not a language thing. My bad!


In French, it's the same; it's about "putting things in order", similar in concept to an ordonnateur:

https://en.wikipedia.org/wiki/Ordonnateur


Similarly for French - "ordinateur"

https://www.dictionnaire-academie.fr/article/A9O0665

A search for "computer" does not find anything, though I suspect many French people actually use computer, not ordinateur.


No we don't. We use ordinateur.


In Finnish, it's an 'information machine'.

To use one is colloquially 'to data'; as in, a verb form of data :)


More accurately, it is called 'ordenador' in Spain. In Latin America, it's 'computadora'.


Hmm, interesting. In Norwegian the word for computer translates to "data machine" (datamaskin)


As in Swedish.


The Swedish name is “dator”, isn’t it? Its root is certainly “data”, but I like it better than the more cumbersome Norwegian word “datamaskin”.


I’ve always thought that dator was just a short form of datamaskin. But some other comments suggested otherwise, so I had to look it up. Apparently, dator is a word coined in 1968 from data, paralleling the words tractor and doctor.


Yes, it's "dator". The word was initially proposed based on the same Latin -tor suffix as in e.g. doctor and tractor, so the word would fit just as well into English as it does in Swedish.


And in Danish we had "datamat", which has a nice ring to it. But everybody says "computer" instead.


In Anathem by Neal Stephenson, computers are called Syntactic Devices ("syndev").


Computer science indeed sounds a lot like you're working with computers.

In German the subject is called "Informatik", translating to information science. I find that quite elegant in contrast.


Yes, I've heard it said that calling it computer science is like calling astronomy "telescope science".


It also helps identify journalists who don't know what they are writing about. They frequently translate "computer science" literally as "Computerwissenschaft".


Interestingly, Computer Science is called "Datalogi" in Danish. I always liked that term better.

Coined by Peter Naur (of BNF-"fame"), by the way.


Same in Swedish. Also the Swedish word for computer is dator. Don't know if this in any way shifts the mental perspective though.


Informatika (Інформатика) in Ukrainian. Probably originated from German or French.


Linus was probably exposed to Wirth's book (from 1976) at some point.

I believe it was the first major CS book that emphasised data structures.

https://en.wikipedia.org/wiki/Algorithms_%2B_Data_Structures...


The Mythical Man-Month, published a year before Wirth's book, provides the most well-known quote on the subject (though he uses now antiquated language):

"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious."

But I don't think Brooks was trying to suggest it was an original idea to him or his team, either. I imagine there were a decent number of people who reached the same conclusion independently.
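Brooks' "show me your tables" maps naturally onto table-driven code, where the control flow collapses into a lookup and the table itself documents the behavior. A minimal Python sketch, assuming a hypothetical order workflow (the states, events and the `step`/`run` names are illustrative only):

```python
# The table *is* the specification: reading it tells you everything the code can do.
TRANSITIONS = {
    ("pending", "pay"): "paid",
    ("pending", "cancel"): "cancelled",
    ("paid", "ship"): "shipped",
    ("paid", "refund"): "refunded",
    ("shipped", "deliver"): "delivered",
}


def step(state: str, event: str) -> str:
    """Apply one event; what would be a flowchart of branches is a single lookup."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state!r}") from None


def run(events, state="pending"):
    """Fold a sequence of events over the table, starting from `state`."""
    for event in events:
        state = step(state, event)
    return state
```

Adding a new state or rule means adding a row, not rewriting branches, which is the sense in which the "flowchart" becomes obvious once the tables are visible.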


There's a good bit about data structure-centric programming in The Art Of Unix Programming: http://www.catb.org/~esr/writings/taoup/html/ch01s06.html#id...

(Apologies for linking to esr but it's a good book)


What's wrong with esr?


He's kind of a nutcase. He's a gun rights advocate, to the point where immediately after 9/11 (within a day or two?) he argued that the solution should be for everyone to carry a gun always, especially on airplanes.

And then he accused women in tech groups of trying to "entrap" prominent male open source leaders to falsely accuse them of rape.

And then he claimed that "gays experimented with unfettered promiscuity in the 1970s and got AIDS as a consequence", and that police who treat "suspicious" black people like lethal threats are being rational, not racist.

Basically, he's a racist, bigoted old man who isn't afraid to spout off conspiracy theories because he thinks the world is against him.


At least half of these "nutcase" claims are plainly true. Thanks for the heads up, I'll be looking into this guy.


Which half?


Maybe someone willing to get into a politically fraught internet argument over plainly true things will jump in for me. I'm already put off by the ease and comfort with which HN seems to disparage someone's character for his ideas and beliefs, actions not even entering the picture.


Public utterances are actions which can have consequences. If you're in favor of free speech, buckle up because criticism of public figures is protected speech.

But in this case the "consequence" to esr was somebody apologizing for linking to him. Methinks the parent protests too much


Every action has consequences, it's either profound or meaningless to point this out. I see it used as a reason to limit speech because this speech that I disagree with is insidious and sinister. Rarely is any direct link provided between this sinister speech and any action that couldn't be better described as being entirely the responsibility of the actor.


Indeed, I point out that actions have consequences because it's a common trope that "free speech" implies a lack of consequence.

> I see it used as a reason to limit speech because this speech that I disagree with is insidious and sinister.

Limiting speech is a very nuanced issue, and there's a lot of common misconceptions surrounding it. For a counterexample, if you're given to racist diatribes, that can make many folks in your presence uncomfortable; if you do it at work or you do it publicly enough that your coworkers find out about it, that can create a toxic work environment and you might quickly find yourself unemployed. In this case, your right to espouse those viewpoints has not been infringed -- you can still say that stuff, but nobody is obliged to provide audience.

And as a person's publicity increases, so do the ramifications for bad behavior -- as it should. Should esr be banned from the internet by court order? Probably not. Does any and every privately owned platform have the right to ban him or/and anybody who dis/agrees with him? Absolutely: nobody's right to free speech has been infringed by federal or state governments. And that's the only "free speech" right we have.


The reason free speech is called free is that it is supposed to be free of suppression and negative consequence where that speech does not infringe on the interests of others. That it is currently protected only against interference from government does not make that version of free speech the ideal for its supporters (myself included).

> Should esr be banned from the internet by court order? Probably not.

Where's the uncertainty in this?

> Does any and every privately owned platform have the right to ban him or/and anybody who dis/agrees with him?

Those that profess to being a platform and not a publisher should not be able to ban him, nor anybody else, for their views, whether or not those views are expounded via their platform. That's why they get legal protections not afforded to others. Do you think the phone company should be able to cut you off for conversations you have on their system?


[flagged]


> I just explicitly affirmed at least two of four "racist, misogynistic, bigoted" statements of fact.

Well, that's how you're characterizing your actions, okay. But just so you know. Your employer is free to retain you, or fire you, on the basis of opinions that you express in public or private. Wicked tyranny, that freedom of association.

> Presumably now you'd like to...

Well, that's certainly a chain of assumptions you've made. Why would you, say, "respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize" when you're out in public? Oh right, that's a quote from HN guidelines. In any case, you're not changing minds by acting this way.

> because this is how you tyrants prefer we genuflect to avoid guilt by association.

Oh, no, the tyranny of public criticism! Hey did you know something? You're free to disagree with me. And criticize me. In public! And others are free to agree with me, or you, or even both of us, even if that makes zero sense!

> This is profoundly idiotic, but I again refrain from arguing because my audience has proven itself very unthinking and vicious

A personal attack, how droll.

> I hope you're not an American,

I am! And as an American I've got the freedom of association -- that means that I'm not legally obligated to verbally support or denounce anybody; nor is it unlawful for me to verbally support or denounce anybody! Funny thing about freedoms; we've all got 'em and it doesn't mean we need to agree on a damned thing.

> because you don't understand what "free speech" is or why we have it,

Well you're wrong there, but IANAL so here's first amendment attorney, Ken White.

>> Public utterances are actions which can have consequences

https://www.popehat.com/2013/09/10/speech-and-consequences/

>> buckle up because criticism of public figures is protected speech.

https://www.popehat.com/2012/07/31/the-right-not-to-be-criti...

> my friend

That's taking things too far. No thank you.


Next time you're in New York we'll get some boba, on me. I'm friends with everybody.


He's become somewhat controversial due to his worldview and political writings, which include climate change denial


And this has nothing to do with programming.

Imagine Einstein alive and denying climate change. Would you apologize every time you refer to the theory of relativity?

P.S. Sorry if you don't agree with the apologizing comment and were just pointing out possible reasons.


Sorry if you're getting downvoted a lot. We as a group need to start learning a little subtlety when it comes to condemning all of a person's contributions because we don't like their opinions or their actions. We are smart enough that we should be able to condemn ESR's idiotic words and actions and still praise his extremely important contribution to technology.


Absolutely agree


I don't know, does it have to be a hard and fast rule?

Sometimes I quote HP Lovecraft and sometimes I feel like apologizing for his being racist (and somewhat stronger than just being a product of his times). But most of the time, also not. But it does usually cross my mind and I think that's okay and important. In a very real "kill your idols" way. Nobody's perfect.

And that's just for being a bigot in the early 20th century, which, as far as I know, is of no consequence today.

However, if Einstein were alive and actively denouncing climate change today, I would probably add a (btw fuck einstein) to every mention of his theories. But that's just because climate change is a serious problem that's going to kill billions if we actually listen to the deniers and take them seriously. This hypothetical Einstein being a public figure, probably even considered an authority by many, would in fact be doing considerable damage spouting such theories in public. And that would piss me off.

What I mean to say is, you don't have to, but it's also not wrong to occasionally point out that even the greatest minds have flaws.

Also, a very different reason to do it is that some people with both questionable ideas and valuable insights tend to mix their insightful writings with the occasional controversial remark or poke. In that case, it can be good to head off sidetracking the discussion, making it clear you're aware of the controversial opinions but want to talk specifically about the more valuable insights.

And it IS in fact important to keep both in mind, even if you think one is irrelevant. Because occasionally it turns out, for instance through a good deep discussion, that the valuable insights in fact fall apart as you take apart the controversial parts. Much of the time it's just unrelated, but you wouldn't want to overlook it when it isn't.


I disagree.


The theory of relativity is a much bigger contribution to society than TAOUP.

The chapter I linked to was just a summary of ideas put forth by others - though admittedly written well.

My problem with esr is more his arrogance and conceit than politics (which I also find distasteful)


I'd say they are incomparable, but I hope it helped to get my point across :)

I've read and liked his book, btw, but I had to ignore all his stupid Windows-bashing where he attributes every bad practice to the Windows world and every good one - to the Unix world.


This is a good review of the book by Joel Spolsky which also touches on that point:

https://www.joelonsoftware.com/2003/12/14/biculturalism/


Referring to relativity and linking to Einstein's personal web page are surely two different things, no?


Yes, but I don't think this invalidates my analogy


Right, the book stands on its own. Thoughts on the author are irrelevant in the context of the work.


He's kind of crusty about climate change, but other than that he's just a guy with some strong opinions. I guess that scares some folks enough to require an apology.


Telling how this very reasonable, “maybe things aren’t completely black and white” comment got downvoted.


Not saying I agree with either sentiment, but there's a delicious irony in this comment in that you're reading into votes as if they're pure expressions of support or not for an issue that's not black and white... Even though the expressions are just projections of a spectrum of thoughts through a binary voting system!


Ah yes, the old insight. Fred Brooks: "Show me your [code] and conceal your [data structures], and I shall continue to be mystified. Show me your [data structures], and I won't usually need your [code]; it'll be obvious."


Yes, and here's one by Rob Pike, "Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming." --- https://www.lysator.liu.se/c/pikestyle.html

I think I found all these quotes on SQLite's website, https://www.sqlite.org/appfileformat.html


> I'm a huge proponent of designing your code around the data

That same point was made to a class I was in by a university professor, only he didn't word it like that. He was discussing design methodologies and tools (I guess things like UML), and his comment was that he "preferred Jackson, because it revolved around the data structures, and they changed less than the external requirements". (No, I have no idea what Jackson is either.)

Over the years I have come to appreciate the core truth in that statement: data structures do indeed evolve more slowly than APIs, far more slowly in fact. I have no doubt the key to git's success was that, after years of dealing with VCS systems he hated, Linus had an epiphany and came up with a fast and efficient data structure that captured exactly the things he cared about, but left him the freedom to change the things that didn't matter (like how to store the diffs). Meanwhile others (hg, I'm looking at you) focused on the use cases and the "API" (the command-line interface in this case). The end result is that git had a bad API, but you could not truly fuck it up because the underlying data structure did a wonderful job of representing a change history. Turns out hg's API wasn't perfect after all, and it has found adapting difficult. Git has had hack upon hack tacked onto the side of its UI, but its data structure still shines through as strong and as simple as ever.
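The data structure in question fits in a few lines. Below is a minimal Python sketch, assuming git's classic SHA-1 object format and ignoring subdirectories, executable bits and commits. A blob's ID is the SHA-1 of a small header plus the file's bytes, and a tree lists names with their blob IDs. Diffs appear nowhere in this model. Delta compression happens later, inside packfiles, which is the kind of "freedom to change the things that didn't matter" described above. The helper names (`git_object_id`, `blob_id`, `tree_id`) are hypothetical.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    # Git's SHA-1 object format hashes "<kind> <size>\0" followed by the body.
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def blob_id(content: bytes) -> str:
    # A blob is only file contents: no name, no path, no history.
    return git_object_id("blob", content)

def tree_id(files: dict) -> str:
    # Simplified: a flat directory of regular files (mode 100644), no subdirectories.
    # Each entry is "<mode> <name>\0" followed by the 20-byte binary blob ID,
    # with entries sorted by name.
    body = b""
    for name in sorted(files):
        body += b"100644 " + name.encode() + b"\0" + bytes.fromhex(blob_id(files[name]))
    return git_object_id("tree", body)

v1 = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
v2 = {**v1, "main.c": b"int main(void) { return 1; }\n"}  # edit one file

print(blob_id(b""))                                    # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(blob_id(v1["README"]) == blob_id(v2["README"]))  # True: unchanged file, same object
print(tree_id(v1) == tree_id(v2))                      # False: the tree gets a new ID
```

IDs depend only on content, so an unchanged file is automatically shared between versions. Only the objects on the path to the change (here the edited blob and its tree) are new.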

Data structures evolving much more slowly than APIs does indeed give them the big advantage of being a solid rock base for future design decisions. However, they also have a big downside: if you decide the data structure is wrong, it changes everything, including the APIs. Tacking a new function onto an API, on the other hand, is drop-dead easy and usually backwards compatible. Linus's git was wildly successful only because he did something remarkably rare: he got it right on the first attempt.



My memory is a little fuzzy, but I think Jackson was/is a Java serializer/deserializer for JSON (and, with an extension module, XML) that operates on POJOs (potentially with annotations). You define your data structures as Java objects, and Jackson converts them for you. As opposed to other approaches where you define some schema outside Java (maybe an XSD) and have your classes auto-generated for you.


"It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures." This quote is from Alan Perlis' Epigrams on Programming (1982).


Which is also a base design principle of Clojure. There are a few persistent data structures at the core, a sequence abstraction, and lots of functions that work on them.
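A rough Python analogue of the Perlis/Clojure idea follows. It is an illustration only, not Clojure code. Records stay plain maps and the small generic functions are written once, so every new kind of record gets all of them for free. The records and helper names (`where`, `count_by`, `pluck`) are hypothetical.

```python
from collections import Counter

# Everything is a list of plain dicts; no bespoke classes.
commits = [
    {"id": "a1", "author": "alice", "files": 3},
    {"id": "b2", "author": "bob", "files": 1},
    {"id": "c3", "author": "alice", "files": 5},
]
issues = [
    {"id": 7, "author": "bob", "state": "open"},
    {"id": 8, "author": "carol", "state": "closed"},
    {"id": 9, "author": "bob", "state": "open"},
]

# Hypothetical generic helpers: written once, usable on any list of dicts.
def where(records, **conditions):
    """Records whose fields equal every given keyword value."""
    return [r for r in records if all(r.get(k) == v for k, v in conditions.items())]

def count_by(records, field):
    """How many records share each value of `field`."""
    return Counter(r[field] for r in records)

def pluck(records, field):
    """The value of `field` from every record, in order."""
    return [r[field] for r in records]

print(count_by(commits, "author"))               # Counter({'alice': 2, 'bob': 1})
print(count_by(issues, "author"))                # Counter({'bob': 2, 'carol': 1})
print(pluck(where(issues, state="open"), "id"))  # [7, 9]
print(sum(pluck(commits, "files")))              # 9
```

The same three helpers serve both commits and issues. With a bespoke class per entity, each would need its own query methods.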


So you would have one data structure with 10 pointers to those 10 data structures you need and 10 times the functions?

I'd rather split up independent structures.


Having a smaller number of data structures makes the whole graph of code more composable. Creating a bespoke data structure for each of 10 different elements of a problem means writing quite a lot of code just to orchestrate each individual structure, mostly due to creating custom APIs for accessing what is simple data under the hood.

There’s a reason why equivalent Clojure code is much much shorter than comparable programs in other languages.


> "I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

It should be noted that the basic premise of Domain-Driven Design is that the basis of any software project is the data structure that models the problem domain, and thus the architecture of any software project starts by identifying that data structure. Once the data structure is identified then the remaining work consists of implementing operations to transform and/or CRUD that data structure.


DDD is about modeling which is data and behaviour.


> DDD is about modeling which is data and behaviour.

It really isn't. DDD is all about the domain model, not only how to synthesize the data structure that represents the problem domain (gather info from domain experts) but also how to design applications around it.


I remember that Richard Hipp (SQLite creator) once cited a bunch of similar quotes, including the one from Linus.

https://www.percona.com/sites/default/files/hipp%20sqlite%20...

"Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious." -- Fred Brooks, The Mythical Man-Month, pp. 102-103


>> Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

Tsk. Now I'll never know if I'm a good programmer. I do all my programming in Prolog and, in Prolog, data is code and code is data.


"The more I code the more I observe getting a system right is all about getting the data structures right. And unfortunately that means I spend a lot of time reworking data structures without altering (or improving) functionality..." https://devlog.at/d/dYGHXwDinpu


These are ideas echoed by some of the top people in the game development community as well. There is a nice book about these kinds of ideas:

https://www.amazon.com/dp/1916478700


Can anyone point to some good resources that teach how to code around data and not the other way round?


The Art of Unix Programming, by Eric Raymond, particularly chapter 9: http://www.catb.org/~esr/writings/taoup/html/generationchapt...


Switch to a language that emphasises functional programming (F#, Clojure, OCaml, etc) and it will happen naturally.


Surprisingly, there is no one book AFAIK, but the techniques are spread across many books:

The first thing is to understand FSMs and state transition tables using a simple two-dimensional array. Implementing an FSM with while/if-then-else code vs. transition table dispatch will really drive home the idea behind data-driven programming. There is a nice explanation in Expert C Programming: Deep C Secrets.
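The table-vs-branches contrast is easiest to see side by side. Below is a minimal Python sketch of a two-state machine that counts words. The book recommended above uses C, and Python is used here only for brevity. The state and input-class names are hypothetical. All the behaviour lives in the 2D `TRANSITIONS` table, and the second function hand-codes the same machine as nested if/else for comparison.

```python
OUT, IN = 0, 1        # states: outside a word / inside a word
SPACE, OTHER = 0, 1   # input classes

# TRANSITIONS[state][input_class] -> (next_state, starts_a_new_word)
TRANSITIONS = [
    #  on SPACE      on OTHER
    [(OUT, False), (IN, True)],    # from OUT
    [(OUT, False), (IN, False)],   # from IN
]

def classify(ch: str) -> int:
    return SPACE if ch.isspace() else OTHER

def count_words_table(text: str) -> int:
    """Table dispatch: the loop never changes; behaviour lives in TRANSITIONS."""
    state, words = OUT, 0
    for ch in text:
        state, new_word = TRANSITIONS[state][classify(ch)]
        words += new_word          # True counts as 1
    return words

def count_words_branches(text: str) -> int:
    """The same machine hand-coded as nested if/else."""
    state, words = OUT, 0
    for ch in text:
        if state == OUT:
            if not ch.isspace():
                state, words = IN, words + 1
        elif ch.isspace():
            state = OUT
    return words

print(count_words_table("  the data IS the subject "))  # 5
```

In the table version, adding a state or input class means adding a row or column of data rather than another layer of nested branches.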

SICP has a detailed chapter on data-driven programming.

An old text by Standish: Data Structure Techniques.

Also, I remember seeing a lot of neat table-based, data-driven code in old data processing books using COBOL. Unfortunately I can't remember their names now. Just browse some of the old COBOL books in the library.


Jonathan Blow is also talking about data-oriented programming as a basis for designing his Jai programming language.


Data-oriented programming in games is a completely different concept though. It’s about designing your data to be able to be operated on very efficiently with modern computer hardware. It speeds up a lot of the number crunching that goes on in extremely fast game loops.

The Linus comment is about designing your programs around a data representation that efficiently models your given problem.


If I could say just one thing about programming to my kids, I would quote this.


I sometimes like to explain things the other way around in terms of functional programming immutability being like version control for program state.

I rarely use functional programming but I certainly see its appeal for certain things.

I think the concept of immutability in functional programming confuses people. It really clicked for me when I stopped thinking of it in terms of things not being able to change and started instead to think of it in terms of each version of things having different names, somewhat like commits in version control.

Functional programming makes explicit, not only which variables you are accessing, but which version of it.

It may seem like you are copying variables every time you want to modify them, but really you are just giving different mutations different names. This doesn't mean things are actually copied in memory. Like git, the compiler doesn't need to make full copies for every version. If it sees that you are not going to reference a particular mutation again, it might just physically overwrite it with the next one. In the background, "var a=i, var b=a+j" might compile to something like "var b = i; b+=j".
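Here is a minimal sketch of "different versions, different names, no full copies", written in Python for illustration. It assumes a persistent (immutable) binary search tree with path copying. `insert` returns a new version and copies only the nodes on the path from the root to the changed key, so the old and new versions share everything else. This is the same trick the top-level comment attributes to git's trees. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    key: str
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(node: Optional[Node], key: str, value: str) -> Node:
    """Return a new tree; only nodes on the root-to-key path are copied."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, insert(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, insert(node.right, key, value))
    return Node(key, value, node.left, node.right)  # same key: new value

def lookup(node: Optional[Node], key: str) -> Optional[str]:
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None

v1 = None
for k in ["m", "c", "t", "a", "e"]:
    v1 = insert(v1, k, k.upper())
v2 = insert(v1, "e", "E2")   # a new "version" with its own name

print(lookup(v1, "e"), lookup(v2, "e"))  # E E2
print(v2.right is v1.right)              # True: untouched subtree is shared
```

Both versions remain fully usable. The only new nodes are `m`, `c` and `e`, the path from the root to the changed key.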


I encountered a large company where they had a private git server for their engineering teams.

Over time someone discovered that the number of repositories and the amount of usage were much greater than expected. What they found was that non-engineering folks who had contact with engineering had asked how they manage their code, what branches were, etc. Some friendly engineering teams had explained. Then some capable non-engineering employees discovered that the server was open to anyone with a login (as far as creating and managing your own repositories) and had started using it to manage their own files.

The unexpected users mostly used it on a per-user basis (not as a team), as the terminology tripped up / slowed down a lot of non-engineering folks, but individuals really liked it.

IT panicked and wanted to lock it down but because engineering owned it ... they just didn't care / nothing was done. They were a cool team.


Unfortunately git does not handle binary files elegantly (unless you use git-lfs). You can inflate storage rapidly by, say, editing a 10 MB zip file a few times. I've had to GC more than one repo where someone accidentally added an innocuous binary file, and the next thing you know the repo has exceeded 2 GB of storage space.
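Here is a minimal sketch of why storage balloons. It assumes git's loose-object layout: a zlib-compressed `blob <size>\0` header plus the content, keyed by SHA-1. It also assumes that each re-save of an already-compressed file produces bytes that look random and share little with the previous version, which is modeled here with fresh random data. `git gc` can later delta-compress objects into a packfile, but deltas only help when versions share long byte runs, and that assumption rules them out. `store_loose` is a hypothetical helper that keeps objects in a dict instead of on disk.

```python
import hashlib
import random
import zlib

def store_loose(objects: dict, content: bytes) -> str:
    """Hypothetical helper mimicking a git loose object: zlib-compress
    b"blob <size>\\0" + content and key it by the SHA-1 of the uncompressed bytes."""
    raw = f"blob {len(content)}".encode("ascii") + b"\0" + content
    oid = hashlib.sha1(raw).hexdigest()
    objects[oid] = zlib.compress(raw)
    return oid

SIZE = 1_000_000          # scaled-down stand-in for the "10 MB zip file"
rng = random.Random(42)   # seeded so the run is repeatable
objects = {}

for _ in range(3):
    # Assumption: each re-save of a compressed archive looks like fresh random bytes.
    store_loose(objects, rng.randbytes(SIZE))

stored = sum(len(obj) for obj in objects.values())
print(len(objects), stored)   # 3 objects; zlib cannot shrink random bytes, so each is still ~SIZE
```

Three "small edits" cost roughly three full copies, and every later edit adds another.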


> I've had to GC more than one repo where someone accidentally added an innocuous binary file

My god, the things I've seen in repos. vim .swp files. Project documentation kept as Word documents and Excel spreadsheets. Stray core dumps and error logs in random subdirectories, dated to when the repo was still CVS. Binary snapshots of database tables. But the most impressive by far was a repo where someone had managed to commit and push the entirety of their My Documents folder, weighing in at 2.4GB.


If you crawl a package repository such as PyPI, you will find a lot of that same stuff in packages as well. Which is even weirder because those are created from a setup.py which does not have a `git add .` equivalent. People are not good at building clean archives.


I found git-lfs to be a huge pain, since the "public" server implementations are basically GitHub and GitLab. We have plain git repos (via NFS/ssh plus Bugzilla hooks), so we either have to use some random GitHub user's virtually unmaintained implementation or roll our own, and neither is a great option. On the other hand, we put our custom-built GCCs plus sources into a git repo, and trust me, having an 8GB repo (after a few version bumps) is really annoying, so having git-lfs would be plain amazing.

(I checked this out the day before I left for vacation, so to be fair, my research might have not been thorough enough to find each and every implementation - but I think it is comprehensive enough to make some preliminary judgement)


Did you try the lfs-test-server?

https://github.com/git-lfs/lfs-test-server


We've got Bitbucket's LFS pointed at our Artifactory server. Not the cleanest solution, but we haven't had any major problems in over a year.


External hosting is not an option for us ;) The gccs are the biggest pain point, but customer projects plus binaries are the other - and those are just too sensitive to be pushed into someone's cloud.


Both our Bitbucket and Artifactory instances are internally hosted.


Luckily storage is getting cheaper. I do wish someone hadn't checked a custom-built nginx binary into ours though.


Even with infinite storage, having lots of blobs can make a repo unmanageable. In order to get an 8GB repo onto GitHub, I had to make temporary branches and push them incrementally.

I highly recommend git-annex. It is like git-lfs: a bit less mature, but much more powerful. It's especially good if you don't want to set up a centralized LFS server.


Yea, I recommend git-annex too.


It's not just a question of storage: as the size of the repository increases, git starts having a hard time coping.

Binary files don't cause the issue, but because binary files don't deltify / pack well, significant use of them makes repos degenerate much faster.


I heard of a web consultancy around 2006 where the Subversion repository history contained a full copy of the Rolling Stones discography in MP3.


The real genius of git is the clear, concise user interface for that data structure.

(for those with a severe sugar hangover, I'm being a little bit sarcastic)


Quoting myself from yesterday:

https://news.ycombinator.com/threads?id=miohtama

"Git is what a version control UX would look like if it were written by kernel developers who only knew Perl and C"

Back in a day we had Subversion, Mercurial, Bazaar, some others. I used all of these. All of them were more coherent than Git. However they were slower - but not much - and they were not used by the most popular software project in the world. Then GitHub popularized git, and GitHub became well funded enough to take over the software development world.

Bitbucket, now Atlassian, started as a host for Mercurial repos. Bazaar was the DVCS for Ubuntu, developed by Ubuntu folks.

Will we see another DVCS ever again? I hope yes. Now all software developers with less than 5 years of experience are using Google and GitHub as the user interface for Git. Git's cognitive burden is terrible and can be solved. However Git authors themselves are not prioritising this.

Software development industry could gain a lot of productivity in the form of more sane de facto version control system, with saner defaults and better discoverability.


> Will we see another DVCS ever again? I hope yes.

As just the first example off the top of my head, Pijul [1] is "new" compared to the others you listed. There will likely always be folks exploring alternatives.

> Now all software developers with less than 5 years of experience are using Google and GitHub as the user interface for Git. Git's cognitive burden is terrible and can be solved. However Git authors themselves are not prioritising this.

I have the opposite impression: that the Git team is finally getting serious about the UX. That might be all the fresh blood (thanks at least partly to Microsoft moving some of their UX teams off of proprietary VCSes to converge on git), or it might be that Git's internals are now stable enough that the Git team feels it is time to focus on UX (returning to UX once all the important stuff was done was always a stated goal).

Clear example: the biggest and first announcement in the most recent Git release notes was about the split of `git checkout` into `git switch` and `git restore`. That's a huge UX change intended to make a lot of people's lives easier. It simplifies what is often people's most common, but sometimes most conceptually confusing, git command, given the variety of things that `git checkout` does.

The Git UX is better today than it was when it first "beat" Mercurial in the marketplace, and there seems to be at least some interests among git contributors to make it better.

[1] https://pijul.org/


> However they were slower - but not much

I don't think we should underestimate how much performance differences can color opinions here, especially for something like CLI tools that are used all the time. Little cuts add up. At work I use both git and bazaar, and bazaar's sluggishness makes me tend to avoid it when possible. I recall Mercurial recently announced an attempt to rewrite core(?) parts in Rust, because python was just not performant enough.


Related to this here is some discussion from Facebook, Microsoft, and why Facebook is using Mercurial over Git:

https://news.ycombinator.com/item?id=16565299


The speed of branching in git, compared to subversion, was a huge part of convincing me to move. Not an entirely "fair" comparison given that subversion is a centralized VCS, but speed is very important.
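Here is a toy model of why branching is cheap. It is hypothetical and not git's actual code. Commits are immutable records linked to their parents, and a branch is nothing more than a name pointing at a commit ID. In real git a branch is likewise a ref: a small file under `.git/refs/heads/` (or an entry in `packed-refs`) holding a commit ID. So creating one costs the same however large the history is.

```python
import hashlib

class Repo:
    """Toy model (hypothetical): immutable commits plus movable name -> commit refs."""

    def __init__(self):
        self.commits = {}    # commit ID -> (parent ID or None, message)
        self.refs = {}       # branch name -> commit ID
        self.head = "main"   # name of the checked-out branch

    def commit(self, message):
        parent = self.refs.get(self.head)
        # Toy ID: real git hashes a commit object (tree, parents, author, committer, message).
        cid = hashlib.sha1(f"{parent}:{message}".encode()).hexdigest()
        self.commits[cid] = (parent, message)
        self.refs[self.head] = cid      # only the current branch moves
        return cid

    def branch(self, name):
        # Constant time: copy one ID; no files or history are duplicated.
        self.refs[name] = self.refs[self.head]

    def switch(self, name):
        self.head = name

    def log(self, name):
        """Messages reachable from a branch, newest first."""
        messages, cid = [], self.refs.get(name)
        while cid is not None:
            parent, message = self.commits[cid]
            messages.append(message)
            cid = parent
        return messages

repo = Repo()
repo.commit("initial")
repo.commit("add parser")
repo.branch("feature")
repo.switch("feature")
repo.commit("try new syntax")
print(repo.log("feature"))  # ['try new syntax', 'add parser', 'initial']
print(repo.log("main"))     # ['add parser', 'initial']
```

A branch only diverges when a new commit is added on it. Until then, both names point at the same shared history.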


Isn't subversion's take on branching, as well as tagging, actually copying/duplicating whole directory trees?


This is the interface you're presented with, but it's actually a "cheap copy" underneath. So if you write "svn cp https://svn.myserver.com/trunk https://svn.myserver.com/branches/foo" that takes about 1-2 seconds in my experience (no matter how many files, how long the history, or how many binary files you have, etc.).


Likewise, Git has been steadily improving on the user interface front for years. It's much better than 1.0 was, but it's still not the easiest DVCS to learn.

In my experience, saving a little on runtime doesn't make up for having to crack open the manual even once. UI is a big cut.

From an implementation POV, it's also generally easier to rewrite core parts in a lower-level language, than it is to redesign a (scriptable, deployed) UI.


> From an implementation POV, it's also generally easier to rewrite core parts in a lower-level language, than it is to redesign a (scriptable, deployed) UI.

But only if the data structure is simple and works well for the problem domain. Bazaar (a DVCS from Ubuntu; I mean bzr, not baz) had a much simpler and consistent UI, but it had several revisions to the data structures, each one quite painful, and it was slow; they were planning the rewrite to a faster language but never got to it. (Mercurial also used Python and wasn't remotely as slow as bzr - the data structures matter more than the language).


>In my experience, saving a little on runtime doesn't make up for having to crack open the manual even once.

Please don't ask me to give up features so you don't have to read the documentation. It is the most basic step of being a good citizen in a software ecosystem.


> In my experience, saving a little on runtime doesn't make up for having to crack open the manual even once. UI is a big cut.

Maybe, but I used to work in a multi-GB hg repo, and I would have given up any amount of manual cracking to get git's speed. Generally you only open the manual a few times, but you can sync many times a day. I'd give a lot to get big speedups in daily operations for something I use professionally.


At my current place of employment, I don't find git to be super performant. We do have an ugly monolith of a repo though.


Unless you are hitting large binary checkins and not utilizing LFS I can't imagine a scenario where another current RCS would perform better.


"Large binary checkins" is not the actual issue. Git degrades as repository size increases. Large binary checkins make it much easier / faster to reach this situation, but you can also reach it just fine with regular text-based repository if they're big and have a significant history (long-lived and many contributors).


Unless you mean another DVCS, P4 can run circles around git on large repos (mostly on account of not conceptually trying to copy the entire state of the entire repo at every commit).


Run a reconcile offline changes on an actually large repo and come back and tell me that again. ;)


> However they were slower - but not much

CVS was much much much slower; multiple branch handling was horrible until ~2004 (and even on a single branch you did not have atomic commits). Also, no disconnected operation.

SVN was only a little slower than git, but didn't have disconnected operation, and horrible merge handling until even later (2007 or 2008, I think)

Bazaar 2, while comparable in features at the time, was dead slow compared to git. It also suffered from bazaar1 (branched from arch=tla) being incompatible with bazaar2, and from an overall confusing situation.

Mercurial and Git were a toss-up. Git was faster and had Linus aura, Mercurial had better UI and Windows support. But all the early adopters were on Unix, and thus the Linus aura played a much bigger part than the Win32 support.

Github became externally well funded after the war was over. But it was self well funded, because git was more popular (in part because github made it so ...)

Really, I think the crux of the matter is that Git's underlying data model is really simple, and the early adopters were fine with UX ... mostly because those adopters were Perl and C people. So the UX was not a factor, but speed and Linus aura were.


My experience was very different. SVN was far slower than git. It was slower for me to checkout my company's SVN repo at head than to use git-svn to clone the entire SVN history locally. And from then on, most git operations were effectively instant, save for pushing and pulling to SVN.

The killer feature though was that git didn't put my data at risk while I worked. With the normal SVN workflow, your working directory was the only copy of your changes. And when you sync'd with upstream it would modify the code in your working directory with merge information. Better hope that you get that merge right, because there's no second chances. Your original working directory state is gone forever, and it's up to you to recreate it with the pieces SVN hands you.


I have seen in the wild, at a previous job, a repo with over 200m commits, and also, in the same repo, single commits with diffs of over 500m changed lines. Git on modest hardware would get through it: slowly, but eventually.


A very good insight, sir! Your memory serves better than mine.

However, I believe that in the long run Hg caught up in speed, and Bazaar was getting a lot better as well.

SVN merge was a nightmare. People avoided doing work that would result in a merge, as it hurt to get it executed nicely.


My experience was that SVN was a lot slower than CVS, from using the Apache and FreeBSD repos from the UK. The chatty protocol suffered a lot from transatlantic round trip times.


Fossil. It's very fast, works well on low bandwidth connections, much cleaner interface, highly customizable, very easy to setup and self host. And the repo is stored in a SQLite database, so it is very easy to backup and explore w/ SQL. It's a shame it isn't more widely used.


I love much about Fossil and used it a lot several years ago, but my workflow and mental model have since come to rely heavily on interactive rebase, a philosophy that Fossil abhors.



I don't understand why people aren't using Fossil[1]. It's still simple, while maintaining a consistent and sensible user interface. It might not be as flexible, but the repos are just SQLite databases, and SQLite can be called from almost any language, so there's huge potential for custom tools that solve a specific use case. Its main advantage, though, is that it's about much more than just code. Issues (called tickets), wikis or even forums can be part of your repo. That means there's absolutely no vendor lock-in. In fact, you can host your repos by just SCPing a repo file to a public server. You can also collaborate on issues offline, etc. It's written by the SQLite guy, so it's highly reliable and well documented, down to technical details like file formats. It's designed so that repos can last for hundreds of years. The C code is also of very high quality.


> I don't understand why people aren't using Fossil

For the same reason that BD won over HD-DVD: «Greater capacity tends to be preferred to better UX», except in this case it's performance rather than capacity.


I'm going to say this again and take the downvotes but it's comments like this that generally come from people who don't get git.

git is not the same as those other pieces of software mentioned.

git's default workflow encourages lots of parallel work and making tons of branches (which, because of bad naming, are confusing: git branches are not what other software calls branches).

it's a fundamental difference, and it has increased my productivity and changed my work style for the better in ways that would never have happened with CVS, svn, p4, hg, etc., all of which I used in the past for large projects.

If you're using git and your mental model is still one of those other systems you're doing it wrong or rather you still don't get it and are missing out.

I'm not suggesting the UX couldn't be better but when you finally get it you'll at least understand what it's doing and why the UXs for those other systems are not sufficient.


> Back in a day we had Subversion, Mercurial, Bazaar, some others.

I don’t agree bazaar was a UX panacea over git, and it was not just “not by much” slower. Subversion was a piece of shit full stop (especially if you had the misfortune of using the original bdb impl), bested in this regard only by VSS. I think slower “not by much” is the understatement of the century for a repo of any substantial size for all but mercurial on your list.

You don’t even mention perforce, leading me to think most of your experience is skewed by the niche of small open source projects.

Mercurial was a contender... great windows support too. I think it was less kernel that killed it and more github.


Bitbucket started about the same exact time as GitHub. It's not necessarily a given that Mercurial lost because of GitHub.

I think it was perceived performance that led to git besting Mercurial. The Linux kernel team certainly contributed to that drama, including the usual "C is faster than Python" one-upmanship. This was especially funny because, at the time, most of git was a duct-taped assortment of nearly as much bash, perl, awk and sed script as C code.


>Software development industry could gain a lot of productivity in the form of more sane de facto version control system, with saner defaults and better discoverability.

So write it.

I'm sorry to be so dismissive but it seems notable that those who like git get along with using it while those who complain about it just throw peanuts from the gallery. If it's obvious to you where git's flaws lie, it should be easy to write an alternative. If saner defaults and better discoverability are all you need, you don't even have to change the underlying structure, meaning you can just write a wrapper which will be found by all the competent developers whose productivity is so damaged that they do what they do when they encounter any problem and search the internet for a solution.

It seems notable this hasn't happened.


About a year ago I dropped into a place that was still using SVN. Now they're switching to Git. This experience has really shown me how much SVN just gets out of the way compared to git-- much less I had to think about when using it.


> However they were slower - but not much

Depends, we went from CVS to Git and nightly jobs tagging the repository went from taking hours to being almost instant.


What don’t you like about git?


7000 votes on https://stackoverflow.com/questions/4114095/how-do-i-revert-...

If one cannot figure out one of the most common use cases of a version control system without Googling a StackOverflow answer, then we have a problem somewhere.


Reading the answers to that Stack Overflow question provides great insight into why git is so successful. "One of the most common use case" is actually several closely related use cases, and git has one-line commands to cleanly handle all of them.

I will say from experience that it's not hard to use git productively with a bit of self-study and only a few of the most common commands. You still have to understand what those commands actually do, though.


This person didn't know (or at least didn't know how to say) which of the several "most common use cases" they wanted to actually accomplish. I think most of the value of this question comes from the distinctions the top voted answers make between "temporarily navigate to", "delete local progress", "publish the reverse of the published changes"; all three of these are very common operations. The actual commands git uses to accomplish these aren't important, and this question should be popular in any distributed version control system. It doesn't matter how much sense the names of your commands make, someone starting out won't know that these three things can even be accomplished.
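The three distinctions can be sketched with a toy model. It is hypothetical, written in Python for illustration, and not how git stores anything. Each commit is a full snapshot plus a parent link, and a branch is just the ID of its tip. The methods map roughly onto git commands:

- `files` is like looking at an old commit with `git checkout <commit>`.
- `reset_to` is like `git reset --hard <commit>`.
- `revert` is like `git revert <commit>`.

The sketch ignores conflicts, the index and the working tree.

```python
from typing import Dict, List, Optional, Tuple

Snapshot = Dict[str, str]   # path -> file contents

class Branch:
    """Toy model (hypothetical): full-snapshot commits with parent links;
    the branch is just the ID of its tip commit."""

    def __init__(self):
        self.commits: Dict[int, Tuple[Optional[int], Snapshot]] = {}
        self.tip: Optional[int] = None

    def commit(self, snapshot: Snapshot) -> int:
        cid = len(self.commits)
        self.commits[cid] = (self.tip, dict(snapshot))
        self.tip = cid
        return cid

    def files(self, cid: int) -> Snapshot:
        # "Temporarily go back": read an old snapshot; the branch does not move.
        return dict(self.commits[cid][1])

    def reset_to(self, cid: int) -> None:
        # "Throw away later work": move the tip back. Later commits still
        # exist in self.commits but are no longer reachable from the branch.
        self.tip = cid

    def revert(self, cid: int) -> int:
        # "Undo a published change": add a NEW commit that reverses cid's edits,
        # so shared history is never rewritten. Conflicts are ignored here.
        parent, after = self.commits[cid]
        before = self.commits[parent][1] if parent is not None else {}
        current = self.files(self.tip)
        for path in before.keys() | after.keys():
            if before.get(path) != after.get(path):
                if path in before:
                    current[path] = before[path]
                else:
                    current.pop(path, None)
        return self.commit(current)

    def log(self) -> List[int]:
        ids, cid = [], self.tip
        while cid is not None:
            ids.append(cid)
            cid = self.commits[cid][0]
        return ids

b = Branch()
c0 = b.commit({"a.txt": "v1"})
c1 = b.commit({"a.txt": "v2", "b.txt": "new"})
c2 = b.commit({"a.txt": "v2", "b.txt": "new", "c.txt": "x"})

old_view = b.files(c0)   # {'a.txt': 'v1'}; b.log() is still [2, 1, 0]
r = b.revert(c1)         # new commit 3: a.txt back to v1, b.txt removed
reverted = b.files(r)    # {'a.txt': 'v1', 'c.txt': 'x'}
b.reset_to(c0)           # branch now ends at c0
print(b.log())           # [0]
```

Only `revert` is safe on a branch other people have already pulled, because it adds history instead of moving the branch pointer backwards.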


To be fair, 'revert' is too vague a term, and the very first sentence of the chosen answer asks what the asker meant. I think the answer is quite clear and concise once the question is clear.


    man git
The problem is people are unwilling to read the documentation. I have little patience for them demanding I change my workflow to accommodate their sloth.

Fortunately, I don't have to worry because the overlap of 'people who don't RTFM' and 'people who are capable of articulating how they want to change git' have so far failed to write a wrapper that's capable of manipulating git trees without frustrating everyone else on the same repo.

And of course they can't: version control[1] is not a trivial problem. So I see no reason for us to demand that someone knows how to do it without studying when we don't expect the same for other auxiliary parts of software development such as build systems or containerisation or documentation.

[1] As opposed to the backup system the link wants to use it as: asking better questions is another important step. There's little reason to checkout an older commit as a developer unless you want to change the history, in which case it's important you understand how that will interact with other users of the same branch. If you don't need it to be distributed, you already have diff or cp or rsync or a multitude of other tools to accomplish effective backups.


I am a big fan of git but honestly, if you can't recognize that there are unlikeable things about it, you're suffering from some kind of Stockholm syndrome. Just start with the fact that several of the most common actions / commands are named in ways that are either directly misleading or at the very least severely conflict with the standard use of common version control terms.

(one of my favorite, for example, is that `git checkout` causes silent data loss while every other git command will print out giant errors in that scenario)


You can't checkout if you have tracked changes. If you mean you lose untracked changes, then a) it's unsolvable in the general case unless we all start doing out-of-source builds, so don't have to worry about build artefacts and b) it's already solved by git-worktree, so if you haven't RTFM, adding new features won't matter anyway.


Using it, probably.


The trick to suffering the git user interface is using magit in emacs. Even if you don’t usually use emacs, it’s probably worth installing it and setting it up with a command to start it up straight into magit.

Otherwise I’m hoping for pijul to somehow gain popularity (and a bit of polish) and become mainstream. I guess a motto for it could be “the high quality user interface and semantics of darcs without the exponential time complexity”


There's a few good UIs for git if you don't like its command line; along the lines of magit, I've recently been using fugitive in Vim and it's terrific. For the Mac, there's the free and open source Gitup, and of course there's a host of commercial clients.

But, having said that, I made my peace with the git command line years ago, in part by learning to appreciate aliases:

    co = checkout
    ci = commit
    dt = difftool
    mt = mergetool
    amend = commit --amend
    pfwl = push --force-with-lease
(The first two are my personal hangovers from Subversion.) I also have a "gpsup" shell alias which expands to

    git push --set-upstream origin $(git_current_branch)
The latter is taken from Oh My Zsh -- which actually has dozens of git aliases, most of which I never used. (When I realized "most of which I never used" applied to all of Oh My Zsh for me, I stopped using it, but that's a different post.)

tl;dr: I used to have a serious hate-on for git's command line, but one of its underestimated powers is its tweakability.


You don't even need git aliases for this, I personally use bash aliases for 90% of git use cases. Thus I type gc instead of "git commit", gd instead of "git diff", ga instead of "git add", etc.


I especially like the intuitive order of command line arguments


some intuition you have...


Try gitlab.


Wow, that is a beautiful post, thank you for writing it out that way...it makes me pine for VCS in my job.

Can you or someone else reflect on my file system? I work for the government doing statistical analysis of healthcare data, and there is no VCS where I code, other than how you name the files and where you put them in folders and how you back them up manually.

I am facing a major data-branching event where I'm going from ~40 scripts (R, SQL, STATA) on one dataset, to then three overlapping but different datasets and having ~100 scripts. I just don't know if my brain and wits are up to the task of maintaining these 3 branches with 3 different languages, given all I have is folder/file names and my knowledge reservoir and memory...

I know this is a perfect use case for git, but I've never used it before and no one else in my department uses it. I don't know if I have the time and energy left at this job to implement a new system of VCS AND reproduce my code for 3 different-but-similar projects.

Burnout approaches...


Tell management that your current approach isn't going to work for much longer, and say you have some ideas that might improve the situation.

Get your department to pay for you and ~2 colleagues to go on a git training course for a few days. As well as teaching you how to use git, it'll give you some time with an expert to look at your problem, and give you some relaxation time helping the burnout, and with 3 of you on the course, you'll likely get buy-in for a new setup.

Beware that git isn't a silver bullet. While it solves a bunch of issues, it causes many new ones, especially when you have lots of people who aren't knowledgeable about version control using it. I wish git had better integration with 'regular files', i.e. so that Mary in the marketing department can update a readme file without having to learn a totally new way of working. I wish you could "mount" a git repo as a drive in Windows, and all changes would be auto-committed to a branch as soon as a file is saved, and that branch were auto-merged to master as long as tests pass. Then people without git knowledge can work as before.


> wish you could "mount" a git repo as a drive in Windows, and all changes would be auto-committed to a branch as soon as a file is saved, and that branch were auto-merged to master as long as tests pass. Then people without git knowledge can work as before.

Cool idea for a project



Cool find, didn't know about that.

Does it only check files passing tests? I read quickly and didn't see that


You can use Git on your own without anyone else being affected. It doesn't require a server to add benefit. Learn to work with it and then introduce your coworkers later.


I did exactly this ~4 years ago when I briefly worked at a place that used Subversion, after an acquisition. I wanted to be able to dick around in my own branches, with proper diffing and tracking and all, without updating the server, which appeared to be (more or less) impossible. There was a git-to-SVN bridge (git-svn) I could have used, but considering how easy it was to screw up other people's state in Subversion, it made me nervous. So I just worked in my own local git repo, then copied the files to SVN when I was ready to commit something worth sharing.


It’s possible you can’t install it in the computing environment


Sublime Merge (the GUI git client from the Sublime Text people) is available in a portable version, so it can be run as a .exe from the filesystem or a mountable drive. It comes with its own git binary.

The GUI is stunningly beautiful and functional, and there are more than enough keyboard shortcuts to keep things snappy once you're in the flow. I used to live and die by the terminal, now I am in love with sublime merge.

I used the portable version for a job where I didn't have install rights to the corporate laptop, and it preserved my workflow and kept me sane during my dev work. The portable version can run a little slow, but it's a pretty good solution.


I'm in a similar situation, and the entire Git for Windows setup (including Git Bash, which works beautifully with things like Windows network drives!) can be used without ever needing admin privileges. So I not only have git but also vim and perl and the whole *nix kit I was so sorely missing.

Some truly locked down environments may not allow it but if the poster has other open source tools like R they can probably run .exe files.


git is actually pretty easy to drop into a terrible methodology without too much disruption.

git works by creating its own .git directory wherever you create a new git repository, but doesn't touch the files and directories outside of it until you tell it to.

So you can have a directory of old code and you just cd to it and run 'git init', and now you have a git repository in the same directory. It won't be managing any of the files yet, but it will technically be there.

Because git is just a bunch of extra data in a .git directory, and because git is also built as a distributed VCS, the "make a copy of a directory to back it up" methodology actually works pretty OK with git. Ideally you should be using 'git clone' to copy your directories and 'git pull' to keep them in sync, but if you just Control-C Control-V your source code directory, git will actually be just fine with that, and you can still later use git to sync the changes between those two directories.

I'm not going to put a full git tutorial into this post, about how you add files to the repository and make commits, but I just want to convey that while git has a justifiable reputation for sometimes devolving into arcane incantations -- it's actually low effort to get started and you only need to learn three or five commands to get like 95% of the value from it.

Once you learn those three or five commands, you'll find yourself running 'git init' in nearly every directory you make -- for your random scripts, for your free time coding projects, for your free time creative writing projects -- and you'll even find it easy to use on horrible "27 directory copies of the source code with 14 file renames" projects where none of your teammates use git; you can use git yourself in such cases without adding any real friction, and it still helps you even if your teammates just copy your code directory or send you copies of their code directories.

EDIT: One other note: git can also go away easily if you decide you don't like it. You don't need to run git commands to create, edit, copy or otherwise modify the files in your code base, like you do with some other source control systems, so you can just forget it is there if you are busy and don't want to worry about it, and then later go ahead and add or commit all of your work. If you really don't like it, you just stop running git commands and you're no longer using it: you don't need to 'export' or 'ungitify' your code base. So it's pretty low-risk in that way as well.


Other cool things about git being "just a directory full of files":

- you can put the git directory somewhere other than in your working directory, if you really want to. Or reference a bunch of .git directories in a series of commands without having to change your current directory. Sometimes this is handy (usually for automation or something like that).

- If you're nervous about some command you're about to run—something that might screw up your git tree—just copy the .git directory somewhere else first. You can copy it back to entirely restore your state before the command, no need to figure out how to reverse what you did (assuming it's even possible).
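The copy-the-.git-directory trick is easy to script. A minimal Python sketch with hypothetical helper names; it assumes `.git` is a real directory (not the `.git` file that linked worktrees and submodules use) and that no git command is running while you copy. Restoring brings back history, refs, the index and the stash, but it does not touch your working-tree files, so check `git status` afterwards.

```python
import os
import shutil
import tempfile

def snapshot_git_dir(repo: str) -> str:
    """Copy repo/.git into a fresh temporary directory and return the copy's path."""
    backup = os.path.join(tempfile.mkdtemp(prefix="git-backup-"), ".git")
    # symlinks=True copies symlinks as links instead of following them
    shutil.copytree(os.path.join(repo, ".git"), backup, symlinks=True)
    return backup

def restore_git_dir(repo: str, backup: str) -> None:
    """Replace repo/.git with the snapshot, discarding everything done since."""
    target = os.path.join(repo, ".git")
    shutil.rmtree(target)
    shutil.copytree(backup, target, symlinks=True)
```

Typical use: `backup = snapshot_git_dir(".")`, run the risky command, and call `restore_git_dir(".", backup)` only if it went wrong.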


Wow, thank you for this, it is a gem of a comment. I truly want to implement this and I see a massive potential to improve what I do...but...

My brain is basically overloaded with stress and I'm headed for burnout...only 18 months into this position. I just can't handle the tech stack, the shitty office, the commute, the feelings of being the worst analyst and the worst researcher in every single room I'm in. It is totally wearing me down. Management said new employees can get work from home after 12 months, then at 18 months I asked, and they revoked their verbal agreement and said they'd reconsider their decision if I wrote an article and let someone else be first author on it (unethical).

Outside of my complaints...I'm just not a great worker. I just feel that the whole team and department would be better off without me, that I cannot handle this tech stack and QoL and its frustrations...govt is a very, very restrictive environment and I feel like a round peg being jammed into a square hole. I can't implement most of what these comments suggested because I cannot install anything onto my computing environment...even Python: I have to go through red tape and request special access to use Python instead of R and STATA.

I'm sorry to vent but all of these shortcomings are seriously burning me out.


It's fine to vent; it's half of what the internet is for.

Since the internet is also for acting like you know what you are talking about and offering unsolicited advice, I'll also drop some here. Feel free to ignore it, and I hope your situation gets better, either at your current job or a new one.

I won't speak too much to your work skills, because I don't know you; but feeling like and worrying that you're terrible at your job is a pretty normal experience. You pretty much have to rely on whether other people think you are doing a good job because people in general are garbage at judging their own skill. It's pretty hard to tell the difference between "I think I'm doing poorly and am" and "I think I'm doing poorly and am actually doing fine", without a lot of feedback from people you trust (ideally, your coworkers).

If your coworkers think you're doing fine, well, you can't stop worrying about it, but you'll at least have some evidence against your feelings; if your coworkers think that you're under-performing, they might at least be able to offer some advice on how to do better.

The burnout advice I have to give is in three parts: first, focus on making some small, incremental progress every day; second, avoid the temptation to overwork; third, make sure to invest time in your life outside of work.

The first is both about positive thinking and about developing good work habits. The second is because overworking doesn't usually work (you end up doing less with more time, which is even more depressing than feeling like you aren't getting enough done in 8 hours). The third is because you will feel better and be more resilient if your entire identity isn't invested in your job. It's easier both to avoid burnout and to recover from it when it does happen if your job is only one part of your life.


> I'm just not a great worker.

I sincerely doubt that. You sound like a conscientious employee in an environment not set up for the kind of work you were hired to do. You also sound like you want to leave your job, which can give you leverage. Not that you should threaten to quit, but since you are so unhappy, you are willing to quit. That means you can start saying what kind of computing environment you need. Not want, but need.

Personally, I think that having source control is basic table-stakes when writing code as a part of a job.


I’m sorry to hear that. I’d recommend looking for a new job (if possible), the market is in your favour at the moment (edit: if you live in a big city in Europe or the USA).

Otherwise, another poster commented that a git training course paid for by the company could help (+ give you some relief from burning out).


Now you have hidden subfolders with .git in your source.

And remember git doesn't save directories that are empty.
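Git records files (blobs) and the trees that contain them, and the index only lists files, so an empty directory leaves nothing to commit. The usual workaround is a convention rather than a git feature: put a placeholder file, often named `.gitkeep`, into each empty directory. A minimal Python sketch; the function name is hypothetical:

```python
import os

def add_placeholders(root: str, name: str = ".gitkeep") -> list[str]:
    """Create an empty placeholder file in every empty directory under root.

    The .git directory is skipped. Returns the paths of the files created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        is_empty = not dirnames and not filenames  # decide before pruning .git
        if ".git" in dirnames:
            dirnames.remove(".git")  # don't descend into repository internals
        if is_empty:
            path = os.path.join(dirpath, name)
            open(path, "w").close()
            created.append(path)
    return created
```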


Git being distributed means you can use it without any centralized "master"--your local repository contains the entire history.

And if git seems too difficult to start with, Subversion can also "host" a repository on the file system, in a directory separate from your working directory.


Agreed with this, SVN (short for Subversion) is a good alternative.

I understood and was comfortable with SVN within a few minutes (using the TortoiseGit front-end, which I highly recommend).

I wrestled with git for months and at the end still feel I haven't subdued it properly. I can use it reliably but SVN is just so much friendlier.

So my suggestion is go with SVN + TortoiseGit. SVN is your butler. Git is a hydra that can do so much, once you've tied it down and cut off its thrashing heads.

It's not just me: our whole (small) company moved to it because git burnt too much of our time and mental resources.

Edit: after learning TortoiseGit, learn the SVN command line commands (it's easy), and learn ASAP how to make backups of your repository!


SVN is easy, Git is simple.

Getting started with SVN is very quick, but once you need to peek under the hood, you'll find out it's super complicated inside.

Git is just the other way around: the interface is a mess, but the internals are simple and beautiful. Once you understand four concepts (blobs, trees, commits, refs), the rest falls into place.
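To make the four concepts concrete, here is a minimal Python sketch (an illustration, not git's own code) of how git names objects: it hashes a small header (the kind `blob`, `tree` or `commit`, a space, the payload length in bytes, a NUL byte) followed by the payload, using SHA-1. The helper names and the author string are made up; trees here hold regular files only (mode 100644, no subdirectories), and a ref is modelled as nothing more than a name mapped to a commit ID.

```python
import hashlib

def object_id(kind: str, payload: bytes) -> str:
    """SHA-1 of the header (kind, space, length, NUL) plus the payload."""
    header = f"{kind} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()

def tree_payload(files: dict[str, str]) -> bytes:
    """Tree of regular files only, given as {filename: blob_id}.

    Entries are sorted by name; each is the mode, a space, the name,
    a NUL byte, then the blob ID as 20 raw bytes.
    """
    entries = []
    for name in sorted(files, key=str.encode):
        entries.append(b"100644 " + name.encode() + b"\0" + bytes.fromhex(files[name]))
    return b"".join(entries)

def commit_payload(tree_id: str, parents: list[str], who: str, message: str) -> bytes:
    """Commit text: tree, zero or more parents, author, committer, blank line, message."""
    lines = [f"tree {tree_id}", *(f"parent {p}" for p in parents),
             f"author {who}", f"committer {who}", "", message]
    return ("\n".join(lines) + "\n").encode()

AUTHOR = "A U Thor <author@example.com> 1700000000 +0000"  # hypothetical identity/time
blob_id = object_id("blob", b"hello\n")
tree_id = object_id("tree", tree_payload({"greeting.txt": blob_id}))
commit_id = object_id("commit", commit_payload(tree_id, [], AUTHOR, "first commit"))
refs = {"refs/heads/main": commit_id}  # a ref is just a name pointing at a commit
```

A quick check against a real install: `echo 'test content' | git hash-object --stdin` should print the same ID as `object_id("blob", b"test content\n")`.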

Recommended four page intro to git internals: https://www.chromium.org/developers/fast-intro-to-git-intern...


I'll check your link, thanks.

Could you explain what you mean by svn being super complicated inside? I presume you mean from a user's not a programmer's perspective; I never found it confusing, ever.

It has its flaws (tags are writable, unless that's been cured), but it's really pretty good, and far better than git for a beginner IMO.


I meant that SVN's internal concepts and workings are not simple. It's easy to use in the beginning, but it becomes difficult to even understand what's going on when you get into some kind of a tricky or unusual situation.

In Git, no matter how strange the situation, everything is still blobs, trees, commits, and refs. There are very few concepts used in Git, and they're simple and elegant.

SVN to Git is like WordPress to Jekyll - WordPress is easier to use than Jekyll, but Jekyll is simpler than WordPress.


I'm afraid I've still no idea what you mean. I've had plenty of confusion with git, and none that I can ever recall with SVN.

SVN's concepts are straightforward: commit stuff, branch (branches are COW, so they're efficient), history is immutable unlike git (for better or worse), erm, other stuff. Never got confusing.


I'm not sure what you mean by SVN being super complicated? I've been using and administering it for years and it's just as straightforward as git (if a little easier because centralization is simpler to grok than decentralization).


Did you mean TortoiseSVN? TortoiseGit is a frontend for git, as the name implies, AFAIK it doesn't work with SVN at all.


(cringes)

Yes, I did. Thanks.


I faced a moment like this, where I realised I needed git to survive a big set of changes. (Though I was on SVN before, which was better than nothing but a far cry from git).

However, branching may not be the ideal solution given how you describe your issue. With git branches, we typically don't want to run something, then switch branches, then run something else. I would say branches are primarily for organizing sets of changes over time.

If you have multiple datasets with similarities, what you may need more than git is refactoring and design patterns. To handle the common data in a common way, and then cleanly organize the differences.
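One way to read "handle the common data in a common way": write the shared logic once and move the per-dataset differences into small config records, instead of keeping three copies of each script. A minimal Python sketch; the dataset, table and column names are invented for illustration, and the same idea carries over to R functions or parameterised SQL.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dataset:
    """What differs between datasets (hypothetical fields for illustration)."""
    name: str
    table: str
    patient_id: str
    filters: tuple[str, ...] = ()

# Three overlapping datasets become three small config records, not three code copies.
DATASETS = (
    Dataset("cohort_a", "claims_a", "patient_id"),
    Dataset("cohort_b", "claims_b", "member_id", ("age >= 18",)),
    Dataset("cohort_c", "claims_c", "patient_id", ("region = 'north'",)),
)

def visit_count_sql(ds: Dataset, year: int) -> str:
    """Shared logic, written once; only the config varies.

    Configs are trusted constants here; never interpolate user input into SQL.
    """
    conditions = " AND ".join([f"year = {int(year)}", *ds.filters])
    return (
        f"SELECT {ds.patient_id}, COUNT(*) AS visits FROM {ds.table} "
        f"WHERE {conditions} GROUP BY {ds.patient_id}"
    )
```

A fix to `visit_count_sql` then lands in all three analyses at once, and git only has to track one version of it.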

That said I would still definitely want all scripts in git. It is not that hard to learn, lean on someone you know or email me if you need to.


git for ML projects with data: https://dvc.org

In particular, dvc carefully handles large binary files.


Totally agree with this point, so much that I did a presentation on "Git Data Structure Design" two months ago for Papers We Love San Diego that hammers it home over the course of 50 minutes.

https://www.youtube.com/watch?v=fHSZz_Mx-Uo

To become a Git power user, it is far more beneficial to learn its underlying content-addressable-store data structure rather than explore the bazillion options in its command line interface. It is surprisingly easy to create a repository manually and then to start adding "blob" files to the store!
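Along those lines, a minimal sketch of adding a blob to the store by hand: git keeps each loose object zlib-compressed at `objects/<first two hex digits>/<remaining 38>`, where the content being compressed is the same header-plus-payload that gets hashed. The function names are hypothetical, and this only touches the `objects` directory; pointing `git_dir` at the `.git` of a repository made with `git init` should let `git cat-file -p <id>` show the stored content.

```python
import hashlib
import os
import zlib

def write_loose_blob(git_dir: str, content: bytes) -> str:
    """Store content as a loose blob under git_dir/objects and return its ID."""
    data = f"blob {len(content)}\0".encode() + content
    oid = hashlib.sha1(data).hexdigest()
    # Loose objects live at objects/<first 2 hex digits>/<remaining 38 digits>.
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    if not os.path.exists(path):  # content-addressed: identical content is stored once
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
    return oid

def read_loose_object(git_dir: str, oid: str) -> tuple[str, bytes]:
    """Inflate a loose object and split it into (kind, payload)."""
    with open(os.path.join(git_dir, "objects", oid[:2], oid[2:]), "rb") as f:
        data = zlib.decompress(f.read())
    header, _, payload = data.partition(b"\0")
    kind, size = header.decode().split(" ")
    if int(size) != len(payload):
        raise ValueError(f"corrupt object {oid}: size mismatch")
    return kind, payload
```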


FYI, this is called a Merkle tree. It's also the data structure that blockchains and a couple of other protocols use. They're wonderful and surprisingly easy to implement.
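For anyone who hasn't met one, here is a minimal binary Merkle tree in Python: each parent hashes its children, so the root commits to every leaf, and changing any leaf changes the root. Assumptions: SHA-256, and 0x00/0x01 prefixes that keep leaf hashes distinct from inner-node hashes (the convention Certificate Transparency uses in RFC 6962). Git's own structure is a Merkle DAG with arbitrary fan-out (trees pointing at blobs and subtrees) rather than this binary layout.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash the leaves, then hash adjacent pairs level by level up to one root.

    An unpaired node at the end of a level is carried up unchanged
    (other schemes, such as Bitcoin's, duplicate it instead).
    """
    if not leaves:
        return _h(b"")
    level = [_h(b"\x00" + leaf) for leaf in leaves]  # 0x00 marks a leaf
    while len(level) > 1:
        parents = [_h(b"\x01" + level[i] + level[i + 1])  # 0x01 marks an inner node
                   for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            parents.append(level[-1])
        level = parents
    return level[0]
```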

https://en.wikipedia.org/wiki/Merkle_tree


Wow, 1979. And patented.


Most ideas in CS are old as dirt. There was a flurry of theory advancements mid-century and a lot of work since then has been putting those ideas into practice.


Even back in Renaissance days, with the most advanced mathematics, there was unknown prior art.


I have a theory this is a consequence of Euclid not being taught anymore.


..so right about the time computers were invented, people figured out how to use them and did some basic research.


Patents from 1979 expired a long time ago.


Git is great but all the major drawbacks stem from this design as well? Immutability and a full copy cause public rebase headaches, extremely large repos (before LFS) and the lack of partial checkouts to name a few.

Doesn't it show more of the drawbacks of this functional data structure?


I think rebases cause so much pain that it is better to have tool support such that they are not needed. E.g. it was a revelation to discover that GitHub doesn't care in PRs: just do a merge with master, done.

As for the working directory - yes, there could be more management around that. I'm not sure why the git community went for nested repos / submodules rather than partial checkouts. It's a different question than the data structure of the repo itself, though. Compared to other VCS it still seems miles ahead.

Large repos: It seems one could alleviate that by limiting the pull history (and LFS if needed), right?


It seems like git can also be used as an immutable database. Does anyone here have experience with / comments on using git as a database backend?

EDIT: I found a couple of interesting references for folks who may be curious about this as well. I especially like [2] for its diagrams.

[1] https://stackoverflow.com/questions/20151158/using-git-repos...

[2] https://www.kenneth-truyers.net/2016/10/13/git-nosql-databas...


It seems you would want a hosted git installation like GitHub / GitLab or VSO for that. The main concern to me is that API keys are usually too limiting (or too expensive). E.g. if you don't want to manage local state by having a working directory, you need to make many API calls to make a change (create the objects, create the trees, then create the commit, then change the ref), so it is almost more worthwhile to use the same data structure on a more generic backend like a NoSQL key-value store. I haven't done this myself, though it's been on my wish list for a long time.
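That write sequence (objects, then trees, then the commit, then the ref) maps onto any key-value store. A minimal sketch, assuming a hypothetical `KeyValueGit` class backed by Python dicts in place of a real NoSQL table. Commits are simplified (a single-file tree, no author/committer lines), so they are not byte-compatible with real git commits. The ref update is a compare-and-set, which is what stops two concurrent writers from silently overwriting each other.

```python
import hashlib
from typing import Optional

class KeyValueGit:
    """Hypothetical: git's object model on top of a generic key-value store."""

    def __init__(self) -> None:
        self.objects: dict[str, tuple[str, bytes]] = {}  # object id -> (kind, payload)
        self.refs: dict[str, str] = {}                   # ref name -> commit id

    def put(self, kind: str, payload: bytes) -> str:
        oid = hashlib.sha1(f"{kind} {len(payload)}\0".encode() + payload).hexdigest()
        self.objects.setdefault(oid, (kind, payload))  # same content, same key
        return oid

    def compare_and_set_ref(self, name: str, expected: Optional[str], new: str) -> bool:
        # In a real backend this must be a single atomic conditional write.
        if self.refs.get(name) != expected:
            return False
        self.refs[name] = new
        return True

def commit_file(db: KeyValueGit, ref: str, filename: str, content: bytes, message: str) -> str:
    """Write blob, then tree, then commit, then move the ref (simplified commit format)."""
    parent = db.refs.get(ref)
    blob = db.put("blob", content)
    tree = db.put("tree", f"100644 {filename}\0".encode() + bytes.fromhex(blob))
    header = f"tree {tree}\n" + (f"parent {parent}\n" if parent else "")
    commit = db.put("commit", f"{header}\n{message}\n".encode())
    if not db.compare_and_set_ref(ref, parent, commit):
        raise RuntimeError(f"{ref} moved concurrently; rebuild on the new head and retry")
    return commit
```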

What it also doesn't give you for free is sensible search indexes. I do think though that combining it with a search index could be very powerful.


It is a beautifully simple data structure. I just wish its UI was that simple.


If I had known that from the start, I would have found git much easier to learn!


But git also has (and most use cases demand) the concept of tree deletion.


eh, it's just copy-on-write - an ancient OS technique. You can call it functional if you like but the idea is old and has been applied very generally for decades.
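Copy-on-write over a tree is what makes each new snapshot cheap: only the nodes on the path to the changed entry are copied, and everything else is shared with the previous version. Deletion works the same way. A minimal Python sketch, assuming a directory is modelled as a plain dict mapping names to either a subdirectory (dict) or file contents (bytes); the function names are hypothetical.

```python
def with_file(tree: dict, path: list[str], content: bytes) -> dict:
    """Return a new tree with the file at path set; the old tree is unchanged."""
    head, *rest = path
    new = dict(tree)  # copy this one node; its children are shared, not copied
    if rest:
        new[head] = with_file(tree.get(head, {}), rest, content)
    else:
        new[head] = content
    return new

def without_file(tree: dict, path: list[str]) -> dict:
    """Return a new tree with the file at path removed (KeyError if absent)."""
    head, *rest = path
    new = dict(tree)
    if rest:
        new[head] = without_file(tree[head], rest)
    else:
        del new[head]
    return new
```

Every version stays a complete, usable tree, which is roughly what git does when a new commit points at a new root tree while reusing the IDs of all unchanged subtrees.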


The database version of this is Datomic (closed source). Are there any others?

The browser version of this is DataScript (OSS).



