5% of 666 Python repos had comma typo bugs (inc V8, TensorFlow and PyTorch) (codereviewdoctor.medium.com)
360 points by rikatee on Jan 7, 2022 | 327 comments



The high-level goals of python end up creating these little syntactic landmines that can get even experienced coders. My personal nomination for the worst one of these is that having a comma after a single value often (depending on the surrounding syntax) creates a tuple. It's easy to miss and creates maddening errors where nothing works how you expect.
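
A minimal sketch of the landmine (plain Python, nothing assumed):

    x = 42   # an int
    y = 42,  # the trailing comma quietly makes this the tuple (42,)
    print(type(x), type(y))  # <class 'int'> <class 'tuple'>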

I've moved away from working in Python in general, but I think the #1 feature I want in the core of the language is the ability to make violating type hints an exception[1]. The core team has been slowly integrating type information, but it feels like they have really struggled to articulate a vision about what type information is "for" in the core ecosystem. I think a little more opinion from them would go a long way to ecosystem health.

[1] I know there are libraries that do this, I am not seeking recommendations.
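
For the curious, a toy sketch of what runtime enforcement could look like using only the stdlib (enforce_hints is a hypothetical name; the libraries alluded to above also handle generics, Optional, and the rest, which this deliberately ignores):

    import inspect

    def enforce_hints(func):
        # Toy decorator: raise TypeError when an argument violates its hint.
        # Only handles plain classes, not generics like list[int].
        sig = inspect.signature(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                hint = func.__annotations__.get(name)
                if isinstance(hint, type) and not isinstance(value, hint):
                    raise TypeError(f"{name}={value!r} is not {hint.__name__}")
            return func(*args, **kwargs)
        return wrapper

    @enforce_hints
    def greet(name: str) -> str:
        return "hello " + name

    greet("world")  # fine
    greet(42)       # TypeError at the call site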


A lot of people in this thread are using this to make fun of Python, but the exact same issue exists in languages like C++. Here are some instances I fixed recently:

https://github.com/UWQuickstep/quickstep/pull/9

https://github.com/tensorflow/tensorflow/pull/51578

https://github.com/mono/mono/pull/21197

https://github.com/llvm/llvm-project/pull/335


I didn't understand anyone to be saying that Python is the only language to have this flaw.

Also, I personally don't mind this approach to string concatenation. I think it's a fine compromise between easy formatting and clarity. I was whining about a corner case of tuple construction - which as far as I know is not a feature of any other language.


A better compromise is to insist on this:

    (
       "one",
       (
          "a very very"
          "long long two"
       ),
       "three"
    )
And of course a,b should be syntactically invalid. It must be (a,b)


I think automatic string concatenation and singleton tuples were not introduced according to some high-level goal. They are just historical baggage. Automatic string concatenation comes from C, and the singleton tuple syntax probably just seemed like a good idea at first.

In hindsight, singleton tuples are not common or useful enough to deserve their own syntax. If the way to create them was something like this:

    t = tuple.single("hello")
we'd think it's ugly or inconsistent, but definitely not confusing or bug-prone.


One place where singleton tuples used to be common is with the old "%"-formatting, specifically in the case where there is a single argument and its value might be a tuple:

    x = (1,2,3)
    #print("the value of x is %s" % x)   # breaks if x is a tuple
    print("the value of x is %s" % (x,)) # works even if x is a tuple
There is a readable way to create singleton tuples, without the sneaky trailing comma or a new function like tuple.single:

    tuple(["hello"])
The square brackets can be slightly annoying. I recall writing the following function to omit them:

    def tup(*args):
        return tuple(args)
This basically lets you use the usual tuple syntax, just prefixed with the word "tup". The advantages are that you don't need a trailing comma for singleton tuples, and it's more obvious that a tuple is being created (it can be difficult to distinguish between tuple literals and parentheses used for grouping in a complex expression).

I am reminded of a somewhat similar issue with empty set literals: {1,2} is a set, {1} is a set, but {} is a dict. The way to create empty sets is using set().
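
Concretely:

    s = {1, 2}   # set
    d = {}       # dict, not an empty set
    e = set()    # the only spelling for an empty set
    print(type(s), type(d), type(e))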


I’ve been writing Python professionally full time for 8 years and still occasionally make the trailing-comma-tuple mistake. These days at least I’ll recognize it and be able to find it quickly rather than wasting time. It can be caught with a linter, but not every codebase is readily linted.


The lack of a static type-system is IMO what makes these one-character mistakes very annoying. The compiler can't tell you something is wrong, so you're just left to figure out why things are broken, just to realize it was the smallest of typos.


I love how simple and forgiving Python is for small projects. The "trailing comma creates a tuple" situation comes out of, as far as I can tell, a desire to create maximally convenient syntax in the scenarios where tuples are intended. I think that's great for small code!

I just wish that the core team would take that same zeal for a "pythonic" experience with small code and use it to develop more scaled-up systems for dealing with larger code bases. My idea is to enforce strong pre-conditions on function calls using type hints, but I am sure there are other ways to do it.


For a language that is so incredibly picky about its whitespace rules, it's a little laissez-faire on the string-concatenation/tuple syntax side. I say this as someone who loves Python and uses it extensively.


If you use mypy (as anyone should for any non-hobby Python usage) then Python has one of the strongest type systems available. Optional types, generics, "Any" escape hatches, everything you could want.


mypy is a great project and I agree that basically every project at scale should use it. However, I think you're wrong about the strength of the Python type system and what a good type system can "get" you. I think mypy does an amazing job at static checking, but more powerful type systems go far beyond static checks and into changing how you structure and write code. The newly introduced "structural pattern matching"[1] is an example of the kind of feature that could be usefully expanded by making types a first-class part of the Python runtime.

Again - the dynamism of Python means teams can write amazing extensions to Python (like mypy), but that isn't a replacement for the core team having a plan for how they think typing information should be used at runtime. Their current answer seems to be "nothing," which disappoints me.

[1] https://www.python.org/dev/peps/pep-0622/


> one of the strongest type systems available.

This is simply not true - Python with mypy isn’t even as strong as Typescript, let alone Rust, F#, Haskell and so forth.


Would mypy have caught any of the issues highlighted in the article?


No. Mypy only cares about types; it would only have been caught if something was expecting a tuple of a certain length, otherwise not.

The problem in the article is more related to syntax, not types: both forms are valid syntax with different but still very similar outcomes.

Pylint, on the other hand, can find it with the implicit-str-concat check enabled.
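
For reference, a hedged invocation (my_module.py is a placeholder; flag names per recent Pylint releases):

    pylint --enable=implicit-str-concat my_module.py

Note that by default the check flags concatenation on a single line; Pylint's check-str-concat-over-line-jumps option extends it across line breaks, which is where these bugs usually hide.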


The "trailing comma creates a tuple" bug actually comes from a disconnect between what people think defines a tuple (parenthesis) and what really does (comma). I always put parenthesis around a tuple for clarity.


what's the reason for allowing something like

foo,

to be a tuple? why not make this a syntax error? is there a use for single value tuples?


Yes, single-value tuples are useful, for example, for passing to a function which expects a tuple (because it might expect zero, one or multiple values).

The thing is, your example is the way a tuple should be defined. The parentheses are merely allowed (and ignored). Why? I see this as a mistake of the language creators. But to be fair, it is difficult to make a perfect language (or anything really), and Python is pretty close imho.


C lets me do this, and doesn't say much about it.

  char ch_arr[3][10] = {
      "uno",
      "dos" 
      "tres"
  };


C's type system is neither very static nor outstandingly helpful by today's standards, yes.


On the other hand a good type system doesn't:

    # let ch_arr = [
      "uno";
      "dos"
      "tres";
    ];;
    Error: This expression has type string
           This is not a function; it cannot be applied.


This is not a good type system. It's a bad language where you can invoke functions without parentheses


Why should function arguments be delimited by parentheses? We don’t do that in Bash or Objective-C, for example


to be able to detect the difference between "this expression is a function" and "this expression is the result of the computation of a function"

the parentheses are not for function arguments, are for "invocation".


> to be able to detect the difference between "this expression is a function" and "this expression is the result of the computation of a function"

If function (or method) arguments don't require parentheses, referring to (rather than calling) a function/method usually requires quite distinct syntax, so it's quite easy to tell it apart from a call.

It may not be familiar to people coming from languages where no-parens refers to the function and parens call it, but being clear and distinct and being intuitive to people indoctrinated in contrary syntax are not the same thing.

E.g., in ruby (which has methods but not functions in the strict sense) I can call a method with:

  thing.square # or thing.square()
Or access the corresponding method object with:

  thing.method :square # or, thing.method(:square)
Either of the former options are distinct from both of the latter.


A syntactic feature like this can only be judged in the context of the language. I'm sure there are languages where parentheses around function arguments prevent ambiguity, but there are also languages where it's very unambiguous without them, or the opposite might be true.

For other examples of languages where invocations don't use parentheses for arguments, see OCaml and Haskell. In fact, I'd argue that if they tried to add that feature to those languages (parens around arguments to a function), it'd make things very confusing given the way functions and tuples work.


Can you show an example of how they could be confused?


What does this do?


An array of char arrays. But with the missing comma, it does something similar to what Python does in the linked article. Instead of ("uno", "dos", "tres"), you get ("uno", "dostres").


Fully agreed. If python had a proper static type system, those typos would hardly matter, and you'd have the best of both worlds: Convenient, concise syntax, but still confidence in your code.

I say "had a proper type system", but actually it turns out that it does have something like that: When I use python for anything else than a most tiny script now, I use "mypy"[1] which implements static typing according to some existing Python standard (whether that came about because of mypy or the other way around, I don't know).

It is so, so good to have mypy telling me where I messed up my code instead of receiving a cryptic, weird runtime error, or worse, no error and erratic runtime behavior. Because not knowing that a particular type is unexpected and wrong, values often get passed along and even manipulated until the resulting failure is not very indicative of the actual problem anymore.

[1] http://mypy-lang.org


I’m not clear how a type system would pick up a missing comma in a list of strings, unless the type was specific enough that the contents of the list or the length was encoded in the type.


True, in this particular case that would only help for fixed length strings, which is far from the encompassing case. I was thinking more generally and lost what the actual issue here is about.


I feel like it's been pretty clear from day one that type hints are meant for static analysis with tools like mypy. They're not exclusive to that use and have a lot of other possible applications, but the primary goal has always been static analysis.


I’d rather have a compile-time error than an exception (or both), which in many cases can occur. I know mypy does this; maybe I should alias python="mypy && python"


For those looking to avoid this specific problem, there is a flake8 rule: https://pypi.org/project/flake8-no-implicit-concat.

More broadly, the https://codereview.doctors makers are making the point that their tool caught an easy-to-miss issue that most wouldn't think to add a rule for. A bit of an open question to me how many of those there really are at the language level, but still seems like a neat project.


Also, all but one of the issues they found relate to test code; it seems people are a little less careful there compared to functional code.

Also in terms of mistakes codereviewdoctor twice linked to the same issue in their blog https://github.com/tensorflow/tensorflow/issues/53636 and raised the PR to the wrong project https://github.com/tensorflow/tensorflow/pull/53637 (I guess Tensorflow vendors Keras, easy mistake)


https://github.com/tensorflow/tensorflow/tree/0d8705c82c64df...

    STOP!
    This folder contains the legacy Keras code which is stale and about to be deleted. The current Keras code lives in github/keras-team/keras.

    Please do not use the code from this folder.
Yeah, not the most obvious notice.

The fact they didn't find the same mistake(s) in keras-team/keras (I assume they scanned it; it's one of the most popular Python repos) makes me believe these issues have been fixed/removed in the up-to-date keras repo.


once tensorflow pointed to keras-team this happened

https://github.com/keras-team/keras/issues/15854

resulting in

https://github.com/keras-team/keras/pull/15876


The automatic bug report generation tool produces the following:

"Absent comma results in unwatned string concatenation on line 330"

Bug-ception!


> all but one of the issues they found relate to test code; it seems people are a little less careful there compared to functional code.

It's also a factor that bugs in functional code are more visible, both during development and to users once shipped. So there may have been an equal number of such bugs or more in the non-test code that just didn't remain in the code base for this long.


IME, Black will add parentheses to clearly and explicitly indicate a tuple where there is a trailing comma. I figured this out when I made the trailing comma mistake and wondered why Black kept reformatting my code.


Black rules. I love it that I don't need to have a discussion about style with anyone when Black is used on the project.


TBH I think every language should have a linter like this, and teams should just apply it and never need a discussion about formatting.


The URL in this comment has an incorrect TLD: it should be `doctor` (singular).

https://codereview.doctor/


there is also https://pypi.org/project/flake8-tuple/

typo in the url (or in HN's markup) btw: it's https://codereview.doctor


The removal of implicit string concatenation was proposed for Py3k[1], but was rejected.

[1] https://www.python.org/dev/peps/pep-3126/


The rejection notice seems completely counter intuitive to me. How is adding a plus "harder" compared to removing a foot gun?

> This PEP is rejected. There wasn't enough support in favor, the feature to be removed isn't all that harmful, and there are some use cases that would become harder.


This change would break a lot of legacy code for no good reason

The most common way to split a string across source lines is this concatenation idiom.


> This change would break a lot of legacy code for no good reason

Preventing a bug that occurs in 5% of observed codebases (and anecdotally, happens to me during development all the time) seems like about as good as reasons get.

Swapping a perfectly fine print statement for a function, on the other hand… that’s the breaking change in Py3k that’s never seemed worth it to me.


I've never heard from Guido on this, but I've always felt that he created the print keyword in the very early days just because it was easy, and he always thought the language would remain a niche, small language. But as the popularity of the language increased, the print keyword stuck out like a sore thumb and he just had to fix it.


But wasn't this proposal part of the move to Python 3? Strings were broken left and right anyway.


Right, there was lots of deliberate breakage, _and_ this is purely syntactic, hence the sort of thing 2to3 could trivially deal with.


> the sort of thing 2to3 could trivially deal with

2to3 could also trivially add +, and if anything, that would actually help surface these kinds of bugs, because if you randomly see a + in the middle of your list of strings, it's much easier to spot the bug than if there was a missing comma.


> The most common way to split a string across source lines is this concatenation idiom.

Is it really? I tend to avoid it in favour of """ or '\n'.join(<list of lines>), because it looks like a mistake.

Triple quotes are kind of annoying if the string is indented, but you can just not indent the string to avoid the whitespace.


Both of your solutions are great but don't fully cover the use case. They are useful for multiline strings, but implicit concatenation is also often used to break long strings that may not have newlines.
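
For example (a sketch; the query text is made up):

    query = (
        "SELECT name, email FROM users "
        "WHERE active = 1 "
        "ORDER BY name"
    )
    # one long logical string, no newlines in the result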


In my opinion that would be better served with ''.join(('hello', 'world'))

No footgun potential, and as others have mentioned the “good usage” would often be bad simply because it ends up looking like a mistake even if it’s intentional.


I use it, personally. The other two options I find too aesthetically displeasing: not indenting the string looks bad when it's within an indented block of code, and using join and putting the strings in a list is just too much boilerplate. I will use """ if I don't care about the extra space put at the start of each line by the indentation.



Does Python support the concept of allowing code to opt in to new safety features? I can understand rejecting something like this for the sake of legacy compatibility (something Python has abandoned too readily in the past), but it seems like an option—or maybe even a default—might be nice.

I suppose this is also something you could catch with a linter?



I'd say that's a "kind of", since it implies the feature will eventually become mandatory. I was thinking more along the lines of Javascript's 'use strict';


No, there's a general aversion to "use flag" features among the Python core dev due to not wanting to support multiple versions of Python behavior and how they may interact over the long term.

"from __future__" is meant to only ever be used temporarily with a specific Python version slated for it becoming the default behavior.

This discussion about flags has come up recently as the debate over accepting PEP 649 or PEP 563 or something else continues. If the Steering Council does not accept PEP 563, it will need to be figured out how to deprecate "from __future__ import annotations" without making it the default, and how to implement its replacement.


Most of the "bugs" caught here (including in TensorFlow and in my own project, Xarray) seems to actually be typos in the test suite. This is certainly a good catch (and yes, linters should check for this!), but seems a little oversold to me.


Same :P I'm actually responsible for one of these (https://github.com/pytorch/pytorch/issues/70607), but it's a typo in a list of tests to skip.


A typo in a list of tests to skip means tests are run that are not intended to be run. This can lead to unexpected failures, so in my opinion it is not the same as the errors in test suites where tests run with different test data than intended but should still pass.


Literally the second item in the "Zen of Python" (https://www.python.org/dev/peps/pep-0020/):

Explicit is better than implicit.

And yet, s = ["one", "two" "three"] will implicitly and silently do something that is probably wrong most of the time.
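
Concretely:

    s = ["one", "two" "three"]
    print(len(s))  # 2, not 3
    print(s)       # ['one', 'twothree']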


I mean the zen being wrong is kind of a meme at this point. The whole “only one obvious way to do it” isn’t just false but the exact opposite is true. Python is one of the most flexible languages with many many ways to do the same thing; more than any other language I can think of.


Notice that, in the original quote,

    There should be one-- and preferably only one --obvious way to do it.
the author used two different ways of hyphenating (three, if you count the whole PEP 20). PEP 20 is clearly not meant to be taken as law. Nor PEP 8. Nor PEP 257.

People frequently mistake "one obvious way" with "one way". There are lots of ways to iterate through something, for example, but there is really one obvious way. And the philosophy here still applies: when you read anyone else's python code, the obvious way is probably doing the obvious thing. I think that is the more appropriate takeaway from PEP 20.


> the author used two different ways of hyphenating

No, first, it doesn't use hyphenating at all, it uses hyphens as an ASCII approximation for typographical dashes used to set off a phrase (a distinct function from hyphenation), and, second, in that quote they used one way of doing it: “two dashes set closed on the side of the main sentence and set open on the side of set-off phrase”.

It is an unusual way of doing it—just as with actual typographical dashes, setting open or closed symmetrically would be more common—but it's not two ways.

EDIT: And the third use (in the heading and later in the body) is separating parts where neither is a mid-sentence appositive phrase, and uses open on both sides. So that's not a different way of doing the same thing, it's a different way of doing a semantically different thing.

Actually, I think the dash use makes a good illustration of how the “it” in “one way to do it” is intended.


> “two dashes set closed on the side of the main sentence and set open on the side of set-off phrase”.

Eh, I don't think that's the interpretation the author was going for. The author wanted to show two different ways of approximating a dash, and he had limited options.

If he'd done this-- for example-- he would have been showing one way, not two.

If he'd done this --for example-- you would have called it "two dashes set open on the side of the main sentence and set closed on the side of set-off phrase".

If he'd done this-- for example -- it would have been too obvious (on the same line).

I suppose he could have done this-- for example--but I still think that would have been too obvious. You're not supposed to see it on a first read.

> And the third use (in the heading and later in the body) is seperating parts where neither is a mid-sentence appositive phrase, and uses open-on-both sides. So that's not a different way of doing the same thing, it's a different way of doing a semantically different thing.

It's a different use of a dash, but it's still a place where you'd typically use a dash.

-----

Edit: You know what, thinking about it again—perhaps both interpretations are valid. That almost adds to the effectiveness of the whole thing.


It's not even obvious how to run Python or dependencies in the first place. Even putting aside the 2.7/3.x fiasco (that still causes problems even today), you're left with figuring out wheel vs egg vs easy-install vs setuptools vs poetry vs pip vs pip3 vs pip3.7 vs pip3.8 vs piptools vs conda vs anaconda vs miniconda vs virtualenv vs pyenv vs pipenv vs pyflow.


it's like you read my mind.


> And the philosophy here still applies: when you read anyone else's python code, the obvious way is probably doing the obvious thing.

I don't get what you mean by this.

When I read someone else's code, what is obvious to me isn't necessarily what was obvious to the author. For an illustration of this, have a look at the day 1 solution thread from this year's Advent of Code - https://www.reddit.com/r/adventofcode/comments/r66vow/2021_d... (you can search for Python solutions) - and see how many different ways there are to solve a fairly straightforward problem.


I can think of at least 2 obvious ways to iterate through something: for loops and comprehensions.


You're right that both iterate through something but `for` loops and comprehensions aren't used as if they were interchangeable.

For example, you'll sometimes see people do bad stuff like this:

  >>> lst = []
  >>> 
  >>> [lst.append(i + i) for i in range(10)]
  [None, None, None, None, None, None, None, None, None, None]
  >>> 
  >>> lst
  [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
  >>> 
When they should be doing this:

  >>> lst = []
  >>> 
  >>> for i in range(10):
  ...     lst.append(i + i)
  ... 
  >>> lst
  [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
  >>> 
Or just this:

  >>> lst = [i + i for i in range(10)]
  >>> 
  >>> lst
  [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
  >>>


The first append version will more often be in a loop. It's unlikely that someone will know enough to use comprehensions but not enough to still use append.


Agreed. I've mainly seen the first `append` version in code written by people who've just discovered comprehensions and code golf.


    lst = [range(0, 10, 2)]


That's wrong in multiple ways. You want

    lst = list(range(0, 20, 2))


Ohh, yeah, you're right.


Even simpler!


To generate a list/dictionary/generator from an input iterable, you use a comprehension of the appropriate type.

To iterate through it without doing one of those things, you use a for loop.

In “one obvious way to do it”, “it” refers to a concrete task; the same is not necessarily intended to be true of arbitrarily broad generalizations of classes of tasks.


I suspect this comment was an elaborate nerdsnipe.


The author uses "only one" to clarify "one". So obviously "one" means at least one.

    There should be at least one-- preferably only one --obvious way to do it.
Kinda funny meta joke considering everybody conflates "one" and "only one" to mean the same thing. Preferably there would only be one obvious way to describe "one". :p


> I mean the zen being wrong is kind of a meme at this point. The whole “only one obvious way to do it” isn’t just false but the exact opposite is true. Python is one of the most flexible languages with many many ways to do the same thing; more than any other language I can think of.

Not in comparison to Perl, which usually has multiple ways to do anything, each 'obvious' to different sets of people (each Perl codebase therefore seems to have a distinct dialect based on which 'obvious' alternatives are chosen).

The other direction languages can take, which is being contrasted here, is having one non-obvious way to do something.

Python's 'most obvious way' isn't necessarily the fastest/most concise/most efficient/scalable/etc. way to do something in Python, but it will usually be obvious to most Python developers. And although broad styles have certainly developed over time (imperative, functional, OO) as Python has gained power and flexibility, the dictum still largely holds true.


10 years ago I'd have agreed with you. But Perl has gone a long way in pulling back from some of that insanity, while Python has been giving C++ a run for its money in terms of features.


I'd totally agree - there's been a burst of sort-of-Perl-style stuff (:=, for example) to gain relatively small wins.

ie, instead of

for line in lines: print(line)

we are supposed to be using

while line := f.readline(): print(line)

I've not been super impressed with this type of thing.

That said, string formatting is better with f strings.

They also rolled back some of the forced breakage from trying to force Unicode with 3, which made a big difference. 3.3 added back u''

Lots of good cleanups: lstrip vs removeprefix, etc.

Underscores in numeric literals (10000000 vs 10_000_000)

So lots of good stuff still landing.


> ie, instead of

> for line in lines: print(line)

> we are supposed to be using

> while line := f.readline(): print(line)

No, we’re not. Walrus, in loops, IME, is more for replacing this pattern:

  while True:
    myvar = get_it()
    if not ok(myvar):
      break
    # code that uses myvar
with this pattern:

  while ok(myvar := get_it()):
    # code that uses myvar


False. I have been harshly attacked here on HN for suggesting things like for line in lines - literally been called "stupid".

I'm not the only one who looked at the recommended examples of the use case here and went, huh?

https://news.ycombinator.com/item?id=17450890

Recommended new way:

  if any(len(longline := line) >= 100 for line in lines):
     print("Extremely long line:", longline)
Old way:

    for line in lines:
        if len(line) >= 100:
            print("Extremely long line:", line)
            break

I prefer the old way. These were examples in the PEP!

In your example get_it() might be better as a generator or iterable. A lot of code looks great if you push that type of thing down a bit, and sometimes memory is helped as well. Then you iterate over it: for value in get_it(). This keeps Python very natural. You start to get a lot of weird line-noise code with := vs the old Python style, which, while a bit longer, was basically pseudo-code.
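
A sketch of that pushdown (get_it and example.txt are stand-ins):

    def get_it(f):
        # hide the readline loop inside a generator
        while True:
            line = f.readline()
            if not line:
                return
            yield line

    with open("example.txt") as f:
        for line in get_it(f):  # the caller reads like pseudo-code again
            print(line, end="")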


I still don't see the need for things like the walrus-operator.

All it does is increase line-noise, and for what? So we don't have to write 2 short lines, or save an indentation level somewhere?

There is a good reason why assignments in Golang are not expressions, even though they are in C, and the language is otherwise deliberately close to the mindset of C; The added convenience makes the code much harder to read.

Sure;

    int c;  /* int, not char, so the EOF sentinel fits */
    while ((c = getchar()) != EOF) {
        // do something
    }
requires fewer lines than:

    int c;
    while (1) {
        c = getchar();
        if (c == EOF)
            break;
        // do something
    }
but the latter is also easier to read, because each line carries less information. That's what people call "line noise".

IMO, := is a step in the wrong direction, and sadly I see python take more and more of these, going from the deliberately simple and clear language to something that's becoming needlessly hard to read by piling on things it doesn't even need.


Except exit.

I knew Python wasn't for me in my first foray into it when I fired its REPL and then went to exit it with control-C or whatever and it literally printed out the right way to do it but then didn't do it. Python was more interested in having me do things a certain way even when it knew what I intended to do, just to be a twit.


Ctrl-c raises a KeyboardInterrupt error, which is useful for programs to catch. If you type

   >>> exit
   Use exit() or Ctrl-D (i.e. EOF) to exit
You will get that error response. The goal of this is to have the REPL behave exactly the same as the scripting language. exit() is supposed to be called as a function to make the language more consistent, so just typing `exit` will do nothing


> which is useful for programs to catch.

It would be useful if the default handler for SIGINT did not raise an exception, but had a sensible default like, e.g., terminating the program. Go handles SIGINT this way by default.

If I want an exception, I can just tell the program:

    import signal
    signal.signal(signal.SIGINT, signal.default_int_handler)  # stdlib handler that raises KeyboardInterrupt
The way it is now, the exception bubbles up to runtime, and if it isn't handled (eg. in the REPL) the program crashes, or worse, hangs if there are other threads of execution running:

    import threading
    import time
    def sleepN():
        for i in range(20):
            time.sleep(1)
    threading.Thread(target=sleepN).start()
    time.sleep(20)
Press Ctrl-C here, and the thread will still run, because the bubbled-up exception only kills the main thread. This is a real footgun in applications which rely on SIGINT being a termination signal and have long-running threads.


The REPL prints the value of a variable that you type in. exit is a variable, and so the REPL prints its value. If you want to run it as a function, you can do that, and indeed its string value is a message telling you to do that.

    $ python3
    Python 3.9.2 (default, Feb 28 2021, 17:03:44)
    [GCC 10.2.1 20210110] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> exit
    Use exit() or Ctrl-D (i.e. EOF) to exit
    >>> exit.eof
    'Ctrl-D (i.e. EOF)'
    >>> exit.name
    'exit'
    >>> exit = 42
    >>> exit
    42
    >>> exit()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'int' object is not callable
    >>>
I would have special-cased exit, though.


ipython has 'exit' without the parentheses.


It was a meme when the Zen was written: the spaces around the em dash are handled 3 different ways, two of them in the line you abbreviated; your abbreviation removes the joke.


Python finally ended up following Perl's TMTOWTDI motto! https://en.wikipedia.org/wiki/There%27s_more_than_one_way_to...


the zen of python was written in the 90s.

from that context it makes sense, because the only goal of python in the 1990s was to be more popular than perl, which was notorious for having many ways of doing the same thing.

but yeah, python has had significant feature creep over the years; it's nowhere near the small clear lang it used to be.


And still no expressive switch/case statement, breaking out of loops and ending scripts early (for explorative programming).


>no expressive switch/case statement

match/case (not a drop in switch statement)

>breaking out of loops

  break
>ending scripts early (for explorative programming)

  exit() or sys.exit()


I think by breaking out of loops they meant breaking out of nested loops.


exit() or sys.exit() kills the kernel, so for explorative programming in something like Spyder it is not that useful.


Are you looking for breakpoint()?


> And still no expressive switch/case statement

There's match/case in 3.10 - https://www.python.org/dev/peps/pep-0636/


Matplotlib is an example of a library with at least two "correct" ways of plotting


But only one of them is recommended - the one that makes less sense.


How is working with figure and axes objects the one that makes less sense?

Is it really that crazy to set up a figure, axes on that figure, and plot on the axes, returning an artist object for each plotting command?


Yes, it is crazy. I guess this isn't really the place for it but ... From the official docs:

    The Figure is the final image that may contain 1 or more Axes.

    The Axes represent an individual plot (don't confuse this with the word "axis", which refers to the x/y axis of a plot).
This is infuriatingly bad and I firmly believe that it makes sense only to people who already know how it works. There's an image, axes (this word alone is a crime), plot, figure... it's like they took a bunch of synonyms and arranged them randomly to put together an API.


>axes (this word alone is a crime),

why so? you prefer something like axiis?


See, that's the thing:

> Axes object is the region of the image with the data space.

In matplotlib axes is not the plural of axis. It has its own meaning specific to the API. And at the same time it's the plural form of another word (axis) which is also relevant in this context and it sounds almost identical when pronounced.


I like the wording in the MATLAB docs (since Matlab committed the original sin, the axes/axis/figure API has been around since the late '80s, matplotlib is just a port to python):

https://www.mathworks.com/help/matlab/ref/axes.html

https://www.mathworks.com/help/matlab/ref/axis.html

https://www.mathworks.com/help/matlab/ref/figure.html

So they emphasize the cartesianess of the axes.


I dunno. One sets global values everywhere, then collects them all into a plot. The other creates a bunch of apparently disconnected objects, sets a bunch of different attributes on each one, and then gets the plot from one of those objects.

If I was designing something like it, I wouldn't recommend either. The global one has many fewer WTFs per character, but the objects one looks like it works in a multithreaded program or that you can create more than one plot without displaying them (but I've never tested this).


one is more or less based on matlab's plotting procedures, the other is an attempt at a cogent OOP implementation. However, the OOP paradigm just doesn't seem very good for plotting.

Personally, I like plotting in R way better than in python. It has a lot better developer UX.


Which two ways?



It's sort of like the Unix Philosophy. It sounds good and is probably a good thing to strive for generally, but it's ultimately pointless when it comes to actually evaluating whether approach A is better than approach B.


> Complex is better than complicated

What? Something being complex is artificial; we try to avoid it. Problems can be complicated; we try to simplify them, and the more complicated the problem is, the more complex the solutions we tend to develop. So comparing them does not make sense?

Or did I always know them wrong?


Complex: consisting of many different and connected parts.

Complicated: consisting of many interconnecting parts or elements; intricate.

Nothing specifically artificial about either one. Software that is well decomposed is Complex (made of many smaller connected parts). Software that is poorly decomposed is Complicated (made of many smaller interconnected parts).

Connected vs interconnected?

Interconnected: connected at multiple points or levels (aka spaghetti code)


Complicated: this mutha is hard all by itself

Complex: we took all of these simple steps, lumped them together, now we have this


Yeah that was what I was trying to say!


It's not particularly well-worded. A lot of dictionaries list complex/complicated as synonyms.

I always took it to mean 'complex' as in having many connected parts, and 'complicated' more as in over-complicated or convoluted - the opposite of 'simple'. In other words, breaking something complicated into a system of intentionally-designed pieces is probably better than a chunk of opaque code to brute-force the current case. A good system is probably also 'simpler', despite having more pieces and interconnects.


I first encountered the notion of complex/complicated in Antifragile I believe, and IIRC it's based on the [Cynefin framework](https://en.wikipedia.org/wiki/Cynefin_framework).

My understanding is that:

* Complex domains lend themselves to experimentation and emergent behavior.

* Complicated domains lend themselves to analysis, expertise, and rule following.

The Wikipedia article offers the domains as containing "unknown unknowns" and "known unknowns" respectively.

I'm trying to think how this maps to Python -- the language is complicated, while the problems we're solving are expected to be complex? Or, maybe, the language lives at the boundary between complicated and complex. We push complicated procedures into the language, and let the programmers deal with complex issues?


Hmmm, it sounds like you're expecting "two" and "three" to be separate list elements because of some sort of implicit behavior due to being written in a list context. This is the opposite of what "Explicit is better than implicit" means.

This is a list and you must explicitly place a comma when you want to start a new element in the list. Is there ever a time a new element follows a previous one and is NOT separated by a comma? No, this is explicit.

Whereas strings always concatenate in this manner, be it in a list context or not. It seems like you're assuming behaviors from other languages would carry over to this one.


No, we don't want it to implicitly be a list item. We want it to fail as invalid syntax. If I wanted the two and three strings to be combined, I would have /explicitly/ used an operator for that. It's the implicit behavior of that which is the problem.


Not to mention the implicit string concatenation that you get instead.


Ah yes, why would anyone expect lists' main purpose to be listing?

Sarcasm aside, I'd assume people primarily list things in between [ and ], and sometimes concatenate things in there too. The language should err on the side of doing what people expect, unless explicitly told not to.

> It seems like you're assuming behaviors from other languages would be the same in another.

Rather, I think people expect a language, especially one this big and important, to work for them, and not to be designed with unergonomic features instead.


> it sounds like you're expecting "two" and "three" to be separate list elements

I'd expect that to be an error.


Funny enough, in dynamic languages I expect it to do something unexpected and unwanted.

This is why I like Go/Rust. I detest the implicit warts of these languages.


It's not related to being dynamic or not, it's a syntactical choice: that's also the way to concatenate string literals in C.


Well, there are dynamic languages and dynamic languages. There are Python and Ruby and there are Elixir, Erlang and Lisps.


I'm not a python programmer, but the implicit string concatenation seems surprising to me.


It's idiomatic in C.


I'm not a python programmer either, but I would be seriously annoyed at implicit anything instead of syntax error


Your sarcasm is misplaced. I would prefer a SyntaxError to either of the implicit behaviours.


I could see lisp programmers missing the commas out of muscle memory


> Is there ever a time a new element follows a previous one and is NOT separated by a comma?

Yes:

  [ "one, two", "three" ]
The comma is not an absolute context-free indicator of element separation.


This is not what implicit is about.


Implicit concatenation sure seems implicit to me


Implicit things are rarely nice in code for production environments. It makes bug tracing and security much more complicated


This is indeed the point. Some use cases are amazing and increase quality while others are just pure evil.


A lot of people are criticising dynamic typing for this.

It doesn't seem to have anything to do with typing discipline.

    words = (
        'yes',
        'correct',
        'affirmative'
        'agreed',
     )
Would be a tuple (immutable list) of strings, while

    words = (
        'yes',
        'correct',
        'affirmative',
        'agreed',
     )
would also be a tuple of strings.

If haskell had for some reason decided to have the same syntax sugar, it also would have caused an issue.


You got me for a second there.


I am a bit in shock. Accidental string concatenation. Python just lost a lot of reputation in my brain.


Misspelling a variable on the lhs of an assignment just causes a new variable to be created with the new name. That's a lot worse in my book.
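
A minimal sketch of that failure mode:

    count = 0

    def bump():
        global count
        cuont = count + 1  # lhs typo: silently creates an unused local

    bump()
    print(count)  # still 0, and no error anywhere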


I don't think that's the same kind of thing. Your example is a tradeoff that anyone who uses a language that doesn't require explicit variable declaration faces, and it's pretty tough to argue such languages really shouldn't exist.

Missing an operator resulting in implicit behavior is much more subtle and not even obvious. For those who use python, it is worse.


> ...it's pretty tough to argue such languages really shouldn't exist.

"Shouldn't exist" is too strong.

Dynamic languages that let you create a new variable via assignment shouldn't be used to create non-trivial software. How about that?

Scripting languages have a place. That place is 100% in creating quick-and-dirty scripts and tools. Or in doing some kind of one-off data transform (as is common in machine learning scenarios). Anything that has a life span of two weeks or less, or a code length of fewer than a hundred lines? Yeah, script languages rock for that.

Explicit/static typing adds vastly more value to large projects than the cost of the overhead. The fact that you can't really gain that value in Python means that Python should be relegated to quick and dirty scripts.

Same for JavaScript, Ruby, and other completely dynamic languages.

You'll note that all of these languages are getting types one way or another, meaning that there are a lot of people who do recognize their value. Though TypeScript is years ahead of the rest in the completeness and sophistication of its type system; bugs like the comma bug detailed by OP, along with essentially every JavaScript "wat" bug, simply can't happen in TypeScript in strict mode. And static types enable entire other categories of bugs to be detectable via a linter as well.


I've been building non-trivial software in dynamic languages for twenty years. They work great.

I'd take a project in a dynamic language with a decent test suite over a project without tests in a statically typed language any day of the week.


But a dynamic language needs all the tests a compiled language needs AND type/syntax tests (that are handled by the compiler in a static language).

There are reasons to pick a dynamic language (or specifically Python), but I haven't heard one explanation of how it helps you write fewer tests.


> I'd take a project in a dynamic language with a decent test suite over a project without tests in a statically typed language any day of the week.

I'd take the opposite. I've read too many useless tests in python codebases that can be accomplished by a static type checker. "Decent" does a lot of heavy lifting in your comment. And what about a dynamically typed codebase without any tests? I'm sure they exist.

I'd rather dive into a big ball of mud with a compiler that will help point me to my mistakes before I release them, than having to sift through a ball of mud trying to find that mistake with production services flailing.

That all being said I've worked with both types of languages in successful projects. But I prefer the development experience of the statically typed variety.


  it's pretty tough to argue such languages really shouldn't exist
Well, I agree with OP so that is at least two people. I really don't see it as a good trade.


Explicit variable declaration is just adding a keyword (such as var or let) when you're declaring a new variable instead of modifying one.

The cognitive burden of having to memorize and look for which variables are new vs which are being modified is simply not worth it in my opinion, even for a scripting language. Maybe for esolangs, simple math or first time learning programming.

In any case, it's a short coming of the language (IMO) but not a deal breaker. We learn to live with it.


I'd say unexpected behavior is always worse than expected one.

Yes, you'll certainly find somebody who doesn't know what 'not statically typed' means, but ... And yes, there are also C(++) users who expect strings to be concatenated like that.


You seem to also not know what "not statically typed" means. It certainly does not mean "not properly scoped".


Yes, of course. But you see that no scope keywords exist in Python. But there exists `+` to concatenate strings (too).


I left python around the times of 2/3 drama, are nonlocal and global not there anymore?


Keywords like namespace, no; but functions and classes and modules provide for a lot of scoping opportunities.


The problem is `fop` should be `foo`:

    foo = 5
    fop = 6
Keywords like `let` solve this problem:

   let foo = 5
   fop = 6 # error


Not entirely:

  let foo = a();
  let foo = b(foo);
  let fop = c(foo);
  let foo = d(foo);
(Which is valid, e.g., in Rust.)


I hate variable shadowing; I'm very surprised that they allowed it in Rust. I saw an unexpected behaviour caused by variable shadowing in C++ (global variable hidden by member variable) just last week.


You do get a warning, though. And most Rust projects I've seen usually adhere to 0 warnings.


A good IDE has many other safety nets for that error.

Auto completion, highlight matching variable, gray out unread variables and warning of unused assignment.

I’ve written lots of python and can’t recall ever having this issue. More likely is a logic typo of two similar variables like length_x vs length_y, where a “let” wouldn’t have saved you anyway if both are already defined.

JavaScript, pre strict TS, on the other hand, where a missing var implied global, was a real motherlode of bugs. Or Kotlin's "val" vs "var" changing semantics completely... wow. But those are different concepts from basic definition, I know.


Or := for declaration like Go and Toit


Yes, or another symbol instead of `=` for assignment, like `<-` (F#)


While I agree, this is somehow something I expect. Implicit string concatenation without operator or function around it sounds just like a terrible idea. It breaks the basic syntax concept of `foo X bar`. On the other hand it is probably very handy with DSLs and things like that.


Not so much DSLs, it's probably something as banal and ancient as

  usage = (
    'usage: foo [options] filenames...\n'
    '  -f force concatenation\n'
    '  -c for convenience\n'
  )
  print usage
Edit: forgot to add parentheses


Isn't that common for all/most languages that don't require explicit typing?


This is a scoping rule, not typing. Scoping is a mechanism of symbol resolution, i.e. what you mean by `foo` at line N. Is it a local, an argument, a global, or does it address a symbol defined in an enclosing scope? Most languages use explicit local definitions, searching for implicit ones in outer scopes bottom-up, ending at the global scope. Python was the first popular non-BASIC language which made implicit assignments local, shadowing, and function-scoped:

  x = 1
  def setx():
    if True:
      x = 2 # completely different x, local to setx
    print(x) # prints 2, visible outside of `if`
  setx()
  print(x) # prints 1
This led to a funny keyword 'nonlocal', because you can't simply ignore scoping and pretend that you're BASIC in any serious program.

(To my opinion, python had a good start, but got lost in the woods for no clear reason. It's a movie mutant of a language, which tried to appeal to non-programmers and somehow succeeded, and then realized that non-programmers eventually become programmers, and it's not hard. Now it's too late to fix this mess. End of opinion.)


JavaScript (strict mode) doesn't have explicit typing, but it still requires variables to be declared.


Same for Perl.


Uh? Perl optionally requires you to declare variables, which is a good idea IMHO: no noise for small scripts, and any experienced Perl programmer will have learned that 'use strict' is a really good idea for big scripts.


It would be impossible in any language that requires either explicit typing or some kind of 'let' keyword. (Or, in the fringe case, a language like Go which uses a different operator for declaration-plus-assignment.)


Exactly. That's why I asked about languages that don't require explicit typing. My point is that it's a feature of many languages rather than a Python idiosyncrasy.


Declaration and explicit typing are logically orthogonal, but few if any languages require typing but not declaration. Lots require declaration but not typing.


It is common to all languages that have the same syntax for definitions and mutation.

In Scheme, for example, this is not an issue.


That’s a complaint against the entire type system, nothing to do with misspelling.


It has nothing to do with the type system? It's an issue with implicit declaration. You could very easily require explicit declaration while retaining the selfsame type system.


Huh, you're right. It would be bizarre to see something like this in Python though. I've never even thought of it as being implicit declaration.


I was going to comment something like "who would even use this?" and then I remembered that I have in fact used that feature :) It's a somewhat "nice" way to write long strings and keep the code from getting too wide. I never did it inside an array, but I found breaking up a long string into smaller ones and wrapping them in parens without a comma was convenient, for things like error messages.

But that's just what comes with a hyper flexible language like python. You can do lots of things in lots of different ways, but you can also screw things up just as easily, and your IDE won't tell you because technically it's valid code.


I completely get that. That is a very nice feature for building DSLs or libraries with special needs. But it makes the overall language very dangerous.

Is this "operator" overloadable on each type in Python?

And that scares me a lot. I think I have to reevaluate my position towards Python.


It's not really an operator. It's part of the syntax of string literals. "foo" "bar" is an alternative way of writing the string literal "foobar". If foo is not a string literal, foo "bar" is invalid syntax.
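
Concretely:

    a = "foo" "bar"    # a single literal: 'foobar'
    b = "foo" + "bar"  # same value, explicit operator
    x = "foo"
    # y = x "bar"      # SyntaxError: only adjacent literals concatenate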


Okay... So it is not an implicit operator. That is good. Some small reputation points are regained.

Thanks.


Why not just use plusses? Or perhaps a join func, which would accomplish the same.

I get the use case as you described it, but it just seems like minimal effort to accomplish and have some semblance of explicit/safety.


or if that's the use case, require the whitespace to include a \n or \r\n... It's not like python doesn't have significant whitespace already.


That wouldn't fix most of the cases highlighted by the tool in the article.

So strange that Python has completely different syntax from C, but they chose to copy this obscure syntactic feature _even though they have the plus operator on strings_.


Heh. I use it all the time the way you do and didn't realize this is alien to many developers (no one in my team ever complained about it).

It's common in some languages and used the way you use it. I looked in PEP8 and it seems they don't discuss this.

I think it's a perfectly valid use case, but clearly there are two camps to this. If this is so contentious, I would recommend PEP8 be revised to either explicitly endorse it as a way to split long lines or to explicitly discourage it and recommend the + operator instead.


You could have the same behavior by enforcing a + operation in between:

  mylongstring = "hello" +
    "world"
No idea if python's way of indentations allows this but sounds like it should


No, it doesn't:

    mylongstring = ("hello" +
       "world")
or, without `+`

      mylongstring = ("hello"
       "world")


Use \

    mylongstring = "hello " \
      "world " \
         "my " \
     "name " \
    "is"*


The use of \ is discouraged in Python. From PEP8:

> The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation.


See I knew Python just wants to be more like lisp.


Not sure if it's irony or not. After all, this is not really accidental string concatenation but an easy-to-make type error which can go undetected due to the dynamic typing (and the lack of thorough type annotation in most code).

The string concatenation in itself should not be a problem as it's really just string constants. (But again, it might be irony exactly because of this :) )


Unfortunately no irony.

I come from a programming platform (C#) where productivity is a key element of language design. I highly doubt that Anders Hejlsberg would have accepted such an error-prone concept as a literal-free implicit operator on a key type like strings.


Well, I guess it's true for most languages that productivity is intended to be a key element of design. (For python, definitely. But I also remember James Gosling saying this about Java.) This implicit concatenation seems to come (inherited?) from C.

I kind of remembered that some languages do support it for breaking strings into multiple lines conveniently. I'm a bit surprised that it works even on one line (I've never used it that way, because why would I have), but you're likely to make the mistake on multiline statements anyway. I've also checked, and it doesn't work in Java (which I kind of remembered, though I mostly do python these days).


> for breaking strings into multiple lines conveniently

What is inconvenient about just adding a + at the end or beginning of the line?


In most languages an array with 3 elements has the same type as an array with 2 elements so the type system isn't going to warn you about the difference between

("foo" "bar", "baz")

and

("foo", "bar", "baz")


They still tend to differentiate between 2- and 3-element tuples (but I agree that the implicit concatenation is problematic).
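
A sketch of that with a fixed-arity tuple annotation (the mypy complaint is paraphrased):

    def f(t: tuple[str, str, str]) -> None:
        print(t)

    f(("foo" "bar", "baz"))   # mypy: got a 2-tuple, expected a 3-tuple
    f(("foo", "bar", "baz"))  # OK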


Fair enough. I was only thinking about the str vs tuple case. So when you have 2 elements in the parenthesis.


C/C++ has the exact same thing, no?


I really like the idea of automated code review tools that point out unusual or suspicious solutions and code patterns. Kind of like an advanced linter that looks deeper into the code structure. With emerging AI tools like Github Copilot, it seems like the inevitable future. Programming is very pattern-oriented, and even though these kinds of tools might not necessarily be able to point out architectural flaws in a codebase, there might be lots of low-hanging fruit in this area and opportunities to add automated value.


Consider that you may be describing a compiler. Typos are not generally a problem in statically typed languages with notable exceptions such as dictionary key lookups etc.

Even without static typing, argument length verification etc. can be done with a suitable compiler. In python we are left chasing 100% code coverage in unit tests as it's the only way to be certain that the code doesn't include a silly mistake.
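
For what it's worth, a static checker over plain Python gets you the argument-count case today without any tests; a tiny made-up example:

  def greet(name: str, punctuation: str) -> str:
      return "Hello " + name + punctuation

  greet("world")
  # mypy: error: Missing positional argument "punctuation" in call to "greet"
  # (caught statically, no unit test or coverage needed)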


I think 100% code coverage is folly. Spreading tests so widely near-inevitably means they're also going to be thin. In any codebase I'm working on, I would focus my attention on testing functions which are either (a) crucially important or (b) significantly complex (and I mean real complexity, not just the cyclomatic complexity of the control flow inside the function itself).


Fully agree, but I never want to see a missed function argument programming error in customer facing code. In python you really do need code coverage to achieve this goal - static languages have some additional flexibility.


Or a rich suite of linters religiously applied. Never save a file with red lines in flymake or the equivalent. Edit: actually, I am unsure if my current suite would catch a missed required parameter. I tend to have defaults for all but the first parameter or two, so it's not a big issue for me I guess. I do like a compile-time check on stuff tho, one of the reasons I am doing more and more tools in Go.


I actually recently joined a startup working on this problem!

One of our products is a universal linter, which wraps the standard open-source tools available for different ecosystems, simplifies the setup/installation process for all of them, and a bunch of other usability things (suppressing existing issues so that you can introduce new linters with minimal pain, CI integration, and more): you can read more about it at http://trunk.io/products/check or try out the VSCode extension[0] :)

[0] https://marketplace.visualstudio.com/items?itemName=Trunk.io


cool product :) is it just linting, or do any of the tools do code transformation to offer the fix for the lint failure? (code review doctor also offers the fix if you add the github PR integration)


If a linter provides autofix suggestions, we will propagate it all the way back to the user!


This is basically linting, i.e. code analysis. The techniques used might be more current (as they have been evolving, as you say, for pattern matching), but linting is just that: a code review tool to find usual bugs. (This is what happened in this blog post: it wasn't looking for unusual solutions but for usual mistakes.) The packaging and form of the feedback also seem different, and that in itself may make a lot of difference in ease of use and thus adoption.


Admittedly, the difference here is that codereview.doctor spent time tuning a custom lint on a variety of repos. In an org with a sufficiently large monorepo (or enough repos, but I don't really know how the tooling scales there) it's possible to justify spending time doing that, but for most companies it's one of those "one day we'll get around to it" issues.


yeah something like sonarqube or https://codereview.doctor (if you use GitHub)


Or people could just write it correctly in the first place! Controversial I know! Seems like people would rather half-ass things and then let some AI autocorrect fix it up for whatever reason rather than doing it properly.


As a comparison, in Ruby

  puts "a" "b" == "ab" # true
and

  puts "a"
    "b" == "ab"
prints "a" with "b" == "ab" evaluated to false and discarded. This could create bugs as with Python. However

  ["a"
     "b"] == ["ab"]
is a syntax error at the beginning of the second line: the parser expects a ]. It would evaluate to true if it were on one line.


In Ruby one too many commas can also cause problems:

  # list
  list = "a","b",

  # function
  def foobar
  end

  => ["a", "b", :foobar]


I actually prefer Python approach here in that within () [] {} newlines are simply whitespace with no special meaning - this allows for very flexible formatting of expressions which is still unambiguous.

The implicit concat of string literals is the culprit here. It really should require "+".


Ironic to see this today. I spent an hour debugging this very same issue this morning.

I was just doing some simple refactoring, changing a hard coded sting into a parameterized list of f-strings that’s filtered and joined back into a string.

I’m glad that I had unit tests that caught the problem! I couldn’t figure out why it was breaking, that comma is very devilish to spot with the naked eye. I’m surprised my linters didn’t catch it either. Maybe time to revisit them.


I like this. It's clearly meant as marketing for their product, but imo the best kind of marketing. They don't just run their tool and automatically make tickets, but check for false positive and (offer to) make pr's.

It's both good for those projects and for the company that does the marketing, since they reach their exact target group. Plus it gets them on the front page of HN.


A great addition to prune a ton of false-positives is to check the length of the strings. Almost always, the intentional implicit concats will have a very long string that reaches the max line length, whereas the accidental ones are almost always very short strings.
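
A rough sketch of that heuristic using the stdlib tokenize module (the 30-character cutoff is an arbitrary guess on my part):

  import io
  import tokenize

  SKIP = {tokenize.NL, tokenize.COMMENT, tokenize.INDENT, tokenize.DEDENT}

  def short_implicit_concats(source, short=30):
      # Two adjacent STRING tokens means implicit concatenation.
      toks = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
              if t.type not in SKIP]
      return [cur.start                      # (line, col) of the second literal
              for prev, cur in zip(toks, toks[1:])
              if prev.type == cur.type == tokenize.STRING
              and min(len(prev.string), len(cur.string)) < short]

  print(short_implicit_concats("xs = ['foo' 'bar', 'baz']\n"))  # [(1, 12)]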


Nice! Internally we have a PCRE support on our code search and I regularly run a regex to find and fix these. I've also found a ton on opensource project which I've been trying to fix:

https://github.com/YosysHQ/prjtrellis/pull/176

https://github.com/UWQuickstep/quickstep/pull/9

https://github.com/tensorflow/tensorflow/pull/51578

https://github.com/mono/mono/pull/21197

https://github.com/llvm/llvm-project/pull/335

https://github.com/PyCQA/baron/pull/156

https://github.com/dagwieers/pygments/pull/1

https://github.com/zhuyifei1999/guppy3/pull/12

https://github.com/pyusb/pyusb/pull/277

https://github.com/KhronosGroup/Vulkan-ValidationLayers/pull...

It is indeed a very common mistake in Python, and can be very hard to debug. It bit me once and wasted a whole day for me, so I've been finding/fixing them ever since trying to save others the same pain I went through.

EDIT: I will point out that I've found this error in other non-Python code too, such as c++ (see the 2nd PR for example).

Here's the regex for anyone curious:

[([{]\s*\n?(\s*['"](\w)+['"],\n)+(\s*['"]\w+['"]\n)(\s*['"]\w+['"],\n)*
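
The pattern works as-is with Python's re module if you want to try it (the sample snippet below is made up):

  import re

  PATTERN = re.compile(
      r"""[([{]\s*\n?(\s*['"](\w)+['"],\n)+(\s*['"]\w+['"]\n)(\s*['"]\w+['"],\n)*""")

  sample = 'xs = [\n    "foo",\n    "bar"\n    "baz",\n]\n'
  print(bool(PATTERN.search(sample)))  # True: "bar" "baz" is missing a comma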


Just to be clear, the V8 "bug" was in the test runner code and caused mis-parsing of command line options for testing for non-SSE hardware. Not exactly a critical bug.


The way the bug arrived in that test runner is interesting. It sneaked in mid-review. Possibly bugs added in the middle of a code review are more likely to get through.

https://chromium-review.googlesource.com/c/v8/v8/+/2629465/3...

Personally, I prefer uniform lists with leading commas, because it's easier to add and remove lines for later, inevitable refactoring. For example, I prefer:

  things = [
    'foo'
  , 'bar'
  , 'baz'
  ]
This drives some people crazy, but I think it's the One True Way.


Isn't

  things = [
    'foo',
    'bar',
    'baz',
  ]
even better? In your case, if you want to add something to the beginning of the list you'll have to modify two lines.


Depending on the context, yes. But sometimes you are not allowed the last comma.

ETA: Let me expand on why it's important to put the comma first. Which list is more clear to you:

    a
  , dog
  , weather
  , banana
  , b
  , car
or

  a,
  dog,
  weather,
  banana,
  b,
  car
With the leading commas, they all line up, and you can see them in a neat little row. I really prefer it especially in contexts where the trailing comma is not permitted, such as a SQL query:

  SELECT
    name
  , date
  , operation
  FROM
    stuff


The whole "666" thing really threw me off. I thought it was some Python specific term or something at first glance. They open with a sentence that mentions "5% of the 666 Python open source GitHub repositories" as though there were only 666 total open source Python GH repos. Picking a number with other fun connotations or whatever to use as a sample is fine, but without setting that context, it was kind of distracting from their main content.


Did you figure out what the context is, and if you did, would you mind spelling it out for me? I still haven't figured out what correction to make to that sentence to get it to make sense.


in a blog post about the evils of typos there was a typo! classic https://en.wikipedia.org/wiki/Muphry%27s_law ;)


Also this classic:

> Apple I was the first product ever announced by the company in 1976. The computer was put on sale for $666.66 at the time.

https://9to5mac.com/2021/11/25/steve-woz-signs-rare-1976-app...


They ran their static analyzer over a sample of GH repos. They chose 666 as the number for their sample size. That's all.


It's further evidence that the Illuminati intentionally put these typo bugs there to destabilize the global order.


tl;dr: Python concatenates space separated strings, so ['foo' 'bar'] becomes ['foobar'], leading to silent bugs due to typos.

I've been bitten by this one at work, and can't help but think it is an insane behaviour, given that ['foo' + 'bar'] explicitly concatenates the strings, and ['foo', 'bar'] is the much more common desired result.

edit: This also applies to un-separated strings, so ['foo''bar'] also becomes ['foobar']


I assume it's based on the C behavior, where it can be handy with macros

I don't think it fits well in python


Maybe. We must remember that Python was designed at the very end of the 80s, so what was normal for developers back then could be unexpected nowadays. An example: the self in Python's OO is a C pointer to a struct of data and function pointers. It would have been perfectly clear to anybody writing OO code in plain C at the time (raising hand). Five years later, new OO languages (Java, Ruby) kept self inside the classes but hid it in method definitions.


But Python 3 was designed in the 2000s and had many breaking changes. Seems like they could have changed this behavior with that version.


I assumed it was borrowed from shell, where everything can just be put next to each other since it's all text.


It's a holdover from C, where implicit string literal concatenation is very useful in the preprocessor.


I luckily never accidentally used this space-concatenation thing, but I was bitten by the fact that a=(1) doesn't create a 1-element tuple multiple times in my early days learning Python.
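
A quick REPL session shows the trap (stock Python, nothing assumed):

  >>> a = (1)
  >>> type(a)
  <class 'int'>
  >>> a = (1,)
  >>> type(a)
  <class 'tuple'>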


I still don't understand why it doesn't! So I still get bit from time to time.


If a person decides to add parentheses to some booleans or arithmetic,

    (4 + 5) * (8 + 2)

    (this and that) or (theother)
These elements should not become 1-tuples after the interior contents are evaluated. I sometimes add parentheses even around single variables just for visual clarity.

Also, this allows you to do dot-access on int / float literals, if you want to

    # doesn't work
    4.to_bytes(8, 'little')

    # works
    (4).to_bytes(8, 'little')


In principle, a 1-tuple shouldn't even be a thing - any single value is a 1-tuple by itself already. However, in a dynamically typed language, this approach complicates things elsewhere - e.g. if you have a value / 1-tuple that is a list, you'd expect iteration over it to give you list elements, not the single element that is a list. But if you have a value that is a tuple of unknown size, you don't want to special-case iteration for when that size is 1.


It depends what you mean by tuple. In Python, tuples are basically just immutable lists. Just as lists with 1 element are useful, so are tuples with 1 element. You might be dealing with a tuple of unknown length, where the length could be 1. In other contexts, the word "tuple" often carries the connotation of "having a known fixed length", in which case the notion of a 1-tuple as distinct from the value itself is less useful.


Presumably because parantheses don't really have anything to do with tuples, it's commas that do. Parantheses are there to help the parser group things in case of ambiguity, and to support expressions spanning multiple lines.


Since you typed it twice, I don't think it's a typo. It's parentheses not parantheses.


Thank you! I guess spelling from my native language is creeping over to English on occasion :)


This seems like not a big deal. It’s a common mistake and is in 5% of repos but it’s not causing major damage.

And there’s no evaluation of importance as to whether these instances are in test files or non-critical code. Packages are big and can have hundreds or thousands of files.

It could be that if these mattered, they would have been detected and fixed.

A good example for unit tests and perhaps checking to see if these bugs are covered or not covered.

I like these kinds of analyses but don't like them being presented as some significant failure.


5% of 'released' software is quite a lot; more importantly, it's a class of errors that definitely should not exist. This is effectively a 'bug' in the language: there just isn't any real upside.

Python has a few of these things, which is really sad.


It's a class of error that would be caught by even the most basic testing. A better title for the article would be that 5% of 666 Python repos have typos demonstrating that their code is completely untested. It doesn't matter which language it is: untested code is untested code in any language.
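
Even a trivial assertion does the job; a made-up example:

  COLORS = [
      "red",
      "green"   # missing comma: the list is really ["red", "greenblue"]
      "blue",
  ]

  def test_colors():
      assert len(COLORS) == 3   # fails immediately: len(COLORS) == 2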


The errors were usually in tests themselves. Are you arguing that tests need their own tests to test that they are testing the right thing? Usually I think people believe that tests do not need to be tested and should not be tested, i.e., that you measure "100% coverage" against non-test code alone.


I don't think anyone could disagree: you could never exceed 0% code coverage if your definition was recursive (i.e. included tests, tests-of-tests, tests-of-tests-of-tests, ...).


Only if you generate infinite tests does your coverage approach 0%. But 100% covered code + 0% covered tests = ~50% total coverage.

Also, the obvious solution is self-testing code. (Jokes aside, structures like code contracts attempt something like this).


unfortunately like 10% of the bugs were in the tests themselves. e.g., the sentry one https://codereviewdoctor.medium.com/5-of-666-python-repos-ha...

the tests are only as good as the code they're written with, and as good as the code review process they were merged under.


One of the habits I have when writing kernel code is to intentionally break code in the kernel to verify that my test is checking what I think it's checking. That's because of a lesson I learned a long, long time ago after someone reviewed my code and caught a problem: when your code has security implications, you need to make sure the boundary conditions that your tests are supposed to cover actually get tested. Having implemented a number of syscalls exposed to untrusted userland over the years, this habit has saved my bacon several times and avoided CVEs.


I believe that, whenever possible, tests should be written in a different language than the one used for the code under test (even better, in a dedicated, mostly declarative, testing language).

It avoids replicating the same category of errors in both the test and the code under test, especially when some calculation or some sub-tests generation is made in the test.


"It's a class of error that would be caught by even the most basic testing. "

You could say that about anything and everything in software.

It's not acceptable that testing needs to be run to catch something the language should simply rule out.

The whole point of the language is to provide algorithmic clarity and avoid these things.

This isn't really an issue of 'trade offs'; it's just a bad feature of the language that should have been remedied more than a decade ago.

The lack of proper declaration of variables is even more absurd, there's only downside to that.


I checked those 11 links to issues for major software. 10 bugs were actually in tests...


This is understandable, since many of those projects are not written in Python, so the Python code in them is only in incidental scripts like test harnesses. If V8 were written in Python, its performance would probably not be very good.


I do not see this just from a verification perspective, but also from a productivity perspective.


9 out of 10, actually; the Tensorflow links are the same link.


There were proposals to fix some of these, but the unicode zeal beat out some of the more boring (but, I'd say, just as important) cleanups.


yeah the impact varies. the sentry one seems pretty big: https://codereviewdoctor.medium.com/5-of-666-python-repos-ha...

the test did not work but did not fail either. Imagine being the dev maintaining the code that the test professes to cover, or the user relying on the feature the test was meant to check (if the feature under test actually broke).


I mean, if you’re ultimately going to combine the list into a string anyway, it’s no big deal.

Along those lines, I wonder how many of these come from ad-hoc file path handling instead of using pathlib.


The first one, the implicit concatenation, I can see. But the rest of the things seem like most of the time they're intentional.

    {
        'key': (
            'long string long string long string'
        )
    }
Using parentheses like this to put long strings on their own line is standard practice.

    title = 'Hello world',
I, for one, have often used this deliberately.


I often use split().

Instead of:

  s = ['a', 'b', 'c']
I'll type:

  s = 'a b c'.split()
For multiline lists where I want to get rid of leading whitespace I'll add lstrip():

  lines = """line 1
             line 2
             line 3
  """.split('\n')
  lines = [line.lstrip() for line in lines]


Heh. "cromulent" again.

..."there are perfectly cromulent reasons a developer would do implicit string concatenation spanning multiple lines"...

https://www.merriam-webster.com/words-at-play/what-does-crom...


Seems expected, as linters can't be sure when it's not intentional. Like this request to pylint:

https://github.com/PyCQA/pylint/issues/1589

Is there usually enough context for a linter to make an educated guess?


I would have thought it would be a no-brainer to just ban it and insist on an explicit + operator. I'm pretty surprised that issue was so flippantly closed.


> I would have thought it would be a no-brainer to just ban it and insist on an explicit + operator.

Maybe as a matter of linting. As a matter of language design, I think + for string concatenation is a big mistake; using different symbols for numeric addition and string concatenation is something Perl got right.


Yes, I meant as a matter of linting. I can understand the arguments being different for the language as a whole, particularly when legacy compatibility is a consideration.

But my impression using pylint is that its default settings are wildly opinionated, hence the surprise that this wouldn't have fallen under that umbrella.


The PR has been merged (for lists and tuples and sets only).

https://github.com/PyCQA/pylint/pull/1655


it can do a good job of allowing long urls, for example, but it would be whack-a-mole trying to cater for "all" purposeful implicit string concatenations


Splitting long URLs onto multiple lines because you have a hard line length limit is considerably more harmful than exceeding the length limit in such cases, because you break the URL up so that tooling (including language-unaware static analysers) can’t conveniently access it. (e.g. if you want to open the link, you can’t just copy it or click on it or whatever, but must first join the lines, removing the quotation marks.) Any tool that forcibly splits up such lines when there is no fundamental hard technical reason why it must is, I categorically state, a bad tool.
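
That's what per-line suppressions are for; e.g. with flake8 you'd keep the URL whole and silence only the length check (placeholder URL):

  # keep the URL intact and suppress just the line-length warning
  DOCS_URL = "https://example.com/a/very/long/path/that/blows/the/line/limit"  # noqa: E501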


Interesting. I've hit this bug before, but not often in Python as far as I can remember. I guess if I need a huge list of something, I'm more likely to look to a dict than use a list with normal indexes.


I wonder how many of those 666 have syntax bugs which are _difficult_ to locate using code analysis tools, because they are legit in themselves and you need to know what the author meant to make the call.


When I used to write code, especially SQL statements, I would:

    "put"
  , "Commas"
  , "first"
  , "to"
avoid these kinds of things.


Clearly bugs by programmers who don’t adhere to the Oxford comma.


Alternative title: 5% of Python repos have inadequate test coverage.


Most of the errors were in the tests themselves.


Haha the bug in Tensorflow is in "tensorflow/tensorflow/python/keras/engine/training_generator_test.py". clickbait.


And then people make fun of JavaScript! (Just joking, I like Python, also JS, I guess everything has its quirks, it's a good thing we have linters)


Ironically there are a variety of typos in the article.

A paragraph is repeated and the markdown links at the end are broken because there is a space between ] and (.


I'm sure the devil is in the details on this bug.


v8 may be a repo that includes some Python, but there is no reasonable standard by which it is a “Python repo”.


I wonder if any of the found issues will turn out to be important issues.


nice ad!


Why 666?


It's a biblical number. No deeper meaning.


Python was never supposed to be a language for anything more complex than basic scripting and prototyping. Use proper languages with static typing (and better speed) for anything serious. And no, JavaScript isn't a good language either.


Yea, but TDD! :eyeroll:


I can see the value of a lint (if there's a newline without a comma, warn), but concatenating strings by multiplication is the correct thing to do (since that's also how it's used in the mathematics of parsers).

Using the plus operator to concatenate strings is just weird.

Think of the usual algebraic properties these operators are supposed to have.

"+" always is supposed to be commutative--so "a"+"b" = "b"+"a", if those mean alternatives (they usually do mean that in mathematics), is just fine.

On the other hand, multiplication is often not commutative--also not here. "a" "b" != "b" "a".

So string concatenation should be the latter. And indeed that's how it's in regular expression mathematics for example.


Juxtaposition is not multiplication in this context - you can't write (2 3), for example, it has to be (2 * 3).

Furthermore, Python already uses * for strings to indicate repetition: ("foo" * 2 == "foofoo").

String concatenation really just needs its own separate operator. & is an obvious candidate, if only it wasn't so commonly appropriated for bitwise AND - which is a very poor use of a single-char operator as it's not something that you need often, especially in a language like Python.

On the other hand, D uses binary ~ for concatenation. That has a neat mnemonic: it's a "rope" that "ties strings together".


& is also used for set intersection in Python. I think + for string concatenation isn't too bad, really. It fits in with the fact that length(s + t) = length(s) + length(t), the same way we write A × B for Cartesian product (since |A × B| = |A| × |B|, even though this operation is neither commutative nor associative) or B^A for a function space (since |B^A| = |B|^|A|).


Which invertible commutative string operation would you choose for + ?

This might be nice from a math point of view, but I think users are going to be confused using "string"^3 for repetitions (instead of "string"*3). + and * make too much sense to the unwashed masses.

At any rate, explicit is better than implicit.


There is no reason you couldn't use str * str for concatenation and str * integer (or even str * real) for repetition.

Well, except if you wanted to support user classes that could duck type as both strings and numbers, which it would make awkward.
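
A toy version of that is easy to sketch with a str subclass (purely illustrative; it only handles int repetition, not "string * real"):

  class MulStr(str):
      def __mul__(self, other):
          if isinstance(other, str):
              return MulStr(str(self) + other)      # str * str: concatenation
          return MulStr(str.__mul__(self, other))   # str * int: repetition

  s = MulStr("foo")
  print(s * MulStr("bar"))  # foobar
  print(s * 3)              # foofoofoo

The awkwardness shows up exactly where you'd expect: __mul__ has to dispatch on the operand's type, so anything that duck types as both a string and a number is ambiguous.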


Not in Lisp! ("foo" "bar") and ("foobar") are lists of length 2 and 1, respectively.

(Python copies some bad ideas from C. Another one is having to import everything you use. It seems that since Python is written in C, its designer took it for granted that there will be something analogous to #include for using libraries, even standard ones that come with the language.)

Implicit string literal catenation is tempting to implement because it solves problems like:

   printf("long %s string"
          "nicely breaks up"
          "with indentation and all",
          arg, arg, ...)
and if you're working in a language which has comma separation everywhere, you can get away with it easily.

There are other ways to solve it. In TXR Lisp, I allow string literals to go across multiple lines with a backslash newline sequence. All contiguous unescaped whitespace adjacent to the backslash is eaten:

  This is the TXR Lisp interactive listener of TXR 273.
  Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
  TXR needs money, so even abnormal exits now go through the gift shop.
  1> "abcd \
      efg"
  "abcdefg"
If you want a significant space, you can backslash escape it; the exact placement is up to you:

  2> "abcd\ \
      efg"
  "abcd efg"
  3> "abcd    \
     \ efg"
  "abcd efg"
  4> "abcd    \ \
               efg"
  "abcd     efg"
  5> "abcd    \ \
     \         efg"
  "abcd              efg"


I like imports; they tell me what files symbols are coming from, even for built-in libraries.

Maybe it's because through my work I use a half dozen languages, and it's hard to remember each in detail.

I have also worked on a JavaScript project where there were no imports/requires and the build process created one file. So you had to inspect the confusing build script to even know what was what.


I like the explicit nature of Python's imports.

And especially how I can choose the best way to indicate the sources of names in my code:

   import time
   t = time.perf_counter()

   import time, my_module
   t1 = time.perf_counter()
   t2 = my_module.perf_counter()

   from time import perf_counter as std_counter
   from my_module import perf_counter as my_counter
   t1 = std_counter()
   t2 = my_counter()

   try:
       from my_module import perf_counter
   except ImportError:
       # Fall back to standard implementation
       from time import perf_counter
   t = perf_counter()

   # import time as m  
   import my_module as m
   t = m.perf_counter()


> import perf_counter as my_counter

Yikes; you're renaming/aliasing global identifiers! Just no.


You could fairly easily work with a bunch of .js files that get catenated together by using an editor that can jump to a definition.

Build processes creating one file is the seven decade norm in computing.

Even if you literally don't catenate the .js files into one, they get loaded into one running image one way or another.


You mean

  long %s stringnicely breaks upwith indentation and all
? In my experience, this always gets ugly when you want to insert spaces (which is about always). Do you put them at the end or at the start of each string (apart from the first or last string)?

I think Scala’s mkString (https://superruzafa.github.io/visual-scala-reference/mkStrin...) is, visually, the best solution for such things, but unfortunately it would require hacks in the parser to do the concatenation at compile time, where possible.

Scala’s multiline strings look nice, too, if you want to insert newlines, except for the stripMargin thing (https://docs.scala-lang.org/overviews/scala-book/two-notes-a...)


The spaces aren't the point of the comment; rather, that we can break the literal into pieces and indent those pieces without affecting the contents. In a non-strawman real example with real data, of course we include all the necessary spaces in the literals. However, this bug is easy to make in C; I've seen it numerous times.


That’s precisely my point. This looks nice, but it’s too easy to forget one of those spaces and too hard to spot that.


I don't know of a good design that won't lead you to make errors when you don't want the spaces. You'd need some piece of syntax which indicates whether you want a space there or not. For instance, there could be a rule that a string literal ending in non-whitespace cannot be joined with a literal starting with non-whitespace:

  "foo" "bar"     // error
  "foo " "bar"    // OK
  "foo" " bar"    // OK
  "foo" "" "bar"  // OK: "" doesn't start with non-whitespace, since it's empty
  "foo" " " "bar" // OK
The nice thing about this is that it's perfectly compatible with existing C.

All we have to do is to implement a compiler warning which detects when the rule is violated.

Users who implement it have to fix situations like "foo" "bar" into "foo" "" "bar".

Probably the rules should be smarter. Some kind of tokenization concept could be at play so that gluing together two letters or digits is bad, or two punctuation tokens, but letter/number and punctuation is okay.

  "foo" "1"     // error? OK?
  "foo" "bar"   // error
  "foo" ".bar"  // OK
  "1." "2" ".3" // OK
 
  "1." ".2"     // error: punct-punct


> Another one is having to import everything you use.

The alternative is what exactly? Have the entire standard library exposed at once? Make all modules create non-conflicting names for exported objects, so that the json parse function has to be called json_parse and the csv parse function has to be called csv_parse?

Seems less than ideal to me.


That's one way.

If these things are classes in a plain old single-dispatch OOP system, you can have a json-parser and csv-parser which have parse methods.

There could be packages/namespaces. So csv:parse and json:parse. These packages are standard and so they just exist; nothing to import.

In Python, you cannot use anything without an import! The top-level modules (which serve as de facto namespaces) themselves are not visible.

Say there is a csv module with a parse. You cannot just do:

  csv.parse(...)
you have to first say

  import csv
This is jaw-droppingly moronic.


It lets you debug. E.g. if they have made a file called csv.py in the same directory, then print(csv.__file__) will show you this. If they have some weirdly screwed-up paths with multiple Pythons installed and multiple copies of the modules etc., same.

I will note Go has the same feature carried forward from C. It helps a lot on the reading side of the code lifecycle. And the Go compiler makes you keep the imports up to date, which is good.


> It lets you debug.

It lets you debug Python problems which the system created in the first place.

> If they have some weirdly screwed up paths with multiple pythons installed and multiple copies of the modules etc., same.

Doesn't happen in a sane language, or at least not in a sanely defined language/implementation.

I can easily have multiple different GCC copies (possibly for different processor targets) on the same machine. Each one knows where its own files are; an #include <stdio.h> compiled with your /path/to/arm-linux-eabi-gcc will positively not use your /usr/include/stdio.h, unless you explicitly do stupid things, like -I/usr/include on the command line.


You might like [Hissp's][1] import system. It does compile down to Python.

[1]: https://github.com/gilch/hissp


> This jaw-droppingly moronic.

It can be slightly inconvenient but doesn’t feel moronic to me. It means that except for the built-in functions, everything can be traced to either a definition or an import. Makes tracking code much easier.


Why not import the built-in functions too? The only thing not requiring import can be import.

  from python import def  # now you can def
That should be even easier to track things; now you don't have to deal with the difficulty of def not being defined anywhere in your code. It's traced to an import, which is telling you that def comes from python, liberating you from having to know that and remember it.


"def" is not a function. It's not even an identifier.


This exercise requires you to imagine a somewhat different Python in which you can (and must) do from python import def if you are to use def.


I mean, that would be Lisp, and in that context I wouldn't really have a problem with it.


   @"
  here strings in PS are fine for this purpose and 
   even allows whitespace anywhere            
    but because of the latter you can't indent it    
     with your other code   
 "@ -split "`r`n" | % {'<SOL>{0}<EOL>' -f $_ }
 <SOL>    here strings in PS are fine for this purpose and <EOL>
 <SOL>     even allows whitespace anywhere            <EOL>
 <SOL>      but because of the latter you can't indent it    <EOL>
 <SOL>       with your other code   <EOL>


I posted a Unix StackExchange answer with some tricks for doing this in shell programming, very similar to your <SOL> trick.

https://unix.stackexchange.com/questions/76481/cant-indent-h...


Having everything be imported is what makes the language usable. Especially if you never import * you can easily find the definition and meaning of everything you read on the screen. A prime example of explicit is better than implicit.

And backslash doesn’t let you have the literal obey the proper indenting. Might as well use “””
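
And for the indenting point, textwrap.dedent from the standard library is the usual answer (minimal example):

  import textwrap

  s = textwrap.dedent("""\
      line 1
      line 2
      """)
  print(repr(s))  # 'line 1\nline 2\n': the common leading indent is stripped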


> you can easily find the definition and meaning of everything you read on the screen

I don't want to be finding definitions of things that the language provides in the code.

Languages that don't work this way have IDE's, editor plug-ins or other tools for easily finding the definitions of things that are in the language, without hunting for them through intermediate definition steps in the same file.

"I've spent all my life in and out of jails, so I expect bars on doors and windows ..."


I'm gonna disagree on the import thing. Compared to Ruby where requires are magic bags of metaprogramming bullshit, Python is much much easier to reason about. It takes some getting used to that require 'json' actually adds methods to existing classes.


"require 'json'" is just another #include in disguise, and if it monkey patches existing classes, it ... probably should not exist in any form.

If the language supports json, it should just do that.

  1> #J[1,2,3]
  #(1.0 2.0 3.0)
  2> (get-json "[1,2,3,{\"foo\":true}]")
  #(1.0 2.0 3.0 #H(() ("foo" t)))
  3> (put-json #(1.0 2.0 t))
  [1,2,true]t


Welcome to Ruby.

    $ irb
    irb(main):001:0> { hello: "world" }.to_json
    NoMethodError (undefined method `to_json' for {:hello=>"world"}:Hash)

    irb(main):002:0> require 'json'
    irb(main):003:0> { hello: "world" }.to_json
    => "{\"hello\":\"world\"}"


I mean, I understand that classes being open to extension with new methods is useful, and the right way to do OOP and all.

If it was CLOS with multiple dispatch, it would be easier to swallow. Because it would look like:

   (to-json { hello: "world" })
   ;; error: no such function!
Then load the module, and you have a generic to-json function now, with a method specialized to handle the dictionary object and all. (I still wouldn't want to be doing this if it's supposed to be a language built-in).

I regard the ability to add new methods to a class as good, but with a valid use case, like extending some third party piece with new methods in your own application. And the fact of not having to declare methods in a class definition, which is cumbersome. Just write a new method in that class's file, at the bottom, and there it is.

I ideally don't want that third-party piece itself to be divided into three pieces that I have to separately load to get all of the methods. Or worse, pieces from separate third parties that add methods to each other.

I copied a thing or two from Ruby in TXR Lisp. The object system has a derived hook, and that was inspired by something in Ruby:

  1> (defstruct foo ()
       (:function derived (super sub) (prinl `derived @super @sub`)))
  #<struct-type foo>
  2> (defstruct bar foo)
  "derived #<struct-type foo> #<struct-type bar>"
  #<struct-type bar>
  3> (defstruct xyzzy bar)
  "derived #<struct-type bar> #<struct-type xyzzy>"
  #<struct-type xyzzy>
The derived hook is inherited (like any other static slot), so it fires in bar also. The function can distinguish which class is being derived by the super argument.


The difference is: in C, it's pretty unlikely someone wants to add strings. I suppose it's even illegal in the later C versions.


It is positively not illegal in any standard version of C since ANSI C89.

It's an essential feature used in all sorts of everyday code.

C99 added printf conversion specifiers that are hidden behind macros, and idiomatic usage of them relies on string catenation.

  #include <inttypes.h>  /* the PRI macros live here */

  uint32_t x = 0;

  printf("x = %" PRIx32 "\n", x);
where PRIx32 might expand to "lx" (if uint32_t is the same as unsigned long in that compiler). Note the macros expand without the leading "%"; you write that yourself, and catenation glues the pieces into one format string.

All sorts of C macrology relies on string catenation. Kernel print messages:

  printk(KERN_EMERG "%s: temperature sensor indicates fire!", dev->name);
                   ^ must not have comma here


Interesting. Arguably tho this shows how C is aging. I find that PRIx32 a bit ugly.

Although I just had a (logging) use case in go where I missed cpp macros - wanted the log statement to get something from the file and just had to pass it in as another parameter.


I have also never used PRI-anything. It's a crime against readability.

If I have a uint32_t which needs printing I cast it to (unsigned long) and use %lu or %lx. This requires more typing in the argument list, but keeps the format string tidy. It's important for the format string to be tidy, because that's the reason of its existence: to clearly and concisely convey the shape of what is being printed.


I know that. I meant that “abc” + “def” is most likely illegal (although “abc” + ‘d’ is not).


> I meant that “abc” + “def” is most likely illegal

That would be adding 2 pointers, and that's indeed illegal.

However, you can subtract them: “abc” - “def” . Now, the result is not a pointer any more, it's a ptrdiff_t (an integer type), so most compilers will warn if you try to assign that to a char *.


You started talking about "adding strings" in a thread about adjacent literals, without mentioning any + operator.

String catenation ("adding") by adjacency (no visible operator) is a thing; "add" doesn't imply that we are talking about a + operator:

  $ awk 'BEGIN { x = "abc-" 2 + 2 "-def"; print x}'
  abc-4-def


Because the parent compared Python's behavior to that of C. The difference of course is that adding strings doesn't make sense in C, so there's no danger of misinterpreting "abc" "def" in C, as there is in Python.


The same comma typo bug could happen in C.

  execl("/bin/sh", "/bin/sh", "-c"
        "echo foo", (char *) NULL);
Here we get one "-cecho foo" argument passed to the shell instead of two, so it can't work.

Initializers are another example:

  char *strArray[] = {
    "how", "now"
    "brown", "cow"
  };
In non-variadic function and macro calls, you will most likely get an insufficient arguments error, unless another mistake compensates for that.


The Python certainly looks nicer though.



