The Descent to C (2013) (greenend.org.uk)
156 points by x-sp on Sept 3, 2021 | 91 comments


> (In fact, that's what array[i] means – the language defines it to be a synonym for *(array+i).)

To really drive home the primitiveness of C arrays, should probably also mention that, because addition is commutative, you could also write

     i[array]
and somewhat surprisingly, it will compile, and work, and it means "*(i + array)" which is equivalent to "*(array + i)"

But nobody really does that, because that would be kind of insane.


Once the point is home, you can drive it a little bit more by exploiting the fact that string literals can be converted to pointers to the first characters, and do

    putc(2["ABCDEF"], stdout);
This prints 'C'.


int const* const x; // C

int const& x; // C++

A reference is functionally equivalent to a const pointer. (Reference reassignment is disallowed. Likewise, you cannot reassign a const pointer. A const pointer is meant to keep its pointee [address].) The difference between them is that C++ const references also allow non-lvalue arguments (temporaries).

It is much easier to read from right to left when decoding types. Look for yourself:

- double (* const convert_to_deg)(double const x) // const pointer to function taking a const double and returning double

- int const (* ptr_to_arr)[42]; // pointer to array of 42 const ints

- int const * arr_of_ptrs[42]; // array of 42 pointers to const ints

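- int fun_returning_array_of_ints()[42]; // function returning array of 42 ints (it decodes this way, but C rejects it: functions cannot return arrays)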

Try it out yourself: https://cdecl.org/

Hence, I am an "East conster". (Many people are "West consters" though.)

You can return function pointers:

    typedef struct player_t player_t; // let it be opaque ;)

    int game_strategy1(player_t const * const p)
    {
        /* Eliminate player */
        return 666;
    }

    int game_strategy2(player_t const * const p)
    {
        /* Follow player */
        return 007;
    }

    int (* const game_strategy(int const strategy_to_use))(player_t const * const p)
    {
        if (strategy_to_use == 0)
            return &game_strategy1;

        return &game_strategy2;
    }

Functional programming = immutable (const) values + pure functions (no side effects).

Consting for me is also a form of documentation/specification.

"East const" for life! :)


10 or 42?


Thank you. 42. I edited my comment above.


If array were, say, uint32_t* then what `*(array + i)` would do is actually `(intptr_t)array + i * 4` and not `(intptr_t)array + i`. If array were uint16_t* then it's `(intptr_t)array + i * 2`. In short the way the pointer arithmetic gets translated greatly depends on the type of the pointee and thus is not as primitive as it can be.
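A small sketch of that scaling, assuming 8-bit bytes so that uint32_t occupies 4 bytes and uint16_t occupies 2. The function names are made up for illustration; each one measures the byte distance between `array` and `array + i`.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance between array and array + i for a uint32_t pointer.
 * The compiler scales i by sizeof(uint32_t) behind the scenes. */
ptrdiff_t byte_offset_u32(const uint32_t *array, size_t i)
{
    return (const char *)(array + i) - (const char *)array;
}

/* Same measurement for a uint16_t pointer: i is scaled by sizeof(uint16_t). */
ptrdiff_t byte_offset_u16(const uint16_t *array, size_t i)
{
    return (const char *)(array + i) - (const char *)array;
}
```

With i = 3 the first function reports 12 bytes and the second 6, even though both expressions read `array + i` in the source.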


I think it depends on the coder if it makes more sense to them to write out the scale by element size or not. For me, the `+` is just pointer arithmetic and of course an addition in number of elements (so I don't think of the scaling at all).

Just saying that "actually it's just array + i" makes more sense - for me(!).


This is actually a really good way to drive home that arrays don't really "exist" in C and are just syntactic sugar for pointer arithmetic.


They exist in the sense that their length is sometimes known if the full definition is visible. sizeof(array) isn't the same as sizeof(generic ptr to array).


That is not the complete picture. For example, the sizeof operator returns the declared size of an array, so C does recognize the concept of an array.
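A minimal sketch of the sizeof difference described in the two replies above. The function names are hypothetical; the point is only that the array type survives where the definition is visible and is lost once the array is passed as a pointer.

```c
#include <stddef.h>

/* Here the parameter is a plain pointer, so sizeof reports the pointer size. */
size_t size_seen_through_pointer(const int *p)
{
    return sizeof p;
}

/* Here the full array definition is visible, so sizeof reports 10 * sizeof(int). */
size_t size_of_real_array(void)
{
    int array[10] = {0};
    return sizeof array;
}
```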


"Because real programmers only need void*".


There's a language for that https://github.com/kyouko-taiga/void-lang


This sub-thread is better than a Vim thread. :)


The OP probably doesn't want people to run away and never look back


Interesting. Looks like Forth. :-)


Speaking of such funny business: unfortunately I have seen *(array + i) quite a few times.

To make it worse, it’s in a parser for external data in binary format, where you really shouldn’t be playing funny tricks.


> nobody really does that, because that would be kind of insane

Yeah. That's why people don't do things in C. It's more like most C programmers probably weren't aware of this. After your comment, we'll start to see C codebases everywhere with that.


Most programmers probably weren't aware of this, but no true Scottish C programmer would be unaware of it.

If you want to manipulate memory directly - which is risky though sometimes useful - C is one of the best languages in which to do it. Memory addresses are numbers, and C will let you work with those numbers in whatever way you want: add, subtract, multiply, divide... and if you didn't shudder at the suggestion of dividing a pointer because there are very, very few reasons to do so then C is not the language for you!

If you don't want to manipulate memory directly, you probably shouldn't be using C; stick with a nice garbage-collected, type-safe, object-oriented, cross-platform language. If you do want to manipulate memory directly, but you want more guarantees on what you can do with pointers, try Rust.


Just wanted to clarify that Rust allows for the same manipulation as C, it just all has to happen in the context of an unsafe code block. I think your comment might be taken to imply that Rust isn’t as capable as C in that regard.


Plenty of languages allow for the same memory handling capabilities, including BASIC.


Eh? Says who? Every C programmer I've ever met knows about this. It's basic C.


Yeah you have to do something like this if you want to truly raise eyebrows.

    /\
    */ best c comment
    *\
    /


One trick that I like is replacing { and } with <% and %>.


Wow, a new C trick I didn't know about. What's the history here?

Also, gross.


On some ancient systems { and } don't exist. So an alternative made of more common characters is provided. There is also a two-character combination for [ and ] and a three-character combination for |, & and some other characters. The feature is called digraphs/trigraphs and is disabled on most modern compilers by default.


> The feature is called digraphs/trigraphs and is disabled on most modern compilers by default.

Digraphs and trigraphs are treated differently. <% and %> are fine on GCC and Clang, but ??< and ??> are disabled by default. Both of those trigraphs need the -trigraphs option.
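For anyone who wants to see the digraphs in action, here is a short sketch with the same function written twice. The function names are invented; `<%`, `%>`, `<:` and `:>` are the standard digraph spellings of `{`, `}`, `[` and `]`.

```c
/* Sum an array using digraphs instead of braces and brackets. */
int sum_digraph(const int *values, int count)
<%
    int total = 0;
    for (int i = 0; i < count; i++) <%
        total += values<:i:>;
    %>
    return total;
%>

/* The same function with ordinary punctuation, for comparison. */
int sum_plain(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```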


Ah okay I knew about that whole thing, I didn't think C had so many of them.


I learned of this in college. Most C programmers know how C arrays work.


This “trick” has been known at least as far back as 2008: https://stackoverflow.com/questions/381542/with-arrays-why-i...


It's actually spelled out in K&R in more technical language:

"The array subscripting operation is defined so that E1[E2] is identical to *(E1+E2). Therefore, despite its asymmetrical appearance, subscripting is a commutative operation."

(my emphasis)


One of the very first IOCCC winners used the trick in 1984 (1984/anonymous):

    int i;main(){for(;i["]<i;++i){--i;}"];read('-'-'-',i+++"hell\
    o, world!\n",'/'/'/'));}read(j,i,p){write(j/p+p,i---j,i/i);}


I think this is equivalent to:

    int i;
    int main(){
        for(
            ;
            i["]<i;++i){--i;}"]; // loop until i == 14? (the NUL byte on the end)
            read('-'-'-',i+++"hello, world!\n",'/'/'/') // read(0, <one byte of "hello, world!\n" at a time>, 1)
        ) {};
    }

    int read(j,i,p){
         write(j/p+p,i---j,i/i); // write(0/1+1, i-- - 0, 1) --> write(1, i, 1) --> write a byte to STDOUT
    }


what does this program do? i compiled it and ran it and got no output. i hope i haven't fork bombed myself or something


Prints "hello, world!", assuming sizeof(int) == sizeof(char*) (and the same alignments and ABI, etc).


It's not inherent in C being a low-level language that it has such a painful memory model. It's a consequence of having to fit the compiler into a really tiny machine by modern standards. Originally, there were no function prototypes.

I once proposed a backwards-compatible way out for C.[1] Basically, you get to talk about arrays as objects with a length, even if that length is in some other variable. And you get slices and references. It was discussed enough to establish that it could work, but the effort to push it forward wasn't worth it.

Slices are important. They let you do most of the things people do with pointer arithmetic. Once you have size and slices, you can still operate near the memory address level.

[1] http://www.animats.com/papers/languages/safearraysforc43.pdf
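To make the slice idea concrete in today's C, here is a rough sketch of a pointer-plus-length pair with bounds-checked helpers. This is not the syntax from the linked proposal; `int_slice`, `slice_sub` and `slice_get` are hypothetical names chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* A slice: a pointer to the first element plus an element count. */
typedef struct {
    int    *data;
    size_t  len;
} int_slice;

/* Take the sub-slice [start, end). Returns false (and leaves *out alone)
 * if the requested range does not fit inside s. */
bool slice_sub(int_slice s, size_t start, size_t end, int_slice *out)
{
    if (start > end || end > s.len)
        return false;
    out->data = s.data + start;
    out->len  = end - start;
    return true;
}

/* Bounds-checked element read. Returns false if i is out of range. */
bool slice_get(int_slice s, size_t i, int *value)
{
    if (i >= s.len)
        return false;
    *value = s.data[i];
    return true;
}
```

The pointer arithmetic is still there (`s.data + start`), but it happens in one place that knows the length, which is roughly the property the comment is after.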


It's similar to what I proposed in 2009:

https://www.digitalmars.com/articles/C-biggest-mistake.html

except mine is much simpler :-)

The proven utility and effectiveness of this is apparent in D.


By now it should be clear that WG14 doesn't care about improving C's safety.

A type like yours could be added, just like a string library like SDS, but it will never happen.


Hey! The author wrote PuTTY!

That was a very informative read. I feel like I am the exact target audience: I'm coming from a programming background steeped in C# and I'm learning C from the K&R book.


There is also the excellent “Modern C” book:

https://modernc.gforge.inria.fr

The page has a link to a free PDF version.


Never thought pointers could expire, but it makes sense.


That's a fairly common trap for new C coders. Setting a pointer to something on the stack and then returning that pointer when you exit the function. It's extra insidious because it will appear to be working fine until you call another function and then try to dereference the pointer. Then your data suddenly becomes corrupt even though your program was nowhere near it.
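A sketch of the trap and one common way around it. The broken version is kept inside a comment on purpose, since calling it would be undefined behavior; `make_label` is a made-up name, and letting the caller own the buffer is just one of several possible fixes.

```c
#include <stddef.h>
#include <stdio.h>

/* BROKEN, shown only as a comment: buf lives on the stack and ceases to
 * exist when the function returns, so the returned pointer is dangling.
 *
 *     const char *make_label_broken(int id)
 *     {
 *         char buf[32];
 *         snprintf(buf, sizeof buf, "item-%d", id);
 *         return buf;
 *     }
 */

/* One fix: the caller provides the storage, so the pointer that comes
 * back stays valid for as long as the caller's buffer does. */
const char *make_label(char *buf, size_t size, int id)
{
    snprintf(buf, size, "item-%d", id);
    return buf;
}
```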


Unless it points to a static local variable.


This is (sort of) the idea behind Rust's lifetimes. (Rust focuses more on scope, whereas pointer expiry is more about “object lifetimes”, but it's basically the same thing.)


I don't think that's true, it's just that the lifetime of a stack object is limited by its scope. Clearly when it goes out of scope its lifetime needs to end. You can end that lifetime prematurely, for example if you drop(x) then the lifetime of x ends immediately [ in fact the implementation of drop is entirely empty, but it takes the actual x itself as a parameter, not a reference to it, and so when the empty function exits the parameter's lifetime ends and x goes away ] even though it hasn't gone out of scope in your function where you created x.

I believe once upon a time Rust needed a lot more hand-holding rather than just inferring lifetimes from scope, but in modern Rust the behaviour feels pretty natural for scopes.


Scope and lifetime are two different things, but they're related in the case of automatic (local) variables.

Scope is the region of program text in which an identifier is visible.

Lifetime is the duration during program execution in which an object (stored in memory) exists.

If you define a local variable, the scope of its identifier extends from its declaration to the end of the enclosing block; its lifetime is the execution of the enclosing block.

If you allocate an object on the heap by calling `malloc()`, its lifetime ends when it's deallocated by calling `free()`.

An object defined at file scope or with the `static` keyword has a lifetime that's the entire execution of the program.

In every case, if you have a pointer to the object, that pointer becomes invalid when the object it points to reaches the end of its lifetime. (And C doesn't make it particularly difficult to cause problems by trying to access an object that no longer exists.)
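A short sketch of two of the cases above, using invented function names. In the first, the identifier is only in scope inside the function, yet the object lives for the whole run. In the second, the lifetime is controlled by hand.

```c
#include <stdlib.h>

/* Static storage: `calls` is visible only inside this function (scope),
 * but the object persists across calls (lifetime = whole program). */
int count_calls(void)
{
    static int calls = 0;
    return ++calls;
}

/* Allocated storage: the object outlives this function and exists until
 * the caller passes the pointer to free(). Returns NULL on failure. */
int *make_counter(int start)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = start;
    return p;
}
```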


Rust used to use lexical scope, but now uses scope based on the control flow graph.


Some past threads:

The Descent to C - https://news.ycombinator.com/item?id=15445059 - Oct 2017 (2 comments)

The Descent to C - https://news.ycombinator.com/item?id=8127499 - Aug 2014 (15 comments)

The descent to C - https://news.ycombinator.com/item?id=7134798 - Jan 2014 (230 comments)


> So why is C like this, anyway?

Worth mentioning that C is over 40 years old, and was designed to be easily portable across a range of machines that had less compute power and memory than today's smaller microcontrollers.

As a result, a lot of things were left undefined, or were designed in a way to be easy to implement rather than easy to program for.

There existed other programming languages that were better, but their compilers weren't as broadly available, and their better features came at the cost of speed, which at the time was a premium.


> left undefined, or were designed in a way to be easy to implement rather than easy to program for.

I'd tweak your statement a little, or even a lot: "left undefined" most often meant "left to be defined by the compiler writers to fit the architecture of the underlying hardware, in a way that would make it easy to program to beneficially exploit features of the architecture"; and (yes) in a way "that would not be very portable, and might even be subject to change between compilers".

did the underlying machine use 2's complement? did the underlying machine have addressable bytes? big endian? 8, 16, 32 or 36 bits?

These are all things you need to know to write tight efficient code in the days of slow clockspeeds and limited RAM. C let you do that without using assembly, but by using the "undefined" features of the language, because they were clearly defined locally and were features that were very important to be easy to write code for.

consider how you would implement setjmp and longjmp, or even printf, or efficiently unpack or serialize bits for a communications protocol, without these supposedly "undefined features", or how you would write those if those features were actually undefined. People who put strlen(str) or a divide and a mod in the control expression for a loop would know better if they understood a bit more about the undefined features.

this is in contrast btw with some other things that actually are undefined, such as what the order of evaluation would be for complex expressions making up argument lists, etc.

I'm writing this explanation not so much to explain these technical details to noobs, but rather to get the people who understand this stuff to stop throwing around the term "undefined" with regard to C because they are cooperating in the evisceration of some ideas that are really worth exploring or understanding more deeply.


Sorry, you're absolutely right, and I was lazy in my comment.

There is indeed a big difference between "undefined" behavior and "implementation-defined" behavior.

For example, there's a lot of spooky "undefined behavior" around dereferencing pointers. In one famous case [1], dereferencing a pointer actually led the compiler to skip a later check on whether the pointer was NULL, because if the pointer was already dereferenced then it must have been valid.

Another classic "implementation-defined" detail is what is the size of a "char"? Nowadays we can readily assume it's 8 bits, but that wasn't so guaranteed when C was written!

[1] https://lwn.net/Articles/342330/
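The general shape of that dereference-then-check problem, as a sketch. This is not the code from the linked article; `struct device` and `device_flags` are hypothetical names. The risky ordering is left in a comment and the function shows the safe ordering.

```c
#include <stddef.h>

struct device {
    int flags;
};

/* Risky ordering, shown only as a comment:
 *
 *     int flags = dev->flags;      // dereference first ...
 *     if (dev == NULL)             // ... so the compiler may assume dev is
 *         return -1;               //     non-NULL and drop this check
 *
 * Safe ordering: test the pointer before touching what it points to. */
int device_flags(const struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->flags;
}
```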


16-bit chars still persist in DSP land. It is a tribute to C's flexibility that it can easily be implemented for such architectures.
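Code that wants to stay portable to such targets can ask the implementation instead of assuming 8. A tiny sketch, with a made-up helper name: `CHAR_BIT` from `<limits.h>` is the number of bits in a char, and the standard only guarantees it is at least 8.

```c
#include <limits.h>
#include <stddef.h>

/* Width of int in bits, computed without assuming 8-bit chars.
 * (Counts padding bits too, on the rare implementations that have them.) */
size_t bits_in_int(void)
{
    return sizeof(int) * CHAR_BIT;
}
```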


that's interesting, I've done DSP on paper (and microcode, i'm old school), but never on such an architecture.

and it's interesting to think, "well, that's useful, let's make it a defined behavior!" but then that defined behavior could be a nightmare of inefficiency on a machine with a different architecture. (i mean, probably not for the particular case of 16 bit chars, but in the general case quite unpredictable)


Not really related to C, but I once worked on an SoC where the DSP was mixed 16/24-bit, so they mapped each word to the ARM 32-bit bus. That is, each 4-byte address would map to one 16-bit (or 24-bit) word of the DSP's memory & register space.

But under the DSP was an I2C-based ISP that was glued in, and _its_ 8-bit words were mapped to each of the DSP's 16-bit (/24-bit) words.

So if the ARM wanted to talk to the ISP's control space, it read one 8-bit word of the ISP in each 32-bit word address.

The cherry-on-top was that the DSP was Big-Endian and the ARM was Little-Endian, and so you had to swap all the data you were exchanging with it.

Still the biggest HW WTF I've encountered in my career. This was the STMicroelectronics Nomadik 8810/8815/8820 chips. https://en.wikipedia.org/wiki/Nomadik
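For reference, the kind of swap needed when exchanging 32-bit values between a big-endian and a little-endian side looks roughly like this. It is a generic sketch, not code for that chip, and `swap32` is an invented name.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value (big-endian <-> little-endian). */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}
```

Swapping twice gives back the original value, which makes for an easy sanity check.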


Languages like Ada and Modula-2 have such issues much better defined in their standards, but don't come with something like UNIX as an extra package.


C was created to port UNIX from PDP-7 Assembly into the newly acquired PDP-11, for the V5 port, and was for many years only available on UNIX.

Likewise, those other languages were only available on their host OS.

The symbolic price for the license, availability of source tapes and the UNIX V6 annotated source code book made the rest.


Some nitpicks with the article:

1. Much of section 2 is wrong except the part about arrays representing contiguous objects. The rest is largely an implementation detail.

Zeta C and Vacietis work considerably differently, as allowed by the standard

In addition there are many (mostly obsolete now) architectures in which, when you convert a pointer to an integer, you can't perform arithmetic and convert back because a pointer isn't just an integer address; it could represent segments or support hardware tags.

> C will typically let you just construct any pointer value you like by casting an integer to a pointer type, or by taking an existing pointer to one type and casting it so that it becomes a pointer to an entirely different type.

To be fair, they do say "typically" in here, but these behaviors are (depending on the case) all either implementation defined or undefined; the C standard specifies a union as the only well-defined way to type-pun to non character types.
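A sketch of the union approach mentioned above, reading the bits of a float as an integer. It assumes float is 32 bits wide and uses IEEE 754 encoding, which is common but not guaranteed. The second function copies the bytes with memcpy, which is another approach commonly used for the same job; both function names are made up.

```c
#include <stdint.h>
#include <string.h>

/* Type punning through a union: write one member, read the other.
 * Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits_union(float f)
{
    union {
        float    f;
        uint32_t u;
    } pun;
    pun.f = f;
    return pun.u;
}

/* Same result by copying the object representation byte for byte. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```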

> The undefined-behaviour problem with integer overflow wouldn't happen in machine code; that's a consequence of C needing to run fast on lots of very different kinds of computer, which is a problem machine code doesn't even try to solve

Some architectures trap on integer overflow, which I suspect is the reason why integer overflow is undefined rather than implementation defined. Certainly compilers today take advantage of the fact that it is undefined to make certain useful optimizations, but from what I can tell of the history that's not why it was undefined in the first place.


To be clear, signed integer overflow is undefined. Unsigned integer overflow is well defined.

This is why some C programmers dictate that all code must use signed integers to avoid unexpected bugs, but many others (including myself) disagree that's a good way of going about it since, as you said, it's not guaranteed to trap or do anything to help the programmer.
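A sketch of the difference, with hypothetical helper names. The unsigned increment wraps around by definition. The signed addition has to be checked before it is performed, because once the overflow has happened the behavior is already undefined.

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic is defined to wrap modulo 2^N. */
unsigned int wrapping_increment(unsigned int x)
{
    return x + 1u;
}

/* Signed addition that refuses to overflow: the range test runs first,
 * so a + b is only evaluated when the result is representable. */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```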


I've never seen signed integers being required. They tend to introduce unexpected bugs rather than prevent them. Of course you can accidentally end up with signed integers when you didn't intend it. Here's my favorite accidental undefined signed integer overflow, assuming 32-bit integer size (the 64-bit version is similar):

  uint32_t foo(uint8_t x) { return x << 24; }
Yes, this is signed integer overflow: x gets promoted to a signed int before the shift, so if int is 32 bits wide, this results in undefined behavior when the top bit of x is set. Fortunately I've never seen a compiler optimize this to stupidity.
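One way to sidestep the promotion, sketched with an invented function name: convert to the unsigned 32-bit type before shifting, so the shift happens in unsigned arithmetic.

```c
#include <stdint.h>

/* Cast first, then shift: the operand is uint32_t, so shifting a set
 * top bit into position 31 is well defined. */
uint32_t shift_to_top_byte(uint8_t x)
{
    return (uint32_t)x << 24;
}
```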


If memory serves, the LLVM Project prescribes signed integers be used in all cases except where unsigned is mandatory.


I use C because I think that it is good (some improvements could be made, but the other programming languages which try to make C better tend to make many things worse in my opinion; I like many (but not all) of the features of (Digital Mars) D, though). I think that the worst feature of C is the confusing syntax for types. (Maybe the way to make a better one might be like "LLVM with macros", although there are a few problems with LLVM too.)

Some compilers and libraries, such as GNU, do have some improvements. For example, in GNU you can make zero-length arrays (which I use sometimes), ?: without anything in between (which I use often), and some other things.

They say there is no object-oriented programming in C. Well, C doesn't have object-oriented features, although you can still do some limited object-oriented stuff in cases where it is useful. For example, there is the stream object (called FILE); GNU has a fopencookie function to write your own implementation of the stream interface too, even though standard C doesn't have that.

Object-oriented programming is good for some things but too often is overused in modern programming, I think. You shouldn't need object-oriented programming for everything.

It is true that some of the undefined behaviour stuff is too confusing and perhaps should be changed; in some cases the compiler has options to control these things, such as -fwrapv (which I often use).

I like the string handling of C; you can easily skip some from the beginning, and can sometimes use the string functions with non-text data too, and it doesn't use Unicode.

It says "C lets you take the address of any variable you like", but this is not quite true. There is the "register" storage class, and you cannot take the address of a variable declared with it.

I think that many things in C (both things that they mention and some that they don't) (including pointer arithmetic, no bound checking, untagged unions, string handling, not using Unicode, setjmp/longjmp, etc) are often advantages of C.


Great article, very informative and easy to read. I wish I had known about this when I was learning C/C++ (around the time it was published no less), coming from higher-level languages like C# and PHP.


How is C# higher-level than C? I'm not aware of anything you can do in C that you can't do in C# as a first-class feature.

Edit: I define the level of a language as the lowest-possible feature. I guess others define level as the highest-possible feature? I don't really know who's right here.


“...a high-level programming language is a programming language with strong abstraction from the details of the computer.”:

https://en.m.wikipedia.org/wiki/High-level_programming_langu...


C# has exceptions, OO classes / interfaces, `unsafe`, etc. Lots of features that make it a higher level language than pure C.


In the same way as C is a higher-level language compared to assembler.


The following is NOT a criticism of C. Just pointing out different problem domains.

> Modern high-level languages generally try to arrange that you don't need to think

> or even know – about how the memory in a computer is actually organised

Modern high-level languages try to arrange that you don't need to focus on the irrelevant. If you're working on, say, an accounting system, memory layout is not part of the problem you are trying to solve.

For certain applications, C is simply too low level.

A language is too low level when it forces you to focus on the irrelevant.

For low level operations you probably cannot beat C.


> For low level operations you probably cannot beat C.

It's not hard to beat it. For example, you cannot do vector operations in C. (Hence C compilers often offer extensions.)


If your requirements include performance, resource efficiency and responsiveness, memory layout is the opposite of irrelevant.

I can't imagine living in a 3rd world country with today's Electron catastrophe.


As per the somewhat famous blog post: C is not a low-level language. Especially on today's CPUs I fail to see why we would consider C anything close to truly low level. It has no real way of managing cache, has absolutely zero support for vector instructions, etc. *

With these in mind, Rust is lower and higher level than C at the same time.

* other than some compiler specific pragmas, but I would be hesitant to call that natively supported


> Especially on today's CPUs I fail to see why we would consider C anything close to truly low level.

It's effectively the lowest you can go without throwing portability out the window. It doesn't let you manage caches directly, but it gives you good control over memory layout in general, and that's often enough to give you good cache usage across a variety of chips.

If you want to go lower than that, you're probably looking for assembly.


It's a poor fit for this role, even if it is in practice what we have available today.

One of my favourite examples is volatile. Volatile is more crazy in C++ but it's pretty crazy even in C. In both cases the standard basically shrugs, "Hope you know what you're doing because we sure don't" and offers no real insight into what this feature promises to do for you. But there is no other mechanism provided for MMIO.

Think of Rust's std::ptr::read_volatile (and write_volatile). These intrinsics do the thing you actually wanted (reading, or writing, a fixed size blob of "memory" that presumably wasn't really just RAM) and thus are important for writing device drivers and so on with MMIO.

[ You may be thinking, "But I need the correct size of blob read or written or my driver won't work". Rust has generics, so these functions are generic over the integer type you're reading/writing: if you read a u64 that's a 64-bit read, if you write four u8s that's 4 x 8-bit writes and so on ]

But C's volatile is a type qualifier instead. Why? Would it mean anything to, for example, integer divide an MMIO fetch by fifteen and write it back? a/= 15; No. So then why make it a type qualifier? When volatile was added to C they'd only just invented simple optimisations like re-ordering so they had no idea this was a bad idea, and it seemed simpler than adding an intrinsic (though not by much) but today we know better.
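For comparison, C code often gets a similar "explicit access" feel by wrapping the qualifier in small helper functions, so each read or write of the register is a visible call. A sketch under stated assumptions: the helper names are made up, and an ordinary variable stands in for the device register so the example runs on a normal host.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped register (hypothetical; a real driver
 * would use an address taken from the hardware documentation). */
static uint32_t fake_status_register;

/* Each call performs exactly one volatile 32-bit read. */
static inline uint32_t mmio_read32(const volatile uint32_t *reg)
{
    return *reg;
}

/* Each call performs exactly one volatile 32-bit write. */
static inline void mmio_write32(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

/* Read-modify-write spelled out as one read followed by one write. */
uint32_t set_ready_bit(void)
{
    volatile uint32_t *reg = &fake_status_register;
    mmio_write32(reg, mmio_read32(reg) | 0x1u);
    return mmio_read32(reg);
}
```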


> One of my favourite examples is volatile.

It's strange that an ancient feature that is almost practically deprecated serves as one of your favourite examples.

I'm not an expert, but volatile was originally meant to access hardware registers and I believe it did fulfill its purpose at the time. I've never researched the official definition of volatile, but it is widely acknowledged that there are few or even no legit uses of it today, because it is not suited to general concurrent programming. This seems like a good SO answer: https://stackoverflow.com/questions/2484980/why-is-volatile-...

If I understand right, I think you could do multithreaded programming if you restrict to doing message passing of message structures that are marked all volatile. But you can't use much else, because non-volatile data accesses can be reordered around volatile data accesses.

In short, probably just don't use volatile. Use the "modern" memory barriers, those exist in C/C++ just as they exist in Rust.
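A sketch of what the modern primitives look like in C11, using `<stdatomic.h>` with release/acquire ordering to publish a value from one thread to another. The names `publish` and `try_consume` are invented, and the example leaves out thread creation to stay short.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* ordinary data, not volatile */
static atomic_bool ready;    /* static storage, so it starts out false */

/* Producer: write the data, then set the flag with release ordering so
 * the payload write cannot be reordered after the flag store. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: an acquire load of the flag; if it reads true, the payload
 * written before the matching release store is guaranteed to be visible. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```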

> Would it mean anything to, for example, integer divide an MMIO fetch by fifteen and write it back? a/= 15; No. So then why make it a type qualifier?

This seems a strange example to me as well. This is equivalent to (a = a / 15). I.e. a volatile read followed by a volatile write. Not defending volatile at all, but I can't see what is so wrong here.

On a different note, attaching too many semantics to types is basically the complaint that I have about most languages. Types are too static and too inexpressive to capture all but the most trivial invariants, so complicated types often require messy and theoretically unnecessary workarounds. I always say, I'd almost like types to capture only the physical shape (sizes, alignment...) of the data, because that is extremely important for optimization and also it doesn't really change. In C, that is almost true, with some exceptions (signed/unsigned, and yes, the crufty volatile)


Addendum: Think about volatile like you think about const. If you use it to declare memory that is in fact different, it totally makes sense. If you do other weird things, it breaks (just like const).

This is unlike signed/unsigned storage specifiers which are not really attributes of the memory, and which create some serious complexities that lead to weird situations. But for practical reasons there haven't been any successful attempts to get rid of them.

Again, not defending volatile at all. Just saying that it probably made sense at the time and that it is not as ill-conceived as you imply. Volatile is cruft, but certainly does not disqualify the language. Just use modern synchronization primitives instead, like any other reasonable person.


I go just as low level as C with Object Pascal, Basic, Modula-2, C++, Ada compilers.

Or by using any of the kids on the block from the last decade.


I’m fairly sure that C++, Rust and Zig are just as portable.

Any reason for not having native support for vector instructions?


Surprise: there's a whole bunch of different reality under the convenient illusions of your "high level" programming language.

Hardware is the reality. It's not very much like the Java or Python programming model. We shouldn't hide this from programmers.


I disagree. C is also an abstraction of reality over assembly, assembly is an abstraction over machine code, which abstracts microcode, which abstracts over logic gates, and there are both deeper and adjacent abstractions as well. Underlying it all are different mathematical abstractions from Karnaugh maps to quantum physics.

A good developer will IMO select an abstraction that best matches their goals. If you’re writing a device driver then the abstraction might be assembly language. If you’re writing business logic it might be PL/pgSQL. But regardless of which abstraction you choose, you’re hiding something from someone.


In other words, hiding the complexity beneath


"The Python interpreter is written in C, for example."

https://github.com/RustPython/RustPython


THE Python interpreter is written in C. People usually refer to CPython when phrasing it like that. That does not mean there are no Python implementations in other languages, but they all have minuscule market share compared to CPython.


> No object orientation

I feel like C exists at a level below such concepts. Simply being able to define a function `void do_stuff(struct mystruct *obj)` opens the door to object-oriented style programming. A lot of people seem to define OOP by the presence of superficial stuff like inheritance, polymorphism etc, but really those are additional concepts that aren't useful for every program. The real difference is mutating state on the heap. So you could say C is an object-oriented language, by default, because it doesn't stop you doing this stuff, unlike a higher-level language like Clojure which simply doesn't have mutation (for the most part). Or you could say C is a functional language because if you don't explicitly pass pointers then you get copies. Really it's both and it's neither. It's whatever you want it to be.


That you can do functional or OOP in C does not make C either kind of language, it just means that C is flexible enough that you can make the computer do things the way you want it to, no matter what that means, and other languages purposefully prevent you from doing what you might want to do.

C++ is object oriented not because it has compile time support for polymorphism or any of that other bad programming practice, but because classes have code sections that live with them, whether on the stack or in the heap, that can operate only on memory belonging to that instance of the class.

Object oriented programming is a coding style and choice. Some languages make it a first class part of the language design. It is purposefully not part of C.

However you can do OOP like things in C: a popular paradigm is to pass around pointers to structs that (should) live in the heap, and to have a number of functions which work on these structs. This is very similar in practice and mental modelling to OOP as users of C++ might know it, but is distinct in that no code ever lives in the stack or heap, and no code is restricted from operating on any of the program memory.
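The struct-plus-functions pattern described above, as a minimal sketch. The `account` type and its functions are hypothetical; the "methods" are ordinary functions that take the object as their first argument.

```c
#include <stdlib.h>

/* The "class": plain data. */
typedef struct {
    int balance;
} account;

/* "Constructor": allocates on the heap; returns NULL if malloc fails. */
account *account_new(int opening_balance)
{
    account *self = malloc(sizeof *self);
    if (self != NULL)
        self->balance = opening_balance;
    return self;
}

/* "Methods": free functions operating on the struct they are handed. */
void account_deposit(account *self, int amount)
{
    self->balance += amount;
}

int account_balance(const account *self)
{
    return self->balance;
}

/* "Destructor". */
void account_free(account *self)
{
    free(self);
}
```

Nothing stops other code from poking at `balance` directly, which is the "no code is restricted from operating on any of the program memory" part of the comment.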


Hmm. In what sense do you believe that class has a code section that "lives" on the stack or heap?

On a modern system you can't usually do that because of W^X rules (also on a non-x86 modern system the performance would be abysmal if you tried because why waste transistors supporting something only crazy people would want?)

So perhaps notionally, in the abstract machine, if I have sixteen Clowns in a C++ vector there are sixteen copies of the Clown method squirt_water_at() in the vector too. But I assure you all the compiler emits is one copy of squirt_water_at() for Clowns, in the text segment with the rest of the program code. If Clowns have virtual methods, a pointer to a table of such functions may live with each Clown, just in case there are Jugglers and LionTamers in the vector too, although compilers can sometimes figure out a rationale for not bothering.
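For anyone who wants to see the shape of that, here is a hand-rolled approximation in C. This is only a sketch of the general idea, not what any particular C++ compiler emits, and every name in it is invented: each object carries one pointer to a shared, read-only table, and the functions themselves exist once.

```c
#include <assert.h>
#include <stddef.h>

struct performer;

/* One table per "class", shared by all of its instances. */
struct performer_vtable {
    const char *(*act)(const struct performer *self);
};

/* Each object stores only a pointer to its table, not the code. */
struct performer {
    const struct performer_vtable *vt;
};

/* One copy of each function, however many objects exist. */
const char *clown_act(const struct performer *self)
{
    (void)self;
    return "squirts water";
}

const char *juggler_act(const struct performer *self)
{
    (void)self;
    return "juggles";
}

const struct performer_vtable clown_vtable = { clown_act };
const struct performer_vtable juggler_vtable = { juggler_act };

/* "Virtual call": look up the function through the object's table. */
const char *performer_act(const struct performer *p)
{
    assert(p != NULL && p->vt != NULL);
    return p->vt->act(p);
}
```

Sixteen clowns would all have `vt` pointing at the same `clown_vtable`, so the per-object cost is one pointer.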


Regarding W^X, doesn't the Linux kernel have some optional, expensive debug operations that can be turned on and off through self-modifying code, removing the expensive branching?


I mostly agree with you. But "or any of that other bad programming practice"? Polymorphism is not a bad programming practice. Yes, it can be misused. No, that doesn't make it bad in and of itself.


Yep, can definitely do OOP in C. Except over here in embedded land those structs don't live in the heap... but as globals (still referenced via pointers tho).

Was demonstrating the difference between inheritance and composition in OOP C to my junior dev just this week.
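Roughly what that difference can look like, as a sketch with invented names (`sensor`, `temp_sensor`, `logger`) and statically allocated instances instead of heap ones. "Inheritance" here means embedding the base struct as the first member; composition means holding a pointer to another object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical base "class". */
struct sensor {
    int id;
    int last_reading;
};

void sensor_store(struct sensor *s, int reading)
{
    assert(s != NULL);
    s->last_reading = reading;
}

/* "Inheritance": the base is the first member, so a pointer to a
   temp_sensor can be converted to a pointer to its sensor part. */
struct temp_sensor {
    struct sensor base;
    int offset;
};

void temp_sensor_store(struct temp_sensor *t, int raw)
{
    assert(t != NULL);
    sensor_store(&t->base, raw + t->offset);
}

/* Composition: a logger "has a" sensor, referenced by pointer. */
struct logger {
    struct sensor *source;
    int samples_seen;
};

void logger_sample(struct logger *l, int reading)
{
    assert(l != NULL && l->source != NULL);
    sensor_store(l->source, reading);
    l->samples_seen++;
}

/* Globals rather than heap objects, still used through pointers. */
struct temp_sensor board_temp = { { 1, 0 }, -2 };
struct sensor plain_sensor = { 2, 0 };
struct logger board_logger = { &plain_sensor, 0 };
```

With these definitions, `temp_sensor_store(&board_temp, 25)` stores 23 in the embedded base, and `logger_sample(&board_logger, 7)` updates `plain_sensor` through the pointer the logger holds.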


No, mutating stuff on the heap is not the real definition of OOP. That's the definition of "mutable" programming, which is not a term that we use a lot, but it obviously is the opposite of "immutable" programming, which is where you can't change stuff on the heap.


What is then? When you start listing out properties you can always find an OO language that doesn't support it. But all of them support mutation.


Sure, they all support mutation. So do languages that are clearly not object oriented, like, say, ALGOL-60. (You can do mutation in Haskell, too.) So "mutation" is at least somewhat orthogonal to "object oriented".

What is object oriented? It's worse than "you can find an OO language that doesn't support feature X". There are (at least) two schools of OO, and they define OO differently. There's the Alan Kay school, where objects are independent entities that send messages to each other. And then there's the C++/Java school, where objects are less independent, and they call each other's functions.

What both of those have in common, though, is the idea of an "object", which is a bundle of a data structure plus code. In general, the associated code is the only code that can modify the data in the structure. (Yeah, I know, public data. But if you do that as your normal approach, then you're not really doing OO, you just have a bunch of structs that anyone can modify.)

An OO language, then, is a language that either supports or requires OO programming. Java, for example, requires it - a function has to be a member of some class. (Yeah, I know, it could be static, and it could not operate on any of the data of the class. Java still forces you toward OO more than C++ does.)

What about something like C? It lets you create structs, but it does nothing to let you restrict access to "associated" code, nor does it give you a way to associate code with the structure. (Yeah, I know, file static.)

I keep saying "yeah, I know" as I admit the exceptions to what I'm saying. None of this has rigid, clear boundaries. Still, there is a set of things that are generally OO, and a set of languages that generally encourage and/or support programming that way.
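As a sketch of the "file static" exception mentioned above: imagine the following is the whole of a hypothetical `account.c`. The `static` data and helper are invisible to other translation units, so only the two non-static functions can touch the balance. All names are invented for illustration.

```c
#include <assert.h>

/* File-local state: other .c files cannot name this variable. */
static long balance_cents;

/* File-local helper: also hidden from other translation units. */
static int amount_is_valid(long cents)
{
    return cents > 0;
}

/* Public interface: the only code able to modify the balance.
   Returns 0 on success, -1 if the amount is rejected. */
int account_deposit(long cents)
{
    if (!amount_is_valid(cents))
        return -1;
    balance_cents += cents;
    return 0;
}

long account_balance(void)
{
    assert(balance_cents >= 0);
    return balance_cents;
}
```

This gets you the "only associated code can modify the data" property, but per file rather than per struct, and the language does nothing to tie it to a type.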


> What both of those have in common, though, is the idea of an "object", which is a bundle of a data structure plus code. In general, the associated code is the only code that can modify the data in the structure.

But you can do that in C. The argument for C not being object oriented is that it does allow you to do stuff that isn't object oriented, but you just said yourself that even so-called OO languages allow you to do that anyway.

Object-oriented programming is a thing that C lets you do. It's fine if it lets you do other things too.

You should check out Common Lisp's CLOS which is the best object-oriented framework I've ever used. It's nothing like the "bundle of data structure and code" you describe.


Well, no, C lets you do that stuff completely manually. C is not object oriented because it gives you zero help in programming in an OO style. It doesn't forbid it, but it doesn't forbid much of anything.

As opposed to C++, which gives you a bunch of tools to help you, and to Java, which forces you.



