Vim 7.3.1000

tsm · on May 21, 2013

I guess I should be thankful 98% of my programming life is in higher-level languages than C, but...am I the only one who's terrified of the magic numbers? In hex? Can anyone provide context about what's so special about 0xfb20 and 0xfb4f?

goldfeld · on May 21, 2013

All my programming life has been in higher-level languages, yet with Rust I'm being inexplicably drawn to down-and-dirty systems programming. It's cool to finally be able to use :make and all the C-oriented things like [i and ]i, ctags and the like. It's interesting to see how much 'dogfooding' vim has, in that tons of its features are undoubtedly geared towards Bram's own needs developing vim (and C applications in general). That's pretty much how I work on products: I do my best work when I'm the use case, and great open source products like vi/vim and emacs excel in that legions of people share the creator's use case (and, well, configurability right down to the bone). I've found that a non-trivial number of people do share the use case for the first project I've launched[1]; I only hope that holds true for my most ambitious ones in the works.

I don't know Bram, but I can't help but wonder why such a distinguished open source figure has ultimately ended up working for Google. Maybe he enjoys the access to resources. I wish donations could still support him directly rather than his idea of aid (well it does seem to have a bend towards education at least). I wish he wouldn't need a day job and could yet breed other new and great contributions to the world. I don't feel particularly endeared towards Google, despite all the collateral good it has done while doing a lot of evil (i.e. something to do with the looming ad empire strengthening big corporations.)

[1]: https://github.com/goldfeld/vim-seek

gbog · on May 22, 2013

I use ctags with Python and PHP, and I'm sure it will work with most other languages ever invented.

meritt · on May 21, 2013

http://www.unicode.org/versions/Unicode6.2.0/ch08.pdf page 250

Screenshot of relevant text: http://i.imgur.com/xMnHDg4.png

Basically, certain Hebrew characters have descenders which can overlap with diacritic marks in the next line of text. These variations are suitable replacements that alter them slightly so they won't overlap.

IvyMike · on May 21, 2013

I'd so prefer to see

  #define HEBREW_CHAR_MIN 0xfb20
  #define HEBREW_CHAR_MAX 0xfb4f

(Probably with even more descriptive names, but I don't know the problem domain.)
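For illustration, here is a minimal sketch of how a use site might read with those names. The helper in_hebrew_range is hypothetical and is not taken from the Vim source; only the two values come from the thread.

```c
#include <stdbool.h>

/* Names as suggested above; the values are the ones quoted in the thread. */
#define HEBREW_CHAR_MIN 0xfb20
#define HEBREW_CHAR_MAX 0xfb4f

/* Hypothetical helper: true when code point c lies in the inclusive range. */
static bool in_hebrew_range(int c)
{
    return c >= HEBREW_CHAR_MIN && c <= HEBREW_CHAR_MAX;
}
```

A call such as in_hebrew_range(c) then says what is being tested without the reader having to recognize the hex values.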

redcircle · on May 21, 2013

It is nice to move away from the C preprocessor:

  enum { HebrewCharMin=0xfb20, HebrewCharMax=0xfb4f, };

srl · on May 22, 2013

Can you explain why this is preferable to using the preprocessor? I've always considered it a pretty fundamental principle that the "surface" interpretation of code should be as close as possible to the "real" meaning, and we're definitely not trying to represent an "enumeration". What's wrong with #define, or `const uint32_t HebrewCharMin...`?

IvyMike · on May 22, 2013

I'd argue that in good old C, #defines for pure constants are pretty well established and considered idiomatic. In other words, I wouldn't ding it in a code review. (Nor would I ding the enum solution. Either one is perfectly cromulent.) In some ways, even though we now have better alternatives in C, I kinda feel like any C tool and any C programmer had better be able to deal with #define constants. Quickly browsing the vim source, it looks like they seem to default to using #define for constants, and "when in Rome" is a good rule of thumb to live by. :)

On the other hand, I'm really a C++ guy, and in that domain, I'd prefer your "const uint32_t" solution. It just feels more idiomatic.

Additionally and somewhat pedantically, I prefer naming conventions that make it clear when values are constants, so I'd use "kHebrewCharMin" or something similar.

_pmf_ · on May 22, 2013

One thing is that the compiler can warn about missing cases when using enums (GCC optionally does this); with defines, it's not clear that several values form a closed set.

The other thing is that function signatures can now show 'enum myenum e' instead of 'int v', which is much clearer.
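A small sketch of both points, using a made-up enum (text_direction and its values are hypothetical, chosen only for illustration). With GCC, the missing-case check is -Wswitch, which -Wall enables, and it only applies when the switch has no default label.

```c
/* Hypothetical closed set of values, for illustration only. */
enum text_direction { DIR_LTR, DIR_RTL, DIR_NEUTRAL };

/* The parameter type documents what is accepted, unlike a plain int. */
static const char *direction_name(enum text_direction d)
{
    /* No default label: if one of these cases is deleted, GCC's -Wswitch
     * reports the enumerator that is no longer handled. */
    switch (d) {
    case DIR_LTR:     return "left-to-right";
    case DIR_RTL:     return "right-to-left";
    case DIR_NEUTRAL: return "neutral";
    }
    return "unknown"; /* reached only for values outside the enum */
}
```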

srl · on May 22, 2013

> One thing is that the compiler can warn about missing cases when using enums (GCC optionally does this); with defines, it's not clear that several values form a closed set.

But here, the values /don't/ form a closed set, and using enums would imply that they do.

zaphoyd · on May 22, 2013

#define is a problem in general because the compiler doesn't understand as well what is happening because the substitutions take place outside of the language's type system. Extensive preprocessor use makes writing analysis tools, refactoring IDEs, and sane compiler errors much more complicated. Const in C doesn't behave similarly to #define so it isn't really a substitute.

Note: in C++, global/static const values do behave as compile time constant expressions and are an excellent tool for this purpose.
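To make the C side of that concrete, here is a rough sketch of two places where an enumeration constant works as a compile-time constant. Standard C (C99/C11) does not treat a `const int` object as an integer constant expression, so a compiler may reject it in these positions or accept it only as an extension. The array and function names are hypothetical.

```c
enum { HEBREW_CHAR_MIN = 0xfb20, HEBREW_CHAR_MAX = 0xfb4f };

/* An enumeration constant can size a file-scope array: one flag per
 * code point in the inclusive range. */
static unsigned char seen[HEBREW_CHAR_MAX - HEBREW_CHAR_MIN + 1];

/* It can also appear in a case label. */
static int classify(int c)
{
    switch (c) {
    case HEBREW_CHAR_MIN: return -1; /* first code point of the range */
    case HEBREW_CHAR_MAX: return 1;  /* last code point of the range */
    default:              return 0;  /* anything else */
    }
}
```

A #define works in both positions too, since the preprocessor pastes in a literal; the enum simply keeps the name visible to the compiler and to tools.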

redcircle · on May 22, 2013

I don't feel that there is anything "wrong," just limiting, because the C preprocessor makes it harder to use tools that walk the abstract-syntax-tree while editing code (or to analyze code). LLVM-based tools are really nice. I don't care whether it is "const uint32_t ..." or enum, or anything else, as long as it is the C/C++/ObjC language.

Macha · on May 21, 2013

Given that it's in a regex source file, and the variable is named c, they're probably character codes for Unicode ranges.

sltkr · on May 21, 2013

Correct -- 0xfb20 through 0xfb4f are alternative glyphs for Hebrew characters. (I don't know what makes these so special that they need to be handled separately by the regexp processing code, though.)

jlgreco · on May 21, 2013

This might be a good case for unicode character literals. Code dealing with unicode can be a nightmare to work with, even if you already know what those numbers are (arguably as this bug demonstrates).

sltkr · on May 21, 2013

You realize you're talking about a codebase which still uses pre-standard parameter type declarations, right?

jlgreco · on May 21, 2013

Oh yes, Vim's code is anything but pleasant. I worked with it a decent amount several years ago when I was maintaining, for a short period of time, a patch that would add a terminal emulator to Vim windows.

tsm · on May 22, 2013

I was just searching for such a thing yesterday. Did the patch go anywhere?

qu4z-2 · on May 23, 2013

Just curious what the advantages are of this versus C-z'ing back to the shell?

EDIT: Oh, I guess that doesn't work in gvim?

tsm · on May 23, 2013

It's nice to see terminal output at the same time as you're editing. Right now I'm working on a Rails app, so it'd be great to see RSpec failures while fixing my code--currently I either alt-tab between terminals or use a second monitor.

Also, sometimes you don't want to wait for something. `bundle install` can take a good 30 seconds...I'd rather not watch that!

qu4z-2 · on May 24, 2013

Thanks for the answer. Seems reasonable. Although at that point I'd somewhat suggest tmux :)

Kurtz79 · on May 22, 2013

Well, there is a place for "magic numbers" if considered within context.

Most people here would read 0xFF0000 and automatically think "red".

But I agree that in this case a define would have helped.

willlll · on May 21, 2013

Here's to another 1000 commits for the world's best text editor. Cheers!

brianpgordon · on May 21, 2013

Sublime Text is on its 1000th commit?

bwilliams · on May 21, 2013

You spelled Vim wrong.

n3rdy · on May 22, 2013

imagine if you had said emacs instead..

paragonred · on May 22, 2013

Maybe in vintage mode

mhi · on May 21, 2013

NOTE: If you run into trouble with more complicated regular expressions in the near future, it might be because of the new engine.

:h two-engines

:h 're'

rbonvall · on May 22, 2013

Doc link for those of us that don't have a recent enough version of Vim: http://code.google.com/p/vim/source/browse/runtime/doc/patte...

jng · on May 21, 2013

Pretty ugly bug. Where did it happen? Anyway, congrats & long life!

jlgreco · on May 21, 2013

Apparently this is in the code that decomposes strings so that they can be compared (necessary with Hebrew and Arabic seemingly?).

This is an interesting snippet from that code...:

        /* decompose the character if necessary, into 'base' characters
        * because I don't care about Arabic, I will hard-code the Hebrew
        * which I *do* care about! So sue me... */
        if (c1 != c2 && (!ireg_ic || utf_fold(c1) != utf_fold(c2)))
        {
            /* decomposition necessary? */
            mb_decompose(c1, &c11, &junk, &junk);
            mb_decompose(c2, &c12, &junk, &junk);
            c1 = c11;
            c2 = c12;
            if (c11 != c12 && (!ireg_ic || utf_fold(c11) != utf_fold(c12)))
                break;
        }

Apparently string comparison is harder than I previously thought.

plorkyeran · on May 21, 2013

String comparison with Unicode is pretty astonishingly complex, partially because equality is not as well defined as it seems to be on the surface. Should e and é be equal? If you're dealing with user input from people who are unlikely to know how to type é, then they probably should, but in many cases they shouldn't. A more complex case is é and é (precomposed vs decomposed forms), which nearly always should be equal, but a simple byte comparison will say they're different.

Fortunately, there are ICU bindings for every non-toy language which solves these sorts of problems for you (although ICU has the drawback of being absolutely huge).
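The precomposed vs decomposed case can be seen with nothing but byte strings. This is only a sketch of the problem, not of a fix: it does no normalization, and the variable and function names are made up for the example.

```c
#include <string.h>

/* U+00E9 LATIN SMALL LETTER E WITH ACUTE, encoded as UTF-8 (2 bytes). */
static const char precomposed[] = "\xc3\xa9";
/* U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT, as UTF-8 (3 bytes). */
static const char decomposed[] = "e\xcc\x81";

/* Plain byte-wise equality, with no Unicode normalization applied. */
static int bytes_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Here bytes_equal(precomposed, decomposed) is false even though both strings render as the same character, which is why a normalization step is needed before comparing.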

jlgreco · on May 21, 2013

There are a lot of things I have seen in Unicode that seem like they should not exist in the first place. MATHEMATICAL [BOLD|SANS-SERIF|DOUBLE-STRUCK|MONOSPACE] DIGIT for example... I guess those things potentially carry significant meaning in some mathematics texts though.

I guess the ICU stuff probably gives you an strtol equivalent that can handle that sort of stuff.

chris_wot · on May 22, 2013

The LibreOffice guys have told me that ICU has security concerns and is, for all intents and purposes, no longer being developed. They are switching to another engine (hard buzz? Name escapes me).

Anyone know if this is true?

pdw · on May 22, 2013

HarfBuzz and ICU are very different things. HarfBuzz is a small library for text shaping (basically, putting font glyphs together to form words, which can be quite complex for some scripts). ICU on the other hand can do pretty much everything that's vaguely related to internationalization. It's quite possible that LO was only using it for text shaping of course.

lucian1900 · on May 22, 2013

Harfbuzz [1] is for text shaping/layout, not unicode support.

1. http://www.freedesktop.org/wiki/Software/HarfBuzz/

chris_wot · on May 22, 2013

Drat it. iPad "corrected" my spelling and I didn't notice.

AutocorrectThis · on May 22, 2013

The raw power that you feel in your hand when using VIM is amazing, I never get that feeling with Emacs, but each to their own. Not needing to reach for the arrow keys, backspace etc. is neat too.

tincholio · on May 22, 2013

I think the raw _editing_ power of Vim is, as you say, probably unmatched in Emacs. Vim, however, cannot hold a candle to all the rest of Emacs and its elisp-y goodness. I'm a long-time Emacs user, and having recognized that Vim's modal editing and "language" are a better way of editing, opted for using Evil-mode, which gives you the best of both worlds. The transition was not without pain, but it was overall quick and worthy.

pekk · on May 22, 2013

elisp is not such a great language, so the value of editing using a pile of it is mixed. It boils down to a matter of taste. If you like elisp, it's a big win and if you don't, it's a deal-breaker.

swah · on May 22, 2013

It is a great language when run inside Emacs for doing Emacs stuff. Elisp is orders of magnitude more powerful than Vimscript on Vim. It's easy to see from the best things each community has produced for their editors.

Disclaimer: I love all 3 editors for they all have great ideas.

sbrother · on May 22, 2013

You never need the arrow keys or backspace with emacs either: C-f, C-b, C-n, C-p, and C-d :)

hyperbling · on May 22, 2013

in my experience using emacs, reaching for the meta key is worse than reaching for the arrow keys. at least with arrow keys you don't need to morph your hand into a pretzel.

ececconi · on May 21, 2013

Best editor

lucb1e · on May 22, 2013

I'm not sure I understand the point here. Why is this on Hacker News?

lucb1e · on May 22, 2013

Seriously, do I have to get downvoted 5x (by >750 karma users) for not understanding something that is apparently obvious to everyone else? I'm genuinely not getting it.

chrismorgan · on May 22, 2013

This is the first time Vim's patch level has gone over 999; matter of fact, Bram expressed some concern about it. The patch level has hitherto been expressed as three digits (e.g. 7.3.052).

Some information is available in the "Plans for Vim 7.4" thread in the vim_dev group: https://groups.google.com/forum/?fromgroups#!topic/vim_dev/Z...

lucb1e · on May 22, 2013

Ah okay, thanks for explaining :)