All Software is Legacy (leejo.github.io)
120 points by leejo on Feb 27, 2016 | 53 comments



I recently had to maintain some new perl code. I didn't think it would be a big deal, but I found that perl hasn't kept up with a number of things I take for granted today:

1) The perl cpan module doesn't resolve dependencies

2) The cpan module has parsing errors when passing in a list of CPAN packages

3) You have to manually grep your perl code to see what modules it depends on

4) Module installs take a long time since they can compile and unit test the code; unit tests can even make connections to the internet or try to access databases and fail, so you just have to force them to install

5) Non-interactive installs of CPAN modules require digging in the docs and learning you need to set an env var to enable them

6) CPAN modules aren't used that heavily and can have bugs that would be caught in more widely used modules. (e.g. the AWS EC2::* modules don't page results from AWS, so result sets can be incomplete, whereas the more widely used boto lib works correctly and is better maintained.)

7) Perl devs don't think twice about shelling out to an external binary (that may or may not be installed)

8) Even if regexes are not needed, inevitably the perl dev will use them since that's the perl hammer, and it's hard to know what the intention is with regexes or what the source data even looks like

9) You have to manually include the Data::Dumper package to debug data structs

10) You have to manually enable warnings and strict checks; they're not on by default.

Anyhow, I think we've made a lot of progress since the 1990s. :)


A few comments:

* It is often recommended to use cpanminus[1] instead of the CPAN.pm module. But it is up to the distribution you try to install to declare its dependencies correctly; not doing that is a bug.

* If you use cpanminus you can use the --notest flag to skip tests. But tests are a feature.

* Software has bugs. Reporting them when they are found is how software gets fewer bugs.

* CPAN distributions should not[2] use external binaries (and exceptions should be clearly documented and justified).

* The ease of use of regexes in Perl is not an argument against documenting them, and (in this case) the document format they are meant to parse.

* There are several different data dumpers; no assumption is made about the user's preference.

* If you require a newer Perl (5.12+) with a use v5.12; (or later) declaration, strict is enabled automatically[3], along with some new features (depending on which version your code requires). Due to backwards compatibility it is not possible for newer Perls to enable strict or warnings without such a declaration.
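
For example, a minimal sketch of that behaviour:

    use v5.12;            # implies 'use strict' and enables features such as say
    say "strict is on";   # say works without a separate 'use feature'
    # $undeclared = 1;    # uncommenting this is now a compile-time error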

The Perl of today is also vastly improved since the 1990s, hopefully you will come across some modern perl too.

[1] https://metacpan.org/pod/App::cpanminus

[2] https://www.ietf.org/rfc/rfc2119.txt

[3] https://metacpan.org/pod/release/JESSE/perl-5.12.0/pod/perl5...


I think the difference is in other languages I don't have to think about these things any more than I think about what IRQ my sound card is on.

In the CPAN case, if cpanminus is the "good one", then it should be installed by default and CPAN.pm needs to tell you to use that instead or just be deprecated. I don't want 5 choices in package managers, I just want the good one. :)


One factor that sometimes leads to problems in this regard is (as mentioned) backwards compatibility. Pretty much nothing that has ever worked can be removed or changed, because somewhere mission-critical software depends on it.

Another issue is discoverability. A concrete example is that https://metacpan.org/ is a much better (imho) presentation of cpan than http://search.cpan.org/.

It is the curse of being a very stable language and ecosystem.


> Perl devs don't think twice about shelling out to an external binary

No, most of them do. The Perl ecosystem has a killer feature called cpantesters, which lets everyone see which modules work on which systems out of the box. You should always check the cpantesters matrix before choosing a particular dependency.

> Even if regexes are not needed

They got overly complicated over the years, but they are needed. They are DSLs to make things easier when working with strings. I.e. so you wouldn't have to write 20 lines of hard to grasp code with bytes.Index(), bytes.HasSuffix(), bytes.TrimRight(), etc., like people do in Go, but a single nice regexp and therefore reduce your chances to make a mistake in that code.
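
A contrived sketch of the point (the filename format here is made up):

    my $file = "access.log.gz";
    # one declarative pattern instead of a chain of index/suffix/trim calls:
    if ( $file =~ /\A(\w+)\.log(\.gz)?\z/ ) {
        print "name=$1 compressed=", ( $2 ? "yes" : "no" ), "\n";
    }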


> so you wouldn't have to write 20 lines of hard to grasp code with bytes.Index(), bytes.HasSuffix(), bytes.TrimRight(), etc., like people do in Go

Go has regexps, and a very good implementation of them.

Depending on what you do and on the specific code path, compiling and/or executing a regexp might be slower than manually parsing the string. The Go standard library is pretty concerned with performance (much more than Python's or Ruby's, for instance), so it tends to avoid regexps.


It shouldn't be like that; that's the problem. Regular expressions should be compiled into native code and be even faster than a bunch of hand-written bytes.HasSuffix() combinations.


Your previous post said that they are a very useful DSL for Perl so that "people don't have to do like they do in Go".

Both Perl and Go implement regexps, and neither of them compiles them to native code. So I don't get your previous comment at all.

The main difference is that, in Perl, if you ever had to write manual string parsing, it would be much much slower than using regexps as Perl is an interpreted language. So regexps are needed to perform fast string parsing. In Go, you have regexps if you want, or you can go even faster if you feel it's required.
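
For what it's worth, Perl's compile step is implicit and cached, and it can be made explicit with qr// when a pattern is reused; a small sketch:

    my @files   = ( 'backup.tar.gz', 'notes.txt' );
    my $tarball = qr/\.tar\.gz\z/;        # compile the pattern once
    for my $f (@files) {
        print "$f looks like a tarball\n" if $f =~ $tarball;
    }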


> Both Perl and Go implement regexps, and neither of them compiles them to native code. So I don't get your previous comment at all.

Ok, I'll try to explain.

People feel discouraged from using regexps in Go, because they are very slow for many typical parsing and validating cases and require an extra step of compilation, with all of the additional code complexity associated with that. So people do parsing manually instead, with all of its problems. It's not that they need that performance (almost no one does), but the whole idea behind regular expressions is not working; parsing code is still bad most of the time.


You've made me curious: is there a language out there which does this, i.e. compiles regexes down to native code which is then as fast as or faster than hand-coded bytes.HasSuffix(...) calls?


I found this with a bit of searching and clicking around on Stackoverflow: https://www.colm.net/open-source/ragel/ (via http://stackoverflow.com/a/15608037).

I didn't look long enough to know if there's an easy way to convert a regular expression to Ragel syntax.


> Go has regexps, and a very good implementation of them.

In my experience porting code from Perl to Go, Go's regexp package is vastly inferior to Perl's in multiple areas: speed, memory, Unicode handling (e.g. \b works on ASCII only in Go), etc. For example, for some large regexps handling URL blacklists, reduced programmatically with Perl's awesome regexp assembly tools, I had to rely on PCRE in the end; Go just could not cope with that (not even the C++ re2). I do avoid regexps, regexps are usually best avoided, and all that, but there are areas in which they are by far the best option. In those areas, I postulate from my own experience that Perl's implementation is king: speed, memory usage, Unicode.


> (not even the C++ re2)

Did you try using RE2's "set" functionality?


No, I did not get that far; it would've meant a larger rewrite of the ecosystem, since the data files were created by other tools, already in "alternate form" [1], and needed to be used by other programs as well. I stopped trying to load them with re2 (both Go and C++) after glancing at all those gigabytes of RSS, while Perl kept them in the 200-300 MB range. PCRE was a good compromise at the time, but with other tradeoffs, because C libs seem to be frowned upon in the Go community, i.e. semi-official voices arguing how best to avoid them :/ (e.g. blocking inside C isn't under the gomaxprocs limit, costly overhead crossing the C boundaries, static binary troubles, less portability and so on)

[1]:

    perl -MRegexp::Assemble -E'my @list = qw< foo fo0z bar baz >; my $rx = Regexp::Assemble->new->add( @list )->re; say $rx'
    # prints: (?^:(?:fo(?:0z|o)|ba[rz]))


cpantesters looks very useful. [1]

I wonder if there's anything like that for Python and Ruby.

[1]: for example, http://cpantesters.org/author/D/DAMOG.html


Less code is generally better. But I've noticed a lot of folks still using ^ or $ when what they really mean is \A or \z


What's the difference? ^ and $ is basically all I remember from when I read Mastering Regular Expressions


\A and \z always match beginning/end of the string.

^ and $ can be changed to mean beginning/end of each line in the string with the /m flag.
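
A quick Perl illustration:

    my $s = "foo\nbar";
    print "per-line\n"   if $s =~ /^bar$/m;   # matches: /m re-anchors at newlines
    print "whole-only\n" if $s =~ /\Abar\z/;  # no match: \A and \z ignore /m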


> 8) Even if regexes are not needed, inevitably the perl dev will use them since that's the perl hammer, and it's hard to know what the intention is with regexes or what the source data even looks like

I'm going to disagree with this one. There are lots of things in any language where it can be hard to see, at a glance, what the intention of the programmer was. That's why we have comments. You're supposed to comment your blocks of code so that someone else can look at a block and understand what it is supposed to do.

Unfortunately, as far as I can tell by looking at other people's code, I appear to be one of the only programmers on the planet who actually uses comments....


Ideally the code itself should communicate that intent. And comments can become obsolete as code changes. Hence the movement to reduce comments to only what's necessary.


1. What? (Anyway, use cpanminus these days.)

2. Again, what?

3. Nope, there are a variety of tools available. Try `cpanm Perl::PrereqScanner::App` followed by `scan-perl-prereqs .` (sketched after this list).

4. Yeah, you can skip test runs with `cpanm --notest`, but do you really want to? As for the subsequent complaint, you're clearly having an experience I don't have.

5. Again, see cpanm.

6. Can't comment on this one.

7. Umm, that's a code smell. With code from cpan that outcome is rare.

8. You use regexes when you need certain kinds of things done fast. Don't forget the `/x` flag to ensure a non-trivial regex is documented.

9. Actually I spend most of my time in the perl debugger. Older perl codebases do suffer from the magic payload pattern quite a lot; modern perl, less so.

10. Yeah, I agree, one should probably have to explicitly turn off warnings and strict, but whatever.
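
A rough sketch of that workflow (assuming cpanm is already installed; Some::Module is just a placeholder):

    cpanm Perl::PrereqScanner::App   # installs the scan-perl-prereqs tool
    scan-perl-prereqs .              # lists the modules the code here depends on
    cpanm --notest Some::Module      # installs a dependency, skipping its tests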

Anyway I agree, perl has made huge progress since the 1990s. I also agree there's a problem with discoverability in some parts of the cpan ecosystem. Be sure to read the Modern Perl book next time you need to do some perl work. You ought to be pleasantly surprised. Personally with the Moo(se)? family of modules, I enjoy having a multiparadigm language with reasonable optional runtime typing to keep me sane. My biggest complaint is the reference counted garbage collection.


> 1) The perl cpan module doesn't resolve dependencies

What? CPAN absolutely does.

> 2) The cpan module has parsing errors when passing in a list of CPAN packages

Both from the command line and in the CPAN shell itself, I can install a list of modules like this:

    # from the command line:
    cpan Data::Dumper Devel::Confess

    # or inside the interactive cpan shell:
    install Data::Dumper Devel::Confess

> 3) You have to manually grep your perl code to see what modules it depends on

Or you can use a CPAN module for that.

> 4) Module installs take a long time since they can compile and unit test the code

Or you can tell the cpan shell to skip the test phase, if you're confident in your system:

    notest install Data::Dumper Devel::Confess

> 5) Non-interactive installs of CPAN modules require digging in the docs

Non-interactive installs should use your operating system's package manager, unless you have a special use case, in which case some doc digging is fine.
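
For example, on Debian-flavoured systems, where CPAN modules are packaged under lib<name>-perl names (libjson-xs-perl is a real instance):

    apt-get install -y libjson-xs-perl   # non-interactive install of JSON::XS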

> 6) CPAN modules aren't used that heavily and can have bugs that would be caught in more widely used modules.

You mean "Some CPAN modules".

> 7) Perl devs don't think twice about shelling out to an external binary (that may or may not be installed)

Again, some.

> 8) Even if regexes are not needed, inevitably the perl dev will use them since that's the perl hammer

Eh, fair enough.

> 9) You have to manually include the Data::Dumper package to debug data structs

    Data::Dumper was first released with perl 5.005
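
It is in core, so "manually including" it is one line:

    use Data::Dumper;                    # bundled with perl itself since 5.005
    print Dumper( { answer => 42 } );    # pretty-prints any nested structure
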
> 10) You have to manually enable warnings and strict checks; they're not on by default.

Same in JS, and similar with other languages.

> Anyhow, I think we've made a lot of progress since the 1990s. :)

Not really sure, the trolling culture seems to still be the same as back then.


Regarding the module dependency woes, check out Carton (https://metacpan.org/pod/Carton).
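
A minimal sketch of the workflow (the module names and version here are only examples): declare dependencies in a cpanfile, then let Carton pin them.

    # cpanfile
    requires 'Plack';
    requires 'JSON::XS', '3.01';

    # then, in the project directory:
    #   carton install          # installs into ./local, writes cpanfile.snapshot
    #   carton exec perl app.pl # runs against exactly the snapshotted modules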


In my opinion the biggest problem with legacy code is understanding its implementation as someone who hasn't worked on it before. In a lot of cases it's not documented well and the original authors have already left, so there's no one to ask. You are left with reading code written by someone else, which takes a lot of time.

This is not Perl related, but I'm currently working on a developer tool that makes this part of the job easier. It's a source explorer for C/C++ named Coati that simplifies navigation within source code and thereby makes understanding the implementation faster and easier. https://www.coati.io/


Your first paragraph totally resonated. I've been thinking about this problem for several years. However, my approach is diametrically opposed to yours. I think our problems in software all stem from focussing on the code as the tangible artifact to maintain control over. We should instead be focusing on the space of possible inputs that the code is intended to work for. This is something you can't deduce from the code (and automatically using the computer to deduce it, well forget about it), it requires cooperation with the original author to present things in a way that makes the state space more explicit. This is why I love projects with lots of tests. I can't be bothered to analyze static code structure, either manually (what people call 'reading'[1]) or automatically. Just show me how the program is supposed to run in all the different situations that you've considered. Let me change it and rerun the tests to find out if I broke something.

Modern programming practice emphasizes tests, which is great. However, not all kinds of tests can be written so far. So we end up doing manual certification work every time we release or publish a new version of software, for performance, fault tolerance, etc. I want to make it all automatic. Some links about my project in case you'd like to learn more: http://akkartik.name/about; http://github.com/akkartik/mu#readme. I'd love to hear your thoughts, either here or over email (address in profile).

[1] http://akkartik.name/post/readable-bad


Tests significantly help with understanding a codebase and safely making modifications.

Unit tests describe the behavior and usage of the individual systems, while integration tests describe business use cases in core workflows (usually just the happy paths). Integration tests should ideally be written in a Gherkin style, with the feature description and acceptance criteria clearly outlined.

Tests should be optimized for readability foremost. For example, most people try and make their tests DRY (which is an abuse of DRY because it should only apply to concepts, not code, but I digress) while they should instead be making them DAMP. Each test should be able to tell a story without jumping up and down around the file and outside the file. It's a lot more dangerous to misunderstand a test than it is to have a little bit of code duplication in your tests.
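
To make that concrete, a sketch in Test::More (Cart is a hypothetical class standing in for whatever is under test):

    use strict;
    use warnings;
    use Test::More;

    # DAMP: the setup sits inline, so the test reads as one self-contained story
    my $cart = Cart->new;   # Cart is a stand-in, not a real module
    $cart->add( sku => 'BOOK-1', qty => 2, unit_price => 10 );
    is( $cart->total, 20, 'two copies of one book cost twice the unit price' );

    done_testing;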

When I come across a project that has no documentation, I look at the tests. This isn't an excuse not to write proper documentation of course.


>Tests should be optimized for readability foremost. For example, most people try and make their tests DRY (which is an abuse of DRY because it should only apply to concepts, not code, but I digress)

That's wrong on so many levels. DRY applies to code.

>while they should instead be making them DAMP. Each test should be able to tell a story without jumping up and down around the file and outside the file. It's a lot more dangerous to misunderstand a test than it is to have a little bit of code duplication in your tests.

This is a problem with Gherkin. Gherkin is not particularly well suited to making tests that are both DRY and readable due to its syntax.


> which is an abuse of DRY because it should only apply to concepts, not code

As a 'copyista', I would change this to "DRY should only apply to production code, not tests." It wasn't clear from your reply if you agree or disagree.

A couple of great recent links about this IMO under-discussed topic:

http://www.sandimetz.com/blog/2016/1/20/the-wrong-abstractio...

http://programmingisterrible.com/post/139222674273/write-cod...

http://bravenewgeek.com/abstraction-considered-harmful


>Modern programming practice emphasizes tests, which is great. However, not all kinds of tests can be written so far.

I don't think this is true in practice. It's just that a lot of scenarios people need tests for aren't easy to replicate and most people end up not bothering and relying upon manual checking instead.


Are you aware of any projects where this is not true? Perhaps Sqlite3 (http://www.sqlite.org/testing.html). But in my personal experience it's not "most people" but "all people". I'd love to hear contradicting data or anecdotes.


I've worked on projects before which initially had no tests, then some tests, then progressively more and more sophisticated tests.

Each time the harness became capable enough to replicate a certain type of scenario it didn't take long before those kinds of bugs dried up.

The bugs then nearly always migrated to the areas the test harness couldn't easily create scenarios for - whether that was interactions with crazy external APIs, odd timezones or weird quirky browsers or whatever.

Continually modifying the test harness to be made capable of testing bugs reported from production was often a huge amount of work but it paid off handsomely.


> So we end up doing manual certification work everytime we release or publish a new version of software, for performance, fault tolerance, etc.

What can humans do, sitting at a computer, that software can't be written to do?

Before other nitpickers derail this question, I'm talking about validating a software product, not hard AI.


We don't need to think about AI. I'd rephrase your question like this:

What can the author do, sitting at a computer, that later readers can't undo?

When phrased that way the answer is obvious: a lot. Like multiply two large numbers together, or obfuscate a program to make its meaning less clear. Both are in principle possible to undo, but the "load factor" can be arbitrarily large, making the effort for the 'reader' entirely uneconomic.

If even a human can't undo some things that the author did, how can we expect the computer to do so?


I'm afraid I don't follow you at all.


Sorry, I misunderstood what you were replying to.

I'd change my answer to: nothing at all in the context of manual testing. And yet in practice the answer so far seems different from our in-principle argument. I'm trying to make principle and practice line up better.


I share a similar sentiment with a different phrase: "code is a living history of past ideas, good and bad."


Code is a liability. The best code is code that doesn't need to be written.


If it doesn't need to be rewritten, if it is that good, then it is an asset because you can depend on it.


The problem is that code is an asset until it's not. As with most things, it depends on the project, but changes in requirements have insidious effects on code, which might turn your asset into a liability before you realise it.


Ideas are more important than code.


One of my favorite quotes: "the future is a disagreement with the past about what is important." (http://carlos.bueno.org/2010/10/predicting-the-future.html)


A softer phrase that might work better is "All Software is Experimental". It has less pejorative connotations, and could be a shibboleth among seasoned developers.


Yeah, exactly. But what does it mean if even seasoned developers rely on the interfaces of all this experimental software staying fixed for all time? The article falls for this disconnect as well: its great enumeration of the problems contrasts with the repeated references to interfaces as a solution. Interfaces are part of the problem. (Or more precisely, our propensity to freeze interfaces, and our reluctance to rethink interfaces at will.)

My favorite bit of the article is something I've been thinking about for years:

..paradoxically the cleaner and saner your interface the more likely it is to succeed and thus more likely to become constrained by its users, to solidify.

I've been idly dreaming of a future where we decouple modules not by their interfaces but by their behavior. The major failure mode of our age is premature freezing of interfaces, and it happens because it's hard to judge from the outside how done an interface is. An interface that looks really clean can be exposed, by a detailed look at the implementation, as missing a corner case that hasn't been addressed yet. I'm trying instead to think about the input space of a function. I'd like in future to be able to make guarantees like this:

> The behavior of this function is fixed when argument foo is less than 5. However, for larger values we're still nailing down some details.

The argument foo might shift from being the first argument to the third, or its type may change in some subtle way (that still preserves previous guarantees). So upgrading will often cause some pain. However, the upside is that the pain is bounded. You never end up in some 1% situation where upgrading your OS borks up your Rails app and takes days to fix, or breaks things in subtle ways that you don't notice for months.

You know that the upgrade effort will be bounded because you know that as long as you pass in the right value into the right slot, and as long as it's less than 5, behavior won't change and all your tests will pass. That seems like it might be superior to an interface, even if it adds a little extra work up-front.

More details, in case you have any thoughts: http://akkartik.name/about and http://akkartik.name/post/libraries2


I've been chatting with the author over email about my no-freezing-interfaces idea, and the discussion helped uncover two situations I haven't considered sufficiently:

a) Security issues. We need some way to allow people to quickly apply certain tiny changes without having to worry about the possibility of an interface change, while ignoring all the others.

b) Clients sometimes end up in situations where they can't control when upgrades happen. That would break things even worse with my approach than it already does. My proposal really relies on people being able to control when they upgrade.

Back to the drawing board..


Legacy software is successful software. What's the point of building a program that will be discarded after a few months because its users fled to the next big thing? That just means more lines of code being written overall, needlessly.


Legacy software is software that served a purpose but no longer meets the business objectives. Common factors that influence the decision to replace a legacy system with a new one include maintenance cost, scalability, and flexibility. The usual approach in startups is to build something at minimal cost, then replace it later if the business requires it. Everyone has a different opinion regarding the importance of these factors at various stages of a piece of software's life.


I used to fantasize about having a code base I'd maintain for the next X0 years. Something like Dwarf Fortress, or hell, even TempleOS. Then I slap myself and come to my senses. Who knows how different coding will be 10 years from now? Do I really want to get married to something that's bound to be obsolete sooner rather than later?


If you aim to create great software that is flexible and powerful from the beginning, you'll be fine. Many businesses are still running very well on my 10 year old code and will be 10 or even 20 years from now I'm sure.

Let this inspire you: http://www.bricklin.com/200yearsoftware.htm


if this is who I think it is, Dan Bricklin, I want to chime in to say thank you for VisiCalc, sir! I was a user and big fan of your work.

for the kids/newbs: effectively the (co)inventor of The Spreadsheet. and consider that spreadsheets were easily the killer app for the first PC's, thus giving them much more business value, in turn causing more money to flow in, helping to create many more paying jobs, etc, in a happy snowball effect which leads to Linux, Google, AWS...

here's where you might say, "sorry, different Bricklin." oops


I've been maintaining my blogging engine [1] for sixteen years now, and very little of the original code base (from 2000) exists. It's been an interesting experiment.

[1] https://github.com/spc476/mod_blog


I've joined a company with a multi-decade code base, one of whose original devs is still here. Much of it is untouched for a decade.


Legacy means "in production, making us money".


We should all be so lucky as to write legacy code because the alternative is worse: code that is never deployed, or deployed only for a short time. A project and website I worked on for about 10 years was recently shuttered, and I am extremely proud of what that code did for the business which paid me.



