I worked using VA-Smalltalk and Envy for 3 years.
One interesting thing about Envy is that the entire versioning model was available as objects you could manipulate, and versioning was not file based but per module/class/method.
In our team each developer worked on a branch, and we developed an in-house tool to automatically merge all the branches. Having the class model available made it easy to identify conflicts and class shape changes. The tool resolved most of the cases and also created the migration script for the object database (GemStone).
Nowadays I'm working with JavaScript, using VSCode, webpack, and a long list of tools... it feels like something went wrong in the dev tooling evolution.
I am fully in sync with you, having used Smalltalk/V for Windows at university, a couple of years before Java was announced.
For me, in an alternative universe, ChromeOS with Dart (given Gilad Bracha's presence on the team) could have been a much better experience, a Smalltalk-like OS; instead, no.
I used Envy from 1992 to 1993 on a prototype project to replace Sybase SQL server. It impressed me deeply for two reasons.
1.) It was directly embedded in the development environment. Just saving a method would check in the change automatically and make it sharable across the team. This was at a time when most people were either not using source control at all or wrestling with cumbersome systems like SCCS. Combined with Smalltalk's built-in dev environment, it is still the easiest source control system I have ever worked with, at least from the developer perspective.
2.) More interestingly, Envy also made metadata about the code base and versions of code easily accessible. Not everyone knows this now, but Smalltalk was unique in that the entire environment above the level of the VM was built in Smalltalk itself and not only available for introspection but for modification. For instance, even execution stack frames were objects that you could see and edit directly to change their behavior. (Not a good idea I soon learned.)
Getting to the point, you could easily introspect every change to code and, by combining introspection of classes and methods, quickly determine which changes were happening where. We built a test framework that could list the changes since the last run, then compute dependencies and run an appropriate selection of test cases to check. This cut down test time by 90% or more for new changes. Because the dev environment was pure Smalltalk it was easy for us to integrate this back into the tools so that it was available just by pressing a button.
Many of the ideas of Smalltalk such as VMs, self-documenting methods and classes, and auto-compilation made their way into other languages and products. However, I have never encountered anything quite like Envy since. It displayed the same level of transcendent innovation for developer productivity that characterized the original Xerox PARC windowing system for desktops.
I know so little about Smalltalk and its way of software development. A year ago, I read about VisualAge, and asked this question on the lisp subreddit:
It has an interesting response from lispm. Having only worked in languages where the source is stored in files, it's hard for me to get a grasp on this kind of development. I catch rare glimpses of this. Here's another example, from http://dept-info.labri.u-bordeaux.fr/~strandh/Teaching/Langa...
> A major problem with Scheme that does not exist in Common Lisp is that the Scheme language does not define an internal representation for programs, although to be fair, all Scheme implementations do.
> For instance, in Scheme, there is a difference between:
(define (f x y . (z))
...)
> which is syntactically illegal, and:
(define (f x y z)
...)
> which is syntactically legal.
(You'll probably have to read that short section "Program/data equivalence"; I couldn't find the right excerpt)
Scheme code is stored in text files. Common Lisp code is a sequence of lisp objects. This is a strange difference to me; I'm an unexceptional web programmer. But it's "the very basis of the Common Lisp macro system, which uses Lisp data to represent and manipulate Lisp code." I am condemned to be intrigued!
The distinction is that Common Lisp source code is not text; it's Lisp values. What you see in a text file of "Lisp source code" isn't actually Lisp source code; it's a text serialization of Lisp source code. Other serializations are possible.
As a practical matter, most of the time the distinction doesn't matter. You edit text files. Your Lisp reads the text files and the first thing it does with the contents is convert them to Lisp source code. The source code is then processed as you would expect.
One place where the distinction matters is in macro processing: Lisp macro expansion operates on Lisp source code, not on text. If it operated on text, as, for example, C preprocessors do, then macro expansion would be a matter of string substitution. Instead, Lisp macro expanders operate on abstract syntax trees represented as convenient Lisp data structures, which means that macros are not limited to what can be conveniently done with simple string substitution. You have the full Lisp language available, operating on a convenient Lisp representation, to compute the expansion.
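For instance, here is a toy macro (made up for this comment, not taken from any library) whose expander receives its arguments as Lisp data and builds new Lisp data:

(defmacro swap! (a b)
  "Expand into code that exchanges the values of two places.
The arguments A and B arrive as Lisp data (symbols, lists), not as text."
  (let ((tmp (gensym "TMP")))
    `(let ((,tmp ,a))
       (setf ,a ,b)
       (setf ,b ,tmp))))

; (macroexpand-1 '(swap! x (aref v 0)))
; => (LET ((#:TMP1 X)) (SETF X (AREF V 0)) (SETF (AREF V 0) #:TMP1))
;    (the gensym name will vary)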
More generally, when source code is represented as convenient data structures in the language, it's easier to build tools that operate on it. C compilers do not operate directly on text input; they convert it to internal data structures more convenient for the various stages of the compiler to work with. So does Lisp. The difference is that in a C compiler the representations used by the compiler are private implementation details, and in Lisp they are standard surface features of the language, available to every user.
For JavaScript, the Esprima parser [1] converts JavaScript to what appears to be a JSON format. It's formalized by the ESTree spec [2]. You can try it out here [3]. JSON is just regular JavaScript data structures (maps and lists). If you want type safety, there are TypeScript definitions, but you can ignore those.
Another example: Go has a parser and AST that come with the standard library, again using regular Go data structures (structs and interfaces). If you wanted to write a macro preprocessor, using the built-in parser seems like a much better idea than string manipulation.
So, I'm wondering if there's anything more to the way Common Lisp does it?
Lisp does not use a tokenizer or a parser. It uses a reader, which reads a textual representation of data into data objects.
A reader is similar to a tokenizer, but the reader just creates an internal representation of some data from an external representation. Basically what a primitive JSON reader would do.
A parser would know the programming language syntax and create an internal representation of the program. The Lisp reader does not know anything about the programming language syntax. It just knows about numbers, strings, lists, arrays. But the reader does not know about conditionals, variable declarations, iteration statements, assignment statements, ...
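For instance, we can hand the evaluator a quoted form, so that it leaves the data alone (a sketch; the exact prompt number is assumed, chosen to match the transcript below):

CL-USER 8 > '(defparameter answer (* 6 7)) ; quoted, so it is just data; prompt number assumed
(DEFPARAMETER ANSWER (* 6 7))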
As you can see, the evaluator just returns the data, though the symbols are upcased by default. The data is not enriched the way a tokenizer or parser output would be; ideally the data structure prints exactly as it is read. No type or syntax annotations are necessary or used. Thus it is neither a tokenizer output nor a program parse tree. In your first link the tokenizer breaks the input up into strings and categorizes them: the 42 is read into the string '42' and annotated as being numeric. The Lisp reader does not do that. It reads the two characters 42 and returns an integer number object with the value 42.
We can describe this data; it is a list with three elements:
CL-USER 9 > (describe *) ; describe the last value
(DEFPARAMETER ANSWER (* 6 7)) is a LIST
0 DEFPARAMETER ; a symbol
1 ANSWER ; another symbol
2 (* 6 7) ; a list
On this level of s-expressions we know nothing about Lisp as a programming language. It is only known that DEFPARAMETER is a symbol, but not known that it is a language construct and which. It's just a symbol in the first position of a list.
Since this list is also a valid Lisp program (because we wrote it as such), we can compute its value and some side-effects. This process is called evaluation.
CL-USER 10 > (eval **) ; eval the second last value
ANSWER ; it just returns the name of the defined variable
We can also input it directly, since the READ-EVAL-PRINT-LOOP already evaluates:
CL-USER 11 > (defparameter answer (* 6 7))
ANSWER
As you can see, the input to the evaluator is not a tokenizer output and not a parse tree. It's just an expression tree, where the leaves are data objects. It's not (token 3 :type number) or (literal-data :type number :value 3) or something similar. It is just the 3 as a data object.
The evaluator itself could be an interpreter or a compiler. The interpreter walks the expression tree and computes the values from the expressions, which it sees in data form. A compiler-based evaluator would take the whole expression tree, compile it, and then run it.
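To make that concrete, a toy tree-walking evaluator might look like this (a sketch handling only numbers, variables, and function calls; it is not how any real CL implementation evaluates code):

(defun tiny-eval (form env)
  "Walk an expression tree given as Lisp data. ENV is an alist of variable bindings."
  (cond ((numberp form) form)                        ; numbers are self-evaluating data
        ((symbolp form) (cdr (assoc form env)))      ; variables: look up in the environment
        ((consp form)                                ; (fn arg1 arg2 ...): evaluate args, apply
         (apply (symbol-function (first form))
                (mapcar (lambda (sub) (tiny-eval sub env))
                        (rest form))))))

; (tiny-eval '(* 6 7) '())                      ; => 42
; (tiny-eval '(+ answer 1) '((answer . 42)))    ; => 43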
There is definitely a tokenizer in Common Lisp, and one that is pretty much "sealed off" from the programmer. The current readtable can assign a character to be a "constituent". Constituent characters are gathered by the reader into a token, which is then analyzed. A sequence of unescaped digits like 12345 becomes an integer. Floating-point literals are recognized. A symbol token is analyzed for the package markers : or ::. And so on.
From the programmer's point of view, that's an implementation detail. The main interface for the programmer is the function READ, and it returns data, not a stream of tokens.
Although there is no program-visible "token" data type, the specification describes tokens and uses that word. Since the reader is available to programs, programs can be written to test hypotheses about the implementation's treatment of tokens; tokens can be composed as character strings, passed into read-from-string and the results of the operation can reveal something about the tokenizing black box, without yielding the tokens themselves.
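A few such probes, as a sketch (the results in the comments follow from the standard's token rules; the second value of READ-FROM-STRING is the number of characters consumed):

(read-from-string "42")    ; => 42, 2     -- unescaped digits make an integer
(read-from-string "42.0")  ; => 42.0, 4   -- a float token
(read-from-string "1/2")   ; => 1/2, 3    -- a ratio
(read-from-string "|42|")  ; => |42|, 4   -- escapes force the same characters into a symbol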
(That said, the Common Lisp syntax isn't completely treated as a token stream by the reader, in the glossary sense of "token", since that word is defined only as the read syntax of numbers and symbols; thus string literals and other bits of notation aren't defined as tokens.)
Exactly, it reads single tokens in some places and then creates data from them. It uses the word token for that, but that's arbitrary.
That's not what a tokenizer does, which usually would create a stream of tokens for all the elements in the input. A token would be a string with some metadata.
I would want a Lisp VCS to preserve comments, which are not usually present in the deserialized form of "Lisp source code", so I don't really think your description helps here.
I haven't used Xerox Lisp enough to remember whether the Common Lisp side did keep comments in a similar way to InterLisp-D.
There's no realistic danger of a Common Lisp implementation abandoning support for the text serialization of Lisp source code, because the reader is part of the language spec. That being the case, you can always count on support for Lisp text files, and therefore for comments stored in them.
On the other hand, nothing in the standard forbids an implementation from keeping comments around in the image. Off the top of my head I don't know of an implementation that does it, but it might be a handy feature to have.
Basically the comment function * took its unevaluated args as a list (it was an NLAMBDA) and just returned its arg.
(* this is a comment in Interlisp)
This also means comments could only be used where they fit into the Lisp syntax and did not change the control flow.
In Common Lisp:
(defun comment (words) words)
(defmacro § (&rest words)
  `(comment '(,@words)))
(let ((a 20))
  (cond ((< 0 a 10) (§ Only when A is between 0 and 10.
                       This is a typical value.)
                    (* a 0.1))
        (t (§ a is not between 0 and 10)
           (* a 0.2))))
Now it would need a special pretty printer command, so that
* comment lines would be laid out such that they are aligned and fit on a line, and
* the position of the comment would stand out from the code.
Interlisp had a pretty printer macro for it.
It might be better to turn that into a reader macro, so that inside the comment one would then not be constrained by s-expression syntax....
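A rough sketch of that idea (the choice of § as the comment character and reading up to a closing § are just for illustration, and this reuses § differently from the macro above):

(set-macro-character #\§
  (lambda (stream char)
    (declare (ignore char))
    ;; consume raw characters up to the closing §, then return zero values,
    ;; so the reader ignores the whole thing -- just as it does for ; comments
    (loop for c = (read-char stream t nil t)
          until (char= c #\§))
    (values)))

; (+ 1 § any text here, unconstrained by s-expression syntax § 2)  ; => 3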
Interlisp also has the feature of storing/retrieving the comment text on demand from a file. Thus the comment text need not be present in the image itself.
I'm quite thankful that we don't have to worry about the syntactic position of comments, and the wise folks of CL, as you may know, made it so that saving comment text can be as easy as:
Working with Smalltalk is pretty cool. I sometimes say that the image-based part of Smalltalk is like Lisp, but done right, because the state of the code in your image and the state of the code in your source files can't ever get out of sync; the IDE is the image.
It's also quite interesting from a language design perspective. For example, the syntax is mega-minimal (it's called Small talk for a reason!). The syntax can be summed up as:
- A small handful of reserved words (self, true, false, nil, and one or two rarer cases that escape me ATM).
- Local variable declarations.
- Message calls on objects (what other languages usually call methods): unary calls "anObject unaryMethod" (a no-arg method), keyword calls "anObject withFoo: someObject withBar: anotherObject" (equivalent to "anObject.method(foo, bar)"), and binary calls "anObject + anotherObject" (single-arg methods, written with infix syntax).
- A syntax for lambda closures, called blocks in Smalltalk "[:a :b | $code ]", basically equivalent to (lambda (a b) $code).
- Statements are terminated with a period.
- "^foo" returns foo from a method.
- The semi-colon is used for calling a sequence of methods on the same invocant (a better way to do fluent interfaces, basically).
From these pieces everything else is constructed. There's no reserved syntax for things like conditionals or loops. For example, a conditional works like this: "condition ifTrue: [ $trueCode ] ifFalse: [ $falseCode ]", which is just plain Smalltalk. The implementation is simply that the Boolean type has two subclasses, True and False; True's implementation of ifTrue:ifFalse: evaluates the true block, and False's evaluates the false block. Dead simple! Loops and exception handling are done similarly by calling methods on blocks.
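For the Lisp readers in this thread, the same dispatch trick can be sketched in Common Lisp with CLOS, just to show the mechanism (the class and function names below are made up for illustration; they are not Smalltalk's or CL's own):

(defclass true-object () ())   ; stand-ins for Smalltalk's True and False classes
(defclass false-object () ())

(defgeneric dispatch-if (condition true-block false-block)
  (:documentation "Like Smalltalk's ifTrue:ifFalse: -- the condition object decides."))

(defmethod dispatch-if ((condition true-object) true-block false-block)
  (declare (ignore false-block))
  (funcall true-block))        ; the "true" class runs only the true block

(defmethod dispatch-if ((condition false-object) true-block false-block)
  (declare (ignore true-block))
  (funcall false-block))       ; the "false" class runs only the false block

; (dispatch-if (make-instance 'true-object)
;              (lambda () :then-branch)
;              (lambda () :else-branch))   ; => :THEN-BRANCH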
> I sometimes say that the image-based part of Smalltalk is like Lisp, but done right, because the state of the code in your image and the state of the code in your source files can't ever get out of sync; the IDE is the image.
That's what Interlisp did too. Particularly Interlisp-D and its later version, Medley.
Oh, neat! I guess I shouldn't be surprised that advanced Lisp systems from that era did this kind of thing as well. The Lisp and Smalltalk systems of that era really are something, even compared to modern development environments. Which is especially impressive when you consider the limitations they were working within.
It might be interesting to compare how the error locations are reported, particularly when macros are involved. A typical AST can generate a file, line, and character offset for each node (at least), for printing errors. But if you don't have line numbers, there must be some alternative?
You have the compiler use a special reader which records the source locations of each form in, e.g., a hash table. When compiling a specific form, look it up in the hash table to get a source location mapping. There are a couple of downsides to this (a rough sketch of the idea follows the list below):
- This only works when the reader returns unique objects. It's possible to track the location of a cons cell, but it's not possible to do it for any specific use of a symbol or an integer.
- CL macros can do arbitrary transformations on the data. If those transformations involve loss of object identity, the tracking breaks down. (E.g. code-walking macros will often end up doing spurious copies, and cause unnecessary loss of source location mappings).
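A very rough sketch of the recording idea (all names here are made up; real implementations such as SBCL do this with much more machinery, and recursively for every sub-form):

(defvar *form-positions* (make-hash-table :test #'eq)
  "Maps a cons cell of a form to the stream position where its text began.")

(defun read-recording-position (stream)
  "Read one top-level form and remember roughly where it started.
Only a fresh cons can be tracked by identity; a bare symbol or integer cannot."
  (let ((start (file-position stream))       ; approximate: includes leading whitespace
        (form (read stream nil stream)))     ; use the stream itself as the EOF marker
    (when (consp form)
      (setf (gethash form *form-positions*) start))
    form))

; Usage sketch (the file name is hypothetical):
; (with-open-file (in "example.lisp")
;   (loop for form = (read-recording-position in)
;         until (eq form in)
;         collect form))
; (gethash some-form *form-positions*)  ; => character position where that form started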
What was distinctively git-like about it that makes that connection more meaningful than "it was a version control system"?
That is, what made it Smalltalk's git, and not, say, Smalltalk's CVS or ClearCase? The linked-to page suggests that ENVY used a centralized system. (Eg, "The Git model is better for distributed development" and "it had a small process that ran on a server to manage record locking".)
Consider also some of the limitations in ENVY. "You would bring your changes to the machine in the form of (IIRC) a list of RCS-like versions of individual methods, then review and merge them with latest." and "What wasn’t available, was automatic merging".
Yeah. I would say that Monticello was Smalltalk's git. It does the same sort of method-level versioning as ENVY, but it's a true distributed version control system: there's no central server, no locking, and you can always push and pull versions between repositories and merge them regardless of where they came from.
Monticello preceded git by a couple of years, so it took a little while for the Squeak community to embrace it and figure out how to organize itself around DVCS-based development.
Ah, I think I understand it. I think it comes from a belief that most people treat "git" as a synecdoche for, or even the entirety of, version control systems.
The quote here uses "git" in two ways: 1) "ENVY was Smalltalk's Git" meaning "ENVY was Smalltalk's version control system", and 2) "anyone heard of Git" meaning "anyone heard of the Git system, started by Torvalds."
If so, I think it does a disservice as it reinforces that wrong belief, and is confusing for those who know of other VCSes.
From TFA it sounds like Envy was version control that people didn't find skull-crushingly painful to use, which establishes it as pretty different from CVS or ClearCase.
Heh-heh. After working with RCS, CVS, SCCS, SVN, and Bazaar, I find Mercurial to be a good fit for my model of how to work, while using git drives me batty. Perhaps my skull has been crushed already. :)
FWIW, I and others in my group used CVS for years, after migrating from RCS. We didn't find that difficult.
I didn't get a sense from the article that the people involved found RCS, ClearCase, or CVS that hard to use, only that Envy was a good fit for how they used Smalltalk.
Accepting a code change locally, in the usual way, created a new edition in the database. Once editions were versioned and released, they would become visible to other developers, and loadable by default.
I immediately recognized the name OTI, because I had seen it in some early articles about SWT, which is now the cross-platform UI toolkit of Eclipse. So, was SWT also originally implemented in Smalltalk? Or did it come after the transition to Java?
I don't know about the internals. But OTI developed VisualAge for Smalltalk and other VA for X products, such as VA for Java. They were all written mostly in Smalltalk, including VAJ (very interesting: it was a VM that ran both Smalltalk and Java).
Eclipse was supposedly derived from VAJ, but having used Eclipse in its early days, I didn't see much resemblance.
> On the contrary, we developed a style where no branch lasted more than a day, and usually only half a day. This came about, I think, due to our direct observation of something Kent Beck taught us, which was that if something hurts, we should do it more often … If you wait a week or a month to integrate outside changes, it’s quite difficult, tedious, slow, and error-prone. When you integrate after a few hours, it goes much more easily.
This strikes me as relevant to the monorepo vs. multirepo debate: having multiple repos makes it easier for things to get out of alignment, necessitating all-or-nothing hell-merges.
I was also surprised by how long ago these practices I tend to think of as modern actually started. I was learning waterfall back then!