Hacker News

Is an array language a language where you only have arrays? That honestly sounds a bit odd.



No, an array language is one in which all the built-in operations are applicable to arrays. Take addition as an example:

        4 + 4
    8
In array languages the same operator can be used for arrays; or equivalently you can say that the example above sums two arrays of length 1. In J, you can do this and expect it to work:

       4 4 4 + 2 2 2
    6 6 6
This is true for all the built-ins, and for many user-defined operations (as long as you don't fiddle with the so-called rank of the verb you're defining).

NumPy is close to that, but there's still a distinction between arrays and scalars, while in array languages that distinction is often blurred:

       4 + 1 2 3
    5 6 7
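The scalar-extension behavior above can be sketched in plain Python. This `add` helper is purely illustrative (it is not how J or NumPy is implemented), just to make the "a scalar is extended to match the array" idea concrete:

```python
def add(x, y):
    """Elementwise addition with scalar extension, array-language style.

    A scalar operand is extended to match the other operand's length,
    so add(4, [1, 2, 3]) behaves like add([4, 4, 4], [1, 2, 3]).
    """
    if not isinstance(x, list) and not isinstance(y, list):
        return x + y                      # two scalars: ordinary addition
    if not isinstance(x, list):
        x = [x] * len(y)                  # extend the left scalar
    if not isinstance(y, list):
        y = [y] * len(x)                  # extend the right scalar
    if len(x) != len(y):
        raise ValueError("length error")  # mismatched lengths, as in J
    return [a + b for a, b in zip(x, y)]

print(add(4, [1, 2, 3]))        # [5, 6, 7]
print(add([4, 4, 4], [2, 2, 2]))  # [6, 6, 6]
```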
Edit: in J, you can have atoms, or scalars, but you need to box them:

       (3;4;5)
    ┌─┬─┬─┐
    │3│4│5│
    └─┴─┴─┘
but then you can't do anything with them until you unbox them again:

       (3;4;5) + 3
    |domain error
    |   (3;4;5)    +3

       (+&4) each (3;4;5)
    ┌─┬─┬─┐
    │7│8│9│
    └─┴─┴─┘
(Examples straight from J REPL)


It seems almost as if it'd be more useful not to expose operators as directly applicable to arrays, but instead to implement SIMD optimization for operator expressions inside .map(function) and something like .binaryMap(right, function), e.g.:

    [1, 2, 3].binaryMap([10, 10, 10], (a, b) => a * b) // Produces [10, 20, 30]
This would be easier to optimize when compiled because the expressions can be simplified and mapped to the right SIMD instructions.
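For concreteness, here is a minimal Python sketch of that proposed operation (`binary_map` is the commenter's hypothetical API, not an existing library function):

```python
def binary_map(left, right, f):
    """Pair up two equal-length sequences and apply f elementwise.

    Mirrors the hypothetical .binaryMap(right, function) above: because
    the applied function is explicit, a compiler could in principle
    specialize the loop body and lower it to SIMD instructions.
    """
    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    return [f(a, b) for a, b in zip(left, right)]

print(binary_map([1, 2, 3], [10, 10, 10], lambda a, b: a * b))  # [10, 20, 30]
```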


Unfortunately, I have no idea about the implementation and what optimizations are done in J or APL for which operations :( I know that there are hardcoded "fast path" expressions (particular combinations of operations) which have much better performance than more general expressions doing the same thing, so it might be that the optimization happens at that level.

OTOH, your example is very verbose when compared to J's version:

       1 2 3 * 10
    10 20 30
Plus, in J it generalizes to higher dimensional arrays:

       i. 3 3
    0 1 2
    3 4 5
    6 7 8
       (1 + i. 3 3) * 10
    10 20 30
    40 50 60
    70 80 90


My example is verbose in order to clearly communicate the principle. I can trivially shorten it to

    [1, 2, 3].map(a => a * 10)
    
if I wanted to literally carry out that task alone.

    [[0, 1, 2],
     [3, 4, 5],
     [6, 7, 8]].flatMap(a => a * 10)


I think this is pseudo-code? `flatMap` and `=>` for lambdas look like Scala, but there are no array literals using `[]` there. Assuming you mean Scala-like semantics, your second example wouldn't work at all:

    scala> Array(Array(1,2),
                 Array(4,5)).flatMap(a => a * 10)
                                                       ^
           error: value * is not a member of Array[Int]
You would need to write it like this:

    scala> Array(Array(1,2),
                 Array(4,5)).flatMap(a => a.map(x => x * 10))
But then you'd get a flattened array of ints, not an array of arrays of ints:

    res6: Array[Int] = Array(10, 20, 40, 50)
So to get the same result as

    (i. 2 2) * 10
You'd need to write:

    scala> Array(Array(1,2),
                 Array(4,5)).map(a => a.map(x => x * 10))
    res7: Array[Array[Int]] = Array(Array(10, 20), Array(40, 50))
...which is more verbose, no? :) EDIT: obviously, I mean "map in map" part, not the Array initialization!

The problem with list comprehensions and `map`, `filter` and friends is that they work very well for flat lists, or for lists of lists in some specific circumstances (i.e. when you can use `flatMap`). 2D arrays in the general case, and arrays with higher dimensions, are really hard to work with using these primitives, unless you have some clever overloads, like what Clojure does. I think Haskell also has a solution for this, but I don't know it enough to comment further, unfortunately :)
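One way to see what a rank-like primitive buys you, sketched for plain nested Python lists (`map_at_depth` is a made-up toy helper, not J's actual rank mechanism):

```python
def map_at_depth(f, depth, data):
    """Apply f to sub-arrays at the given nesting depth.

    depth=0 applies f to the whole array; each extra level recurses one
    dimension deeper. A rough analogue of applying a verb at a chosen
    rank in J, without needing nested map-in-map calls at each level.
    """
    if depth == 0:
        return f(data)
    return [map_at_depth(f, depth - 1, row) for row in data]

matrix = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
# applied one level deep: each row is handed to the function
print(map_at_depth(lambda row: [x * 10 for x in row], 1, matrix))
# applied two levels deep: each scalar is handed to the function
print(map_at_depth(lambda x: x * 10, 2, matrix))
```

Both calls produce the same result here, but for higher-dimensional data the chosen depth matters, which is exactly the point of controlling rank.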


What I can't understand is what would J do if I tell it to sum these two arrays:

    1 2 3 4 5
    6 7 8 
I.e. 5 elements and 3 elements


As was mentioned, the simplest behavior is to treat this as an error case.

Other possibilities include:

(*) extending the length of the short array to match the length of the long array -- padding with zeros on the right -- before adding.

(*) finding all possible sums between pairs with one number from one array and the other number from the other array.

(*) cyclically extending the shorter array (probably not useful in this example, but since the example didn't come with a use case that's difficult to know for sure).

(*) treating each array as a decimal number (or a number in some other base).

(*) combining the two arrays into a single list of numbers and summing everything to a single value.

and... so on...

One point being that programming languages support arbitrarily complex operations.

Another point being that "sum" in this context carries some understandable ambiguity -- the sort of thing which we usually try to constrain with use cases or examples or similar elaboration.
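A few of the strategies listed above, sketched as toy one-liners in Python for x = [1,2,3,4,5] and y = [6,7,8] (illustrative only; no array language is claimed to do exactly this):

```python
from itertools import cycle, islice, product

x, y = [1, 2, 3, 4, 5], [6, 7, 8]

# pad the shorter array with zeros on the right
pad = [a + b for a, b in zip(x, y + [0] * (len(x) - len(y)))]

# all pairwise sums (an "outer" sum: 5 * 3 = 15 values)
outer = [a + b for a, b in product(x, y)]

# cyclically extend the shorter array
cyc = [a + b for a, b in zip(x, islice(cycle(y), len(x)))]

# truncate the longer array (zip stops at the shorter operand)
trunc = [a + b for a, b in zip(x, y)]

print(pad)    # [7, 9, 11, 4, 5]
print(cyc)    # [7, 9, 11, 10, 12]
print(trunc)  # [7, 9, 11]
```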


       1 2 3 + 4 5 6 7
    |length error
    |   1 2 3    +4 5 6 7
In general - depends on the operation. Sometimes the verb checks that both operands have the same rank and dimensions, sometimes the shorter/smaller side gets applied column-wise (or cell-wise/page-wise):

       (2 2 $ 1 2 3 4)  NB. $ means "reshape"
    1 2
    3 4
       (2 2 $ 1 2 3 4) + 1 2
    2 3
    5 6
Sometimes the shorter side is repeated as much as needed to get the correct length, and sometimes the shorter side gets extended with 0s, and sometimes the longer side gets truncated:

       (2 2 $ 1 2 3 4 5 6)
    1 2
    3 4
       (2 4 $ 1 2 3 4 5 6)
    1 2 3 4
    5 6 1 2
 
To be perfectly honest: it's very unintuitive to me and I have to check the docs pretty often to see how a given verb behaves outside of the simplest case (i.e. when the rank and dimensions match). But I learned J as a hobby and never invested enough time into it to really learn all the built-ins, nor did I try using J outside of toy examples, so maybe this behavior becomes convenient once you've internalized it?
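The cycling-and-truncating behavior of dyadic `$` from the examples above can be sketched in Python (a toy two-dimensional version only; J's `$` handles any rank):

```python
from itertools import cycle, islice

def reshape(shape, data):
    """Reshape a flat list into rows x cols, cycling the data when it is
    too short and truncating when it is too long -- mimicking J's dyadic $
    for the 2-dimensional case."""
    rows, cols = shape
    flat = list(islice(cycle(data), rows * cols))
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

print(reshape((2, 2), [1, 2, 3, 4, 5, 6]))  # [[1, 2], [3, 4]]       (truncated)
print(reshape((2, 4), [1, 2, 3, 4, 5, 6]))  # [[1, 2, 3, 4], [5, 6, 1, 2]]  (cycled)
```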

EDIT: forgot to mention, you can also control the effective rank of the verb, like this:

       <"0 (2 2 $ 1 2 3 4)
    ┌─┬─┐
    │1│2│
    ├─┼─┤
    │3│4│
    └─┴─┘
       <"1 (2 2 $ 1 2 3 4)
    ┌───┬───┐
    │1 2│3 4│
    └───┴───┘
       <"2 (2 2 $ 1 2 3 4)
    ┌───┐
    │1 2│
    │3 4│
    └───┘
the `<` verb means "box"; without the `"n` part you'd get the last result above: a single box containing the whole 2x2 array. By selecting the rank of the verb, you decide at which level the verb is applied: with `"0` it's applied to the smallest possible chunks of the array (cells, so you get a 2x2 array of boxes), with `"1` you apply the verb to bigger chunks (rows), and so on, for input arrays of any dimensions (so if you have a 3x2x2 array, `"2` will apply the verb to the 3 2x2 arrays).


I don't think $ is worthy of note here. It's literally a builtin for cycling data; that's what it does.

All simple arithmetic follows leading axis agreement[1] (so erroring on mismatched length, and mapping over cells for high rank). Don't know much about other builtins, but, unless J does some weird stuff I don't know about (I don't know much of J), it shouldn't be too illogical.

[1] https://aplwiki.com/wiki/Leading_axis_agreement


> I don't think $

You're right, although cycling instead of erroring out was surprising to me. Another verb I had trouble with is roll/deal: https://code.jsoftware.com/wiki/Vocabulary/query

> it shouldn't be too illogical.

I'm not saying it's illogical. I'm pretty sure there are good reasons behind the design of each word, especially since there are so few. As I mentioned, I suspect it's just a matter of me not getting enough exposure to the various use-cases of many standard words.


Gotta say, "depends on the operation" was the most unfortunate possible answer to my question, alas :-) Maybe one day we'll figure out how to make a consistent array language ;-)


Why is it unfortunate? It's entirely plausible that they've managed to pick the behavior that makes most sense for each operator. Different operators being different, it doesn't make sense to expect all to behave the same. And it would be unfortunate if the common use cases were complicated by having to code out the sensible behavior if the defaults were wrong.


Well, for what it's worth - and as mentioned by a sibling poster - most verbs, and certainly all arithmetic operators, obey a few simple rules for determining which behavior you'll get. I should have noted this, before I started whining about all the other verbs, which don't :D



even less verbose:

    (>: i. 3 3) * 10


...or

    10 * >: i. 3 3

FWIW


That just removes a tiny tiny bit of dynamic dispatch overhead. Which is still needed anyways, as array languages can often dynamically switch between 1-bit, 8-bit, 16-bit, 32-bit integer (and 64-bit float) arrays, depending on the elements, completely transparently to the user.


Most compilers can inline statically defined closures in these contexts. And tracing JITs do this even when the closure is not defined statically (but is stable).

It's more about allowing the SIMD goodness without the ambiguity and restrictions of "scalar operators work on arrays" implemented naively.


> without the ambiguity

There's no ambiguity! In J, all operations (with some exceptions) work on arrays, period. `1 + 1` is not an addition of two scalars, but an addition of two arrays of length 1. As I mentioned, you can have scalars, but a) they still live in arrays and b) you can't do anything with them without unboxing. So there's no ambiguity, as far as I can tell.

Also, APL and J are implemented as interpreters, but that doesn't preclude using SIMD instructions when executing expressions. I'm 100% sure the operations are not "implemented naively" :)


> `1 + 1` is not an addition of two scalars, but instead of two arrays of length 1

I think that's true of (Dyalog) APL, but not of J [1]. APL follows the nested array model (that you describe), while J follows the flat array model that does have true scalars.

[1] https://aplwiki.com/wiki/Array_model#Flat_array_theory


AFAIK, J considers `1` to be an array - `L. 1` and `L. 1 2` both give 0 (`L. (1;2)` gives 1).

APL's simple scalar numbers are much more like regular numbers in non-array languages.


Traditional array languages (APL, J) are not amenable to static compilation due to other dynamic issues, though. I think Dyalog APL is experimenting with a bytecode interpreter, but there are no JITs in sight either (that I know of). The very dynamic nature of those languages makes that difficult. I'd love a compilable similar language, though, to replace R/Python for statistics, which are sooooo annoying when exploring data due to verbosity.


That's not really the issue.

It's certainly true that, depending on how the code is written, you may not be able to optimize out the language implementation from the compiled program. But that just turns into a link against a library with support for the language for your hypothetical compiled program.

That said, the compiler being hypothetical is a very real obstacle. But even there the problem is not that the language can't be compiled -- it's that no one has bothered to implement a compiler for it.

Anyways, if you throw in some ML type inference, and some tools for characterizing the resulting code and its interfaces, you can generate code from an array language which is quite similar to the code you would get from a variety of other languages.


A compiler could inline SIMD with or without an operation having an explicit map, the dynamic type checks needed are gonna be the same. (and, since this is an array language, the tiny extra overhead of completely dynamic dispatch is gonna be small compared to the executed SIMD afterwards anyways)

You lose a lot of simplicity & brevity by requiring explicit mapping, and gain close to nothing.


That's the case in APL and J. K uses nested lists to represent arrays, and has non-lists (atoms). But the convention is that an n-times nested list is considered an n-dimensional array so even an atom is an array, with 0 dimensions.

There's a page on the various approaches to arrays in the APL family at https://aplwiki.com/wiki/Array_model .


Mostly yes.

Numbers (and characters) are implemented as arrays with 0 dimensions. Text would be an array of characters with 1 dimension (the number of characters), and generally speaking the shape of an array is a one-dimensional list of non-negative integers. Many array languages also include an array type which is approximately the same as a C pointer to an array, with a bit of jargon thrown in to distinguish a reference to an array from the array itself.

Something like an SQL table in an array language would be implemented as a list of columns (rather than as a list of rows) and a corresponding list of column labels. This has some interesting benefits.
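The column-oriented layout described above can be sketched with plain Python structures (the `table` layout and `column` helper are illustrative, not any particular array language's representation):

```python
# A toy column-oriented table: a list of columns plus a parallel list
# of column labels, rather than a list of rows.
table = {
    "labels": ["name", "score"],
    "columns": [["ada", "bob", "cy"], [10, 20, 30]],
}

def column(table, label):
    """Fetch a whole column by label. Because each column is stored
    contiguously, whole-column operations (sums, filters, elementwise
    arithmetic) are cheap and natural in this layout."""
    return table["columns"][table["labels"].index(label)]

print(sum(column(table, "score")))  # 60
```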

That said, functions in array languages are typically not arrays (though they presumably would have a textual representation). So... not everything is an array.


From the J docs: https://www.jsoftware.com/help/dictionary/dx005.htm

  nub=: (i.@# = i.~) # ]
     5!:2 <'nub'
  +-------------------+-+-+
  |+--------+-+------+|#|]|
  ||+--+-+-+|=|+--+-+|| | |
  |||i.|@|#|| ||i.|~||| | |
  ||+--+-+-+| |+--+-+|| | |
  |+--------+-+------+| | |
  +-------------------+-+-+
... so J functions have an array representation, at least.


Yes, each J function has several array representations.


You could consider Pandas / numpy an array language, even though it's really a library for Python. The exposed API is IMHO what's important.


I agree, but you have to take extensibility into account. You either have an extensible language and extend it[0], you make it an array language to the core, or you just have a few slightly convenient functions (not a language). (I don't have enough experience to judge which category Pandas / numpy are in)

[0]: https://github.com/phantomics/april



