The other answers are good/interesting, but did not cover the software. This hardware is optimized for Forth. Much like the C language is pretty much oriented around the available addressing modes of a classic PDP-11, this hardware is really good at Forth.
Forth is a decent embedded language because you more or less edit your source in a REPL and then snapshot the environment. I'm talking about classical Forth, which I haven't used in a long time, not the new wave stuff linked today. Another nice thing about FORTH is that it upshifts and downshifts in abstraction level VERY fluidly. If I had to write an I2C interface to a device and could pick any language, FORTH would certainly be the easiest / most fun. First you write a Forth word (or words) to control the I2C clock pin and the data pin. Then you write a word that speaks very basic I2C (out goes a byte, in comes a byte) by toggling the pins appropriately. Then a word on top of that which speaks the I2C logical protocol: addressing, bus contention, enumeration and whatever. Then a word that uses the stuff below to talk to an I2C A/D converter chip in a generic sense. Then a word that talks to a specific A/D converter at a specific address and scales its output, so decimal 1024 means +3.15 analog volts or 72.1 degrees F or something. Then...
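To make the layering concrete, here is a rough sketch of that stack. It's in Python rather than Forth so it can run anywhere, and it talks to a fake pair of pins instead of real hardware. Everything named here is an assumption for illustration: `FakePins`, the 0x48 address, and the "two bytes, high byte first" read format don't come from any particular A/D chip. Only the 1024 counts = 3.15 V scaling comes from the comment above. Each function plays the role of one Forth word built on the one below it.

```python
# Simulation only: FakePins, ADC_ADDRESS and the two-byte read format are
# assumptions for illustration, not any real part's interface.

ADC_ADDRESS = 0x48         # hypothetical 7-bit device address
FULL_SCALE_COUNTS = 1024   # from the comment above: decimal 1024 ...
FULL_SCALE_VOLTS = 3.15    # ... means +3.15 analog volts


class FakePins:
    """Layer 0: stand-in for two GPIO pins. Records writes, replays scripted reads."""

    def __init__(self, sda_reads):
        self.log = []                      # (pin, level) in the order written
        self.sda_reads = list(sda_reads)   # bits the pretend device puts on SDA

    def scl(self, level):
        self.log.append(("scl", level))

    def sda(self, level):
        self.log.append(("sda", level))

    def sda_in(self):
        return self.sda_reads.pop(0)


# Layer 1: very basic I2C, built only from pin toggles.
def start(p):
    p.sda(1); p.scl(1); p.sda(0); p.scl(0)   # SDA falls while SCL is high


def stop(p):
    p.sda(0); p.scl(1); p.sda(1)             # SDA rises while SCL is high


def write_byte(p, byte):
    for i in range(7, -1, -1):               # most significant bit first
        p.sda((byte >> i) & 1)
        p.scl(1); p.scl(0)
    p.sda(1)                                 # release SDA so the device can ACK
    p.scl(1)
    acked = p.sda_in() == 0                  # device pulls SDA low to ACK
    p.scl(0)
    return acked


def read_byte(p, ack):
    p.sda(1)                                 # release SDA so the device can drive it
    value = 0
    for _ in range(8):
        p.scl(1)
        value = (value << 1) | p.sda_in()
        p.scl(0)
    p.sda(0 if ack else 1)                   # ACK to ask for more, NACK to finish
    p.scl(1); p.scl(0)
    return value


# Layer 2: a generic "read a 16-bit A/D result from some address" word.
def read_adc_raw(p, address):
    start(p)
    if not write_byte(p, (address << 1) | 1):    # address byte with the read bit set
        stop(p)
        raise OSError("no ACK from device")
    high = read_byte(p, ack=True)
    low = read_byte(p, ack=False)
    stop(p)
    return (high << 8) | low


# Layer 3: one specific converter at one specific address, scaled to volts.
def read_volts(p):
    return read_adc_raw(p, ADC_ADDRESS) * FULL_SCALE_VOLTS / FULL_SCALE_COUNTS


if __name__ == "__main__":
    # Scripted device response: one ACK bit, then 16 data bits for a raw reading of 512.
    pins = FakePins([0] + [0, 0, 0, 0, 0, 0, 1, 0] + [0] * 8)
    print(read_volts(pins))
```

The point isn't the I2C details (a real driver also has to worry about timing, clock stretching and arbitration), it's that each layer only knows about the layer directly below it, which is the style Forth pushes you toward.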
That brings up another interesting aspect of FORTH and this hardware, if you are familiar with the Propeller chip this is like a 144 cog prop, sorta. So that's one interesting way to structure separate blocks of code, run each block on separate hardware...
This isn't going to go over very well with the HN community's interest in scalability, but much like the prop, if you have N cores/cogs/CPUs on chip it scales real nice up to N and then doesn't do so well at N+1. Of course you can probably organize your program to split along hardware lines, so unless you have 145 peripherals (or 9 for the prop) you'd be OK. That enforces a certain hardware focus and demands a device-driver-oriented architecture as a tradeoff, so it's hardly the universal silver bullet.
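A toy illustration of the N versus N+1 cliff, in Python. The function name and the "one driver per core" policy are just assumptions to make the point. Nothing here models how either chip actually schedules work.

```python
def assign_one_driver_per_core(drivers, core_count):
    """Give each driver its own core; return (assignment, drivers left over)."""
    assignment = dict(enumerate(drivers[:core_count]))   # core index -> driver
    leftover = drivers[core_count:]                       # drivers with no core to run on
    return assignment, leftover


if __name__ == "__main__":
    peripherals = [f"peripheral_{i}" for i in range(145)]
    placed, unplaced = assign_one_driver_per_core(peripherals, 144)
    print(len(placed), "cores used;", len(unplaced), "driver(s) with nowhere to run")
```

As soon as `leftover` is non-empty you're back to sharing a core between drivers, which is exactly the scheduling problem the one-driver-per-core layout was meant to avoid.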
I would not suggest writing an embedded webserver that serves each page or server process from a separate piece of hardware. On the other hand writing an embedded webserver that uses one dedicated core/cpu/cog per ethernet port sounds pretty reasonable, assuming you haven't already used up all your hardware.
On the other hand, if you wrote a thermostat in FORTH on a multiprocessor chip like this, you'd probably have one processor do nothing but read the HEAT/COOL/OFF switch and update the internal shared memory state of the device, spending most of its time in low power sleep mode. To say this would be easy to test and troubleshoot would be an understatement: there are only 3 states, plus maybe "switch is broken so set alert mode and default to off" or something. Another CPU, completely separately, does nothing but read a temp sensor and update internal shared state periodically, and also spends most of its time sleeping. Repeat until you're done. With a single processor system it depends on your chosen flavor of syntactic sugar (if any), but it would be much more of an ordered list: OK, now we look and see if a button was pressed, then read the temp sensor, then update the screen, then... A long schedule or to-do list. You can use syntactic sugar to program it differently, but fundamentally the processor will still do it that way, although probably a little slower.
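Roughly what I mean, sketched in Python with plain functions standing in for the per-processor programs. The names (`Mode`, `switch_task`, `sensor_task`, `superloop_pass`), the dict used as "shared state", and the numeric switch encoding are all made up for illustration. On the real chip each task would sit on its own core and sleep between wakeups.

```python
from enum import Enum


class Mode(Enum):
    OFF = 0
    HEAT = 1
    COOL = 2


def switch_task(read_switch, state):
    """All the switch-only processor does each time it wakes up."""
    raw = read_switch()
    try:
        state["mode"] = Mode(raw)      # one of the 3 legal states
        state["alert"] = False
    except ValueError:
        # "switch is broken so set alert mode and default to off"
        state["mode"] = Mode.OFF
        state["alert"] = True


def sensor_task(read_temp, state):
    """All the sensor-only processor does each time it wakes up."""
    state["temp_f"] = read_temp()


def superloop_pass(read_switch, read_temp, state):
    """Single-processor version: the same work as one ordered to-do list."""
    switch_task(read_switch, state)
    sensor_task(read_temp, state)
    # ...then update the screen, then...


if __name__ == "__main__":
    shared = {}
    superloop_pass(lambda: 1, lambda: 72.1, shared)
    print(shared)
```

Testing `switch_task` on its own means feeding it three good values and one bad one, which is the whole test plan for that processor.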
"That brings up another interesting aspect of FORTH and this hardware, if you are familiar with the Propeller chip this is like a 144 cog prop, sorta. So that's one interesting way to structure separate blocks of code, run each block on separate hardware..."
But on paper it appears that a cog has more power than each one of these limited cores. Plus historically the Propeller has been cheaper than most of the esoteric crazy Forth chips and implementations.
Well, I just wanted a vaguely similar hardware architecture known by more people. If price is an issue you're probably stuck with the smallest possible PIC that'll do the job, which is an interesting optimization issue. You'd use something that costs an order of magnitude more to minimize dev time, or minimize bugs/debugging time, or maximize feature set, or maybe for some other reason. It fails at cost so miserably that you wouldn't be using that as a factor when optimizing.
It's important for medium size runs. If you make a couple of things for a small vertical market, or custom work, or R&D, any choice costs about one hour of embedded developer salary, so it doesn't matter. If you're doing huge runs of a million in China you "have to" use the cheapest thing out there, and this won't be it. I bet production runs of around 10K are where part cost creeps out of salary rounding errors but isn't yet high enough to utterly force the use of something specifically designed to be cheap.
As far as cost goes, TI and Microchip have some low power offerings cheap enough that the difference between $20 for this Forth chip (or $12 for a Propeller) and a cheap low power competitor would pay for a larger battery in the budget. Or you could afford more "whatever" in another part of the design.
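Back-of-envelope version of the run-size argument, in Python. The $20 figure is the one quoted above. The $2 "cheap competitor" price and the $100/hour developer rate are numbers I'm making up purely to show the shape of the curve, so plug in your own.

```python
def extra_part_cost(units, chip_price, cheap_price):
    """Extra money spent on parts across the whole run by picking the pricier chip."""
    return units * (chip_price - cheap_price)


def developer_hours_equivalent(units, chip_price, cheap_price, hourly_rate):
    """The same extra cost expressed as hours of developer salary."""
    return extra_part_cost(units, chip_price, cheap_price) / hourly_rate


if __name__ == "__main__":
    CHIP = 20.0      # quoted above for this chip
    CHEAP = 2.0      # hypothetical low-end part
    RATE = 100.0     # hypothetical loaded developer cost per hour
    for units in (10, 10_000, 1_000_000):
        hours = developer_hours_equivalent(units, CHIP, CHEAP, RATE)
        print(units, "units:", extra_part_cost(units, CHIP, CHEAP), "dollars,", hours, "dev hours")
```

Under those made-up numbers a 10-unit run differs by $180, which is under two hours of developer time, while a 10K run differs by $180,000. That's the range where the part price stops being a rounding error.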