If you want to play around with the F18A instruction set, I wrote an emulator for it[1]. This also comes with a system to synthesize F18A code with MCMC, using the basic technique from the Stochastic Superoptimization[2] paper.
Additionally, there is an interesting project I worked on with several other people to make writing code for GreenArrays more accessible[3]. This includes a system for synthesizing and verifying F18A code using two different solvers (Z3 and Sketch[4]) as well as a system for automatically partitioning and laying out code over the different cores. It's a work in progress, but well worth a look.
Amazing architecture; I remember reading about it and colorforth several years ago when I was researching parallel architectures. I almost want to buy a chip and mount it on a schmartboard, but I've got so much going on right now that I know it would probably sit there after that.
The real reason for my comment, however, is a question: is that idiomatic F#? If so, I like it! It's got a lot of Haskell's succinctness, power, and functional goodness without becoming a tangled mess of indecipherable symbols. Yes, you can write Haskell this way, but in my experience, almost no one does. If this is typical for F#, it deserves a (very close) look.
Add to that the fact that I was impressed with the readability and power of a bit of C# code the other day. I really ought to give these MS languages another chance. It's too bad I can't stand the OS (and yes, I've tried Windows 8 -- even worse). How's Mono these days?
Mono's actually really solid, and I'm looking to bring it more heavily into use at work. Some of the most active F# contributors are Mono-first, too, so it's a well-supported target.
F# is an ML implementation for .NET; being an ML-based language, it shares many similarities with OCaml, but as I pointed out yesterday, it is definitely not the same thing as OCaml:
But in detail, "F# is basically ML.net" is about as useful a language summary as "Objective-C is basically C++.mac". Despite the common heritage, the two feel quite different once you sit down and spend some time with them.
And seriously, this "the Common Language Runtime is only available on Windows" trope needs to die. Mono's been around for... what, a decade now? Something like that.
I agree that the underlying runtime and ecosystem affect the "feel" of programming in a given language a lot, but the comment above was specifically referring to the snippet of F# in the blog post, which could be trivially translated into OCaml.
AFAIK the main differences around the runtime are that the CLR has true concurrency and access to more libraries without using FFI, while OCaml is able to produce very tiny executables (~200kb) if you use the native compiler.
From my research, it looks like F# does not have structural typing (since it uses the .net type system). Does this mean you can't use constraints on types, like make a type with a range (0.0-1.0) or a type from an enum (man|woman|child)?
These kinds of type constraints are one of my favorite things about Haskell, and one of the reasons I'm really interested (in another language category) in Ada -- you can actually model real-life things using the type system.
F# uses nominal typing - that's another difference between it and Caml.
That said, F# will let you do the things you listed - one can be handled with traditional OOP techniques, and the other is handled by discriminated unions. Structural typing is a bit different from what you're describing - it's more a kind of compile-time duck typing.
I actually think F#'s switch to nominative typing is a big improvement over its Caml roots. When I'm trying to build out a type system that's meant to use type checks to enforce sanity, I want to have explicit control over when and where types are considered to be equivalent. Just because the types foo and bar implement members with similar signatures does not mean that they carry the same semantics.
Therefore, IMO having the language try to be clever and jump to the conclusion that anything that looks like a duck must be a duck is a hindrance to type safety. It's also a bit of a maintainability hassle: Having a formal type definition for each equivalence class makes it easier to find which types need to be modified accordingly when an equivalence class's definition changes.
I use Windows at my day job and get paid well to develop and admin for it. I also have a few personal applications that require a Windows VM at home.
But my personal machines run Linux, which I find is more pleasing for development, exploring, security, tinkering and such. Plus you can tailor your environment to suit your needs (from embedded to supercomputing), not that of a marketing manager.
I live in Northern California near Chuck Moore. He came and spoke at our office last week and showed off the new GreenArrays architecture. Afterwards we were trying to come up with things to disrupt with a super low power / massively parallel chip like that. The problem he brought up is that in products like smart phones the power usage is in the display and the radio. So there isn't much to be saved with a lower-power processor.
But... After the discussion several of us wondered about electronic ink signage. With the GreenArrays stuff you could definitely be completely solar powered.
It would also make an interesting Personal Area Network central control unit for Quantified Self stuff using harvested energy.
I would support a kickstarter for either of those, but I'm not going to pursue them myself right now.
As someone with little interest or knowledge in low level hardware, what are the advantages of this? What would you use it for?
edit: Just to clarify, by "little interest" I mean little interest in acquiring in-depth knowledge about it. I'm still interested in hearing about advances like this and their uses.
The main advantage is that it's exceptionally power-efficient. Take a look at the chart on these slides[1] for more details. It gets this efficiency despite being manufactured with relatively outdated methods--if I'm not mistaken, it has 150nm transistors!
My understanding is that it's power efficient because the stack-based design allows you to transfer very little data in and out of registers. Additionally, the chip is very simple, has very few transistors and no clock. All this helps save power. Also, the cores can power down very quickly when they're not used, meaning you only pay the energy cost of the ones you are using at the moment. Of course, you shouldn't take my word for it: I'm decidedly not a hardware person.
I've heard several compelling ideas for using these chips. They're a good fit for small, simple devices that need a long battery life: things like sensors in everyday goods and such, think "internet of things". Another possibility is using them in phones so that the main, power-hungry chip could go to sleep leaving a chip like this to monitor inputs and awaken the phone if needed.
I believe the big reason for the power-efficiency is that it's entirely clockless - the power consumption of an idle transistor is very low compared to one that's switching, and a central clock causes many, many transistors to switch every clock cycle. Being clockless is another way to only pay the power cost for silicon that's actually doing something.
Another thing I've been looking at it for is that the design makes it very easy to bit-bang protocols - if you can't get an off-the-shelf chip to speak a protocol for you, using one of these is far cheaper than sticking an FPGA in there, and is similarly cheaper (and easier to patch later) than getting a custom chip unless you're doing a truly massive run of devices.
While this is somewhat true, it becomes less relevant over time. As transistor size shrinks, leakage and static power loss becomes more and more of an issue.
AMULET was an attempt to make a commercial async chip and it flopped pretty early, one of the reasons being interfacing with anything else in the entire world kinda requires a clock.
What that chart reports is, in effect, energy per instruction executed. By that metric, the GA144 indeed does extremely well. But not all instructions are equal, and it's far from obvious what the right conversion factor is between F18A instructions and, say, ARM instructions.
My guess (based on pure industrial-grade ignorance) is that about half the gap between the GA144 and, say, the ARM Cortex-A9, would go away if you corrected for the difference in useful work done per instruction.
The GA144 still comes out looking good even if that's so. But in that chart it looks miraculous and I think that's a bit of an illusion.
The GA144 gets its efficiency from the fact that everything about the design is simple and it uses very few transistors. How many transistors in an A9? The async nature of the GA144 processors saves power if they are stalled, waiting for something to happen. But if they are in a spin loop doing timing, they are as inefficient as any other MCU. The static power is very low because they aren't using state of the art, low voltage processes with leaky transistors. With Vcc at 1.8 V the transistors can turn on and off hard.
I think it is a mistake to compare the GA144 to an ARM. They are not really designed for the same sorts of tasks. There are big ARMs for large memory apps which the GA144 is entirely unsuited to and there are small ARMs with more limited memory that have nowhere near the power of the GA144. But then the GA144 with all processors running flat out uses power like the big ARM... so where is the right comparison?
I compare the GA144 to FPGAs. The GA144 has an array of very small processors with very limited memory. But they can be viewed as very beefy logic blocks that execute sequentially rather than parallel logic. To use the GA144 for something useful you need to design your app to suit. Thinking of it as a standard CPU won't get you very far.
You won't be running a useful version of Linux on the GA144 and it is too expensive to suit toaster type apps. But it might be the right device for any number of embedded apps that need raw performance like hearing aids or other signal processing apps.
I'm not sure what the GA144 would be great at. I can find a few apps it would be good for such as a small analog I/O board I currently build and need to respin because the FPGA is EOL. But I'm not sure I would ever use the GA144 for this product because it may go EOL at any time depending on the company. They are very small and I see no sign they are getting bigger. Will they be around in 5 years, 10 years?
This is not exceptionally power efficient. It does around 90 gigaops/sec (probably 5-bit ops) and is programmable only in Forth. Parallella has a chip that does 90 gflops/watt and is programmable in C/OpenCL/R and maybe other tools. That's much more useful and power efficient.
But what about better manufacturing technologies for the F18A chip? First, it would negate some of the advantages of working asynchronously, because of the much higher leakage current. Second, the processor game has always been about software tools and ecosystems, which is the weak point of the F18A, and without solving this we won't see it on more advanced manufacturing processes.
Actual power efficiency depends on the application of the CPU. If you have a use case that mostly leaves the CPU in an idle state, only reacting to occasional events, the F18A would probably be more efficient than the Parallella chip.
You are comparing apples and oranges here. Programming the F18A directly in Forth allows for much more control over what is executed. I would say that, on the contrary, if one is ready to put the necessary effort into it, the chips can be extremely efficient. Provided you're not trying to do silly things like running an OS on it...
The manufacturing process might not be up to date, but that's precisely, I believe, one of the strengths of the chip: very low power consumption with a cheaper manufacturing process.
If you look at the datasheet, there's a control interface (I2C), various clocks and a data interface (DOUT0-7). The clock minimum is ~6 MHz, so no biggy and suited for embedded systems. Until you start to work out the details: you need to send commands, read in a hefty datastream and do something with it (like writing to storage). That means you either write out the pixels as they come in, or you buffer an entire picture (or large part) and then write it out.
The first solution means that you'll need an embedded system with a clock that is an integer multiple of the sensor's minimum clock (you need to do stuff between the moments when data comes in).
The second solution means a large amount of RAM (think on the order of megabits).
Embedded systems never scale the way you'd like them to, typically everything scales in one go. You want more internal RAM? Well, we also scaled this and that even though you won't need it, and it's 32-bit now instead of 8- or 16-bit. You want external RAM? Well here's an IC with the extra pins and an FSMC but it also has more Flash than you'll need and it comes with USB hardware you won't need. Result: more $ per unit even though you'll only need 40% of its functionality.
There's an additional problem: this type of control from a single-core system means you're going to write very dense code loops that won't be very happy about exceptions (e.g. flash write delays).
What you need in this kind of systems is a parallel approach:
- 1 core to do the I2C (very low requirement)
- 1 core to check the timing for your output stream (straightforward logic)
- several cores to read/process/filter the incoming stream (follow the datasheet)
- several cores to store everything in temporary external RAM on the fly (lockstep)
- one or more arbiters to tie everything together (the challenging part)
A 144-core system like this would be severely underemployed for this type of camera sensor. You could scale up the sensor to very high performance cameras.
So true, especially for the hobbyist. If you're big enough the chip vendor will build a custom combination for you, but everyone else has to pick from the available tiers. Luckily, if power rather than price is the main issue, you can typically clock down or disable portions of most modern microcontrollers.
The other answers are good/interesting, but did not cover the software. This hardware is optimized for Forth. Much like the C language is pretty much oriented around the available addressing modes of a classic PDP-11, this hardware is really good at Forth.
Forth is a decent embedded language because you kind of edit your source as a REPL loop and snapshot the environment, kind of? I'm talking about classical Forth, which I haven't used in a long time, not the new wave stuff as linked to today. Also, another nice thing about FORTH is that it upshifts and downshifts in abstraction level VERY fluidly. If I had to write an I2C interface to a device and I could select any language, FORTH would certainly be the easiest / most fun. So first you write a Forth word (or words) to control an I2C clock pin, and the data pin. Then you write a word that speaks very basic I2C (out goes a byte, in comes a byte) by toggling the pins appropriately. Then a word that lives on top that speaks the I2C logical protocol of addressing and bus contention and enumeration and whatever. Then a word that uses the stuff below to talk to an I2C A/D converter chip, in a generic sense. Then a word that talks to a specific A/D converter at a specific address and scales its output so decimal 1024 means +3.15 analog volts or 72.1 degrees F or something. Then...
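To make that layering concrete, here's a hedged sketch of the bottom couple of layers in generic ANS Forth. The pin words scl! and sda! ( f -- ) are hypothetical and board-specific, and real I2C also needs ACK handling, reads, and timing delays that this skips:

    \ Bottom layers of a bit-banged I2C stack (write side only).
    \ scl! and sda! ( f -- ) drive the clock and data pins; hypothetical.
    : i2c-start ( -- )   true sda!  true scl!  false sda!  false scl! ;
    : i2c-stop  ( -- )   false sda!  true scl!  true sda! ;
    : i2c-bit!  ( f -- ) sda!  true scl!  false scl! ;
    : i2c-byte! ( b -- )                      \ MSB first
        8 0 do  dup 128 and 0<>  i2c-bit!  2*  loop  drop ;

Each higher layer (addressing, the generic A/D word, the specific scaled A/D word) is just another short definition built on the ones below it.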
That brings up another interesting aspect of FORTH and this hardware, if you are familiar with the Propeller chip this is like a 144 cog prop, sorta. So that's one interesting way to structure separate blocks of code, run each block on separate hardware...
This isn't going to go over very well with the HN community's interest in scalability, but much like the Prop, if you have N cores/cogs/CPUs on chip it scales real nice up to N and then doesn't do so well at N+1. Of course you can probably organize your program to split along hardware boundaries, so unless you have 145 peripherals (or 9 for the Prop) you'd be OK. Of course that enforces a certain hardware focus and demand for a device-driver-oriented architecture as a tradeoff, so it's hardly the universal silver bullet.
I would not suggest writing an embedded webserver that serves each page or server process from a separate piece of hardware. On the other hand writing an embedded webserver that uses one dedicated core/cpu/cog per ethernet port sounds pretty reasonable, assuming you haven't already used up all your hardware.
On the other hand if you wrote a thermostat in FORTH on a multiprocessor chip like this, you'd probably have one processor do nothing but read the HEAT/COOL/OFF switch and update the internal shared memory state of the device, and spend most of its time in low power sleep mode. To say this would be easy to test and troubleshoot would be an understatement: there are only 3 states, plus maybe "switch is broken so set alert mode and default to off" or something. And another CPU that completely separately does nothing but read a temp sensor, update internal shared state periodically, and spend most of its time sleeping. Repeat until you're done. With a single processor system it depends on your chosen flavor of syntactic sugar (if any), but it would be much more of an ordered list: OK, now we look and see if a button was pressed, then read the temp sensor, then update the screen, then... A long schedule or to-do list. Or you can use syntactic sugar to program it differently, but fundamentally the processor will still do it that way, although probably a little slower.
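A hedged sketch of that one-concern-per-core structure, in generic Forth rather than GreenArrays arrayForth (everything here other than the control structures -- switch-pin@, mode!, mode@, sensor@, temp!, temp@, decide, heater!, and the sleep words -- is hypothetical):

    \ One word per core; each does exactly one job and mostly sleeps.
    \ All hardware and shared-state words below are hypothetical.
    : switch-task ( -- )  begin  switch-pin@ mode!  sleep-until-change  again ;
    : temp-task   ( -- )  begin  sensor@ temp!      1000-ms-sleep       again ;
    : logic-task  ( -- )  begin  mode@ temp@ decide heater!  1000-ms-sleep  again ;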
"That brings up another interesting aspect of FORTH and this hardware, if you are familiar with the Propeller chip this is like a 144 cog prop, sorta. So that's one interesting way to structure separate blocks of code, run each block on separate hardware..."
But on paper it appears that a cog has more power than each one of these limited cores. Plus historically the Propeller has been cheaper than most of the esoteric crazy Forth chips and implementations.
Well, I just wanted a vaguely similar hardware architecture known by more people. If price is an issue you're probably stuck with the smallest possible PIC that'll do the job, which is an interesting optimization issue. You'd use something that costs an order of magnitude more to minimize dev time, or minimize bugs/debugging time, or maximize feature set, or maybe some other reason. It fails at cost so miserably that you wouldn't be using that as a factor when optimizing.
It's important for medium-size runs. You make a couple things for a small vertical market, or custom, or R+D, and it doesn't matter; any choice is like one hour of embedded developer salary. If you're doing huge runs of a million in China you "have to" use the cheapest thing out there, and this won't be it. I bet around 10K production runs is where part cost creeps out of salary rounding errors but isn't high enough to utterly force the use of something specifically designed to be cheap.
As far as cost goes, TI and Microchip have some low power offerings where $20 for this Forth chip or $12 for a Propeller, minus the cost of a cheap low power competitor, means you could instead afford a larger battery in the budget. Or you could afford more "whatever" in another part of the design.
So, I'm actually looking right now for a microcontroller to run a bunch of real-time I/O that may involve a certain amount of fairly timing dependent bit-banging. I'm planning on building a pinball machine, and so need something that can take input from a couple of buttons, a bunch of switches and some sensors, and drive a bunch of solenoids, LEDs, and the like.
Since I will need to do a bunch of PWM or possibly bit-banged serial protocols to drive all of the LEDs, and I want to have good real-time guarantees on input from the buttons and driving the solenoids, I'm trying to figure out what microcontroller I should use; something like a Propeller, or an FPGA, or maybe one of these, or something else entirely.
Anyone have any experience with these kinds of issues? Will the GA144 be able to do some fairly timing-dependent bit-banging reliably? I notice in the docs that they say the time it takes for each instruction to execute is not fixed. How would you go about doing a timing dependent protocol on these? Is there a high-resolution timer you can poll to ensure that your transitions occur in a timely manner? Or should I bite the bullet and start learning FPGA programming?
Your solenoid response time is likely to be much, much slower than the bit-banging rate of any modern uC. A standard PIC or similar would do fine (assuming it has the I/Os you need).
For solenoids, sure, though some of them (the flipper solenoids) need to be PWMed to vary their strength; you do a full-strength pulse at the beginning to drive the flipper with a good amount of force to hit the ball, but when you hit the end of the stroke, there's a switch that tells you to switch to PWM with a low duty cycle so you avoid overheating the solenoid but can still hold the flipper up to trap the ball. So you need at least the ability to PWM that. Sure, I could probably build a little timer circuit to do that, but it would be convenient if I could just bit-bang a GPIO pin instead of building more hardware.
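For what it's worth, that pulse-then-hold pattern is simple to express if you do dedicate a core (or pin) to it. A hedged Forth-style sketch, where coil-on, coil-off, eos-switch?, button?, and us (a microsecond delay) are all hypothetical, hardware-specific words:

    \ Full power through the stroke, then a ~5% duty-cycle hold until
    \ the flipper button is released.
    : hold-cycle   ( on-us off-us -- )  swap coil-on us  coil-off us ;
    : fire-flipper ( -- )
        coil-on
        begin eos-switch? until        \ end-of-stroke switch closes
        begin button? while
          50 950 hold-cycle            \ 50 us on, 950 us off
        repeat
        coil-off ;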
For driving a few dozen RGB LEDs with PWM to control brightness on each of the three channels, you're very quickly going to reach the limits of what you can do with a standard PIC. Even if you're not driving the LEDs directly but using something like these: http://www.adafruit.com/datasheets/WS2812.pdf you need to be able to produce a fairly high precision 800 kHz serial signal with no clock.
Between the number of inputs and outputs needed (lots of switches and sensors on the playfield, lots of solenoids to kick out balls and control various toys, lots of LEDs), and the high-speed output needed to drive the LEDs, I'm not sure a standard PIC is going to cut it, though I haven't actually sat down to calculate out the precise number of I/Os and speed that I'll need so I may be wrong. I'm more looking for a ballpark of what I should get a dev board for and start playing around with, I may wind up changing my mind and switching to something else later.
For the LEDs and solenoids I'm guessing the PWM frequency is in the sub 1 kHz range, so you should be able to bit-bang perfectly well with even a low-powered uC. Timing precision for those shouldn't be very strict.
As for the WS2812, that's annoying :) It would be nice if it could use SPI or a UART. I see the NeoPixel library uses bit-banging with interrupts disabled and relies on instruction timing, yuck. But hey, it might work. I've seen single-wire stuff work on Windows CE, for crying out loud. AFAIK you don't need a continuous 800 kHz signal, you can just send it when you want to update.
Yeah, the WS2812 does have a fairly unique interface, but at 45¢ for an RGB LED with a built in constant current driver, it's pretty much the cheapest option per LED. Other options I've seen like the A6281 are $2 for just the driver itself, without the LED.
Since there are probably going to be at least a few dozen LEDs, the per LED cost can add up quickly, so I'm fine with spending a bit more on a more powerful microcontroller or FPGA and learning a new system. And of course, part of the goal is to learn new things and have fun with the project, so some of these new highly parallel microcontrollers like the GA144, Propeller, XMOS, or even an FPGA would be more interesting to work with than a standard PIC, even if they may technically be overkill.
This is the first time I've done hardware hacking in a while; the last time I did it I used a 68008 as my microcontroller. So I'm hoping to find something that will be interesting to learn, not too hard to use, and capable of everything I'm going to throw at it, so I don't have to go through several iterations of buying expensive development boards only to discover that I can't drive my dot matrix display or I can only drive half of the playfield LEDs that I need or that interrupts from the flipper buttons cause my LED color fades to flicker since they screwed up the timing of my signal to the driver or something of the sort.
One of the problems is that since there are so many choices, and I haven't done hardware hacking in a while, I don't have a good sense of what I'll be able to do yet. Maybe I should start small, with an Arduino or a PIC or something of the sort, and move up if I discover that that can't do something that I want.
Coming back into the hardware world after being away for several years, I find the ARM Cortexen pretty damn compelling for the possibility of doing the high level programming in something besides C. And an STM32F4 Discovery board is only ~$15.
(And yeah, I feel dirty for blowing up a GreenArrays thread with a suggestion for a more traditional microcontroller, especially not having really studied the GA chip.)
What you're describing seems comfortably doable with a mid-range PIC with a lot of I/Os, but it really depends on how you're looking to program this (and how complex your game logic is going to be). For example, bit banging all of the PWMs can be done from one timer interrupt, but you need to be clever so that it doesn't use too many cycles. Keep in mind at this level, C is considered a high level language.
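To make the "one timer interrupt" idea concrete, here's a hedged sketch (in generic Forth to match the rest of the thread; #channels, duty@, and pin! are hypothetical words for the channel count, a channel's current duty value, and writing a GPIO, and how you attach the word to a periodic timer interrupt is system-specific):

    \ Call pwm-tick from a periodic timer tick. Each channel's pin is held
    \ high while the shared 8-bit counter is below that channel's duty value.
    variable pwm-counter
    : pwm-tick ( -- )
        pwm-counter @ 1+ 255 and  dup pwm-counter !
        #channels 0 do
          dup i duty@ <  i pin!
        loop drop ;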
An FPGA is extreme overkill for this, and keep in mind that you still need some CPU to drive the logic (which could be on the FPGA chip itself, but I don't think that class of FPGA is reasonably priced). A CPLD is still overkill, but at least you don't need to futz with separate configuration memory. And you'll still be doing low-level programming for reconfigurable logic, just in Verilog instead of assembler.
Your level of parallelism isn't actually very high, and I suspect a large single-thread 32-bit microcontroller will do you just fine unless you're looking for an excuse to play with something else.
Your design would be made vastly simpler by using LEDs that have the PWM built in. Then your MCU only needs to write to them when the settings need to change. Adafruit also carries this type. I've tried them and they are very easy to use. The interface is essentially a SPI bus. So, can I assume you're not willing to pay the extra cost for built-in PWM?
You might look into an I/O expander chip to hook to your PIC via I2C or some other protocol. Many of them also have a few PWM outputs. 4, 8, or 16 additional GPIOs are pretty common, and higher pin count expanders are out there, but they get a bit more specialized as the counts go up.
Can you tell me a little about it, and why I should prefer it over a GA144, FPGA, or Propeller? I have a lot of options already, I'm looking to narrow them down, not expand the list to have even more to consider; but if there's a compelling reason I'll take a look.
Yes, that's what inspired me to look into the Propeller, which is an interesting architecture with 8 cores and no interrupts; instead of interrupts, you just dedicate one or more cores to polling for input, while the other cores continue on doing their thing with no interruption.
The GA144 basically looks like an even more extreme version of that idea. Instead of eight small cores, it has 144 really small cores. So now I'm trying to figure out which I want to start playing around with; a Propeller, a GA144, an FPGA, or something else.
What to do with 144 cores? I just watched a video last week about a guy doing SDR to monitor wireless tire pressure transmitters. Every car sold since 2010-ish has at least four of them, each with a unique 32 bit ID, for police tracking purposes, err, to "save the children" or whatever BS.
Anyway the complaint was there are 127 or 121 or some prime number of different algorithms from different mfgrs, so you need to run 120-something decoders to cover every protocol. This is a pain with a single-core laptop. But with a 144-core Forth chip I think giving each core precisely one protocol to decode in parallel would be fine.
If you are interested in a more "conventional" Forth chip, check out the J1[1]. It's 200 lines of verilog, so you can put it on an FPGA pretty easily.
But it's even easier than that because the J1 is the basis of the Gameduino[2]. The Gameduino is a shield for Arduino that gives you a VGA interface, sprites, and graphics.
I was at Chuck Moore's talk at Strangeloop, it was... esoteric. From what I gather, it's not really something you'd want to write a full application in - Forth came across to me as a very low-level language, much like assembler. Which makes sense given how the chip executes it natively.
Forth is actually pretty high level. You just have to get there first by building up the abstractions. You can end up with an almost Ruby/Python-like language on top of it if you want. But you probably won't want that after using Forth for a bit, and will end up writing DSLs for your programming concerns.
The use case I had for Forth was a PLC-like controller. You ended up describing the configuration in Forth by the time the abstraction was built. Example for a simple "virtual relay" controlled light switch that turns two separate circuits on and off in a ladder:
bob relays
0 switch inports
1 light1 2 light2 outports
switch inport
bob coil
ladder
bob contact
begin
light1 outport
light2 outport
logical-and
ladder
plc
Was awesome. Unfortunately I don't have rights to the source. It ran on a Z80 board.
Forth is quite interesting, and cool in its own way, but: how can anything where you have to remember what is supposed to be on the stack be 'high level'?
As I understand it, the ideal in Forth is to build up a base that hides those complexities in a domain appropriate API. You don't throw 3 numbers onto a stack and leave them on the stack to represent a vector, you wrap that up with a call to 'make-vector', and from there on you're dealing with 'vectors' as a unit rather than '3 points on the stack'. Or if you don't, you develop an API that deals with them sensibly so that every routine that operates on vectors doesn't have to get into the nitty-gritty. Write a routine that represents the common operations (add-vectors, multiply-by-scalar, dot-product, etc) and build your complex routines on those, so that vector projection could be:
: project-a-onto-b ( a b -- c ) swap rot dot-product swap scalar-mult ;
Ok, 3 stack manipulations but someone who's done more than just toy programs in RPL and Forth might be able to come up with some other routines that can hide those as well. EDIT: And once written, the user of the routine needs to know, "put the two vectors onto the stack, the first one is what I'm projecting, and the second is what I'm projecting onto".
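For example, the word set underneath might look something like this sketch, keeping 2-component vectors as two cells on the stack (every name here is illustrative, not from any particular Forth library):

    \ 2-component vectors kept as ( x y ) pairs on the stack.
    : add-vectors ( ax ay bx by -- cx cy )  rot + -rot + swap ;
    : dot-product ( ax ay bx by -- n )      rot * -rot * + ;
    : scalar-mult ( ax ay n -- bx by )      tuck * -rot * swap ;

Callers then only ever think in terms of whole vectors; the stack juggling stays buried inside a handful of small definitions.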
Another point to consider: dynamically typed languages suffer a similar 'failing' if they're to be considered 'high level'. How can they be 'high level' if you can add 1+"b"? (In code; it won't work at runtime unless the language is weakly typed -- similar to having too few items on the data stack.) How can any language be considered 'high level' if you have to remember how many parameters, their types, and in what order, for every function call?
Tragically, my HP48G+ (?? I know this model exists, may have been mine, sounds right) bit the dust a few years back (water damage, think it happened while stuff was in storage between residences). It wouldn't be particularly useful in my day-to-day stuff, but RPL is probably the main reason Forth seemed so natural to me when I played around with it a few years back.
In his talk, Chuck Moore was wearing his FiveFinger shoes on stage. That cracked me up. Absolutely no convention is followed blindly by this guy.
You call it esoteric; I call it thinking in revolutionary terms rather than evolutionary terms. I think it's brilliant. Also, that "dude" is Chuck Moore.
I saw Richard Stallman give a talk while rubbing ointment into his feet.
And when I was at high school looking around my prospective uni, I got help with most of my questions from a random bearded guy in the compsci building who was wearing socks and sandals. He turned out to be the professor specialising in programming paradigms / AI.
Sounds like the future, if only for the low power.
« 144-core asynchronous chip that needs little energy (7 pJ/inst). Idle cores use no power (100 nW). Active ones (4 mW) run fast (666 Mips), then wait for communication (idle).»
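A quick sanity check, using only the figures in that quote: an active core at 4 mW running 666 Mips comes out to

    \frac{4\ \mathrm{mW}}{666 \times 10^{6}\ \mathrm{inst/s}} \approx 6\ \mathrm{pJ/inst}

which is in the same ballpark as the quoted 7 pJ per instruction.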
You're right -- it sounds like the future for several reasons. I can't wait to get my hands on an eval board.
Why is it that if I have to list the three or four people doing the most to break new ground in our industry, they're all over 60? Chuck Moore is 75 and he's doing this.
I would love to see this video. The only link I could find [1], unfortunately, was behind an auth wall restricted only to attendees of the event. Does anyone know where it might be found?
Okay, so how did he design the physical chip? Physical design starts with some kind of RTL, usually. Unless he mapped out the chip on a transistor level by hand, he used some kind of software to aid him in the design. If he didn't use Verilog/VHDL, I'd like to see whatever logic-level design he did make.
He did design the physical chip, gates and all. He wrote the design software himself (supposedly 500 lines of Forth called OKAD). He uses a simpler model that is accurate enough for the process he is using.
Everything he does is fascinating, but don't expect any of it to be useful unless you are, in fact, Chuck Moore (perhaps Chuck Norris also gets a pass...)
I've had two separate people tell me about Forth and GreenArrays this past week. My interest is piqued. Someone please point me in the direction of a good introduction to the language and its merits.
[1]: http://hackage.haskell.org/package/array-forth, https://github.com/TikhonJelvis/array-forth
[2]: http://cs.stanford.edu/people/eschkufz/research/asplos291-sc...
[3]: http://www.forth.org/svfig/kk/11-2012-Bodik.pdf
[4]: https://github.com/TikhonJelvis/array-forth-sketch, https://bitbucket.org/gatoatigrado/sketch-frontend/wiki/Home