I agree about the product; perhaps they were acquired for the people rather than the actual IP. Would make sense from that perspective for HF, which has leaned strongly in the text direction, if they want to expand in the AI space.
The text makes it fairly clear that there is one particular line of code that each slightly longer example is showing off. The one line of interest is highlighted at the top of each example.
> NB. If it's a graph: write out the edge list (etc.) .
I don't understand what issue you are referring to.
For a dense network, each pair of adjacent layers forms a complete bipartite graph. In other words, edges are all pairs with one node in layer N and another in layer N+1.
CNNs and RNNs take a little more work, but it's still easy to describe the graph structure.
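As a sketch of that claim, here's how you might enumerate the complete bipartite edge list between two adjacent dense layers (layer sizes and node labels are made up for illustration):

```python
# Sketch: the complete bipartite edge list between two adjacent dense layers.
# Layer sizes and node names are invented for illustration.
from itertools import product

def dense_layer_edges(layer_n_size, layer_n1_size):
    """Return every (source, target) pair between layer N and layer N+1."""
    sources = [f"n{i}" for i in range(layer_n_size)]
    targets = [f"m{j}" for j in range(layer_n1_size)]
    return list(product(sources, targets))

edges = dense_layer_edges(3, 2)
print(len(edges))  # 3 * 2 = 6 edges: every cross-layer pair exists
```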
I think op means that a graph is not sufficient to describe a NN. If a layer is Y = XB, then you draw Y as a set of nodes, with each individual weight b_ij as an edge weight from the nodes of X. Right.
But can you describe things like concat, max-pooling, attention etc. without changing the meaning of the edges?
Or do you have to annotate edges to now mean "apply function here"? If so, op probably wants to say that you are describing more than a graph. There's a graph there, but you need more, you need elaborate descriptions of what edges do. In that case, op could be correct to say that technically, NN are not graphs.
Or, perhaps NN can generally be represented by vertices and edge lists. It certainly isn't the usual way to draw them, though.
> if your definition of "mediocre" is "from a non-top-20 school", that definition is absurd.
My guess is that they are talking about a program that is Top-20 in a particular field, or even a large sub-field, rather than the overall university ranking. This can be quite different from an overall ranking. For example, SUNY Stony Brook and the University of Minnesota are powerhouses in topology and combinatorics, respectively, but certainly not amongst the most prestigious in other fields.
With the qualifier of "within a field", this comment is far from absurd. The exact cut-off, of course, differs from field to field. In Near Eastern Languages, don't waste your time outside the Top-3. In CS, even graduates from programs in the Top-50 have a chance at a faculty job.
As somewhat alluded to in the article, there is a similar movement within the study of Classical Greek. In some ways this is even more interesting for classical Greek epics because there is good evidence that they were traditionally sung. There are a number of interesting recordings of the Iliad and Odyssey in the original Greek [1].
The opening of this essay is a beautifully written and surprisingly timely reflection on the importance of not judging the worth of time spent on certain pursuits purely on their direct, material utility.
I was hoping the rest would build on the deeper importance of fields such as art, music, and literature at the level of both the individual and society as a whole. Unfortunately, the rest is a bit underwhelming: it is mostly examples showing how particular pursuits in mathematics and theoretical physics eventually led to practical applications in later generations. Interesting, of course, but still focused on raw 'utility', just one step further removed.
I agree with a lot of the points raised here. I think many of the problems with spreadsheets are due to the software rather than the users. As mentioned in the article, it's hard to slowly iterate from a small, manageable spreadsheet to a larger software solution.
For example, Excel would be a lot more usable and maintainable for me if there was a way to make a special "data sheet" in which data types are forced to be consistent within columns and there was a concept of column names. Still GUI-based and user-friendly. That would encourage a logical separation between data entry, data output, and computations. In my experience, the main challenge of helping users with spreadsheets is when they create spaghetti code that mixes data and computation together.
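Outside Excel, that "typed data sheet" idea is easy to prototype. Here's a minimal sketch in Python (the column names, types, and class are all invented for illustration) that rejects entries whose type doesn't match the column:

```python
# Minimal sketch of a "data sheet" that enforces one type per named column.
# The schema and column names are illustrative, not from any real workbook.

class DataSheet:
    def __init__(self, schema):
        # schema maps column name -> required type, e.g. {"price": float}
        self.schema = schema
        self.rows = []

    def append(self, row):
        for col, required in self.schema.items():
            if not isinstance(row[col], required):
                raise TypeError(f"{col!r} must be {required.__name__}")
        self.rows.append(row)

sheet = DataSheet({"item": str, "price": float})
sheet.append({"item": "widget", "price": 9.99})        # accepted
try:
    sheet.append({"item": "gadget", "price": "n/a"})   # rejected
except TypeError as e:
    print(e)
```

The point is just that the constraint is per-column, not per-cell, which is exactly what plain Excel sheets don't give you.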
Tables are probably the most overlooked feature of Excel.
Why use tables?
* Each column is uniquely named - no more wondering if you are referencing the right cell, no more thinking about "to $ or not to $"
* The table's rows and columns are reliably discovered by pivot tables - no more wondering if the entire dataset is referenced by the pivot
* New columns that are formulas are automatically applied to every row
* Tables have names, so it is easy to understand which table a pivot is referencing
The true, reliable, and sane power of Excel lies in Tables + Pivots + Charts. If you drive most of the problem solving into those paradigms you will keep your hair!
Most people have no clue these features exist, or that Excel even has advanced data modeling that supports all sorts of goodies including strict types and joins/merges/appends. It's not as accessible as just basic sheets, but I've had no problem crunching millions of rows of data in Excel with sub-second response times.
Excel 2013 and later has a columnar database capable of handling millions of rows, but not through the standard sheets interface so it loses a lot of the utility people are used to.
Yes, although you can have multiple sheets each with a million rows. Kind of like sharding.
But I’d be fascinated to know what kind of system spec is required to get good performance at those kinds of numbers. I’ve been on a Mac for years, where Excel is crippled by limitations.
Tables have that all-too-common symptom of something that makes easy things even easier, but hard things way harder. Try making a table with a formula that involves more than the single row in which it's placed. What about tables with multi-row headers (e.g. title and units)? Section breaks in them? Merged parent data? I find I either want standard Excel layout, or a database. Tables don't really sit "between" those two; they're just their own extremely over-simplified universe that doesn't play nicely with anything else.
But hey, to each their own. Good on ya if Excel tables are what you need.
This, combined with judicious use of named ranges, makes for much more pleasant formulas. Seeing `tax_rate` in the formula instead of `$A$7` is well worth the extra clicks.
I usually end up assigning names to nearly everything: single-cell constants, user input fields, computed lists, etc.
About tables, do they work when you automate the data entry part? For example, copy from a spreadsheet into your spreadsheet with the first macro, and then transform the data with a second? That's 90% of my use case for Excel. (I know that doing this in Python/whatever would be "better", but I have to distribute this to users, and the only thing they have on their computer is Excel, and they're gonna manipulate the data afterwards in Excel anyway.)
I think Libreoffice Calc supports python integration. Maybe that way one could have the best of both worlds. Unfortunately I suppose most users are stuck in the MS garden.
On the one hand, it is great for doing a quick and dirty analysis with data not in a DB, on the other, it mangles data and translates the keywords/function names.
> For example, Excel would be a lot more usable and maintainable for me if there was a way to make a special "data sheet" in which data types are forced to be consistent within columns and there was a concept of column names.
- column naming: https://smallbusiness.chron.com/give-name-columns-excel-7344... (you can name individual cells and ranges too. Which means you can actually start to write formulas that look more like C with named variables, very loosely speaking of course)
There's a surprising amount of hidden functionality in Excel. Personally I think that while the ribbon bar might have made core features a lot easier for some, it's made a lot of the more advanced tools harder to discover.
I've had similar thoughts in the past. Databases are very machine-friendly, but too static and inflexible to match the usability of spreadsheets. Spreadsheets are extremely user-friendly, but too inconsistent and unconstrained to be efficient for programmatic access.
It seems like there should be a way to combine the two. Maybe a minimal set of optional constraints (like a separation between data and code) like you proposed would be a good starting point. Make tables a first-class citizen backed by an embedded SQLite database (or something similar); let users write real SQL to query tables in formulas, maybe update the file format a bit to make it easier for programs to parse and access concurrently. Could be an interesting project...
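To make that concrete, here's a minimal sketch of the idea using Python's built-in sqlite3 module. The table, columns, and the imagined `=SQL(...)` formula are all hypothetical; the point is just that a "table" could be a real relation queried with real SQL:

```python
# Sketch: a spreadsheet "table" backed by an embedded SQLite database,
# queried with real SQL. Table and column names are invented for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 100.0), ("west", 250.0), ("east", 50.0)])

# A hypothetical cell formula like
#   =SQL("SELECT SUM(amount) FROM sales WHERE region='east'")
# could resolve to:
total = con.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("east",)
).fetchone()[0]
print(total)  # 150.0
```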
Honestly, what's missing is a GUI builder for Postgres that non-programmers can use.
Others have mentioned Filemaker and Access, and I think that's exactly right - non-programmers can understand datatypes, that's not the issue. The issue is a UI they can use and (more importantly) iterate on themselves.
One of the major strengths of spreadsheets is "touchability" - your stuff is right there. Psql is the opposite - nothing is visible without the right incantation, and non-programmers can't do much about that.
This article is dead on in its analysis/criticism/insight: devs are called in when it's a sinking ship and don't see all the cargo the ship has hauled.
Why hasn't microsoft or someone taken the basic spreadsheet model to a shared-database scalable one? The UI is basically set at this point.
A naive schema (filename, tab, x, y, value) is what an Excel sheet is. It's not like we are dealing with an "impedance mismatch", and even shared editing can be reasonably handled with database transactions (or Raft if you want to get really big/distributed).
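That naive schema really is about five lines of SQL. Here's a sketch with SQLite (file and tab names are made up), including a cell getter/setter:

```python
# Sketch of the naive (filename, tab, x, y, value) cell store described
# above, backed by SQLite. File and tab names are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE cells (
        filename TEXT, tab TEXT, x INTEGER, y INTEGER, value TEXT,
        PRIMARY KEY (filename, tab, x, y)
    )
""")

def set_cell(filename, tab, x, y, value):
    # Upsert one cell; the primary key makes repeated writes overwrite.
    con.execute("INSERT OR REPLACE INTO cells VALUES (?, ?, ?, ?, ?)",
                (filename, tab, x, y, value))

def get_cell(filename, tab, x, y):
    row = con.execute(
        "SELECT value FROM cells WHERE filename=? AND tab=? AND x=? AND y=?",
        (filename, tab, x, y)).fetchone()
    return row[0] if row else None  # empty cells are simply absent rows

set_cell("budget.xlsx", "Q1", 0, 0, "42")
print(get_cell("budget.xlsx", "Q1", 0, 0))  # "42"
```

Wrapping each batch of edits in a transaction is then the database's problem, not the application's.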
I think the lack of this is a sign of Microsoft Office completely owning the space and not wanting to innovate at all. And the huge amount of effort it would take to replicate excel-level operations in a database application server is nontrivial.
But man, you could have an API for doing excel operations against a database schema, and export to excel...
And as you said, you could do lots of schema based options in databases that aren't natural to excel.
How would you put guardrails on the spreadsheets to make sure they do not accidentally damage database performance through bad calls? In the hierarchy of skills, SQL is significantly less prevalent than spreadsheets.
Excel data is usually on a scale that DBs have no problem with, even doing full table scans. I am assuming something that starts life as a spreadsheet, not a database-first design where the sheet is initialized via a query.
Perhaps you could come up with a list of commonly-used functions and write optimized SQL functions for them, such that users start off learning with that API and gradually learn SQL when they're more comfortable.
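SQLite already supports this pattern directly: you can register a Python function and call it from SQL by name. Here's a sketch exposing an IFERROR-style helper (the function body is my own approximation of Excel's IFERROR, treating NULL as the "error" case):

```python
# Sketch: exposing a spreadsheet-style helper as a custom SQL function in
# SQLite, so users can write familiar formulas before learning full SQL.
import sqlite3

con = sqlite3.connect(":memory:")

def iferror(value, fallback):
    # Rough analogue of Excel's IFERROR: fall back when the value is NULL.
    return fallback if value is None else value

con.create_function("IFERROR", 2, iferror)

con.execute("CREATE TABLE t (a REAL)")
con.execute("INSERT INTO t VALUES (NULL), (3.5)")
rows = [r[0] for r in con.execute("SELECT IFERROR(a, 0.0) FROM t")]
print(rows)  # [0.0, 3.5]
```

A small library of such functions would give spreadsheet users a gentle on-ramp into real queries.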
There is so much Excel voodoo involved in doing things with that software. I still believe that the learning curve of Excel is no harder than the learning curve of doing the same exact thing in R or Python, plus you'd end up having the data and the formulas in different places, which brings loads of benefits (for instance, git). People are just familiar with Excel because that's what they've used to make a chart in science class in 7th grade since 1995, but they really could have learned to make the same chart in 7th grade with a language like Python too, if only Python were taught in school instead of Excel. And then we'd have a generation of workers fluent in a language like Python rather than fluent in the far more limited Excelese, and we would no doubt reap the benefits in our GDP. It's like we are limiting the knowledge of fire among our tribe when we don't really have to; it's perfectly learnable.
Return on investment, essentially a zero barrier to entry, and immediate gratification, and it's wildly flexible.
It takes about 10 minutes to learn the basics of a spreadsheet, and what you get back is immense.
You couldn't get Python up and running in 10 minutes; then you need to learn the language, the syntax, and how to structure a program. Frankly, I'm tired just starting to type out what needs to happen even before you can do *ANYTHING*.
Some people can get YEARS of productivity from those first minutes with a spreadsheet. The return on that initial 10-minute investment justifies spending more time to learn the more esoteric aspects of a spreadsheet, but even then, most people don't want to be bothered with learning how to "program".
Why do so many people start playing guitar/piano/drums/etc, but so few finish? Because it requires a significant up-front investment, offers no immediate gratification, and gives a long, slow return on that investment. Learning music is somewhat flexible, but you need to be highly skilled in order to exploit the flexibility.
Python is preinstalled on Macs at least; that's a decent chunk of personal computers. You have to pay for Excel. And you don't need to bother with virtual environments to fiddle with a flat file.
I think an important part of the battle for attention is that excel is visible as a GUI that inexperienced users can open from an icon and see and manipulate. A simple GUI utility with Python input in one pane and data input in the other might put them on more equal footing.
I was going to say that I have a small, very limited amount of experience with MS Access. I agree; the middle ground between Excel and databases sounds very similar to Access.
We had been using Excel for org charts like most places. We wanted to add grouping and metadata, and to be able to neatly extend/add information with or without constraints. This was long before I knew anything database-related, and Access seemed to be more powerful than Excel. Having generated forms that fit the data model was much more user-friendly and a lot less error-prone than adding a new row to an Excel sheet.
People criticize Access for being a dumb database or an Excel with too much heavy lifting. It occupies a specific space as a DB on rails.
I was mostly being glib, but I've had some bad experiences with Access over the years. I actually think lightweight database and scripting utilities in Excel could be good, but it would be susceptible to some of the design traps that Access fell into.
What are the biggest design traps that Access fell into? If someone were rebuilding spreadsheets with more structured design like Access, what should they avoid?
I think Airtable is starting to get us there, but it's a long way to go.
Something that combined Airtable, excel, and maybe a more userfriendly (and more restricted) version of darklang for defining formulae could be really slick.
Airtable is pretty, but lacks simple functionality that spreadsheets have had for 30 years (like aggregates across rows). Airtable is more of a replacement for Access than Excel.
Airtable is a great example of what I would like to see. Basically, the ability to create Airtable-like sheets that users can reference as usual in other general-purpose Excel sheets.
Excel has, IIRC, had data validation longer than it has had tables (even if you count the time before Excel 2007, when tables were called lists), and it's had both for >20 years.
I'm not sure about type enforcement, but there is the concept of "named ranges" you can apply to columns. So instead of C1:C99 you can refer to PRICES.
This variant (better known as the epsilon variant) is no longer even classified as a variant of concern by the WHO [1]. That is likely because it has been outcompeted by the delta variant.