Related, I feel we need to figure out a way to systematize information in scientific papers to make mining them for cross-cutting insights possible. I suspect there are lots of discoveries hiding in plain sight, if one knew which particular papers from which disciplines touch on the same underlying phenomenon or concept.
This is an interesting problem; unfortunately, it's really, really hard.
I'm most familiar with mathematics, so I'll use it as an example, but this is not limited to mathematics.
If you take any new paper on research mathematics, in a hot field like algebraic geometry or partial differential equations, then unless you're an expert in that field, it will almost always be literally impossible for you to understand -- not merely hard to follow the arguments, but impossible to understand even what it's about. Look, I just grabbed a random example from recent posts on arXiv[1]: try reading the abstract and explaining it back to me. For 99.99% of English speakers, this will be indistinguishable from random gibberish in a paper written by a recurrent neural network trained on arXiv papers.
However, 0.01% of people will understand something, and for probably 1% of these, the abstract will make perfect sense. But if you ask these people to explain it, you have two options. Either you spend an hour or two getting a very superficial understanding of what's at stake, which won't be very useful to you -- you still won't be able to actually read and follow the paper, or use its insights for your own purposes. Or, if you're intelligent enough, they can spend a year or two teaching you the required background. Then you can see the insight for yourself.
The problem here is that you need literally years of background studies to appreciate the insight. There likely is no quick and easy way around it, otherwise some of the extremely smart people involved would already have figured it out -- assuming otherwise is hubris. This doesn't mean that the system cannot be improved upon: there's tons of ways to make things simpler, clearer, more digestible. However, you'll still be left with hard problems of hard things being hard.
OK, let's check your numbers. There are about 1.5 billion people who speak English, and you are telling us that only .0001 * 1.5 billion = 150,000 of them will get anything out of the abstract? And that only .01 * 150,000 = 1,500 of those will find it to make perfect sense? That doesn't seem right.
Abstract. Using elliptically fibered Kummer surfaces of Picard rank 17, we construct an explicit model for a three-parameter bielliptic plane genus-three curve whose associated Prym variety is two-isogenous to the Jacobian variety of the general three-parameter hyperelliptic genus-two curve in Rosenhain normal form. Our model provides explicit expressions for all coefficients in terms of modular forms.
Oh. Oh my. Checks list of Sokal Squared spoof papers, nope. "Indistinguishable from random gibberish in a paper written by a recurrent neural network" it is then. Rather than being conservative, now I see that you were wildly optimistic in your estimate. There can't possibly be over 1,000 people in the world to whom this would make perfect sense, can there?
IMO it can be both "hard" and "really, really hard", depending on the paper's difficulty. Some papers are much easier to read - e.g. those describing experimental results. Often, being in the 0.01% who understand something is enough - it's one thing to read a paper, understand the basic reasoning and trust the conclusion / proposed method; it's another, more difficult thing to fully comprehend the paper and verify its correctness. Often you don't need that full comprehension to make progress and put the paper to good use.
> The problem here is that you need literally years of background studies to appreciate the insight. There likely is no quick and easy way around it, otherwise some of the extremely smart people involved would already have figured it out -- assuming otherwise is hubris.
That's fair; something of a scientific version of the efficient market hypothesis, I guess. If following the bleeding edge of a scientific domain didn't require years of background studies, it would be easy, so scientists would quickly zoom through it, until the going got difficult again.
> This doesn't mean that the system cannot be improved upon: there's tons of ways to make things simpler, clearer, more digestible. However, you'll still be left with hard problems of hard things being hard.
Yeah, I was just thinking about ways to tackle the things that can be made "simpler, clearer, more digestible". I don't deny that there are fundamentally hard problems we have to face directly, but right now, those problems are wrapped in a lot of irrelevant cruft that makes them bigger than they really are.
> try reading an abstract and explaining it back to me
Challenge accepted, though I know the result kind of reinforces your point. But here's what I understood from that abstract:
> Using elliptically fibered Kummer surfaces of Picard rank 17, we construct an explicit model for a three-parameter bielliptic plane genus-three curve whose associated Prym variety is two-isogenous to the Jacobian variety of the general three-parameter hyperelliptic genus-two curve in Rosenhain normal form. Our model provides explicit expressions for all coefficients in terms of modular forms.
We took a particular weird abstract shape with interesting properties, and used it to describe a particular different weird abstract shape, whose properties are important to us. Abstract shapes can be written down as maths, and depending on the way you write it, they can have properties exposed directly as "knobs" to tweak - e.g. "circle of radius r" has a radius exposed directly, whereas "circle that fits in that place" hasn't. In this paper, our description of the weird abstract shape has its important knobs exposed.
[1] - https://arxiv.org/pdf/1901.09846.pdf