Why is there no semantic ontology of sentiment in academic citations? (shkspr.mobi)
115 points by edent on July 10, 2022 | 86 comments



> Perhaps we need an army of (paid?) dogsbodies to manually go through every paper ever published and assess the sentiment behind each citation?

This would be very difficult to do. In many fields, papers are written on the assumption that the reader is really quite familiar with the topic, so the potential recruits to this army would be thin on the ground. They are busy doing other things, doing research and writing papers of their own. Unless there is a new funding agency that would hire this army, I see no way to make progress on this. And the researchers in the field would be very unhappy to see their grants cut by a factor of 2, just to fund this army.

I like linkages as much as anybody else, but we use natural language in papers for a reason: the message is complicated. In my own field, I might read many dozens of papers before I found one that said another paper had been disproved, for example. And -- to raise another topic -- it's very hard to publish a paper saying that another result had been confirmed, since reviewers would reject it on the basis that we already knew the result.

Besides, in many fields, citations are provided just for completeness. "Previous studies have dealt with -TOPIC- (-CITATION_1-, -CITATION_2-, ...)" is quite a common construct in the natural sciences, for example. Exactly what information can be gleaned from this is uncertain to me, beyond "-CITATION_x- CITES -CITATION_1-", "-CITATION_x- CITES -CITATION_2-" and so forth. And this citation linkage is already available.


There's been something kind of like that in the field of law in the US for around 150 years.

When you were doing legal research on a topic and found a case or statute that you wanted to use, you could look it up in a multi-volume book set called "Shepard's Citations". You would look up the case or statute in Shepard's (which people called "Shepardizing") and Shepard's would list almost all the other cases and statutes that cited the one you were interested in, and for each would tell you whether the later case or statute followed yours or overruled it.

Shepard's is still around, now online, though the print edition also survives.

Shepard's deals with the problem of finding out what happened to a case or statute. Another problem in legal research which was also solved with multi-volume sets of books is finding cases and statutes of interest in the first place. There were books that organized all of the law into a large topic outline, assigning codes to each topic, and books that published cases and statutes with notes summarizing what topics were covered and what the case or statute said about them, and books that indexed those books by topic.

You could go into a law library to research a topic, use those indexes to find relevant cases, use the summaries to see if it is likely a case you need to look deeper into, and Shepardize it to figure out if it is still good.

It was essentially a database system, implemented entirely in books.


> Besides, in many fields, citations are provided just for completeness. "Previous studies have dealt with -TOPIC- (-CITATION_1-, -CITATION_2-, ...)" is quite a common construct in the natural sciences, for example. Exactly what information can be gleaned from this is uncertain to me, beyond "-CITATION_x- CITES -CITATION_1-", "-CITATION_x- CITES -CITATION_2-" and so forth. And this citation linkage is already available.

I published one paper before leaving academia. It was published in a top journal in the field and has about a dozen citations now (I get a notification email each time a new citation pops up). Every single one of them is this kind of citation-in-passing. Not a single one actually engaged with the content of the paper.

I have a couple other half-finished papers that I was working on before I quit, and sometimes I think about going back and polishing them up for publication, but given the reception my previous paper received it's hard to justify the time. It's not like I need papers for tenure now.


As an academic, I find that positive or negative sentiment isn't expressed in citations — it's expressed in the review process before the paper is published. Once a paper is published, academics assume that it's already been vetted and is good, and the citations rarely bother to express agreement or disagreement.

When I cite a paper, I don't include a sentence saying whether the paper is good; I say why it is good, because the author of that paper is likely to be reviewing my paper, and I need to acknowledge his strengths before differentiating my work, or he'll kill my paper in the review process.

My advisor taught me that the point of a citation is not to give your opinion on which other papers are good or bad; it's to tell the editors which other scientists should be on your paper's review committee. The way editors choose reviewers is to look through the citations, find people cited who they know, and then ask those people to review the paper.

Thus, your goal when citing other work is to (1) highlight potential reviewers who would see your work favorably, and (2) say something positive about them to put them in a mood to see your work favorably.

There is nary an incentive to express negative sentiment in citations.


While the parent is right to stress the importance of the review phase, which acts as a filter deciding whether we ever see a candidate paper at all, it is indeed the case, as the OP posits, that there is merit in extracting the context in which a citation is used. This has been termed "citation polarity".

For example, 'Chomsky (1969) was entirely wrong when he said that “it must be recognized that the notion of 'probability of a sentence' is an entirely useless one, under any known interpretation of this term".'

There are Natural Language Processing methods and tools to extract the citation polarity. Researchers like S. Teufel have used citation polarity and other methods to analyze publications ("rhetorical zoning") and to create maps of the published literature: https://www.cl.cam.ac.uk/~sht25/az.html

A scientific literature search engine (Semantic Scholar, CiteSeerX etc. - sadly Microsoft Academic has recently been discontinued) can benefit from such knowledge; for example, PageRank can be modified so as to incorporate citation polarity in the random walker model that underlies it. Think of it as adding a "dislike" button to a system that already offers a graph with "like" button relations.
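A rough sketch of what that could look like, purely illustrative (the edge labels, damping factor and penalty weight are all made up; this isn't any real engine's implementation): compute an ordinary power-iteration PageRank over the supporting/neutral citations, then subtract a scaled score propagated over the disputing citations.

    # Toy sketch: polarity-aware PageRank over a citation graph.
    # Edges are (citing, cited, polarity) with polarity in {+1, 0, -1}.
    # All names and weights here are illustrative, not any real system's API.

    def pagerank(edges, damping=0.85, iters=50):
        nodes = {n for e in edges for n in e[:2]}
        out = {n: [] for n in nodes}
        for src, dst in edges:
            out[src].append(dst)
        rank = {n: 1.0 / len(nodes) for n in nodes}
        for _ in range(iters):
            new = {n: (1 - damping) / len(nodes) for n in nodes}
            for src in nodes:
                targets = out[src] or list(nodes)   # dangling nodes spread evenly
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
            rank = new
        return rank

    def polarity_pagerank(citations, penalty=0.5):
        positive = [(a, b) for a, b, pol in citations if pol >= 0]
        negative = [(a, b) for a, b, pol in citations if pol < 0]
        pos_rank = pagerank(positive)
        neg_rank = pagerank(negative) if negative else {}
        return {n: pos_rank.get(n, 0) - penalty * neg_rank.get(n, 0)
                for n in set(pos_rank) | set(neg_rank)}

    citations = [("A", "B", 1), ("C", "B", 0), ("D", "B", -1), ("D", "C", 1)]
    print(polarity_pagerank(citations))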


The Microsoft Academic dataset has been picked up by OpenAlex.org


Citation rings are a huge problem, but couldn't the sentiment-analysis side be readily addressed? There's a difference between papers that just cite a paper without judgment (+0), papers whose results explicitly support another paper (+1) and papers that explicitly reject another paper (-1). Such an AI could also bypass "citation analysis" and instead ingest the entire scientific literature, and find refutations in papers that don't even cite each other.
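Just to make the three buckets concrete, here is a toy cue-phrase baseline, with the cue lists invented purely for illustration (nothing like what a real system would need):

    # Toy citation-polarity tagger: label a citation context sentence
    # as +1 (supporting), -1 (disputing) or 0 (neutral mention).
    # The cue lists are invented for illustration only.

    SUPPORT_CUES = ["consistent with", "confirms", "in agreement with",
                    "replicates", "supports the finding"]
    DISPUTE_CUES = ["in contrast to", "fails to replicate", "contrary to",
                    "could not reproduce", "refutes", "disagrees with"]

    def citation_polarity(sentence: str) -> int:
        text = sentence.lower()
        if any(cue in text for cue in DISPUTE_CUES):
            return -1
        if any(cue in text for cue in SUPPORT_CUES):
            return +1
        return 0

    examples = [
        "Our results are consistent with Smith (2015).",
        "Contrary to Smith (2015), we find no effect of X on Y.",
        "The effect of X on Y has been studied before (Smith 2015; Jones 2017).",
    ]
    for s in examples:
        print(citation_polarity(s), s)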

In practice, 2 problems: 1) this requires sophisticated AI; and 2) many people will keep assuming that high citations = credible (i.e. assume that +0 and +1 are equivalent).

But this method will help researchers go beyond "it's highly cited, therefore I trust it"; it'll help students and rigorous scholars find out how well supported a paper is, despite its popularity. And it'll surface refutations that may otherwise be hard to find, for example when a scholar who's famous in one field, and unknown in another, publishes a paper in the latter field that may not be widely seen.

Ultimately (though this is far off: theoretically possible but highly difficult technically) we can also imagine AIs that ingest the whole literature, gain deep cross-domain knowledge, and are trained to detect poor methodologies, automatically highlight new insights in one discipline that can enrich another, or (ideally) make their own judgments about the merits of any finding based on the knowledge they have acquired. After human review, this could systematically help "clean up" the scientific literature of bad methodologies and ideas, and free scientists to spend less time on scientific dead ends.

That last paragraph won't be doable for a while, but the first three seem readily within reach of today's AI technologies (and seem to be what scite is doing).


I disagree that that is the sole (or even most important) point of a citation. Most of the time you can view a citation as part of the (extended, implicit) problem statement for the paper, e.g. here is what X did in this space, and we are different/better/build on that work by doing Y. If an important citation is missing, it shows you aren't familiar with what has been done before in relation to the problem you are trying to solve (bad). Of course reviewers and readers are free to disagree with your characterization of a cited paper in relation to your own.

Having said that, maybe if you publish in journals a lot this is more of an issue. In conferences the Program Committee is usually set in advance so there is less scope to target specific reviewers (other than those on the PC).


>My advisor taught me that the point of a citation is not to give your opinion on which other papers are good or bad; it's to tell the editors which other scientists should be on your paper's review committee

As a professional consumer of academic papers, paying commercial rates for decades, I appreciate your candor profoundly. Nevertheless, what you say is at the very least disingenuous and, in my opinion, a prima facie admission of a defrauded system.


That said, I have hope we can solve this problem outside the academic incentive system. Here's a relevant prototype developed by my research lab: https://braid.news/about


Thanks. I’ve wanted something similar to rank HN comments - I really want to read some users and I really don’t care to read the same tripe repeated by some other users.

I presume it is a hobby project but calling it PeeryView and then using braid.news was confusing to me. I didn’t even realise the “home” link would take me to the site (or maybe I didn’t see the home link, because I look to top left for that? e.g. Y logo on HN)


This is a very cynical and opportunistic way to do science. It may be the winning strategy to play the academic game and advance your career, though.


This would be interesting indeed. But it wouldn't be enough to really appreciate a paper's value. Oftentimes a paper is cited because it is related "prior work", even if it's crap. This can happen because a reviewer asks for it. Sometimes the reviewer is the crap's author, for instance. Yes, many things are broken about bibliometrics and scientific publishing in general.


Absolutely. Peer review causes peer pressure.

Ofc, it depends heavily on how big one's 'niche' is, but...


There already is a project trying to use NLP to detect the sentiment of scientific citations: https://scite.ai/


This seems to be very hard to pull off, though, given the way people regularly cite work in some scientific domains. For instance, unless specifically comparing to an approach, researchers in ML rarely endorse or disavow a specific approach (just my observation in graph learning).


Yes, using NLP (deep learning) techniques makes more sense than requiring every author to annotate their work.


Running some superficial "sentiment analysis" neural net over scholarly work and basing any sort of analysis on that is a terrible idea for obvious reasons.

It is already a terrible idea for content moderation / sorting reviews / etc, and the stakes are lower.


Don't understand why this keeps being downvoted. NLP is still in its infancy when it comes to detecting nuances of human communication. These are valid objections.


Moreover, scientific citations are amongst the worst candidates for sentiment analysis since authors often express no sentiment about cited papers at all, and often prefer to use dry, neutral tones when describing a paper's shortcomings.

You might need both general intelligence and a deep knowledge of the field to conclude that an author noting that X discovered Y under the assumption of Z is not giving a neutral description but starkly emphasizing their differences, or possibly even straw-manning the original paper.


the goal is ease of use though, not serious analysis


I agree, Natural Language Processing is the right tool and field for this. What is the purpose of language if you, as an author, must add an extra layer of metadata to it to convey meaning? Or maybe academic authors should just bite the bullet and start using smileys to convey sentiment. Those would be easy to work with.


I found the title nearly incomprehensible, so here's a brief pull quote:

> Using Google Scholar (or any other knowledge graph) I can find just about any academic paper. More importantly, it lets me see every paper which references that paper.

> Great! I can see if something has been cited lots of times, or very few times. That gives me a weak signal about its "importance".

> But it tells me nothing about the sentiment of those citations.

> Suppose I've just read (Smith, 2015) and I want to know whether the consensus is that the paper is a work of genius or absolute horseshit. What are my options?


Read the paper; ask someone else who read the paper :-)

Your best bet would be to find a review paper that cites Smith (2015) and see how they interpret it and how Smith (2015)'s conclusions jibe with the rest of the review. A few review series even use little icons to indicate "important" and "crucial" papers. Even there, you've got to be careful that you haven't fallen into some weird fringe-y subfield though.

Raw bibliometrics aren't totally useless--I'd bet most bad work is ignored, rather than refuted--but I'm really skeptical that anyone can automatically put a valence onto most citations.


The nature of citations is often much more ambiguous, though, and assigning sentiment will often be unhelpful.

Background citations are often of the flavor of: "These guys did almost exactly the same thing as us, but due to the minor detail of wanting to get our own work published, we think their work is inferior."

Or, "This cited work had a great idea, but due to rush to publish they left out a bunch of details and took a few short cuts on the implementation. We've fixed some of these holes, and called the original authors dolts to elevate the importance of our own incremental improvements."


This is correct; however, it does not eliminate the value of citation polarity. Citation polarity must be seen relative to the rhetorical zone type in which it occurs: the parent's examples are valid, but typically limited to the BACKGROUND zone type ( https://www.cl.cam.ac.uk/~sht25/thesis/t.pdf ), where related work is described and often differentiated from one's own work.


Here's the latest research (including downloadable data if you want to play) about citation polarity: http://jurgens.people.si.umich.edu/citation-function/


Yah.... I think this kind of semantic ambiguity and complexity points to 90% of the information content of the citation being the existence of the citation. After all, I wouldn't bother calling someone a dolt if their paper was actually insignificant.


Simple relational networks benefit from transitivity: friends of friends are more likely to know each other. Signed sentiment networks do not: the enemy of my enemy is not always my friend. And things get even more complicated when you want to reason along paths where the edges have a deep taxonomy: if paper A disagrees with paper B, and paper B views C as foundational, can we say anything about A -> C?
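Under a structural-balance assumption you'd just multiply the signs along the path ("disagrees" = -1, "foundational" = +1, so A -> C comes out negative), but that assumption is precisely what's questionable here. A toy version, only to make the inference explicit:

    # Naive sign propagation along a citation path, assuming structural
    # balance (the product of edge signs gives the inferred relation).
    # This is exactly the assumption being questioned here.

    from math import prod

    def inferred_sign(path_signs):
        """path_signs: list of +1/-1 edge signs along A -> ... -> C."""
        return prod(path_signs)

    # A disagrees with B (-1); B treats C as foundational (+1).
    print(inferred_sign([-1, +1]))   # -> -1, i.e. "A probably disputes C"?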

This is a deep problem because real networks are rarely fully observed. Even if you can accurately map all citations, you are not observing all the opinion relations with which you would be labeling the edges. Most network analysis methods lean heavily on simple, transitive relations that diffuse information over paths, in part because it helps compensate for missing and noisy data.

Interestingly, taking signed networks and treating them as simple relational networks often yields useful predictive results. Hate is not the opposite of love! Squishing both into a "feels strongly about" can't answer all the same questions, but it can answer some interesting ones.


Sounds like the Citation Typing Ontology. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2903725/

DDG finds it's in use in at least one journal: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-0...


Came here to mention CiTO. See also https://sparontologies.github.io/cito/current/cito.html. It’s been around a long time, from when the Semantic Web was the next big thing.
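For anyone who hasn't seen it in the wild: a CiTO citation is just a typed triple. A minimal sketch with rdflib, using placeholder DOIs (check the property names against the spec linked above):

    # Minimal sketch of CiTO-style typed citations using rdflib.
    # DOIs are placeholders; property names should be checked against
    # the CiTO spec (http://purl.org/spar/cito/).

    from rdflib import Graph, Namespace, URIRef

    CITO = Namespace("http://purl.org/spar/cito/")

    g = Graph()
    g.bind("cito", CITO)

    my_paper = URIRef("https://doi.org/10.1234/example.2022")
    prior_a  = URIRef("https://doi.org/10.1234/prior.2015")
    prior_b  = URIRef("https://doi.org/10.1234/prior.2018")

    g.add((my_paper, CITO.supports, prior_a))   # we agree with / confirm A
    g.add((my_paper, CITO.disputes, prior_b))   # we argue against B

    print(g.serialize(format="turtle"))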


When I was an undergrad first learning about Semantic Web, this idea was so obvious it was frustrating that it didn't already exist. It would make current citation seem primitive. Different fields would perhaps follow different citation ontologies, analogous to citation/bibliography formats today.

Imagine what it would do to literature review to draw from a 'tree of knowledge' of all the supporting texts of a work, or to see all the work that claims to derive particular meaning from some seminal work.

>Most academics can't even be bothered to write alt-text for their images, so any extra labour is likely to be resisted.

This is probably true, and it will ultimately mean authors sacrificing their agency over the semantic understanding of their writing; allowing e.g. Google Scholar to decide the meaning of their works' relationship to the literature.


Citations take a lot of time already as it is. Any extra metadata would take a ton of time and for what? Just so some data nerds that wouldn’t read the paper anyway can play with it?


My undergrad semweb zealot self would say that it would ultimately replace the need for a lot of verbiage that only really does the work that the semantic connection would.

Today, I would say that your writing is going to get coded with significant additional meaning either way: do you want to do it yourself, or do you want someone's AI to do it after you publish?


Well, in many fields you will have to write that "unnecessary verbiage" anyway, and with good reason: humans are crap at reading metadata, or at switching between the text and another document such as a citation tree. I definitely do not want to do it, and someone's AI or algorithm will analyze my work anyway, no matter what I do.


It's worse. Most academics don't even make their citations links, even though this is probably the most obvious use case ever for links. Instead, we get PDFs that really want to pretend they're paper journals, and citation formats that try to just infodump enough for you to go find the paper yourself.


The issue with links is that they break. Infodumping, by not specifying the exact location, may be more robust.


Links make the paper harder to read, and often you will not be able to use the links to directly access the source anyway.


C'mon... It's not that hard. Link the text to the references section with the full citation, and include a link in the reference to look it up by the arxiv or by doi. If the link doesn't work, the citation should be enough to recover the text, assuming that the civilization hasn't collapsed.


I disagree. Links in the bibliography do not make the paper harder to read.

The rents from locking away research under foolhardy copyright must be removed, I agree on that point.

Zotero solves the first problem way better than Endnote, as an aside.


And only having a bibliography/endnotes is annoying. If I want to check something, I do not want to scroll all the way to the end first, so footnotes ftw. And for those, having long extra things in them like links is annoying. Though admittedly this depends on personal preference and discipline.


> Links make the paper harder to read

Do they? There is no law that mandates that links are blue and underlined.

> often you will not be able to use the links to directly access the source anyway.

This is true and I consider it a tragedy.


They make a whole footnote much longer. Even when not blue and underlined.


The link text doesn't have to be the whole URL.

<a href="http://example.com/very-long-id?garbage-amount=2KB">J. Scienceperson, 2022</a> or the equivalent in your document format will do.


What footnote? This seems very format-specific.


The problem is that there's more than one view on ontology, and the dominant view changes sometimes.

Look at biology, where the knowledge is expected to be based on directly observable, highly repeatable phenomena like animals or plants. The tree of life that orders the species, families, etc. up to the realms at the root is constantly reviewed and reordered close to the leaves, and has seen a few changes close to the root.

Ontology, being far less strict than disciplines built on direct observation, would have much more trouble producing a universally agreed-upon tree of notions, and a lot of energy would be spent on contested branches.

I suppose the only possible outcome is an ontological forest, or shrubs, like what we have now, which is only locally helpful.


> Imagine what it would do to literature review to draw from a 'tree of knowledge' of all the supporting texts of a work, or to see all the work that claims to derive particular meaning from some seminal work.

You can visualize all of these things using Scholia https://scholia.toolforge.org . Built on free and open citation data, no need for Google Scholar or anything proprietary.


I've tried a few of these, kind of cool to play with, but for my work so far irrelevant.


Something similar to this already exists for citations in court decisions, where you can see if a citing decision has a positive, neutral or negative treatment. It’s not unfeasible.


How would you treat something like "They had the right idea, but their simulation was in 2d only, and we know now that this makes the simulation results garbage. They just happened to get lucky and got garbage that matches the real thing in 3d" with that?


Such papers would probably make it explicit that a given methodology was inadequate (and explain why) so while it may be challenging, it's still a solvable NLP problem.


Negative


The thing you are missing here is that citations are often intentionally ambiguous as to whether they are neutral or negative.

Or, phrased differently, there is essentially no possible benefit for a researcher to say "this piece of related work is junk". Let the readers decide or infer instead.


They wouldn’t say it like that, but there are plenty of citations that easily fall in the bucket of negative treatment, at least in biochem, health science, environment and ecology... What you describe is a neutral treatment, and is probably the most prevalent type of citation.


I think that's important to consider. It's kinda presumptuous to explicitly say someone is wrong. Safer to let the data speak for itself.


It's more nuanced than that: it may not even be obvious if someone is actually wrong.

Most citations discuss related work, rather than direct replications; the latter are incredibly rare in most fields. As a result, the authors are piecing together a theory using data from many groups, each using different species, different techniques, and different conditions. Some of these might reflect legitimate differences between perfectly conducted, reasonably interpreted experiments.


This is probably something that varies by the field, but I don't think most citations can be meaningfully placed on a positive / neutral / negative axis. When I cite something, it usually falls into one or more of the following categories:

- background reading

- related work

- source for a specific claim / result

- credit for software / data / other resource used in this work.

I don't evaluate the work when I cite it. Instead, I try to provide useful information to the reader and to give credit when academic courtesy requires it.


That’s a neutral treatment then. (I’m a biochemist, and I see no reason why a similar system couldn’t apply to my field.)


That sounds interesting. Do you have a link? Or more details? Ta.


Legal citations have specific ‘introductory signals’ to specify the way in which a case is being cited. Eg ‘see’ or ‘contra’ or ‘but cf’.

Found a summary of one of the standards here: https://tarlton.law.utexas.edu/bluebook-legal-citation/intro...


It's what LexisNexis / WestLaw have built their foundational businesses on - lots of $$$$. The new tech-infused player in this area is Casetext (YC13) which moved from a centralized employer-based sentiment model to a more social/open model.

edit: feel free to take a look at my oldest submissions.


In Anthony Grafton's The Footnote: A Curious History, he claims that citation practices in the field of history tend to use "cf." -- which normally means something like "compare with this reference" -- to mean "compare with this reference (which is actually wrong and stupid)".

So you have to watch out for the possibility of ironic or sarcastic references, which may vary from field to field.


> Perhaps we need an army of (paid?) dogsbodies to manually go through every paper ever published and assess the sentiment behind each citation?

This is similar to what SciFinder, a database created by the Chemical Abstracts Service (CAS), does for every paper. They have a substantial team that parses every journal of significance where chemical research might be published, searching for chemical structures and substances and encoding each structure within their proprietary database. That database offers concrete value to researchers at companies, universities, and law firms, whether they are looking for prior art or simply for the most reliable methods of synthesis. Hence the product is greatly valued by those who use it, and they can pay for people to do the complex task of indexing those papers.

While the idea of determining the sentiment of references is interesting, I'm skeptical whether there is nearly the same level of potential commercial value in creating such a database. It would be useful for reviewing grant applications, but those reviewers tend to be experts who already know if a paper received negative attention in the literature or even just on Twitter.


There aren’t meaningful disagreements in many disciplines as those who disagree and are smart are selected out of a field.

There’s too much academic labor coming into any field so there’s always some useful idiot rather than a brilliant opponent to recruit.

They’re not going to structure their work to make it clear since obfuscation is more useful to the status quo.


>There aren’t meaningful disagreements in many disciplines as those who disagree and are smart are selected out of a field.

This trope needs to die. There are major competing theories in pretty much every academic discipline. The claim is so ignorant and baseless that I think less of this site every time I see it come up. It is the HN version of a shitpost.

We get it, you don't like academia. To be honest, there's a lot to dislike. But please, for the love of god, stop writing this low-effort nonsense on a site that is meant to have users who actually use their brains.


Yes there are competing theories, but these disagreements are only of degree not kind.

I said no meaningful disagreements. I lived through this, and there are probably more than a few post-academics around here who know what I'm talking about. We tried, and discovered that for all their prattling about being anti-status-quo and counter-whatever, they are really quite interested in suppressing ideas and are stuck in the past.

In the humanities you're stuck with some flavor of critical theory. If you're not on board with that at a fundamental level, no tenure, no advisor, no papers; the end.

In the sciences, physics for instance. Don't like string theory? Did you want to be a non theoretical physicist? Sorry, no room for competition around here for the inane bullshit that can't even be wrong. No grant money. No experiments either. Those might prove it all wrong.

There's no room for change or growth in the modern academy because it's been captured by hucksters and charlatans who are at least cunning enough to keep out the competition.


Since you haven't defined what "meaningful" is in this context or why it matters, this is impossible to engage with. It's a complaint that is intentionally vague so it can't be contradicted. No thanks.


This exists.

Welcome to Scite.

https://scite.ai/


Legal writing has something like this.

If you look at the - sometimes dreaded - Bluebook, "signals" play a large role in indicating the relevance of the cited source to the argument.

See generally Peter W. Martin, Introduction to Basic Legal Citation: § 6-300 Signals, Legal Information Institute (last visited July 10, 2022), https://www.law.cornell.edu/citation/6-300.


The legal example is actually maybe even more interesting than you may think. The big traditional databases have always employed human reviewers to identify cases that overrule or question other cases. Attempts to automate this process by e.g. Casetext have had somewhat limited success, despite the fact that this is a restricted context with formal language. It’s getting better, but whether it will ever be good enough is still an open question.

See https://en.m.wikipedia.org/wiki/Shepard%27s_Citations

(By the way, signals are not much used in opinion writing; that aspect of the Bluebook is geared more towards law reviews. Approval or disapproval of prior authority is in the opinion text itself, as this is a formal activity of the court. Notation as to whether a given cited case approves or disapproves another case would be in a parenthetical after the citation, e.g. “Smith v. Jones 123 Foo2d 456 (1999) (overruling Jones v. Smith 12 Foo2d 345 (1989)).”)


Tangent:

> Most academics can't even be bothered to write alt-text for their images, so any extra labour is likely to be resisted.

Isn't this the role of the figure caption? At least, every paper I've ever written has fully captioned figures and duplicating that in "alt-text" seems completely pointless.


In HCI and SE there is very little incentive and it is way too much effort to disagree with prior research. Most work isn’t replicable. The fields value novelty.

Why would I spend countless hours replicating someone else’s human-subjects study when I could pursue something new?


Author may want to look at scite.ai

It does something similar to what they propose.


I think if you did parse the sentiment of citations you would get something quite bland: something like 90% would be very neutral (i.e. "the effect of X on Y has been investigated in A, B, and C"). The cynic in me would say that this is because a lot of these types of citations are a defence against getting A, B, or C as a reviewer, or, even worse, were inserted during review to make A, B, or C happy because they were a reviewer and went shamelessly fishing for citations.


https://retractionwatch.com/2014/11/17/fake-citations-plague...

Faking Google Scholar citation counts is relevant here. Absolutely insane how easy it is to "game" this system.


This paper [1] aims for a related goal: automatically parsing the argumentative function of citations ("citation frames").

[1] Jurgens, D., Kumar, S., Hoover, R., McFarland, D., & Jurafsky, D. (2018). Measuring the evolution of a scientific field through citation frames. Transactions of the Association for Computational Linguistics, 6, 391-406.


The same arguments apply to HN comments. Up/down could profitably be replaced by a spectrum of adjectives mapped to floats, e.g. { worst: -1, best: 1, meh: 0, ok: 0.5 }, etc. ad infinitum. It would give us a nuanced way to express sentiments while retaining the value of aggregation.
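Aggregation stays trivial: you just average the floats instead of counting up/down votes (the adjective map below is the one from above, purely illustrative):

    # Toy aggregation of adjective votes mapped to floats, as suggested above.

    ADJECTIVE_SCORES = {"worst": -1.0, "meh": 0.0, "ok": 0.5, "best": 1.0}

    def comment_score(votes):
        """votes: list of adjective strings from different readers."""
        scores = [ADJECTIVE_SCORES[v] for v in votes]
        return sum(scores) / len(scores) if scores else 0.0

    print(comment_score(["ok", "meh", "best", "worst"]))   # 0.125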


I have been wondering more generally, why there is no social site to attach metadata to research papers. That way, such work could be crowd-sourced.

Especially interesting metadata would be the definitions used in the paper, as well as the philosophy of science on which the paper is based.


In science, this is a second order effect compared to the problem that negative results and failed experiments are never reported, especially in the case of the physical sciences. “If my experiment doesn’t work, you get to repeat it.”


Because those are big words with big concepts behind them, all of which require an education and an individual with the drive to broach a new topic, study it, and bring home data. That is a rare thing.


Why can't well funded researchers just assign a post-doc to go through all the papers and analyze their sentiments? Doesn't seem so difficult logistically or technically.


How do you control for bias in the post-doc? What exactly is "sentiment"? Is it a qualitative description, or something more analytical? Why does a future researcher care about the sentiment instead of the content itself? Or is this to better help guide other research directions?


As this becomes more common, you could pool the results of the sentiment analysis and try to find common agreement among dozens or hundreds of them. Presumably that would be close to an unbiased view.

It would be helpful in the ways expressed in the article, i.e. to allow current and future researchers to see if highly cited papers are actually expressing near complete agreement or near complete disagreement, or a mixed viewpoint, at a quick glance.


We just call this "literature review", and everyone does it at times regardless of seniority level. But the aim is not a sentiment graph because it doesn't really matter whether author A said paper B was good or bad. What would we even use that information for? We instead want a comprehensive list of papers that are relevant to what we're trying to do right now, with a note on each saying why it is relevant.


Seems like a good project for an undergrad who is just getting started in research, actually.


"Why is there no semantic ontology of..."

Because humans can't be bothered to define, structure, and then populate ontologies with real-world data.



