
The thing that established journals do better than arXiv is clarity

You can upload whatever trash methods you want, but a normal journal will have at least one guy who tells you to wipe your ass and make your bullshit presentable in public, if only because they're expected to be gatekeepers for a minimum standard of supposed reproducibility.



In practice I'm not seeing much of a difference. Maybe it is just being in ML, where if you wait for conferences you're far behind. If a paper is that shitty, it is usually very apparent. Like if a paper isn't in latex you know... I mean there's a lot of garbage in conferences and journals too, I just haven't found it to be a meaningful signal.

And it is still silly that people call it "peer review." Peer review is not 3-4 randos briefly glancing at my work in an adversarial setting who say my work is not novel because it is the same as some unrelated work that they didn't read either; peer review is the grad student building on top of my work, peer review is lucidrains rewriting my work from scratch, it is Ross Wightman integrating it into timm and retraining, it is the forks that use my work for projects (hobby or professional). Peer review is peers looking and reviewing. More peer review happens on Twitter than at these conferences. You can say these conferences and journals are a form of peer review, but we gotta stop saying that just because something is a preprint it isn't peer reviewed. That's just incorrect. Peer review is when peers review.


Sure, peer review is peer review. It depends on your peers.

I'm a chemist, who has considerable established competition. So our reviewers know the systems and studied the method in grad school and sweated over it as a postdoc and had to innovate for cash in a professional capacity while doing something other than what we were doing.

They'd excoriate you for vague methods, or poor explanations, or general nonsense. You had to explain everything and make sure to reference the Big boys you were close to but definitely not stealing from.

I think ML is unformed enough that there isn't that sort of public stricture as a random field in chemistry. Feel lucky. We don't even get to hide behind beautiful LaTeX, because there's too much benchwork for anybody's boss to ever give a fuck about it.


> You had to explain everything and make sure to reference the Big boys you were close to

That sounds problematic

> there isn't that sort of public stricture as a random field in chemistry. Feel lucky.

It's pure noise over here. I don't feel lucky, I feel frustrated. I get 3 reviewers and 3 completely different reasons to reject, often nonsensical (not joking, recently was rejected because of a broken reference link to the appendix). Then submit to another conference and it is another 3 reviewers with 3 different reasons to reject that are completely different from the previous ones. Often asking me why I didn't cite arxiv papers that were released after the submission deadline. ACs don't care as long as reviewers agree because they gotta keep that acceptance rate low. It's playing the fucking lottery, except your graduation and career depend on it: despite frequently saying conferences don't matter, people will still critique a lack of top tier publications.

Your system doesn't sound great and also sounds frustrating, but I'll trade you. Regardless, I don't have faith in either system to significantly determine if a work is good (I do believe they can identify bad works, just not good works).


> > You had to explain everything and make sure to reference the Big boys you were close to

> That sounds problematic

Bingo, it is problematic. It's part of how "the Big boys" get their academic and compensation (investment in their firms) rewards. You have to play the publish-or-perish, paper rank game to get ahead and stay ahead in academia, and this leads to all sorts of problems. Authors don't want to question "the Big boys" because that leads to their papers not getting published: "the Big boys" and their bootlicker wannabes are the reviewers, and they will exact their tribute. Make it to "the Big boys" club and now you're a gatekeeper and now you're also responsible for perpetuating this system.

It's why Nature-style peer review needs to become a thing of the past.

I'm not saying that "popular" (for a value of "popular" that involves peers at large, not the public at large) peer review is / will be without problems. But it seems to me that it will -at least for a while- be less corruptible.


Yeah, this is a hill I will die on. I love researching, but once I grab my PhD I do not plan to push to journals/conferences unless it is requested in a job. It just holds no meaning and I'm tired of pretending it does. Perpetuating the system harms my fellow researchers, kills innovation, and just kicks the growing can down the road.

The other hill I will die on is that we shouldn't refer to journal/conference publishing as "peer review." This is one form of peer review, but there are MANY more. And as far as I'm concerned, 3 randos that briefly skim my paper in an adversarial setting (zero sum) looking to reject works barely constitutes peer review. Peer review is what happens when your peers read your work, test it, build upon it, replicate it, etc. We need to stop this language because it helps no one.


> Peer review is what happens when your peers ...

I would add: when your peers' review commentary is public.


I love when this happens tbh, but it is rather rare. Often I can tell that a paper was rejected for dumb reasons and that gives me a good signal to actually read it.


Yeah, most of peer review is private

If journals posted first-submission comments for published papers I think there'd be a radical change in comments. For good or ill, I have no idea.


> I get 3 reviewers and 3 completely different reasons to reject, often nonsensical

Could one problem with these kinds of reviews be that the reviewers have absolutely zero skin in the game? Like, you can write complete nonsense and nobody will blame you personally for it, you have no stake in the outcome of the submission/paper whatsoever, and you're much more incentivized to come up with reasons to reject than to say "good enough". I realize the latter part can be good in theory, as it sets a high bar, but it also often feels like this goes awry when not kept in check somehow.


> Could one problem with these kinds of reviews be that the reviewers have absolutely zero skin in the game?

Actually, it is worse. They probably have skin in the game, and incentives to reject! Most CS fields publish through a conference system. This is a zero or one shot system (if you get a rebuttal phase, _if_ you want to count that). These conferences have acceptance rates that they "need to maintain" to keep their rankings. In other words, you are competing against all other papers being submitted to the same conference, not just papers with similar topics. Even if it is only a little, rejecting a paper actually increases the odds that your submission makes it through.

But yeah, you're in the right ball park. This is also why you will quite frequently see conference review guidelines blatantly violated and why you will see area chairs and metareviewers just not care. They all have incentives to encourage rejection, let alone be impartial.

I think the system works when the community is small and there is accountability among peers. Accountability creates a dampening effect on bad behavior. But at scale, you only need a few bad actors to set up a feedback system and to spoil not just the entire barrel, but the entire shipment.

Edit: I should also add that there's an additional negative incentive. You are not judged by how often you review, how many reviews you perform, or how good your review is (hard to measure). So reviewing ends up taking away time from the very limited time you have to do work that you are actually evaluated on. This is likely why reviews are so rushed. There's a feedback loop too, since many will see that others rush when reviewing their work so they get tired and end up rushing when reviewing the work of others. "If they aren't going to give my work their time, why should I give their work my time?" thinking grows.


>> I don't feel lucky, I feel frustrated. I get 3 reviewers and 3 completely different reasons to reject, often nonsensical (not joking, recently was rejected because of a broken reference link to the appendix).

The people who review your papers in conferences and ask you why you didn't cite future arxiv papers are the same people who put their work on arxiv and cite each other's preprints. You can't rely on the process of "peer-review" on arxiv any more than you can rely on the conference peer-reviews because they're performed by the same people, and they're people who don't know what they're doing.

The sad truth is that the vast majority of the researchers in the machine learning community haven't got a clue what the hell they're doing, nor do they understand what anyone else is doing. The typical machine learning paper is poorly motivated, vaguely written, and makes no claims, nor presents any results, other than "our system beats some other systems". As to reproducibility, hell if we know whether any of that work is really reproducible. Everybody who references it ends up doing something completely different anyway and they just cite prior work as an excuse to avoid doing their job and properly motivating their work. The people who write those papers eventually get to be reviewers (by sheer luck), or sub-reviewers. They have no idea how to write a good paper, so they have no idea how to write a good review, either. And they couldn't recognise a good paper if it jumped up and bit them in the cojones.

I love to cite Geoff Hinton on this one:

  GH: One big challenge the community faces is that if you want to get a paper
  published in machine learning now it's got to have a table in it, with all
  these different data sets across the top, and all these different methods
  along the side, and your method has to look like the best one. If it doesn’t
  look like that, it’s hard to get published. I don't think that's encouraging
  people to think about radically new ideas.
  
  Now if you send in a paper that has a radically new idea, there's no chance
  in hell it will get accepted, because it's going to get some junior reviewer
  who doesn't understand it. Or it’s going to get a senior reviewer who's
  trying to review too many papers and doesn't understand it first time round
  and assumes it must be nonsense. Anything that makes the brain hurt is not
  going to get accepted. And I think that's really bad.
https://www.wired.com/story/googles-ai-guru-computers-think-...

So the problem is not arxiv or not arxiv, the problem is that peers in peer review lack expertise and knowledge and they can't do their job well.


As to your very last point, it isn’t my “job”. It’s yet another task that I take on for no recognition, nor additional pay - as is much of academic life.


Jobs come with a lot of shitty aspects. Don't get me wrong, I generally don't enjoy reviewing either. But I put a lot of work into it because regardless of what I think, this has a significant effect on real people and their entire livelihoods can depend on this task. Especially those in their early career. One or two publications in a top tier journal can land them that internship or job which snowballs.

So I'd ask you do one of two things, either:

- Review a work with the diligence and care that you wish someone would give to you

or

- Don't review

I'd also appreciate it if you openly recognized how stochastic the system is and that if/when you are in a position where you need to evaluate someone, you remember this and take it into consideration. It has a lot of value to you too, since if the metric is extremely noisy, relying heavily on it doesn't give you much. Look for others.


I do see it as my job, and my responsibility. I also see it as my job to help more junior colleagues, and even to teach what I've learned to undergraduates. I don't want to do any of those things. I don't even want to write papers. I just want to sit on my couch, code, and discover new knowledge that blows my mind.

But, people pay me to do a job. It's not in my contract in any clear terms, but to do my job well I need to do all those things; and I like doing my job well.


> The people who review your papers in conferences and ask you why you didn't cite future arxiv papers are the same people who put their work on arxiv and cite each other's preprints.

I don't have a problem citing arxiv works. I mean I have no faith in the official system, so why wouldn't I? But this is a clear indication of an impossible bar to pass and demonstrates the ridiculousness of the system. It isn't just that there's a malicious or dumb reviewer, it is that the other reviewers don't call them out, the AC doesn't call them out, and the metareviewer doesn't call them out. The best case was that broken link person and the AC made them update their response, but this was in the response to my 1 page rebuttal (for 4 reviewers) where they got to say whatever they wanted and I had no chance of response. (Their initial review was literally 2 lines) I don't know how anyone can see this and not think that the system absolutely failed at every level. You're probably unsurprised to hear that this is a common occurrence rather than uncommon.

> the vast majority of the researchers in the machine learning community haven't got a clue what the hell they're doing

I'm well aware. __I__ have no idea what I'm doing, but at least I can tell you the difference between probability and likelihood or that tuning on the test set is information leakage. Research often requires venturing into the unknown and unexplored. That's fine. I don't care if we're all stumbling around in the dark. I do care when people are not just unable to admit it, but unable to recognize this. But that's the classic "tell a lie enough times and you'll start to believe it" situation.

I am in full support of that Hinton quote. This is the first I've heard it (or recall at least), but I often say quite similar things (in fact, just did in another thread). I do mean it when I say that our current system harms us and I'm confident that we won't get to AGI with this system.

> the problem is that peers in peer review lack expertise and knowledge and they can't do their job well.

I won't disagree with this point, but I believe that this is a systemic problem rather than an individual one. The system encourages this behavior rather than stamping it out. So in that sense, I think people are doing their job very well. It is just that I don't think the job as practiced actually aligns with the intent of the job. Classic case of irony: the group of people that most discusses alignment is one of the worst at it. But I guess we shouldn't be surprised, given that lately we've seen how unethical people who write about ethics can be.

I do want to add one thing though. A good paper is hard to recognize. A bad paper is easy, but a good paper may be indistinguishable from a bad paper. This is the "paradox" of research and something people need to take to heart. That is if we want to align our job descriptions with our actual jobs.


>> Research often requires venturing into the unknown and unexplored.

Oh, in that sense I'm clueless too. We all are. What I'm pointing out though is that many people in machine learning research don't really understand what's the point of research, or even what's the point of a research paper. They copy each other's writing style and produce the same kind of low-information-content paper.

There's been a huge influx of new people to machine learning research in recent-ish years. I think it started around 2012 and the ImageNet results. I did my Master's in 2014-15 (part-time) and I was surprised by the number of people in the machine learning class I took, and how many of them had nothing to do with computer science or AI, and were instead coming from backgrounds in business or more rarely science. At least I'd expect the people with the science backgrounds to know how to write a paper, but it seems most people who come to machine learning from, say, physics or biology, bring with them a facility with calculus and continuous mathematics, but not much in terms of scientific methodology.

Hinton hints at that in the rest of his comment I didn't quote:

What we should be going for, particularly in the basic science conferences, is radically new ideas. Because we know a radically new idea in the long run is going to be much more influential than a tiny improvement. That's I think the main downside of the fact that we've got this inversion now, where you've got a few senior guys and a gazillion young guys.

Well I guess the "gazillion young guys" need a bit of time to figure out how to write good papers, and review them.

>> I do want to add one thing though. A good paper is hard to recognize. A bad paper is easy, but a good paper may be indistinguishable from a bad paper. This is the "paradox" of research and something people need to take to heart. That is if we want to align our job descriptions with our actual jobs.

That's true, and part of the reason why reviewing is hard work. I still find it very hard to reject a paper. What if I'm the one who doesn't get it?


> (...) make sure to reference the Big boys you were close to but definitely not stealing from

Much as giving credit is important, this is unfortunately also how you end up with intro sections littered with references to the same ten papers everyone's read already just to back up some extremely vague/general statement.


I hate this. Honestly, I'd love to see short papers. Just cite what is relevant and not much else. But I see citations exploding. Look at this paper[0], it is 34 pages in total and 5.5 of those are the bibliography! It has almost 100 references! ~14 pages are images (not figures... images). Nearly an entire page worth of material is just the citations! Not the text referencing the citations, but the citations themselves. It is absolute chaos.

No, this isn't a survey paper. So it is serving no purpose other than greasing their peers.

[0] https://arxiv.org/abs/2303.11435


I agree with the core of your message, and you are far from alone with this opinion.

The history of science is a sequence of profound platitudes, each time refining the prior ones. It is a part of awakening to pick the lowest hanging fruits first, just to get a foot in the door. While it's not theoretically impossible to go straight from ignorance to -say- the postulates of quantum field theory, it's incredibly unlikely to scale such a huge step at once, both for scientists at the frontier as well as each generation of students catching up. In this sense the profound platitudes are quasi necessary to build up an understanding of the world around us. Individually each outdated platitude may appear so very wrong when looking back with 20/20 hindsight, and most of these individual platitudes could plausibly have been skipped by scaling 2 or 3 steps at a time, but no one could scale all of them at once.

Literally respect and review mean the same thing. To view or observe anew (re-).

One may objectively claim that each step or correction of a prior platitude was an improvement of our worldview, even though in hindsight each step of correction also contributed the next erroneous stumbling block to be scaled.

Throughout history, in policy decisions of every kind (from household decisions, to court decisions, to budget decisions, to research direction decisions, etc...), humans have sought ways to settle matters.

The desire for Finality is universal and justified. However the expectation that this universe came with a manual is not.

A core tenet of the scientific mindset is the recognition that this finality is a temporary illusion.

Selling this illusion of a recipe for finality is profitable and what journals and gatekeepers have gravitated toward, especially since information became dirt cheap to duplicate.


I think you missed my point.

I have no problem citing, and actively encourage it. I do have a problem with someone rejecting my work because I failed to cite a work which would have been impossible for me to cite. The key is that it was released (as a preprint) AFTER my paper. Not only is it in bad taste to make concurrent works a requisite (happy to update btw), but it is ludicrous to require that I have a time machine.

This isn't about respect and me complaining about pomp and circumstance. This is me complaining about the frequency that I have been criticized for not having a time machine.

The point is that a system that allows people to reject works due to lack of access to a functioning time machine is not a system we should support.


I think this is a simple mix-up ;)

Perhaps you confused me for xorbax, or you were still in the stream of consciousness of discussing with him: if you double-check which of your comments I replied to I think that you'll find it is not the one where you mention the demand for a citation that was simply non-existent.

It appears you believed my comment to be a reply to: https://news.ycombinator.com/item?id=36917506

while in fact I replied to: https://news.ycombinator.com/item?id=36917126

I would appreciate it if you re-read your own comment at the last link, and observe the second paragraph you wrote about peer review, and then re-read my correctly placed comment. I think you'll find my comment makes more sense, and in my opinion it ties in nicely with your second paragraph. I was not trying to summarize you, merely appending my gripes with the supposed finality and atomicity of the journals' redefinition of "peer review".

In reality the real peer review is in the back and forth discussions within the community, spread out over many papers, and sometimes even over generations.

As an oversimplified example consider relativistic mechanics correcting Newtonian mechanics: if we anachronistically pretend Newton published his Principia in some Respectable Journal under the current implementation of "peer review", then it would have been accepted with apparent Finality under peer review, even though it would eventually be repudiated by relativity generations later; this artificial example should make anyone pause about the insinuated Finality of "peer review", effectively forcing one to recognize that the current concept of artificially muzzled "peer review" through the arbitrary selection of the reviewers by journal editors does not constitute true peer review. I'm not saying it's useless to have an initial screening, but then it needs a better name, perhaps "Quality Control" or "Publication Criteria" or "Discipline Specific Methods and Publication Criteria" etc... Before the modern incarnation where reviewing by peers is made artificially atomic and pseudo-final, anyone in the field could provoke questions about specific claims. The illusion of Finality is discouraging: whereas in the past you would feel compelled to respond to any criticism, the current concept of "peer review" implies "it already happened since it's published" and nothing forces an author to come up with a rebuttal for a criticism. While this may help prop up appearances of the scientific field with an aura of Finality of conclusions made in its domain, it's really damaging to the scientific process.

Back to the comment confusion:

I've seen this happen to others when a sub-thread gets a lot of attention, resulting in a high rate of replies, where one of the authors gets confused which exact comment was being replied to. I would be lying if I were to claim it never happened to me ;)


I think you're right. There's a lot of comments I've responded to and the alignment isn't always clear who's responding to what, so I use context to fill in the gap. My bad. You are totally right about your comment making more sense haha. (I'm known to be an idiot, so people should question if you really want me reviewing your papers lol) But as your last line is suggesting, I think this is just a human thing and when a comment gets a lot of attention and someone tries to respond to a lot of different people. Too many trains of thought to juggle around in a format that isn't super explicit. It's why I like my code to have colors and group chats to have colored bubbles for identification. I need the aid lol.

> In reality the real peer review is in the back and forth discussions within the community, spread out over many papers, and sometimes even over generations.

This is 100% what I believe and I'm surprised others don't. Like how arxiv exists because people were already sharing papers outside the walls of journals. For some reason people are acting like this sharing means these are drafts rather than the actual publications themselves. As if arxiv and what's published are different versions. Maybe a lot of non researchers are participating in these discussions and assuming things?

I'm sure you know this, but others might not. Newton is the perfect example here because many of the works at these times were "published" years or even decades after the original work was done. Which is the whole Leibniz and Newton debate thing. I'd also highly recommend the podcast "Opinionated History of Mathematics". It's hilarious and well done. By a mathematician but for the general nerdy public

https://intellectualmathematics.com/opinionated-history-of-m...


Eh, not really.

There are exceptions [1] but most journals don't expect the reviewers to even attempt to reproduce results, which makes sense given how specialized and expensive scientific experiments often are. As a reviewer on open code papers I would usually try to run the provided code, it didn't always work and that wasn't always addressed before publication. (I was also usually the only one who even tried.)

Usually peer review is more about making sure the work is novel and interesting, fits the journal's audience and doesn't have any glaring flaws. Not entirely unlike code review: if it builds, merge it, and we can address problems in a future PR. Those are basically the reviewer instructions you get from most journals IME.

[1] OrgSyn famously requires a reproduction from one of its editors' labs before it accepts any paper:

http://www.orgsyn.org/about.aspx

It has a very high reputation amongst chemists, even if its "impact rating" is low. High impact journals are not usually considered the most accurate.


I don't think you are really arguing against what the parent poster was saying. That is, I interpreted the parent commenter as saying that journals require that submissions at the very least be in a clear, understandable, "your paper must be at least verifiable (or falsifiable)" format, not that they actually attempt to reproduce the results.


(not OP) Verifiability/falsifiability are big words, mostly it is not clear what that means in a specific case. Crucially, that is not what journals/editors/reviewers do. They check if they find the contribution convincing, novel, and in line with the discipline's community standards, nothing more.


No, I think a half decent paper is expected to either explain their methods or reference a paper that explains their method.

You don't get to handwave away the instructions of your experiment in my mind. Maybe other fields are fine without that, but I would never write a paper that doesn't clearly explain how I made samples or reference a paper which does. To do otherwise is bad science.

It's not about a reviewer replicating it, it's about anybody replicating it in a year or 30 years.


> As a reviewer on open code papers I would usually try to run the provided code

You're one of only two people I've ever heard make this claim. Which I'm sure you're aware of, but many people probably aren't. Fwiw, I'm often called diligent because I read the code (looking at main method and anything critical or suspicious. Might run if suspicious). Even reading supplementary materials will earn you that title (which is inane). According to this informal survey, ~45% of neurips read the supplementary material <13% of the time and less than a third always read it[0] (I'm in that third, and presumably xmcqdpt2).

> Usually peer review is more about making sure the work is novel and interesting

This is why I find p̶e̶e̶r̶ ̶r̶e̶v̶i̶e̶w̶[1] journal/conference reviewing highly problematic and why this system is at the root of our current existential crisis: the reproduction crisis. Reproduction is the cornerstone of science. And many MANY good works are not novel in the slightest. See the work of Ross Wightman (timm) or Phil Wang (lucidrains). These people are doing critical work in the area of ML but they aren't really going to get "published" for these efforts. Many others do similar work, but just not at the same scale and so you'll likely not hear of them, but they are still critical to the ecosystem.

But with your next point: if it builds, merge it; I'm all for. The system should be about checking technical soundness and accuracy, NOT about novelty and how interesting it is. Of course we shouldn't allow plagiarism (claiming works/ideas that aren't your own), but we should allow: replications, revisiting (e.g. old methods, current frameworks (see ResNet strikes back)), surveys, technical studies, and all that. Novelty is a sham. Almost all work is incremental and thus we get highly subjective criteria for passing the bar.

Which is probably why high impact journals are not considered the most accurate. Because they don't encourage science so much as they encourage paper milling, rushing, and good writing.

[0] https://twitter.com/sarahookr/status/1660250223745314819

[1] We need to stop calling journal/conference reviewing "peer reviewing." Peer review is when your peers review. Full stop. This can come in many forms. Similarly publishing is when you publish a paper. Many important works come through open publishing.


That could be an LLM prompt. Cleanliness and structure analysis checks for papers would be a very useful tool.


> if only because they're expected to be gatekeepers for a minimum standard of supposed reproducibility.

There are decades of beta-amyloid papers that show that this is false. Like any for-profit entity, premium journals' primary purpose is to satiate shareholders - it is their fiduciary responsibility to get away with as much as possible in the name of profit.

That's not to say that open access doesn't have equally as concerning issues: which of the two is better or worse is an extremely difficult call to make.


Just because someone wiped their ass doesn't mean they're not still full of shit.

It just means it's less stinky/noticeable.

I think I did take this analogy too far though.


> make your bullshit presentable in public

Triple integrals and sigmas over sets instead of clearly stated limits, none of them analytically solvable, are not "presentable". Scientists do this stuff on purpose to make their papers harder to read.

> expected to be gatekeepers for a minimum standard of supposed reproducibility

They don't enforce this at all. Otherwise they would demand code and training data be released for ML papers.


> Scientists do this stuff on purpose to make their papers harder to read.

Not just that, but if your paper has math that the reviewer doesn't understand they're more likely to think the work is good and rigorous. It's not like they read it anyways.

> and training data be released for ML papers.

Other than checkpoints and hyper-parameters, what do you want? The wandb logs? I do try to encourage people to save all relevant training parameters in checkpoints (I personally do). This even includes seeds.


> what do you want?

Hyperparameters yes, but also the data used for training. I should be able to reproduce the checkpoint bit-for-bit by training from scratch. If their training process is not deterministic, also release the random seed used.


Oh yeah, that I agree. I'm kinda upset Google is frequently pushing papers with JFT and 30 different versions of it and making conclusions based on pre-training with it. This isn't really okay for publication. Plus it breaks double blind! I'd be okay if say CVPR enforced that they train on public datasets and can only add proprietary after acceptance (but you've seen my views on these venues anyways).

All ML training is non-deterministic. That's kinda the point. But yeah, people should include seeds AND random states. People forget the latter. I also don't know why people just don't throw args (including current iteration and important metrics) into their checkpoints. We share this frustration.
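To make the seed vs. random state distinction concrete, here is a minimal sketch using only Python's standard `random` module as a stand-in for a framework RNG. The `make_checkpoint` and `resume` helpers, the `args` fields, and the metric value are all hypothetical, made up for illustration; a real training setup would have more RNGs to capture than this.

```python
import pickle
import random


def make_checkpoint(step, args, metrics, rng):
    """Bundle args, progress, metrics, and the RNG state into one blob."""
    return pickle.dumps({
        "step": step,
        "args": args,                 # hyperparameters, including the original seed
        "metrics": metrics,
        "rng_state": rng.getstate(),  # the seed alone cannot recover this mid-run
    })


def resume(blob):
    """Rebuild the RNG exactly where the checkpoint left off."""
    ckpt = pickle.loads(blob)
    rng = random.Random()
    rng.setstate(ckpt["rng_state"])
    return ckpt, rng


args = {"seed": 1234, "lr": 3e-4, "batch_size": 32}  # hypothetical values
rng = random.Random(args["seed"])

# Pretend to train for a while, consuming random numbers along the way.
for _ in range(100):
    rng.random()

blob = make_checkpoint(step=100, args=args, metrics={"loss": 0.42}, rng=rng)
expected_next = [rng.random() for _ in range(5)]

# Resuming from the saved state continues the same stream.
ckpt, resumed_rng = resume(blob)
resumed_next = [resumed_rng.random() for _ in range(5)]

# Re-seeding alone restarts the stream from the beginning instead.
reseeded_rng = random.Random(ckpt["args"]["seed"])
reseeded_next = [reseeded_rng.random() for _ in range(5)]
```

Here `resumed_next` matches `expected_next`, while `reseeded_next` does not, which is the gap you hit when a checkpoint stores the seed but not the state.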


Being pseudorandom is often the point. That's very far from deliberately being nondeterministic.
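A small sketch of that distinction, again with Python's standard `random` module; `noisy_run` is a hypothetical stand-in for a training loop, not anyone's actual code. It only covers the pseudorandom draws themselves and says nothing about hardware-level sources of variation, such as the order of parallel floating-point reductions.

```python
import random


def noisy_run(seed, steps=5):
    """Hypothetical stand-in for a training loop that draws random numbers."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(steps)]


run_a = noisy_run(seed=0)
run_b = noisy_run(seed=0)
run_c = noisy_run(seed=1)

same_seed_matches = run_a == run_b       # same seed, same draws
different_seed_differs = run_a != run_c  # a different seed gives a different stream
```

The draws look random, but with the seed fixed the whole sequence is reproducible.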



