I think the article doesn’t provide enough arguments for why this is even an issue.
For example, what is the probability of such a character being rendered incorrectly in some standard text, let’s say a Wikipedia article?
Even more so, the argument that people don’t report this because they are “not speakers of English!” is just an assumption. Not to mention that translation applications are more than good enough for such a task.
> Even more so, the argument that people don’t report this because they are “not speakers of English!” is just an assumption. Not to mention that translation applications are more than good enough for such a task.
Frankly, people have learned helplessness[0] about these oddities and don't think to report them when they see them, so the inference that something isn't serious just because it's not pointed out is weak.
In the first place, the proportion of software users who raise issues on GitHub or elsewhere is small, and when the devs are a group of people who communicate in characters not used in one's daily life, the prospect of relying on whatever translation apps one has at hand is not very encouraging.
The issue is that the language is not accurately represented – imagine in English instead of the Latin letter "a" you see the Greek letter "α". It's still legible but it's not unreasonable to ask for an accurate depiction of a language.
The letter “a” has a frequency of ~8%, and it would indeed be annoying, but let’s say the letter “q”, which has a frequency of ~0.1%, is rendered incorrectly; then that’s just a minor issue.
While in a practical sense your argument might hold water (which I doubt, to be honest), this is not just a practical matter. This is also a matter of respect. If you think that mis-representing q is not a problem because it appears so rarely, then do you also think that disregarding the religious tenets of minorities is fine since they make up such a small part of the population?
I don't know why we shouldn't strive to have computers reproduce language precisely, since computers are how we communicate most of the time and how the majority of content ends up being preserved. After all, maybe we shouldn't bother with spelling in English either, since everyone can safely approximate the meaning anyway!
I don't know what model this specific application is based on, but you can get something similar with StableDiffusion + ControlNet, i.e. `control_v11p_sd15_scribble` from the ControlNet GitHub repo.
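Roughly, a minimal sketch of that setup using the Hugging Face diffusers library; the model IDs below are the commonly published checkpoints, which is my assumption about the pieces involved, not anything confirmed about the app itself:

```python
# Sketch: Stable Diffusion 1.5 + the scribble ControlNet via diffusers.
# Assumes the standard Hugging Face checkpoints and a CUDA GPU.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

scribble = load_image("scribble.png")  # your rough sketch as the control image
result = pipe(
    "a cozy cabin in the woods, watercolor",
    image=scribble,
    num_inference_steps=20,
).images[0]
result.save("out.png")
```

The output follows the rough shapes of the scribble while the prompt fills in the style, which is about what these "turn my doodle into art" apps appear to do.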
When you have a sub-1% chance of getting an interview, asking for a cover letter or some kind of non-trivial time commitment is just wrong.
IMHO a better approach would be to select, let’s say, 10% of the candidates and ask only them.
If an applicant can’t take the time to write a cover letter — and by “cover letter” I just mean an email that says here’s who I am and here’s what I like/do — why should the person hiring take the time to read what the applicant has copied and pasted to countless other employers?
> and by “cover letter” I just mean an email that says here’s who I am and here’s what I like/do
This makes it sound like it would be something that would be "copied and pasted to countless other employers", just like a resume often is.
But your last point made it sound like the applicant should invest more of their time than that.
Conventionally, a good cover letter will be, at least to a degree, bespoke for that job. But that is a non-trivial time investment on the part of the applicant. And it can be a significant waste of time in cases where something on their resume disqualifies them from consideration regardless of their cover letter.
Also, in the last several years, the impact of a good cover letter on getting an employment offer has diminished tremendously, to the point that expecting it of applicants isn't respectful of their time.
There's a halfway point between fully bespoke and ready-to-wear ChatGPT boilerplate: you can assemble a cover letter to order from reusable snippets, then add something specific to the company and job opportunity, assuming it's one worth the extra effort.
> Also, in the last several years, the impact of a good cover letter on getting an employment offer has diminished tremendously, to the point that expecting it of applicants isn't respectful of their time.
Do you have anything to substantiate this claim?
I'm an employer and I read and respond to dozens of applications when I have an open position.
Usually when I receive an excellent application, the cover letter is roughly a paragraph in length. It isn't some lengthy, typeset file attachment with a mission statement or some other nonsense that would typically appear if you Googled "example cover letter".
My thoughts on ideal job applications closely mirror those described here[0].
> Do you have anything to substantiate this claim?
No, it's my personal experience.
It's an observation supported by people I know who have gotten a similar sense at their jobs. One director level friend, with insight into how they do hiring, explained that it has a lot to do with how HR filter things. (As someone else said in this thread.)
Some HR departments will sanitize information to reduce bias, like a photo of the candidate, but this can include a cover letter.
It's going to be different with different employers. So you could be an exception. But the likelihood that anybody reads my cover letters these days is so low that writing them is almost never worth the time. I've also found it to be more difficult these days because the quality of the job postings themselves has diminished. A lot of the complaints the OP's article makes can be made about the job postings. There's just nothing to latch on to.
But yeah, many/most places don't read cover letters. Some don't even read resumes. It's very clear from the start of the call: you can see them start to skim through it, or they just ask you to more or less read it to them. I even had a call where one of the interviewers had someone else's resume up and referred to it by mistake, apologized, and then did it again five minutes later.
So, no, I have nothing to substantiate the claim.
Edit: Okay here's something you might consider "substantial".
Apparently it's a thing now to not even send applicants rejection notices -- and just to ghost them instead -- on the basis that they might be a wacko and react poorly to rejection. So, as a policy, companies will inflict on all candidates, including nice people, the ill of ghosting them, to spare themselves the risk of receiving a shitty email from a shitty person. They are so thin-skinned/careless that they can't tolerate bad behavior in the form of an email that they don't even let you know they're not interested in hiring you. Even if you started the "hiring pipeline". Even if you apply with a cover letter.
People on this website who apparently work for companies that do this admit that's what their company does. So if companies do do this and admit it then maybe that's substance for you.
In my opinion, the kind of psycho who can't send a rejection email is totally consistent with everything I've experienced above the edit.
I can't really argue against that, and I certainly agree that it's going to be different with different employers.
Genuinely, I'm sorry you've had shitty experiences in the past. I've had a whole bunch too. I will say however that in the times in my career when I did apply for jobs, most of my successful applications [I believe] came as a result of a short, to-the-point, personalised cover letter.
I 100% agree. In the past, I've written good cover letters and they've been crucial to getting job offers. Hiring/tech culture has changed such that cover letters are largely meaningless, and I wish that weren't the case.
And I don't think writing a cover letter is going to harm your chances at a job. Or if it does, you might not want to work for a company that penalizes them anyway. But the attitude of "well I don't want to work for a company that doesn't read my cover letter" isn't really viable as an applicant as it rules out a lot of opportunities for work.
The whole situation is so silly too because a nice cover letter can make rejection emails nicer, like "we enjoyed your cover letter and your joke about the thing, but ...". I'd like to think that makes it easier for employers to send rejection emails because they can say something that sounds genuinely positive and personal instead of cookie cutter fake encouragement; "I'm sure you'll land on your feet!" So it makes their job better and probably would help avoid getting hate mail from people acting poorly. It's a win all around. But it's just a race to the bottom at this point.
Anyway sorry for the rant. Thanks for the replies. I agree with you 100% and wish cover letters were more meaningful but for every one of you that reads them there are nine who don't. So I'd just ask, don't be surprised if people don't write cover letters anymore.
I mean, half these companies, especially the "always hiring" ones, aren't even trying to fill positions; they're just signaling growth to investors. Sending anything to them is a waste of time.
As an anecdote (since we're in that territory here), we (major financial firm) were sent only resumes by the HR team to filter through and select candidates for interview. Going through my circle, it's the same for guys working in any company that's bigger than 50 employees.
While the topic is intriguing, I dislike the use of "public services" for this type of research.
For instance, adding substances to a water reservoir to study their effects is unacceptable without permission or supervision.
Similarly, conducting such research without Wikipedia's permission/supervision should not be accepted.
Someone tried something similar but with higher risk: inserting security backdoors into the Linux kernel. They were caught and (AFAIK) their entire school was permabanned from sending pull requests.
This was also my thought. Search for "hypocrite commits"; here's a link to an LWN article: https://lwn.net/Articles/853717/. They did ban their whole school.
I'm of quite the opposite opinion. Within reason (importantly), I believe any public service, which is also managed by an anonymous, decentralized community, ought to be under test constantly and by anyone. What's the alternative, really?
Imagine if it was taboo to independently test the integrity of bitcoin for example.
The sibling mentioned the linux kernel case. I admit that one felt wrong. It was a legitimate waste of contributor time and energy, with the potential to open real security holes.
I don't pretend to have reconciled why one seems right to me and the other wrong.
> Imagine if it was taboo to independently test the integrity of bitcoin for example.
> The sibling mentioned the linux kernel case. I admit that one felt wrong.
> I don't pretend to have reconciled why one seems right to me and the other wrong.
The "how" is what matters here, not just the "what". "Testing the integrity of Bitcoin" by breaking the hash on your own machine (and publishing the results, or not) is one thing. "Testing" it by sending transactions that might drain someone else's wallet is quite another. Similarly with Linux, hacking it on your own machine and publishing the result is one thing. Introducing a potential security hole on others' machines is another. Similarly with water: messing with your own drinking water is one thing. Messing with someone else's water is quite another.
> Similarly with Linux, hacking it on your own machine and publishing the result is one thing. Introducing a potential security hole on others' machines is another.
Playing devil’s advocate for a moment: how else do you test the robustness of the human process to prevent bad actors? Don’t you need someone to attempt to introduce a security hole to know that you are robust to this kind of attack?
You do it with buy-in, e.g. permission from some of the maintainers, so they are aware. If you do not get permission, you do nothing. It's similar to penetration testing.
Interestingly, while I 100% agree with you regarding the parent's question about security holes, I'm actually not sure how an experiment like the one on Wikipedia could be performed even with proper buy-in from all the owning entities (the Wikimedia Foundation?). Is it even in principle possible to test this ethically without risking misleading the users (the public)? If not, does that mean it's better if nobody researches it at all? The best I can think of is making edits that are as harmless as possible, but their very inconsequentiality would make them inherently less likely to be removed. Any thoughts?
The usual answer is a chain of trust. However, that might be against Wikipedia's principles. There is an "importance scale" for articles; for anything considered C-class or above, editing becomes similar to a pull request, or the page carries a warning that it contains unverified info.
It's a hard problem having fully editable storage by anyone, while maintaining integrity.
You sift through the edit log to find edits correcting factual errors.
Then you find the edit where the error was introduced.
You can probably let an LLM do the first pass to identify likely candidates. With maybe 20 hours of work you could probably identify hundreds of factual errors. (Number is drawn from a hat.)
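As a rough sketch of what that first pass could look like, here's one way to pull a page's edit log via the public MediaWiki revisions API and keep edits whose summaries look like corrections; the keyword filter is just a cheap stand-in for the LLM step, and the whole thing is my own guess at an approach, not anything from the thread:

```python
# Sketch: fetch recent revisions of a Wikipedia article and flag edit
# summaries that look like factual corrections. Uses the public MediaWiki API.
import requests

API = "https://en.wikipedia.org/w/api.php"

def correction_candidates(title, limit=500):
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|comment|timestamp",
        "rvlimit": limit,
        "format": "json",
    }
    pages = requests.get(API, params=params).json()["query"]["pages"]
    revisions = next(iter(pages.values())).get("revisions", [])
    keywords = ("fix", "correct", "wrong", "error", "inaccur")
    return [
        rev for rev in revisions
        if any(k in rev.get("comment", "").lower() for k in keywords)
    ]

for rev in correction_candidates("Thomas Jefferson")[:10]:
    print(rev["timestamp"], rev["comment"])
```

From each flagged revision you'd then walk back through the parent revisions to find where the error was introduced, which is the slower, more manual part.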
Excellent point. That's more difficult but I think the ethical way to do it would be to recruit subject matter experts to fact check articles across a variety of disciplines. Bonus, you can then contribute corrections.
In general what I'm saying is, this is a fertile ground for natural experiments. We don't need to manufacture factual errors in Wikipedia. They occur naturally.
I mean, you're asking for a retrospective study, as opposed to a randomized controlled trial. It's useful and a great idea, but it's not like it's an equivalent way of getting equal quality data.
But is the goal to conduct a randomized controlled trial, or to measure the correction rate within the bounds of ethics? You go to war with the army you have.
Well the goal is to measure the correction rate within the bounds of ethics, but the question is how accurate the result would be without an RCT. Intuitively I would hope it's accurate, but how would you know without an experiment actually doing it? How do you know there aren't confounding factors greatly skewing the result?
If you'll grant that we're also able to replicate the study many times, we're left with errors that are not caught by Wikipedians or independent teams of experts. At that point I think we're looking at errors that have been written into history - the kind of error that originates in primary sources and can't be identified through fact checking. We could maybe estimate the size of that set by identifying widely-accepted misconceptions that were later overturned, but then we're back to my first suggestion and your objection to it.
But more importantly we probably won't catch that sort of error by introducing fabrications, either. Fabrications might replicate a class of error we're interested in, but if we just throw it onto Wikipedia, it's not going to be a longstanding misunderstanding which is immune to fact checking (at least without giving it a lot of time to develop into a citogenesis event, but that's exactly the kind of externality we're trying to avoid).
(Of course, "how many times do we need to replicate it?" remains unanswered. I think maybe after we have several replications and have data on false negatives by our teams of experts, we could come up with an estimate.)
> Playing devil’s advocate for a moment: how else do you test the robustness of the human process to prevent bad actors? Don’t you need someone to attempt to introduce a security hole to know that you are robust to this kind of attack?
How do you test that the White House perimeters are secure, or that the president is adequately protected by the Secret Service?
I think the key difference is supervision: is there another party keeping an eye on what is tested and how, and maybe ensuring no permanent damage is done at the end.
That's frankly one of the first thoughts that came to my mind.
I've asked the author about ethical review and processes on the Fediverse.
That said, both Wikipedia and the Linux kernel (mentioned in another response to this subthread) should anticipate and defend against either research-based or purely malicious attacks.
If it's a mature product, you should be able to pick it up and rattle it without it breaking. If it's still maturing, then maybe the odd shock here and there will prepare it for maturity?
It's true that the system must be tolerant to these sorts of faults, but that doesn't mean we have a right to stress it. The margin for error is not infinite, and by consuming some of it we increase the likelihood of errors going undetected for longer.
Sometimes it will be worth it anyway, and I don't have an opinion about this Wikipedia example, but I think it's pretty uncontroversial that the Linux example was out of line.
I think one would have to weigh the pros and cons of this kind of research. In particular, the main cons (IMO) are:
* users are misled about facts
* trust is lost in Wikipedia
* other users/organizations use this as a blueprint to insert false information
The third harm seems to be the most serious, but I suspect it has happened/will happen irrespective of this research.
Compared to the water reservoir example, these harms seem quite small. I would have liked to see a section discussing this in the blog post, but perhaps that's included in the original paper.
Everything was reverted within 48 hours. Your arguments might all apply theoretically, but given the scope, size, practice, and handling, I wonder, apart from the theory, what your opinion is on how they practically apply in this case.
I didn't make it very clear, but I agree that the specific example isn't problematic. The false claims weren't meant to be any sort of targeted disinformation, and like you mention, they reverted it within 48 hours.
PASSIVE VOICE: Thomas Jefferson’s support of the new Constitution was documented in a letter to James Madison.
ACTIVE VOICE: Thomas Jefferson documented his support of the new Constitution in a letter to James Madison.
IMO the passive voice makes much more sense for this example.
Passive doesn't just sound better in the example, it's objectively correct.
Jefferson wasn't attempting to document anything. That wasn't the action he was taking at all. We only consider it documented after the fact; it's completely incidental to what he was trying to accomplish.
Therefore, only the passive voice describes what occurred. The only people actively documenting anything are those who deliberately set out to create documentation of events (and I think they might all be assholes, as a rule... so maybe he was actively documenting?).
Imo, the difference is in the focus. The passive voice version makes the "support of the new constitution" the most important thing the sentence is about. The active voice makes "Thomas Jefferson" the most important thing the sentence is about.
So, it really depends on what the rest of the paragraph is supposed to be about.
Which is why the blanket aversion many style guides have to the passive voice makes no sense. Use it when it makes sense to use it, obviously. Often, the subject of the verb really is of little importance, particularly in formal accounts of events.
I think the issue is not with genderization but with the choice of using it instead of her name, her affiliation (University of Nottingham), or her country of residence (United Kingdom). The suggestion is that “She Turns Fluids Into ‘Black Holes’ and ‘Inflating Universes’” is more “newsworthy” than “Dr. Weinfurtner Turns Fluids Into ‘Black Holes’ and ‘Inflating Universes’”, which IMHO is at least a little bit patronizing.
The phrase "[s]he turns fluids into black holes" could have come directly from some new-age bible. The only thing that made me think "maybe they're talking about a person" was seeing it featured on HN, so that was purely by context. For me, your suggested change isn't patronizing -- rather, it's an essential addition that upgrades the headline from content-free gibberish to an interest-piquing description.
It might also be one of the reasons why the assumption is veering towards "man" in headlines such as "professor turns fluids into black holes." When media go out of their way to stress that it concerns a woman, the default assumption for ungendered terms will remain male.
I thought "She" seemed odd too. I suspected it might be a sort of acronym or code-word, and when I started reading the article, that's what I was looking for an explanation of.
Really nice idea! The point I don’t get is why we need to crop the image. If the barcode numbers are in a specific format, why can’t we just filter the OCR results?
Technically it could use some more advanced regex/filtering on the OCR output, you're right! I found that with some receipts you actually can omit cropping, with some exceptions. I'd definitely like to remove the need for cropping altogether.
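Something like this is roughly what I have in mind, purely as a sketch; it assumes the numbers follow an EAN-13-style 13-digit format with a check digit, which may not match the actual receipt layout:

```python
# Sketch: pull EAN-13-looking numbers out of raw OCR text and keep only
# those whose check digit validates. The 13-digit format is an assumption.
import re

def ean13_candidates(ocr_text):
    for match in re.findall(r"\b\d{13}\b", ocr_text):
        digits = [int(c) for c in match]
        # EAN-13 checksum: weights 1 and 3 alternate over the first 12 digits.
        checksum = (10 - sum(d * (3 if i % 2 else 1)
                             for i, d in enumerate(digits[:12])) % 10) % 10
        if checksum == digits[12]:
            yield match

print(list(ean13_candidates("total 12.30  4006381333931  thanks!")))
# -> ['4006381333931']
```

The check-digit validation is what makes this better than cropping alone, since it rejects most random digit runs (dates, totals, phone numbers) that OCR picks up elsewhere on the receipt.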
IMO the author underestimates the positive effects of non-combative debates between adults. For a long time before we could just google things, debating was one of the main ways we checked our ideas. Do you believe the sky is red? Try to debate someone and see if you can convince them. And while we can now just google most of the simpler things, that's not the case for complex ideas.
As the author notes, by simply listening you will have a better view of the other person's arguments. But at the same time you deprive them of the opportunity to validate their ideas. That is not a simple tradeoff.
Even worse, most people will take silence as a form of agreement, which for a debate about C++ vs Rust is no big loss, but that is not true for a lot of other issues. And strangely enough, the examples provided by the author are complex, non-trivial ideas no one should take at face value or agree to by silence.