Why we still can't stop plagiarism in undergraduate computer science (kevinchen.co)
51 points by kevinchen on March 22, 2018 | 118 comments



I was teaching the lab portion of CS 101 (don't remember the actual course number off-hand) when I discovered that two students had the same remarkable code that shouldn't have worked but did.

We were using C, and instead of using globals or parameters, each function declared the same local variables in the same order. The stack, then, remained sufficiently consistent that each function had access to the values it needed.

When I confronted the two of them about plagiarism (and explained what they had done wrong), their defense was that they were working on the problem together, and thus had made the same mistake out of ignorance.

And frankly, it made perfect sense. I could easily see myself doing something like that.

I guess my point is that, for at least some small portion of the problem space, plagiarism isn't really plagiarism.


In other words, we have made studying in groups a crime. Normally, to study a subject you would pose yourself problems and solve them, sometimes with friends. But we insist that homework be done individually, and we insist on assessing every last little thing. I think this is a problem in the American university.

The other problem is that we have reduced scientific ethics to the subject of plagiarism. But there's other things, like publication ethics, medical ethics &c, and they don't even appear on the kids' radar screens.


This is entirely true. On almost all of my homework I have been specifically instructed not to communicate on more than an idea level about any code that I write. They could just be missing a semicolon somewhere, but I am not supposed to even look at people's code. This leads to group sessions where basically everyone is silent and just asks for syntax help. It's dystopian.


I strongly agree with you. When I was an undergraduate long ago, I remember getting lectured at great length by a math professor who was angry that many of the students in his abstract algebra course had worked together on the homework. I was one of those students (who had worked together), and I was very puzzled and annoyed by the professor, since I knew that I had learned far more on that assignment than any other! As a result -- I'm a college professor (in computational math), and over the last fifteen years on EVERY homework assignment I've given, I've explicitly encouraged students to work together as long as they clearly acknowledge who they worked with (and how). I wish everybody else did the same.


Here's my input as an actual computer engineering student with good grades and who was hired after 2 years: I have no idea why professors wouldn't want students to collaborate on homework. Forcing students to solve problems on their own will only create basement coders who can't cooperate with other people.

Not a single company has ever asked me to invert a binary tree or implement any more complicated algorithm, nor asked about my grades; they have only cared about my personality and non-school projects.

Hell, we are encouraged to work together with other people at my university.


I am equally confused as well. It's been a while since I studied CS, but when stuff was to be handed in on paper we even had multiple names on it (which was allowed, but like only up to 4) and when it was switched to digital handin, why wouldn't you submit the same thing twice, with both names on it.


Here's the easiest solution: stop grading homework

Why judge student performance on something that they are using _to learn_? It doesn't make any sense.

Every student is basically competing with one another to get the highest GPA possible - if you're going to give them cookie cutter homework whose solutions can be easily searched for on the internet _and_ can only bring their GPA down, then they're going to cheat. Plain and simple.

Give them homework and "grade it" to give them feedback, sure, but don't make it count.. that is, if the goal is to have students learn.


Optional homework almost definitely won't get done. Making it count ensures that students would actually do it, which is useful to instructors (what do they need to focus on in class? Is the class understanding the lectures?).


You can still grade the homework (to give both lecturers and students feedback), but just have it not count towards the kid's letter grade in the course. If no one is doing it, then you could make the grade for homework pass / fail: did you hand something gradeable in or not?


Hm, then I feel like students might do the minimum passable work. Which, to be fair, is pretty close to optimal for the real world.

Maybe that's the trick - make the requirements for passing good enough that the minimum still requires learning and engagement, and then just grade on a straight pass/fail.


That's how a bunch of courses I attended and TAed did it. You needed to beat a specific rating averaged over your homework assignments to be allowed to take the exams. Occasionally caused really good students to fluke the last 1 or 2 assignments since they were sure to have qualified, but generally kept everyone doing them and allowed for failing occasionally (which is useful because students then are less motivated to try and hide their failure, which gives the TAs better feedback about progress and problems)


or have it count to a small part of the grade.

Basically, perhaps construct a class like this

1) HW 20% of the grade

2) Quiz (say 1 Q a week related to homework that was returned a week ago, give students a chance to figure out what they did wrong) - say 20% of grade

3) midterm and final, a 30/30 or 20/40 split.

one can adjust the percentages to achieve one's desired pedagogical goals.
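
To make the scheme above concrete, here is a minimal sketch of the weighted grade, assuming every component is scored on a 0-100 scale. The function name `course_grade`, the component names, and the sample scores are made up for illustration; only the percentages come from the comment.

```python
def course_grade(scores, weights):
    """Weighted average of component scores (each on a 0-100 scale)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(scores[name] * weight for name, weight in weights.items())


# Hypothetical student: weak homework, stronger exams.
scores = {"homework": 60, "quiz": 80, "midterm": 90, "final": 85}

# The two midterm/final splits mentioned above (30/30 and 20/40).
split_30_30 = {"homework": 0.20, "quiz": 0.20, "midterm": 0.30, "final": 0.30}
split_20_40 = {"homework": 0.20, "quiz": 0.20, "midterm": 0.20, "final": 0.40}

print(round(course_grade(scores, split_30_30), 1))
print(round(course_grade(scores, split_20_40), 1))
```

With homework at 20%, a student who scores 60 on it loses 8 points of the final grade relative to a perfect homework score, which is noticeable but recoverable on the exams.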

personally, I don't find copying homework to be a problem if the student figures out what they copied. It's no worse than students frantically copying down everything the teacher puts on the board or slides which seems to be a common way students study.


> 2) Quiz (say 1 Q a week related to homework that was returned a week ago, give students a chance to figure out what they did wrong) - say 20% of grade

I never had quizzes except from professors who issued them to keep attendance up. Quizzes by nature are short and shallow. At the college level they are not particularly useful for establishing understanding.


ok, that's a reasonable response. and as the other commenter said, there's not enough time as it is, so taking 10-20m out of a lecture isn't good.


Quizzes aren't really great at the college level. They waste too much time--and you already don't really have enough time to begin with.


> almost definitely won't get done.

While I believe "almost definitely" is a bit too strong, one of the most important lessons undergraduates need to learn asap is learning how to learn effectively. Homework is only a tool that can be very helpful when learning, but it's not the only tool available.

> which is useful to instructors

That's nice, but it's the instructor's job to serve the students.

> Is the class understanding the lectures?

In most classes, this should be easy to determine simply from interacting with the students in class, the questions they ask during office hours, etc.

--

The ideal instructor should offer regular homework problems/projects that are clear examples of what the students will be expected to know, and check the work of anybody that wants to take advantage of the instructor's knowledge. However, if a student feels their time would be better spent elsewhere[1], that's their business.

The exam will reveal if they made a good decision. Like any skill, many students will make mistakes when they first attempt this kind of project/time management. Fortunately, as undergraduates, the consequences of failure are usually having to retake the class. Later in life, failure may mean losing a job or other far more damaging consequences.

(For the record, when I was an undergraduate at UC Davis, many classes had optional homework. I spent a lot of time working on it for some classes, and others I skipped because it was trivial compared to what I was doing every day at my job.)

[1] e.g. other classes, a job, their own self-study, or maintaining friendships at a social event


> Optional homework almost definitely won't get done.

Then they fail?

I don't mean to sound glib, but I don't see the issue here.

Students that want to pass will pass, students that don't, won't.

I say this as a student that didn't pass.


Really depends on the goal. Is the goal to teach everyone in the class? If so, "just let them fail" is a poor strategy. (It's also one that will get the teacher bad performance reviews.)


They might be learning more from 'optional' homework though. For instance, my dynamics professor didn't issue any graded homework, although the homework in the class was my largest time sink last semester. The homework was scored so you knew what you did wrong, but the entire course was graded on tests (which were difficult enough that afaik nobody managed >70% before the curve). After the first test, it was pretty clear that the homework wasn't 'optional'. While this did help w/ teaching self-study, overall I think it was a poor way to run the course, since then the tests became overwhelmingly stressful for nearly everyone. I think that grading is necessary, and at the college level quizzes are typically pointless, but without homework there is little left to judge comprehension on.


Pass/fail grading for making an attempt.


> Here's the easiest solution: stop grading homework

Because it's not that simple. These are INTRO classes.

For many students, an intro CS class may be the first class they have encountered in their lives in which they finally have to work.

So, part of the job of a teacher is to teach the class material but also to teach good studying habits. Without grading homework, the first real feedback that a student is in trouble will come on a midterm when they fail--the feedback is WAY too late at that point.

That having been said--your professor isn't as stupid as you think he is. Plagiarism that fools the professor is as much work as just doing the assignment.

And professors have lots of ways of dealing with plagiarists far short of disciplinary proceedings. For example, partial credit on exams is quite subjective, and plagiarists tend to lose the "coin flip" if the professor is on the fence.

The problem I have is simply that there are quite a few professors who simply don't care. They make it far too easy to cheat--reusing a previous year's project or exam, for example, is a no-no.


I think that there is a solution that is being overlooked, which worked pretty well in my intro class. The professor explained that he would be using plagiarism software and wouldn't be accepting excuses. He also explained that he expected to see comments explaining nearly everything, and provided a style guide. While this is training a bad coding practice (who wants to read code that has that many unnecessary comments?), I don't see any way around it for an intro course, since it's really hard to avoid reusing assignments. There aren't that many ways to code 'hello world', but if you make it clear that code must be thoroughly commented then it will drastically reduce false positives. Doesn't stop people copying wholesale from stackexchange, but w/ that level of commenting required I don't think students are really learning less. And this isn't required for higher-level courses, since by that point the assignments are complex enough that accidental plagiarism is extremely unlikely.


Not sure if you missed it, or I'm misunderstanding you, but he wrote:

> Give them homework and "grade it" to give them feedback

which addresses your main point.


It doesn't address the point. Homework that doesn't count won't get done, so won't provide feedback.


Exactly. I have tried "Voluntary Homework" with even upper level classes--it doesn't work.

I generally put homework at about 5% of grade. You can avoid doing it and still get the grade you want, but it's a bit of an obstacle. And most decent students will do the homework anyhow.

Intro classes don't get that option. I put it at about 25% of the grade.


But then where does your grade come from? More testing doesn’t seem like a good solution.


Grade by completion. Maybe 50% of questions right rounds up to 100%?


Completion of what?


a set of problems, the homework


I'm confused. The fix for grading homework is to grade homework?

"Completion" is pretty ill-defined if there's no attempt to judge content.


Like I said, rounding up from 50% seems reasonable


Ok, but that’s essentially just saying to grade homework gently. Which is totally fine but doesn’t seem to address the concerns with plagiarism nor with the earlier comment that instructors simply shouldn’t assign homework.


lab work.


How about just stop grading on a curve? At least then, cheaters can't hurt the grades of non-cheaters.


Maybe I'm being small-minded, but I strongly dislike using the word 'plagiarism' to refer to cheating on your homework by copying someone else.

To me, plagiarism is taking credit for someone else's ideas at their expense; it's a "sin" against the person being copied.

Copying someone else with their connivance, or paying some essay-mill writer to do your work for you, should be in the same category as taking a calculator into a mental-arithmetic test, not the same category as « My name in Dnepropetrovsk is cursed, when he finds out I publish first ».


> plagiarism is taking credit for someone else's ideas at their expense;

that seems like an unworkable definition. If I submit John Milton's Paradise Lost as my own work on an application to grad school, it certainly wouldn't be an "expense" to the long dead author.

Plagiarism is a fraud where you misrepresent the work of others as your own. In the academic world, what else is there but your own work?


If I discovered a lost work of Milton and pretended it was my own, then that would be acting at the expense of Milton's reputation, in the sense I mean. So I don't think it matters if the "someone else" happens to be dead.

(If I tried it with Paradise Lost there'd be no expense to their reputation, but that just means it's not a realistic example of someone attempting plagiarism.)


> In the academic world, what else is there but your own work?

Work published under your name but performed by anonymous flunkies. It's... the norm in the academic world, actually.


Heh. I wanted to store my old university coursework somewhere, and GitHub seemed as good a place as any. Because I'm cheap and don't want to pay for a private repo, there it sits in all its public glory. Few years back I got an irate email from a professor that students were copying a program I wrote for a SPARC assembly course which has remained unchanged for, like, two decades. So, to some degree the whole plagiarism thing is due to professorial laziness.


Oh, now that is great! Instead of just re-writing the problem for the class, the PI takes the time to write you an angry email and just make themself look foolish. Gotta love lazy professors.


People are turning in your SPARC assembly code for their assignments? Are they actually still teaching SPARC assembly?


I don't know if it's still the case as of this year, but for our computer architecture course we learned SPARC as well as x86. Was pretty neat to explore the different ways of doing things.


I have a big problem with the general theme of this article, which is that plagiarism detection software is infallible and every student who disagrees with its findings is wrong and dishonest.

You claim

> We have virtually eliminated false positives at this point

but offer no explanation for how you verify this.

You later rant about the fact that students have the audacity to challenge these (very serious) charges and the university actually expects you to follow up when they do. The horror!

IMO it's your system of senseless programming exercises and automated grading that is broken. Instructors need to put in the time and effort and assign homework where students have to actually think and be creative, rather than reuse the same assignments for the 10th year running and be shocked when submissions turn out to be similar.


> I have a big problem with the general theme of this article, which is that plagiarism detection software is infallible and every student who disagrees with its findings is wrong and dishonest.

These claims were not made.

> You claim

> > We have virtually eliminated false positives at this point

> but offer no explanation for how you verify this.

Yes, he did. He described the process that arrived at the conclusion including "keeping only the cases that contain indisputable evidence — for example, hundreds of lines copied right down to the last whitespace error". It's clear from the context that this was verified by following up with the alleged plagiarizers. Indeed, he notes one false positive.

I'm not sure what your concern is. The article presents the process they've used for catching plagiarism and rather than point out any actual flaws, you attack the strawman of blindly using anti-plagiarism software and treating it as literally infallible.


It doesn't matter if they think their process is perfect. It is still just an accusation at that point, and students have the right to appeal it.

IMO it isn't acceptable for an instructor to say that they don't have time to provide an explanation when asked by the university.


You're still fighting strawmen. At no point did he say that the students don't have the right to appeal or that instructors shouldn't have to participate in that process. He said it's a big time sink and one of the reasons instructors don't fight plagiarism much.


> "Then, we apply another filter, keeping only the cases that contain indisputable evidence — for example, hundreds of lines copied right down to the last whitespace error."

Sounds like they don't want to deal with plagiarism, if you can avoid it by simply making your copying "disputable".

MOSS is clever-- instead of doing direct textual comparison, you compare streams of tokens. This means that even if a student reformats the whitespace or renames all the variables (a common obfuscation technique), the same stream of "TOKEN ASSIGN_OP TOKEN LPAREN TOKEN COMMA TOKEN RPAREN" will exist. TMOSS extends this to snapshots of code as a student develops it, which is apparently 2x more effective!
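
A rough sketch of the token-stream idea, assuming Python submissions and using only the standard library. This is not MOSS's actual algorithm, just an illustration of why renaming variables and reformatting doesn't help; the helper names `token_stream` and `similarity` and the sample snippets are made up.

```python
import io
import keyword
import tokenize
from difflib import SequenceMatcher

# Token types that only carry layout or commentary, not program structure.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def token_stream(source):
    """Reduce Python source to a list of token kinds, hiding names and layout."""
    stream = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME:
            # Keep keywords, collapse every identifier to the same placeholder.
            stream.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            stream.append("NUM")
        elif tok.type == tokenize.STRING:
            stream.append("STR")
        else:
            stream.append(tok.string)  # operators and punctuation
    return stream


def similarity(a, b):
    """Similarity ratio (0.0 to 1.0) between two sources' token streams."""
    return SequenceMatcher(None, token_stream(a), token_stream(b)).ratio()


original = (
    "def total(values):\n"
    "    result = 0\n"
    "    for v in values:\n"
    "        result = result + v\n"
    "    return result\n"
)
# Same program with renamed variables, different indentation, and a comment.
renamed = (
    "def sum_all(xs):\n"
    "  acc = 0   # running sum\n"
    "  for item in xs:\n"
    "      acc = acc + item\n"
    "  return acc\n"
)
different = "def total(values):\n    return sum(values)\n"

print(similarity(original, renamed))
print(similarity(original, different))
```

The renamed copy produces exactly the same token stream as the original, while the genuinely different solution does not. A real tool would also need to cope with reordered functions and inserted dead code, which this sketch ignores.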

This author also delicately avoids the cultural side to plagiarism-- many students come from backgrounds where "group work" is common, and passing classes is a communal effort, including homework. It's an unfortunately common mistake to think the grade is what matters, not the fundamental skill development.


Almost every aspect of our discipline encourages open source code sharing and code reuse. This is a discipline wide mind set.

In fact, "build it yourself from scratch" is an anti-pattern in my opinion.

I'm not condoning cheating, but why would one not expect this to be the default behavior?

As others have suggested, there are much easier "solutions" related to logging keystrokes and commits should you really want to catch and punish this behavior.


> "In fact, "build it yourself from scratch" is an anti-pattern in my opinion."

In the context of getting things done, that is sometimes true.

However, as an employer, I want to know that you:

1) have enough knowledge to build it from scratch if you have to, starting with analyzing the problem and ending with a coded, tested, debugged, and working solution, preferably at least somewhat optimized.

2) have enough knowledge to be able to read the code that you might want to use in [not building it from scratch] and assess its value, considering A) whether it will meet the actual need, B) whether it will do so at a lower cost than writing it in-house from scratch, C) whether it will meet or exceed performance parameters, D) not introduce more problems than an in-house solution.

3) make a well-informed and reasoned decision between #1 and #2, and not merely be a copy-paste monkey.

Doing copy-paste as a regular practice in school eliminates all three of these capabilities.

In short, school is different from work, and you need to adhere to different practices.

edit: format


Depends what you’re writing. I’d expect a student to write code for data structures and sort algorithms themselves.

Not sure how well plagiarism detection would work though...


Isn't that the point of these courses anyway? Unless you have very, very good reasons, why would you ever write your own bubble sort professionally outside of niche cases? But having to implement different sorts by hand makes it much clearer how they differ, as well as teaching basic algorithm knowledge.


Exactly. We make every third grader look up and write down important dates and people from the past, even though they're already written down right there in the book they're looking at - it's not about the dates and people, it's about how to look things up.


After implementing plagiarism detection campus-wide, design a process which is very time consuming on the student's part, and not as time consuming on staff.

Maybe something like: "Your homework was flagged as possible plagiarism. Report to this lab at this time, and code one additional problem which should be easy to anyone who understood the original homework."

Anyone who really did the homework will be in and out within five minutes. If you can't finish in an hour, you get a zero on that one homework assignment, not expelled.

That flips the incentives. Also, reducing the punishment cuts the drama of people arguing that the software is not 100% accurate.


Here's an idea: why not make the assignments personal enough that you cannot cheat on them?

"But wait, that would require huge amounts of time investment from the professors/TAs"

Yes, it's almost as if paying $65k a year for someone to teach you something should result in that person teaching you that thing instead of just checking in to see if you've learned it on your own.


Here's an idea: you try to do that teaching three or four sections and report back how that goes. I am not sure if you have ever taught post secondary classes before, but what you're describing is untenable for someone who wants to not work 12 hour days. It's a nice idea, and I know that I would have loved to do that when I was teaching, but the fact of the matter is it isn't feasible unless you have a squadron of TA's to help you carry it out, and that's not gonna happen.


Totally! I definitely didn't mean to imply this is the Prof/TA's fault. It is the fault of many institutions and systems above them. I meant that TA's shouldn't be responsible for grading 4 sections of 60 students each. The only natural outcome of that is poor education, one facet of which is easily cheatable assignments.

It just seems ridiculous that the thing we are focusing on is that kids are cheating and not that we've made this massive, unscalable, expensive, ineffective education system. We are acting like it's the kids who aren't holding up their end of the bargain when they try and cheat it.


Yeah, I absolutely agree. So much gets offloaded onto the teachers/TA's these days it's hard to get any educating done.


> (...) unless you have a squadron of TA's to help you carry it out, and that's not gonna happen.

Interestingly enough, here's how this worked when I was an undergrad TA:

* The college's central office paid TAs for obligatory undergrad courses (so basically CS101 etc. up until sophomore year)

* Advanced courses were funded by the individual chairs and institutes. And would you look at that, suddenly there were a whole lot fewer code submissions auto-checked for plagiarism, and a whole lot more "talking to the TA to explain your reasoning and problems you encountered". This also served the double purpose that these TAs got to know the students, their likes and dislikes, and led to some easy recruiting for PhD candidates and such.


I like that model!


It's a fair complaint from the student / customer's perspective, though, because university tuitions are constantly increasing for no apparent increase in value.

That universities pay the TA's beans isn't the customer's problem.


Then schools should hire squadrons of TAs. We pay them enough that they can afford to, but they simply choose not to.


the reason the pay is so low is that it’s part of a PhD, usually, so the TAs tolerate it. i don’t think you could hire someone to just be a TA full time at those wages.

you can’t admit more PhD students, necessarily, because to graduate they need to do research under an advisor, and access to grants/projects won’t necessarily grow along with required teaching staff.


To build on my snarkiness: it's almost as if the incentives behind research and education aren't nearly as aligned as we think.


Fine for higher-level courses, but how exactly are intro courses supposed to make 'hello world' and the like personal enough that you can't cheat?


The same way it's done in kindergartens around the world? They say "write me a story about your family" or "draw your family tree." So ok: "write me a program that is somehow relevant to your life."

Imagine you are teaching someone basic web development. The assignment is "make a webpage that displays some facts about your favorite tv show." Easy.

I can buy that it makes the grading more complicated, but coming up with assignments is not difficult.


One thing that I think people don't talk about enough on this topic is the wildly different plagiarism guidelines between different classes.

I did both CS and economics in college. And in my CS classes, even discussing the homework with classmates was often "against the rules".

But in my business and economics classes, me and my classmates would regularly work together on the homework by straight up assigning certain problems to certain people and then copying from each other.

This was not only allowed by the professor, it was explicitly ENCOURAGED!

They understood that if you talked to classmates, you will be able to understand things better, instead of struggling and failing to do stuff on your own.

And with such wildly differing guidelines for different classes, things were often confusing to students.

One potential solution to "cheating" is to explicitly allow it, such that everyone is on the same playing field.

What matters, at the end of the day, is that the students learn the material.


The article doesn't seem to even consider the possibility of assessing students in some other way than through standardized homeworks which are easily copied.

For example, individual projects where everyone in the class is working on something different; or at the other extreme, proctored exams.

(Of course, neither of these systems is entirely free from cheating, but the barrier is higher.)


For intro-level CS courses, neither one works all that well. You can do that in more advanced (but still undergraduate) courses, but for beginner-level you IMHO need the feedback loop of regular homework. It doesn't have to be strictly graded, but it basically has to be there and at least some enforcement for it actually being done (e.g. handing in homework is required, but it's not a problem if it doesn't actually work).

Coming up with hundreds of different small projects to e.g. get people to understand pointers isn't very realistic, and if you only test them in exams you're missing critical feedback both for the teachers and the students before it is too late.


Assess the homework, give them a grade on it but don't have it factor into their final grade at all. Just let the exams / big projects take care of that.

If you're regularly failing your homework, it probably means you won't be ready for the exam that actually counts. Which should be enough feedback


having been both a TA and a student, i think this idea is entirely too optimistic. "optional" problem sets can work well in graduate and advanced undergraduate courses, but for the bulk of students in the intro/intermediate courses, homework that doesn't get graded for accuracy may as well not exist. i certainly didn't do it, and when the CS department decided to grade homework on completion only, i saw student attendance of my office hours immediately drop to zero.

ultimately, you cannot just expect college students to make reasoned decisions based on the way you set up the course. you need to arrange the course in a way that incentivizes them to keep up at every step.


That's the conventional wisdom in America, certainly. And yet my undergraduate degree (maths at Cambridge) was 100% assessed by end-of-year exams and it worked fine.^

Did I stay up with the classes? Well, not always — some of them I caught up during breaks. (That's the advantage of end-of-year not end-of-semester exams, too.) But did I know the material by exam time? Yes.

But look, more generally, you say you have to "incentivize" students to keep up but I think we agree that it's in their long-term interest to keep up with or without graded homework, so we're talking about behaviourist incentives not rationalist incentives here. And with that admitted you have to consider that there's broad scope for other ways to incentivize than through GPA consequences. (I mean, hand out candy, or pizza, or Pokémon, or porn, or whatever this year's students are into; you can probably think of much better ideas than these.)

^Unless you were that one kid who had mono during finals week. But that's a different issue.


I think me editing overlapped your comment: yes, that's an option, and I guess then you do not have to care about plagiarism so much. There probably still should be some effort against it though.


as a grader, it is already hard enough to keep up with the work that gets generated by thirty students attempting the same weekly homework assignment. i don't know how i could possibly grade thirty different projects in a week, let alone in any fair or reasonable way.

i've always wondered why they don't do something like select a random number of students and ask them to explain their work in an interview with the professor or TA.

i figure, even if they cheated, if they can give a satisfactory explanation, they probably learned the damn thing anyway. plus, you don't have to select even half the students before word gets around that they better understand the work they are turning in.
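
for what it's worth, the random selection part is trivial to automate. a minimal sketch, assuming student ids are plain strings; the name `pick_for_interview`, the 25% default, and the ids are all made up:

```python
import math
import random


def pick_for_interview(student_ids, fraction=0.25, seed=None):
    """Pick a random subset of students to explain their submission in person."""
    if not student_ids:
        return []
    rng = random.Random(seed)  # pass a seed only if you need a reproducible draw
    count = min(len(student_ids), max(1, math.ceil(len(student_ids) * fraction)))
    return sorted(rng.sample(student_ids, count))


# Hypothetical class of thirty students.
roster = [f"student{n:02d}" for n in range(1, 31)]
print(pick_for_interview(roster, fraction=0.25, seed=2018))
```

drawing a fresh sample for every assignment means nobody can predict when they will be asked, which is the whole point.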


Add a grading section to each assignment with something like this:

---------

Program is expected to maintain this interface <link to interface here> over stdin/stdout (or sockets, etc).

Students will pass when their program passes the grading/test suite. Test suite can be checked ahead of time by uploading binary/zip/tarball to this location using their student id: <location here>. Detailed instructions <here>.

IIRC, this is what my university did 15 years ago. Not everyone had unique homework, but often the homework assignments were not all the same. I guess how unique they are depends on if you have some parameters in your homework generation or if you have a bunch of misc grad students to generate a few sets of homework that you can build up over time.

How might you grade style? IMO Don't. If you must, then save it for interactive review/lab sessions/office hours or style-specific spot checks (e.g. "At least once per semester your assignment will be additionally graded on program style according to <insert guidelines here>. This will be added as additional points to your semester total.")
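
A minimal sketch of the pass/fail grading described above, assuming the interface is "read lines on stdin, write lines on stdout". The helper names `run_case` and `grade`, the 5 second timeout, and the stand-in submission are illustrative assumptions, not what any particular university ran.

```python
import subprocess
import sys


def run_case(command, stdin_text, expected_stdout, timeout=5):
    """Run the submission on one test case and compare its stdout."""
    try:
        result = subprocess.run(command, input=stdin_text, capture_output=True,
                                text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # treat a hang as a failed case
    return (result.returncode == 0
            and result.stdout.strip() == expected_stdout.strip())


def grade(command, cases):
    """Return (passed_everything, number_of_cases_passed)."""
    passed = sum(run_case(command, given, expected) for given, expected in cases)
    return passed == len(cases), passed


# Stand-in "submission": a tiny program that doubles each number it reads.
submission = [sys.executable, "-c",
              "import sys\nfor line in sys.stdin: print(int(line) * 2)"]
cases = [("1\n2\n", "2\n4\n"), ("10\n", "20\n")]
print(grade(submission, cases))
```

Comparing stripped output keeps trailing-newline differences from failing an otherwise correct program. A stricter or looser comparison is a policy choice for the course.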


as a TA, i love when professors set up grading scripts for me, and as a student, i love knowing what grade i will get before turning in the assignment. unfortunately, you need the students to have the basic level of competency where they can actually write code to a testable interface or correctly use print statements to produce good enough output for a script to grade by parsing. in the course i currently TA, neither of these prerequisites are met. many students turn in programs that will not compile and i am expected to give generous partial credit if i can determine that they are "most of the way there".

i also have pedagogical concerns about providing the test script ahead of time. i suspect that it biases low effort students even more in the direction of "guess and check" rather than the ideal (imo) "reason and predict".

all that said, the approach you outlined is probably the best feasible approach for most cs courses.


The issue of incentivizing students to guess-and-check when providing the test scripts upfront is, IMO, fixed by making the students write the tests themselves. This paper explains it pretty well:

https://www.cs.tufts.edu/~nr/cs257/archive/stephen-edwards/a...

Essentially the students would write a test suite and the grading framework would grade based on

1) Code coverage when running the student's test cases against an instructor's reference solution

2) Correctness of output: running student's test cases on student's code and comparing with output from running those test cases on the reference solution

3) Number of test cases passed in student's test suite

Also from the paper:

"All three measures are taken on a 0%–100% scale, and the three components are simply multiplied together. As a result, the score in each dimension becomes a “cap” for the overall score—it is not possible for a student to do poorly in one dimension but do well overall. Also, the effect of the multiplication is that a student cannot accept so-so scores across the board. Instead, near-perfect performance in at least two dimensions should become the expected norm for students."

Students still get the benefit of knowing their grade when they submit, and as an added bonus students get more hands-on experience with test-driven development. Having the students write the tests themselves also increases the cost of mutating code until it just works.
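To make the "cap" effect in the quoted passage concrete, here is a small sketch of the multiplicative scoring. The function and parameter names are my own labels for the three measures listed above, with each score expressed as a fraction between 0 and 1. This is not the paper's actual grading code.

```python
def combined_score(coverage, correctness, tests_passed):
    """Multiply three scores in [0, 1]; the lowest one caps the result."""
    scores = {
        "coverage": coverage,          # 1) coverage of the reference solution
        "correctness": correctness,    # 2) student tests agreeing with the reference
        "tests_passed": tests_passed,  # 3) share of the student's own tests passed
    }
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return coverage * correctness * tests_passed


# "So-so across the board" versus "perfect in two dimensions, weak in one".
print(combined_score(0.9, 0.9, 0.9))
print(combined_score(1.0, 1.0, 0.5))
```

Three scores of 90% multiply out to about 73%, and a single 50% caps the total at 50% no matter how good the other two are, which is the behavior the quote describes.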


this sounds like a great solution. i particularly like that it encourages careful thought about test cases.


It sounds like students who can't program need to be offered/forced into a 099-level class on whatever language or languages are being used for pedagogical purposes prior to actual CS classes. We don't let students who can't handle Algebra and Geometry take Calculus classes and we shouldn't suffer students who can't generate compiling code in CS classes. I made most of my money in college because such a class didn't exist.


Standardized homeworks as a simple check on learning with an individual or group project at the end has been my favorite way of learning.


Sure! But the method of learning doesn't have to be the method of assessment. In fact, as pointed out upthread, it shouldn't be.

When students are doing homework you want them to form collaborative study groups or freely consult any other source, if it helps them learn. Fear of plagiarism is antithetical to that.


I think a related problem is that many university educators have a pretty abysmal understanding of education itself. For some of them, the notion that learning tools shouldn't necessarily be assessment tools is rather surprising.


My recollection of CS-50/51 from decades ago is that the final project with the most value was an individual one. It seems infeasible to make all projects individualized, if you have a student/TA ratio greater than 3.


What I find fascinating about this problem is students paying all that money in order to deliberately avoid learning anything.


As you know, such students are paying for the piece of paper which certifies that they learned something. Who's to say it doesn't work?


They are paying for the piece of paper not for the content.


I too paid for a piece of paper. But given how expensive that piece of paper was I damn well made sure I learned something too.


In an intro CS class, many of the students don't want to be there. They're taking the class because it's a requirement for the degree they actually want, which may have little or nothing to do with CS. Intro classes aren't just for people who want to specialize in that area.


Then perhaps the world needs to stop caring so much about the assessment part of a university.

That would be wonderful if more people were able to focus on the learning, and less on the grades for unrelated subjects that don't matter for your job.


I do agree with you there. If you didn't actually learn anything in school, that should become apparent when you're unable to perform your job function. If you can perform your job function regardless, then I suppose it doesn't matter all that much either way.


They're very frustrating to interview.


They should be flagged in the phone screen.


This essay is mostly about culture, but it does mention an anti-plagiarism program (which sounds pretty hard to do except in very trivial cases, but who knows?)

There's another tool: the repo. My son was accused of plagiarism in his last year of high school. It could have been a "he said/he said" case -- in fact it started that way -- until I pointed out that if he believed he was in the right he had a record that could be checked.

The CS teacher had to explain to the principal why the repo proved who had copied whom (and left me wondering why the teacher hadn't looked there first????) which wasn't easy because the plagiarist's parents were big donors to the school. So in the end, despite what it says in the school handbook, the only penalty was a 0 on the assignment.

But a good lesson for my kid on both programming and the sociopathologies of organizations.


Requiring students to submit their VCS history along with the finished project would at least up the cost to the students for copy and pasting.

They even hint at that sort of solution in the piece by mentioning cosmetic changes to the files at the last minute.


I wrote assignments for cash in college. When it came time for me to take a class, my prof noticed that my work was very similar to that of past students, and I tried using my VCS history as a defense. After questioning some former students, it became obvious that the reason my VCS history was reasonable and my style still so similar was that I was the person who had written other students' assignments in the past.


I'm also a TA for the class mentioned in this article. We teach Git and have a submission system where students submit patches based on skeleton code; students are required to make at least five commits. We still have a significant number of students who copy code, and while it does help with picking up on that kind of behavior those students also don't seem to care about the increased cost and will pad their commits anyway.


Was going to post something similar -- to protect myself as a student, I'd quickly adopt git to keep a history of my work. Not that this couldn't be forged...

Does git count as a block chain for proof of work? :)


Pretty much. When I taught CS, I told people to commit early and commit often.

This has so many advantages even beyond defending against plagiarism charges that it really wasn't hard to drive home.

The big advantages being defense against the inevitable computer crash and the inevitable directory deletion.


Wish I was taught or even knew what VCS was in University.


Engineering classes should really switch from being homework-based to being project-based. Even something as simple as small coding projects that can be done in a week.

Then, the final project would be such that you'd have to explain your code, either in person with a TA, or by writing documentation for it.

We really need to move on from this academic mindset of homework, grades, and plagiarism toward something that is actually reflective of the world outside of academia. The concept of plagiarism doesn't really exist in the software industry -- it's a matter of what you can get done.


TFA seems a bit at odds with itself. One reason academic boards don't care too much about CS plagiarism complaints is that CS generates so very many of them, compared to other fields. The reason isn't that CS students are degenerates (although they may be anyway), instead it's because it is so much easier to check for plagiarism in CS. So, sure closed-source is bad and definitely we can always use more TAs, but the problem is clearly not "we're only punishing 10% of our students while we should be punishing 40%!"

The problem isn't with CS at all, but rather with USA colleges in general. Indeed the only professor I've read who seems to even notice the problem is Harry Lewis. Most subjects should be taught very differently than they are taught. USA university education makes a great deal of unnecessary and counterproductive work for students and professors. The busywork threatens to drive out real academic work.

The reason for this is so that more such work might be created for administrators, who must multiply inexorably to absorb the ridiculous amounts of money that our ridiculous system of student debt generates. In fact it will be no surprise if some schools eventually do hire enough administrators to suspend 40% of every CS course every semester. One hopes that the professors who could restore some of the quality that universities used to possess, will realize by then that they can restore that and should restore that.


Instead of coming up with punitive solutions, I wonder what can be done to re-structure computer science education in ways that move the incentive structure away from one that encourages plagiarism? Bonus points if it improves the quality of the education, too.

For example: What if we move toward a more seminar-style approach of having students discuss and critique each others' code on larger projects?

This might not get rid of all copy/pasting, but it would create a huge incentive for students to at least understand how their code works, in order to avoid embarrassment in front of their classmates. And, should two kids copy/paste the same code, and that becomes apparent in the course of a peer review session, well, that's an event that everyone will remember. No need for the instructor to make themself the bad guy in the process, either.

It also has the side benefit of giving students experience with code review, with reading and understanding others' code, and maybe helps them start to develop a sense for how to write clean, readable code several years before they start getting bludgeoned by senior devs at their first full-time job.

As for smaller problem set type homework, why not give them group work? It doesn't necessarily need to be graded, aside from credit/no credit, if you're worried about giving A's to duffers. I had a few classes that did that back when I was in school, and I really liked it. I felt like I learned faster, both from working together with classmates and because the format allowed them to give us more challenging problem sets.


I was accused once in my undergrad and I thought it highlighted an interesting issue.

It was my senior year, and I attended a systems programming course that was being piloted and was very challenging. Work in the CS department was very group heavy, especially in courses heavy on theory. I benefited a ton from working with groups with other students outside of class. In your data structures/theory/math courses, this wasn't an issue, but in this class in particular, people's submissions started to look similar.

It was resolved rather quickly because we just had to be honest, but I thought it was interesting, specifically because heavy collaboration was what pulled me through classes that were so challenging. I barely remember the course content anymore, but the soft skills I acquired from hours of collaborating with my peers after class have followed me for life and made a noticeable impact on my career.


> Finally, as educators, we also hope that the accused student can learn difficult lessons about ethical behavior in the classroom rather than the workplace.

Suppose that technique X can actually deter students from cheating 100% of the time.

So we apply technique X to intro class Z that has 300 students.

Now we have an intro class of 300 non-cheating students who sit quietly and listen to an instructor for an hour a week.

Then those non-cheating students sign in for a more reasonably sized class section of 40 to sit quietly and listen to a graduate student for an hour.

Finally, these non-cheating students take tests and do assignments written in such a way that the amount of grading time does not put the graduate students over the weekly allotted work time for their TA-ship in their particular program.

Ballpark-- by what percentage would one say the quality of the learning environment has improved by employing technique X?


This seems like something accreditation orgs like ABET should be more worried about. If students are cheating their way to degrees, that hurts everyone with a CS degree. Professors can't really do much if their uni doesn't care.


College isn't about education, but about signaling your value as a contributor to capitalism.

Otherwise, we'd be promoting collaborative learning and letting those who don't contribute or cheat simply cheat themselves.


College has two distinct but related functions: education and certification. While related, these things are sometimes in tension.

Most things related to grading support the certification function, not the education function.


Certification isn't a necessity in the face of accepting uncertainty. Once we accept the lack of certainty around hiring people, we can start coming up with solutions to overcome certification.

The companies experimenting on such things are more likely to be able to adapt to a climate where certification is becoming meaningless.

Hopefully, more companies will realize certification designed for the industrial age has run its course and academia will no longer be incentivized to continue gatekeeping via certification and will get back to focusing on education.


> Certification isn't a necessity in the face of accepting uncertainty

Sure, but since real people that are actually hiring are quite keen to limit uncertainty, that's not particularly germane to the real context in which colleges or students operate. If it were, non-certifying MOOCs and certifying college courses would have more similar costs.

> Once we accept the lack of certainty around hiring people,

Most hiring parties accept a lot of uncertainty around hiring people, and that is reflected in pay which is discounted for that risk.

Both hiring parties and quality employees (who can thus command higher pay) want to reduce that uncertainty, not increase it.

> and academia will no longer be incentivized to continue gatekeeping via certification and will get back to focusing on education.

Higher education started out much like trade guilds for scholars; there is precious little “education without gatekeeping” for it to return to (and it well predates the industrial age.)

Outside of universities, education without certification never went away. There are plenty of places to get that.


I'm wondering if removing money from all the equations makes the risk moot. If I have to choose one person to hire because of limited funds/resources, that introduces a selection problem. Removing the design constraints that lead to needing to choose one person can allow us to push tasks to a decentralized group of volunteers. Without money in the game, the market can be a collaborative one, as opposed to competitive. Then we're all working for the betterment of mankind.

This may be Star Trekonomics 101...dunno. I do know it starts with an organization willing to give it a try and put themselves out there to be the most sustainable with as little money required as possible. Once the second company joins them, it's only a matter of time before they (or someone inspired by them) publicly start trouncing the rest of the market by reducing the flow of cash into the industry while focusing fully on meeting the needs of their customers. Customers will flock to them, collectively giving the org whatever is needed to include more people in a sustainable manner. Once that starts happening, others in the industry will likely follow and the industry will likely start heavily divesting of cash.

I patiently wait for it to happen in academia.


I co-teach an upper-level undergraduate class where the students create independent programming projects - the computer science students couldn't copy anyone else in the class's code even if they wanted to, as it wouldn't make any sense for their context.

Perhaps the solution is to be more creative with how computer science education is taught? If the students are copying homework problems they don't understand, they're not going to do well on the projects or exams that might be part of the rest of their grade.


part of the reason this may be salient now is that there are SO MANY undergrads taking CS courses.

classes that might have had 30 students in the past now have 300. you can’t hope to grade projects for all those students, even if you put them in groups.


Maybe the party that detects and decides on consequences for plagiarism should be a separate entity in the university. Like the internal affairs division in police departments that ordinary police officers hate so much in movies and tv-series. They would be an "external enemy", so the teaching staff would not have to suffer from friction with their students in these unpleasant matters, and also the consequences would be out of hands of the teachers.


I wonder how much of the 'plagiarism' is just people copying the same StackOverflow snippet.


Because the source for most generic programming assignments is already online?

Why not skew the grading more heavily towards in-class midterms and finals?

Or you could generate individualized hws for each student, but that may not be feasible in a 500 student intro to cs class.


the other side of the coin is the group assignment where three of the five do all the work and all five live or die on the benefit.

oddly, post degree, we're actively encouraged to re-use code.


It's a problem in MS programs too.


I was a TA for a graduate level class at one of the top universities in the US and I've had some interesting encounters with plagiarism.

I. The time I got caught for "plagiarizing". In an intro systems class, me, a CS major, and my roommate, who wanted to minor in CS, were working together and I was "showing him the ropes". He was an intelligent student and we never worked together on the homeworks aside from general verbal discussions on what the solution could be. He used a Windows laptop and for one of the assignments, his C code wasn't compiling because he was missing some libraries and he told me he couldn't figure it out and we were approaching a deadline and asked me to compile it for him and send him back the binary. I did so, but when sending back the binary, in a rush, I accidentally mistook my HW folder for his (we'd downloaded this as a part of the assignment, and the folder structure was identical) and sent him my binary by mistake. Both of our solutions worked. Obviously, we got "caught" in the most naive way. Our binaries had the same MD5 hash and the CMS flagged us. We were both confused at first, and then we realized what happened and explained it to the professor. The proof was simple - just compile my roommate's binary and run it. However, he annulled our assignment to 0. We still both got As (because you could drop one homework) and while some may claim this was a gentle slap on the wrist, it felt unjust. We clearly made a dumb mistake and we shouldn't be punished at all, especially when we knew how rampant actual plagiarism was.
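As an aside on how an identical-binary check like that might work: the sketch below groups submissions by MD5 digest and reports any digest shared by more than one student. It is only a guess at the general technique. The function name and the in-memory `submissions` mapping are made up for illustration, and I have no idea what the actual CMS did internally.

```python
import hashlib
from collections import defaultdict


def find_identical(submissions):
    """Given {student: file bytes}, return groups of students whose files
    hash to the same MD5 digest (hypothetical helper)."""
    by_digest = defaultdict(list)
    for student, data in submissions.items():
        by_digest[hashlib.md5(data).hexdigest()].append(student)
    # Only digests shared by two or more students are worth flagging.
    return [sorted(group) for group in by_digest.values() if len(group) > 1]


# Made-up byte strings standing in for uploaded binaries.
submissions = {
    "me": b"\x7fELF...binary-A",
    "roommate": b"\x7fELF...binary-A",  # the accidentally shared binary
    "someone_else": b"\x7fELF...binary-B",
}
print(find_identical(submissions))
```

A check like this only catches byte-for-byte copies. Two students compiling the same source on different machines would usually produce different digests, which is why recompiling the roommate's own code was such a simple proof.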

II. The time I caught students for "plagiarizing". As Kevin points out in his post, there aren't really any incentives to catch students for cheating. As a TA, I get no benefit, and moreover, there's a cost. No one wants to be known as THAT TA who busts kids for using "a little help". Keeping that in mind, I was usually very lenient when it came to cheating. I've noticed signs, but there was never enough proof to warrant the effort of calling someone out. However, at one point it went too far. Two students who were partners for the "projects" had submitted nearly identical solutions for a complex Graphics homework assignment. They got the answer right, but I looked into their working and they both said "(9/5) / (4/3) == (4/7) / (5*9) = 1/3". I don't remember the exact values, but it was two steps of nonsense numbers and then a correct answer. I ended up reporting the case, mostly because I felt like my intelligence as a TA had been insulted. Are you seriously going to submit random numbers with a correct solution hoping I won't see? In any case, it didn't go anywhere.

III. Discovering a cheating ring. At our university, one of my good friends and project partners told me there was an "enormous Asian cheating racket" - not to call out any specific race, I'm Asian too. I wasn't surprised - to be blatant, it made sense. We're very grade oriented with tiger parents. Then I learnt the extent of it. There were apparently Chinese forums and "outsourcers" you could send your homework problems to and they would solve it and give it back. In addition, there were special shared systems like DC++ where you could discover answers to homeworks for different classes at my university as well as Prelims, Midterms and Finals contributed by students of previous years. I was in shock. Students would leave exam halls to go to the bathroom just to look at these answers mid-exam. But was I gonna tattle? No.

IV. The reality at universities. Not just in CS, but in every other subject, almost everybody cheats. Excuses that go around are: "I've worked on it with someone else" "Oh the TA in office hours told everybody the exact same solution" "What? Cheating? me?" "Maybe he/she took it from me, I didn't do it"

And look, people aren't stupid. We all know how cheating works. You get a homework assignment, and you re-write the sentences in your own language. You get some code from someone else and you define some useless functions with 1-2 lines of code. Or you arbitrarily re-organize lines of code. You rename all the variables. You re-organize your functions. You create some unnecessary classes.

There were students who distribute 10 homework assignments between 10 people (in groups of 2), and have one do the assignment (use office hours, friends, google, whatever) and the other literally re-write the assignment in LaTeX 9 different ways for the others to use. No one would ever really have to do the work.

The well known key to cheating is plausible deniability - if there's enough evidence you didn't do it, you didn't do it.


And it's an even bigger problem with MEng/MS students. These are usually unfunded cash cow programs even at top universities. They accept fairly mediocre students from China and India and the class is usually 80% Chinese/Indian. A generalization, of course, but they have 0 intellectual curiosity. They are here to pay $50k-60k for 1 or 2 years, make sure they have as close to a 4.0 and then go get a tech job where they will make $150k/yr, and little to none of their skills from class would be needed.

And I can speak for Indians, but CS education in India aside from the IIT, the IIIT, BITS and some NITs is dismal. Cheating is rampant there, and they're much more well versed with the art because it's much harder to cheat and get away with it in India - you can't bring phones to your exam or freely go to the bathroom mid exam, for example.



