Automatically grading multiple choice exams from photos using Python and OpenCV (pyimagesearch.com)
240 points by zionsrogue on Oct 3, 2016 | 44 comments



There's an even cooler thing you can do with automatic grading (I saw the idea in another HN comment once): the computer can figure out the answers without having them programmed in, or it can catch mistakes in the real answer key.

The idea is that answers are highly correlated. Better students are more likely to get all answers correct; worse students are more likely to make mistakes. So if you do something like PCA on all the students' answers, the first dimension will represent the quality of the student, and the weights will represent which answers are correct.
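A minimal sketch of how that could look in Python with numpy and scikit-learn, on simulated data (the whole setup here is illustrative, not taken from any real grading system):

    # Infer an answer key from student responses alone via PCA.
    # Simulated data; assumes 4 answer choices per question.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    n_students, n_questions, n_choices = 200, 20, 4

    key = rng.integers(0, n_choices, size=n_questions)   # hidden "true" key
    ability = rng.uniform(0.3, 0.95, size=n_students)    # per-student skill

    # Each student answers correctly with probability equal to their
    # ability, otherwise guesses uniformly at random.
    answers = np.where(
        rng.random((n_students, n_questions)) < ability[:, None],
        key,
        rng.integers(0, n_choices, size=(n_students, n_questions)),
    )

    # One-hot encode: one 0/1 column per (question, choice) pair.
    onehot = (answers[..., None] == np.arange(n_choices)).reshape(n_students, -1)

    # First principal component: scores approximate student quality, and
    # the loadings show which choice high scorers tend to pick per question.
    pca = PCA(n_components=1)
    scores = pca.fit_transform(onehot.astype(float)).ravel()
    loadings = pca.components_[0].reshape(n_questions, n_choices)

    # A principal component's sign is arbitrary; orient it to agree with
    # each question's majority answer (no answer key needed for this).
    mode_key = np.array([np.bincount(answers[:, q]).argmax()
                         for q in range(n_questions)])
    if np.corrcoef(scores, (answers == mode_key).sum(axis=1))[0, 1] < 0:
        loadings = -loadings

    inferred_key = loadings.argmax(axis=1)
    print("questions recovered:", int((inferred_key == key).sum()), "/", n_questions)

With a couple hundred simulated students this should recover most or all of the key; the interesting failure mode is the one mentioned downthread, where strong students split between the right answer and a trap.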


> Better students are more likely to get all answers correct; worse students are more likely to make mistakes.

Yes! Until you brainfuck them by putting the correct answer in column B for the first 10 questions. Once the doubt is there, it's not gonna go anywhere.


If automatic grading works well, could you also automatically assess the effectiveness of the questions/answers themselves? For instance, a teacher creates a series of 10 different quizzes over the years that assess the same knowledge/skill. Rather than simply re-using a static document, could you automatically generate a set of quizzes that assess students' knowledge/skills at "basic", "intermediate", and "advanced" levels (distributed automatically based on [potentially] preparatory work scores)?

I would love to see an automated "quiz creation" engine (perhaps with some minimal instructor curation) that would both quiz/score students and allow teachers to integrate more assessment, more often, at lower stakes.

By significantly reducing the labor involved in assessment, teachers can more effectively create learning opportunities for students (teaching - with all it contains - is time constrained). Certainly, self-leveling assessments already exist and are in common use with cognitive tutors/online education; however, the knowledge/skills they assess must either be very general or specific to the content of the online course. If teachers were tagging the content of their courses - which likely teach knowledge/content that is being taught in hundreds/thousands of other schools across the country - then such a self-generating assessment engine would be very useful if it could self-modify to accurately assess whatever is being taught in the course.

Modifying learning materials and tasks for different student levels is a major pain point for teachers, but is very much a stated goal in almost all k12 schools. It is done poorly, if ever - and mostly when legally required by IEP/504 plans. The trend is very much in the direction of increased data/personalization in the classroom, but currently the big investments in data are being made on the administrative side of the k12 industry. There is so much potential for improving the classroom backend to support instructor workflow rather than constraining it with pre-conceived notions of how we work and what our goals are. I hope to see it happen.


> If automatic grading works well, could you also automatically assess the effectiveness of the questions/answers themselves?

That is an absolutely fascinating idea. I would love to see someone run with this as a company, or as an internal initiative at the Department of Education. Has anyone done this before? Is there any reason why this isn't a slam dunk?


In some capacity, I imagine this is a component in the creation of standardized tests. Likewise in the development of cognitive tutors (though I could probably follow up with an academic acquaintance who develops them at UW-Madison). The trick (and value) would be de-coupling it from a specific curriculum.

I would also be intensely interested in exploring this as a domain-side professional.


A company I cofounded back in 2013-2014 was trying to do this. Although our plan of attack showed promise, the company crashed and burned for "legal" reasons.

For this to work, you need to do three things: (1) get the original exam, (2) tag every question with a series of categories, and (3) get the students' completed exams. You'd think professors/teachers would be excited by the idea of reducing their workload with an automated quiz/exam generator based upon their curriculum. They weren't.

The professors can't be relied upon to tag their exam questions with categories, so the company had to handle #2. To solve problem #1, we reached out to hundreds of professors at multiple universities, explained what we were doing, and asked if they'd be willing to share their corpus of past (and hopefully present) exams. Nope! They didn't want to do it, and when pressed for an explanation, the lines of communication went cold. Not to mention, if we couldn't convince them to share a blank exam, there was no way they would be willing to share their students' completed exams.

So, we thought of a different route. My cofounder had taken a year off from Vanderbilt University, where he was in a fraternity. Not only did his frat have a massive test bank that we could use, he knew plenty of other frats that had even more. It was a start. But convincing frats at other schools to upload their exams was a no-go. They slammed their big Greek doors in our faces.

There was nowhere else to turn except the student body itself. But no student wants to give away their exams without an incentive, and we didn't have the money to pay them. So, we set up a marketplace where students could upload their exams and sell them for a split of the sale. Finally, we got some traction. Some students uploaded their exams. Other students bought them. In our first week of release we made almost $1k. We were on track to finally make good on our promise; it looked like we'd be able to hire people to annotate the exams and begin work on our exam generation service.

Until that fateful night, when we sat there tailing the logs and saw it: a Vandy professor had signed up and bought one of her own exams, just to verify it was authentic. She immediately reported my cofounder to the university's disciplinary department, and they threatened to expel him with a permanent mark if he didn't immediately take down the website. After 3 years of classes, he'd leave with no credits, no diploma, and no other place to go. Like most young kids, he decided to fold under such massive pressure.

After that, we shut the website down and went our separate ways, never able to achieve our ultimate vision.

Maybe there's a different way to go about it, but without a strategy to compile an extensive corpus of completed exams with tagged questions, I don't know what it would be. If you have any ideas, I'd love to hear them, because not a day goes by when I don't want to take another stab at this.


Higher ed is likely to be more protective of intellectual property when it comes to assessments. Its assessments are less frequent, higher stakes, and principally used for summative rather than formative purposes.

Even in k12, there might be a chicken/egg situation. The obligation to tag content is a barrier (just as tagging is always a barrier).

That said, k12 has so many underlying content standards - which are only becoming more entrenched, for instance with standards-based grading - that you might want to integrate tagging as a sub-feature of a curriculum creation/curriculum performance toolset, so teachers don't have to focus on it specifically (only as fulfilling the pre-existing expectation that they are hitting their standards).

But then you'd need a toolset teachers actually want to use.


You're absolutely correct with respect to the differences between assessments in higher ed and those in K-12; K-12 is outside my wheelhouse. Your point of entry is interesting, though. I'm wondering what the lead time would be before you hit a large enough mass of tagged questions with student performance data.

Do you have any links to information on the current market for curriculum creation/curriculum performance toolsets? (Something with a list of companies, their products, total market penetration, and respective market share.)


No formal list, but seeing market maps like this is typical: www.cbinsights.com/blog/ed-tech-market-map-company-list

There is a diversity of approaches to improving schools and education. They often feel as much like features as they do companies.

I have to feel that creating the content/performance/assessment backend opens up a lot of progressive improvement. You'd actually be able to see inside the classroom in a broad and detailed manner. You'd actually be able to personalize instruction without diminishing content to paint-by-numbers. And, maybe - eventually - you'd be able to de-school education and let people learn as they can, where they can.


I've thought about your "de-schooling" point a lot. I think building software to support teachers is orthogonal to de-schooling the system. Because there's too much cruft, I don't think you'll ever get contemporary schools to fully adopt de-schooling. It has to be new.

I'm actually working on a very small (but lucrative) experiment in building such a de-schooler, in a very specific vertical, as proof that it can be done. Then, after proving the process, I plan to take the lessons to their logical extreme. If you'd like to talk more, please email me: denzel dot morris 1 at that google mail service.


I absolutely agree that schools will never accept de-schooling, but I don't agree that it's orthogonal. I see de-coupling education from the school as parallel to the educational mission of schools, even if politically unpalatable. It won't mean that schools will go away, but it might mean that someone, somewhere, need not go to a school to educate themselves beyond a young age.

One goal of creating such a system is to collect and make re-usable (and re-mixable) all the curriculum materials, performance-related interactions, and assessments that are being created and used every day in our schools. Perhaps US students won't take advantage of the fact that a few decades of this material might mean personalized learning opportunities can be created for them without the intervention of a human being - but it could develop as a parallel capability underneath the current school system to support learners anywhere in the world who have non- or minimally-functional schools (or, for that matter, anyone who would prefer not to spend their days in a school). It might not perfectly rival a high quality flesh-and-blood instructor, but it sure might be better than the status quo for that person elsewhere.

It's the difference between Sugata Mitra's Hole in the Wall or Negroponte's OLPC and real education. Computers are wonderful - but most young learners need structure. Likewise, when Facebook/Google/? blanket the world in free internet we will see a flood of brand new educational products, but why create an entirely new army of curriculum developers when we already have an army of 3.1 million trained school teachers creating educational content every day? All of that labor is siloed within individual classrooms, preventing us both from self-improving and from providing educational opportunity broadly.

Also, the outcome would not be to close your neighborhood school. After all, community schools exist for a variety of reasons that are distinct from education. Schools won't go away just because education can be done without them (that is already true today).


Classic multiple-choice grading machines will print the number of incorrect answers next to each question (where the checkmark normally goes) on a final summary sheet passed through. A lot simpler than PCA, but still pretty effective at finding mistakes or gaps in course material.
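That count is also trivial to reproduce in software; a toy sketch (the data here is made up purely for illustration):

    import numpy as np

    # Rows are students, columns are questions; values are the chosen
    # option index (0 = A, 1 = B, ...).
    answers = np.array([[0, 1, 2],
                        [0, 1, 1],
                        [0, 3, 2],
                        [1, 1, 2]])
    key = np.array([0, 1, 2])  # correct option per question

    wrong_per_question = (answers != key).sum(axis=0)
    for q, n in enumerate(wrong_per_question, start=1):
        print(f"Q{q}: {n} wrong")  # high counts flag gaps or bad questions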



Interesting, but how would PCA work with discretely-valued, unordered, metric-less data like multiple choice answers? (Actual question, not snark; I only know the basics of PCA.)


I would imagine you would assign each multiple choice answer a value of 0 or 1, based on whether they selected that answer or not. So if there was just one question, and the student answered 'A', it would produce a vector like [1, 0, 0, 0].
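Concretely, with numpy (the choice labels are just for illustration):

    import numpy as np

    choices = np.array(["A", "B", "C", "D"])
    answer = "A"

    one_hot = (choices == answer).astype(int)
    print(one_hot)  # [1 0 0 0]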

In fact I believe there are similar but better algorithms than PCA for handling mutually exclusive binary variables. I just don't know what they are called.



That's pretty cool, like automated collaborative grading on a curve.


Machine learning using human expert trainers ;) I wonder if Khan Academy has tried this yet.


Why would the first dimension necessarily represent the quality of the student? I don't see why that should be the case.

Note: I'm not challenging you; I don't see why it shouldn't be the case either. I just don't follow your reasoning (but I'm quite interested in the claim).

Is it because the largest determinant in which answer is correct is the quality of the student?


You'll have trouble with the hard questions, though, the ones toward the end where half the high-quality students get it right and the other half fall for the trap answer.


Doesn't it become a prejudice towards the minority answer?


The author has been pretty good at documenting a number of simple projects that can start you off in OpenCV.

Disclaimer: I bought his eBook bundle.


What's especially great - and what differs from most bloggers shamelessly shilling their wares - is that he provides a ton of excellent content on his blog.

His book takes it to another level, but isn't necessary to get your feet wet.


How are the eBooks? Do you like them?


I am very happy with them. It wasn't full of fluff but was very targeted. I have the prior edition, so at some point I'll have to go back and buy the basic package again to upgrade.


So back in the Bronze Age my teacher would collect all the scantron cards -- which, by the way, can already be automatically graded; that is the entire point of the scantron -- and put them on the overhead projector with the correct answers masked out. The overhead projector was more than bright enough to shine through the paper card. The teacher could easily grade the entire class in a minute or two, no cameras or computers needed.


I had a biology teacher who would put them into a plywood form and drive a nail through a giant stack of papers.


I don't clearly understand how that is effective, but it sounds like fun, and a good way to impress kids.


She would drive nails through the forms and then count where the marked answer matched the hole. I suspect she got her husband to help too.

It was easy for the students to double check that their tests were marked correctly as well.


I don't understand your story. How would the teacher use the projector to speed up scoring?


The light would shine through each student's answer card and key, like when you press two sheets of sketch paper together. So any mismatches would be easy to see and mark as you scan down the card. It would only take a few seconds to score each card and there would be no cognitive load at all, as in you wouldn't have to evaluate any answers.


This is pretty cool. I am getting ready to embark on camera projects with a Raspberry Pi 3 and saw this project. Since I will be using Python and OpenCV anyway, this is a motivator.


Combining the Raspberry Pi + Python + OpenCV is a lot of fun. If you feel like sharing the details on your project I might be able to point you in the right direction.


Send me an email and let me know what you had in mind. epalmer _at_ richmond _dot_ edu


You wrote the ebook right? I think I'll get it.


Correct, I am the author of Practical Python and OpenCV.


Do you take off points if they don't use a #2 pencil?


Here's a similar service: http://bubblevision.org/


An open source alternative: http://www.formscanner.org/


If only I had Adobe Flash, I could learn all about this!


VLC -> open network -> http://embed.wistia.com/deliveries/5bed39f586d77c4436f569d16... if you actually want to


Processing well-known forms (AP, PSAT, SAT, etc.) would be sweet. The one for AP is 4 pages in color, but one might have a greyscale printer or decide to print only one page of it.


I just need 20 years to modify this to grade my short and long answer tests and I'll be sitting pretty.


If Gradescope[0] starts allowing public signups it might be sooner than you think....

[0] https://blogs.nvidia.com/blog/2016/09/02/gradescope-brings-a...



