Show HN: Real world (Jupyter notebook embed) way to assess data scientists (hackerrank.com)
134 points by rvivek on Aug 29, 2019 | 48 comments



What you're asking for looks a lot like a kaggle challenge. You're asking if someone can use xgboost (or linear regression). It's the last ten per cent of a data science problem, if that.

The hard parts of data science include the following:

  - choosing the right input data (rather than relying on regularisation)
  - figuring out what the consequences are if you're wrong in a specific way, and avoiding the bad cases
  - wrangling your data into a nice CSV format
  - handling missing data
  - spotting biases in your data collection methodology
I'd expect a graduate to know about regression. For anyone else, this wouldn't help me assess their skills.


Correct, soVeryTired.

This Show HN was to showcase the platform. Technically, any CSVs/datasets/notebooks can be loaded in, and candidates for interviews can be asked to do any of the things you listed. The challenge you took was to showcase one specific example (regression).


@soVeryTired, the challenge showcased in the test does involve a lot of wrangling, handling missing data points, spotting bias and identifying the right features for the regression model. The challenge is designed to allow for a candidate's creativity.

Do you think the data set we have used doesn't do this to the extent you'd expect from a data scientist?


Any assessment that directly provides data sets - even with "gotcha"s like missing values - is testing, based on the conventional wisdom, at most 20% of a real-world data science workflow. And IMO it's the least critical 20%.

The only good end-to-end "technical" data science assessment I can think of is to pose a broad question or business problem that's addressable by applying data science techniques to publicly available data. But a nontrivial version of that assessment would take half a day on the very low end, and long assessments anti-select against good candidates.

IMO, when it comes to evaluating data scientists, the only thing that online coding assessments are good for is to ensure that they can perform basic coding and data manipulation tasks. (I'd include tasks like web scraping, image manipulation, API calls, and ORM stuff in this category). Everything else needs to be evaluated in person.


Do you think candidates looking for Data Science jobs would be open to performing a half-day exercise?

We optimized these challenges to allow candidates to show as much of their skills as they can in a timed window, without killing their creativity. I'd be curious to know what you think is a good way to interview data scientists.


I think some candidates would be open to performing a half-day exercise. But the best candidates wouldn't, which is what drives the anti-selection I mentioned in my previous comment. More broadly, I don't think it's realistic to create an assessment that's representative of real-world data science workflows without being onerous enough to exclude good candidates.

If representative isn't an option, highly correlated is the next best thing. In practice, for my team specifically, this means screening for math aptitude and general business acumen during a phone screen, data manipulation (moderately complex SQL + tidyverse/data.table/pandas) during a "take-home", and delving more into problem solving approach, model selection and validation, etc. during an onsite. Broad business questions (e.g., "How does a life insurance company make money?") and communication skills generally weed out the candidates who picked up the bare minimum math and programming background through Kaggle + MOOCs.
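
For a sense of scale, here is a minimal sketch of the kind of "moderately complex" pandas manipulation such a take-home might ask for; the tables, columns, and numbers are made up purely for illustration.

```python
# Hypothetical example of a "moderately complex" take-home manipulation:
# join two tables, aggregate, and derive a simple business metric.
import pandas as pd

policies = pd.DataFrame({
    "policy_id": [1, 2, 3, 4],
    "state": ["NY", "NY", "CA", "CA"],
    "premium": [1200.0, 900.0, 1500.0, 700.0],
})
claims = pd.DataFrame({
    "policy_id": [1, 1, 3],
    "paid": [300.0, 150.0, 2000.0],
})

# Total claims per policy; a left join keeps policies with no claims.
claims_by_policy = claims.groupby("policy_id", as_index=False)["paid"].sum()
merged = policies.merge(claims_by_policy, on="policy_id", how="left")
merged["paid"] = merged["paid"].fillna(0.0)

# Loss ratio (claims paid / premium) by state.
by_state = merged.groupby("state")[["paid", "premium"]].sum()
by_state["loss_ratio"] = by_state["paid"] / by_state["premium"]
print(by_state)
```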

As an aside, I absolutely think that the sort of assessment in the OP kills creativity. I care a lot about whether a candidate would think to include covariates like Internet usage and segmented urban population when predicting mortality rates; I don't care at all whether they're able to write the trivial amount of code that's needed to include those covariates in a model, given a data set that already contains them.


Typically, take-home tests in any SWE field don't account for everything, as they're one layer of screening: that's what an on-site is for (even better: adjust the on-site to address the results of the take-home).


I completely agree, and that's exactly my point: the median data science screening process tries to be all-encompassing to an extent that would seem ridiculous (for a "take-home" assessment) in any other technical field.


Working data scientist here. As many have said, this is, effectively, a Kaggle challenge. Honestly, at this point I don't care at all how well a candidate can predict anything - there is very little correlation between that and how good of a data scientist they are.

Tools to hire data scientists are going to continually fail until they realize that the interesting, hard part of being a data scientist is closer to the work of a business lead (which can't really be tested in 60 minutes).

Concrete feedback:

- You ask for writing and descriptions of why a model was chosen and why features matter - are you grading this automatically? That would be a feat.

- The task is waaay too easy (even if you do believe there is a market for identifying people who can predict well).

- Python is overly limited. Why not SQL or R?


Disclosure: Got a preview of this product, my opinions only.

> You ask for writing and descriptions on why a model was chosen, why features matters - are you grading this automatically? That would be a feat.

Grading is apparently not automatic, which is good as I am not a fan of the Kaggle approach in this demo.

> The task is waaay too easy (even if you do believe there is a market for identifying people who can predict well)

You'd be very surprised about how candidates can respond to these types of questions!

> Python is overly limited. Why not SQL or R?

The full product allows Python, R, and Julia, with popular packages preinstalled for Python/R.


Isn't the most important part weeding out the most unqualified applicants? For that purpose such a test might be fine.


Thank you for the feedback!

- We are not automatically grading it. We have learned in the past that trying to automatically grade candidates on such challenges biases their approach, which defeats the point of a good data science challenge.

- That's good to know. We are not really focusing on the final outcome but on how creatively a candidate can go about the problem. The dataset allows for a good amount of creativity.

- Ah, can you elaborate on what you mean by overly limited? We do support R.


What is a business lead?


I am in the camp that thinks this notebook judges data scientists in a way that will soon be obsolete.

If I'm given this clean dataset with all of the features properly set in columns and data types labeled, I could spin up Azure or Google Cloud's ML capabilities and have them run gridsearch and optimize my model.
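
To make that concrete, here is a rough sketch of how mechanical that step is once the clean feature matrix exists; the synthetic data and parameter grid below are placeholders, and a cloud AutoML service automates the same loop.

```python
# Illustrative only: once a clean feature matrix exists, hyperparameter search
# is largely mechanical (scikit-learn shown here; cloud AutoML services
# automate the same loop). The data and grid below are placeholders.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=500, n_features=16, noise=0.1, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```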

Testing data scientists seems to be falling more into two buckets: people who can pull analytics and query databases to create the datasets and features, OR people who can build the infrastructure to serve models, engineer pipelines, etc...

FYI though we're working on this now at https://www.interviewquery.com to try to start creating suitable tests to assess data scientists without having them do 10+ take homes every month.


> If I'm given this clean dataset with all of the features properly set in columns and data types labeled, I could spin up Azure or Google Cloud's ML capabilities and have them run gridsearch and optimize my model.

That's the fault of the test design for allowing such techniques without scrutiny, not of the notebook format.


Gathering data and doing analysis seems like a much better skill set to look for than whether this person can press run on a model. I hope you guys can figure it out; interviewing in the software industry is broken.


I'm a data scientist (for an MLB team that will win the WS this year!) and I love this. Of course this isn't a whole end to end evaluation platform. But we will get 300-500 applications for a position sometimes, and often folks have no business applying and this would be a great way to filter out some of the noise. Great job!


That's great to hear Ryan! You can sign up for the free trial for a full experience here - https://www.hackerrank.com/products/free-trial.


If you would like to see more Data Science questions (not available in the free trial), I'd be happy to give you a demo. Let me know how to reach you.


This was pretty cool. For fun, I tried to get the best possible score I could, using XGBoost, without any feature engineering and achieved a MAE of 0.042422154541399665.
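
For anyone curious, the general shape of that kind of baseline is roughly the sketch below; the file name, target column, and hyperparameters are guesses, not the actual challenge setup.

```python
# Rough shape of a "no feature engineering" XGBoost baseline; the file name,
# target column, and hyperparameters here are guesses, not the actual setup.
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

df = pd.read_csv("train.csv")                    # hypothetical challenge file
X = pd.get_dummies(df.drop(columns=["target"]))  # naive encoding, no feature work
y = df["target"]

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=4)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_val, model.predict(X_val)))
```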


No one does good science in 60 minutes.


True. But as long as all candidates have the same time limit / same expectation of work depth, and the test providers have a reasonable expectation of how much can be accomplished in that timeframe, then it's fair.

That said, this demo should have a several hour time limit.


A standardized, precise 40-yard dash might be fair. But it is also pretty useless if you are evaluating runners for a 1 mile race, or a marathon.


I like this; let's consider how you would evaluate someone for a marathon.

Average marathon time is about 4.5 hours. Let's say you expect the person you're hiring to last around 9 years (the average is 8 according to Glassdoor, but you're woebegone).

So scaling linearly, for every hour you get to assess a job candidate, you'd get 0.2 seconds to evaluate a potential marathoner.

Assuming a typical candidate gives you maybe 12 hours, 24 tops, you have all of about 2.5 to 5 seconds to evaluate a marathoner.

The futility of such an exercise is obvious.

The only solutions are to:

* insist on a longer evaluation period

* hire them on a probationary basis

* only hire people who have run official marathons before, with proof and such

I leave it as an exercise for the reader to determine what's right for their own evaluation process.


That is correct. We had to reduce the duration to be able to handle the traffic. We recommend 90 mins for this challenge. What do you think would be the right amount of time for this problem?


I think this is good as a way to filter people out, but not as a way to rank people to find the best.

I would want to see a short script to clean and predict a dataset, plus a small description of why choices were made.

Wouldn’t care much about the performance of the model.


I think this is great as a self-assessment tool, especially for beginners.

It would work great with other learning tools, like MOOCs, datacamp, dataquest.io, as part of an overall data science learning process.

I'm more skeptical of its ability to help companies select candidates, but I could be very wrong about this and if I am then it's a huge win, so thanks for developing it.

I am super interested in seeing how you all develop this in the future; there's a lot of potential here. Is there a data-science-specific mailing list I can sign up for? I honestly have zero interest in hiring for other roles, so I am not going to sign up for a general mailing list.


If you are interested, we are looking for data scientists and developers in general from the community to help us build these solutions and provide us with honest feedback. We are also looking at building support for other Data Science roles like a Data Engineer. I would be more than happy to show you what we have and hear your thoughts on the same. Let me know if you'd like to be a part of it and how I can reach out.


There isn't a specific mailing list. However, you can sign up for a one-off webinar that will go into a lot of detail on the hiring-manager flow. It's at the bottom of the launch blog post.

https://blog.hackerrank.com/hackerrank-projects-supports-dat...


I suppose I am in the minority but I thought it was a pretty good FizzBuzz challenge for DS.

In fact I'd say it is a bit aggressive for a 60 minute challenge.

Quite a bit of data wrangling is expected to complete modeling on all columns. Some regex knowledge would help here too (for example, for wrangling the internet_users column).
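
As an illustration (the real format of that column may differ, so treat this as an assumption), normalising a mixed-format column with a regex is only a couple of lines of pandas:

```python
# Hypothetical sketch: if internet_users mixes formats like "49.5%",
# "12 per 100", or plain numbers, a regex extract normalises it to a float.
import pandas as pd

df = pd.DataFrame({"internet_users": ["49.5%", "12 per 100", "73.2", None]})

extracted = df["internet_users"].str.extract(r"(\d+(?:\.\d+)?)")[0]
df["internet_users_num"] = pd.to_numeric(extracted, errors="coerce")
print(df)
```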

What was the idea behind asking for the 20 most important features when we have 16 columns? Are we expected to do some feature engineering?

Disclaimer: I teach Python and basic Data Science to adults and I'd say most people would struggle to complete this in 60 minutes including myself.


We had to reduce the time limit on the test to handle the traffic. The intended test duration is 90-120mins. I have updated the test duration to 90mins now.

There is indeed some feature engineering involved. The challenge in the test can be solved in the most obvious way possible, as well as in the most creative fashion. We believe how a Data Scientist goes about solving the problem is more important than a fixed outcome.


Data scientists are fundamentally problem solvers.

The best way to assess technical problem solving is a structured hackathon. That is, to be given

* a problem with multiple subproblems and solution milestones

* with both objective and subjective criteria

* freedom of tools

* a "junkyard" of resources

* a fixed amount of time for each deliverable of the problem

And then you observe the process and the results.

For data science, the subproblems should be:

* requirements gathering / understanding the problem

* data acquisition, prep, and analysis

* refinement of requirements / communication

* feature engineering

* modeling

* presentation / storytelling / viz


Here are my high-level thoughts after a quick look at the question and some clarification in the thread below.

- a single question is difficult to evaluate. "Answering a business question" is at the very end, usually, of a bunch of exploratory steps

- 60 min is reasonable but not much time to evaluate real work. You either need to expand the time (also a problem, for interviewing) or allow scoring of "what I'd do next"

- tooling familiarity is going to be a huge factor with short time. Are you testing general knowledge or environment knowledge?

- too focused on models, too "kaggle-like". That covers about 20% of the skills and job.

Here are the sorts of things I look for. Do they understand:

1. How to verify & validate data, clean inputs, handle coding errors and ELT-type issues (a short sketch of points 1 and 2 appears a bit further down)

2. How to evaluate data set issues like bias, missing data and outliers, and account for that (and when you can't)

3. (situational) How their infrastructure works and what they need it to do (e.g. for distributed training, if appropriate). How to use it effectively.

4. How to control data and code throughout lifecycle, so you don't waste time and experiments

5. How to choose between approaches and models

6. How to evaluate performance rigorously

7. How to monitor performance over time

but here is the kicker

8. How do you know you are trying to solve the right problem?

For junior people, the emphasis will be on the earlier points. For senior people the last point is key.

Your question partially addresses some of the early points only.
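
To make the first two points concrete, here is a minimal sketch of the kind of checks I mean; the file, column names, and thresholds are illustrative, not taken from the test:

```python
# Minimal sketch of points 1 and 2: basic verification, then quantifying
# missing data and outliers. File, columns, and thresholds are illustrative.
import pandas as pd

df = pd.read_csv("raw_data.csv")  # hypothetical input

# 1. Verification: types, duplicates, obvious coding errors.
print(df.dtypes)
print("duplicate rows:", df.duplicated().sum())
assert (df["age"].dropna() >= 0).all(), "negative ages indicate a coding error"

# 2. Missing data and outliers: measure before deciding how to handle them.
print(df.isna().mean().sort_values(ascending=False))  # share missing per column
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)
print("possible income outliers:", int(mask.sum()))
```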

Off the top of my head, some suggestions:

- Have separate stages. Cleanup & verification can have objective and subjective issues (missing & corrupt data? Outliers?)

- Don't focus too much on modeling, it's the least interesting part.

- Allow different toolsets possibly (e.g. R)

- Initial cleanup/eval stage on a CSV, but following stage pull from SQL?

- Possibly allow multiple inference choices from same or a few data sets. Give a short list of things the "business" is interested in, they pick and describe why

- good idea to focus a bit on producing one/two graphics/tables to communicate to a lay audience.

- more focus on verification

- add a validation discussion requirement. How are you going to know what you did is worth doing?

- add a "next things I would try/do"

The latter is going to be text-heavy, but there's no way to avoid this unless there is a follow-on voice/personal interview.

There isn't any way you are going to auto-score this stuff reliably, so that's probably ok. The consequence is that your evaluators are actually going to have to be good at this.


Thank you ska. This is pretty insightful, and actually makes sense. We will try to incorporate your suggestions as we create more challenges. We are also looking for more data scientists and developers in general to help us build these solutions, review them and share honest feedback. Would you like to work with us on the same? I would love to hear your thoughts on the new features we are building for more Data Science roles. If yes, let me know how to reach out to you.


Happy to discuss that, I have a lot of related experience that might help you. Where can I reach you by email?


Data scientists (in my opinion as one) should be spending most of our time listening and talking to our colleagues, our clients, our peers, teachers, and - last of all - to the data. Last, because we dive into the data searching for things - answers to questions people asked for - and those other people inform our journey and our methods.

Counter-intuitive with regard to the phrase, but good DS is people work first and data work, like, 9th.


Hello folks, would love your feedback on our new product to assess data scientists.


I have a number of concerns about the efficacy of this, but they are made more difficult to rank by not understanding how you are planning to evaluate and use the results.

Can you elaborate?


Evaluation is subjective at the moment, via a review of the Jupyter session by the hiring managers.

For certain data science use cases, evaluation is possible by taking a CSV output by a user and comparing it to an expected CSV.

(I worked on the product).


Ok, although I would be wary of using a numerical comparison for anything except catching obvious errors.

I should have asked this before, but "data science" is a pretty broad term - who are you hoping to target with this? I'm guessing for pretty junior positions but want to clarify.

Oh, and one other question, do you /can you enforce the 60 min time? [edit: never mind, I answered this experimentally, you do cut it off at 60 min]


Agreed. Data Science is a very broad term. The challenge is designed for Data Scientists. We are trying to target all experience levels as of now through a screening/take-home test that should take about 60-90mins at a stretch. Do you think the timed challenge should be different for senior vs junior data scientists?

What skills would you consider important for senior vs junior Data Scientist?


I've added some top level comments.

For what it's worth, I think junior and senior DS roles should have fairly different evaluations & interviews.


I think at best this is fizzbuzz for DS, which is not inherently wrong. It's nice to know a software developer can write a loop and a data scientist can use a JN, so for weeding out people who have no practical experience with a given tool set, it could make sense.

The question then is how do you algorithmically (or even just consistently) distinguish a great data scientist from one who can accurately model answers to a question that was badly thought out?

Plus as pointed out before, the length of a take home could reduce applications from the most qualified candidates.

I wonder if this should be even shorter and more quiz-like/fun, so it intrigues rather than annoys more senior applicants; and I'm still wondering about the best way to identify the data scientists who ask better questions.


ska, our aim with the challenge was to allow candidates to not be biased by a fixed outcome and try to solve the problem as they would solve any real data science problem.

This meant we couldn't automatically score/rank a candidate's solution. We do provide them with an evaluation metric in the problem description (Mean Absolute Error). Here is the scoring rubric we provide to the interviewers when they review the submission - https://d.pr/i/hNYY0u

Would love to hear more opinions on our scoring rubric


That's useful information, thanks. I'll add some thoughts on the rubric.

[edit]. Initial thoughts:

- "data wrangling" scoring difficult given this task - more weight to "rationale", that's more important the "performance", here.

- not enough focus on communication capabilities

- really need something on validation

- "proficiency" measure you use is pretty much impossible to accurately evaluate from your example question

- way too much weight on the modeling section overall


By 3rd bullet, you mean Data Validation?


Both data validation and model validation are important.

Of course more abstractly you want to validate the entire process - but I was referring to those two, as it is hard to see how to address the latter in a format like this.


This test is a relative assessment that tests the employer more than the employee.



