I've been meaning to finish my LendingClub machine learning underwriter. There's a lot of data to train on since all historical loan data is available.
The exact model to use is tricky, though. You could train a classifier to predict whether a loan will default, but that doesn't weigh the chance of default against the interest rate. I then thought of doing a regression on the expected return, which would properly balance interest rate against default rate. At some point, though, you still need to make the binary decision of whether to invest in the loan or not, which is classification again.
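To make the trade-off concrete, here's a rough sketch of how an expected-return estimate could feed the invest/don't-invest decision. Everything in it is an assumption for illustration: the one-period return formula, the `recovery_rate` parameter, the 5% `hurdle`, and the example numbers are all made up, not anything from LendingClub's data.

```python
def expected_return(p_default, interest_rate, recovery_rate=0.0):
    """Simplified one-period expected return per dollar invested.

    Assumes the loan either repays in full with interest, or defaults
    and returns only `recovery_rate` of the principal. Real loans
    amortize monthly, so this is only a rough approximation.
    """
    gain_if_repaid = interest_rate
    loss_if_default = 1.0 - recovery_rate
    return (1.0 - p_default) * gain_if_repaid - p_default * loss_if_default


def should_invest(p_default, interest_rate, hurdle=0.05):
    """Turn the regression-style estimate back into a yes/no decision."""
    return expected_return(p_default, interest_rate) > hurdle


# Hypothetical loans: (predicted default probability, interest rate)
safe_loan = (0.02, 0.07)
risky_loan = (0.10, 0.20)

for name, (p, rate) in [("safe", safe_loan), ("risky", risky_loan)]:
    print(name, round(expected_return(p, rate), 4), should_invest(p, rate))
```

With these made-up numbers, a pure default classifier would prefer the "safe" loan, while the expected-return view prefers the "risky" one because its interest rate more than compensates for the extra defaults.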
Another complication is that, since LendingClub has been growing exponentially, the majority of the loans they've issued haven't matured yet. Making use of this partially complete data is even trickier. You could ignore non-mature loans, but that would shrink your training data significantly and leave you with data that is at least 3 years old.
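Here's a small sketch of what "ignore non-mature loans" would look like, just to show how much data it throws away. The loan records, field names, and the `as_of` date are all hypothetical, and maturity is checked at month granularity for simplicity.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end (ignores day of month)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def is_mature(issue_date, term_months, as_of):
    """A loan counts as mature once its full term has elapsed."""
    return months_between(issue_date, as_of) >= term_months


# Made-up loans; later issue dates dominate, mimicking fast growth.
loans = [
    {"id": 1, "issue_date": date(2009, 6, 1), "term_months": 36},
    {"id": 2, "issue_date": date(2011, 3, 1), "term_months": 36},
    {"id": 3, "issue_date": date(2012, 9, 1), "term_months": 36},
    {"id": 4, "issue_date": date(2013, 1, 1), "term_months": 60},
]

as_of = date(2013, 6, 1)  # hypothetical "today"

mature = [
    loan for loan in loans
    if is_mature(loan["issue_date"], loan["term_months"], as_of)
]

fraction_kept = len(mature) / len(loans)
print(f"kept {len(mature)} of {len(loans)} loans ({fraction_kept:.0%})")
```

In this toy set only the oldest loan survives the filter, which is the problem in miniature: the loans you can fully label are the oldest and the fewest.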