
One thing you could do is assemble the data in tabular form, so that it has this shape:

    Issue     System log
    -------   ------------------------
    issue_1   corresponding system log
    issue_2   corresponding system log
    issue_3   corresponding system log
    issue_4   corresponding system log
    issue_5   corresponding system log

Once you've done that, you can train a classifier on it, e.g. something like the one in [1]. There are a few things you'll want to do to make sure you're not overfitting (I'd scale your data and use 5-fold cross-validation), but that should get you started.

[1]: http://scikit-learn.org/stable/tutorial/text_analytics/worki...
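A minimal sketch of that pipeline, assuming scikit-learn is installed. The log strings are made-up placeholders in the shape of the table above, and the specific pieces (TfidfVectorizer, MaxAbsScaler, LogisticRegression) are illustrative choices, not necessarily what the tutorial in [1] uses:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler

# Made-up toy rows in the (issue, system log) shape of the table.
# These strings are purely illustrative; substitute your real data.
templates = {
    "issue_1": "kernel: eth0 link down; dhcpd: lease lost for host {n}",
    "issue_2": "sshd[{n}]: authentication failure for user admin",
    "issue_3": "kernel: Out of memory: killed process {n} (java)",
}
rows = [(issue, text.format(n=i)) for issue, text in templates.items() for i in range(6)]
issues = [issue for issue, _ in rows]
logs = [log for _, log in rows]

# TF-IDF turns each log into a sparse word-weight vector; MaxAbsScaler scales
# each feature by its maximum absolute value without centering, so the matrix stays sparse.
model = make_pipeline(TfidfVectorizer(), MaxAbsScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation: every pipeline step is re-fit on each training fold,
# so the held-out fold never influences the vectorizer or the scaler.
scores = cross_val_score(model, logs, issues, cv=5)
print("mean accuracy: %.2f" % scores.mean())

# Fit on all rows, then classify a log the model has not seen.
model.fit(logs, issues)
print(model.predict(["sshd[999]: authentication failure for user root"]))
```

Putting the scaler inside the pipeline is the important design choice: scaling before splitting would leak information from the test folds. MaxAbsScaler is used instead of StandardScaler because StandardScaler cannot center sparse TF-IDF matrices.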



First, great answer, and thanks for your time and response! That said, for some issues the RCA (root cause analysis) depends on the order of the syslog messages. For some complex issues, the RCA changes depending on which path the code took: the path changes the order of the syslog messages, and hence the RCA. I guess I will have to spend some time incorporating syslog order into the table format you are suggesting.


If it's possible to split the log into a more granular format than what fnbr has suggested, then it could potentially be used with more complex models. Keep the issue as the "label", and keep the "system log" (or a hash of it?) as well, but if each log entry can be broken up into other data points, those can be useful to other ML methods.
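A sketch of splitting each log into more granular data points, assuming BSD-style syslog lines of the form "Mon DD HH:MM:SS host process[pid]: message". The regex, the field names and the summarize features are illustrative assumptions, not a standard parser:

```python
import re
from collections import Counter

# Assumed BSD-style (RFC 3164-like) line layout; adjust to your real logs.
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_log(log_text):
    """Return one dict of fields per line; lines that don't match are kept whole."""
    records = []
    for position, line in enumerate(log_text.splitlines()):
        match = SYSLOG_LINE.match(line)
        if match:
            record = match.groupdict()  # pid is None when the line has no [pid]
        else:
            record = {"timestamp": None, "host": None, "process": None,
                      "pid": None, "message": line}
        record["position"] = position  # keep line order as its own data point
        records.append(record)
    return records


def summarize(records):
    """Hypothetical per-log features: line count plus lines per process."""
    counts = Counter("proc=" + (r["process"] or "unknown") for r in records)
    counts["n_lines"] = len(records)
    return dict(counts)


sample = ("Mar  3 10:15:01 web01 sshd[4321]: Failed password for root\n"
          "Mar  3 10:15:02 web01 kernel: eth0: link down")
print(summarize(parse_log(sample)))
```

A list of such feature dicts can be turned into a matrix with scikit-learn's DictVectorizer, while the per-line records (including position) keep enough detail for order-aware models later.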

Then again, if the log entry has a roughly fixed length (or can be truncated to one), you could feed it in as the input to a CNN (one input node/neuron per character), with an output layer consisting of the issue labels. I'm not sure what, if anything, that would get you; perhaps an unknown log could be fed into the trained network, and it could classify it as one of the existing issues?
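To make the CNN idea concrete, here is a NumPy-only sketch of the input encoding and an untrained forward pass. Instead of literally one scalar input per character, each character position is one-hot encoded over an assumed alphabet, a common way to feed characters to a CNN. The alphabet, MAX_LEN, layer sizes and random weights are all hypothetical; learning the weights would normally be done in a framework such as PyTorch or Keras:

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 -_.:/[]()"  # assumed character set
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}
MAX_LEN = 256  # assumed fixed length: longer logs are truncated, shorter ones zero-padded


def encode(log_text, max_len=MAX_LEN):
    """One-hot encode the first max_len characters: one row per character position."""
    x = np.zeros((max_len, len(ALPHABET)), dtype=np.float32)
    for pos, ch in enumerate(log_text.lower()[:max_len]):
        idx = CHAR_INDEX.get(ch)
        if idx is not None:  # characters outside the alphabet stay all-zero
            x[pos, idx] = 1.0
    return x


def forward(x, conv_w, conv_b, dense_w, dense_b):
    """1-D convolution -> ReLU -> global max-pool -> dense -> softmax over issues."""
    width = conv_w.shape[0]  # conv_w has shape (width, alphabet_size, n_filters)
    windows = np.stack([x[i:i + width] for i in range(x.shape[0] - width + 1)])
    feature_maps = np.maximum(np.einsum("swa,waf->sf", windows, conv_w) + conv_b, 0.0)
    pooled = feature_maps.max(axis=0)  # strongest response of each filter
    logits = pooled @ dense_w + dense_b  # one logit per issue label
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(0)
n_filters, width, n_issues = 16, 5, 5  # hypothetical sizes; 5 labels for issue_1..issue_5
conv_w = rng.normal(scale=0.1, size=(width, len(ALPHABET), n_filters))
conv_b = np.zeros(n_filters)
dense_w = rng.normal(scale=0.1, size=(n_filters, n_issues))
dense_b = np.zeros(n_issues)

probs = forward(encode("kernel: eth0 link down"), conv_w, conv_b, dense_w, dense_b)
print(probs.round(3), "-> predicted issue_%d" % (probs.argmax() + 1))
```

With random weights the prediction is meaningless; after training, the argmax of the output is the "classify an unknown log as an existing issue" step, and the probability itself hints at how confident that match is.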


If you can upload a sample log, I'd be happy to take a look and try to provide some more specific guidance (email's in profile).

I did some work using stack traces to predict duplicate bug reports, so I'm somewhat familiar with a similar problem.



