What you could do is assemble the data in tabular form so that your data is in t...

pks2006 · on Feb 15, 2017

First - great answer, and thanks for your time and response! One complication: for some issues, the RCA depends on the order of the syslogs. For some complex issues, the path the code took changes the order of the syslog messages, and with it the RCA. I guess I will have to spend some time working syslog order into the table format you are suggesting.
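
One possible way to get ordering into a table, as a rough sketch only: count pairs of consecutive events (bigrams) and use each pair as a column. This assumes each syslog line has already been mapped to some event ID; the IDs below (`LINK_DOWN`, `TIMEOUT`, `RESTART`) and the helper names are made up for illustration.

```python
from collections import Counter

def order_features(event_ids, n=2):
    """Count n-grams of consecutive event IDs so that ordering shows up as features."""
    grams = zip(*(event_ids[i:] for i in range(n)))
    return Counter("->".join(g) for g in grams)

def to_rows(logs):
    """Turn a list of event sequences into rows over a shared, sorted set of columns."""
    feats = [order_features(seq) for seq in logs]
    columns = sorted(set().union(*feats))
    rows = [[f.get(c, 0) for c in columns] for f in feats]
    return columns, rows

# Hypothetical event IDs: the same three events in a different order.
log_a = ["LINK_DOWN", "TIMEOUT", "RESTART"]
log_b = ["TIMEOUT", "LINK_DOWN", "RESTART"]
columns, rows = to_rows([log_a, log_b])
```

The two logs contain the same events but end up as different rows, so a model trained on the table can tell the orderings apart. Longer n-grams capture more context at the cost of many more columns.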

cr0sh · on Feb 15, 2017

If it's possible to split the log out into a more granular format, beyond what fnbr has suggested, then it can potentially be used with more complex models; keep the issue as the "label", and keep the "system log" (or a hash representation?) as well. If the log entry can also be broken up into other data points, those can be useful in other ML methods.
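
As a sketch of what "more granular" could look like, a regex can split each line into separate fields. The layout assumed here (`Mon DD HH:MM:SS host process[pid]: message`, BSD-syslog style) and the sample line are assumptions, not the actual logs from this thread; a real format would need its own pattern.

```python
import re

# Assumed layout: "Mon DD HH:MM:SS host process[pid]: message".
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<process>[^\[\s:]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<message>.*)$"
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None

# Hypothetical sample line, for illustration only.
sample = "Feb 15 10:32:01 router1 bgpd[412]: neighbor 10.0.0.2 went down"
fields = parse_line(sample)
```

Each field (`host`, `process`, `message`, and so on) can then become its own column or be encoded separately.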

Then again, if the log entry has a somewhat set length (or can be truncated), you could feed that in as the input to a CNN (one input node/neuron per character), and the output layer could consist of the issue labels. I'm not sure what, if anything, that would net you; perhaps an unknown log could be fed to the trained network, which would classify it as one of the existing issues?
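
For the character-level idea, the input step might look roughly like this: truncate or pad each entry to a fixed length and map each character to an integer code. The length of 32, the 128-code ASCII vocabulary, and the function names are arbitrary choices for the sketch, and the network itself is left out.

```python
def encode_chars(line, length=32, vocab_size=128):
    """Map a log line to a fixed-length list of integer character codes.

    Characters outside the vocabulary and padding positions both become 0.
    """
    codes = [ord(ch) if ord(ch) < vocab_size else 0 for ch in line[:length]]
    return codes + [0] * (length - len(codes))

def one_hot(codes, vocab_size=128):
    """Expand integer codes into one-hot rows, one row per character position."""
    return [[1 if i == c else 0 for i in range(vocab_size)] for c in codes]

short = encode_chars("ab", length=4)   # padded with zeros
long_ = encode_chars("x" * 100)        # truncated to 32 positions
matrix = one_hot(short)
```

The resulting matrix (positions by vocabulary size) is the kind of fixed-shape input a 1D convolutional network expects, with the issue label as the training target.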

fnbr · on Feb 16, 2017

If you can upload a sample log, I'd be happy to take a look and try to provide some more specific guidance (email's in profile).

I did some work using stack traces to predict duplicate bug reports, so I'm somewhat familiar with a similar problem.