> I mean sure, nlp, parse some papers and make some sort of search/q&a type of thing? Fine, whatever.
I think you severely underestimate the effort and expertise required for getting decent content-aware search, let alone a fully functional question-answering pipeline.
These are their own subfields of research in text mining.
The fact that you conflate these as if they were some trivial task on a new dataset shows your lack of understanding of the field.
Biomedical text mining [1] is a wide subfield with plenty of open datasets and competitions, such as the bi-annual ACL BioNLP workshop [1].
Furthermore, existing knowledge-base creation and information extraction pipelines, such as protein-protein interaction extraction, NER, event extraction, and drug-drug interaction mining, could be applied to this novel dataset and provide useful insights for researchers and staff.
> These are their own subfields of research in text mining. The fact that you conflate these as if they were some trivial task on a new dataset shows your lack of understanding of the field.
On the contrary, I've built many, but in this context I see it as a waste of time. As I said in another reply, it's a resume/portfolio task with no real-world application.