have you considered offering it as a cloud platform? we're doing something along...

dalke · on July 21, 2020

Thanks!

Yes, I've considered a cloud platform. There are several big difficulties with that.

First, data. It's easy to grab public data from PubChem, ChEMBL, and a few other projects, and make a service. But why would anyone pay for it given that PubChem, ChEMBL, ChemSpider, and others already provide free search services of that data?

There's search-as-improved-sales, like how Sigma-Aldrich lets people do a substructure search to find chemicals available for sale.

There's value-add data. eMolecules includes data from multiple vendors, to help those who want to purchase compounds more cheaply.

Or there's ZINC, which already provides search for their data.

So you can see there's plenty of competition for no-cost search. I'm not able to add significantly new capabilities that people are willing to pay for.

Note also there's a non-trivial maintenance cost to keep the data sets up-to-date.

Second, the queries themselves may be proprietary. I talked with one of the eMolecules people. Pharmaceutical companies will block network access to public services to reduce the temptation of internal users to do a query using a potential $1 billion molecular structure (or potential $0 structure). eMolecules instead has NDAs with many pharmas which legally bind them. Managing these negotiations takes experience I don't have, and neither do I have the right contacts at those pharmas.

Sequences don't have quite the same connection between sequence and profit as molecules do.

BTW, part of the conclusion of my work is that people don't need a cluster for search - they can handle nearly all data sets on their laptop, so there shouldn't be a need to scale up any more. And small molecule data has a much smaller growth curve than sequence data, so Moore's Law is keeping up.

My first customer, who continues to be a customer, said outright that they would not buy if it were under GPL.

Since my paying customers are pharmaceutical companies who, as a near-rule, don't redistribute software, it doesn't really matter if they don't redistribute under MIT or don't redistribute under GPL.

I came into the project in part to see if FOSS could be self-supporting on its own. AGPL is often used as a stick to try to get people to use a commercial license - the implicit view of the two-license model is that FOSS is not sustainable. Which is now my conclusion, for this project and field.

fock · on July 21, 2020

not really into industry, but a) the pharma-companies using it are probably reluctant to give you their data and b) uni researchers are not overly fond of high-fee services and labor is cheap there.