Launch HN: Serra (YC S23) – Open-core, Python-based dbt alternative
139 points by Alanhlwang on Aug 14, 2023 | 84 comments
Hey HN! Alan and Albert here, cofounders of Serra. Serra is end-to-end dbt—we make building reliable, scalable ELT/ETL easy by replacing brittle SQL scripts with object-oriented Python. It’s open core: https://github.com/Serra-Technologies/serra, and our docs are here: https://docs.serra.io/documentation/.

I stumbled into this idea as a data engineer for Disney+'s subscriptions team. We were "firefighters for data," ready to debug huge pipelines that always crashed and burned. The worst part of my job at Disney+ was the graveyard on-call rotations, where pages from 12am to 5am were guaranteed, and you'd have to dig through thousands of lines of someone else's SQL. SQL is long-winded—1000 lines of SQL can often be summarized with 10 key transforms. We take this SQL and summarize those transforms with reusable, testable, scalable Spark objects.

Serra is written in PySpark and modularizes every component of ETL through Spark objects. Similar to dbt, we apply software engineering best practices to data, but we aim to do it not just with transformations, but with data connectors as well. We accomplish this with a configuration YAML file—the idea is that if we have a pipeline with that 1000-line SQL script and its third-party connectors, we can summarize all of it into a 12-block config file that gives an easy high-level overview and debugging capabilities—10 blocks for the transforms and 2 for the in-house connectors. Then we can add tests and custom alerts to each of these objects/blocks so that we know exactly where the pipeline breaks and why.
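To give a rough feel for what a "block" maps to under the hood, here is a simplified illustration in plain PySpark (the class and column names are made up for this sketch and are not our actual API):

    from pyspark.sql import DataFrame
    from pyspark.sql import functions as F

    # Illustration of the "block" idea: each block is a small, testable
    # object with a single transform() method.
    class FilterActiveSubscriptions:
        def transform(self, df: DataFrame) -> DataFrame:
            # Keep only rows whose status column is "active".
            return df.filter(F.col("status") == "active")

    class AddSignupMonth:
        def transform(self, df: DataFrame) -> DataFrame:
            # Derive a signup_month column from signup_date.
            return df.withColumn("signup_month", F.date_trunc("month", F.col("signup_date")))

    def run_pipeline(df: DataFrame) -> DataFrame:
        # The YAML config names blocks like these in order; the framework
        # then chains their transform() calls, roughly like this.
        for block in [FilterActiveSubscriptions(), AddSignupMonth()]:
            df = block.transform(df)
        return df

Each block can carry its own tests and its own error message, which is where the per-block alerting comes from.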

We are open-source to make it easy to customize Serra to whatever flavor you like with custom transformers/connectors. The connectors we support OOB are Snowflake, AWS, BigQuery, and Databricks, and we're adding more based on feedback. The transforms we support include mapping, pivoting, joining, truncating, imputing, and more. We're doing our best to make Serra as easy to use as possible. If you have Docker installed, you can run the Docker command below to instantly get set up with a Serra environment for creating modular pipelines.

We wrap up our functionality with a command line tool that lets you create your ETL pipelines, test them locally with a subset of your data, and deploy them to the cloud (currently we only support Databricks, but will soon support others and plan to host our own clusters too). It also has an experimental "translate" feature which is still a bit finicky, but the idea is to take your existing SQL script and get suggestions on how you can chunk up and modularize your job with our config. It's still just a super early suggestion feature that is definitely not fleshed out, but we think it's a cool approach.

Here’s a quick demo going through retooling a long-winded SQL script to an easily maintainable, scalable ETL job: https://www.loom.com/share/acc633c0ec03455e9e8837f5c3db3165?.... (docker command: docker run --mount type=bind,source="$(pwd)",target=/app -it serraio/serra /bin/bash)

We don’t see or store any of your data—we’re a transit layer that helps you write ETL jobs that you can send to your warehouse of choice with your actual data. Right now we are helping customers retool their messy data pipelines and plan to monetize by hosting Serra on the cloud, charging if you run the job on our own clusters, and per API call on our translate feature (once it’s mature).

We’re super excited to launch this to Hacker News. We’d love to hear what you think. Thanks in advance!




Congrats on the launch!

Interesting project in a space that I am pretty certain is going to change a lot in the coming years. Here is a bit of random feedback and questions.

* Some of your messaging related to python vs yaml is a bit confusing, which results in me not being immediately clear on the value prop. After digging through docs and code I now understand that the yaml is a declarative pipeline calling the underlying python code that can include user defined transformations. Nifty! As someone who has led data platform teams, I understand that this would be a big win for any data platform team to better support data eng/scientists. But you don't tell me any of that. I would look at trying to give more context to what this is and adding more of these use cases and values in your marketing (even if they are pretty nascent at this stage)

* From the loom, the play you are doing is clear and it makes a lot of sense to build a cloud service to easily run these jobs... but that makes me wonder if your licensing choice is maybe a bit too restrictive? IMHO, the most important thing to do when building dev tools is to be very deliberate in your end-to-end user -> customer journey and to design your open source and commercial strategies to nicely dovetail. For a product like this, I would think the faster and bigger I can build a community, the better, and that may mean "giving away" a lot of the initial core innovation, but with a clear plan on the innovation I can drive through integrated services, which would imply as open a license as possible. As is, I think you might find it much harder to get people to take it seriously, as, unlike other source-available companies (Elastic, Cockroach, etc.), you aren't yet proven to be worth the effort to get this approved vs a full open source alternative

* On a similar note, what is in the repo right now seems to be a relatively thin wrapper around spark. That isn't a criticism. Many technologies and communities have started based on a "remix" of a lower level tool that offers simplified UX/DX or big workflow improvements. What sets those apart though, imho, is to drastically lower the barrier to entry to using the underlying technology and to be seen as leaders and experts in the space you operate. I am guessing you probably have lots of features planned, but I would also give a soft suggestion to look as much into thinking of learnability as a feature (via features, interactive docs, etc) as I would almost anything else, as that is really where a lot of the value of a higher level interface like this comes in

* My past experience with really large and complex ETL jobs that essentially required dropping into Spark to represent them makes me wonder how much actual complexity can be represented by the transformers. I would be curious to know what your most complex pipeline is. It doesn't seem there is an API limitation preventing these pipelines from getting quite a bit larger and representing many SQL statements, other than big long Spark pipelines getting kind of ugly, and in some cases they could even remove the need for quite a few Airflow jobs. I am curious to know if and how you see Serra addressing those types of ETL jobs.

Once again, congrats on launching! Happy to give more context/thoughts in a thread, or reach out to me via the info in my profile


This is super insightful, thanks a ton for this gold mine.

On the python vs yaml part—definitely could've made that way more clear in the demo. Right now our framework lets you call these python objects in your yaml file, but we are working on a pure-Python implementation as well for those who do not want to interact with yamls.

On the loom and licensing choice—that's a great point. One of the main issues we ran into is getting adoption as we originally just tried licensing out the framework (mega fail ofc)—found out the hard way that no dev wants to buy something to try it out. We're definitely flexible on our license and will take all this feedback into account.

On the barrier of entry—also super insightful. We're working on a local UI offering that will be a 'config' block builder that will be free for all installs. We're implementing a DAG view similar to Airflow on the transform level. We also want to make it super easy to see your code and preview how it changes with this local UI (and have a list of all the params you need for your spark objects without having to go through the docs). We also want to flesh out more features especially on the translate side, as well as host on the cloud.

With the complexity issue—that's something I ran into at Disney as well! As the product grows we definitely want to flesh out our transformers based on the scripts we see. For now, the developer can make one-off transformers—we actually have a catch-all "SQL transformer" for cases where you want to just pass in your SQL (similar to a dbt model) and run it that way. That way there's a fail-safe: if you have one specific transform that you feel is super hard to break down, you can fall back on dbt's way of just modularizing the SQL into a transform, and reference it however many times you want as an input block later on.
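To sketch that catch-all in plain PySpark terms (a simplified illustration of the idea, not our exact implementation):

    from pyspark.sql import DataFrame, SparkSession

    class SQLPassthroughTransform:
        """Catch-all transform: run a raw SQL statement against the incoming
        DataFrame instead of breaking it into smaller blocks. Simplified
        illustration, not the actual Serra class."""

        def __init__(self, spark: SparkSession, sql: str, view_name: str = "input"):
            self.spark = spark
            self.sql = sql
            self.view_name = view_name

        def transform(self, df: DataFrame) -> DataFrame:
            # Register the incoming DataFrame as a temp view, then run the
            # SQL against it with Spark's built-in SQL engine.
            df.createOrReplaceTempView(self.view_name)
            return self.spark.sql(self.sql)

A block like this can then be referenced from the config the same way as any other transform.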

Thanks so much for the congrats, will definitely reach out and would love to have further discussions in the thread as well.


The pattern of reading from data sources to a Pandas DataFrame first defeats the whole point of using Spark[1]. Maybe it's ok for small tables, but you'll probably run out of memory on large tables.

[1] https://github.com/Serra-Technologies/serra/blob/a7a80c77af5...
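Roughly, the difference (illustrative path; the pandas-first version pulls the whole table into driver memory before Spark ever sees it):

    import pandas as pd
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Pattern being criticized: read with pandas first, then hand to Spark.
    # The full table must fit in the driver's memory, so Spark's distributed
    # reads never help.
    pdf = pd.read_parquet("s3://my-bucket/events.parquet")  # illustrative path
    df_via_pandas = spark.createDataFrame(pdf)

    # Direct Spark read: executors read the files in parallel and the data
    # never has to fit in a single process.
    df_direct = spark.read.parquet("s3://my-bucket/events.parquet")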


Moving between Spark and Pandas can cause type-casting issues as well. For example, the range of allowable dates in Pandas is much smaller than in Spark. We completely abandoned Pandas in favor of PySpark for this reason.

It seems unnecessary to use multiple dataframe implementations when Spark is already in play.
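Concretely, with pandas' default nanosecond resolution the maximum representable timestamp is in 2262, so dates that are perfectly legal in Spark (or in most databases) fail on conversion:

    import datetime
    import pandas as pd
    from pyspark.sql import SparkSession

    # pandas stores timestamps as int64 nanoseconds since the epoch by
    # default, so anything past 2262-04-11 raises.
    try:
        pd.to_datetime("9999-12-31")
    except pd.errors.OutOfBoundsDatetime as exc:
        print(exc)  # "Out of bounds nanosecond timestamp: ..."

    # Spark's DateType has no such limit, so the same "magic date" is fine.
    spark = SparkSession.builder.getOrCreate()
    spark.createDataFrame([(datetime.date(9999, 12, 31),)], "d date").show()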


Are you referring to pandas.Timestamp.max being 2262-04-11 23:47:16.854775807 ?

https://pandas.pydata.org/docs/reference/api/pandas.Timestam...

(pandas design choice was to support nanosecond times, for financial data.)


Yes. Unfortunately I’m dealing with an app that likes to use multiple magic dates way past the Pandas range.


"much smaller range" seems disingenuous without saying that you mean "not beyond 2262". And you said those aren't real dates, only magic dates or sentinels. So that's a totally artificial requirement. And you could fix the magic dates up at conversion with a simple replacement script.

* MS-DOS supports dates from 1/1/1980 to 12/31/2099

* 32b Linux (or Windows 7) supported timestamps up to 2038

* 64b timestamps fixed all this already, and presumably OSes will be using 128b datetimes well before 2099.


The RDBMS in this case accepts 9999-12-31 as a valid date. Pandas does not. This is where the issue came in, and switching to PySpark meant we needed no date manipulation to handle the data supplied by the upstream.

Magic dates suck, but they exist in the wild. There are also valid cases where data is not tied to the lifetimes of humans currently writing code.

The range of date values in PostgreSQL is 4713 BC to 5874897 AD:

https://www.postgresql.org/docs/current/datatype-datetime.ht...


Ah I see your point. Yeah I noticed SQL goes up to 9999.


This is a completely valid point—we'll be changing the readers to directly read into Spark. Thanks for the comment!


That's a smell. I thought they basically packaged the ETL portion of dbt as open source, not the data connector implementation. I'd like it to be connector agnostic so that you can choose the most suitable one for your needs.

Good intentions, but perhaps wrong execution. We'll see!


If the selling point is "replacing brittle SQL scripts with object-oriented Python" you should have at least one example of this code in the README!


I don't think SQL can be more brittle than untyped object-oriented Python. The latter is as brittle as it gets, as we all know from ML code. SQL is common for ETL exactly because it's not brittle, (mostly) declarative and easy to test and modularize.

What Python gives you is more flexibility, at the expense of exponentially more brittleness.


> SQL is common for ETL exactly because it’s not brittle,

SQL is common for ETL because typically at least one, sometimes both, ends of an ETL operation is an RDBMS for which SQL is the standard language. It has nothing to do with lack of brittleness.


I guess it's surprising then that both Hadoop/Hive and Spark, which were the originators of SQL for ETL, typically work on data lakes instead of RDBMSs. In fact, RDBMS support didn't come for a long time. The choice of SQL has nothing to do with RDBMSs. It's because SQL is a declarative language that's easy to parse and convert into a physical query plan that can be parallelized and optimized extremely well. Why is that? Because it's not a general-purpose imperative loosely typed brittle language like Python.


> Hadoop/Hive and Spark, which were the originators of SQL for ETL

They weren’t.

I guarantee you, before either of those existed, when Data Warehousing was often done with a different version/configuration of the same brand of RDBMS as the transactional store (the latter likely using something closer to a normalized schema, the former using a star or snowflake schema), using SQL for ETL was absolutely normal.

Which is why newer data warehousing / data lake systems support SQL even though they aren’t RDBMSs: a couple decades of RDBMS dominance made it the JavaScript of data storage.

> Because it’s not a general-purpose imperative loosely typed brittle language like Python.

It's not general-purpose or imperative, and it's just as much "loosely typed" as Python (both Python and SQL are strongly typed).

It's not clear what concrete meaning "brittle" is supposed to have in this claim, so I can't evaluate its accuracy.


Definitely, I can jump into what we meant by brittle—we mainly meant that SQL scripts are hard to debug and undescriptive: you can't parametrize or customize the error messages you receive from transforms, and you can only execute one complete statement at a time, often chained together with CTEs (which is a nightmare if it's a 400-line statement of SQL). Python makes it easier to debug since we turn the approach from a declarative one into a procedural one, and you even get breakpoints when you write your actual transformers in Python.
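As a rough illustration of that procedural point (table and column names made up), every intermediate result is a named DataFrame you can inspect, count, or set a breakpoint on before moving to the next step:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # In one big SQL statement these would be CTEs chained into each other,
    # and a failure surfaces as a single opaque error for the whole statement.
    orders = spark.table("orders")          # illustrative table names
    customers = spark.table("customers")

    active_orders = orders.filter(F.col("status") == "active")
    # Stop here, check active_orders.count() or .printSchema(), then continue.
    enriched = active_orders.join(customers, on="customer_id", how="left")
    monthly = (
        enriched
        .groupBy(F.date_trunc("month", F.col("order_date")).alias("month"))
        .agg(F.sum("amount").alias("revenue"))
    )
    monthly.show()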


Definitely a great point, adding one today


We’re considering adopting DBT or a similar tool for the orchestration of our data pipelines on Snowflake. But we also explored Snowflake Dynamic Tables, and they make it easy to build a complex DAG without having to describe it.

I’m curious if data warehouse features like materialized views or dynamic tables will end up making DBT or the like obsolete?

https://docs.snowflake.com/en/user-guide/dynamic-tables-abou...


It is likely that these will make it into dbt since there are many dbt users who also use Snowflake. The main things to consider when deciding to use something other than dbt are:

* can you apply software dev best practices, CI/CD etc

* is it proprietary or can you use it with other dbs

* is there a large community behind it e.g. dbt packages and dbt python libraries

* will you also get docs, dq, lineage or will you need additional tooling

* will you need to orchestrate other aspects of your data flow e.g. EL, then T, then activation, etc

Databricks also has delta live tables and for the reasons above I usually suggest people consider all of these and not just go all-in with one vendor


This is a great point—in terms of dynamic tables/materialized views, the software engineering best practices ie modularization, testing, version control are not as intuitive/straight-forward to apply in comparison to dbt and Serra. We can also add a direct SnowflakeDynamicTableWriter into our framework to work with the best of both worlds!


Were your Disney+ fires using dbt? The comparison in your demo doesn’t resemble normal dbt usage: it forces the SQL to inline the state abbreviation instead of using a dbt seed file, while the initial serra version uses one; it initially shows the serra code-folded, to make the SQL seem more verbose; and the SQL makes no use of CTEs or dbt models, either of which would make the transform steps clear.


These are all great points, and no, Disney didn't use dbt. We wanted the demo to show how you can modularize SQL into reusable objects that you can fully customize error logs for, while also adding the value of handling all steps of ETL in-house. You could definitely write this script modularly using dbt, but we feel our main differentiator is having connectors that easily integrate with your transforms (e2e), and taking the software engineering best practices that dbt applies a step further by turning each transform and connector into objects that you can test, modularize, and customize error logs for.


The data engineering space feels like it's earning the same reputation front end had/has with the endless stream of new and shiny frameworks.


I couldn't agree more. Except that the cost of a data tool is vastly higher than that of any FE tool.

I manage both a FE team and a data team. The former spends around 1k/month on infra and hosting, while the data team easily spends 20k/month.

The gold rush is data. Build shovels.


For those also wondering what is ETL and dbt:

> ETL (Extract, Transform, Load) is a process that involves extracting data from various sources, transforming it to fit operational needs, and loading it into a database for analysis. dbt (data build tool) is an open-source software tool that enables data analysts and engineers to transform and model data in the data warehouse, streamlining the transformation part of the ETL process.


I like that you improve on the underlying database error messages, as they are really unhelpful, and I think this is a great place to add value.

I've been keeping track of a few dbt alternatives. Dbt has opened up the market to this use case, while only partially solving the business model and maturity side. Here are the more interesting ones:

sdf.com (ex Meta team)

sqlmesh.com (relatively new)

paradime.io (more an IDE)

cloud.google.com/dataform (GCP only)


These are great links, we'll take a look


Maybe I am missing something but would there ever be a scenario where taking a single albeit large sql statement and rewriting it as several pyspark scripts would result in faster runtime for your data pipeline? In most cases, this will be much much slower.


Greatly depends on your environment. I am thankfully in an area where there are very modest timeliness requirements. Improving the speed of a job means little to me. However, improving debuggability or checkpointing when things go wrong is always valuable.


If you’re interested in doing the reverse of this (replacing pyspark with sql) - Sqlglot can do this: https://sqlglot.com/sqlglot/dataframe.html


I'm not sure about this. The big SQL script is annoying but when broken down into 4 parts it's very easy to understand. This is precisely the strength of dbt. This would decompose to 4 dbt models, which could be deployed as views, tables, or ctes etc. Instead it seems that you've developed your own DSL for data transforms to take its place, in the form of special classes declared in yaml.

It might be better? But sql is so well understood and covers so much functionality I'd expect that it would be a long time before you ever hit parity with it.

It would be nice if dbt could interface with buckets etc but if they're wrapped in an external table or whatever then that problem goes away.

One thing I noticed is that it (the example) misses a killer feature of dbt: you're specifying your database targets right in the class config. The killer feature of dbt is that you just specify the transforms and then point them at different environments using a target flag and a profiles file, deploying to different envs with ease. I would definitely separate location/env/credential config from transform logic or make it variable.

Given that sql is a totally valid language for declaring transforms in spark, I would probably rather see spark as a materialization backend to dbt somehow rather than an entirely new thing.


The selling point was replacing brittle SQL with Python. Sounds great, but I'm not seeing where Python comes into play from that demo? Was it not shown in the video?


Yep, we only showed our configuration file, which instantiates the aforementioned Python objects. Showing them definitely would've made the demo clearer, and if you want to look at the actual objects that our config is referencing, definitely check out our repo!


Thanks! I'll check it out, cheers


Exciting project, definitely taking a look at this.


Thank you we appreciate the support!


Is ETL/ELT same as writing SQL scripts and periodically executing them? I assumed there's more to it.


Sometimes there is more to it, like pulling data from external services or running ML, but other than that, yeah it's SQL, DAGs, and cronjobs


> Serra is a low-code, object-oriented ETL framework that allows developers to write PySpark jobs easily—think end-to-end dbt with the benefits of object-oriented Spark.

Could you please explain this as if I am three years old? (also, I don't know dbt)


Sure, I’ll clarify some of the terms used in that one-liner in case it’s helpful for anyone else as well.

ETL is the process of extracting transforming and loading data from a source to a destination in a data pipeline. Spark, an engine for large scale data processing, allows us to write code that can work with large amounts of data. dbt is a tool you can use to break up your SQL scripts into smaller “models” - other SQL scripts that can be reused and tested.

We describe ourselves as end-to-end because we also have extractors and loaders, whereas dbt focuses on the T (transformation step of ETL). Each of our steps involved in extraction, transformation and loading corresponds to a specific Python object defined in our Python framework. I have also updated the README in our repo to hopefully better explain how the config file links to user-defined readers, writers, and transformers.


Thanks


If it is really a dbt clone it is an ELT tool not ETL:

https://en.wikipedia.org/wiki/Extract,_load,_transform

https://en.wikipedia.org/wiki/Data_build_tool

It's about (big) data munging.


Thanks for these links! We consider ourselves an ELT and ETL tool—if you run a Serra job in your own warehouse (ie Databricks), you can easily specify extracting from AWS, loading the parquets into your warehouse, then transforming them with our config block approach (ELT).

The same is true for ETL. If you have a spark cluster separate from your warehouse, you can define your config file to run in the order E T L: you can extract from your data source, run the transformations on a separate cluster, then load it to your warehouse.


Finally a competitor to dbt. The world needs this!


There is also sqlmesh (https://sqlmesh.com/). Pretty new as well. It introduces some interesting concepts. For smaller dbt projects it could be a drop-in replacement as it allows importing dbt projects.


I agree. The open source flavor of dbt Core is very well-designed and almost as complete as you would want it, but any competition is good competition


> easy by replacing brittle SQL scripts with object-oriented Python

There is a lot to unpack here. Can you explain this in more detail?


Sure, our approach is to define Python classes to handle reusable steps for reading, transforming or loading data. For example, we have a MapTransformer, CastColumnsTransformer, GeoDistanceTransformer.

Each class specifies some configuration needed for the "step" and can then be used in the config file to construct a full ETL job. You can write unit tests for custom transformers you create as we have shown in the tests/ directory.
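Here's a minimal sketch of the pattern (the class name and config shape are simplified for illustration and aren't our exact interface):

    from pyspark.sql import DataFrame, SparkSession
    from pyspark.sql import functions as F

    class CastColumns:
        """Illustrative step: cast a set of columns to target types."""

        def __init__(self, casts: dict):
            # e.g. {"amount": "double", "signup_date": "date"}, typically
            # supplied from the step's block in the config file.
            self.casts = casts

        def transform(self, df: DataFrame) -> DataFrame:
            for column, dtype in self.casts.items():
                df = df.withColumn(column, F.col(column).cast(dtype))
            return df

    def test_cast_columns():
        # Unit tests can run against a tiny local DataFrame, no warehouse needed.
        spark = SparkSession.builder.master("local[1]").getOrCreate()
        df = spark.createDataFrame([("1.5", "2023-08-14")], ["amount", "signup_date"])
        out = CastColumns({"amount": "double", "signup_date": "date"}).transform(df)
        assert dict(out.dtypes) == {"amount": "double", "signup_date": "date"}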

I have also updated the README in our repo to hopefully provide a better explanation of how our config file connects to specific Python objects.


This sounds very similar to Apache Airflow. How would you compare them?


We see this working more alongside Airflow—we see Airflow mainly centered as an orchestrator/scheduler to chain together your ETL steps after you've written your data transformations and connections. With Serra, we're a flexible dev tool to write these transforms and connectors. I think you could accomplish something similar but the implementation would be unwieldy (ie breaking up 4 SQL script tasks into 40 modular transform blocks), whereas we see ourselves falling into the camp of being able to integrate with Airflow and have great error logs in those 4 SQL tasks that you have simplified into Serra configs.
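For example, a sketch with Airflow's stock BashOperator (the serra command and file name here are placeholders, not our documented CLI):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # A Serra job becomes one task in an existing Airflow DAG; Airflow handles
    # scheduling and retries, Serra handles the transforms and connectors.
    with DAG(
        dag_id="serra_subscriptions_daily",
        start_date=datetime(2023, 8, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        run_serra_job = BashOperator(
            task_id="run_serra_job",
            bash_command="serra run subscriptions_job.yml",  # placeholder command
        )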


It's not a DBT alternative if it's not SQL-oriented.


A quick note that the "open source" license they use requires that license keys and activation checks which block feature activation in the "open source" software be preserved.

This license was popular w folks like Apollo who arguably hijacked nearly 700 contributors' efforts w a license like this. Because they are using it from the start, at least that won't be as bad


> This license was popular w folks like Apollo who arguably hijacked nearly 700 contributors' efforts w a license like this. Because they are using it from the start, at least that won't be as bad

Which Apollo are you talking about? Only one I know is Apollo GraphQL, and their main server package seems to be MIT, so I must be looking at the wrong thing. What's the story?


Apollo GraphQL is not MIT. Their Gateway, federation libraries, and all versions of the Router are under the Elastic License v2.

https://www.apollographql.com/docs/resources/elastic-license...


As I mentioned, their main GraphQL server package is[1], so that's where the confusion came from. Thanks.

[1] https://github.com/apollographql/apollo-server/blob/9817bc47...


It's not an open source license. Maybe that's why the quotes but let's not indulge them in their lies.


Congrats on launching.


> It’s open core

So it's proprietary. Stop trying to spam HN with fake open source.




Another YC launch advertising as open source while using a non-osd-adhering license (ELv2 in this case). I respect your right to choose a license that protects your efforts, but calling this open source will be misleading to many.


I put the word "open source" in the title, and am happy to change it if someone has a better term. (I'm not up on license subtleties.)


Thanks dang, "source available" is pretty common for licenses like the ELv2 used here.


As more and more startups are going open source, source available, open core, etc., I need to figure out how to do Launch HNs without triggering off-topic controversies around the term "open source". My problem is, there's no consensus among HN readers about what the term means.

If anyone has a suggestion about how to solve this problem in an accurate and neutral way, I'd like to hear it.


I understand your frustration.

IME, HN tends to use the term open source in two senses. It can either refer to:

- the license or;

- the business model.

And we know that licenses exist on a spectrum of permissive to restrictive.

So when the community is presented with a for-profit entity in a Launch/Show HN, they tend to dwell on the 2nd sense.

If it’s a side project that’s on display, then the 1st sense kicks in.

Based on this, I’d like to offer the following colloquial interpretations for the terms you mentioned.

1. Open source: permissive (or more correctly, well-known) licenses like MIT, Apache, BSD, GPL, LGPL etc that do not prohibit commercial derivatives (or prevent cloud hyperscalers like AWS from using it).

2. Open core: our code is split into 2 parts: the open source bit (often under a permissive open source license in #1) to attract fellow devs and the closed source bit. The closed source bit is how we plan to make money.

3. Source available: we plan to make money however we see best so as insurance, our code can only be available under an obscure license that was designed to be restrictive.

So, I think what's really happening is that labelling something "open source" will cause the community to quickly point out that said license is restrictive.


(Here is an example from another post on the frontpage where the community is engaging in the 1st sense on a side project: Show HN: Little Rat – Chrome extension monitors network calls of all extensions

https://news.ycombinator.com/item?id=37122927 )


Thanks! that's helpful. I've changed the wording to "open core" above.


OSI maintains a list of open source licenses which is as close to an industry consensus as you'll find. If a license is on that list I don't think many would say it's not open source.

https://opensource.org/licenses/

That's for software only, if it's an AI model all bets are off.


First, if you don't know what open source is, don't add that phrase to titles at random. The fact that it's launched with YC's help doesn't actually help your case. Adding the feel-good phrase "open source" to benefit the hand that feeds you is pathetic. Because right now the damage is already done: many people saw it and will associate Serra with open source, which it isn't.

Second, if you are adding it, add the definition you are using (not supported? then someone should implement it). Make sure that the definition you are using factually describes the license used. Don't use some fringe BS as your definition so you can crowbar it in every time you find it convenient.


No one "knows" what the meaning of a disputed term is; it's disputed.


> My problem is, there's no consensus among HN readers about what the term means.

Open Source is defined here, by the people that invented the term: https://opensource.org/osd/

The vast majority of HN readers would support this over any other definition.

Anything that provides access to source but doesn't allow forks, commercial use, competition, removal of advertising, etc is 100% not open source.


This comment would be more helpful if you could summarize the pitfalls of people relying upon ELv2. My impression of these variations is that they are generally used to protect the authors from a giant corp using the software to create a cloud service of some kind?


There is no problem with people relying on ELv2 license. Just don't call your project open-source because ELv2 is not an open-source license.


It is open source in the sense that the source is open: you can go and look at it. It's even free open source in the sense that you can take it and use it in your own commercial project without the need to compensate its authors.

The only limit is that the project you're building with it can't be a hosted-service version of the software itself - which is, I assume, what Serra's business model will be.

I don't think that "Open Source" just means Apache 2 and MIT licensed stuff - and in fact I feel that the license Serra chose is one of the most generous OSS licenses that still retains just enough rights for the authors to make a living.


https://opensource.org/osd/

> Introduction

> Open source doesn’t just mean access to the source code. The distribution terms of open-source software must comply with the following criteria

There’s a definition. This isn’t open source.


And how does the OSI derive its legitimacy as the steward for all things open source? As far as I am concerned, it is just one body with its own private viewpoint, not a universal lawmaker for all open source devs.

In general, while I appreciate the work of the OSI, I believe that they are too idealistic in their viewpoint, derived from the world of Linux and early open source.

In my view, if we want to maintain a healthy and growing open source ecosystem, we must allow the makers of great OSS to be sustainable and monetize their creation. I don't believe that that's an inherent conflict with the spirit of OSS.


> And how does the OSI derive its legitimacy as the steward for all things open source?

We give it to them.

OSI isn't an entity that's existed since the beginning and the original definition doesn't come from them.

I agree that it's not a space that doesn't change.

Having said that, change must come from the community. It can't be just a couple corporate entities defining their own license, calling it open source, and going against the established definition.

> In my view, if we want to maintain a healthy and growing open source ecosystem, we must allow the makers of great OSS to be sustainable and monetize their creation. I don't believe that that's an inherent conflict with the spirit of OSS.

I don't mind monetization don't get me wrong.

Should it change? I'm not the authority on it. I'm just saying it's not the current definition.


Usage determines definition, not the OSI.

At the end of the day, Pure Open Source™ modulo one very narrowly defined prohibited use is good enough for everyone except product managers at large public cloud companies and people who want to argue about ideological purity. It provides all of the same benefits.


I get what you’re saying. The term for that is “source available” though.

Open Source has a specific meaning to the people who frequent this site. And I get how YC companies are scrutinized more for vague and misleading promises.

What if someone said their app was free to use. And then somewhere far in the sign up flow, it turns out you are required to pay. And then the app developer claims “well, you’re free to use it, but you do have to pay”. It’s not that that sentence can’t mean what the developer says it does. But they should take into account what people will think it means.


>where pages from 12am to 5am were guaranteed, and you'd have to dig through thousands of lines of someone else's SQL

Jfl … can’t find the words …


[flagged]


That was my mistake, not the founders', and I've changed the term to source available (edit: now open core) to try to avoid misunderstanding. I assure you there was no attempt at fraud!


Thanks! Do not make this mistake again!


I will try! but it is not so easy, because there's no consensus on what these terms mean.



