
Do you have any thoughts on how sdv-dev type projects can be used to populate, say, a database (e.g. MySQL running in a container)? I've looked into this space a bunch (e.g. Gretel, Tonic) and there doesn't seem to be a good solution that works end-to-end. Privacy Dynamics is quite cool, but ideally I'd like something super lightweight that can be pointed at a source db of some sort and then write to a sink (maybe applying a transformation layer in the middle).



Curious what a good end-to-end solution looks like for you? Is it more about ease-of-use (import/export with minimal effort) or is there a privacy layer that's missing?

I see it in 4 steps: 1. Connect to a source db to import your data. 2. Train a Gen AI model on the source data. 3. Use it to create synthetic data. 4. Export the synthetic data into a new db.

The SDV team is working on business solutions to cover the full use case. You can use the public SDV to validate steps 2 and 3.


It's not necessarily about the privacy layer per se. The workflow I was ideating over is as follows:

1. spin up a production-equivalent database (eg: mysql container instead of prod RDS)

2. point a process/binary (maybe a simple container) to:

-- source db (RDS)

-- sink db (mysql container)

-- transformation function (that may use gen AI, etc) to seed sink db with synthetic/anonymized data [there may be some parallel process to enable testing of this transformation function]

3. profit (use this for dev etc)
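As a rough illustration of step 2, here is a minimal sketch of a source -> transform -> sink copy. It makes several assumptions: in-memory SQLite stands in for both the RDS source and the MySQL container, the `users` table and its columns are made up, and `anonymize_users` is a placeholder for whatever transformation (gen AI or otherwise) you would actually plug in. The names `copy_with_transform` and `anonymize_users` are hypothetical, not from any library.

```python
import random
import sqlite3


def copy_with_transform(source, sink, table, transform, seed=0):
    """Read every row of `table` from source, transform it, write it to sink."""
    rng = random.Random(seed)
    cur = source.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    out = transform(rows, rng)
    placeholders = ", ".join("?" for _ in columns)
    sink.executemany(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in out],
    )
    sink.commit()
    return len(out)


def anonymize_users(rows, rng):
    """Placeholder transformation: replace emails, shuffle ages across rows."""
    ages = [r["age"] for r in rows]
    rng.shuffle(ages)
    return [
        {"id": r["id"], "email": f"user{r['id']}@example.com", "age": age}
        for r, age in zip(rows, ages)
    ]


# Assumed schema, shared by source and sink for simplicity.
SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, age INTEGER)"

# Stand-in for the production source (RDS in the workflow above).
source = sqlite3.connect(":memory:")
source.execute(SCHEMA)
source.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "alice@corp.com", 34), (2, "bob@corp.com", 41), (3, "carol@corp.com", 29)],
)
source.commit()

# Stand-in for the dev sink (the mysql container in the workflow above).
sink = sqlite3.connect(":memory:")
sink.execute(SCHEMA)

written = copy_with_transform(source, sink, "users", anonymize_users)
sink_rows = sink.execute("SELECT id, email, age FROM users ORDER BY id").fetchall()
```

Keeping the transformation as a plain function of rows makes the "parallel process to test the transformation" easy: it can be unit-tested without either database. For a real source and sink you would swap the SQLite connections for the appropriate drivers and adjust the placeholder syntax accordingly.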

The key here would be the speed of step (2) if the entire pipeline were to run end-to-end on demand. Do you have any examples of using SDV to achieve this? It's highly possible that there's already something in the docs that I've missed.


This is what I am trying to solve by building Data Catering (https://data.catering/). It lets you generate data into any database, maintaining any relationships between the data, using metadata retrieved from a source database or from other metadata sources (e.g. OpenMetadata).
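To make "generate from metadata while maintaining relationships" concrete, here is a generic sketch of the idea. This is not Data Catering's API or config format: the `METADATA` layout, the column kinds, and the `generate` function are all hypothetical, and it assumes parent tables are listed before their children and that every primary key column is named `id`.

```python
import random

# Hypothetical metadata: table -> row count and column kinds.
# ("fk", "customers") means "pick an existing id from the customers table".
METADATA = {
    "customers": {"rows": 3, "columns": {"id": "pk", "name": "name"}},
    "orders": {
        "rows": 5,
        "columns": {"id": "pk", "customer_id": ("fk", "customers"), "amount": "amount"},
    },
}


def generate(metadata, seed=0):
    """Generate rows table by table so foreign keys always point at real parents."""
    rng = random.Random(seed)
    data = {}
    for table, spec in metadata.items():  # relies on parents coming first
        rows = []
        for i in range(1, spec["rows"] + 1):
            row = {}
            for col, kind in spec["columns"].items():
                if kind == "pk":
                    row[col] = i
                elif kind == "name":
                    row[col] = f"customer_{i}"
                elif kind == "amount":
                    row[col] = round(rng.uniform(1, 100), 2)
                else:  # ("fk", parent_table)
                    parent_rows = data[kind[1]]
                    row[col] = rng.choice(parent_rows)["id"]
            rows.append(row)
        data[table] = rows
    return data


tables = generate(METADATA)
```

The point of the sketch is only the ordering: parents are generated first, and child foreign keys are sampled from ids that already exist, so the output can be inserted into a database with constraints enabled.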



