I don't think so, I see them as complementary. MinIO is great when you have down...

mbreese · 2024-11-18T17:23:40 1731950620

I think it’s more analogous to MinIO’s discontinued proxy mode. This is where you’d talk to MinIO locally (using whatever interface/protocol) and it would act as a local cache for S3 objects. If you wrote to it, it would propagate the changes up to S3 proper (or whoever else speaks the S3 protocol).

I believe they stopped supporting that mode because they didn’t want to keep chasing every S3 protocol change. However, if you’re just using S3, and not trying to masquerade as S3, this problem becomes easier.

Jugurtha · 2024-11-18T21:13:28 1731964408

I think it's complementary as well, even more so after MinIO deprecated its Gateway and Filesystem modes a couple of years ago. MinIO is "S3 compatible" object storage, so technically, MinIO users should be able to use your product to get a file-system-like experience on their buckets and objects. You're using IAM, though, so there might be a need either for your client to handle pure S3 credentials or for a third-party plugin to your client to do that. It could be a good opportunity to piggyback on MinIO's userbase.

We had built an MLOps platform[0] a few years ago and enabled users to use their S3 buckets in a "file system like" manner. This made it possible for them not to have to know or write S3 specific code in their Jupyter notebooks as most people in the industry did with boto3, which also forced them to write code (say using TensorFlow) in a certain way for training to consume the files, err, objects. It was a mess, and we removed that for notebooks that could run the same way on a laptop or on the platform, even with the shell kernel so people could explore objects like files. MLFlow could work on a filesystem or on S3, but it had no authentication, so we built around that to know which user/experiment produced which artifact.

MinIO had a Gateway that was deprecated. We didn't use it much and they didn't have an admin client at the time, so I rolled one up to orchestrate the thing.

I did it that way, hooking into users' compute and storage as opposed to offering storage/compute, for three reasons:

- Organizations already had their data somewhere with established policies. Getting them to move that data is very hard (CISO, CTO, IT, legal, engineers). Friction would have been huge.

- Organizations already had budgeted compute and storage, they may have had contracts/discounts/credits with cloud providers and it didn't make sense to ask them to make a decision on budgeting for another solution.

- A design principle of having the product being able to die without leaving the users scrambling to exfil/migrate data.

One way to do it was to handle FUSE, and your mileage may vary (s3fs-fuse, goofys, etc). Amazon released Mountpoint last year[1], and one question you'll get asked is: why use Regatta when I could use Mountpoint?

Less friction for engineers and execs.

In any case, congratulations on the launch, man!

[0]: https://web.archive.org/web/20230325150132/https://iko.ai/

[1]: https://aws.amazon.com/blogs/aws/mountpoint-for-amazon-s3-ge...

huntaub · 2024-11-18T21:50:29 1731966629

We are finding a lot of success in the ML Ops space for exactly this reason. I also completely agree that enterprise customers want to keep their data where they can govern and audit it (often in S3). We're excited about the possibility to allow folks to access and use that data while it stays in S3 for primary storage.

I agree that the Mountpoint question will come up, but we're solving a very different set of problems than Mountpoint. Mountpoint, for example, isn't designed to be used with all file applications and lacks support for things like appends to existing files, random writes, renames, and symbolic links. On the other hand, Regatta supports POSIX semantics and can work with nearly all file-based applications.
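
To make that list concrete, here is a rough sketch of the four operations mentioned above (append, random write, rename, symlink), written against a plain directory. The function name `exercise_posix_ops` and the file names are made up for illustration, and the script only uses a local temp directory; pointing `root` at a mounted bucket is an assumption you'd have to try yourself, and nothing here is taken from Mountpoint's or Regatta's documentation.

```python
import os
import tempfile


def exercise_posix_ops(root):
    """Run append, random write, rename and symlink inside `root`.

    Returns the final file contents, read back through the symlink.
    `root` is any directory; whether a given mount supports every step
    depends on the file system behind it.
    """
    path = os.path.join(root, "data.bin")
    with open(path, "wb") as f:
        f.write(b"hello world")

    # Append to an existing file.
    with open(path, "ab") as f:
        f.write(b"!")

    # Random write: overwrite bytes in the middle of the file.
    with open(path, "r+b") as f:
        f.seek(6)
        f.write(b"WORLD")

    # Rename the file.
    renamed = os.path.join(root, "renamed.bin")
    os.rename(path, renamed)

    # Create a symbolic link pointing at the renamed file.
    link = os.path.join(root, "link.bin")
    os.symlink(renamed, link)

    with open(link, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(exercise_posix_ops(tmp))
```

On a local disk every step goes through. A lot of ordinary file-based tools do some mix of these under the hood, which is why support for them matters for "just point the application at the mount" use cases.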