Show HN: Dropbase 2.0 – Turn offline files into live databases (dropbase.io)
261 points by jimmyechan on Aug 17, 2020 | hide | past | favorite | 114 comments



Hey HN,

We're happy to introduce Dropbase 2.0! It's a tool that helps you bring offline files, such as CSV, Excel, and JSON files, into a Postgres database. You can also process your data before uploading it, using a spreadsheet-like interface or by writing a custom Python script. Once your data is in the database, you can query it with any third-party tool (credentials are provided). You can also access your data via a REST API (powered by PostgREST) or create custom endpoints to serve a more specific use case.

A bit about the tech:

Currently, we support .csv, .json, .xls, and .xlsx files. For data processing, we use Pandas, so if you are comfortable with Python, you can write your own custom functions to process the data. We also give you a free shared Postgres database to test the tool with (your data is isolated and hidden from others). Each of these databases comes with an instance of PostgREST preinstalled, so you can query your database through a REST API (http://postgrest.org/en/v7.0.0/). You can also generate an access token with an expiry date to share your data with others.
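As a rough sketch of what querying such a database over PostgREST looks like (the endpoint hostname, table name, and token below are placeholders, not real Dropbase values — PostgREST filters use the documented `column=operator.value` syntax):

```python
from urllib.parse import urlencode

# Hypothetical PostgREST endpoint for an uploaded table called "people".
base = "https://example-db.dropbase.io"

# PostgREST filter syntax: ?age=gte.18 means WHERE age >= 18.
params = urlencode({"select": "name,age", "age": "gte.18", "order": "age.desc"})
url = f"{base}/people?{params}"

# The generated access token would go in an Authorization header.
headers = {"Authorization": "Bearer <access-token>"}

print(url)
# https://example-db.dropbase.io/people?select=name%2Cage&age=gte.18&order=age.desc
```

A plain GET on that URL with those headers returns the matching rows as JSON.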

There are many more features that we baked into the product. Come check it out; it's open to the HN community.


Question about the TOS:

"However, by posting Content using Service you grant us the right and license to use, modify, publicly perform, publicly display, reproduce, and distribute such Content on and through Service. You agree that this license includes the right for us to make your Content available to other users of Service, who may also use your Content subject to these Terms"

Does this mean I should have no expectation of privacy or control over anything I upload?


Your data is private and you own all of your data. We do not and will not share your data with anybody else unless you share it yourself through the sharing of projects, pipelines, endpoints, or exports. We do however store and process your data. We also let you generate endpoints so we need some wording to cover these cases. We'll double check our terms to make this point clearer, but we added this because you can generate live endpoints and you can share those.


Unfortunately, IANAL, and the formulation of the ToS/PP of yours, and that of most other online service providers, always gives me that nagging feeling that the legalese leaves so many loopholes and texts open to different interpretations that effectively - even though it may seem otherwise - I have no privacy guarantees whatsoever. That might be entirely unwarranted of me, but the feeling is there. Unease.


> even though it may seem so - I have no privacy guarantees whatsoever.

I mean, fundamentally, really consistent security is hard; and the best you can reasonably expect from someone you're not paying is "best effort". For them to make real promises about security opens them up to being sued if they fail; it's not really reasonable to ask someone to do that unless you're paying them a reasonable chunk of cash to offset that risk.


Sorry, but this almost feels like a GPT-3 response to me.

I don't see what security, paid vs. free, or best effort has to do with my argument, which is that the loopholes in legalese are so hard to spot for anyone but a lawyer that effectively my data might still be used in any way, possibly against my wishes or expectations (but which becomes legal once I consent to the PP and T&C).


We have also stumbled upon this point in the T&C. A real show-stopper for us as we are planning to work with sensitive client data.


We're working on an on-prem / self-hosted version of Dropbase that will better address use cases where sensitive data needs to stay within the company.

Would you mind describing the kind of data you work with and which constraints you're subject to?


Looks very cool so far, congratulations!

I have two questions:

- How do you handle incremental loads from files (or even google sheets)? Am I able to only load the diff, load full snapshots and get bi-temporality, etc?

- Are you supporting PostgREST as sponsors in any way? It's one of the most solid tools I've ever used, and I love to see that companies build great products on top of it!


- We use pandas to process data and load it to Postgres using .to_sql (https://pandas.pydata.org/pandas-docs/stable/reference/api/p...). For incremental loads, we set "if_exists" to "append". We're working to add more flexibility to the load function so you can specify how to load your data, handle conflicts, and so on. We're open to suggestions.
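A minimal sketch of that append pattern (this is an illustration, not Dropbase's actual code — sqlite stands in for Postgres here so the example runs anywhere):

```python
import sqlite3

import pandas as pd

# sqlite stands in for the real Postgres target in this sketch.
conn = sqlite3.connect(":memory:")

# First upload: create the table from the initial file's rows.
pd.DataFrame({"id": [1, 2], "name": ["a", "b"]}).to_sql(
    "contacts", conn, index=False, if_exists="replace"
)

# Incremental load: if_exists="append" adds new rows
# without touching the existing ones.
pd.DataFrame({"id": [3], "name": ["c"]}).to_sql(
    "contacts", conn, index=False, if_exists="append"
)

count = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
print(count)  # 3
```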

- PostgREST is great. We are not sponsors at the moment, but looking to do this when we can!


I've used Pandas recently; not sure if this will ever help you, but dictionaries are much faster if you continually add rows.

pandas.concat and similar functions for appending to a Pandas dataframe can be quite slow. Just mentioning this in case you ever encounter the issue; maybe you never will. In my case, switching to dictionaries for the important parts cut execution time from 2 minutes to 5 seconds.

However, in my case, I have to change the logical row structure, not just read in rows as-is.
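For illustration, the two patterns side by side (a rough sketch; the exact timings will vary by machine):

```python
import time

import pandas as pd

N = 1000

# Slow: repeated pd.concat copies the whole frame on every
# iteration, so building N rows this way is O(N^2) overall.
t0 = time.perf_counter()
df = pd.DataFrame([{"x": 0}])
for i in range(1, N):
    df = pd.concat([df, pd.DataFrame([{"x": i}])], ignore_index=True)
slow = time.perf_counter() - t0

# Fast: accumulate plain dicts in a list and build the DataFrame once.
t0 = time.perf_counter()
rows = [{"x": i} for i in range(N)]
df2 = pd.DataFrame(rows)
fast = time.perf_counter() - t0

assert df.equals(df2)  # same result either way
print(f"concat: {slow:.3f}s  dict list: {fast:.3f}s")
```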


thank you for the advice! we'll consider it


PostgREST links are broken. Here is the correct link: http://postgrest.org/en/v7.0.0/


Can this be installed on-premise?


If not I really don't like how it says "offline files".

I can install a database on-prem and upload files to it and then I can too query it as if it was in a database.

From what I've seen on the site, this seems like a fancy cloud database client.


Good point. We can probably work on clarifying our value prop. You could do something like what you described with a local database. We could add that by building some desktop code. At the moment we are only focused on the cloud part. That way we can get data, let you process it, and also easily share as APIs or endpoints.


Not at the moment, but we're actively developing a teams/enterprise version of this that allows for self-hosted/on-prem.


On a somewhat unrelated note, the design of this landing page is fantastic. It is exactly how I like to have new tools presented to me. All the fundamental competencies of the tool displayed on one page, with sufficient, but not verbose, technical detail. Kudos.


Yeah, although in my case it would have been nice to have a 'light' version, as I have some trouble reading the dark grey text on a black background in broad daylight.


Working on this.


Judging from the source code, it appears to be a webflow site — https://www.webflow.com


Yes!


I too very much like this landing page. Is it a custom one? And if so, are there any services that would provide a template like this one? I would like to have one like this, but unfortunately, I'm far from a CSS expert.


Check out TailwindUI[0]; they have a bunch of surprisingly high-quality blocks for building these types of pages. I use them all the time!

0: https://tailwindui.com


TailwindUI is fine so long as you don't want to customise the blocks they give you. As soon as you want to do that you're in a world of pain.

Example: I wanted to use their "Hero Sections - with angled image on right" without the navigation where shown (was going to add a standard top navbar). As soon as you delete it, the angled bar doesn't reach to the top. I ended up keeping an empty navbar in that place to keep the block looking right.


Thanks! We also added shots and short videos of the product. We don't like it when we can't see what we get ;)


Basically a rip off of https://www.linear.app


Imitation is the highest form of flattery


Is there a light mode option for the site? Or is iOS just not selecting it for some reason?

My astigmatism makes reading dark UIs migraine inducing, so as cool as this sounds I unfortunately can’t read more about it without triggering a migraine. x_x

(Maybe still default to the dark UI, but if the user has light mode enabled it uses a light UI?)


Oh, so that's what makes dark themes so hard to read for me. Unfortunately there's no easy way out for me, since my eyes are photosensitive due to a separate complication. Between a rock and a hard place :p


I'm with you in the sense that I just learned through these posts that astigmatism and website dark modes don't go well together.

I wonder if lighter colors and soft grey palettes would work for your case. have you experimented with colors that are easier on your eyes, given your complications?


I haven't actually. I always just used dark mode and assumed the additional difficulty was a drawback everyone experienced and learned to live with it. Now that I know that's not the case I'll see if I can find a color scheme that works well, like you suggested :)


Isn’t ‘photosensitive’ what all eyes are? ;)

Anyway, +1 for astigmatism and extreme sensitivity to glare. The rise of dark themes is something of a curse for my ability to interact with interfaces these days.


I don't like reading pages with dark background either. I just do ctrl+A on such pages so that the text background becomes blue, making it a little better to read.


This is a good workaround! Thanks for suggesting it.


on Firefox, you can remove all page styling too. I don't know how to do it on the hamburger menu, but do Alt-V, Page Style, No Style


There is also the reader mode but it doesn't work on all pages.


I also have astigmatism which makes all dark mode websites and apps difficult to read. As "cool" as this trend and previously "hacker" color scheme has gotten, don't remove light modes!


It's not just a "cool" "hacker" color scheme: some people actually have different eyes than you, and now you know how they feel in light mode.


Sorry about that. Not at the moment, but we'll consider adding a light mode for both the website and the app.


Thank you for the response. :)

I’m used to getting “most of our users don’t have an issue, so we don’t care” responses (Robinhood did this for a while before they finally added a toggle, and Spotify straight up doesn’t care), so having a company actually note they’ll look into doing it if possible is really appreciated by me.


This is very cool. I think there's a lot of room to grow this space: local "folders" that do some "magic" in the cloud.

Obviously, sync (Dropbox) is just the beginning, and Dropbase takes it a step further. There's been times where I had a (big-ish) CSV and wanted to run a few tests/queries on it. Auto-importing it into some database and being able to run SQL/Python on the dataset (without bootstrapping that locally) would've been a godsend.

Good luck with this!


Thank you! Please give it a try and let us know if you have any feedback.

One of the features we added is to make the "magic" replicable (or deployable in production). We keep track of processing steps and let you export Python code that applies those same processing steps. This can be used to run the exact same steps on a larger version of that dataset later.


"Mounting" CSVs is something sqlite can do: sqlite.org/csv.html
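The linked extension mounts a CSV as a virtual table inside sqlite itself; as a rough stdlib-only stand-in for the same idea (this is an illustration, not the extension's API):

```python
import csv
import io
import sqlite3

# A small CSV, as it might arrive from a file on disk.
data = io.StringIO("city,pop\nOslo,700000\nBergen,290000\n")

# Load the rows into an in-memory sqlite table, then query with SQL.
reader = csv.DictReader(data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city TEXT, pop INTEGER)")
conn.executemany("INSERT INTO cities VALUES (:city, :pop)", reader)

# sqlite's INTEGER affinity coerces the numeric strings for us.
total = conn.execute("SELECT SUM(pop) FROM cities").fetchone()[0]
print(total)  # 990000
```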


If you're open to PHP, I found the perfect tool for myself. The Laravel framework has an ORM called Eloquent. Usually it maps models to database tables (hence ORM), but you can also point it at a CSV, which it then maps into an in-memory sqlite database.

You can then work with it through the ORM's methods, with regular SQL, or with external tools like Tinkerwell that display the data in a tabular fashion.


For future reference: sqlite can import field-record files (like csv) and run queries on them.


Perl’s DBD::CSV can do this. I would be surprised if Python didn’t have something similar.


This reminds me of @BrandonM's famous reply to Drew Houston[1] :) Of course there are ways of doing it. But sticking something in a folder and stuff just automagically "working" is a much more pleasant workflow -- and more importantly, how you create value. Jimmy, I'd say you're in good company!

[1] https://news.ycombinator.com/item?id=8863


Thank you. That comment is legendary!


Please don't forget about BrandonM's follow up comment though!

I've seen the thread linked as 'tech people don't appreciate simplicity', but he actually acknowledges Dropbox could be very useful and wishes it success.

The other criticisms were also very valid at the time, and were acted upon.


csvkit: you can run sql queries and much more from the command line.


Gitlab has a similar project that will load google sheets as a dataframe.[1]

At my current company, we load google sheets into s3, then mount those files as external tables. There has not been a commit in years, meaning it has worked out well for us.

What seems to be missing in these solutions, and what Dropbase provides, is a UI to guide users through the process.

[1] https://gitlab.com/gitlab-data/analytics/tree/master/extract...


Thanks. That's a useful project! And yes, we aim to make data processing easy (through UI, low code) and easy to reuse/export (by converting UI steps 1:1 to code)


My favorite incarnation of this idea is probably Datasette.

https://github.com/simonw/datasette


I just had a friend mention this too. I checked it out and it's really cool! What's your favorite thing about it?


I tend to think of software in terms of composable units, so Unix-like utilities are very attractive in my workflow, and Datasette just fits right into that model. Datasette is easy to deploy and does one thing well. I can use it on my little single-board computer I use for hobby projects and allow other machines on my network to have an API to view a database a daemon is populating there. But it works just as well to share larger, static data sets on the internet. It's just a tool that fits right into its niche in the stack and does its job really well (much like sqlite).


Thanks, that's a good point.

As engineers, we also tend to think in terms of modularity and control - call this "tool flexibility."

With Dropbase, we're balancing flexibility with the goal of creating an experience that works for users who can't directly work with these composable units.

How we balance experience vs flexibility is that we give users full control of the database and the processing steps (we even allow you to export Python code you can run anywhere else).

We found this is the right balance for the use cases we're targeting, although we're still doing a lot of research to figure out the right balance, and it might also evolve over time.


Good job. Channeling patio11: increase your prices, e.g. $49/$250/CALL US. Your customers are businesses, these numbers will be perfectly reasonable.


Thank you. Good point, we actually just had a few agencies/consultancies ask about self-hosted / enterprise options.

We are offering our current pricing to early adopters on HN. We'll likely increase prices once we do a general public launch.


If you do this, please offer nonprofit and student/academic plans! Many people in social sciences and the nonprofit world don't have the engineering resources to build data pipelines, nor the budgets for a $250/mo plan. But they're spending every day slicing Excel files of surveys and risk assessments and potential donors and the now-departed intern's messy list of average flight speeds of unladen swallows.

In all seriousness, this product could bring leverage to those in society who could have the most impact. Design is brilliant, the pipeline idea is brilliant, I can see this really gaining traction.


Thank you so much!

Yes, we will offer non-profit and academic plans.


I really like this. I could see myself using this in the future for some personal projects or for prototyping.

What I would really love though is something a little more similar to Dropbox, with tight integration to the user's filesystem, and keeping the spreadsheet as the source of the data.


Spreadsheet view for all your data is on our roadmap. Integrating into the user's file system is something we'll definitely explore; it sounds quite interesting. These are great suggestions, thank you!


This is just excellent. I do a lot of ETL work and need to build custom workflows for it; this is exactly what I have been looking for. Good job, team!


Since you brought up ETL, you may also be interested in Meltano (https://meltano.com), an open source ETL tool we've been working on at GitLab for a few years now!

I shared some thoughts on how it compares to DropBase in another comment: https://news.ycombinator.com/edit?id=24194916

If you end up giving it a try, I'd love to hear what you think :)


Thank you! Will look into it.


Thanks! Give it a go and let us know if there's anything we can do to make this better. We let you export Python code that maps 1-to-1 to any processing step you take on the UI.


Thank you! Will do :)


Congrats! Looks like it could be very useful!

Just a tiny thing I noticed. The free plan is usually mentioned on the left of the page. ( https://www.dropbase.io/pricing )

Or was the page layout put intentionally that way?


Updated the page. Free plan on the left! Thanks!


I would love to be able to import more than one json file or multiple URLs. This would allow me to migrate legacy databases through apis.


it's on our roadmap; we'll try to add support for multiple files and zip in the upcoming release


Some of the ideas are good, but it would be more interesting if it processed files in place, like Redshift Spectrum does, vs. loading first to Postgres. I know you are targeting smallish datasets, but eventually data size will go up, and loading everything into PG could become a scaling problem.


Thanks! This is a neat idea! It would allow users to upload unstructured data. We'll explore this.


This reminds me of https://www.visidata.org/ which is a terminal based application with similar purpose - loading tabular data from various sources, and exploring and processing it in a visual way.


Woah. I need this, but then with Vim bindings ... that would be my ultimate data browser


Looks useful, and the landing page is excellent.

Minor grammar fix: "See how your data looks like as you process it." -> "See what your data looks like as you process it."

(could be "how it looks" or "what it looks like", not "how it looks like")


Thank you for this! I appreciate it.

Updated it to: "Spreadsheet view. See how your data looks as you process it." Went with this suggestion to keep everything to 2 lines.


I've been looking for a spreadsheet-to-API-like GUI for quite a while now. Mainly for small projects around the house, NOT company-sized work.

Does anyone know of a self-hosted interface like Dropbase or Airtable?


This looks useful, I'd prefer self-hosted for some use cases though.


Thanks. Yes, in some cases where you're working with regulated data you'd need a self-hosted version. We're working on an enterprise version that allows this.

Would you be able to describe your use case and the kind of data you're using?


Sorry, what I have in mind is mostly just personal productivity stuff, relatively small data, but private.


Our BYODB plan lets you connect to 1 of your own databases. It's not quite self-hosted but you are in full control of your DB.


If you're looking for open source, self-hosted ELT, I suggest you check out Meltano (https://meltano.com), which we've been working on at GitLab since 2018!

We've decided to focus primarily on the CLI for the moment, which you can see showcased through the example code on the homepage: https://meltano.com, but we're working on a UI as well, as evidenced by today's release blog post: https://meltano.com/blog/2020/08/17/now-available-meltano-v1...

---

Meltano uses open source Singer taps and targets (https://singer.io) as its extractors and loaders, so to put together something similar to DropBase (which looks amazing, by the way), you could use:

- tap-spreadsheets-anywhere (https://github.com/ets/tap-spreadsheets-anywhere, https://gitlab.com/meltano/meltano/-/merge_requests/1813), which supports CSV and XLS over S3, HTTP(S), (S)FTP (etc),

along with:

- target-postgres (https://github.com/meltano/target-postgres, https://meltano.com/plugins/loaders/postgres.html),

- target-jsonl (https://github.com/andyh1203/target-jsonl, https://meltano.com/plugins/loaders/jsonl.html), or

- any of the others you find on https://meltano.com/plugins/extractors/, https://meltano.com/plugins/loaders/, or https://www.singer.io/.

---

For transformation, Meltano currently supports only dbt (https://www.getdbt.com/), which means that unlike DropBase, it's built for ELT rather than ETL, since transformation takes place inside the loading database, rather than in between the E and L steps.

I'm very interested in exploring the ETL direction more, though, because as DropBase clearly shows, there are still a lot of companies and people who may not be experts on SQL, but would benefit tremendously from sturdy ETL with an accessible interface and flexible integration points.

As I just wrote on our Slack workspace (there's a link on https://meltano.com):

> I’d love to see Meltano UI develop into that direction for simple transformations over Singer tap stream/entity schema and record JSON, so that we can do ETL as well as dbt-backed ELT.

> We’d probably start with a way of specifying transformations in `meltano.yml`, similar in spirit to the `select`, `schema`, and `metadata` extra’s (https://meltano.com/docs/command-line-interface.html#extract...), and/or by pointing at a Python file that can process each Singer message. Building a DropBase-style UI over that would be on the horizon too, once we’ve brought the Entity Selection interface back (https://gitlab.com/meltano/meltano/-/issues/2002) and add interfaces for metadata rules and schema overrides.

I'll create some more issues around this potential direction tomorrow :-)

---

If you or anyone else interested in open source, self-hosted ETL/ELT end up giving it a try, I'd love to hear what you think, so that we can figure out how to build it into this direction together!


Thanks for all the info - sounds interesting, I'll check it out.


Thanks for all the support, comments and feedback HN! It's been a really long day. I will respond to more comments in the morning!


This is very cool! I was building something similar but with a different crowd in mind, basically sitting in between business and ops people as a "data sanity gatekeeper".

I was debating going with a similar stack (postgres(t)) but am currently playing around with sqlitebiter. Cool to see a similar product like this!


Thanks! sqlitebiter looks interesting!


How does it work for deep JSON? Does it import it as JSON and keep the depth inside each row? Or is there an option to flatten the data, or spread across related tables?


Hey, I'm one of the engineers behind Dropbase. Currently all imported data needs to be structured, which means that the data needs to be formatted in either values, records, index, or column orientations (see https://pandas.pydata.org/pandas-docs/stable/reference/api/p...). Right now we can only auto-detect between those formats, but in the future we're looking to accept unstructured data as well.
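For illustration (a hedged sketch, not Dropbase's detection code), the same two rows expressed in two of those pandas orientations:

```python
import io

import pandas as pd

# "records": a list of row objects.
records = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'
# "columns": an object keyed by column, then by row index.
columns = '{"id": {"0": 1, "1": 2}, "name": {"0": "a", "1": "b"}}'

df1 = pd.read_json(io.StringIO(records), orient="records")
df2 = pd.read_json(io.StringIO(columns), orient="columns")

# Both parse to the same table.
print(df1.to_dict("records"))  # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
print(df1.to_dict("records") == df2.to_dict("records"))  # True
```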


Reading the headline I clicked this thinking it might give you an API for your file system and I had thoughts of managing / viewing my files via an API.


We could work on the wording. Let me know if you have any suggestions.

We don't let you manage your file system through an API, but we do offer access to your database tables through REST APIs. It's not the same, but you could hack it to work that way.


Perhaps change "offline files" to "offline data (files)", as just "offline files" is pretty ambiguous about what is supported (and how I arrived at file system).


Why not build a filesystem API without the need for importing first, so you can do queries on the file system?


That's a great suggestion. We were considering something like this for business or enterprise versions. It would also allow you to connect Dropbase pipelines to multiple files or entire folders in local or cloud storage.


tried to import a csv from file, failed. there was a CORS error shown in the dev console.


Are you still getting this issue? Could you check your file encoding and make sure it's UTF-8?


yes, just tried again, same issue. stuck at "Importing..." forever. I looked at the dev console.

Access to XMLHttpRequest at 'https://api.dropbase.io/v1/pipeline/HhNFQmZnnFkjrqK6Lptr3w/l... from origin 'https://app.dropbase.io' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.

The csv file is UTF-8 encoding.


Yes, there is a bug with error messages, we'll push an update with a fix later today.

If you want, you can also share the csv with us and we'll troubleshoot it on our end. just email us at hello@dropbase.io


Thanks for the feedback. We'll look into this!


Xlsx files with more than one sheet: only the first sheet gets imported?


Yes, we'll add multi-sheet import soon.


> Your password must be between 6 and 32 characters (inclusive) long.

Really?


Good point. We just added code to allow for up to 256 characters. We're testing this now and will push to prod end of day.


Is there a way to automate the data updating using JSON file?


If you mean automating data ingestion on a schedule, then yes, it's something we're building.

If not, could you clarify what you mean?


Would be amazing if you supported parquet :)


I agree, parquet cannot be ignored and we're planning to support it.

What would you be interested more in: loading data from parquet files or converting and storing your data in parquet?


Can it do joins/group by/etc.?


yes, you can run regular sql queries on your database, where you can join/group tables.

if you want to join/group static files (like csv/excel), then you'd need to import your files into the dropbase DB first and then run a sql query on that DB.

let me know if this answered your questions.
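A sketch of that import-then-join flow (illustrative only — sqlite stands in for the Dropbase Postgres database, and the tables are made-up stand-ins for two uploaded CSVs):

```python
import sqlite3

import pandas as pd

# Two "CSV" tables loaded into a database, as Dropbase would do on import.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"id": [1, 2], "country": ["NO", "SE"]}).to_sql(
    "users", conn, index=False
)
pd.DataFrame({"user_id": [1, 1, 2], "amount": [10, 20, 5]}).to_sql(
    "orders", conn, index=False
)

# Once both files are tables, joins and group-bys are plain SQL.
result = pd.read_sql(
    """
    SELECT u.country, SUM(o.amount) AS total
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.country
    ORDER BY u.country
    """,
    conn,
)
print(result.to_dict("records"))
# [{'country': 'NO', 'total': 30}, {'country': 'SE', 'total': 5}]
```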


no, it didn't. you should add a feature where you have 2, 3, 4, or 5 tables represented as CSVs and you can do joins on them


Any plans for sqlite?


Yes, we have more database types in our roadmap. At the moment we are starting out with Postgres. Are you trying to get data to a new sqlite or an existing one?




