Really interesting! Could you clarify what the difference is when similar queries end up with two different query_parameterized_hash values? Is there a performance hit?
Yeah - the idea is that Snowflake generates these after a query runs to help you look at multiple runs of the same query. So imagine you run a query like "select a from b where c = 1" and you want to find every time that exact query ran - that's where "query_hash" comes in. But Snowflake also lets you be generic about the parameters, so "where c = 1", "where c = 2", and "where c = 300000" all get the same query_parameterized_hash.
That's the intent, but it turns out it's only doing very simple hashing rather than looking at a canonical version of the query. For example, it won't treat aliases/renames as the same even though it arguably should. That makes it harder to find all the queries that are essentially doing the same thing.
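For anyone who wants to poke at this themselves, a rough sketch: group recent history by query_parameterized_hash and count how many distinct query_hash values (literal variants) fall under each. This assumes the snowflake-connector-python package and read access to SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY; the account/user values are placeholders.

# Rough sketch: for each query "shape", count runs and distinct literal variants
# (query_hash values) recorded over the last week.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",            # placeholder
    user="my_user",                  # placeholder
    authenticator="externalbrowser", # or whatever auth you normally use
)

SQL = """
select
    query_parameterized_hash,
    count(*)                   as runs,
    count(distinct query_hash) as literal_variants,
    avg(total_elapsed_time)    as avg_elapsed_ms
from snowflake.account_usage.query_history
where start_time > dateadd('day', -7, current_timestamp())
group by 1
order by runs desc
limit 20
"""

cur = conn.cursor()
try:
    cur.execute(SQL)
    for parameterized_hash, runs, variants, avg_ms in cur:
        print(parameterized_hash, runs, variants, avg_ms)
finally:
    cur.close()
    conn.close()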
Oh, that's really interesting! I imagine there could be a reason for it - for instance, data is distributed differently across micro-partitions, so different WHERE values could result in different data lookup patterns since you may skip more or fewer blocks. But overall this makes a lot of sense!
Totally agree. In my last job I was able to create my own ETL jobs as a PM to get data for my own analyses, and I figured out that a fairly minor configuration change could save us $10M per year. It came out of one of many random ETL jobs I created out of curiosity - jobs I might never have created if I'd been forced to rely on other people.
If you’d just had a business controller, you’d have x*$10M saved and more time for your PM role.
Yes, calling BS on leadership running their own SQL. Bring strategy and tactics, find good people, create clear roles and expectations, and certainly don't get lost running naive scripts you've written because you can do every role better than the people actually occupying them.
I know nothing about working at small firms, so that's probably very true - the smaller the firm, the more you do yourself. But ... if a company can save $10M ... it can afford a proper set of finance people.
What motivated this: I kept hitting token limits when working with Claude on tasks with many parts. For instance, Claude would recommend three different areas of code to work on, but by the time I started on the second area, I'd get token limit warnings. This happened because Claude needs to process the entire conversation history with each response, even though the discussion of the first area is often unrelated to the second.
This extension lets you fork the conversation at any point, preserving all context and files up to that point while resetting the token count. You can simply copy and paste it into a new chat (or download it as a file to attach) and continue your chat. Your conversation continues from the forked point as if it were the same conversation.
I've really enjoyed using Claude projects to help generate code drafts. However, one thing that's been painful is uploading new file versions after changes. I'd have to separately go through each folder in my project to upload the files again so Claude could have the most recent versions. It only took a few minutes each time... but it was a bit annoying.
As a result I built a file organizer for Claude projects. It selectively copies files based on extensions and .gitignore rules, organizing them into a target folder for easy uploading. It commits all the files in that folder to a git repository so they're not lost, deletes them, and then copies the updated files to that folder. This allows you to upload your latest file states to a Claude project without having to manually click through all your folders to select and upload specific files.
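For context, a rough sketch of that flow (not the actual tool - the folder names are made up and it leans on the pathspec package for .gitignore-style matching): commit what's currently in the staging folder, clear it, then re-copy the latest files filtered by extension and .gitignore rules.

import shutil
import subprocess
from pathlib import Path
import pathspec  # pip install pathspec (used for .gitignore-style matching)

SOURCE = Path("my_project")      # hypothetical source tree
TARGET = Path("claude_upload")   # folder you upload to the Claude project
EXTENSIONS = {".py", ".sql", ".md", ".toml"}

def load_gitignore(root: Path) -> pathspec.PathSpec:
    gitignore = root / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    return pathspec.PathSpec.from_lines("gitwildmatch", lines)

def refresh() -> None:
    TARGET.mkdir(exist_ok=True)
    if not (TARGET / ".git").exists():
        subprocess.run(["git", "init"], cwd=TARGET, check=True)

    # 1) Commit the current contents so the previous snapshot isn't lost.
    subprocess.run(["git", "add", "-A"], cwd=TARGET, check=True)
    subprocess.run(["git", "commit", "-m", "snapshot before refresh"], cwd=TARGET)  # no-op if nothing changed

    # 2) Delete the old copies (but keep the .git directory).
    for item in TARGET.iterdir():
        if item.name == ".git":
            continue
        if item.is_dir():
            shutil.rmtree(item)
        else:
            item.unlink()

    # 3) Copy the latest files that match the extension list and aren't gitignored.
    ignore = load_gitignore(SOURCE)
    for path in SOURCE.rglob("*"):
        rel = path.relative_to(SOURCE)
        if path.is_file() and path.suffix in EXTENSIONS and not ignore.match_file(str(rel)):
            dest = TARGET / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)

if __name__ == "__main__":
    refresh()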
Adding a Snowflake connection configuration option that follows standard Snowflake connection conventions might be a good idea. That way you could connect to Snowflake with your existing configurations (.snowsql/, .snowflake/), or explicitly specify connection details by adding matching args/params to your project's config.
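To illustrate the convention (this isn't melchi's actual implementation - the file location, env var, and profile names below are assumptions): load a named profile from a Snowflake-style connections.toml and let explicit args override it.

import os
from pathlib import Path
import tomllib  # Python 3.11+; use the "tomli" package on older versions
import snowflake.connector

def connect_from_profile(profile: str = "default", **overrides):
    # Hypothetical lookup order: env var, then ~/.snowflake/connections.toml.
    default_path = Path.home() / ".snowflake" / "connections.toml"
    path = Path(os.environ.get("SNOWFLAKE_CONNECTIONS", default_path))
    params = {}
    if path.exists():
        with path.open("rb") as f:
            params = tomllib.load(f).get(profile, {})
    params.update(overrides)  # explicit args/params from the project config win
    return snowflake.connector.connect(**params)  # e.g. authenticator = "externalbrowser"

# conn = connect_from_profile("dev", warehouse="ANALYTICS_WH")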
I just pushed the external_browser branch to GitHub. It should offer 1/ externalbrowser authentication and 2/ the ability to use TOML files with profiles, with instructions in the README. If you run the following you should be able to test it out.
git clone https://github.com/ryanwith/melchi.git
cd melchi
git checkout external_browser
# remaining steps are the same
Would you mind testing it out and letting me know if it works for you? Would really appreciate it!
Just replied to chrisjc that I created a branch with externalbrowser auth and pushed it to GitHub. Would you mind taking a look and letting me know if it works for you? Would love to get your feedback, as I don't have SSO set up in my account to test this myself.
That's really interesting! Could you tell me a bit more about what you're thinking? I'm not that familiar with SQL Mesh and the typical workflows there.
Not the original parent, so I'm unsure of their use case. But I've seen the approach where some basic development can be done on DuckDB before making its way to dev/qa/prod.
Something like your project might enable grabbing data (or subsets of data) from a dev environment (seed) for offline, cheap (no Snowflake warehouse cost) development, unit testing, etc.
I wanted to start with DuckDB since it's an incredibly powerful tool that people should try out. The performance you can get on analytical queries running on your local compute is really impressive. And with Snowflake streams you can stream live data into it without changing anything about your existing data. As for why not other databases: I wanted to focus on OLAP to start, since there are already great tools like DLT that help you load data from OLTP sources like Postgres and MySQL into OLAP destinations, but OLAP to OLAP is pretty rare.
Have you run into a use case for streaming data between data warehouses yourself yet? If so which warehouses?
Played with this and it did a great job of creating the initial framework for my most recent React app based on a few paragraphs of text. Would recommend trying it out.
Additionally, if you want to make any changes, like including only certain columns or changing column types/names, you can do that via the inputs below the generated SQL. Any changes you make are immediately reflected in the generated SQL.
One possible advantage I see is that it creates a 1:1 correspondence between a website and a file.
If what I care about is the website (and that's usually going to be the case), then there's a single familiar box containing all the messy details. I don't have to see all the files I want to ignore.
That might not be a benefit for you, and since I haven't used it, it's only a theoretical benefit in an unlikely future for me.
But just from the title of the post, I had a very clear picture of the mechanism, and it wasn't obvious why I would want to start with a different mechanism (barring ordinary issues with open source projects).
That the page HTML is indexable by search engines without having to render it on the server - for example by unzipping the archive into a directory served by nginx. You may also use it for archiving purposes, or for keeping backups.