Really interesting! Could you clarify what the difference is when similar queries end up with two different query_parameterized_hash values? Is there a performance hit?
Yeah - the idea is that Snowflake generates these after a query runs to help you look at multiple runs of the same query. So imagine you run a query like "select a from b where c = 1" and you want to find every time that exact query ran - that's where "query_hash" comes in. But Snowflake also lets you be generic about the parameters, so "where c = 1", "where c = 2", and "where c = 300000" all get the same query_parameterized_hash.
That's the intent, but it turns out it's only doing very simple hashing rather than looking at a canonical version of the query. For example, it won't treat aliases/renames as the same even though it arguably should. That makes it harder to find all the queries that are essentially doing the same thing.
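For anyone who wants to poke at this themselves, a rough sketch: group recent history by query_parameterized_hash and count how many distinct query_hash values (literal variants) fall under each. This assumes the snowflake-connector-python package and read access to SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY; the account/user values are placeholders.

# Rough sketch: for each query "shape", count runs and distinct literal variants
# (query_hash values) recorded over the last week.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",            # placeholder
    user="my_user",                  # placeholder
    authenticator="externalbrowser", # or whatever auth you normally use
)

SQL = """
select
    query_parameterized_hash,
    count(*)                   as runs,
    count(distinct query_hash) as literal_variants,
    avg(total_elapsed_time)    as avg_elapsed_ms
from snowflake.account_usage.query_history
where start_time > dateadd('day', -7, current_timestamp())
group by 1
order by runs desc
limit 20
"""

cur = conn.cursor()
try:
    cur.execute(SQL)
    for parameterized_hash, runs, variants, avg_ms in cur:
        print(parameterized_hash, runs, variants, avg_ms)
finally:
    cur.close()
    conn.close()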
Oh, that's really interesting! I imagine there could be a reason for it - for instance, data is distributed differently across micro-partitions, so different WHERE values could result in different data lookup patterns since you may skip more or fewer blocks. But overall this makes a lot of sense!
Totally agree. In my last job I was able to create my own ETL jobs as a PM to get data for my own analyses, and I figured out that a fairly minor configuration change could save us $10M per year. It came out of one of many random ETL jobs I created out of curiosity - jobs I might never have created if I'd been forced to rely on other people.
If you’d just had a business controller, you’d have x*$10M saved and more time for your PM role.
Yes, calling BS on leadership running their own SQL. Bring strategy and tactics, find good people, create clear roles and expectations, and certainly don't get lost running naive scripts you've written because you can do every role better than the people actually occupying them.
I know nothing about working at small firms, so that's probably very true - the smaller the firm, the more you do yourself. But ... if a company can save $10M ... it can afford a proper set of finance people.
What motivated this: I kept hitting token limits when working with Claude on tasks with many parts. For instance, Claude would recommend three different areas of code to work on, but by the time I started on the second area, I'd get token limit warnings. This happened because Claude needs to process the entire conversation history with each response, even though the discussion of the first area is often unrelated to the second.
This extension lets you fork the conversation at any point, preserving all context and files up to that point while resetting the token count. You can simply copy and paste it into a new chat (or download it as a file to attach) and continue your chat. Your conversation continues from the forked point as if it were the same conversation.
I've really enjoyed using Claude projects to help generate code drafts. However, one thing that's been painful is uploading new file versions after changes. I'd have to separately go through each folder in my project to upload the files again so Claude could have the most recent versions. It only took a few minutes each time... but it was a bit annoying.
As a result I built a file organizer for Claude projects. It selectively copies files based on extensions and .gitignore rules, organizing them into a target folder for easy uploading. It commits all the files in that folder to a git repository so they're not lost, deletes them, and then copies the updated files to that folder. This allows you to upload your latest file states to a Claude project without having to manually click through all your folders to select and upload specific files.
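For context, a rough sketch of that flow (not the actual tool - the folder names are made up and it leans on the pathspec package for .gitignore-style matching): commit what's currently in the staging folder, clear it, then re-copy the latest files filtered by extension and .gitignore rules.

import shutil
import subprocess
from pathlib import Path
import pathspec  # pip install pathspec (used for .gitignore-style matching)

SOURCE = Path("my_project")      # hypothetical source tree
TARGET = Path("claude_upload")   # folder you upload to the Claude project
EXTENSIONS = {".py", ".sql", ".md", ".toml"}

def load_gitignore(root: Path) -> pathspec.PathSpec:
    gitignore = root / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    return pathspec.PathSpec.from_lines("gitwildmatch", lines)

def refresh() -> None:
    TARGET.mkdir(exist_ok=True)
    if not (TARGET / ".git").exists():
        subprocess.run(["git", "init"], cwd=TARGET, check=True)

    # 1) Commit the current contents so the previous snapshot isn't lost.
    subprocess.run(["git", "add", "-A"], cwd=TARGET, check=True)
    subprocess.run(["git", "commit", "-m", "snapshot before refresh"], cwd=TARGET)  # no-op if nothing changed

    # 2) Delete the old copies (but keep the .git directory).
    for item in TARGET.iterdir():
        if item.name == ".git":
            continue
        if item.is_dir():
            shutil.rmtree(item)
        else:
            item.unlink()

    # 3) Copy the latest files that match the extension list and aren't gitignored.
    ignore = load_gitignore(SOURCE)
    for path in SOURCE.rglob("*"):
        rel = path.relative_to(SOURCE)
        if path.is_file() and path.suffix in EXTENSIONS and not ignore.match_file(str(rel)):
            dest = TARGET / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)

if __name__ == "__main__":
    refresh()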
Adding a Snowflake connection configuration option that follows standard Snowflake connection conventions might be a good idea. That way you could connect to Snowflake with your existing configurations (.snowsql/, .snowflake/), or explicitly specify connection details by adding matching args/params to your project's config.
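To illustrate the convention (this isn't melchi's actual implementation - the file location, env var, and profile names below are assumptions): load a named profile from a Snowflake-style connections.toml and let explicit args override it.

import os
from pathlib import Path
import tomllib  # Python 3.11+; use the "tomli" package on older versions
import snowflake.connector

def connect_from_profile(profile: str = "default", **overrides):
    # Hypothetical lookup order: env var, then ~/.snowflake/connections.toml.
    default_path = Path.home() / ".snowflake" / "connections.toml"
    path = Path(os.environ.get("SNOWFLAKE_CONNECTIONS", default_path))
    params = {}
    if path.exists():
        with path.open("rb") as f:
            params = tomllib.load(f).get(profile, {})
    params.update(overrides)  # explicit args/params from the project config win
    return snowflake.connector.connect(**params)  # e.g. authenticator = "externalbrowser"

# conn = connect_from_profile("dev", warehouse="ANALYTICS_WH")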
I just pushed the external_browser branch to GitHub. It should offer 1/ externalbrowser authentication and 2/ the ability to use TOML files with profiles, with instructions in the README. If you run the following you should be able to test it out.
git clone https://github.com/ryanwith/melchi.git
cd melchi
git checkout external_browser
# remaining steps are the same
Would you mind testing it out and letting me know if it works for you? Would really appreciate it!
Just replied to chrisjc that I created a branch with externalbrowser auth and pushed it to GitHub. Would you mind taking a look and letting me know if it works for you? Would love to get your feedback, as I don't have SSO set up in my account to test this myself.
That's really interesting! Could you tell me a bit more about what you're thinking? I'm not that familiar with SQL Mesh and the typical workflows there.
Not the original parent, so I'm unsure of their use case. But I've seen the approach where some basic development can be done on DuckDB before making its way to dev/qa/prod.
Something like your project might enable grabbing data (or subsets of data) from a dev environment (seed) for offline, cheap (no Snowflake warehouse cost) development, unit testing, etc.
I wanted to start with DuckDB since it's an incredibly powerful tool that people should try out. The performance you can get on analytical queries running on your local compute is really impressive. And with Snowflake streams you can stream live data into it without changing anything about your existing data. As for why not other databases: I wanted to focus on OLAP to start, since there are already great tools like DLT that help you load data from OLTP sources like Postgres and MySQL into OLAP destinations, but OLAP to OLAP is pretty rare.
Have you run into a use case for streaming data between data warehouses yourself yet? If so which warehouses?
Played with this and it did a great job of creating the initial framework for my most recent React app based on a few paragraphs of text. Would recommend trying it out.
Additionally, if you want to make any changes, like including only certain columns or changing column types/names, you can do that via the inputs below the generated SQL. Any changes you make are immediately reflected in the generated SQL.
One possible advantage I see is that it creates a 1:1 correspondence between a website and a file.
If what I care about is the website (and that's usually going to be the case), then there's a single familiar box containing all the messy details. I don't have to see all the files I want to ignore.
That might not be a benefit for you, and since I haven't used it, it's only a theoretical benefit in an unlikely future for me.
But just from the title of the post, I had a very clear picture of the mechanism, and it wasn't obvious why I would want to start with a different mechanism (barring ordinary issues with open source projects).
That the page HTML is indexable by search engines without having to render it on the server - for example by unzipping the archive into a directory served by nginx. You may also use it for archiving purposes, or for keeping backups.