I've been using the JS version of langchain for a few months now, and despite there being a lot of valid criticism (especially around the abstractions it provides), I'm still glad to be using it.
We get the benefits of a well-used library, which means certain changes are easy. For example, swapping our vector database was a one-line change, as was swapping our cache provider. When OpenAI released GPT-4, we were able to change one parameter, and everything still just worked.
Sure, it's moving fast and could use much better documentation; at this point, I've probably read the entire source code several times over. But when we start testing the performance of different models, or decide we need to add persistent replayability to chains of LLM calls, it should be pretty easy. These things matter for production applications.
It's not. The API is different, since GPT-4 is a chat-based model and davinci isn't. It's not a huge difference, but these sorts of little things add up.
Arguing that Tailwind is a leaky abstraction for CSS is like arguing that ORMs are leaky abstractions for SQL; hiding the underlying implementation isn't the point of these tools.
The biggest benefit is that you get a fairly well-thought-out API to work with. In the case of Tailwind, this is a pretty flexible set of good defaults that works for 95% of use cases. You can focus on building classes for cases specific to your site instead of spending time rebuilding undifferentiated layout utilities.
I launched mergecaravan.com to deal with a problem I've encountered at a few jobs in the past. Mostly, I built it to scratch my own itch, and to get back to doing more Django dev.
Basically, you add a label to a PR in GitHub, and mergecaravan queues it up to be merged once all the required checks pass, keeping queued PRs "up-to-date" in the meantime.
It's made a little money, but not much.
GitHub recently rolled out a feature at GitHub Universe that overlaps with it, so I'm guessing it won't get much more traction.
A few lessons I learned:
- Especially when building on a platform, make sure you have the right niche. In this case, the audience was probably wide enough that GitHub decided it was worth building as part of the platform.
- Like any engineer, I spent too long building and let scope creep delay me from launching.
All in all, it's pretty cheap (read: basically free) to run, but I probably put somewhere over 120 hours into it.
Beyond that, there are a couple of things that I don't believe bors has.
A big one that I built for myself was support for "working hours", i.e. only merge code during a set timeframe (sketched below).
For example, one company I worked for had some pretty flaky tests. What would happen is several PRs would get reviewed and then enqueued with a tool like bors. Inevitably, some queued PR would fail a check because of a flaky test.
Fast forward to the weekend, and a completely different developer would merge a hotfix into master. As a side effect, a tool like bors would retry merging the head commit into the PR with the flaky test, and now it passes! So it gets deployed at a random, unexpected time, which isn't what we wanted.
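For what it's worth, the gate itself is conceptually tiny. A minimal sketch of the idea (illustrative names, not mergecaravan's actual code):

from datetime import datetime, time

WORKING_DAYS = range(0, 5)          # Monday=0 .. Friday=4
MERGE_WINDOW = (time(9), time(17))  # 09:00-17:00 local time

def can_merge_now(now=None):
    # Outside this window, green PRs stay queued instead of merging
    # (and deploying) at a random, unexpected time.
    now = now or datetime.now()
    start, end = MERGE_WINDOW
    return now.weekday() in WORKING_DAYS and start <= now.time() < end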
I've used StitchData at a startup with AWS Redshift. Pair it with something like dbt for transforming your data, and you have a great match. A little pricey, but totally worth it, IMO.
Chewse | Fullstack Developer | San Francisco | ONSITE | https://www.chewse.com | $115-162k
Chewse is a weird little family that works with offices to run their meal programs.
We're looking for individuals who want to work as part of a small team and take a lot of responsibility for what they produce. Humble confidence strictly required. Previous experience with Python and JS is nice, but not required.
Process: Initial phone screen, technical phone screen and take home question, video chat, full day onsite
Having a clear salary expectation up front just saves both parties time and effort. One of the big reasons salary ranges aren't always public is that it benefits the company by maintaining information asymmetry. But by posting the range, you're telling me, at least in a small way, that you don't want to play that game and are more likely to be transparent with me about other parts of the process.
I was the first engineer hire at a small startup, and have helped our CTO/Co-founder grow the company over the past 3 years.
All I can say is that almost everything said here matches our experience exactly. Even down to the choice of Angular 1.x and rewriting all of the IIFEs in our codebase to use ES6 imports with Babel.
I should also acknowledge that PM is something you can do fine with 2 people, but your processes will probably fall apart as soon as you hit 4 or 5.
While I sympathize with the sentiment, I just wish Python would add better syntax for functional programming. Having written a lot of JavaScript lately, I wish Python's built-in functional tools supported something cleaner, kind of like Underscore/Lodash.
Seems to me something like Coconut, which accepts existing Python but adds cleaner functional syntax, is one way of making the case for that in a future Python.
Is there a more Pythonic way to do it? Lambdas are cool, but usually not the first place you go in Python. I would think something like (my best guess, I'm not a Python pro)...
sum_of_squares = sum([x*x for x in arr])
Which I think is easier to read than either example posted above.
Of course you will point out that this is less powerful than full map and reduce... but meh, there are pros and cons to both styles.
Worth noting that map() can be parallelized, whereas a list comprehension can't necessarily (since it's an explicit loop). The multiprocessing module allows trivial map parallelization, but can't operate on list comprehensions.
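For example, Pool.map is roughly a drop-in replacement, with the caveat that the mapped function has to be a named, module-level function (lambdas can't be pickled):

from multiprocessing import Pool

def square(x):  # must be module-level so it can be pickled
    return x * x

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(square, range(10)))
    # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]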
So I have coded everything from dumb web servers (tm) to high-performance trading engines (tm). I have toyed with the parallel-map thing, and used it in a toy GUI tool or two I wrote, but never really found it that useful in the real world. If you actually want high performance, a parallel map is not going to be fast enough. If you are a dumb web server, it's a waste of overhead 99% of the time.
But hey, if you actually need a parallel map, cool. It just seems very, very uncommon: maybe 1 in 10,000 maps I write.
That example works only because sum is already a Python builtin. If you wanted to do something less common than summing up elements, you would have to either use reduce or write a for loop.
In Python 3, reduce was intentionally moved into the functools module because its two biggest use cases by far were sums and products, which are better served by dedicated functions (sum is a builtin; math.prod came later). In my experience, this has very much been the case. Reduce is still there if you need it, and isn't any more verbose. The only slightly gross thing about this example is the lambda syntax; even that is a moot point, though, since Python has first-class functions, so you can always write your complicated reduce function out and plug it in.
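To make that concrete:

from functools import reduce

# the namespaced reduce, no more verbose than before:
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4])  # 10

# or write the step function out and plug it in:
def longest(acc, word):
    return word if len(word) > len(acc) else acc

print(reduce(longest, ["a", "abcd", "ab"]))  # abcd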
I just counted the uses of reduce in my current Python project (6k lines): reduce comes up 32 times. By comparison, map is used 159 times and filter 125 times; for some reason I tend to use list comprehensions less than I should.
That seems like an argument against lambda functions in general: why use lambdas when you can define a named function for every case? The answer, in my opinion, is that code is more readable if you can define a simple lambda instead of having to name every single function in the code base.
What's the advantage of list comprehension over lambdas (assuming the lambda syntax is decently lightweight)?
I feel like I come down hard on the side of lambdas, but I've never really spent enough time in a language with list comprehension, so there's a good chance I'm missing something.
How can you come down hard on the side of one when you've never experienced the other?
I'm from a non-list-comprehension background too, but recently started working a lot in a large Python codebase, and have found dict/list comprehensions to be beautiful. I'm a huge fan. It's a shame the lambda syntax isn't the best and lambdas are generally crippled, but comprehensions are a great 80/20 compromise that handles most cases very cleanly.
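To show what I mean, here's the same filter-and-transform step both ways (made-up data):

words = ["alpha", "beta", "gamma", "delta"]

# map + filter with lambdas:
lengths = dict(map(lambda w: (w, len(w)),
                   filter(lambda w: w.startswith(("a", "g")), words)))

# the same thing as a dict comprehension:
lengths = {w: len(w) for w in words if w.startswith(("a", "g"))}
# {'alpha': 5, 'gamma': 5}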
I find it a lot easier to read, partly because I'm used to the Scala way of sequence-dot-map-function. When I see the Python one, I can't remember whether the function comes first or the array.
I'm not positive, but I think it saves the need to create a new execution frame for each lambda call, since the whole loop executes in the single frame used by the comprehension.
In theory, I suppose the VM could have a map() implementation that opportunistically extracts the code from a lambda and inlines it where possible, but I doubt CPython does that. OTOH, I'd be surprised if PyPy doesn't do something like that.
I don't mean when the comprehension is invoked, but during each iteration of the loop within the comprehension.
When doing something like `map(lambda x: 2+x, range(100))`, there will be 101 frames created: the outer frame, plus one for each of the 100 invocations of the lambda.
Whereas `[2+x for x in range(100)]` will only create 2: one for the outer frame, and one for the comprehension.
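You can get a rough feel for that per-call overhead with timeit (exact numbers vary by machine and CPython version):

import timeit

print(timeit.timeit("list(map(lambda x: 2 + x, range(100)))", number=100_000))
print(timeit.timeit("[2 + x for x in range(100)]", number=100_000))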
I think lambda syntax can be a bit cumbersome, but that aside what I really miss is a clean syntax for chaining functional operations. So often I find myself thinking about data in terms of 'pipelines'. i.e. in JS:
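// something along these lines (illustrative data, not my actual code):
orders
  .filter(o => o.paid)
  .map(o => o.total)
  .reduce((sum, total) => sum + total, 0);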
The bigger problem remains: lambda functions are hideous in Python. map() will forever be ugly if you try to use it in the same way it is used in most functional languages.
This sort of API is hard to implement in Python, though, because there's no formal notion of interfaces, so you can't extend all iterables generically. You need either free functions (which don't read well when chained) or a wrapper object (ick).
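For concreteness, a bare-bones sketch of the wrapper-object option (not any particular library):

class Seq:
    def __init__(self, items):
        self._items = items

    def map(self, fn):
        return Seq(map(fn, self._items))

    def filter(self, fn):
        return Seq(filter(fn, self._items))

    def to_list(self):
        return list(self._items)

print(Seq(range(10)).filter(lambda x: x % 2 == 0).map(lambda x: x * x).to_list())
# [0, 4, 16, 36, 64]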
I've been thinking that it might be nice to use chaining (though I didn't know it had a name) in ordinary mathematical notation too, writing "x f g" instead of "g(f(x))".
The latter can't really happen, given that Python is a statement- and indentation-based language. You'd need some really weird meta-magical syntax, which really isn't going to happen in Python. Although you can cheat by fucking around with decorators, e.g.
# postfix(fn, *args) returns a decorator that computes fn(decorated_fn, *args):
def postfix(fn, *args):
    return lambda arg: fn(arg, *args)

@postfix(map, range(5))
def result(i):
    return i * i

print(list(result))
# [0, 1, 4, 9, 16]
(`postfix` is necessary because `map` takes its arguments positionally, so it's not possible to pass in the sequence with functools.partial)
> The latter can't really happen given Python is a statements- and indentations-based language.
Yeah, though I suppose you could hack around that and get nearly full functionality in lambdas if you built a library that either wrapped non-expression statements in functions or provided equivalent functions. There are obviously some statements for which there aren't good solutions in that direction.
OTOH, using named functions is in many cases more readable (in the context of what is otherwise a normal Python codebase) than the kind of lambdas you can't easily write in Python. But I like the Coconut approach of just providing a more concise syntax for the kind of lambdas Python already supports.
A lambda can only contain a single expression, by "full anonymous function" I'm guessing hexane360 means multiple statements. You can't put a for loop or a context manager in a lambda for instance.
Well you might be able to if you add a bunch of named function combinators wrapping these, but definitely not with only lambdas, unless you define your combinators using `ast`, which I think would let you define statements via expressions.