Here's some output from q4_0 quantization of CodeLlama-7b-Python (first four lines are the prompt):
# prints the first ten prime numbers
def print_primes():
    i = 2
    num_printed = 0 # end of prompt
    while num_printed < 10:
        if is_prime(i):
            print(i)
            num_printed += 1
        i += 1

def is_prime(n):
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def main():
    print_primes()

if __name__ == '__main__':
    main()
It will be interesting to see how the larger models perform, especially after community tuning and with better context/prompting.
def primes_upto(limit: int):
    """Generate prime numbers < *limit*."""
    # Sieve of Eratosthenes
    is_prime = [True] * limit
    for n in range(2, limit):
        if is_prime[n]:
            yield n  # found prime number
            for c in range(n*n, limit, n):  # start with the square, smaller multiples are marked already
                is_prime[c] = False  # mark composites

if __name__ == "__main__":
    from itertools import islice
    print(*islice(primes_upto(100), 10))  # -> 2 3 5 7 11 13 17 19 23 29
Yeah, but yours was generated by the "post unoptimized code to HN and wait for someone to optimize it" model, which, although it's free and doesn't require a GPU, is a much slower model.
Someone should turn this into a product! You highlight the code you want to optimize, and it posts it to hn as a semi-contextually-appropriate comment to invite code golfing, and the highest rated reply gets posted back to your repo as a PR.
But, unless you are trying to find a prime number low enough that you might as well look it up in a pre-generated table, it might still be end-to-end more efficient?
This is a good point; English alone is an awful specification language. Everyone knows what you mean by "print the first ten prime numbers," but only by idiom and cultural context. This is why an LLM is a good fit here, because -- when it works -- it includes all three aspects.
I suppose it's indeed possible to read the assignment as "print whatever you like as long as it includes 10 or more primes" but I'd fail you for that based on not being able to communicate with humans.
I have been waiting for weeks, and am still waiting, to get access to Llama2 (released a month+ ago), and access to this model goes through the same form, so I'm not very hopeful. Are you getting it through other methods?
Interesting, here and in some of the other comments in this thread, that 1 is not a prime number!
Now granted, that's more or less by definition, and I don't doubt there are communities and fields where it is considered one, but it still shows some of the subtleties at play when using language models.
Whether 1 should be a prime number or not wasn't clear-cut for centuries.
Current consensus has settled on excluding 1 from the definition, but there are examples of publications well into the 20th century that still included 1 as prime.
Rough calculation: typical quantization is 4 bit, so 7B weights fit in 3.6GB, then my rule of thumb would be 2GB for the activations and attention cache (not usually quantized). So 6 or 8 GB VRAM would probably do it. llama.cpp will let you offload your choice of layers to GPU, so you could probably get quite a way with 4GB.
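If you want to redo that arithmetic for other model sizes, here is a rough sketch of the same rule of thumb; the 4.5 bits-per-weight figure (q4_0 stores scales alongside the 4-bit weights) and the flat 2 GB overhead are assumptions, not measurements:

def est_vram_gb(params_billion, bits_per_weight=4.5, overhead_gb=2.0):
    """Very rough estimate: quantized weights plus a flat allowance
    for activations and the attention (KV) cache."""
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1024**3
    return weights_gb + overhead_gb

for size in (7, 13, 34):
    print(f"{size}B ~ {est_vram_gb(size):.1f} GB")  # 7B ~ 5.7, 13B ~ 8.8, 34B ~ 19.8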
> "I'd fail an interview candidate that suggested adding 1 each time for subsequent prime testing"
Congratulations! You must be that arrogant guy everybody hates interviewing with, the one with the superiority complex.
How about instead of just failing people over literally nothing (wasting everybody's time and money) - just ask the candidate whether they could somehow reduce the search space by utilizing the properties of a prime number?
Are we including it just to poke the bear? Prime numbers are typically defined as numbers with no positive divisors other than one and the number in question.
Well, I’m questioning it… not that it really matters.
-1, 0, and 1 are no good.
Two divisors for positive integer X:
1, X
I recently demonstrated GPT-4 by having it explain phases of matter metaphorically with animals. Though impressive, it points to loose abstraction being more in range than firm abstraction. Let's get that model a Python interpreter and see how far we can take this party. Until then, I'm going to stick to explaining concepts.
————- included here for fun
Alright! Let's dive into the wild world of matter using some animal friends as examples:
1. *Solids* - Think of solids like a herd of elephants standing closely together. The elephants are packed in tight, barely moving, just maybe swaying a bit. They're sturdy and strong, just like solid things in our world. This is because the particles in a solid don't move around much; they just vibrate in place.
2. *Liquids* - Imagine a school of fish swimming in a pond. They're free to move around, weaving in and out, but they still stay pretty close to each other. They aren't packed as tightly as our elephant friends, but they aren't completely free either. This is like liquids: the particles are close, but they can move around and flow, just like water in a glass.
3. *Gases* - Now, picture a flock of birds soaring in the sky, free to fly in all directions. These birds aren't sticking close to one another; they're spread out, enjoying the vast space of the sky. In gases, the particles are like these birds, very spread out and moving all over the place.
4. *Plasma* - Think of plasma like dragons (I know they're mythical, but bear with me). These dragons breathe fire, and that fire is so hot and energetic that it can change the way things behave. Plasma is like that – it's gas that's become so hot that the particles are super energized and can even glow, like neon lights.
5. *Bose-Einstein Condensate (BEC)* - This one's a bit trickier, but imagine penguins in Antarctica. They huddle closely together to keep warm. BEC is like the coldest group of particles ever, where they start acting in strange, uniform ways, almost like one giant super-particle. It's like all the penguins moving together as one.
So, next time you think of matter, just remember our animal friends: the sturdy elephants, the flowing fish, the free-flying birds, the fiery dragons, and the huddled penguins!
Again, 1 fits, because it has two divisors: 1, and 1. You never said X != 1, nor does the definition upthread.
This isn't a silly gotcha, these things matter in math. For instance, when solving a quadratic equation, allowing for the solutions to be equal lets you avoid special-casing your understanding - instead of memorizing when the equation has zero, one or two solutions, you just learn it has two or zero (real) solutions, and the two solutions are allowed to be equal.
It's perfectly reasonable understanding. Inequality isn't a natural implicit assumption. E.g. if I say you have two variables:
int a;
int b;
I doubt you'd be insisting that `a != b` at all times.
No. In the set of integers, 1 repeated is not two divisors. It is one divisor.
An element to set relationship is that element A is or is not in set B. So, if the set of divisors only contains 1, there is only one divisor. If 1 and 1 made two divisors, 1 and 1 and 1 and… would make infinite divisors, rendering the concept of counting divisors (i.e., the cardinal number of the set of divisors) meaningless.
M is a divisor of N if it is a number that divides N without a remainder. While divisors can be negative, they are conventionally limited to positive integers in primality and factoring.
If you’d wanted to dig in on negative vs. positive divisors, that quickly provides an avenue for clearer formality, but piling on 1 and saying it’s not a silly gotcha is pretty fruitless. And please don’t bother to say “you didn’t say it has only two divisors”, as that would, again, be a silly argument.
So wind back and really formalize the definition if you want: A prime number is a natural number with only two divisors in the set of natural numbers, 1 and itself.
While set theory is axiomatic, it’s not practical for me (or anyone else) to explain conventional foundations to avoid someone feeling like they need to wiggle out of a prior bad argument.
Just say “ah, okay” or stop replying and move on. Feel free to read up in Wikipedia or any other texts (or ping me privately if you’d like to discuss further), but this thread isn’t looking like it’s going to meaningfully contribute to the broader discussion. Accordingly, I’ll leave it here unless something meaningful comes up.
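For what it's worth, the definition a few comments up ("a natural number with only two divisors, 1 and itself") is easy to check mechanically. A brute-force sketch, for illustration only, which also shows exactly why 1 fails (its divisor set is just {1}):

def is_prime(n: int) -> bool:
    """Prime iff the set of natural divisors is exactly {1, n}, i.e. has two elements."""
    divisors = {d for d in range(1, n + 1) if n % d == 0}
    return len(divisors) == 2  # for n == 1 the set is {1}, so it is not prime

print([n for n in range(1, 20) if is_prime(n)])  # -> [2, 3, 5, 7, 11, 13, 17, 19]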
Downvoting this is not enough. I'd pay $5 (Reddit style) to have this properly killfiled if HN allowed for that. Besides, not "everybody" hates them, only those immature enough to still need intellectual babysitting.
Finally we meet the lifeless drone that everybody complains about in the interviews.
My suggestion for your next interview: decide to hire them just based on their leetcode score, but invite to the interview just to flex that you're still better at puzzle solving :-D
So I take it you typically produce fully optimized, thoughtful, and correct code on the first iteration while being actively judged by a stranger, yes?
I assume you meant that you should add 2? If yes, that's such a mind boggling basic thing to do that I agree with you, and it makes no sense that you're being crucified.
Probably because there's significant overlap in the Venn diagram of people with years of experience who professionally develop products that generate $millions in wealth/value, and people who would fail that interview.
Or we have worked with junior developers who have really grown and flourished under our care, who would never have gotten that chance with such insane Draconian judgements.
It's such an obvious "GOTCHA!!" setting someone up for failure.
The way it's framed is very cringy because it signals that they don't care in their interviews about determining how objectively effective a software developer is.
I don't get it. This is an extremely basic fact that most people can figure out after thinking about primes for a minute. Maybe ask "what's an easy optimisation here?" That would make the candidate think more closely about the invariants that their code should hold, which in itself is a very valuable skill.
Because I know enough not to write prime-testing code that resembles anything like that loop, so I don't have to care about reducing its search space. If you actually want to test my knowledge about prime numbers, you can ask, and I will tell you about using a probabilistic test instead, and that I know fast deterministic ones exist, but I am not up to date on the state of the art.
If I have to write the loop above, I am assuming it is the Fizzbuzz equivalent of your company to show that I know how to write a while loop. I am not thinking about reducing the search space because I am writing the code semi-unconscious and frankly just want to get to the next question.
Same. Also, if I'm suspecting it's a "can they into loops" fizzbuzz test, I'd be wary of reducing the search space for the simple reason that it makes the code slightly more complex, introducing a chance to make e.g. an off-by-one error, which would lose me points if the reviewer has the kind of "pedantic over irrelevant things" attitude this subthread is criticizing.
I've done that (maybe it was fizzbuzz, now that I'm thinking about it) and boy howdy does that get the people you're interviewing with agitated. Saying "I'm interviewing for an architect-level container orchestration position. If I'm reinventing the wheel writing algorithms, something is terribly wrong" shuts them up, but doesn't make them any happier.
What does a container orchestration architect do? Something like "this cluster should use Envoy and Prometheus"? The rate of new clusters isn't usually high enough for the stack to change.
Real question: I love these non-conventional roles in tech (beyond SWE, SRE, PM, manager).
> I'd fail an interview candidate that suggested adding 1 each time for subsequent prime testing
The problem with a team that always seeks the optimal solution is that they never get shit done. And that's rarely optimal in a business context. Your view does not strike me as nearly as arrogant as it is short-sighted.
I think on a team of one I want the guy who gets it done without thinking. On a team of two I want a guy that's somewhere in the middle. And on a team of three, that's when I want my optimiser. Because in the time that guy number 3 has written, let's say, 3 files of optimal code, guy number 1 has written 10 files of not-optimal code. And you rarely need guy number 3 to fix all ten; you just need him to fix the one or two files that actually matter.
Clients rarely ask “is this optimal?”. But they always ask “is this done?”.
All three developers have different approaches. All three are assets.
I think on some level, if your comment is honest, you're making the same mistake that we could say the "just add one" guy made: not factoring in (his) speed.
I think code readability, rather than code optimisation, is a far more important thing to get hung up on in an interview (and is, I must remind some of you, not to be confused with conciseness). You can see this in the end result. But even if you're following along and the interviewee knows you already understand what's going on, you can see it in smaller ways: it could be as simple as going back and changing a variable name such as open (maybe a function?) to isOpen (almost always a Boolean value).
I think most of us are in this position pretty often, where we're writing and figuring out something at the same time; maybe we just see if it works first and don't give a name much thought, or maybe we change the actual value or type of a variable in the process and the variable name becomes ambiguous. I'd look for small indicators that show me that this person is considering readability. But I still don't necessarily expect it to be as readable as it would be in a more natural setting. I mean, I think 90% of these sorts of things start off with the person saying "I swear I can type" within the first five minutes of being watched; if being watched flusters them enough that it affects their typing, it certainly also affects their coding.
You'd reject a candidate that is willing and legally able to work for free while also cloning themselves so they can pair program with every one of your employees at once?
Having "extremely basic understanding" of prime numbers immediately at one's command is important for approximately 0% of software engineering jobs. If you instant-fail a candidate for this, it says a lot more about you and your organization than the candidate.
Approx 0% of devs need to know what the earth is, but from lots of interviews I've given I've found consistent correlation between lack of basic knowledge and lack of ability to solve many things. It was so strong we found it much more cost effective to cut people early that didn't know at least a few of some standard knowledge items.
This is some really good advice here.
It's always a good idea to throw out all candidates that can't immediately recall what the first theoretical estimate of the rest mass of the Higgs boson was in the first paper describing it. Basic knowledge like this just correlates so well with ability to make proper decisions in API architecture.
I'd also save time and money cutting people that read as poorly as you're demonstrating.
Try actually correlating basic knowledge with competency at programming before thinking your opinion is better than measured data. Peer-reviewed research finds similar results [1].
And yes, we tested all this carefully before enacting it. Interviews cost time and money, so giving 100% on every candidate despite quick signals is a waste of time and money that would be better spent on other candidates. If you want the best outcome then you allocate scarce resources based on expected returns, not on unfounded beliefs.
You probably do not have a 7-year-old child, because children at that age do not know what a prime number is.
Second, basic math skills that you never or rarely use, or use only at long intervals, can get rusty. You may understand the concept but not find the optimal solution. The way you are responding here says quite a lot about how short-sighted it is to instant-fail someone on a single question instead of trying to assess the whole person as much as you can. On your side, you are wasting the opportunity to have a great person who could be a key player in your team by bringing another set of skills to the table.
> You probably do not have a 7-year-old child, because children at that age do not know what a prime number is.
it's part of the curriculum for children of this age where I grew up (I did check)
> The way you are responding here says quite a lot about how short-sighted it is to instant-fail someone on a single question instead of trying to assess the whole person as much as you can. On your side, you are wasting the opportunity to have a great person who could be a key player in your team by bringing another set of skills to the table.
it may also be the case that I have more in depth knowledge about the roles that I've interviewed candidates for
most recently: hiring people to work for quants
not instantly knowing that even numbers (other than 2) are not prime is a very strong signal
You're not testing for "basic math skills" here. What you're testing for is more like "immediately retrieves an irrelevant math fact after many years of having no need to think about it."
Look, if you think this sort of thing allows you to identify great candidates, good for you. But in my experience, not only is this kind of practice stupid on its face, but it leads to engineering orgs packed with people who are good at memorizing trivia but terrible at solving real problems.
I think the key problem here is that it is a bad programming question. If you know anything about prime numbers, then coming up with an answer is trivial. If you expect a more optimized solution, then you are really only gauging the interviewee's understanding of prime numbers. So effectively the interview is more about mathematics than it is about programming or problem solving.
You happen to remember a particular piece of knowledge, so you project that expectation onto others. Theory of mind.
> yes, we expect professional software developers to have basic maths skills
Skill != knowledge. "What is a prime number" can be looked up and understood by any competent programmer in <5 minutes.
> "what is a prime number" is taught to 7 year olds, it's not vector calculus
Then it's reasonable to expect that an interviewee would be able to learn it as well, given the same resources. It does not however follow that an interviewee would inherently have that knowledge, just because 7 year olds are taught it.
Bottom line is, you're making too many assumptions about complete strangers.
That's if they even know "prime number" is a technical term to look up. They might confuse it with Amazon Prime or anything else depending on context. You waste time explaining, they get indignant that they are supposed to be coding, not doing maths; complete mess.
I'm mad at myself now that it has eaten 15 minutes of my time trying to come up with the right optimization. What's the trick? 2, +1, and then +2 from there on seems obvious but once you get to 9 is it worth building a list of nonprimes to skip?
Well it doesn’t double the speed, since anything with a factor of 2 undergoes only one loop iteration inside is_prime. It basically just saves a function call
Exactly, multiples of 2 can't be prime, so skipping them saves a function call (that itself only does one comparison), so it is a very simple and relatively obvious optimization. In fact, every multiple of a smaller number by definition cannot be prime, but you can't capture that just by changing the step size; there are some nifty tricks to approach the problem from a different angle, though.
A common approach is to only test divisors up to the square root, which reduces the runtime. I recommend checking out Project Euler if you like solving hard math/code/big-O puzzles.
I didn't want to cheat by looking on S.O. but thanks ;)
Yes it makes sense (in the GPT code) that you'd only go up to i * i ... although looking at pythonic while: statements is just gross to me in this context, it would feel a lot more readable to say, e.g. in PHP:
for ($i = 2; $i <= sqrt($n); $i += ($i == 2 ? 1 : 2)) { // <= so squares of primes (e.g. 9) are checked; the $i == 2 case could just be handled outside the loop
    if ($n % $i == 0) return false; // found a divisor, not prime
}
There are lots of more advanced optimizations, which would be an interesting avenue for discussion in an interview, but the brain-dead algorithm would just use +2 instead of +1.
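For reference, a minimal sketch of that version with only the two obvious tweaks (handle 2 up front and step by 2, and stop at the square root), which is about as far as an interview answer should reasonably need to go:

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:         # 2 is the only even prime
        return False
    i = 3
    while i * i <= n:      # only odd candidate divisors, up to sqrt(n)
        if n % i == 0:
            return False
        i += 2
    return True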
> The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens.
Edit: Reading the paper, key retrieval accuracy really deteriorates after 16k tokens, so it remains to be seen how useful the 100k context is.
Looks like they also aren't releasing a pretty interesting model. In the paper they mention an "Unnatural Code Llama" which wipes the floor with every other model/finetune on every benchmark, except for slightly losing to Code Llama Python on MBPP pass@100 and slightly losing to GPT-4 on HumanEval pass@1, which is insane.
Meta says later on that they aren't releasing it and give no explanation. I wonder why given how incredible it seems to be.
The paper states it was instruction fine tuned with synthetic data (LLM generated instructions) ala another paper (“Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor”).
The github repo associated with that paper is linked below. It links to the paper on arxiv, but also has some data in the repo.
I've seen this argued a lot but is it fact? OpenAI was able to train on data from other platforms and surely, those platforms weren't letting their data go if they could help it. Unless some new laws have been passed, I don't think OpenAI can legally prevent others from using their data to train models. OpenAI can't have their cake and eat it too. After all, any content generated by AI can't be copyrighted.
It is indeed a fact that OpenAI's Terms of Use do state that you can't use their service to develop competing models: Section 2.c.iii -
https://openai.com/policies/terms-of-use
Now of course, the terms are not the law (so don't govern the use of the generated data by any third party), they are an agreement between two parties. If you did click "agree" then that's a binding agreement and there could be legal/contractual repercussions (some of which are outlined in the terms).
That seems like a likely explanation; they probably won't get into legal trouble for using an OpenAI model for a research paper, but redistributing said model may be upsetting enough for OpenAI to trigger a legal challenge.
Unnatural Instructions used davinci-002, although that was a while ago; they only say "similarly" in this paper and don't specify what they used. I can't see a reason why they wouldn't be releasing it if the unnatural prompts were generated by a LLaMA2-family model.
In any case, replicating this training seems trivial and very cheap compute-wise for anyone who wanted to do it.
I thought this specific model was referring to self-instruction using both synthetic prompts (generated from few-shot in-context prompting of presumably some OpenAI model; the original paper used text-davinci-002) as well as synthetic code (presumably from Code Llama 7B, as for self-instruct), subsequently validated with execution?
The differences being it's not just training on unvalidated synthetic data and this specific method (per the unnatural questions paper) results in increased instruction diversity which confers some added advantage and I'm assuming explains the performance gain over the also synthetic self-instruct code?
I may be misunderstanding but this seems more nuanced than just training on synthetically AI-generated code and is more validating of synthetic instructions (i.e. low resource setting) rather than synthetic code (i.e. high resource setting).
Curious, what led you to adjusting the parameters this way? Also, have you guys experimented with ALiBi[1] which claims better extrapolative results than rotary positional encoding?
I'm pretty sure they don't do that, but for code the relevant relationship between two tokens is easy to determine with the semantics of the language alone (for instance you can say that tokens related to a local variable have no relationship with tokens outside), so it would lead to a sparse matrix in the transformer, reducing the cost of big contexts by a lot. But it would require language specific preprocessing, and whether you can make it fast is also dubious. I don't think it's been tried so far.
Copilot X is just their name for their project to bring AI to more areas of VSCode. I don’t believe they can use GPT-4 for completions because it’s a chat-optimized model. It seems that they are using something else, that blog post seems to imply it’s a custom-trained model.
I use GPT-4 for code completion all the time! There are many Neovim extensions[1][2][3] (and I'm sure there are many VSCode extensions) which call the GPT-4 API directly for code completion. I'm pretty sure the only reason that Microsoft might avoid using GPT-4 for Copilot is cost.
>Even the 7B model of code llama seems to be competitive with Codex, the model behind copilot
It's extremely good. I keep a terminal tab open with 7b running for all of my "how do I do this random thing" questions while coding. It's pretty much replaced Google/SO for me.
It was made available internally, I believe. So this is one of the many Meta engineers on this site; after all, Facebook is now less hated than Google here ;)
No one who has been using any model for just the past 30 minutes would say that it has "pretty much replaced Google/SO" for them, unless they were being facetious.
They said 7b llama which I read as the base LLaMa model, not this one specifically. All of these LLMs are trained on Stack Overflow so it makes sense that they’d be good out of the box.
I've been using this or something similar internally for months and love it. The thing that gets downright spooky is the comments, believe it or not. I'll have some method with a short variable name in a larger program, and not only does it often suggest a pretty good snippet of code, the comments will be correct and explain what the intent behind the code is. It's just an LLM, but you really start to get the feeling the whole is greater than the sum of the parts.
I just don’t understand how anyone is making practical use of local code completion models. Is there a VS Code extension that I’ve been unable to find? HuggingFace released one that is meant to use their service for inference, not your local GPU.
The instruct version of code llama could certainly be run locally without trouble, and that’s interesting too, but I keep wanting to test out a local CoPilot alternative that uses these nice, new completion models.
There are a bunch of VSCode extensions that make use of local models. Tabby seems to be the most friendly right now, but I admittedly haven't tried it myself: https://tabbyml.github.io/tabby/
An M1 Max with 64GB of RAM allows me to run multiple models simultaneously, on top of stable diffusion generating images non-stop + normal chrome, vscode, etc. Definitely feeling the heat, but it's working. Well worth the investment.
A 7B model at 8-bit quantization takes up 7 GB of RAM. Less if you use a 6-bit quantization, which is nearly as good. Otherwise it's just a question of having enough system RAM and CPU cores, plus maybe a small discrete GPU.
You’ll need a bit more than 7GB (~1 GB or so more), even at 8-bit quantization, because of the KV cache. LLM inference is notoriously inefficient without it, because it’s autoregressive.
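To put a number on that "~1 GB or so": a rough sketch of the KV-cache size for a 7B-class model. The layer/head/dimension values below are the usual Llama-7B shape, and the fp16 cache is an assumption; a quantized cache shrinks this proportionally:

def kv_cache_gb(n_layers=32, n_heads=32, head_dim=128, seq_len=2048, bytes_per_elem=2):
    """Keys and values for every layer, head and position (fp16 = 2 bytes each)."""
    per_token = 2 * n_layers * n_heads * head_dim * bytes_per_elem  # K and V
    return per_token * seq_len / 1024**3

print(f"{kv_cache_gb():.2f} GB")  # ~1 GB at 2k tokens, ~2 GB at the full 4k context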
Some projects such as lmdeploy[0] can quantize the KV cache[1] as well to save some VRAM.
Speaking of lmdeploy, it doesn't seem to be widely known but it also supports quantization with AWQ[2] which appears to be superior to the more widely used GPTQ.
The serving backend is Nvidia Triton Inference Server. Not only is Triton extremely fast and efficient, they have a custom TurboMind backend for Triton. With this lmdeploy delivers the best performance I've seen[3].
On my development workstation with an RTX 4090, llama2-chat-13b, AWQ int4, and KV cache int8:
8 concurrent sessions (batch 1): 580 tokens/s
1 concurrent session (batch 1): 105 tokens/s
This is out of the box, I haven't spent any time further optimizing it.
6-bit quantizations are supposed to be nearly equivalent to 8-bit, and that does chop 1.5 GB off the model size. I think a 6-bit model should therefore fit, or if that doesn't, 5-bit medium or 5-bit small surely will.
There is always an option to go down the list of available quantizations notch by notch until you find the largest model that works. llama.cpp offers a lot of flexibility in that regard.
On a Ryzen 5600X, 7B and 13B run quite fast. Roughly, pure CPU performance is about 25% slower than with an NVIDIA GPU of some kind. I don't remember the exact numbers off the top of my head, but the generation speed only starts to get annoying for 33B+ models.
Code llama Python is very interesting. Specifically tuned for Python.
I wonder if we could make such specific LLMs (one that is proficient in all things Rust, another- all things Linux, all things genomics, all things physics modeling etc) and have them talk to each other to collaboratively solve problems.
That would be a crazy future thing! Putting machines truly to work..
I think this is called "mixture of experts" and also there's a lot of speculation that it's how GPT-4 works, although probably with just a few large models rather than many small ones.
I think more attempts haven't been made because base llama is not that great at coding in general, relative to its other strengths, and stuff like StarCoder has flown under the radar.
Start with a CodeLlama for C, and start treating these systems as natural language compilers. C is low-level enough and still readable for those rare moments when you need to inspect what was generated.
The best model, Unnatural Code Llama, is not released. Likely because it's trained on GPT4 based data, and might violate OpenAI TOS, because as per the "Unnatural" paper [1], the "unnatural" data is generated with the help of some LLM -- and you would want to use as good of an LLM as possible.
TheBloke doesn’t joke around [1]. I’m guessing we’ll have the quantized ones by the end of the day. I’m super excited to use the 34B Python 4 bit quantized one that should just fit on a 3090.
While it supports it, so far I've only managed to get infinite streams of near-nonsense from the ollama models (codellama:7b-q4_0 and codellama:latest).
My questions were how to construct an indexam for Postgres in C, how to write an R-tree in JavaScript, and how to write a binary tree in JavaScript.
I was up and running from clone/build-from-scratch/download in ~5m.
It's running on my M1.. it knows WebGL JS APIs better than I do, makes a passable attempt at VT100 ascii art, and well, should read more about Wolfram Automata, but does seem to know Game of Life!
They have a server/client model. The binary comes with a basic terminal front-end but you can just create your own self-hosted GUI or editor integration against the API[1]:
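For a sense of what talking to that API looks like, here's a minimal sketch assuming ollama's default local endpoint and its streaming JSON-lines responses; treat the URL and field names as assumptions and check the linked docs for the current schema:

import json
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",  # assumed default ollama endpoint
    json={"model": "codellama", "prompt": "Write a Python function that adds two numbers."},
    stream=True,
)
for line in resp.iter_lines():
    if line:
        chunk = json.loads(line)  # each line is a JSON object carrying a partial response
        print(chunk.get("response", ""), end="", flush=True)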
I haven’t quantized these myself, since TheBloke has been the main provider for all the quantized models. But when I did an 8-bit quantization to see how it compares to the transformers library load_in_8bit 4 months ago(?), it didn’t use my GPU but loaded each shard into RAM during the conversion. I had an old 4C/8T CPU and the conversion took like 30 mins for a 13B.
I can quantize models up to 70B just fine with around 40-50 GB of system RAM, using the GGMLv3 format.
GGUF seems not optimised yet, since quantizing with a newer version of llama.cpp supporting the format fails on the same hardware. I expect that to be fixed shortly.
For inference, I understand that the hardware requirements will be identical as before.
Replicate has already hosted Llama2 13B, the chat version. My guess is that in a short span of days or weeks they will host the code version too. They charge a dollar for 2000 generations, if I am not mistaken.
“Normal laptop” is kind of hard to gauge, but if you have an M-series MacBook with 16GB+ RAM, you will be able to run 7B comfortably, and 13B while stretching your RAM (because of the unified memory), at 4-bit quantization. These go all the way down to 2 bit, but I personally find the models noticeably deteriorate anywhere below 4 bit. You can see how much (V)RAM you need here [1].
To run Code Llama locally, the 7B parameter quantized version can be downloaded and run with the open-source tool Ollama: https://github.com/jmorganca/ollama
ollama run codellama "write a python function to add two numbers"
More models coming soon (completion, python and more parameter counts)
>The Code Llama models provide stable generations with up to 100,000 tokens of context.
Not a bad context window, but makes me wonder how embedded code models would pick that context when dealing with a codebase larger than 100K tokens.
And this makes me further wonder if, when coding with such a tool (or at least a knowledge that they’re becoming more widely used and leaned on), are there some new considerations that we should be applying (or at least starting to think about) when programming? Perhaps having more or fewer comments, perhaps more terse and less readable code that would consume fewer tokens, perhaps different file structures, or even more deliberate naming conventions (like Hungarian notation but for code models) to facilitate searching or token pattern matching of some kind. Ultimately, in what ways could (or should) we adapt to make the most of these tools?
You can, I suppose, contract your code so that it’s context-free and uses fewer tokens, but that makes it more confusing for humans and language models.
Taken to the extreme, you can see that with one-letter functions and variables like i, j, k, the model will be able to infer literally nothing and, thus, produce arbitrary nonsense.
Clearly the solution is to do what we already do to manage complexity, which is to decompose large tasks into smaller black-box modules with an API, where the (token-heavy) implementation is hidden and not known or relevant to using it.
If you give an LLM a function signature and good description, maybe some usage examples, it doesn’t need the implementation to use it.
Terseness decreases the ability of LLMs to process code; it doesn’t solve context length, and even at best it doesn’t scale.
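One cheap way to build that kind of condensed, implementation-free view for Python code is to keep only signatures and docstrings with the standard ast module; a rough sketch of the idea, not any particular tool's implementation:

import ast

def api_surface(source: str) -> str:
    """Return only function signatures and docstrings, dropping the bodies."""
    out = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            out.append(f"def {node.name}({args}):")
            doc = ast.get_docstring(node)
            out.append(f'    """{doc}"""' if doc else "    ...")
    return "\n".join(out)

# usage: feed api_surface(open("mymodule.py").read()) to the LLM instead of the full file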
The process of decomposing the task into smaller steps and generating each step independently seems to be the correct way in my experience too. It works very well with GPT (ChatGPT or GPT-4).
>100k tokens is plenty.
The context window can be really helpful in case there is a release of a new library and the user wants to generate code targeting the API of that library. When the training data cutoff is August 2023, any library released after that date is not known to the engine.
My general opinion regarding context windows is that even a 1-trillion-token context window may not be enough for all use cases.
Your developer tool already maps out the entire code base in useful ways, such as knowing all the symbols available in the current context and the structure of classes. This information can be distilled for presentation to the LLM. For instance, if you’re wanting to generate a method implementation inside a C++ class, the LLM can be given a condensed version of the header files that the compiler would have access to on compiling that specific class. Removing white space and comments and boiling macros down saves a lot of tokens.
You can also probably skip including standard library headers since those will be well known to the LLM through its fine tuning.
Either way, consider that a typical preprocessed C++ file would push against the 100K limit even with some optimizations. You will definitely want to have some middleware doing additional refinement before presenting that file to the LLM.
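A toy version of that middleware pass, stripping comments and blank lines from a header before handing it to the model (a real preprocessor would also expand macros and includes, which this deliberately does not attempt):

import re

def condense_header(text: str) -> str:
    """Crude token-saving pass: drop /* ... */ and // comments and blank lines."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)  # block comments
    text = re.sub(r"//[^\n]*", "", text)                    # line comments
    return "\n".join(ln.rstrip() for ln in text.splitlines() if ln.strip())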
I’ve found the utility of the coding LLMs gets a lot higher when you’ve got code comments and descriptive variable and function names - the LLM makes better inferences and suggestions. We’ve seen similar on data - properly tagged data and descriptive field names helps the LLM to produce much more useful responses. I’m secretly hoping the spread of these tools will finally lead my fellow developers to comment their code and stop using three character variable names.
This is my ultimate (short term) AI fear - letting it get into a feedback loop with itself, leading to perverse and incorrect results.
To state my position more clearly: I don’t think an AI could comment code from scratch very well - how would it know all the decisions made, business logic considerations, historical conventions, micro-industry standards, etc?
A good benchmark I was told once was “if a human expert couldn’t do it, an AI probably can’t either”. And commenting code I didn’t write would certainly test the bounds of my abilities
I built a VS code extension a while back that I still use that wraps GPT-4 and writes code directly in my editor.
The method I used to choose which files to feed GPT-4 was embeddings-based. I got an embedding for each file and then an embedding from the instruction + some simple processing to pick the files more likely to be relevant. It isn't perfect but good enough most of the time in medium-sized codebases (not very large ones).
The one thing I started doing because of how I implemented this is make files shorter and move stuff into different files. Having a 1k+ LOC file is prohibitive because it eats up all the context window (although with 100k context window maybe less so). I think it's a good idea to keep files short anyways.
There's other smarter things that can be done (like embed and pass individual functions/classes instead of entire files) so I have no doubt someone will build something smarter soon. You'll likely not have to change your coding patterns at all to make use of AI.
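The selection step described above boils down to ranking files by cosine similarity between the instruction embedding and each file's embedding; a sketch, where embed() is a placeholder for whatever embedding model or API you use, not a real function:

import numpy as np

def top_k_files(instruction_vec, file_vecs, k=5):
    """Rank files by cosine similarity to the instruction embedding."""
    names = list(file_vecs)
    mat = np.stack([file_vecs[name] for name in names])
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    query = instruction_vec / np.linalg.norm(instruction_vec)
    scores = mat @ query
    return [names[i] for i in np.argsort(-scores)[:k]]

# usage: top_k_files(embed(instruction), {path: embed(read(path)) for path in repo_files})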
This sounds like a job for middleware. Condensing split code into a single huge file, shortening comments, removing whitespace and such can be done by a preprocessor for the LLM.
You don't even need an LLM for trimming whitespace, just a smart parser with language rules like IDE code checkers already use. Existing LLMs are fine at summarizing comments, especially with language-specific grammar constraints.
A good practice is to have a prompt file where you keep the information you want the model to have at its disposal. Then you put it in the start of your conversations with GPT-4. It's also good documentation for people.
You start a project by defining the task. Then as you iterate, you can add new information to the prompt. But it can be also partially automated - the model can have a view of the file structure, classes, routes, assets and latest errors.
I was really hoping that the one-year update of Codex would be that: an LLM that can see deep into the project, not just code, but runtime execution, debugging, inspecting and monitoring. Something that can iterate like AutoGPT. Unfortunately it didn't improve much and has weird conflicts with the native code completion in VSCode; you get freezes or doubled brackets.
Solutions exist that feed LLMs ctags, and seem to work well. The function signatures and symbol names for a code base are much smaller than the actual code.
Copilot has been working great for me thus far, but it's limited by its interface. It seems like it only knows how to make predictions for the next bit of text.
Is anyone working on a code AI that can suggest refactorings?
"You should pull these lines into a function, it's repetitive"
"You should change this structure so it is easier to use"
With Cody you can create embeddings for your entire repo, so Cody will have much greater context about your code base and the problems you're trying to solve.
Disclaimer: I just joined Sourcegraph a few weeks ago.
Cody is great, it had become my go-to (and I pay for Github Co-pilot).
With that said, they have recently changed the architecture, with the local install required, and I have not managed (yet) to get it working with NixOS. Once I have some more time, I will try again - it looks like there will be some hoops to go through. https://nixos.org/manual/nixpkgs/stable/#ssec-pkgs-appimageT...
Kudos to the Sourcegraph team. Sourcegraph's original product was nicely thought out and ahead of its time. Nice to see how the original product gave a nice basis for building out Cody.
Hello, Sourcegrapher here! We are making some simplifications to eliminate the need for the separate local app install. Should be out in the next month or so. Until then, apologies for the added friction here, we're working hard to smooth it out.
Neither of those tasks require AI. IntelliJ IDEA will happily suggest both for you today, locally. It can find large chunks of duplicated code and automatically refactor them out to functions for you. And it has many inspections that suggest refactorings to make code clearer.
Last I heard they are in beta and don't work very well (even on the examples page: the "add types" brush is too strict, since `a` and `b` are checked for `null`, and the "fix simple bug" is a typo)
I've been using Cursor (https://www.cursor.so/) and it can do embeddings of the entire codebase, refactoring entire classes, etc. I had it rewrite a UI to add state to show one item at a time and have a selection list to the left and it executed it perfectly in MUI controls, first try.
Sounds like what's needed is a bit of tooling in the background consistently asking the LLM "How would you improve this code?" so you don't need to actually ask it.
I've used Ollama to run Llama 2 (all variants) on my 2020 Intel MacBook Pro - it's incredibly easy. You just install the app and run a couple of shell commands. I'm guessing soon-ish this model will be available too and then you'd be able to use it with the Continue VS Code extension.
Edited to add: Though somewhat slow, swap seems to have been a good enough replacement for not having the loads of RAM required. Ollama says "32 GB to run the 13B models", but I'm running the llama2:13b model on a 16 GB MBP.
Apple Silicon, especially an M1 Max Studio seems to be an interesting machine to hang on to as the models become more and more efficient with using less and less.
If there are any other opinions or thoughts on this, I'd be very happy to learn as well. I have considered the eGPU route connected to a 1L PC such as a ThinkCentre m80/90.
I have a 64 GB M1 Max MBP, and I'd say unless you really have some academic interest towards messing with open models, for now accessing SOTA models via a REST API has better latency for a given quality.
Claude 1.2 instant is as fast as 3.5, follows instructions at a quality closer to 4, and has a 100k context window. Hard to compete with that with an open source model right now.
How does open source compete with the Claude API? Easy: actually let you use the model. From the signup page:
> Anthropic is rolling out Claude slowly and incrementally, as we work to ensure the safety and scalability of it, in alignment with our company values.
> We're working with select partners to roll out Claude in their products. If you're interested in becoming one of those partners, we are accepting applications. Keep in mind that, due to the overwhelming interest we've received so far, we may take a while to reply.
No thanks, I'd much rather not wait months to see if my app deserves their oh-so-limited attention, or "aligns with the values" of a company taking $400m from Sam Bankman-Fried.
To be more charitable to your underlying point, Claude 2 is free to chat with via Anthropic's website, Poe, or Slack, and the GPT-4 API is open to use. If you're building a prototype or just need a chatbot, these do have better results and dev experience, at least for now. But I don't think picking on your Claude API example is unfair. These companies could randomly refuse your prompts via some opaque "moderation API" (that all GPT fine-tuning data goes through!), train on your company's proprietary data, spy on your most intimate questions, or just not find you worth the trouble and cut you off, at any time. THAT is why open source beats proprietary hands down: My device, my data, my weights, my own business.
Awkward tie-ins between SBF and value systems (?) have no effect on practical usage.
A theoretical concern they might train on my API data after saying they won't doesn't either. Amazon might be training on everything not bolted down in S3, not worth wasting brain power on that.
The moderation API isn't some magic gotcha, it's documented. They don't want to deal with people fine tuning for porn. Maybe you have some ideological disagreement on that but it's not of practical relevance when trying to write code.
At the end of the day you're not alone in these opinions. But some of us prefer pragmatism over hype. Until someone catches OpenAI or Anthropic trying to kill their golden goose by breaking their GDPR, HIPAA, and SOC 2 certifications, I'm going to take delivered value over theoretical harm.
34B should be able to run on 24GiB consumer graphics card, or 32GiB Mac (M1 / M2 chips) with quantization (5~6bit) (and 7B should be able to run on your smart toaster).
Matthew Berman has a tutorial on YT showing how to use TheBloke's docker containers on runpod. Sam Witteveen has done videos on together and replicate, they both offer cloud-hosted LLM inference as a service.
If you want to run them fast, a 12GB GPU (e.g 3060) for the 13B and a 24GB GPU for the 34B (e.g 3090). Otherwise llama.cpp CPU inference would work on most machines.
How are people using these local code models? I would much prefer using these in-context in an editor, but most of them seem to be deployed just in an instruction context. There's a lot of value to not having to context switch, or have a conversation.
I see the GitHub Copilot extension gets a new release every few days, so is it just that the way they're integrated is more complicated, so not worth the effort?
You can use Continue as a drop-in replacement for Copilot Chat with Code Llama. We've released a short tutorial here: https://continue.dev/docs/walkthroughs/codellama. It should save you a lot of time context-switching; you can just highlight code and ask questions or make edits, all with keyboard shortcuts
This works well for me, except the 15B+ models don't run fast enough on a 4090. Hopefully exllama supports non-llama models, or maybe it supports CodeLlama already, I'm not sure.
http://cursor.sh integrates GPT-4 into vscode in a sensible way. Just swapping this in place of GPT-4 would likely work perfectly. Has anyone cloned the OpenAI HTTP API yet?
I was tasked with a massive project over the last month and I'm not sure I could have done it as fast as I have without Cursor. Also check out the Warp terminal replacement. Together it's a winning combo!
That’s just one perspective… Another perspective is that LLMs enable programmers to skip a lot of the routine and boring aspects of coding - looking up stuff, essentially - so they can focus on the fun parts that engage creativity.
But it won't stop there. Why would it stop at some arbitrarily defined boundary? The savings associated with no longer having to pay programmers the amounts of money that they believe they are worth (high enough to result in collusion between employers) are just too tempting.
Some form of AI will eventually take over almost all existing jobs. Whether those jobs evolve or not somehow and new jobs replace them, we will see.
But it's definitely not just programmers. And it will take time.
Society needs to adjust. Stopping progress would not be a solution and is not possible.
However, hopefully we can pause before we create digital animals with hyperspeed reasoning and typical animal instincts like self-preservation. Researchers like LeCun are already moving on from things like LLMs and working on approaches that really imitate animal cognition (like humans) and will eventually blow all existing techniques out of the water.
The path that we are on seems to make humans obsolete within three generations or so.
So the long term concern is not jobs, but for humans to lose control of the planet in less than a century.
On the way there we might be able to manage a new golden age -- a crescendo for human civilization.
Humans don’t become obsolete, we become bored. This tech will make us bored. When humans get too bored and need shit to stir up, we’ll start a war. Take US and China, global prosperity is not enough right? We need to stoke the flames of war over Taiwan.
In the next 300 years we’ll wipe out most of each other in some ridiculous war, and then rebuild.
I agree that WWIII is a concern but I don't think it will be brought about by boredom.
"Global prosperity" might be true in a very long-term historical sense, but it's misleading to apply it to the immediate situation.
Taiwan is not just a talking point. Control over Taiwan is critical for maintaining hegemony. When that is no longer assured, there will likely be a bloody battle before China is given the free rein that it desires.
WWIII is likely to fully break out within the next 3-30 years. We don't really have the facilities to imagine what 300 years from now will look like, but it will likely be posthuman.
I’ll go with the 30 year mark. Countries like Russia or China don’t get humbled in a loss (like Germany didn’t in WW1). Russia will negotiate some terms for Ukraine (or maintain perpetual war), but I believe it will become a military state that will funnel all money into the defense sector. The same with Iran, and the same with China.
Iran supplies Russia with drones. I can promise you Russia will help Iran enrich their uranium. They are both pariah states, what do they have to lose? Nuclear Iran, here enters Israel.
Okay, think about it this way. This thing helps generate tons and tons of code. The more code people (or this thing) writes, the more shit there is to debug. More and more code, each calling each other means more and more insane bugs.
We’re going to move from debugging some crap the last developer wrote to debugging an order of magnitude more code the last developer generated.
It’s going to be wonderful for job prospects really.
The only thing that takes anyone's job is demand shortfalls. Productivity increases certainly don't do it. It's like saying getting a raise makes you poorer.
Actually, in my country, Portugal, if your salary is the Minimum wage, you are exempt from paying taxes, but if you get a raise as little as 15€ or so, you move one category up and start to pay taxes, and you will receive less money than what you used to get before the raise.
One coachman to the other: "Another perspective about this car thing, you can skip all the routine and boring trips - they are done with cars. You can focus on the nice trips that make you feel good".
This should be the only goal of mankind so we can smell the flowers instead of wasting our years in some cubicle. Some people will always want to work, but it shouldn't be the norm. What's the point really unless we're doing something we're passionate about? The economy?
The best interpretation of this is you mean eventually ML/AI will put programmers out of a job, and not Code LLama specifically.
However, it is hard to tell how that might pan out. Can such an ML/AI do all the parts of the job effectively? A lot of non-coding skills bleed into the coder's job. For example, talking to people who need an input to the task and finding out what they are really asking for, and beyond that, what the best solution is that solves the underlying problem of what they ask for, while meeting nonfunctional requirements such as performance, reliability, code complexity, and being a good fit for the business.
On the other hand eventually the end users of a lot of services might be bots. You are more likely to have a pricing.json than a pricing.html page, and bots discover the services they need from searches, negotiate deals, read contracts and sue each other etc.
Once the programming job (which is really a "technical problem solver" job) is replaced either it will just be same-but-different (like how most programmers use high level languages not C) or we have invented AGI that will take many other jobs.
In which case the "job" aspect of it is almost moot. Since we will be living in post-scarcity and you would need to figure out the "power" aspect and what it means to even be sentient/human.
I understand the fear of losing your job or becoming less relevant, but many of us love this work because we're passionate about technology, programming, science, and the whole world of possibilities that this makes... possible.
That's why we're so excited to see these extraordinary advances that I personally didn't think I'd see in my lifetime.
The fear is legitimate and I respect the opinions of those who oppose these advances because they have children to provide for and have worked a lifetime to get where they are. But at least in my case, the curiosity and excitement to see what will happen is far greater than my little personal garden.
Damn, we are living what we used to read in the most entertaining sci-fi literature!
(And that's not to say that I don't see the risks in all of this... in fact, I think there will be consequences far more serious than just "losing a job," but I could be wrong)
If we get to the point where these large language models can create complete applications and software solutions from design specs alone, then there's no reason to believe that this would be limited to merely replacing software devs.
It would likely impact a far larger swath of the engineering / design industry.
Interesting that there's a 34B model. That was missing from the original Llama 2 release. I wonder if it's still usable for general non-code chat tasks or if the code fine tuning destroyed that. It should be the best model that would still fit on 24GB gaming GPUs with quantization, because 70B doesn't fit.
Theoretically this is an even better size, as it would fit on a 20GB-24GB GPU with more relaxed quantization and much longer context.
Metrics are slightly below 13B, but the theory is that the higher parameter count is more amenable to finetuning. If you search for 22B on huggingface, you can see that frankenllama experiments are ongoing:
Looks like they left out another model, though. In the paper they mention an "Unnatural Code Llama" which wipes the floor with every other model/finetune on every benchmark, except for slightly losing to Code Llama Python on MBPP pass@100 and slightly losing to GPT-4 on HumanEval pass@1, which is insane.
Meta says later on that they aren't releasing it and give no explanation. I wonder why given how incredible it seems to be.
Between this, ideogram.ai (image generator which can spell, from former Google Imagen team member and others), and ChatGPT fine-tuning, this has been a truly epic week.
I would argue that many teams will have to reevaluate their LLM strategy _again_ for the second time in a week.
SDXL and DeepFloyd can spell. It's more or less just a matter of having a good enough text encoder.
I tried Ideogram yesterday and it felt too much like existing generators (base SD and Midjourney). DALLE2 actually has some interestingly different outputs, the problem is they never update it or fix the bad image quality.
Editor plugins are fantastic about completing based on a pattern. That's the main thing you're missing out on imo - it's worth it to hit tab, but not to copy/paste and say "finish this line for me, it looks almost like the one above."
There's also the real-time aspect where you can see that it's wrong via the virtual text, type a few characters, then it gets what you're doing and you can tab complete the rest of the line.
It's faster to converse with when you don't have to actually have a conversation, if that makes sense? The feedback loop is much shorter and doesn't require natural language, or nearly as much context switching.
Seriously, I was expecting to read the article and find them on a level on par with GPT-4 or higher. For all this chat about how Google/Facebook have been in the AI space longer than OpenAI, their products don't speak to that.
I can't wait for some models fine tuned on other languages. I'm not a Python developer, so I downloaded the 13B-instruct variant (4 bit quantized Q4_K_M) and it's pretty bad at doing javascript. I asked it to write me a basic React Native component that has a name prop and displays that name. Once it returned a regular React component, and when I asked it to make sure it uses React Native components, it said sure and outputted a bunch of random CSS and an HTML file that was initializing a React project.
It might be the quantization or my lacklustre prompting skills affecting it, though. To be fair I did get it to output a little bit of useful code after trying a few times.
Anyone know of a docker image that provides an HTTP API interface to Llama? I'm looking for a super simple sort of 'drop-in' solution like that which I can add to my web stack, to enable LLM in my web app.
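I don't know of a canonical image off the top of my head, but wrapping llama-cpp-python in a tiny Flask app and sticking that in a container is only a few lines. A minimal sketch, assuming a quantized model file on disk (the model path, route name and parameters below are placeholders I picked for illustration, not any established API):
# Minimal sketch of a local HTTP completion endpoint around llama-cpp-python.
# Assumes `pip install llama-cpp-python flask`; model path and route are placeholders.
from flask import Flask, request, jsonify
from llama_cpp import Llama

app = Flask(__name__)
llm = Llama(model_path="./models/codellama-7b-python.q4_0.bin", n_ctx=2048)

@app.route("/complete", methods=["POST"])
def complete():
    body = request.get_json()
    result = llm(body["prompt"], max_tokens=int(body.get("max_tokens", 256)))
    return jsonify({"completion": result["choices"][0]["text"]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
From there, a Dockerfile is essentially a Python base image plus the model file copied in; the heavy lifting stays inside llama-cpp-python.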
This is great for asking questions like "how do I do x with y" and "this code <<some code>> isn't working, what's wrong?" Much faster than googling, or a great basis for forming a more accurate google search.
Where it's a bit shit is when it's used to provide auto-suggest. It hallucinates plausible-sounding functions/names, which for me personally are hard to spot when they are wrong (I suspect that's a function of the plugin)
Hallucinations can be reduced by incorporating retrieval-augmented generation (RAG) on the front end. Likely function library defs could be automagically entered as prompt/memory inputs.
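To make that concrete, here's a minimal sketch of the idea for the hallucinated-function case, using only the standard library: introspect the module you actually depend on, keep the signatures that match a keyword (a stand-in for a real retrieval step), and prepend them to the prompt. The module and task here are purely illustrative.
# Illustrative sketch of retrieval-augmented prompting to curb hallucinated APIs:
# collect the real function signatures from a module and put them in the prompt
# so the model completes against definitions that actually exist.
import inspect
import json  # stand-in for whatever library the completion should target

def collect_signatures(module):
    sigs = []
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        try:
            sigs.append(f"{module.__name__}.{name}{inspect.signature(obj)}")
        except ValueError:
            pass  # some callables have no introspectable signature
    return sigs

def build_prompt(task, module, keyword=None):
    sigs = collect_signatures(module)
    if keyword:  # naive "retrieval": keep only signatures mentioning the keyword
        sigs = [s for s in sigs if keyword in s]
    context = "\n".join(f"# {s}" for s in sigs)
    return f"# Available functions:\n{context}\n# Task: {task}\n"

print(build_prompt("serialize a dict to a string", json, keyword="dump"))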
Why wouldn’t they provide a hosted version? Seems like a no brainer… they have the money, the hardware, the bandwidth, the people to build support for it, and they could design the experience and gather more learning data about usage in the initial stages, while putting a dent in ChatGPT commercial prospects, and all while still letting others host and use it elsewhere. I don’t get it. Maybe it was just the fastest option?
But Yann LeCun seems to think the safety problems of eventual AGI will be solved somehow.
Nobody is saying this model is AGI obviously.
But this would be an entry point into researching one small sliver of the alignment problem. If you follow my thinking, it’s odd that he professes confidence that AI safety is a non issue, yet from this he seems to want no part in understanding it.
I realize their research interest may just be the optimization / mathy research… that’s their prerogative but it’s odd imho.
It’s not that odd and I think you’re overestimating the importance of user submitted data for the purposes of alignment research. In particular because it’s more liability for them to try to be responsible for outputs. Really though, this way they get a bunch of free work from volunteers in open source/ML communities.
What I found interesting in Meta's paper is the mention of HumanEval[1] and MBPP[2] as benchmarks for code quality. (Admittedly maybe they're well-known to those working in the field.)
I haven't yet read the whole paper (nor have I looked at the benchmark docs which might very well cover this) but curious how these are designed to avoid issues with overfitting. My thinking here is that canned algorithm type problems common in software engineering interviews are probably over represented in the training data used for these models. Which might point to artificially better performance by LLMs versus their performance on more domain-specific type tasks they might be used for in day-to-day work.
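For anyone who hasn't dug into those benchmarks: the pass@k numbers quoted are, as far as I know, computed with the unbiased estimator introduced alongside HumanEval: sample n completions per problem, count the c that pass the unit tests, and estimate the chance that at least one of k random draws passes. A quick sketch:
# Unbiased pass@k estimator (as described alongside the HumanEval benchmark):
# with n samples per problem of which c pass the tests,
# pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:  # every size-k subset contains at least one passing sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 200 samples per problem, 30 of which pass:
print(round(pass_at_k(200, 30, 1), 3))    # 0.15
print(round(pass_at_k(200, 30, 100), 3))  # close to 1.0
This doesn't answer the overfitting question, of course; it only pins down what pass@1 versus pass@100 actually measure.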
It's really sad how everyone here is fawning over tech that will destroy your own livelihoods. "AI won't take your job, those who use AI will" is purely short term, myopic thinking. These tools are not aimed to help workers, the end goal is to make it so you don't need to be an engineer to build software, just let the project manager or director describe the system they want and boom there it is.
You can scream that this is progress all you want, and I'll grant you that these tools will greatly speed up the generation of code. But more code won't make any of these businesses provide better services to people, lower their prices, or pay workers more. They are just a means to keep money from flowing out of the hands of the C-Suite and investor classes.
If software engineering becomes a solved problem then fine, we probably shouldn't continue to get paid huge salaries to write it anymore, but please stop acting like this is a better future for any of us normal folks.
It’s less of a concern if you are in mid career. But someone should warn all these college kids that are going into comp sci. I don’t think this will be the kind of lucrative field they think it’s going to be over the course of a 40 year career.
The days of getting paid well for making crud are numbered (which most of us do, even in the most interesting problem spaces).
# need a front end boilerplate that hits a backend with the following end points. REST api for movies catalogue, and a corresponding ui. Oh, unit tests please. Go with a responsive design and also make a React Native version (matter of fact provision it to my iPhone). Decide between Heroku or AWS, set up deploy with git hooks.
# scrape IMDb for initial population of the db
# I think a Reddit like comment system would be good to add, so add it. No upvote/downvote though
# handle user login with google/fb/email
# also make an admin page to manage all this
I guess the technical product designer will be the new unicorn.
You can say this about every major invention. The loom destroyed jobs! The engine destroyed jobs! So on and so forth.
This view is critically flawed in two major ways:
1) AI is not anywhere near being able to replace the majority of what developers do on a product team. We are decades away from a PM at Facebook being able to type "make a twitter clone that uses instagram login and can scale to 1 billion" and have an AI just do it.
2) Programming and product work is not zero sum. The more we can do means the more product we can make. It means more products can be made overall. After the loom came out, we simply made more clothes than ever before and in the process created a ton of jobs. We are not at some peak software point where we've completely saturated all humanity's need for software or profitable software and thus tools that increase efficiency don't put us out of work.
And frankly, if we develop the kind of general AI that accepts a query like "make a facebook competitor capable of scaling to 10 billion" and simply does it, inventing whatever languages, frameworks, hardware, processors, patterns, methodologies, global datacenters handling global politics and law, etc, etc necessary to accomplish such a task, then so be it. I welcome the overlords!
I agree. This will ultimately become another way to extract more value straight into the pockets of the owners.
Unfortunately, I don't believe there's a way to stop (or even slow down) this train. We can't defeat it, so the only logical answer is to join it.
It's the classical issue with progress removing jobs. In today's world, since mostly everyone (aside from the capitalists themselves) relies on jobs to survive, barring a complete switch from capitalism (which will not happen in our lifetimes), we're fucked.
Next best thing we can do is to try and democratize it enough so that not only the rich have access to it.
The problem with 2 is that the people in power, and with immense wealth, remain there because of capitalism. They have the political power, and the resources, to enact the change ... but they also lose the most (unless you count altruism as gain, which if it were true the World would be so different).
We could structure things so that LLM, and the generalised AIs to come, benefit the whole of society ... but we know that those with the power to make that happen want only to widen the poverty gap.
Yes but the common man has won before! There has never been a perfect revolution/paradigm shift (personally anti utopia-through-intense-bloodshed, so hesitant to use the former term alone), but there have been many, all of which were against the wishes of those in power.
Plus, if these AIs are enough to change everything, that kinda implies that we've developed flexible, reliable AGI systems. In such a world, everything changes - maybe the calculus of The Powerful Few vs. The Oppressed Masses changes too! It might even change in our favor, if we're terribly lucky...
Curious if there are projects to enable working with these things self-hosted, tuned to a git repo as context on the cli, like a Unix filter - or with editors like vim? (I'd love to use this with Helix)
I see both vscode and netbeans have a concept of "inference URL" - are there any efforts like language server (lsp) - but for inference?
I want "safety" to be opt-in due to the inaccuracy it introduces. I don't want to pay that tax just because someone is afraid I can ask it how to make a bomb when I can just Google that and get pretty close to the same answer already, and I certainly don't care about being offended by its answers.
Feels like we're like a year away from local LLMs that can debug code reliably (via being hooked into console error output as well) which will be quite the exciting day.
Have you tried Code Llama? How do you know it can't do it already?
In my applications, GPT-4 connected to a VM or SQL engine can and does debug code when given error messages. "Reliably" is very subjective. The main problem I have seen is that it can be stubborn about trying to use outdated APIs and it's not easy to give it a search result with the correct API. But with a good web search and up to date APIs, it can do it.
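The basic loop is straightforward to sketch: run the candidate code, and if it fails, hand the traceback back to the model together with the code. In the sketch below, ask_model is a placeholder for whatever completion call you have available (GPT-4, a local Code Llama endpoint, etc.), not a real API:
# Sketch of a run/observe/repair loop. `ask_model` is a placeholder callable
# that takes a prompt string and returns the model's corrected full script.
import subprocess
import sys
import tempfile

def run_snippet(code: str):
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True, timeout=30)
    return proc.returncode == 0, proc.stderr

def debug_loop(code: str, ask_model, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        ok, stderr = run_snippet(code)
        if ok:
            return code
        prompt = (f"The following Python script fails.\n\n{code}\n\n"
                  f"Error output:\n{stderr}\n\n"
                  f"Return a corrected version of the full script.")
        code = ask_model(prompt)
    return code  # best effort after max_rounds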
I'm interested to see general coding benchmarks for Code Llama versus GPT-4.
The 34b Python model is quite close to GPT4 on HumanEval pass@1. Small specialised models are catching up to GPT4 slowly. Why not train a 70b model though?
Given this can produce code when prompted, could it also be used to interpret html from a crawler and then be used to scrape arbitrary URLs and extract structured attributes? Basically like MarkupLM but with massively more token context?
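In principle yes, within the context limit; the usual trick is to strip the page down to plain text first so you don't spend tokens on markup. A rough sketch with the standard library (ask_model is again a placeholder for whatever inference endpoint you use, and the prompt format is just illustrative):
# Rough sketch: reduce a crawled page to plain text, then ask the model
# to return the requested fields as JSON.
import json
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def extract_attributes(html: str, fields, ask_model) -> dict:
    parser = TextExtractor()
    parser.feed(html)
    text = "\n".join(parser.chunks)[:8000]  # crude truncation to fit the context window
    prompt = (f"Extract the following fields from this page as a JSON object "
              f"({', '.join(fields)}):\n\n{text}\n\nJSON:")
    return json.loads(ask_model(prompt))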
This is probably a stupid question, but would it be possible to use these models to rate existing code and point to possible problems, rather than generating new code? That would be extremely useful to some use cases I'm working on.
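Not a stupid question at all; with the instruct-tuned variants you can do exactly that by prompting for a review rather than a completion. A tiny sketch (ask_model is a placeholder for whichever model call you end up with, and the prompt wording is only a starting point):
# Sketch: point an instruct-tuned model at existing files and ask for a review.
from pathlib import Path

REVIEW_PROMPT = ("You are a code reviewer. List possible bugs, security issues and "
                 "style problems in the following file, with brief explanations:\n\n{code}")

def review_repo(root: str, ask_model, pattern: str = "*.py") -> dict:
    reviews = {}
    for path in Path(root).rglob(pattern):
        code = path.read_text(errors="ignore")[:6000]  # crude per-file context limit
        reviews[str(path)] = ask_model(REVIEW_PROMPT.format(code=code))
    return reviews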
Random tangential question given this is about llama, but how do you get llama.cpp or kobold (or whatever tool you use) to make use of multiple GPUs if you don't have NVlink in place?
Are there docs somewhere showing how to run this on your local machine, and can you get it to port a script between languages? GPT-4 can do that pretty well but its context is too small for advanced purposes.
Those charts remind me just how insanely good GPT-4 is. It's almost 5 months since its release and I am still in awe of its capabilities. The way it helps with coding is just crazy.
There are several different ways, but the easiest way in my (clearly biased) opinion is to just go to ollama.ai, download it, and start playing around. It works out of the box w/ newer Macs, but there are versions for Linux and Windows in the works.
ggml quantization is very easy with the official llama.cpp repo. It's quick and mostly dependency-free, and you can pick the perfect size for your CPU/GPU pool.
Some of the newer models have slightly different architectures, so he explains any differences and shows a llama.cpp invocation. Plus you can avoid pulling the larger dataset.
Llama is a very cool language model, it being used for coding was all but inevitable. I especially love it being released open for everyone.
I do wonder about how much use it'll get, seeing as running a heavy language model on local hardware is kinda unlikely for most developers. Not everyone is running a system powerful enough to equip big AIs like this. I also doubt that companies are going to set up large AIs for their devs. It's just a weird positioning.
12GB of VRAM lets you run 13B models (4-bit quantized) with reasonable speed, and can be had for under $300 if you go for previous-generation Nvidia hardware. Plenty of developers around with M1 and M2 Macs, as well.
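As a rough back-of-the-envelope (weights only; the KV cache and activations add a couple of GB on top depending on context length), assuming roughly 4.5 bits per weight once the q4 block scales are counted:
# Back-of-the-envelope VRAM estimate for quantized weights only; KV cache and
# activations add more on top. 4.5 bits/weight approximates q4 plus block scales.
def weight_memory_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for size in (7, 13, 34):
    print(f"{size}B @ ~4-bit: about {weight_memory_gb(size):.1f} GB of weights")
That lines up with 13B fitting comfortably in 12GB and 34B squeezing onto a 24GB card.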
Better reasoning and general performance than 13b by far (if llama1 was any indication), and like the other user said, can fit on a single 24gb vram gaming card, and can be peft fine-tuned with 2x 24gb cards.
Llama-1-33B was trained on 40% more tokens than Llama-1-13B; this explains some of the disparity. This time around they both have the same data scale (2T pretraining + 500B code finetune), but 34B is also using GQA which is slightly more noisy than MHA. Furthermore, there have been some weird indications in the original Llama-2 paper that the 34B base model is something… even more special: it's been trained on a separate internal cluster with undervolted/underclocked GPUs (though this in itself can't hurt training results), its scores are below expectations, and it's been less "aligned". Here, Code-Llama-Instruct-13B is superior to 34B on HumanEval@1. So yes, it's desirable but I wouldn't get my hopes up.
To a certain extent, I think it's just IBM disease. A company the size of Meta is expected to have an AI research department like Microsoft or Google, even if their core business (social media) derives relatively less benefit from the technology.
Pretend you're an uncreative PM on an AI team; what part of Facebook or VR could you feasibly improve by iterating on LLMs? Perhaps the content moderation system... but that would require wrangling with the company ethics committee, and someone else at the company probably already took ownership of that idea. You've gotta do something compelling or else your ML engineers are going to run off somewhere else.
If I were to ask my ML engineers about what they wanted to work on, they're going to avoid areas where their model is outgunned (i.e.: chat) and instead prefer lower hanging fruit which generalizes well on a resume (i.e.: "Pioneered and published key innovations in LLM code-generation").
Of course, the alternative answer is that Meta wants to replace all of their jr. developers with GPUs, but I think their leadership is a little too preoccupied with VR to be actively pushing for such a transformative initiative in anything more than a very uninvested capacity (e.g.: "Sure I'll greenlight this. Even if it doesn't pay off I don't have any better ideas")
The ability to work less historically has always come as a byproduct of individuals earning more per hour through productivity increases.
The end goal of AI isn't to make your labour more productive, but to not need your labour at all.
As your labour becomes less useful if anything you'll find you need to work more. At some point you may be as useful to the labour market as someone with 60 IQ today. At this point most of the world will become entirely financially dependent on the wealth redistribution of the few who own the AI companies producing all the wealth – assuming they take pity on you or there's something governments can actually do to force them to pay 90%+ tax rates, of course.
Business opportunity: I'd pay money for NICE desktop software that can run all these different models (non-subscription, "2-year updates included, then discount pricing" model perhaps). My wishlist:
- Easy plug & play model installation, and trivial to change which model once installed.
- Runs a local web server, so I can interact with it via any browser
- Ability to feed a model a document or multiple documents and be able to ask questions about them (or build a database of some kind?).
- Absolute privacy guarantees. Nothing goes off-machine from my prompt/responses (USP over existing cloud/online ones). Routine license/update checks are fine though.
I'm not trying to throw shade at the existing ways of running LLMs locally, just saying there may be room for an OPTIONAL commercial piece of software in this space. Most of them are designed for academics to do academic things. I am talking about a turn-key piece of software for everyone else that can give you an "almost" ChatGPT or "almost" CoPilot-like experience for a one time fee that you can feed sensitive private information to.
I work for a Fortune 100 company with 80,000+ employees. All of us are explicitly forbidden from using any sort of AI/LLM tool without written permission from the head of legal AND the CEO. In other words, nobody is going to get permission.
The concerns are 2 fold - 1. We might inadvertently use someone else’s intellectual property. 2. Someone else might gain access to our intellectual property.
What you are describing would help alleviate the concern about issue 2, but I’m not sure if it would help alleviate the concerns with issue 1.
It's basically the same thing in our company, too. They basically put a similar rule in place that prevents anyone from using e.g. ChatGPT. Little do they know that all software devs within the company are using co-pilot and the company is even paying for it. It's quite a funny situation tbh.
> Little do they know that all software devs within the company are using co-pilot and the company is even paying for it.
Just like annual sexual harassment training - it's mostly corporate CYA on liability. If it ever goes to court, they'll plead ignorance and blame the employees who should have known better as they were trained/informed on what they ought not to do.
Paying for co-pilot could bite them though, so I suspect it's a case where one part of the organization isn't aware of what the other is doing.
All of your assumptions are exactly right. They (mostly managers with little to no IT background) want to cover their own asses in case shit hits the fan (unlikely scenario if you ask me, because the company is just overrating the value of their data. Nobody gives a fuck about us anyway...) and many parts of this company have developed their own habits... The company is just very big and I can understand why they might be afraid, but come on, nobody will take that policy seriously forever. You need to eventually put some reasonable rules in place that allow you to make use of such innovations...
Except, at my company they block software like that. Not only do they block it, but if you try to go to it a security person will immediately call your manager and ask what you are doing.
we try to eliminate this problem by using code models trained only on permissively licensed code, then you can run them locally without sending code anywhere
Change company. Honestly. If you go as far as to forbid your partners in crime (workers sigh..) to explore new uncharted territory at all - well ya know someone will/might just win by not doing that.
This is particular, specifically problematic territory. I cannot imagine handing over proprietary data to a third party without a contract in place for how that data is stored and used. It’s not about innovation, it’s about using someone else’s tools without ownership. For the other case, it’s both about integrity in owning your own work, and a shield from legal consequences. These things should be very relevant to any business.
I also don’t know any professional devs who have used tools like copilot and said they were anything but a toy. I am more bullish on LLMs than most of my coworkers. I think there is a lot of potential there. I do not see that potential in the current commercial offerings, and the financial outlay to fine-tune an open-source model and run it at scale is…prohibitive.
> I also don’t know any professional devs who have used tools like copilot and said they were anything but a toy.
Really? I'm academic now but I find Copilot at least moderately helpful when I'm writing a library. It's pretty good at a lot of boilerplate functions, docstrings, regex, etc. I certainly don't want to go back to not using it, my code is a lot closer to production quality now and looks nicer.
Thinking back to my days in back-end it seems like it would have been very helpful/sped things up so I'm surprised to hear it's just a toy but I've been out of the professional game for a while now. What's the main criticism?
Agreed. After several rounds of setting up various python environments and tinkering with directory structures and debugging glitches and quantizing models just to end up playing around for a few minutes and getting bored, it would be nice to have the experience just be seamless. I wouldn't try to set up a workflow around seriously using what's out there to run on localhost now.
That said, non-subscription is essential, and that's probably going to be a heavy lift considering how quickly things are evolving.
I've been trying to push things in that direction with my LLM tool - the idea is to have Python plugins which you can install that do all of the irritating details to get a model setup.
I've not yet been able to solve the challenge of needing CUDA etc for some models though!
Cool! I've followed your instructions and your blog quite a bit as I've experimented with running local LLMs as well as stable diffusion. It's been especially helpful, as python is not my language or usual environment. Your patience at hacking your way through each new iteration and presenting what's important about them is astonishing; I personally think I'd have gone mad, but you've done great work in charting the territory.
Oobabooga is a great tool but it still has a long way to go in terms of user-friendliness. It's absolutely not plug and play the way that ChatGPT is; it requires research, trial and error, and knowledge of the tech to make the model work to its full potential. It's great once you finish setting it up, but it does not compare to what you would expect from a commercial product aimed at normal end-users.
Things like bad default values, no tooltips, and no curated model list to one-click download are what separate a tool like Oobabooga from a paid commercial product. These things require time/money and it would be very unlikely that an open source tool could find resources for all the testing and R&D.
I think there is a big market for products where you pay and can just start chatting with the model without having to ever go to the settings tab or google anything unless you need to do something out of the ordinary.
Looks really promising. I wonder if the similar pricing to OpenAI means that Gradient is also(?) bleeding money even if they get a good customer base. Or are these prices sustainable over time?
Yeah it's even cheaper. Although it looks like it's about the same in proportion to approx model size/expected quality? They haven't launched any >13B model yet, although they plan to.
we're going in this direction for code models with Refact https://github.com/smallcloudai/refact/ - right now you self-host code models, fine-tune them on local files, get the model running locally inside your IDE
Does anyone have a good explanation for Meta's strategy with AI?
The only thing I've been able to think is they're trying to commoditize this new category before Microsoft and Google can lock it in, but where to from there? Is it just to block the others from a new revenue source, or do they have a longer game they're playing?
They also don't have the same economic setup and DNA as MS/OpenAI. Large corporate customers don't pay for access to the FB cloud, nor are they likely to -- Ellison has spent years building out Oracle Cloud, and he's on the FB board, for example. And I bet you didn't think of using Oracle's Cloud for your last project.
So, your company DNA is free-to-all social based on ad monetization, with a large bet on metaverse / AR / experiential social compute being next. You aren't a trusted corporate partner for anything but gatekeeping your immense community through ad sales.
And, it's clear you a) have some of the most interesting private social data in the world, including photos and DMs and texts, and b) this AI thing is huge.
A play that doesn't f with your existing corporate structure too much is to build this stuff, give it away, keep publishing, build your AI team internally, and see where it takes you.
This isn't the only play, but I think it's reasonable. It's pretty clear large enterprises are going to need their own, internally built / owned, Foundation models to be competitive in a bunch of arenas in the next decade. In this case, if FB can get a little mindshare, keep the conversation going, and as a sidenote, be a disruptor by lowering Azure/OpenAI revs with open releases at-the-edge, that's probably a strategy win.
If I were in charge of AI strategy at FB, I'd probably double down more on generative AI, and I'd be working hard on realtime multimodal stuff -- their recent very large multimodal speech to text in multiple languages work is good. If a team could eyeball realtime-ish video chat with translations, that would be something the platform has a natural advantage in pushing out. Generative hits existing customers, and metaverse asset creation, which is going to experience radical changes in costs and productivity over the next few years, and impact Oculus 100% no matter what anybody wishes were true.
That’s interesting. I tend to lump FB, Amazon, Google, and MS in my head when thinking about the tech giants, but you’re right, FB is the only one of those not offering a commercial platform. For them, building out the capabilities of the LLMs is something to be done in the open with community involvement, because they’re not going to monetize the models themselves.
They’re also getting a fantastic amount of press from all this, which is good for attracting talent and helping improve their image, at least among the nerd set.
FB is unlike the other BigTech(tm) since Zuck never sold out and has a controlling equity stake. Amazon, Google, and MS are all controlled by and beholden to institutional investors.
FB can release these for no other reason than Zuck’s ego or desire to kill OpenAI. Same deal as him going off on a tangent with the Metaverse thing.
Given that OpenAI finished training GPT4 a year ago, and no models today (including these) can beat it, I highly doubt anyone is capable of killing OpenAI in the near future. I'm guessing by the time GPT5 is out, someone will finally catch up with GPT4.
Depends what you mean by platform and depends what you mean by FB. If by FB you mean Meta, they have also https://www.workplace.com/ (which is like an internal facebook), instagram, whatsapp and some others. Integration of LLMs technology in those "platform" might give them some advantage.
Right, but they’re not competing directly on offering the LLM - they benefit from having a better LLM as a feature, but their value add is elsewhere in the product.
Meta absolutely could not overcome the barriers to entry and technical mismatch for any sort of traditional IaaS-style product, and it would be foolish for them to try. They might be able to pull off some sort of next generation Heroku style service aimed at smaller shops with built in facebook integration and authn/z management, but that's tangential.
I don’t believe they’re going for the same hosted monetization as Oracle or Google. I’m sure they’ll play around with assistant AIs but you can imagine them leveraging their graph and data for this.
Who is better positioned to answer a question like, “What should I get my friend Sophia for her birthday?” Facebook/Instagram already have huge volumes of data to specifically target ads. They can feed those into a chat interface pretty easily.
Customers would then buy per impression by describing their product and trusting Facebook to place it correctly. They already do this today, it’s just a different medium.
> Who is better positioned to answer a question like, “What should I get my friend Sophia for her birthday?” Facebook/Instagram already have huge volumes of data to specifically target ads. They can feed those into a chat interface pretty easily.
Interesting idea but sounds risky and intrusive in practice.
I think this suggestion lacks subtlety. More likely, around the time leading up to Sophia's birthday, you will see more ads for things (maybe even gift idea ads) that just so happen to be things Sophia would love (at least, according to their data).
Yes. But it also needs a very different org structure to support that. Their internal infra from what I heard is dated (monolithic PHP binary deployment, no federated authorization management etc.). It is doable (FAIR's org structure was very different in the first a few years), but would also be a distraction for a long time.
I would add that having open source gen AI will enable the creation of content for metaverse / AR / VR, which will improve the chances that all of that will take off.
Right, exactly this. Ratcheting the costs down two orders of magnitude in both dollar and expertise/human costs is going to make huge changes. You better believe FB is thinking about this hard.
Yeah, there are tons of opportunities for AI to do something with Facebook's private user data and sell new services. For users to create engagement - and for ad companies to get very good targeted ads delivered. It is of course a challenge to update the models on the fly to include the latest private data, but then you can tailor an ad that has subtle references to the latest shared wishes of the user. Probably quite effective.
So for now they mainly need top talent to make some of it work. And open source is the best bet for creating an ecosystem they can control and getting talent who already trained on their tools. And they lose almost nothing, because yes, they aren't in the cloud business.
So I will continue to not use facebook. But the models I will try.
You ought to think about using Oracle Cloud for your next LLM/GPU project, because they sell access to A100/H100s for cheap and they actually have them in stock!
Commercially it's not clear if there is a reliable "ahead"; I'd be surprised if copyright lawsuits don't start hitting MS/OAI when publishers wake up, and if you take out that training data, where does it leave their models?
Countries putting copyright above AI progress will just fall behind. It's one thing to demand no exact replication of copyrighted content, another to forbid training on copyrighted works. Ideas were not supposed to be under copyright, only expression, from what I remember.
The argument that copyright abuse is required for "AI progress" is sus. It is required for quick easy buck to be made by the likes of Microsoft-- that I agree...
Clearly the research team at Meta knows the domain as well as anybody, has access to a data trove as large as anybody's, and their distribution capability is as large scale as anyone's.
If their choice right now is not to try to overtly monetize these capabilities but instead commoditize and "democratize" what others are offering, it suggests they think that a proprietary monetization route is not available to them. In other words, they are not leaving money on the table; they think that (at least right now) there is no money on the table that they can get to.
Rather than remaining quiet and isolated, the best alternative - their conjectured thinking goes - is to show up as they do, buying up good will with various stakeholders, maintaining mindshare internally and externally etc.
Assuming that the above reading is correct it still leaves various options as to why they may have come to that conclusion: For example reasoning about the future of this sector they might be thinking that there is no real technical moat and they simply accelerate that reality to gain some brownie points.
It may also be idiosyncratic reasons specific to their own business model (data privacy challenges and how any AI monetization will mesh with all that). The drawback of being the elephant in the room is that there is not much room to move.
The nature of their long game depends on which of the decision branches carries more weight. Maybe it is wait-and-see until others clear up the regulatory hurdles. Or keep the engines running until the real and irreducible added value of LLM algos and the like becomes clear.
There really is no technical moat. Any new architectures are going to be published because that's 100% the culture and AI folks won't work somewhere where that's not true. Training details/hyperparameters/model "build-ons" aren't published but those are a very weak moat.
The only moat that is meaningful is data and they've got that more than any other player save maybe google. Publishing models doesn't erode that moat, and it's not going anywhere as long as facebook/whatsapp/instagram rule "interactive" social.
Well Facebook is a walled garden, perhaps the board hopes free highly capable LLMs will continue degrading the internet outside those walls thus acting as a moat for their money printer.
Retention project to keep their top ML/AI staff engaged and not straying away?
Working towards NLU that can solve content moderation once and for all? Contrast with tiktok which is clearly using word filters that are easily worked around with phrases like "un-alived" or "corn".
They want to replace influencers and your friends with chatbots and keep you scrolling through an infinite feed of ads and AI generated content?
There has been some shuffling of seats but from what I am hearing FAIR is the best setup as far as staffing and funding that they have been in quite some time. Mark is pivoting hard to stay competitive in AI and is providing the resourcing to do so, the results speak for themselves.
As a business strategy I would see it as preventing themselves from being hemmed in by the market leaders. By open sourcing and raising the bar for commodity AI, they get to crowdsource improvements to their models and techniques, getting ahead in their own uses by co-opting open source work. I would say this is working amazingly well so far - the amount of interest around open source models from Meta is immense. I also think the majority of uses in the future will be from fine tuned RAG capable models embedded in devices, not pangalactic planet sized computers running septillion parameter models. Llamacpp is a perfect illustration of where that's working.
We followed a similar model under more duress at Netscape. When you use Firefox that’s the fruit of that effort. It didn’t save Netscape, but Meta has a better and more diversified revenue base.
I watched a good talk from Yann LeCun who is Chief AI Scientist at Meta, and he explained that the thinking is that open source AI models will be the long-term winner, so it's best for them to work in that arena.
Yann wants to cement his position as a leader in AI and while he clearly does not appreciate LLMs at all, he realizes that he needs to make waves in this area.
Mark needs a generative product and has invested tremendously in the infrastructure for AI in general (for recommendation). He needs researchers to use that infrastructure to create a generative product(s).
Yann sees this going on, realizes that he has a very powerful (research+recruiting) position and tells Mark that he will only sign on if Meta gives away a good deal of research, and Mark concedes, with the condition that he wants his generative product by end of 2023 or start of 2024.
It’s not just ego. It’s accelerationism. Giving this stuff away for free is probably going to accelerate AI a decade faster than if it was kept locked up behind closed doors at Google, OpenAI, etc. And if you’re an optimist then that actually might make the world a better place much faster.
I wish that Meta would release models like SeamlessM4T[0] under the same license as llama2, or an even better one. I don't understand the rationale for keeping it under a completely non-commercial license, but I agree that is better than not releasing anything at all.
There seem to be opportunities for people to use technology like SeamlessM4T to improve lives, if it were licensed correctly, and I don't see how any commercial offering from smaller companies would compete with anything that Meta does. Last I checked, Meta has never offered any kind of translation or transcription API that third parties can use.
Whisper is licensed more permissively and does a great job with speech to text in some languages, and it can translate to English only. However, it can't translate between a large number of languages, and it doesn't have any kind of text to speech or speech to speech capabilities. SeamlessM4T seems like it would be an all-around upgrade.
Yeah - different projects have different goals and licenses aren't one size fits all. Depending on the project, type of technology, goals, etc.. we will select or even develop the right license that aligns with those goals. Hope this helps :)
Facebook Connect is what used to be called Oculus Connect. Kinda their equivalent of Apple's WWDC, I guess. It's when and where the Quest 3 will be officially unveiled in full, for example.
Probably just talent acquisition. As Google and OpenAI start sharing and publishing less, they become less attractive to scientists. No scientist wants to fall into a black hole and not publish for 8 years.
Exactly. The Google and OpenAI engineers who published their groundbreaking research 5 years ago are now rockstars. Those who create great research but can't share it often get frustrated.
The problem is also companies bragging about AI, but not releasing anything behind it (like most of the recent Google announcements).
If nobody except the researcher can reproduce an AI paper, and there is no source-code, and no demos that the public can access, then it's almost like if it doesn't exist.
I wouldn't want to work in a company that would throw away my research and just use it for PR purposes.
Meta has a clear channel to leverage generative AI in profitable ways in their ads. At some point in the probably not so far future, everybody's going to have custom ads generated for them that are optimized to get that particular person to click/buy/etc. Those will convert well, and the better ads convert, the more businesses will be willing to pay Meta for a given ad.
This compares favorably with Google, which is as likely to cannibalize its search business with generative AI as to create new value for itself.
Thus, for all the gen AI stuff like this, for which Meta doesn't have an obvious path to commercialization, it makes sense to release it publicly. They get plenty of benefits from this - for one, engineers (and smart people generally) who are working on really complex problems like to be able to talk about the work they're doing. If you're picking between jobs at Meta and Google, the fact that Meta's going to release your stuff publicly might well be the deciding factor.
I would also argue that there's an economic incentive. Right now, being seen as an AI company is definitely a positive for your multiple. I think the movement of Meta's stock price over the last 12 months relative to their change in profit and revenue is certainly driven in part by the perception that they're a leader in AI.
It makes sense to me that Facebook is releasing these models similarly to the way that Google releases Android OS. Google's advertising model benefits from as many people being online as possible and their mobile operating system furthers that aim. Similarly, Facebook's advertising model benefits from having loads of content being generated to then be posted in their various products' feeds.
> Does anyone have a good explanation for Meta's strategy with AI?
Yes. I said it many times. Meta is already at the finish line in the AI race to zero. All the other cloud-based AI models cannot increase their prices given that a $0 free AI model is available to be self-hosted or used on-device for private / compliance reasons.
Cloud-based AI models cannot afford to compete with free or close to free. It costs Meta close to nothing to release a readily available $0 AI model which is good enough for most of the use-cases that ChatGPT already covers.
> The only thing I've been able to think is they're trying to commoditize this new category before Microsoft and Google can lock it in, but where to from there? Is it just to block the others from a new revenue source, or do they have a longer game they're playing?
Mostly benefits the PyTorch ecosystem, which Meta has an active community around.
vessenes and rvz kind of sum up, for me, the idea I think they're going for.
AI has no moat, but many players are in denial about this still. Microsoft and others might have tight enough control that they can use a product dumping strategy to get people dependent upon their implementation such that they can start charging, but that isn't a delusion Meta can have.
That max revenue license they used with the models seemed fairly clever to me. It will seed the environment with players that base their product on Meta tech, in return for being born with a poison pill preventing their use by big players (other than Meta) that might buy them. This is a long term play that may not really work but it creates the potential for big opportunities. And even if it doesn't work out, denying easy wins for their powerful competitors might be worth the price on its own.
I posit it is similar to how Adobe lets students pirate Photoshop, because when they join the workforce that is what they know, and they need their employers to buy Adobe services, which for corporate customers is very expensive.
Meta, by democratizing AI access, is generating more capable developers, which will make the Metaverse a reality, where FB leads. They have already realized they have a losing gambit with Google, Apple, Microsoft (also X?) having an antagonistic monopoly against Meta product advancement.
the only beneficiaries of this are the hardware vendors, Nvidia and AMD, and the startups which get these foundation models for free.
because language models are a complementary product, and the complement must be commoditized as a strategy.
I see AMD as a bigger beneficiary, since, very soon, AMD will equal Nvidia for inference and fine-tuning, but AMD has a long way to go to equal it in foundation model training.
> and startups which get these foundation models for free.
It's licensed non-commercially, so I'm not sure what those startups stand to gain.
> since, very soon, amd will equal nvidia for inference and fine-tuning
Source? If you're referring to Olive, it is indeed impressive but also has caveats:
1. It is just as proprietary as CUDA or CoreML.
2. You need a copy of Windows and licensed DirectX to use those optimizations.
3. AMD only matches Nvidia's inferencing performance when comparing Olive to Pytorch. Olive-to-Olive comparisons will still reflect an Nvidia lead.
I don't think AMD has the capability to equal Nvidia in the short term. It will take long-term software investments from across the industry to shake Nvidia's yoke.
Microsoft is a hosting partner, there's an Azure service for hosted private LLaMa inference for business. Being a go-to hosting provider for SoTA AI is of course a very good thing for Microsoft.