replit employee here. the team that built this is very small (fewer than a dozen, including non-eng roles for go-to-market), and went from idea to general availability in 8 weeks
That's very impressive. Hats off to them! I don't think this is too out of the ordinary either, though. I'd guess they started off with an LLM from Hugging Face and set up a pipeline to ingest code from Replit repos to fine-tune it. The ML aspect of this is not terribly hard, given that they probably don't need to train an LLM from scratch. Figuring out how to store and serve code from Replit repos (or publicly available codebases) is not too difficult. From there it's a matter of productionizing: how to serve the model in real time, figuring out what they want the product to look/feel like - and I suppose this part might take a while. I'd estimate you'd need 1-2 ML engineers, 2 data engineers, 2-3 SWEs, and 1 PM for a minimum viable product.
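To make the "ingest code to fine-tune" step concrete, here's a minimal sketch of the kind of corpus-filtering pass such a pipeline might start with. Everything here (the extension list, the size thresholds, the function name) is a made-up illustration, not Replit's actual pipeline:

```python
from pathlib import Path

# Hypothetical filter for building a fine-tuning corpus from repos:
# keep only files with recognizable source extensions, and drop files
# that are too large (likely generated/vendored) or too small to teach
# the model anything.
SOURCE_EXTS = {".py", ".js", ".ts", ".go", ".rb", ".java", ".c", ".cpp"}

def keep_for_training(path: Path, text: str,
                      min_chars: int = 50, max_chars: int = 100_000) -> bool:
    if path.suffix not in SOURCE_EXTS:
        return False
    return min_chars <= len(text) <= max_chars
```

The real work (dedup, license filtering, tokenization, the actual fine-tuning runs) sits on top of something like this, which is part of why the data-engineering headcount in the estimate above is plausible.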
yep, true! however, the devil is in the details. from what i've been told, the big challenge was latency: they worked a lot to bring the latency down to acceptable levels - essentially to be usable in a cloud IDE
iirc the team managed to bring it to a level an order of magnitude lower than off-the-shelf models
8 weeks is impressive for something like that, and it goes to show just how powerful our off-the-shelf tools have become.
I think it's also a bit scary, because 8 weeks is very little time for testing, tuning, and validation of something as opaque as a machine learning model. If it worked right the first time, that's great. But there is still a lot of inherent uncertainty in ML projects. Decision makers need to take that uncertainty into account when planning.
That, or, the 8 weeks only covers the final training runs and the implementation/deployment, and doesn't include time spent developing and tuning proof-of-concept prototype models.
Interesting, sithlord is an anagram for shitlord. While the behavior of the CEO wasn't cool, the issue seems to have been resolved between all involved parties and everyone has moved on - we don't need to bring it up every time repl.it is mentioned.
This is the type of thing where goodwill is burned, and it takes time to earn it back. I don't think we should just brush it under the rug either. In my opinion, you don't just get to "resolve it" and then everyone forgets about it. For me, future decisions and, importantly, actions will help me personally move past this and "move on", as you say.
Ok, sounds good about it taking time - assuming perfect behavior, how long will it be before you stop referencing the affair whenever an unrelated repl.it story comes up?
I feel like, if there's ANYTHING we have learned in the past decade or two it's that people who defend a company tend to be doing so for the wrong reasons. See Sony or Microsoft, or Apple or Android, etc. Defending a company is just weird.
I look at replit as a tool, run by people. The tool might be cool, but the CEO made a bad decision, and now I judge the product on that CEO's actions. There's no definitive time frame or action that just magically makes it better.
But in general, I'll stop thinking about the stupid actions of the CEO when my brain stops reminding me "Oh, no matter how cool this is, the actions of the CEO were incredibly poor." When will that be? No idea, but maybe sometime down the road he does enough good things that I will suddenly stop and think "cool, looking back, he's done enough good that I can probably forget about the poor decision he made and start looking at this again, because he's proven he isn't that one stupid action."
Goodwill is earned, it's not simply given. It's often hard won, but incredibly easy to lose.
I mean, I think it's relevant still. My time to move on from this particular instance might be different than someone else's, and there may be people out there that did not hear about that particular story. Personally, I feel everyone should have the opportunity to make their own decision on what I believe are poor actions made by a company.
We do it every day, whether we realize it or not. As a people we should support the companies that do good, and we should be aware of the companies doing bad. There's room for grey in there, it's not a one size fits all. But if you aren't aware of the bad then you aren't informed in your decision making.
If it bothers you, sorry. But I see it as a pro. I had a really bad experience with Remarkable, and every time it comes up, I point out my experience so that others making their own decision can factor it in when a company performs poorly. I guess by the metric you've provided above, this would be a low-effort comment by your definition.
The CEO's actions are a reflection of the company. I'm not sure I "care more about it" than I am simply aware of their past actions when making decisions on whether to use their product or not.
I'll admit, every time I hear of repl.it mentioned, I think of the time the CEO threatened the intern. The CEO did a huge disservice to himself and the company that day in my mind
Looks like this CEO isn't of good character after all. He looks almost like a jerk when you get to the end of the story. Even in his last email he tried to push his (obviously wrong) point. He never apologized for the things that mattered most; all in all, he only tried to extinguish the social media fire.
Big LOL here! The abstract things are the simplest, yeah! That's why progress in something like math or theoretical physics is made by the dumbest people, in contrast to something like sociology, where you need genius-level intelligence to come up with new ideas. Sure, sure.
But that's of course not everything this dude got completely backwards.
Would explain why replit is the most useless of all the online IDEs: it has no direction, no true value proposition. It's not a good cloud coding environment. It never was a good code snippet playground (actually one of the worst). Now they even require accounts, so the quick-code-snippet aspect is also gone. They're also badly positioned in the education space…
Of course I wish them luck!
But I guess they have no chance against something like Gitpod, GitHub, or OpenShift Codespaces, which are light-years ahead.
OK, maybe the exit-strategy is "just" to be visible enough that at some point they get bought by one of the above. (Which doesn't look like the most ethical thing to do ;-)).
To me (programmer in Sweden) the largest single team I've been on was 14 people and that was _very_ large (indeed the largest in the tech department). We actually broke ourselves up into two more informal groups since we thought that was a more manageable team size.
Neat feature, but yeah, "very small" doesn't seem like < 12 to me either (worked at big tech for a while). A two-pizza team (standard Amazon size) is 8-10; 12 starts to be on the larger side for a single team, but not abnormal. Very small to me would be if a team of 2-4 shipped it. Replit must be much larger than I expected for a startup.
can i get a clarification - when it says "in-browser" i hear "on-device", as in it doesn't call back to replit to get the predictions. i assume that's inaccurate?
for cost/compute purposes i'm wondering how small models have to get in order to run "truly in browser"
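As a rough sanity check on "how small is small enough", the dominant cost is just the raw weight download: parameter count times bytes per parameter. This is back-of-envelope arithmetic only (it ignores runtime memory, activations, and any artifact compression), and the example model sizes are illustrative, not real products:

```python
def model_download_mb(n_params: float, bytes_per_param: float) -> float:
    # raw weight size only; ignores runtime/activation memory and
    # any compression of the downloaded artifact
    return n_params * bytes_per_param / 1e6

# e.g. a 125M-parameter model quantized to int8 would be ~125 MB to
# ship to the browser, while a 1.3B model in fp16 would be ~2.6 GB -
# clearly too heavy for a casual page load.
```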
If this tool was trained on open source code, what license does the generated code have? At least with Codepilot, people were able to generate verbatim GPL code, typos and all. More importantly, I wonder if the companies behind these types of tools offer legal or financial protections in case GPL code sneaks in and leads to expensive lawsuits.
I mean, people are also trained on GPL code, and I bet you can find a ton of functions copied from GPL projects in a million other projects.
But as long as these are tiny parts of codebase (which will most probably be the case), I doubt anything can be done with that. No one will go to court because of a few generic functions.
No, they weren't able to generate the same existing code, both because that code is not included anywhere in the model, and because Copilot (not "Codepilot") has safeguards against this kind of situation, for the highly unlikely case that a snippet is repeated thousands of times across thousands of repositories.
I've gotta let you know that people copy code snippets from all sorts of codebases with little regard for licenses anyway, because they're toothless in 99% of cases, AI or not. It's a nice illusion that anyone respects licenses, but it's just not true.
I've spent hours looking over code before delivering to FAANG. Our company had put a clause into the contract that our code was free of any GPL'd code. It happened before, and it was discovered. The whole thing was a very expensive exercise. I'm aware that many small startups, 90% of which go bust anyway, just ignore licenses, but that doesn't work when you play with the big boys.
If you look at licensed code, then write new code, do you also bring in those licenses?
It's been proved in court that AI does not infringe on copyright or licenses since it generates things from an understanding of the whole, instead of directly stealing, just like the human brain does.
That is going to need a source. All I see in these AI data gathering exercises is that if the industry isn’t a well established litigious one, the companies will happily suck in all the data, license be damned. Code and art both fall under this. But when it comes to music which is heavily litigated, suddenly the only content a company like stable AI will use is open and voluntary because in that case they worry about “overfitting and legal issues”. (Refer Harmon ai)
Hypocrisy dressing up as progress in the machine learning field has been one of the most embarrassing scenes in software engineering recently. The genie may be out of the bottle but the fact is that a bunch of software engineers with a “move fast ethics later” attitude are the ones who let it out and they shouldn’t get to shrug it off for free.
>It's been proved in court that AI does not infringe on copyright or licenses since it generates things from an understanding of the whole, instead of directly stealing, just like the human brain does.
Do you have a source for that?
This SF Conservancy article[0] says that's not true:
>Consider GitHub’s claim that “training ML systems on public data is fair use”. We have not found any case of note — at least in the USA — that truly contemplates that question.
The first major court case I know about is the class-action case Matthew Butterick is trying to build.[1]
Astonishingly enough, every sentence in this post is untrue. There's been no court case on any of the models in question here. They don't work like human brains, nor do they understand anything they output. Even if they did, that output would of course still be subject to licenses, given that human code is subject to them; that is why those licenses exist in the first place.
If you ever plan to steal someone's code and justify it with "my brain is able to learn, therefore copyright doesn't exist" I warn you right now this will not fly.
What do source code licenses say about using source code as training data? IANAL, I would imagine it's only relevant if the model spits out already existing licensed code, and that using the code as training data is largely irrelevant.
For a simpler example than code-generating ML: suppose I write a program to recognize a directory of source code vs. non-source code, and my logic is if (unbalanced parenthesis count in all files > X) { return "not source code"; } else { return "source code"; }.
And then I compute X by scanning over the linux kernel source and counting the amount of unbalanced parens, have I just committed a GPL violation if I don't GPL my source code recognizer?
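The hypothetical recognizer in this example is small enough to write out. This is just a sketch of the commenter's thought experiment (the function names and threshold are invented), which makes the question concrete: is the single constant X, derived by scanning GPL'd source, enough to make this a derived work?

```python
def unbalanced_parens(text: str) -> int:
    # count closing parens with no matching open, plus opens never closed
    depth = unbalanced = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth > 0:
                depth -= 1
            else:
                unbalanced += 1
    return unbalanced + depth

def classify(files: list[str], threshold: int) -> str:
    # threshold plays the role of X, e.g. computed from the Linux kernel tree
    total = sum(unbalanced_parens(f) for f in files)
    return "not source code" if total > threshold else "source code"
```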
We're talking large scale commercial repurposing of source code with worldwide redistribution here. Not some project you whipped up in 5 minutes to learn from, or automate some minor annoying task.
Unlicensed source code - the default - is still protected by copyright law, at least when it's hosted and served from a jurisdiction with no exception for training-data use.
Then there are also licenses that explicitly prohibit commercial usage to consider.
What it comes down to, as it always does, is that a small group of (practically) untouchable people are making money by abusing and thereby irreparably damaging the trust and good will of the collective.
I don't think anyone seriously thinks that is required. The real issue is that these models can reproduce code they've been trained with and then you do need to be aware of the license. That would be fine except as far as I know none of the existing solutions warn you that the code they've produced is the same or very similar to copyrighted code they learnt it from.
That's the main difference from a human learning from copyrighted code (which is totally legal). If they have a good memory they might be able to reproduce copyrighted snippets, but they would usually (probably not always!) know they are doing that.
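A warning of the kind described above doesn't require anything exotic; even the standard library can flag a suspiciously close match. This is a toy sketch (a real system would index the whole training corpus, and the 0.9 threshold is an arbitrary assumption):

```python
import difflib

def similarity(generated: str, known: str) -> float:
    # crude token-level similarity in [0, 1]; 1.0 = identical token stream
    a, b = generated.split(), known.split()
    return difflib.SequenceMatcher(None, a, b).ratio()

def warn_if_close(generated: str, known: str, threshold: float = 0.9) -> bool:
    # True means "this suggestion looks like a near-verbatim copy,
    # surface the license of the matching training file to the user"
    return similarity(generated, known) >= threshold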
It blew my mind half the time. It was like it knew what I was going to do.
The other half it was dumber than the standard autocomplete. It doesn't have any awareness of already-defined variables and doesn't use them to complete half-written variable names. Hope this gets better soon.
Will give it a try. I like Replit's ready-to-use key-value DB, and also its one-click deployment (running it is deploying it).
Overall I like my JetBrains IDE, but Replit is coming out with appealing features, especially for side projects: easy-to-use auth, DB, analytics, deployment...
I wish either Replit would level up its IDE game, or JetBrains (or the community) would build plugins to match that state-of-the-art, joyful programming experience.
One thing I noticed is it seems to have an "Explain code" feature, giving you a textual explanation of a block of code you select, which I'm not aware of GitHub having.
You can just use the prompt "The above code does the following, explained in English" in a comment to get Copilot to explain its code. You could probably also engineer a prompt to translate code between languages.
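The pattern looks roughly like this in practice. The function here is just filler to have something to explain; the trailing comment is the prompt one would place the cursor after, hoping Copilot continues it (no guarantee it does so well):

```python
def slugify(title: str) -> str:
    # lowercase the title and join its words with hyphens
    return "-".join(title.lower().split())

# The above code does the following, explained in English:
# (place the cursor here and let the completion model continue the comment)
```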
- What areas/languages/tasks is it good at and what is it bad at?
- How often is it generating code with bugs?
- How often is the code that gets generated used as is vs immediately edited?
- What are the new frustrations that this causes that existing IDE code completion doesn't?
I work with trained models daily and I know that their failure cases are unintuitive, unexpected, and exasperating. I'd like to know as much about the failure cases of this model as possible before diving in.
- As for tasks, it's great for reducing the amount of boilerplate code you need to write (Complete Code), writing React components (Complete + Generate Code), explaining code in plain English (Explain Code), translating code between languages (Transform Code), and writing exhaustive tests (Complete Code)
- No stats on how often it's generating code with bugs, or how often generated code is used as-is vs. immediately edited; we're interested in getting both to help improve Ghostwriter
- Like any LLM, it can get stuck in a long-tail of repetitive loops; we're working pretty hard to improve and mitigate these issues, but, especially for new users, the repetition and hallucination type problems can be distracting.
Who's the real target audience of these kinds of tools?
- Developers who work at a company (e.g., as an employee) and need to spit out features every sprint? Velocity is important, so I imagine these developers need to squeeze every minute they are in front of the screen in order to produce working code?
- Developers who think of written code as one way to solve (tech) problems, so they don't really care much about the process of creating code, but mainly about the output (i.e., does the running program solve the issue at hand?)
- Senior developers who don't like to write boilerplate code?
I don't see myself as the target audience of Copilot or Ghostwriter. I do work as an employee, but I'm not a "feature machine". Usually the hardest part about my job is solving problems while communicating with other people. I don't need to write code "fast", and by the time I hit the keyboard to start coding, I don't really need that much help (granted, I'm not working on code that goes into space rockets... just normal e-commerce stuff)
I like to work on side projects and learn new technologies. When I was starting with programming, as part of the learning I liked to write boilerplate code (actually, that's how I learnt programming. I remember writing C boilerplate code by reading "The C Programming Language". Skipping the "boring" parts wouldn't have helped me in my learning).
If anything, Copilot and similar tools take away all the joy of actually writing code (because, when I work on side projects, 50% of the satisfaction comes from writing code for the sake of writing code; the other 50% comes from the ability to solve a problem). So, yeah, maybe for people like me, who find the act of writing code enjoyable for its own sake (you know, like painting or taking photographs), Copilot seems like an unneeded tool?
> If anything, Copilot and similar tools take away all the joy of actually writing code
I like to challenge my own beliefs, and I reluctantly tried it out. At least I would have a basis for my criticism, so I thought. I'm about 10 hours into using it, maybe.
If anything, it has increased the joy of writing code for me. It eliminates the mundane busywork and lets me focus on solving problems. For me, the "real" coding happens in my head; putting it into an editor is just process. I still have to check its work whenever I use it, so I'm still deeply embedded in the coding process.
I believe it's akin to an easel vs Krita/Photoshop. Some people enjoy interacting with the physical medium, others enjoy the creative process.
I would strongly recommend trying it out in anger (i.e. a reasonably real codebase), at least for 30 minutes. Form a better opinion after that (which may well be the same as yours right now).
For reference: I've been coding since I was 8 (almost 30 years).
For me, not a senior dev, Copilot is useful for: discovering API, generating parts of docstrings, and generating bits of code that don't require too much thinking but go beyond simple copy and paste. It is really quite useful and helps to keep my RSI in check.
My primary UX issue with copilot is that it is trying too hard to be helpful, often suggesting code that I don't need. You also can't trust it with more complex cases but that's actually pretty reassuring : - )
I'd like to give some context. I'm still not on board with these tools, but we have a lot of chore-like work: adding very similar things with some changes, complicated or otherwise.
So we have been considering using Codex or something similar for generating the code in a more streamlined way. The key reason it would be a benefit is that we are a small team, with each person owning more than one large repository. It's gotten very annoying, and our pace is far slower than we would like; here, something like this makes quite a lot of sense.
Though the problem with such specific tools is that they can't generate code customized for our codebase. We can fine-tune other codegen models, and that's what we plan to do down the line, but this specific tool just isn't very useful if it can't be specialized for our codebase.
So, does your team spend a considerable amount of time writing boilerplate/chore code? Isn't that a sign of "Hey, we actually need to improve our codebase, guys!"? I don't know; if your solution to "I don't want to write chore code" is "let's use Copilot to do the boring stuff"... well, I have bad news for you: "chore code" needs to be maintained and/or fixed, and I don't think Copilot maintains code (for now... :D)
yeah, we work on linters, tooling, and the like - highly specific, single-page, highly similar code.
you can't get around it. We have pretty low boilerplate in all the codebases I happen to manage, but the sad part is there's no getting around porting specific rules, setting up better metric analysis and reporting systems, and such.
If you have been involved in programming professionally for a while, you'll know you just can't get around chore-like work sometimes. Of course it's not a long-term goal to keep going this way, but we needed a solution to simplify our challenges as we move on.
I admit I haven't used it (I have used IDE autocomplete features and don't like them)
For me, same as with writing, thinking about what I want to do is everything, and doing it is nearly trivial. I don't imagine that having Copilot write nontrivial code for me and then reviewing it would be different from writing it myself, even if I didn't know the exact syntax and had to look it up. So I agree, it feels like a solution to a problem I don't have, like it solves something that I don't spend time on.
Cynically, just as GPT-3 probably helps write content-farm stuff, Copilot probably helps write junk code for something, but there are probably domain-specific low-code tools that do that better as well.
I love writing code, but I don't love searching the docs for the sixtieth time to find the correct combination of brackets, .groupby's, and .agg calls that gets the baroque horrorshow that is Python's pandas lib to wrangle some data for me.
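For anyone who hasn't fought this particular battle, the combination in question looks something like the following. The data and column names here are made up for illustration; the point is the named-aggregation syntax that's easy to forget and hard to guess:

```python
import pandas as pd

df = pd.DataFrame({
    "city":  ["NYC", "NYC", "LA", "LA"],
    "sales": [10, 20, 5, 15],
})

# one groupby, several aggregations at once via named aggregation:
# output column name = (input column, aggregation function)
summary = df.groupby("city").agg(
    total=("sales", "sum"),
    average=("sales", "mean"),
)
```

It's exactly this kind of look-it-up-every-time incantation that an autocomplete model tends to fill in well.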
See it as a better autocomplete for people who don't want to or can't learn by diligently doing the boring parts.
Is this in-browser as in running the model in the browser or is the model running on the server? (I assume it's on the server for size and people-not-ripping-off-your-product reasons, but actually running in the browser would be cool and it doesn't look like it's specified.)
Ghostwriter also includes 2 features that Copilot does not have: Transform Code (translate between langs/refactor code), Generate Code (prompt Ghostwriter to write full programs in one shot)
Plus, Ghostwriter is integrated with the Replit platform, meaning you get all the benefits Replit offers as a portable development environment you can take with you anywhere you go and host instantly.
One could argue Copilot can translate/refactor code.
I can't count the number of times I've copy pasted a chunk, commented it out, then put a comment describing what I'm after. Copilot will get it exactly right ~40% of the time, ~40% of the time it gives me a good starting place, and the other 20% I just scrap.
Is a VS Code extension on the roadmap? Refactoring existing code using AI looks extremely useful. Using GitHub Copilot, I have to trigger synthesis multiple times.
Does anybody know the status of the legal action against that kid that supposedly stole all their good ideas and made a hobby project out of it? I would like to know how that resolved before I consider anything from this company again.