It’s just a basic IntelliJ plugin that provides an infinite canvas for code bookmarks. I work on a large code base and often take on tasks involving lots of unfamiliar areas of code and components that influence each other only through long chains of indirection. Having a visual space to lay things out, draw connections, and quickly jump back into the code has been really helpful.
The canvas and UI are built with Java AWT, since that’s what IntelliJ plugins are built on, but it occurred to me that I could just throw in a web view and use any of the existing JS libraries for working on an infinite canvas. React Flow seems like the best option, with tldraw as what I’d fall back to.
But then… if the canvas is built with web technology, there’s no reason to keep it inside an IntelliJ plugin rather than making it a standalone web app that can contain generic content and open files in IntelliJ or any other editor. I’m pretty sure the “knowledge database on a canvas” thing has been done a number of times already, so I also want to see if there are existing open source projects it’d be easy enough to just add a special node type to.
See also this recent post about Mercury-Coder from Inception Labs. There's a "diffusion effect" toggle for their chat interface, but I have no idea whether that's an accurate representation of the model's diffusion process or just some randomly generated characters showing what the diffusion process looks like.
Just about every line of code for this was written by Claude 3.7 via Claude Code. I never gave the other AI development tools like Cursor and Aider a fair shake, so I may have just been behind the times on what's possible with agentic editors, but I found Claude Code to be extremely impressive. The API costs did end up being around $15, though, and this is a really small project, so I imagine the cost goes up quite a bit for any non-trivial work on a project of decent size.
This was great for me, though, because it's something I've had in mind for a while but never considered quite useful enough to be worth the time it'd take to learn IntelliJ plugin development plus the Java AWT knowledge required to create a canvas app like this.
lol yeah I tried to get it to whisper too. And talk faster or slower or do accents. It seemed to be able to kind of do each of those things but only very slightly. Enough to see that there was some successful interpretation of the request but lack of flexibility to fully execute on it. OpenAI's model still has this beat on that front imo (talking quietly / slower / faster)
Yeah after a few interactions, the repetition of the mannerisms that initially added to the sense of life-likeness started to break the illusion a bit. The "you got me" response shows up a bit too often. The creativity remains impressive though
This was already posted here: https://news.ycombinator.com/item?id=43221377 but I’m really surprised at the lack of attention this model is getting. The responsiveness and apparent personality are pretty mind blowing. It’s similar to what OpenAI had initially demoed for advanced voice mode, at least for the voice conversation portion.
The demo interactions are recorded, which is mentioned in their disclaimer under the demo UI. What isn't mentioned though is that they include past conversations in the context for the model on future interactions. It was pretty surprising to be greeted with something like "welcome back" and the model being able to reference what was said in previous interactions. The full disclaimer on the page for the demo is:
"1. Microphone permission is required. 2. Calls are recorded for quality review but not used for ML training and are deleted within 30 days. 3. By using this demo, you are agreeing to our"
It was genuinely startling how human it felt. Apparently they are planning on open-sourcing some of their work as well as selling glasses (presumably with the voice assistant). I’m very excited to have a voice assistant like this and am almost a bit worried I will start feeling emotionally attached to a voice assistant with this level of human-like sound.
I still feel like they don't have the right amount of human to them, maybe it's because I'm Australian and it sounds like I'm hearing an American robot?
Edit: well I asked the "male" model to speak more like an Australian and yep, getting way more uncanny. If it had an Australian accent I think it would mess with me more
Maybe the ability to personalize the voice so it is more... robotic or based on a fictional thing like Knight Rider would help to change the attachment to something more... healthy?
I'm almost positive that some AI systems have a backend that analyzes the sentiment of your messages and if you threaten to cancel billing it will notice your defcon-1 sentiment and spin up some more powerful instances behind the scenes to tide you over.
This is actually much more stressful than working without any AI as I have to decompress from constantly verbally obliterating a robotic intern.
I'll try with the system prompt. Also love your username.
It generally maintains the tone you set. Remember that it outputs the most likely tokens based on its owners' system prompt + your system prompt + the whole conversation. If OpenAI's default system prompt tells it that it's a helpful, cheerful secretary/assistant, you get the best results if you talk to it "professionally".
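A minimal sketch of that mechanism (the function and names here are illustrative, not any particular vendor's API): on every turn, the model receives one flat list of messages, system prompt first, then the whole conversation so far, so the tone you keep sending is the tone you get back.

```python
# Sketch: the model only ever sees one flat message list, so tone is
# set by whatever system prompt and history you resend each turn.
def build_payload(system_prompt, history, user_message):
    """Assemble the message list sent to a chat model on each turn."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)  # full prior conversation, every time
    messages.append({"role": "user", "content": user_message})
    return messages

payload = build_payload(
    "You are a terse, professional assistant.",
    [{"role": "user", "content": "hi"},
     {"role": "assistant", "content": "Hello. How can I help?"}],
    "Summarize this log file.",
)
```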
I heard you could make Claude say "kurwa" a lot while helping you program in Go if you convince it that you want a conversation with your buddy Seba from your backyard, with whom you like to share kebab and browar, so there you go.
I'm surprised by the lack of attention that Gemini 2.0 with native audio output got. They have a demo at https://youtu.be/qE673AY-WEI, which I think is really good too. The main problem with Google's model is that this audio output is not supported by the API, but you can try it at https://aistudio.google.com.
In general, text to speech is pretty good nowadays I think. For example, this is a little math video that I made a few days ago: https://www.youtube.com/watch?v=G1mvLrCfjFM with the (old) Google text to speech API. Honestly, I think the narration is better than I personally could have done. It's calm, well pronounced, and sounds relatively enthusiastic.
I see, I guess it was never a standalone product then; from reading a Reddit post, it's a feature built into the Assistant. Thanks, that solves a mystery for me.
I know people who worked on it. It was real. They used real people for some calls, in some cases, but the vast majority of calls made through the system were 100% automatic.
It really is an astonishing technological feat! Also note that the largest model they trained is only 8.3B parameters (8B backbone + 0.3B decoder). It's exciting to think that they're going to be releasing this model under an Apache 2.0 license.
Just realizing how uncanny valley it is to talk to AI and it never remembers anything you said in the past. Imagine if a human did that. It’s like you are talking to Tom Hanks’ Mr. Short Term Memory from SNL over and over.
It does remember, but you have to ask for it. Try saying "make a bookmark at this point" and later ask for that bookmark. You can even give the bookmark a name or ask it to do so for you.
For individual transactions, it's not really reliable, unfortunately. But for monthly reporting, they do have it, so that could be the next step. There's an app here that does something similar, but it doesn’t seem to be actively developed anymore. It’s a free app, so I guess there’s no reason for them to keep investing in it. Fair enough. Looks like they’re shifting toward a B2B solution instead, so that might be my next direction too.
That said, my main goal for now is just to make it work for personal B2C use first. I do think there’s some potential here because major cities are pretty much cashless now, and there aren’t any good existing solutions for B2C.
There are some other decent options, but they mainly focus on B2B (that’s where the money is), so they’re quite expensive and overkill for what I need.
A cross-platform clipboard manager / search-and-filter tool / launcher built with Flutter that has a simple Python plugin interface.
Plugins can be used to add new "result actions" and new sources of entries to filter and select, e.g. recent Jira tickets, email inbox, shell history, Notion pages, etc.
The result actions are a way to easily perform common transformations on selected entries (e.g. wrap in triple backticks, find and parse JSON, trim whitespace, ...) or kick off some script with a selected entry as an argument.
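A sketch of what those transformation-style result actions could look like as plain Python functions (the function names are illustrative, not the plugin interface itself):

```python
import json

def wrap_in_backticks(text: str) -> str:
    """Wrap the selected entry in a fenced code block."""
    return f"```\n{text}\n```"

def find_and_parse_json(text: str) -> str:
    """Find the first {...} span in the text and pretty-print it;
    return the text unchanged if nothing parseable is found."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return text
    try:
        return json.dumps(json.loads(text[start:end + 1]), indent=2)
    except json.JSONDecodeError:
        return text

def trim_whitespace(text: str) -> str:
    """Strip leading/trailing whitespace from the selected entry."""
    return text.strip()
```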
The project started as a result of having to do a lot of work on Ubuntu and sorely missing Alfred and all the workflows I'd built with it. I wanted something for which I could build workflows once and have those workflows available on whatever system I'm on, plus the ability to build some plugins that would be usable by coworkers regardless of their operating system, with minimal runtime resource usage. There are some existing cross-platform solutions that could serve this purpose, like Cerebro, Ueli, Script Kit, and others, but I wanted something more lightweight than is possible with an Electron app. Granted, the current state of Epte is that it's built with Flutter + Go + Python, so the final distributable and runtime memory usage are higher than ideal.
Basic Windows support is almost there, but there doesn't seem to be a great solution for switching to existing windows of an application instead of just re-launching it. The tool isn't intended to be as good as or better than any given OS's built-in launcher, so I'll probably just leave that as-is and upload the current state of the Windows build.
The best approach I’ve found so far is to just have a single master “event log” where I dump everything that I want to save by default. I have specific places to put things, but if I can’t be bothered to decide where, or am not sure, it just goes to the event log. I’m using Notion for this, where each entry is its own page in a “database” list. Adding a new page is trivial through the site or app, and I have an iOS shortcut set up too to open entry creation.
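Notion specifics aside, the pattern here is just an append-only log with a timestamp per entry. A local stand-in for the same idea (the file name and field names are my own, not anything from the setup above) might look like:

```python
import json
import time
from pathlib import Path

LOG = Path("event_log.jsonl")  # stand-in for the Notion "database"

def log_entry(text: str, log: Path = LOG) -> dict:
    """Append a timestamped entry; deciding where it really belongs
    can happen later (or never)."""
    entry = {"ts": time.time(), "text": text}
    with log.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

log_entry("idea: canvas node type for code bookmarks")
```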