It’s just a basic IntelliJ plugin that provides an infinite canvas for code bookmarks. I work on a large code base and often take on tasks involving lots of unfamiliar areas of code and components that influence each other only through long chains of indirection. Having a visual space to lay things out, draw connections, and quickly jump back into the code has been really helpful.
The canvas and UI are built with Java AWT, since that’s what IntelliJ plugins are built on, but it occurred to me that I could just throw in a web view and use any of the existing JS libraries for working on an infinite canvas. React Flow seems like the best option, with tldraw as what I’d fall back to.
But then… if the canvas is built with web technology, there’s no reason to keep it inside an IntelliJ plugin rather than making it a standalone web app that can contain generic content and open files in IntelliJ or any other editor. I’m pretty sure the “knowledge database on a canvas” thing has been done a number of times already, so I also want to see if there are existing open source projects it’d be easy enough to just add a special node type to.
See also this recent post about Mercury-Coder from Inception Labs. There's a "diffusion effect" toggle for their chat interface, but I have no idea whether that's an accurate representation of the model's diffusion process or just some randomly generated characters showing what the diffusion process looks like.
Just about every line of code for this was written by Claude 3.7 via Claude Code. I never gave the other AI development tools like Cursor and Aider a fair shake, so I may have just been behind the times on what's possible with agentic editors, but I found Claude Code to be extremely impressive. The API costs did end up being around $15, though, and this is a really small project, so I imagine the cost goes up quite a bit for any non-trivial work on a project of decent size.
This was great for me, though, because it's something I've had in mind for a while but never considered quite useful enough to be worth the time it'd take to learn IntelliJ plugin development plus the Java AWT knowledge required to create a canvas app like this.
lol yeah I tried to get it to whisper too. And talk faster or slower or do accents. It seemed to be able to kind of do each of those things but only very slightly. Enough to see that there was some successful interpretation of the request but lack of flexibility to fully execute on it. OpenAI's model still has this beat on that front imo (talking quietly / slower / faster)
Yeah after a few interactions, the repetition of the mannerisms that initially added to the sense of life-likeness started to break the illusion a bit. The "you got me" response shows up a bit too often. The creativity remains impressive though
This was already posted here: https://news.ycombinator.com/item?id=43221377 but I’m really surprised at the lack of attention this model is getting. The responsiveness and apparent personality are pretty mind blowing. It’s similar to what OpenAI had initially demoed for advanced voice mode, at least for the voice conversation portion.
The demo interactions are recorded, which is mentioned in their disclaimer under the demo UI. What isn't mentioned though is that they include past conversations in the context for the model on future interactions. It was pretty surprising to be greeted with something like "welcome back" and the model being able to reference what was said in previous interactions. The full disclaimer on the page for the demo is:
"1. Microphone permission is required. 2. Calls are recorded for quality review but not used for ML training and are deleted within 30 days. 3. By using this demo, you are agreeing to our"
It was genuinely startling how human it felt. Apparently they are planning on open-sourcing some of their work as well as selling glasses (presumably with the voice assistant). I’m very excited to have a voice assistant like this and am almost a bit worried I will start feeling emotionally attached to a voice assistant with this level of human-like sound.
I still feel like they don't have the right amount of human to them, maybe it's because I'm Australian and it sounds like I'm hearing an American robot?
Edit: well I asked the "male" model to speak more like an Australian and yep, getting way more uncanny. If it had an Australian accent I think it would mess with me more
Maybe the ability to personalize the voice so it is more... robotic or based on a fictional thing like Knight Rider would help to change the attachment to something more... healthy?
I'm almost positive that some AI systems have a backend that analyzes the sentiment of your messages and if you threaten to cancel billing it will notice your defcon-1 sentiment and spin up some more powerful instances behind the scenes to tide you over.
This is actually much more stressful than working without any AI as I have to decompress from constantly verbally obliterating a robotic intern.
I'll try with the system prompt. Also love your username.
It generally maintains the tone you set. Remember that it outputs the most likely tokens based on its owners' system prompt + your system prompt + the whole conversation. If OpenAI's default system prompt tells it that it's a helpful, cheerful secretary/assistant, you get the best results if you talk to it "professionally".
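A minimal sketch of that mechanism (the function and names here are illustrative, not any particular vendor's API): on every turn, the model receives one flat list of messages, system prompt first, then the whole conversation so far, so the tone you keep sending is the tone you get back.

```python
# Sketch: the model only ever sees one flat message list, so tone is
# set by whatever system prompt and history you resend each turn.
def build_payload(system_prompt, history, user_message):
    """Assemble the message list sent to a chat model on each turn."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)  # full prior conversation, every time
    messages.append({"role": "user", "content": user_message})
    return messages

payload = build_payload(
    "You are a terse, professional assistant.",
    [{"role": "user", "content": "hi"},
     {"role": "assistant", "content": "Hello. How can I help?"}],
    "Summarize this log file.",
)
```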
I heard you could make Claude say "kurwa" a lot while helping you program in Go if you convince it that you want a conversation with your buddy Seba from your backyard, with whom you like to share kebab and browar, so there you go.
I'm surprised by the lack of attention that Gemini 2.0 with native audio output got. They have a demo at https://youtu.be/qE673AY-WEI, which I think is really good too. The main problem with Google's model is that this audio output is not supported by the API, but you can try it at https://aistudio.google.com.
In general, text to speech is pretty good nowadays I think. For example, this is a little math video that I made a few days ago: https://www.youtube.com/watch?v=G1mvLrCfjFM with the (old) Google text to speech API. Honestly, I think the narration is better than I personally could have done. It's calm, well pronounced, and sounds relatively enthusiastic.
I see, I guess it was never a standalone product then; from reading a Reddit post, it's a feature built into the Assistant. Thanks, that solves a mystery for me.
I know people who worked on it. It was real. They used real people for some calls, in some cases, but the vast majority of calls made through the system were 100% automatic.
It really is an astonishing technological feat! Also note that the largest model they trained is only 8.3B parameters (8B backbone + 0.3B decoder). It's exciting to think that they're going to be releasing this model under an Apache 2.0 license.
Just realizing how uncanny valley it is to talk to AI and it never remembers anything you said in the past. Imagine if a human did that. It’s like you are talking to Tom Hanks’ Mr. Short Term Memory from SNL over and over.
It does remember, but you have to ask for it. Try saying "make a bookmark at this point" and later ask for that bookmark. You can even give the bookmark a name or ask it to do so for you.
For individual transactions, it's not really reliable, unfortunately. But for monthly reporting, they do have it, so that could be the next step. There's an app here that does something similar, but it doesn’t seem to be actively developed anymore. It’s a free app, so I guess there’s no reason for them to keep investing in it. Fair enough. Looks like they’re shifting toward a B2B solution instead, so that might be my next direction too.
That said, my main goal for now is just to make it work for personal B2C use first. I do think there’s some potential here because major cities are pretty much cashless now, and there aren’t any good existing solutions for B2C.
There are some other decent options, but they mainly focus on B2B (that’s where the money is), so they’re quite expensive and overkill for what I need.
A cross-platform clipboard manager / search-and-filter tool / launcher built with Flutter that has a simple Python plugin interface.
Plugins can be used to add new "result actions" and new sources of entries to filter and select, e.g. recent Jira tickets, email inbox, shell history, Notion pages, etc.
The result actions are a way to easily perform common transformations on selected entries (e.g. wrap in triple backticks, find and parse JSON, trim whitespace, ...) or kick off some script with a selected entry as an argument.
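A sketch of what those transformation-style result actions could look like as plain Python functions (the function names are illustrative, not the plugin interface itself):

```python
import json

def wrap_in_backticks(text: str) -> str:
    """Wrap the selected entry in a fenced code block."""
    return f"```\n{text}\n```"

def find_and_parse_json(text: str) -> str:
    """Find the first {...} span in the text and pretty-print it;
    return the text unchanged if nothing parseable is found."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return text
    try:
        return json.dumps(json.loads(text[start:end + 1]), indent=2)
    except json.JSONDecodeError:
        return text

def trim_whitespace(text: str) -> str:
    """Strip leading/trailing whitespace from the selected entry."""
    return text.strip()
```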
The project started as a result of having to do a lot of work on Ubuntu and sorely missing Alfred and all the workflows I'd built with it. I wanted something for which I could build workflows once and have those workflows available on whatever system I'm on, plus the ability to build some plugins that would be usable by coworkers regardless of their operating system, with minimal runtime resource usage. There are some existing cross-platform solutions that could serve this purpose, like Cerebro, Ueli, Script Kit, and others, but I wanted something more lightweight than is possible with an Electron app. Granted, the current state of Epte is that it's built with Flutter + Go + Python, so the final distributable and runtime memory usage are higher than ideal.
Basic Windows support is almost there, but there doesn't seem to be a great solution for switching to existing windows of an application instead of just re-launching it. The tool isn't intended to be as good as or better than any given OS's built-in launcher, so I'll probably just leave that as-is and upload the current state of the Windows build.
The best approach I’ve found so far is to just have a single master “event log” where I dump everything that I want to save by default. I have specific places to put things, but if I can’t be bothered to decide where, or am not sure, it just goes to the event log. I’m using Notion for this, where each entry is its own page in a “database” list. Adding a new page is trivial through the site or app, and I have an iOS shortcut set up too to open entry creation.
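Notion specifics aside, the pattern here is just an append-only log with a timestamp per entry. A local stand-in for the same idea (the file name and field names are my own, not anything from the setup above) might look like:

```python
import json
import time
from pathlib import Path

LOG = Path("event_log.jsonl")  # stand-in for the Notion "database"

def log_entry(text: str, log: Path = LOG) -> dict:
    """Append a timestamped entry; deciding where it really belongs
    can happen later (or never)."""
    entry = {"ts": time.time(), "text": text}
    with log.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

log_entry("idea: canvas node type for code bookmarks")
```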