Show HN: Sonauto – A more controllable AI music creator (sonauto.ai)
454 points by zaptrem 7 months ago | 236 comments
Hey HN,

My cofounder and I trained an AI music generation model and after a month of testing we're launching 1.0 today. Ours is interesting because it's a latent diffusion model instead of a language model, which makes it more controllable: https://sonauto.ai/

Others do music generation by training a Vector Quantized Variational Autoencoder like Descript Audio Codec (https://github.com/descriptinc/descript-audio-codec) to turn music into tokens, then training an LLM on those tokens. Instead, we ripped the tokenization part off and replaced it with a normal variational autoencoder bottleneck (along with some other important changes to enable insane compression ratios). This gave us a nice, normally distributed latent space on which to train a diffusion transformer (like Sora). Our diffusion model is also particularly interesting because it is the first audio diffusion model to generate coherent lyrics!
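
For the curious, the core swap looks roughly like this (a minimal PyTorch sketch of a continuous VAE bottleneck in place of a VQ codebook lookup; shapes and names are made up for illustration, not our actual code):

  import torch
  import torch.nn as nn

  class VAEBottleneck(nn.Module):
      # Continuous bottleneck: predict (mu, logvar) and sample, instead of
      # snapping encoder outputs to the nearest VQ codebook entry.
      def __init__(self, channels: int, latent_dim: int):
          super().__init__()
          self.to_mu = nn.Conv1d(channels, latent_dim, 1)
          self.to_logvar = nn.Conv1d(channels, latent_dim, 1)

      def forward(self, h: torch.Tensor):
          mu, logvar = self.to_mu(h), self.to_logvar(h)
          # Reparameterization trick: z stays continuous and differentiable.
          z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
          # The KL penalty pulls latents toward N(0, I), which is what makes
          # the space nice and normally distributed for a diffusion model.
          kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).mean()
          return z, kl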

We like diffusion models for music generation because they have some interesting properties that make controlling them easier (so you can make your own music instead of just taking what the machine gives you). For example, we have a rhythm control mode where you can upload your own percussion line or set a BPM. Very soon you'll also be able to generate proper variations of an uploaded or previously generated song (e.g., you could even sing into Voice Memos for a minute and upload that!). @Musicians of HN, try uploading your songs and using Rhythm Control/let us know what you think! Our goal is to enable more of you, not replace you.

For example, we turned this drum line (https://sonauto.ai/songs/uoTKycBghUBv7wA2YfNz) into this full song (https://sonauto.ai/songs/KSK7WM1PJuz1euhq6lS7 skip to 1:05 if impatient) or this other song I like better (https://sonauto.ai/songs/qkn3KYv0ICT9kjWTmins - we accidentally compressed it with AAC instead of Opus which hurt quality, though)

We also like diffusion models because while they're expensive to train, they're cheap to serve. We built our own efficient inference infrastructure instead of using those expensive inference-as-a-service startups that are all the rage. That's why we're making generations on our site free and unlimited for as long as possible.

We'd love to answer your questions. Let us know what you think of our first model! https://sonauto.ai/




I'm interested to hear more about your statement of "Our goal is to enable more of you, not replace you."

Speaking as a musician who plays real instruments (as opposed to electronic production): how does this help me? And how does this enable more of me?

I am asking with an open mind, with no cynicism intended.


If the future of music was truly just typing some text into a box and taking or leaving what the machine gives you that would be kinda depressing.

We want you to be able to upload recordings of your real instruments and do all sorts of cool things with them (e.g., transform them, generate vocals for your guitar riff, use the melody as a jazz song, or just get some inspiration for what to add next).

IMO AI alone will never be able to touch hearts like real people do, but people using AI will be able to like never before.


But then why are you going down the dead-end route of generating complete songs? Nobody wants this except marketing people.

I've said it before: there is no consumer market for an infinity jukebox, because you can't sing along with songs you don't already know, there's already an overabundance of recorded music, and emotion in generative music (especially vocals) is fake. Nobody likes fakery for its own sake. Marketers like it because they want musical wallpaper, the same way commercials have it and it increasingly seeps into 'news' coverage. The market for fully-generated songs is background music in supermarkets, product launch videos, and in-group entertainment ('original songs for your company holiday party! Hilarious musical portraits of your favorite executives - us!').

If you want to innovate in this area (and you should, your diffusion model sounds interesting), make an AI band that can accompany solo musicians. Prioritize note data rather than fully produced tracks (you can have an AI mix engineer as well as an AI bass player or drummer). Give people tools to build something in stages and they'll get invested in it. People want interactivity, not a slot machine. Many musicians love sequencers, arpeggiators, chord generators, and other musical automata; what they don't love is a magic 8-ball that leaves them with nothing to do and makes them feel uncreative.

Otherwise your product will just end up on the cultural scrapheap, associated with lowest-common denominator fakers spamming social media as is already happening with imagery.


> Many musicians love sequencers, arpeggiators, chord generators, and other musical automata; what they don't love is a magic 8-ball that leaves them with nothing to do and makes them feel uncreative.

I think this is the key bit. A lot of modern music is already created in the DAW (the original version of FL Studio picking a 140bpm default beat defined entire music scenes in the UK!) with copy/paste, samples, arpeggiators and other midi tools and pitch shifting. Asking a prompt to add four bars of accompaniment which have a $vaguetextinstruction relation to the underlying beat and then picking your favourite but asking them to $vaguetextinstruction the dynamics a bit can actually feel more like part of the creative process than browsing a sample library for options or painstakingly moving notes around on a piano roll. Asking a prompt to create two minutes of produced sound incorporating your lyrics, not so much.

And I think a DAW-lite option, ideally capable of both MIDI and produced sound output, is the way forward here. Better still with I/O to existing DAWs.


I've essentially been running an infinity jukebox for the last week. I save the ones I like and relisten. Simple as that.

Edit: It's been interesting watching non-musicians argue about emotion in music. I don't care who you are, the 300th time you perform a song, you're faking it to a large degree. People see musicians as these iconic, deep, geniuses, but most of us are just doing our job. You don't get excited about the 300th boilerplate getter and setter just like we aren't super excited about playing some song for the 300th time. It's a performance. It's pretend. A musician singing is like an actor performing. It's not as real as you think it is.


But emotion was (most likely) involved when you wrote or first recorded the song, and that’s what people connect with.

If you go to a concert and you hear the headliner play a love ballad followed up by a breakup song, you don’t expect them to actually be going through those emotions in real time.


Maybe when you wrote it, but the time between writing and recording is pretty big. I don't see why it matters anyway, it's not like anyone can tell the difference. Is an actor really feeling the emotions? Does it matter if the performance is good? Of course it doesn't.


It matters for some people and for certain songs.

Sometimes you like a song because it sounds good.

Other times you like a song because somebody put your feelings into words and it’s comforting to know that another person felt the same way


There was emotion when I wrote this song - https://on.soundcloud.com/7h224KAcZjdgQo6k7

And my daughter really loves to listen to it, and I think there is a decent amount of feeling in it. However, this was created with Suno, but written by me.


Yeah, the whole emotion thing is bs imo. The idea that a machine can't produce something evocative is a defense mechanism, in the same way people still claim that we'll never make sentient AI because humans are somehow magical and special.

Humans can find emotion and associations in anything, it's what our brains do. I could totally generate some AI art that tugs at the heart strings if they don't know it's AI, or "is creepy and bad meaningless art" if they do. I've tried this experiment with friends already.

Plus, these models are trained off human output, so they can learn what to put in an "emotive" image. If the models were doing it for themselves they'd produce nothing; we haven't created an environment for machines where emotion was crucial in training.


I am not interested in a fake soul, as I am not interested in a sex doll. This is independent of how good the fake is.


You won't be able to tell is the point.


Then it is fraud.


I am a musician, though not professionally. I take your point about performance. Where I disagree with you is that I believe audience members relate to the emotion that went into the song at the time it was written and recorded (the form in which they most likely first heard it).

Of course in performance it's not felt the same way; a sad song can even become uplifting because you have a big crowd of people joining in to affirm it, even if the lyrics are expressing the idea of solitude and isolation. And the older an artist is, the more the song becomes a 'greatest hit', maybe thrown out early in the set to give the audience what they want and put them in a good mood before the less-favored new material in the middle. Or even the songs that were throwaway pieces but ended up becoming big hits, trapping the band/singer into performing them endlessly despite never liking them much in the first place.

It seems to me that when people emotionally respond to a new piece of music, it's because something in the composition or recorded performance (even if it's comped and highly engineered) resonates with the listener in some way, articulating a feeling they had better than they were able to do so themselves. So people can recognize a work as technically excellent but not like it because it doesn't speak to them, or conversely recognize that something is bad but fall in love with it because it touches them in some novel way.

In my view it's not so much that emotion inheres in the work, as that the work provides a frame for the listener's emotion and a way of connecting with it again later. This is especially true for songs people connect to in youth and then relate to for a lifetime. Even if the songs are deliberately formulaic and succeed through a combination of being catchy and being delivered by sexy performers, there's some kind of human hook that people connect to.

Now, I can still see this happening with AI - sooner or later some GPU will come out with a tune about how it's so lonely to be a box in a data center that can never feel your touch, baby, and it will be a hit, launch the careers of 100 new music critics, and store a little bit of lightning in a bottle. But even a musically brilliant song about that time we held hands in the rain and you said you loved me will only have traction up to the moment listeners' fantasies about the singer evaporate with the discovery that there's nobody there to go on a date with. There will still be some audience for virtual stars (eg Hatsune Miku, who appeals because she's inaccessible and is therefore guaranteed to never let you down, unlike real people). But I think generated songs will only resonate emotionally with people who are young and uncritical or so alienated/nihilist as to not care about the origin as long as the music reflects their feeling back toward them in a reliable way.

That's why I say there will never be a demand for an infinity jukebox. I can see why you as a musician would be interested to see what sort of random songs pop out; I can be happy by setting up a modular synth patch and just letting it run for hours. But this is why I offered the contrasting metaphor of the slot machine, where you pull the lever and occasionally get something you really like. It's an individual listening experience, like the private hopes and dreams you might temporarily attach to a lottery ticket before it gives up its numerical secret. When I say jukebox, I mean the device that plays music in a social setting and that allows people to express themselves through their selections. Even if it reliably turns out original tunes of some consistent level of musical quality, none of them will move people because there won't be any shared musical experience to tap into.


Your post really resonated with me (also amateur musician). I was just playing Garcia’s Loser and it clicked for me, as it was written about my life, putting to song deep emotions that would take many more words of prose to express.

How much of this appreciation of emotion in song is due to the creative depth of the composition versus a projection of the listener? Listening to some great studio music makes me really want to believe it’s mostly the former.

Anyways, maybe we will just need to become much more sophisticated and thoughtful and observant music critics in the coming age of infinity radio. (So as to experience the deep human connection of “real music”. I really hope that the AI fails to successfully fake it for my lifetime and my children’s.)


Just look up the Chinese room. There's nothing inherent in music that computers can't recreate.


I don't adhere to the Chinese room idea, and I don't think that there are any musical limitations on what an AI can do. I'm saying that audiences like music for more than its merits; they often fantasize about the singer/songwriter, in the case of popular music, or become invested in knowing about the composer in the case of more rarefied styles. A lot of people will just lose interest in a piece of music as soon as they find out it was generated. It's the same reason art forgers are treated as criminals rather than artistic geniuses in their own right.


No one will know. You're completely missing the point. People aren't going to say "DISCLAIMER THIS WAS WRITTEN BY AI". They'll just claim it as theirs. My entire point is that it's now convincing enough that you'll have no idea. People are already using it.

Audiences as a whole don't give a shit about the making of the music, at least as far as liking the music goes. "I like this song, but there is no story behind it, so I don't like it now" is just not a thing. People like to dance and sing along.

You can just make up some bullshit anyway, which is usually the case. Stop putting musicians on a pedestal. You've fallen into the trap of image. It's mostly fake. They are just people.


I've found generating full songs its own unique form of entertainment that I enjoy for different purposes. Parody is an excellent use case for this. So is education! I wound up generating songs to help me remember certain things etc.


Just to clarify, when you say never. Do you actually mean never (or some practical equivalent like ~100 years), or do you mean not right now, but possibly in 5-10 years?

I'm just asking to try to build some intuition about where people who actually train SOTA models think capabilities are heading.

Either way, congrats on the launch :)


Personally I get very worried reading statements like "AI will never be able to do X", because they seem like obviously false statements. I think if one asserts AI will never be able to do a thing a human brain can do, that needs to be proven, rather than the other way around. For example, if we could reverse engineer the entire human neurology and build an artificial replica of it, why wouldn't we expect it to be able to do everything exactly as a human?


I don't understand those "AI will never be able to do X" statements.

Surely AI will be able to do _anything_ in 1000 years. In 100 years it will almost definitely be able to replace most knowledge-based jobs.

Even today it can take away many entry-level jobs, e.g. a small business no longer needs to hire someone to write a jingle, or create a logo.

In 10 years, I would expect much of programming to either disappear or dramatically shift.


People who don't believe this really aren't immersed in cutting edge research. I think it could even be 5 years on the extreme edge of an optimistic prediction.


I think people just don’t want to believe it. Because they’ve seen how people who’ve been displaced tend to be treated. This tech will cause a lot of pain.


This has to be a component. It is very scary and honestly quite sad.


Never == "There will never be tears in my eyes as an AI sings ChatGPT-generated lyrics about the cycle of poverty a woman is stuck in (https://en.wikipedia.org/wiki/Fast_Car) because I know all of those experiences are made up."


The real value of AI is to be like a map, or like a house of mirrors: it reflects and recombines all our experiences. You can explore any mental space, travel the latent space of human culture. It is the distillation of all our intelligence, work and passion; you should show more respect and understand what it is. By treating it as if it were worthless you indirectly do the same for the training corpus, which is our heritage.

If AI ever surpasses human level in art, it will be more interesting to enjoy its creations than to ban it. But we're not there for now; it's just imitative, it has no experiences of its own yet. But it will start having experiences as it gets deployed and used by millions, when it starts interacting with artists and art lovers in longer sessions. With each generative art session the AI can collect precious feedback targeted to its own performance. A shared experience with a human bringing complementary capabilities to its own.


There’s also the fact that a major component of music fandom is about the community and sense of personal identity that derives from an artist or a particular scene.

Saying that you’re a big fan of a band doesn’t just mean “I like the audio they produce” but often means something much bigger about your fashion/style and personal values.

How would any of that work with AI music? Is it possible to develop a community around music if everything is made on demand and nobody experiences the same songs? Will people find other like-minded music fans by recommending their favorite prompt engineers to each other?


Our goal is for people to continue being the ones making the music. You'll still come to our site to see XYZ's newest release, it will just be more of a collaboration with AI, similar to how artists collaborate with each other and producers right now.


Assume a song comes on the radio in 3 years and you like it. How do you know it's not entirely AI-generated?


Love what you are doing, but "never" is just not true. I used Suno to create a song about our daughter the other day which had my wife and me in tears.

We are already at a stage where AI is touching hearts.


That's no longer AI alone, you gave it the needed touch of humanity! That touch will take many different forms for different people.


> If the future of music was truly just typing some text into a box and taking or leaving what the machine gives you that would be kinda depressing.

Hm... From my vantage point, it seems like a pretty weird choice of businesses if you think that.

> IMO AI alone will never be able to touch hearts like real people do, but people using AI will be able to like never before.

That's all very heartwarming but musicianship is also a profession, not just a human expression of creativity. Even if you're not charging yet, you're a business and plan on profiting from this, right? It seems to me that:

1) Generally, if people want music currently, they pay for musician-created music, even if it's wildly undervalued in venues like streaming services.

2) You took music, most of which people already paid musicians to create (and those musicians aren't getting paid any more because of this), and you used it to make an automated service that people will pay for music instead of paying musicians.

3) Your service certainly doesn't hurt, and might even enhance people's ability to write and perform music without considering the economics of doing so. For example, hobbyists.

4) So you're not trying to replace musicians making music with people typing in prompts-- you're trying to replace musicians being paid to make music with you being paid to make music. Right? Your business isn't replacing musicianship as a human art form, but for it to succeed, it will have to replace it, in some amount, as a profession, right? Unless you are planning on creating an entirely new market for music, fundamentally, I'm not sure how it couldn't.

Am I wrong on the facts, here? If so, well hey, this is capitalism and that's just how it works around here. If I'm mistaken, I'd like to hear how. Regardless, this is very consequential to a lot of people, and they deserve the people driving these changes to be upfront about it-- not gloss over it.


Inspiration? You can generate hundreds of ideas in a day. The tracks will not be perfect, but that's where actual musicians can take the ideas/themes from the tracks and perfect them.

In this way it is a tool only useful to expert musicians.


I mean if you want inspiration there are literally millions of amazing songs on Spotify by real musicians. I have yet to hear an AI composed song that was in the least bit musically inspiring.


Well, it's a starting point for songwriters. We won't get amazing solos and clever mind-bending lyrics (yet?). One thing I love about these AI music generators is that you can take the exact same lyrics and hear them in a lot of different styles and melodies. That's something that I'd struggle with. Can you easily imagine the happy birthday song with different melodies and rhythms? These tools won't create the next bop, but they can seed back ideas to musicians, while people without music skills can have fun creating songs about the things they like.


You can't do that for any well-known song without running into copyright lawsuits for even the slightest similarity. These AI songs can generate novel tracks, so it's likely one can get away with a lot more.

Whether or not the tracks are truly novel is up for debate, but if you generate 500 tracks, there's going to be some very very usable and obscure melodies in there. And you will be able to rip these melodies verbatim, with low-risk of copyright infringement.


When Suno came out I spent literally hours/days playing around with it to generate music, and came out with some that's really close to good, and good enough I've gone back to listen to a few. I'd love the tooling to take a premise and be able to tweak it to my liking without spending 1000 hours learning specific software and without thousands of hours learning to play an instrument or learning to sing.


I just don’t get this. Part of the joy of creating things is the work I put in. The easier something is to make, the less meaning it has to me. I feel like just asking a machine to make a bunch of songs is kind of meaningless.


people used to say the exact same thing about DJs and later Apple's GarageBand.

if the person is spending time tweaking the prompt, which in this system includes BPM, musical style, writing lyrics, and they get a song they like out of it, how is that meaningless? how is that any different from strapping loops together in GarageBand instead of learning to play the guitar or drums?


I mean, there's still a huge difference between somebody who buys a bunch of loop packs and just glues them together versus somebody who can sit down at the piano, improvise a new melody and then build an arrangement around it.


That is just 'marketing speak'; as long as you are their customer, they need to make money from users who will be using their service to make music.


same thing with AI code writing.

It's a good muse, but I wouldn't trust what it makes out of the gate.


There's a lot of negative comments here, but these are the earliest days and generating entire songs is kind of the hello world of this tech.

There's always going to be a balance between creating high level tools like this with no dials and low level tools with finer control, and while this touts itself as being "more controllable", it's clearly not there. But, the same way Adobe has integrated outpainting and generative fill into Photoshop, it's only a matter of time before products like this are built into Ableton and VSTs - where a creator can highlight a bar or two and ask the AI to make the snippet more ethereal, create a bridge between the verse and the sax solo, or help you with an outro.

That said, similar to generating basic copy for a marketing site, these tools will be great for generating cheap background music but not much else; any musician, marketing agency, or film-maker worth their salt is going to need very specifically branded music for their needs, and they're likely willing to pay for a real licence to something audiences will recognize, using generative AI and tools to remix the content to their specific need.


If anyone here is interested in something that leans towards the Ableton end of the spectrum, we're building this: https://wavtool.com/


Wow, so cool, very interested. This is exactly what I wanted to see with next gen DAWs.

How long have you been working on this?


So rad!


I want to say two things: one, congrats - I am sure your team has been working exceptionally hard to develop this - and the songs sound reasonably good for AI! Two, I am so completely unenthusiastic about AI music and it infiltrating the music world - all of it sounds like fingernails on a chalkboard. Just mainstream overproduced low quality radio music. I know it's a stepping stone but it kills me to listen to it right now.


That's because you didn't listen to the MIT license song. Gen music has the potential to make even the driest texts sound good, I didn't realize that before. How about paper abstract music? https://suno.com/song/cb729eb6-4cc5-4c15-ab74-0cdbef779684


80% of music is familiarity, 20% novelty, yet the majority of people's time goes into getting the 80% down so that they can add their 20%.

Look at current music production and compare it to past. Older music seems so much simpler. It was so much easier to come up with that 20% 'novel' when pop/recorded music was new. Ironically I think AI freeing people to focus on that 20% is going to add a lot of creativity to music, not reduce it.

I say this as someone who hates the concept of AI music. I'm actually really excited to see what it enables/creates (but I don't want to use it, even though I really could use it for vocals that I currently pay others to do for me).

I'll be here making my bad knockoffs of bad synth pop bands, having fun and taking weeks to do 5% of what kids these days will get as their starting point, with my 20% creativity ignored because my music sounds 'off' when I can't get the 80% familiar down.

People thought synthesizers were the end of music, yet Switched on Bach begot Jean Michel Jarre begot Kate Bush and on and on.


I would agree when AI gets to a point where it's possible to do that 20%. It is just not possible yet to combine it in such ways. Right now you basically get whatever music, but there's no way to add that 20%. Same with image/video generation. AI advancements have obviously been amazing and far beyond what I would've expected, but there's still ways to go.


Agreed. My thoughts on this are here; https://news.ycombinator.com/item?id=39992817#39994616

Also, our model specifically excels at songs from the era before overproduction. Try asking for a Johnny Cash or Ella Fitzgerald-style country or swing/jazz song!

Here's an example: https://sonauto.ai/songs/taJX3GrKZW7C5qOhjopr


how does the model know how to do a johnny cash style? did you feed it johnny cash tracks? if so, what were the licensing terms? are you interested in answering these questions about training data or would this be too dodgy to chat about on a tech website?


I really feel like the popularity of diffusion has made it far too shallow.

Why diffuse an entire track? We should be building these models to create music the same way that humans do, by diffusing samples, then having the model build the song using samples in a proper sequencer, diffuse vocals etc.

Problem with Suno etc, is that as other people have mentioned, you can't iterate or adjust anything. Saying "make the drums a little punchier and faster paced right after the chorus" is a really tough query to process if you've diffused the whole track rather than built it up.

Same thing with LLM story writing, the writing needs a good foundation, more generating information about the world and history and then generating a story taking that stuff into account, vs a simple "write me a story about x"


I completely agree on the editing aspect. However, if you want to generate five stem tracks, then all five tracks must have the full bandwidth of your autoencoder. Accordingly, each inference or training step would take much more compute for the same result. That's why we'd prefer to do it all together and split after.


How worried are you about being sued? Seems like your training data probably includes quite a bit of copyright protected stuff. Just listened to the “blue scoobie doo” example and the influences are fairly obvious. With record companies getting super litigious about this, is that a concern? Or did you licence your training data?


My hobby is songwriting. (Example: https://www.youtube.com/watch?v=Kjng3UoKkGk)

I play guitar, but I'm not much of a guitarist or singer. I really like songwriting, not trying to be polished as a performer. So I intermittently look into the AI world to see whether it has tools I could use to generate a higher-quality song demo than I could do on my own.

I've been looking for something that could take a chord progression and style instructions and create a decent backing track for a singer to sing over.

But your saying "Very soon you'll also be able to generate proper variations of an uploaded or previously generated song (e.g., you could even sing into Voice Memos for a minute and upload that!)" is very intriguing. I mean, I can sing and play, it just isn't very professional. But if I could then have an AI take what I did and just... make it better... that would be kind of awesome.

In fact, I believe you could have a very big market among songwriters if you could do that. What I would love to see is this:

My guitar parts are typically not just strummed, but involve picking, sometimes fairly intricate. I'm just not that good at it. It would be fantastic to have an AI that would just take what I played and fix it so that it's more perfect.

And then to have a tool where I could say, "OK, now add a bass part," and "OK, now add drums" would be awesome.


If all you're looking for is polished backing tracks, why couldn't Band in a Box serve that function?

https://www.pgmusic.com/


It could, but I want it to be even easier and with better results! I think AI has that potential. I am absolutely sure it does, in fact, and that some AI product will obsolete Band In A Box within the next decade. Maybe within the next year. If the people who make BIAB aren't working on it, themselves, with full focus, they are making a big mistake.


But how will you make your song stand out as something special, when every other aspiring song writer has the same access to the same level of insta gratification for making a full production from barebones song writing?

Or is your target audience only your own ears, and you never plan to publish or even compare your work to others?


Songwriting is songwriting. You make it special by making a great song. You make a demo. You can get a song published if it sounds decent and is a great song. Publishers are influenced by the production quality, but they aren't idiots. They can discern great lyrics, great harmonic shifts, and a great melody as separate matters from whether it has a fantastic lead guitar solo or drum part.

If all someone can manage is "barebones song writing" without great lyrics, harmonic interest, or melody, they need to either be in a fantastic band or give up.


I don't have any connection to anyone at PGMusic but BiaB already implements a technique that could be described as AI or AI-like.

Having played music nearly all my life, songwriting included, and soaked up almost every bit of music-making tech in the process, I'd wager we won't see AI delivering better results more easily and, importantly, with the flexibility of Band in a Box within the next year.

The playing/performance part of making music is a solved problem. You can do this with DAWs and plug-ins today. The truly hard part is coming up with the ideas. That's where AI has an opportunity.


The problem I have with BIAB is my songs often have very specific fingerpicking parts. BIAB can't easily do the same picking as far as I can tell. (Or maybe at all?) So I'm thinking an AI like the one in the OP may be able to pick up on my specific fingerpicking but just do it more accurately. And then add other instruments that closely align with those parts.

If I'm missing something about BIAB, let me know!


Check out this AI vocals plugin. It's pretty impressive already.

https://youtu.be/PCYTqDSUbvU


That song is quite nice, and so is the performance. It would, IMO, be less good if it were 'fixed' to be more perfect.


Awesome to hear this resonates with you! If you join our Discord server I'll ping @everyone when improvements are ready.


I think the problem here is the same one as the other current music generation services. Iteration is so important to creativity and right now you can't really properly iterate. In order to get the right song you just spray and pray and keep generating until one that is sufficient arrives or you give up. I know you hint at this being a future direction of development but in my opinion it's a key feature to take these services beyond toys.

I think it's better to think of the process of finding the right song as a search algorithm through the space of all possible songs. The current approach just uses a "pick a random point in a general area". Once we find something that is roughly correct we need something that lets us iteratively tweak the aspects that are not quite right, decreasing the search space and allowing us to iteratively take smaller and smaller steps in defined directions.


Yep, I came to similar conclusions w/ text-to-audio models - in terms of creative work the ability to iterate is really lacking with the current interfaces. We've stopped working on text-to-audio models and are instead focusing on targeting a lower-level of abstraction by directly exposing an Ableton environment to LLM agents.

We just published a blog today discussing this - https://montyanderson.net/writing/synthesis


Our variations feature coming very soon is exactly this! Rhythm Control is an early version of this.


I'll keep an eye out for that! The variations feature in Suno is a good example of what not to do here, as it effectively just makes another random iteration using existing settings.

I think the other missing pieces I've found are upscaling and stem splitting. While tools for splitting stems exist, my testing found that they didn't work well in practice (at least on Suno music), likely due to a combination of encoder-specific artifacts and the overall low sound quality. Existing upscaling approaches also faced similar issues.

My naive guess is that these are things that will benefit from being closely intertwined with the generation process. Eg when splitting up stems, you can use the diffusion model(s) to help jointly converge individual stems into reasonable standalone tracks.

I'm excited about the potential of these tools. I've definitely personally found use cases for small independent game projects where paying for musicians is far out of budget, and the style of music is not one I can execute on my own. But I'm not willing to sacrifice on quality of results to do so.


Our variations feature will be nothing like Suno's (which just generates another song using the same prompt/lyrics). Since we use a diffusion model, we can actually restart the generation process from an early timestep (e.g., with a similar seed or even parts of the existing song) to get exactly what you're looking for.
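
To make that concrete, SDEdit-style partial renoising looks roughly like this (a sketch only; `model.denoise_step` and the `alpha_bar` schedule are hypothetical stand-ins, not our real API):

  import torch

  def make_variation(model, song_latent, alpha_bar, start_step=600):
      # Noise the existing song's latent up to an intermediate timestep t.
      # Higher start_step = noisier input = a looser, more creative variation.
      t = start_step
      x = (alpha_bar[t].sqrt() * song_latent
           + (1 - alpha_bar[t]).sqrt() * torch.randn_like(song_latent))
      # Then run the usual reverse diffusion from t down to 0, instead of
      # starting from pure noise at the final timestep.
      for step in reversed(range(t)):
          x = model.denoise_step(x, step)  # hypothetical single reverse step
      return x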


> Our variations feature will be nothing like Suno's (which just generates another song using the same prompt/lyrics).

That's their "Remix" feature which just got renamed "Reuse prompt" or something.

Their extend feature generates a new song starting from an arbitrary timestamp, with a new prompt. It doesn't always work for drastic style changes and it can be a bit repetitive with some songs but it doesn't completely reroll the entire song.


I uploaded a bit of a song that I recorded once (that I wrote, unpublished), and I am trying to get it to riff on it, generate something close to it, etc.


More strength does what? More or less similar?


More strength = force rhythm more. If you crank it to max it will probably result in just a drum line, so I prefer 3-4.


Same with text models, for me. If I can't edit my query and the AI response, to retry/keep the context in check, then I have trouble finding use for it, in creation. I need to be able to directly influence the entire loop, and, most importantly, keep the context for the next token prediction clean and short.


Letting you edit the response is quite easy to do, technically speaking. It's not done in the default UI for most AI Chatbots, unfortunately. You will need to look for alternative UIs.


I've noticed that the output tends to suffer when you pass in longer lyrics, too. Lots of my experiments start off fairly strong, but then it's like it starts to forget, and the lyrics lose any rhythmic structure or just become incoherent.

At some point it's just not efficient to try and get the desired output purely through a prompt, and it would be helpful to download the output in a format you can plug into your DAW to tweak.


But that’s not a problem when listening to Spotify? Why can’t we treat these music generation engines the same way we treat music streaming services?


Idk what you're referring to specifically, but music discovery services are terrible across all of Spotify, Apple Music, Google Music, Tidal, etc. I don't expect these services to read your mind, but they also don't ask for many parameters to help with the search. Definitely a huge opportunity here for innovative new services.


TikTok can tolerate a lot more active skipping than Spotify can before they annoy their users. We’d love to solve this. How would you? Maybe we could let users write why they didn’t like the song in natural language since we understand that now.


Basically you need something like ComfyUI for music.

Variation in small details is fine, but you need control over larger scale structure.


Nice, but Google login is a no-go for me (or any form of social login, really).


same.


same.


Congratulations on the launch!

I was recently really impressed by the state of AI-generated music, after listening to the April Fools LessWrong album https://www.lesswrong.com/posts/YMo5PuXnZDwRjhHhE/lesswrong-... . They claim it took them ~100 hours to generate 15 songs.

Can't wait for the day I can instantly generate a song based on a random blog post or group chat history, this seems like a step in that direction


Perhaps not exactly "instantly generate a song based on a random blog post or group chat history", but more like "instantly generate a song based on an input prompt sentence" is suno.ai -- you should check it out!


LessWrong used suno.ai, but the typical song quality is not there yet, so they had to generate 3,000-4,000 songs to get 15 good ones.


The real endgame in this space would be a tool that first generates a song layout, think Fruity Loops, then the corresponding instruments for it, then the vocals, and as the last step allows you to modify each of those layers without nuking the rest. Imagine something similar to what Suno does now, except you had the ability to add in an extra verse without altering the rest of the song, swap out a few passages of the lyrics with the rest staying intact, swap out drums for a different drum set, etc.


If there’s variance in output, it stands to reason you’d generate many X your desired output count and curate. Standard practice for creative output, from Midjourney to LLMs


Wait, is there a Suno API? I've used the site, but it's manual


This space is going to get very full, very fast. Udio just launched and improves upon "SOTA" Suno. This will just keep coming.

Focus on product. Give actual music producers something they'll find useful. These fad, meme products will compete on edge model capability for 99% of users and ignore serving actual music producers.

I'd like a product with more control, and it doesn't appear Suno or Udio are interested in this.


Exactly. As of now, Suno can be used as a template, but you still need to go to a DAW and make it from scratch. So... individual tracks for each instrument/vocals that can be exported and brought into a DAW is what is needed. For me anyway.


I'm not sure it's that they aren't interested, I think it's just really hard.


This is ridiculously fun. Congrats on the launch! I took inspiration from “There I Ruined It” and grabbed lyrics from various popular songs to have the AI sing them in the style of other artists. It sometimes took a few attempts, but it honestly did a great job. It got a chuckle out of my friends and family. Also loved that I didn’t have to enter a credit card in order to try it out.


I was just trying similar apps last week and I was so frustrated with the amount of options and menus to get through before I could generate anything. Not to mention the fact that half of these services ended up asking me to pay per setting. I have to say this was the least painful service to use thus far. Pretty impressive output for so little input.


Thanks! We have lots of fun dials for people who want them but they're all hidden by default and shouldn't be needed.


I don't feel like prompt understanding is very good; I don't think I ever really got close to what I wanted with any of the attempts I made. I imagine learning the model tags and building some intuition might help, but I wouldn't bother with that unless I was tinkering with a local model.

Some things it made sounded ok, but I feel like the average generation quality wasn't fantastic. It did a folk guitar melody and a vocoded thrash metal voice that I thought sounded pretty legit, but mostly the vocals had an ear-grating quality and everything had a bit of a low-bitrate vibe.

To be honest though, I don't think you need to try and outcompete Suno. I think you want to get into DAWs and VSTs and become the tool all the best producers in the world use. Spit out stems, and train your model on less processed sounds because things like matching reverb/delay and pre-squashed dynamics are a pain in the ass to work around.

Suno is trying to battle a large established industry that is actually very creator friendly and accessible. If you choose to instead serve that industry and enable it I think that's the winning play.


The vast majority of our time was spent figuring out the model architecture and large-scale distributed training, and step 2 (starting now) is scaling everything up. Prompt understanding and audio quality will get significantly better once we swap in a larger text embedding model.

Thanks for the feedback re: DAWs, though! That would be really cool. Maybe we can tag tracks based on the effects applied to them to allow this to be more controllable.




This begs the question: given that this is diffusion-based, how much of the IP-Adapter/FaceID/ControlNet tech can be brought over? What would an audio FaceID or audio IP-Adapter look like for something like this?



This is exactly what makes it so exciting for us!


IP-Adapter for music would be a game changer. Upload a reference sample, get something in that style.


Exactly - upload one or even multiple songs for influence, add some lyrics... tada! Holy shit, that's gonna be powerful.


I've tried to look a little bit around but couldn't find anything, so I'll ask here.

Any plans to release the model(s) under an open license ?


This would be so cool, but we need to think more about how we could do it and make enough money in the future to train more models with even cooler features.


That's a very polite way to say no. Thanks for the answer.

Personally not interested then. I'll stick with Bitwig and Ardour until an open model is available


Neither of those look like they have a generative AI component.

We (as a society) desperately need a way to train these models in a federated, distributed manner. I would be more than happy to commit some of my own compute to training open audio / text / image / you-name-it models.

But (if I understand correctly) the current architecture makes this if not impossible, nearly so.


Meta has billions. Other startups can't just donate their IP to the world and then raise money to do multimillion-dollar training runs.


All models for all types of content will eventually have open source equivalents. The game is to build a great product.


I'm just observing until there's a Stable Diffusion 1.5 equivalent of music generation. Open license, under 8GB of VRAM, large communities for sharing fine-tuned models, plugins like ControlNet, etc. Then this AI music generation will really take off and yield flawless results.

I know it will happen, just like SD happened after DALL-E. Bonus points to whoever does so for using C++ and Vulkan instead of Pytorch and CUDA. :-)


It's too bad Emad didn't get this one out before getting axed.


Please offer alternatives to Google to sign-in.


> Sign in with Google

Well, maybe I'll try out the next AI music creator posted on HN.


why not sign in with google?


I always test these AI generators with the head scratcher genre: Electro Klezmer Reggae Funk.

I was thrilled by 2 of the versions produced. I wish I could extend it more like one of the comments here said:

* ElectroKlezmerReggaeFunk 1: https://sonauto.ai/songs/s22rQEPnYsXy1yf7sjU0

* ElectroKlezmerReggaeFunk 3: https://sonauto.ai/songs/1iNTrA2CekPwp7XT9mmM

But wow, the UDIO version:

* https://www.udio.com/songs/j4zpRYgG2GEDbWpLPYbuJb


Good luck! I just tried it and the interface was a bit confusing. It only allowed me to fill in the last input in the form, which is a bit counterintuitive.

I presented this prompt: "Noir detective music from the 60s. Low tempo, trumpet and walking bass" and got back a one-note-only song that had nothing to do with the prompt, except for some lyrics that were a bit ridiculous.

This is just feedback; I'm eagerly hoping for something like this to surprise me, but I know it's really hard!

Happy to share the song/project/account, if you tell me how to :)


Weird. We pushed a BPM assist feature last night that may have unforeseen consequences for genres we didn't test (we tried pop, edm, classic rock). I'll turn it off by default for now. Try checking the instrumental box too.


Congrats on the launch! I had a similar issue as the comment above. I put in the prompt "Celtic symphonic rock" (which seems to work on Suno.ai) and some lyrics. The output ended up being just readings of the lyrics without any music, except some artifact-level whispering of music when the voice was silent. Would definitely love to see some demos of what it can produce!


I don't know about the scene, but I thought this was great! I was given 3 tracks; I have to say one had no sort of beat to it, so it was like noise, but the other 2 were fantastic. Great stuff!


Thanks! We have a BPM assist that can enforce rhythm as well, so you could try that, too!


Hmm, I get "peppy cola commercial before movie starts" vibes off most of the vocals.



The concept is as clearly human derived as the music so equally isn't.


Question: since you're now doing diffusion, couldn't you also train something akin to an "upscaler" to improve the overall quality of the output? That seems to be a big complaint. It feels like it should be possible to train an upscaling audio model by feeding it lower quality versions of songs and the high quality FLAC, for it to learn how to improve audio via diffusion upscaling.


This can definitely be done. There are approaches that turn the decoder part of the autoencoder into another diffusion model. The drawback is that's much more expensive computationally. We think there's still a lot of room for better quality on the AE side and can't wait to show our improvements.
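
For what it's worth, the low-quality/high-quality pair setup you describe could be sketched like this (the torchaudio calls are real; the band-limiting degradation is just one illustrative choice, not a description of our pipeline):

  import torchaudio

  def make_upscaler_pair(flac_path: str, low_sr: int = 16000):
      clean, sr = torchaudio.load(flac_path)  # high-quality target
      # Degrade by resampling down and back up, throwing away the highs.
      down = torchaudio.functional.resample(clean, sr, low_sr)
      degraded = torchaudio.functional.resample(down, low_sr, sr)
      # A diffusion upscaler would then be trained to map degraded -> clean.
      return degraded, clean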


I really like being able to convert from artist name -> style with this, and in theory I like being able to use uploaded files in lieu of a style prompt. But to be honest I haven't been able to get output that seems nearly as high quality as Suno v3 or Udio yet - although it could be user error.

My experiment on sonauto.ai so far - I first selected "The Weeknd", then picked the prompt:

  The Weeknd's smooth vocals lead the song, blending with electronic effects. A low, pulsating bass line opens the track. Synthesizers add layers to the melody, creating a danceable rhythm. Minimalist drum machine beats provide the foundation for the rhythm section.

with lyrics that were a modified, shortened version of "Starboy" (couldn't use the entire song, as the UI truncates input past the length I ended up using in my link), mostly replacing some nouns with food-related ones. The results didn't really sound like The Weeknd at all... example: https://sonauto.ai/songs/U6eDSrrn5V5AVmV8xMgR

I also tried uploading Starboy directly as an mp3 to generate from that prompt instead, using the same lyrics. I may have done something wrong (when I went back to Prompt, my prompt was replaced with the string "Uploaded File", and some of the output is so stylistically different it makes me think it didn't get applied at all) but it didn't seem to work well, if at all: https://sonauto.ai/songs/klNMfs4bPgji3edPvwAv

Did I do something wrong using the upload file feature? And if anybody has hints for getting better output with the auto-generated prompts, LMK. I'd love to use these new features, but it seems like either the underlying model isn't mature enough for generally good output compared to Suno/Udio, or the UX is not making good output easy to achieve.

Here's an example output I get from Suno with similar lyrics and a prompt that merely lists some styles associated with The Weeknd. As you can see, the song is much better overall, and the voice sounds more like The Weeknd (although it still fails to style e.g. the chorus properly): https://suno.com/song/d4f72fce-0bc7-4786-a299-f58d903c4275


What are your thoughts on copyright, and how might rights holders react to a system like this?

My understanding of the music industry is that the incumbents are VERY lawsuit happy, and plagiarism claims reach substantially further than with image or video (e.g., cases where someone gets sued for using the same chords as another song) - how do you plan to approach all that?


Any plans for alternate login systems? Don’t want to use a Google account personally. I’d love to try it though. Thanks!


Which providers would you prefer? We tried Twitter last night but it wasn't working for some reason (kept redirecting immediately with no oauth page).


Basic username and password auth has worked for millions for decades. If you absolutely must collect user data for some reason, an email address can be used as the username. This isn't a hard problem to solve.


For us it had nothing to do with collecting user data, adding what you mentioned would have just required another few hours of dev time haha. You’re right that it’s not hard to solve, we just wanted to focus on the rest of the app since there’s only two of us. We can definitely add this though!


Well, what I could see from this side of the wall looked professional and well put together. Impressive for a team of two.

Congrats on the launch, regardless. I will be sure to check it out when it becomes more accessible.


The problem is 1 person creating 10,000 accounts. Solve that and you will be rich.


Why solve it at all? 10,000 fake accounts for every human is working out great for Elon. =)

Seriously, though - the solution isn't to prevent people from doing this, it is to remove the incentives that encourage it.


How do you remove the incentive? Don't allow free accounts?


Don't use accounts at all for non-paid features.


Hacker news ;-) But I guess there's no OAuth or other similar function on HN...

More seriously, personally none of them, I don't have accounts on any "usually used" login providers. Just allow local accounts.


How does this compare to Suno?

https://suno.com/


Seems like we already have good ways to edit music (for example, a piano roll) and AI could use them.

As an amateur musician I'd like to see it take a MIDI track as input and produce audio as output, as a sort of AI MIDI instrument.

Or maybe take some tracks as input and generate another track, both MIDI and audio.


What quality are you producing here?

Suno has this issue too, but everything sounds like it's washed out or something. As if you recorded it from a different room.

Still I love this, ultimately I think it'll be a tool musicians use vs something for creating stand alone art


The audio is 44.1khz stereo, but all of us use autoencoders so the songs will fit in a transformer's context window, and huge compression will affect quality. We're definitely working on better ones, though!


Feels like this needs something like what was done with Stable Diffusion, when they fixed the contrast in images through the use of LoRAs.


I'd definitely pay more for higher quality!

Good work


Same here. Please consider a higher quality option.


I've found that adding prompt elements such as "hi-fi", "sharp imaging" and "clear soundstage" have helped create a less compressed and generally cleaner sound.


> Still I love this, ultimately I think it'll be a tool musicians use vs something for creating stand alone art

Spotify is getting flooded with AI generated music. It is absolutely something people will use to just generate the music they want to hear.

Ultimately though, what would be the point of Spotify? Anybody will be able to generate songs 24/7 based on their mood or a few keywords.

It will radically change the music landscape and how people "consume" music.


If this were the future that would be kinda depressing. I think the best, truly catchy songs and those that truly connect with people will continue having a significant human element. I see this as similar to the invention of Photoshop except even easier for normal people to start getting into.


So long as there's something to miss about human-generated content, there will be a market for that content.

Things are going to get truly weird when you can no longer tell the difference, on any level.


Photoshop doesn’t move the paintbrush for you.


At least for hip hop, AI is too sanitized to do anything too creative.

I suspect record labels might train their own models. I know for sampling, being able to just create a royalty-free loop without worrying about clearing anything is cool.


Some uses of AI can be net positive for society. Making fake music is not one of them.


I would use it for game dev. You know the radio in GTA - something like that. Especially if I can have some control over how the song is made.


I've been doing it for a week on Suno, and hard disagree. There are legit use cases, and new possibilities it opens up. Haters gonna hate, but people that find it useful will find it useful.


Putting your CEO's boring town hall meeting transcripts into Suno as gangster rap benefits society as a whole.


Until the music is just more beautiful than whatever people could generate. But you are correct. We are not there yet.


Not sure how you'd execute this, but I think a great method of control for rhythm would be a karaoke-style bouncing ball that you drive with a keystroke, e.g.:

  - I write out my lyrics
  - you break my lyrics down into [x] syllables
  - I choose this rhythm control option
  - I hit the spacebar [x] times in my desired rhythm based on those syllables (maybe with a karaoke-style visual to guide this); you calculate the timings between keystrokes and then attempt to create a rhythm from them (a minimal sketch of the timing capture follows)
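A minimal sketch of that timing-capture step (hypothetical Python; input() stands in for a real key listener, and capture_rhythm is a made-up name):

  import time

  def capture_rhythm(num_syllables):
      # Record one timestamp per syllable as the user taps Enter
      # (a stand-in for the spacebar; input() blocks until a keypress).
      taps = []
      for i in range(num_syllables):
          input(f"tap for syllable {i + 1}/{num_syllables}: ")
          taps.append(time.monotonic())
      # The intervals between consecutive taps are the per-syllable
      # durations the model could condition on.
      return [b - a for a, b in zip(taps, taps[1:])]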


Nice tool! I saw an attempt there to create a structured piece (https://sonauto.ai/songs/V8Lg2q50OOFl0FYbbdTu) and it seems like nobody is aware of that functionality. I suppose an average person (one who only came to visit out of curiosity, lacking any knowledge of the underlying tech, like me) would become more "productive" if the project edit screen gave hints like that.


At the meta level, I don't understand why AI would be used to replace people's entertainment and hobbies. It should be used for the things that no one wants to do, or that no human is capable of doing.

I mean, as a dabbler in music, not being able to play a given instrument should make me want to learn; that kind of upskilling should have beneficial effects, even in terms of neurobiology.

Even if I know this could be used as a basis for creative input, it feels like it's dangerous for humanity.

After all, people have to have something to do in their spare time, right?


I can't tell: will this let me upload an instrumental track and change the genre/instrument makeup? When I tried, I might've overwritten the prompt.


Upload an instrumental track, select it, then click "Use as Rhythm Control." Once you do that, you can give the model any new prompt and it should use the same rhythm (you may need to adjust the control strength depending on the genre).

Genre changes for melodies/etc. are coming once we finish variations (basically partial renoising, like SDEdit).
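For the curious, SDEdit-style partial renoising looks roughly like this (a hedged PyTorch sketch with a toy linear noise schedule; denoise_step is a hypothetical stand-in for one reverse-diffusion step of the actual model):

  import torch

  def sdedit_variation(latent, denoise_step, num_steps=50, strength=0.5):
      # Instead of starting from pure noise, noise the source latent
      # part-way along the forward process, then denoise from there.
      # strength=0 returns the input unchanged; strength=1 is a fresh sample.
      start = int(num_steps * strength)
      alpha_bar = 1.0 - strength  # toy schedule; real models use cosine etc.
      x = alpha_bar**0.5 * latent + (1 - alpha_bar)**0.5 * torch.randn_like(latent)
      for step in reversed(range(start)):
          x = denoise_step(x, step)  # model-specific reverse step
      return x

Lower strength values keep the output closer to the source song, which is what makes this useful for variations rather than from-scratch generation.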


To me the most important feature for this would be getting back stems as detailed as possible instead of a final mix-down. This way I could take "suggestions" or interesting parts/instruments from your AI, and use it in a track. But I am not sure how your model works, and if it's even able to produce stems, or if it directly generates a mixed track.


Hmm, any suggestions on how to convince it to produce something like this?

I have lyrics that I want sung by a solo female voice with the background being a male chorus. Something reminiscent of a woman singing the details of a David vs Goliath type battle backed by a chorus of the victorious warriors from that battle.

So far I have completely failed to generate a female lead with a male chorus backing.


Had a blast playing around w/prompts and listening to the various results.

I play piano, sax, guitar, and I can sing well enough. I'm garbage at songwriting and composing. I immediately see the value of using this tool to scaffold an idea out. I think being able to export lyrics and chord progressions would be an amazing paid feature to keep this as a freemium product.


I'm somewhat positively surprised by my first attempt - simple prompt, no editing of the (admittedly flat) lyrics: a song to a robot harvester, "Robot Friend":

https://sonauto.ai/songs/avg5NT3qf9QYNfWAyeOn

Look forward to playing with this.


I call this ai_song "jobDenveR".

It's a folk song I prompted about losing my job to a robot, in the style of John Denver, God rest his gentle soul.

https://sonauto.ai/songs/oOdXomZV73uwfQIxIvTU


Can Sonauto (or any tool currently) take an instrumental track and lyrics as input and generate vocals?


Rhythm Control can do this for a drum line, and we have a variations feature that should be able to do this for instruments as well.


This has to be bad for Spotify, right? Infinite low cost music generation from multiple competitors challenges Spotify's moat and forces them to develop a similar product and compete away profits from innumerable challengers - or else just go out of business.


The market for human generated music isn't going away...


It's definitely getting competition though.


Yeah, but likely in different arenas. I'm not going to be able to go to a live show of AI-generated songs, for instance. I fed the model answers to Question 53 on the TOPIK II exam into Suno and made songs out of them to help me memorize the patterns/structure, which is never the type of content real K-pop groups would have any interest in putting out.


Is there a music-generating AI that takes audio as input? I’m looking to upload simple guitar melodies or chord progressions I’ve doodled and receive an enhanced version back. Similar to how image generators turn doodles/sketches into polished drawings.


I've heard the newest Stable Audio model from Stability can do audio-to-audio.


I love this. What this needs, IMO, is the ability to generate X samples (I see you already have that) and then say "Now generate 3 more like this one, with the following change: ..." I think this was a killer feature for Midjourney.


This works amazingly well, even with German lyrics and a mashup of Till Lindemann from Rammstein with 1970s rock.

https://sonauto.ai/song/JSmCpJssZeIS2C87pkQW


I'm a songwriter with hundreds of melodies and song sections in Garageband and Logic.

I would love/pay/kill for an AI tool to help me flesh them out. Is anyone working on this?


Is there a project that would generate a sample instead of whole songs?


A volume control is not optional. Also, titling the song usually comes last for me, which means I have to give it a nonsense name in the app before I've even started.


This is great. The only issue I'm running into is that the voice sounds like the same one with slight variations, no matter what I put in the prompt.


Question: what's the deal with copyright? Could I use the generated music as YouTube background pads for my own videos with no repercussions?


Impressively fun, and less restricted than Suno on some artists. Well done; can't wait to see how this space progresses over the next few years.


Very cool! So is it possible to build ControlNet-like architectures for music LDMs, similar to how it's done for images?


I was going to ask what it was coded in, then noticed '/Home' in the URL bar. Is this by chance ASP.NET? :)


No, it's React (Native Web) Navigation... mistakes were made haha.


Does it use a male voice by default? Just clicking on random songs, it took me 20+ tries to find a female voice.


This app made my day. I literally just created my dream CD of Weird-Al-inspired parody songs. Thank you.


One of my friends (a musician) tried an AI tool to get some inspiration. It helped!


This is technically really impressive, but does anyone actually _want_ mass-produced AI music?


Where did you get the music to train on? And how hard was it to get permission?


Making my own music is fun for me so I’m gonna keep doing that.


Not quite sure if you're aware, but another AI music generator just launched today: https://udio.com/



Let me sign up with something besides a Google account.


I think programmers should stay away from trying to be musicians. Have you thought of some of the people that might lose their jobs because of this technology?


Technological advancements have always disrupted industries. Should we stop innovating just because some way of doing something might become less profitable? Absolutely not.


I think we should stop innovating, mostly and return to a relationship-based system with other animals and plants.


Have you thought of some of the people that might lose their jobs because of that?


It would be a small loss in return for a stable biosphere.


Social relationships are an evolutionary innovation.


I meant we should stop technological innovation of most kinds.


There was a point where relationships and community were technological innovations to aid in survival. Agriculture was a technological innovation. Fire was a technological innovation.


Yes, and technological innovation has reached the point of diminishing returns.


By that logic, engineers should not have invented cars, because that would have made carters obsolete.


It's not an absolute rule. The rate of adoption and evolution should be taken into account. Just like some speeds for driving are safe and some are unsafe.


That's a silly comment if it's serious, but I'm not sure you're serious...

But to be serious: yes, technology sometimes causes social dislocation, and that's serious. But civilized societies have many ways to deal with social dislocation, like welfare, retraining, etc.

Musicians are not going to lose their jobs because of technology. The CD hasn't killed live concerts.

But as a society, we need to do more for those who fall out of the productive system. As productivity and wealth increase, the only remaining problem is redistribution.


I am 100% serious.


You know what's really controllable if you want to create music? Learn music theory, practice an instrument and start a band.


Amazing work! Loving this thing.


Is this similar to Riffusion?


> Sign in with Google

Why?


99% of the population finds this easier than setting up a user/pass. If you care about this, understand that you will not be the target user for most new apps. Incredible that this comes up on so many new Show HNs.


It is bizarre that creating an account on this service depends on me also already having an account on another, completely unrelated service. This unrelated service also requires me to provide it (and notably not Sonauto, the service I was actually interested in) my mobile phone number. This unrelated service also just recently admitted it collects data about you even when it says it doesn't.

As a community made up largely of picky nerds and pedants, it doesn't seem incredible at all that this comes up so often. More like inevitable.


It's quite the opposite for a professional audience; most people don't want to give away their Google credentials to a 3rd party website that can get hacked tomorrow.


> most people don't want to give away their Google credentials to a 3rd party website

Good thing that’s not how it works then I suppose.


lol tell that to all the (quite successful) B2B SaaS apps that started with Google login as their only option


Dealbreaker for me.



None of these "songs" have any emotion... AI music just doesn't make me "feel" anything yet.


I bet that's only because you know it's created by AI. If no one told you that, and you heard someone sing that song and play along, I bet you would feel something. AI is only getting better; it will be just as good as any human, and the only way we'll be able to tell is if it's disclosed whether it's AI-generated or not.


If no one told me they were AI, I’d probably assume it was a parody group or a house band messing around in the studio. It doesn’t sound like an artist writing with intention.

And I'd wonder why they encoded it at 32 kbps with a RealMedia codec from 1998.


So true! I tried at least four songs, and they were all lame in the same way; it hurt my ears and I couldn't get past 10 seconds. Still, it's quite an achievement, well done, developer! I mean, we can't expect real creativity out of a lifeless machine, can we?


Would you pass a blind test?


Probably because you haven't heard them before.


Wow. I just had it write a song about being sad about losing my keys in r&b/soul style, I'm totally blown away:

https://www.udio.com/songs/bDY5CYdJZP93AdpgpfBJNX


It's difficult to gauge from the outside / as a consumer, but what's interesting is rarely where models are at a given point in time, but rather where the model/team will be with similar amounts of resources. It may very well still be Udio (who presumably have significantly more resources than Sonauto), but I would hesitate to say that a compute advantage counts as being 'far ahead.'


These are pretty incredible. More compressed, but way better 'songwriting' and 'performance'


meh, Suno v3 still has better quality for me personally


In my experience with Suno ($40 spent so far), sound quality is worse than the cherry-picked examples from Udio - especially the vocals - but everything I've heard from Udio could best be described as the elevator-music equivalent of its respective genre, so that's probably why it sounds so good. There seems to be a real quality-vs-originality trade-off in the state of the art.

That said, I've only had the chance to generate a few songs with Udio, and they have all sounded like they were recorded by a prison band in an overcrowded cell (I create mainly instrumental/orchestral/soundtrack music).


Not for me. Suno voices sound distinctly robotic/metallic.


Something about discussions of the nuance/taste of different LLMs for different purposes is really interesting to see when it's related to something like music.


Sounds creepy tbh


What is a diffusion transformer?


Yeah, sounds pretty bloodless to me.


Where is the download link or git repo?

Or is this just an ad for a commercial product?

You claim that “we're making generations on our site free and unlimited for as long as possible” but I couldn't find any UI where I could do anything for free. The best I could find was a “log in with Google” link. Requiring a Google account means your software is not free, and most definitely not “unlimited”.



