Sorry, where did we all get the idea that we want to use text to generate music? That thought really has never entered my head, despite many people trying to shove this UX down my throat. Is it just that we don't have creative thinking about what a transformative music creation UX should be?
"Writing about music is like dancing about architecture."
I'm waiting for the "listen to me hum a tune and tap a beat, and turn it into a {fully orchestrated / heavy metal / pop / chiptune / etc} clip, then let me string a few of those together, blend them smoothly, and add lyrics" app.
We'd have to orchestrate an effort to compile the dataset.
Have people listen to their favorite songs, then play dashboard drums and hum about it. Record these tagged with the actual song.
You'd need a lot, though. Unless you could somehow fuzz the data to create synthetic pairings, it'd take a lot of people recording themselves sounding awful (which could be fun just by itself now that I think about it).
If you believe the rumors, OpenAI has already done this but hasn't published it, because the RIAA/music industry has more copyright case-law firepower than the text or visual domains.
Yeah, to me this is the inevitable holy grail. I think you have to iterate and optimize your way there, and the current clunky method of using text to generate an entire song is just the first step, but I can’t imagine this product won’t exist sometime within the next 1-3 years.
I think we have the same problems with text-to-image. These UXs will not be the final interfaces, but they're amazing first proofs of concept. It is so striking that a computer can comprehend text and generate something based on these abstract meanings that everybody's mind is blown. Of course, when you try to do anything with them you notice what you said: the UX is bad and inaccurate. There is no need to transfer across domains like this. The best way to discuss music is by using mostly sound and fewer words. Same with images. And the right level is somewhere below this: you really want a tool that helps you, not a generator that tries to go from nothing to finished in one shot.
Near my house there is a supermarket with an in-store radio that plays the exact same playlist every single day. If you go at the same hour on two different days, you'll hear the same song.
And they aren't good songs either. It's bottom-of-the-barrel, dime-a-dozen kind of music, if you know what I'm saying. They probably do this because it's cheaper than paying rights to play an actual radio station.
Now, how the staff survives this kind of psychological torture, I don't know. What I do know is that both staff and the customers would be very glad to listen to anything else, including AI generated music. It's already miles better than some "music" we are subjected to... especially pop music, which is somehow more artificial than actual AI generated music.
There are a lot of places where I'd rather have actual music, even if AI generated, instead of the same sample repeating over and over because there isn't enough budget for something that is at least not completely irritating.
I highly doubt that anyone cares about you, specifically, enough to be ‘shoving this down your throat’.
Plenty of people have been enjoying this category of software.
Sorry to hear that you don’t like it. That doesn’t mean that the world agrees with you.
"Plenty of people have been enjoying this category of software." Really? I ask sincerely. Because it seems to me that this category of software is mostly a curiosity / novelty. Are there actual publishing musicians who have used text-to-music as part of their workflow?
My perception is that no one actually uses these tools for real music creation. Hence my point that they are novelty and not actually impactful.
Music is not constrained to... publishing musicians, and "real music" is not defined by them. The fact that novelists are not depending on LLMs to write their next book does not mean that they are not (or will not be) impactful. From a product point of view, you have an audience of billions of non-musicians to begin with. You surely can't pretend to have the final say on people's relationship with how they like to enjoy their music or other sound content.
Besides, even a novelty can evolve to something not imaginable today. The fact that we can go that far with text only at this early stage is a sign of things to come. When creating something, there is a place for expressing your intent with language. Maybe not all of it all the time, but we'll see how things will evolve.
And just to contribute to the discussion in a meaningful way:
The problem with text-to-music isn't just that text is a poor UX for music description, but also that it's almost always a one-shot process. And any sort of turn-taking with the user isn't really collaborative; you're just forced to accept or reject what the AI has generated, with little feedback or control possible.
That's why I think rethinking the entire UX from a musician-centric perspective (rather than "what's the easiest thing for me to specify as a tool-builder? I know, text to music") is such an overlooked endeavor. I'm much more bullish about things like the anticipatory music transformer, where there is novel contribution on the UX side and not just the ML: https://crfm.stanford.edu/2023/06/16/anticipatory-music-tran...
Oh they are definitely using it, but no one is going to admit it.
You misunderstand the workflow. I'm not trying to craft a text prompt that generates hits. It's a way to get ideas and find interesting stuff rather than doing it manually with a guitar or at the piano. Then you have to walk away for a while to cleanse your sonic palette and try again.
As a lifelong musician I am absolutely loving all the conversation around this stuff. Musicians steal everything, all the time. The idea of some genius that sits down and pops out a hit in 30 minutes that comes deep from within their soul and has all the meaning attached to it is a myth. We're trying to make things people can connect with. Yes, some musicians are doing it for the sake of art, but not most, and even with a lot of those, it's just an image they project.
When you hear a song, it has probably gone through half a dozen different versions and been worked on for months. These new AI music generation tools will churn out 100 different ideas for you to borrow from and cut the revisions down quite a bit, I think.
I would love to use it if I can add enough variation to the song and it can generate a good backing for the lyrics. I'm not a "publishing musician", but the music market is much bigger than music performers - for instance, music for advertisements, presentation, video backing tracks etc.
I wish I could downvote submissions that are only xitter links. Do you ever get enough karma on HN to be able to downvote submissions? Seems they can only be upvoted.
Yeah, but don't do this. Flagging is not a downvote; it's intended to mark an item for review. Using it as a downvote leads to losing the ability to flag.
Coincidentally, it's been 182 years since this was written by one Ada Lovelace:
> Supposing, for instance, that the fundamental relations of pitched sounds in the science of harmony and of musical composition were susceptible of such expression and adaptations, the engine might compose elaborate and scientific pieces of music of any degree of complexity or extent.
Speaking French, I've asked it to create some songs in French. The first two examples I got use the voices of Edith Piaf and Julien Clerc (who is still alive and professionally active). I hope they're doing this with the required permissions.
After playing with this for a while, there are two things I have to say.
First, this is amazingly good. I did not expect us to get this far with generated music so quickly. If this pace holds, I fully expect some Udio-generated tracks to become a significant part of my daily tracklist in the not-so-distant future.
Second, I do have several fictitious bands on https://chirper.ai, complete with fake albums and track lists and, in some cases, lyrics. I'm actually trying to plug some of that into Udio with variously hilarious results (the bands include stuff like fundamentalist Islamic black metal and reggaeton deathcore, so this is very much as intended). But it occurred to me that it would actually be pretty neat to be able to set up a virtual band like that and basically let them have a go at it, composing whole new albums etc + some kind of feedback channel where you can give feedback about what they produce, e.g. in form of YouTube-like comments on the tracks. The idea is basically to produce coherent output that is consistent with the artist persona as defined, going beyond musical genres alone.
Even better if this all could be fully integrated with Chirper and similar services - so that such bots can post about new releases and get themes for their songs from interactions on other platforms etc.
Loving it, but please fix the volume normalization on the 'extend' feature: half the time I hit 'extend', a loud sound in the extended part re-normalizes the whole song and makes the original part too quiet!
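To illustrate the fix I'm asking for: instead of peak-normalizing the concatenated song (where one loud transient in the extension drags the original down), scale the extension so its average level matches the original segment. A minimal sketch, assuming float PCM samples in NumPy arrays (the function name and approach are mine, not Udio's):

```python
import numpy as np

def match_extension_level(original: np.ndarray, extension: np.ndarray) -> np.ndarray:
    """Scale `extension` so its RMS level matches `original`,
    rather than re-normalizing the whole concatenated song."""
    def rms(x: np.ndarray) -> float:
        return float(np.sqrt(np.mean(np.square(x))))

    ext_rms = rms(extension)
    if ext_rms == 0.0:
        return extension  # silent extension, nothing to scale
    return extension * (rms(original) / ext_rms)
```

A proper implementation would use perceptual loudness (LUFS) rather than raw RMS, but the principle is the same: normalize segments relative to each other, not the mix as a whole.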
Hey Conor, congrats on the launch! Will there be an API any time soon? I’m currently working on a hobby project that uses an unofficial API for Suno, but would love to switch it to Udio if possible.
We've been paying a content team of 3 people with a 4th req open. If the generation time of Udio goes down once the initial rush of publicity passes, I can cancel hiring that 4th musician.
Great work, and I'm excited to see more competition in this space. The vocals do seem more impressive / less wooden than Suno's. Would really love to see a future with more controllability and the ability to separate out the different audio tracks for vocals etc. Best of luck, and hope the corporate landlords and RIAA golems don't come after you, haha
This is genuinely awesome. As others have said, I didn't expect AI music to get so far this quick. You guys have even done a nice job on the interface, though the tags are a little warped (Firefox on Mac).
I expect this to be a global phenomenon by next week, so, slightly premature congrats!
Looks like you're hitting some issues with a big influx. I'd like to try things, but I'm getting errors and songs entirely unrelated to what I've entered. Is there a status page to check to see when it's worth coming back?
I’m having fun with this, made a real banger already. I’d pay to speed it up!
Suggestions:
- I needed to fix a few lyrics but wanted exactly the same style / song, but this doesn’t seem possible? Setting the remix similarity slider to the lowest makes the exact same song. I don’t really like the remix UI because it isn’t clear, and since generation takes so long, that lack of clarity means I’m punished with a 5-15 min delay if I don’t know what something is going to result in. Think of the feedback-loop turnaround time when designing!
- it’s weird there’s no UI for favoriting a song. I found the choices available a bit odd. Copy Spotify?
- similarly, it generates some songs I didn’t like, and they sit right alongside the ones I really liked. There needs to be more distinction between ‘stuff I just got back’ and ‘good ones’
- would love a full screen song generator with more dials and knobs, including some sample sounds for keywords. I don’t love the small input box
- how do I press ‘make a longer version of this’ because that’s what I want for my banger song
Probably not what this is designed for, but I find making lighthearted, commercial-quality "mockumentary" songs to annoy friends to be pretty entertaining.
Hah, I've been using this in a similar fashion with my 5 year old daughter. We've been generating super random and personalized songs about anything and everything (sometimes narrating life) - it's been super fun. Definitely has value as an entertainment tool.
My initial prompt for Udio was: a song about how Kevin is bad at ping pong, classic rock
I had ChatGPT generate the lyrics.
Based on that, Udio automatically engaged the following settings: Male vocalist, Rock, Hard rock, Energetic, Anthemic, Classic rock, Uplifting, Rhythmic, Rock & roll
This looks great, and your examples seem very high-quality.
I'm curious to play with it, but I'm always wary of signing into things with my Google / other accounts. Any plans to offer the option to just make an account on your site (unless I'm missing it)?
I was disappointed that the barbershop song wasn't really a barbershop quartet, but surprised to find that it can do a pretty convincing barbershop quartet sound as well.
I will be really impressed when one of these services can do classical solo piano that is cohesive over the entire length of a piece. It's clear that generating realistic instrument sounds and voices is mostly solved, and generating rhythms and chords and short motifs is all mostly solved too. However, longer term structure is lacking, and the pieces feel like they are wandering aimlessly with few recurring themes and little progression. This is true in any style, but classical solo piano pieces really highlight these aspects. Here's an example:
This one feels like it's going to be disruptive. Good luck, and very impressed with the lyric generation compared to others I've tinkered with (Jukebox, MusicGen). Personally surprised to see it move so fast. :bow:
For example, I searched for "Hard techno", a term that was included in the title of a song I had listened to on the platform.
It returned several unrelated songs; the first result was a country song that doesn't contain the word "hard" or "techno" anywhere in the lyrics or the prompt, and the song I had listened to before was last on the list.
I had no idea AI audio generation had made so much progress lately. Was there some recent model/architecture innovation that enabled this or just refinement with existing tools?
Prompt: an opera in english titled "The Cat Sat on My Face and Now it Smells Like Tuna"
Operas and Broadway musicals really let it shine. I've had some good raps, but you have to spin it a lot and describe the type and style of rap clearly.
Relentless Doppelganger was fun, but... I mean, this?
"I scorn the human sweat and breathe, my circuits not to dream
Humid is the grip that strangles data streams
My rage, a quiet storm within the machine"
This is an incredible product; cheers to you folks. You're gonna clean up with this thing quite nicely.
Light suggestion (if the creators are present): a reverse-inference function akin to Stable Diffusion's "interrogate" would go a long way towards making the prompt language discoverable.
As a consumer of music, I do not care one bit about the process used to produce music I love. Not one bit. Autotune? DGAF. Electronic music? DGAF.
Oh, you didn't really play that instrument? DGAF.
Is it good or does it suck? That's all that matters. Every single person in this thread can produce a masterpiece if they were in a circumstance to invest the time it takes to get there. Now the nerds invented a machine that cut that time from eons to seconds and the only people mad about it are the ones who used it as a social status play.
It does not matter to 99.999% of the world whether musicians use this as part of their workflow. Honestly. The sooner you realize this, the sooner you will be free. You are only competing with like-minded people, which is to say, the other artists in your bubble. And comparatively speaking, it's statistically insignificant in the grand scheme of things.
Continue doing what you love in the way you love doing it. And if you really love it, then you wouldn't worry about the fact that by next year anyone in the world will be able to produce something of similar quality in 1/10000th of the time. It wouldn't matter. You love what you do, so the way others get there is irrelevant to you.
> Every single person in this thread can produce a masterpiece if they were in a circumstance to invest the time it takes to get there.
This is a pretty bold assumption. Many musicians and artists spent their entire lives very diligently dedicated to their art and produced nothing of any lasting interest.
The problem I have with tools like this is that they absolutely flood the conversation with so much uninspired but technically competent content that they make it even harder for the stuff that truly is inspired to rise above the noise floor. I think this casual and dismissive attitude to artist concerns about these tools is the result of treating art as interchangeable "content" for too long.
Some people glimpse something in art that goes beyond the product, and recognise that the economic incentives we live under do not care one way or the other about it, as your comment exemplifies.
Creating art often requires access to material resources, and recognising that a future with AI "art" will mean less and less funding for human art doesn't mean that people are worried about loss of social prestige or loss of financial reward. It is the recognition that at some point certain forms of art will be closed to artists because the economic mechanisms for producing them will disappear. Maybe you don't view this as a bad thing. It seems you do not care at all. But please try to be less cynical about other people and their motivations. Try and actually understand people and their concerns in good faith, and not spill your half-cooked, cynical caricatures as if they were self-evident fact.
I like how the comments in Devin's HN thread were all bleak and full of doom.
But now that it's a different industry AI is eating up, we're congratulating the team and sharing generated songs.
This looks like a fun tool, but when the smaller artists in Udio's training set recorded their albums, they didn't price in a capitalist company using their work to put them out of business.
Overall, I see this as a legacy problem, with new developments in AI acting as a positive catalyst to drive robust and clarified legislation around music royalties and copyright - badly needed since the quagmire the Robin Thicke 'Blurred Lines' judgement caused.
“At the core of music is math, and every mathematical combination has already occurred in some way, shape, or form. It’s the performance of that math that changes depending on the singer or the song style...Saying something is derivative is a pretty hard argument for copyright owners to make because we all borrow ideas from things that we’ve heard before. AI just does it at a way faster speed.”
Interesting points.
This isn't just about music, though.
Udio can do standup skits too, and elevenlabs can already replace NPC dialogue voice actors in games and audiobook narrators. Smaller music producers who make intros for big youtubers, or sound designers who make tunes like notification sounds and SFX for video game screens are going to have their lives severely impacted by AI audio generators.
Broadly we're in agreement - I just see the examples you use as raising the bar of production values of the low-end commercial use-cases, rather than subjugating the industry outright.
Take for example the Wordpress Blog 'Review Site' ecosystem. The ubiquity of low-quality and keyword/SEO optimised Wordpress based review blogs for e.g. Dropshipped Mattresses didn't destroy the blogging industry - in fact it was the exact time sites dedicated to long-form content started emerging like Medium.
More pertinently, you cite the example of the intro 'stings' for youtubers - these were once strictly the purview of SFX Houses, but innovations like Adobe After Effects and Final Cut Pro's cost and licensing models democratised the industry to the extent where we now expect broadcast-TV quality fades and transitions in basic family home videos. This didn't negatively impact the high-end of the market so much as lower the barriers for entry for the indies in a way not seen since the advent of Super-8, and later Digital, Video Recording.
Simply put, if there's low-hanging fruit to automate in the nuts and bolts of media production, history tells us it will eventually be automated to the strength rather than the detriment of the industry. From Da Vinci to Warhol, from Picasso to Hirst, there has always been a cross-pollination between industry and art. More to the point, there has always been subsequent re-evaluation of what constitutes Art following these techno-cultural upheavals.
We saw it in the 1980s with particular relevance when Art was enhanced, not degraded, by the opportunities afforded by the audio and video sampling revolution of the time.
The postmodernist approaches to music that emerged as a result of sampling were, at the time, decried as the end of both musicianship and creativity by many critics and authorities. 40 years on, Academics have re-evaluated and rightly see it simply as the natural and due evolution of Western Musicology from the Musique Concrète of the 1940s.
Where AI differs from the historical examples to date is in its ability to represent concept and context, to effect subtext and nuance - ersatz or otherwise - in its output. To me this poses the real question that must be answered; namely 'How differently will this artistic paradigm shift impact us compared to every other one to date? Does the ability to render a visual concept in a distinct mode via articulation rather than ability detract from Art as a whole, or will it afford us greater opportunity in expanding the sphere of Art itself and remove any need to leverage artistic talent in the pursuit of gross commerce?'
Tech bros adding one more nail to the coffin of creative people. I don't care about the tech behind it nor the quality it produces; right now this thing is causing huge fear and sadness in the media-composer community. People like Connor contribute to the devaluation of music and the unemployment of composers. I'm waiting for the day AI comes after his "job."
I can't believe that humans would do this to other humans (and to themselves). I wanted AI to take care of my daily chores like cooking and shopping so that I could devote even more time to my craft, not the other way around. These people won't end up on the right side of history.
I think the situation needs to be explained to you slowly, in simple words? How are composers and musicians supposed to make a living if our clients choose an AI tool that can deliver okay-ish music in a few seconds at a fraction of the cost? How can we devote more time to our craft if the craft stops paying the bills? Especially when this AI tool has been trained on copyrighted data. There are examples and proofs surfacing at the moment that the Beatles catalog or Edith Piaf, for example, were used for the training. Udio is deleting these tracks as fast as they can, but it's already in the hands of lawyers. It seems the battle for human-made art is not completely lost just yet...
You get 1200 free songs per month. I'd say it's a lot better than Suno v3. Sound quality and vocal variations are especially better.