Deepjazz: AI-generated 'jazz' (github.com/jisungk)
189 points by mattdennewitz on April 18, 2016 | 90 comments



This is really neat! But I think it's a stretch to call it AI-generated jazz music.

As I understand it, the author has trained an LSTM on a single MIDI file -- "And Then I Knew" by Pat Metheny. The network is then asked to generate MIDI notes in sequence.

What this network has been asked to do is to produce an output stream that is statistically similar to the single MIDI input file it has been trained on. It would be more accurate to call this an "And Then I Knew" generator. Its "cost function" -- the function the network is trying to minimize during training -- is exactly how well it reproduced the target song.

Neural networks are "universal function approximators". It's not surprising that given a single input, a network can produce outputs that are statistically similar to it.

A network that could compose novel MIDI jazz would look like this:

* Train a network on a corpus of thousands to hundreds of thousands of MIDI jazz files.

* Add significant regularization and model capacity limits to prevent the network from "memorizing" its inputs.

* Generate music somehow -- the char-RNN approach described here is fine. There are other methods.

You want the network to build representations that capture the patterns of jazz music well enough to pastiche them, but not representations so high-level that the network is literally humming the tune "And Then I Knew". This is enough of a problem that any paper presenting a novel result in generative modeling pretty much must include a section presenting evidence that its model is not memorizing its inputs.

I can hum a few classic jazz tunes from memory, but that mental process is not jazz composition -- it's reproducing something from memory. If we're going to call a model's output "AI-generated jazz", we need some way to tell the network not to hum a tune it knows and instead compose a new tune with the principles/patterns it knows. Since we can't speak to our models and tell them to think one way and not the other, part of the trick in this field is to come up with models that can only do one thing and not the other.
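
To make that concrete, here is a rough sketch of the corpus-level setup I'm describing -- Keras-style, and emphatically not deepjazz's actual code. The toy corpus, the layer sizes, and the dropout rate are all placeholders I'm making up:

    # Sketch only: next-note LSTM trained on a *corpus* of tokenized MIDI songs,
    # with deliberately small capacity and dropout as a crude guard against memorization.
    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, Dense, Dropout, Activation

    SEQ_LEN, VOCAB = 64, 512   # made-up sizes

    # Toy corpus stand-in: in reality this would be thousands of tokenized MIDI songs.
    rng = np.random.RandomState(0)
    sequences = [rng.randint(0, VOCAB, size=200).tolist() for _ in range(8)]

    X, y = [], []
    for song in sequences:
        for i in range(len(song) - SEQ_LEN):
            X.append(song[i:i + SEQ_LEN])     # context window of previous tokens
            y.append(song[i + SEQ_LEN])       # next token to predict
    X = np.eye(VOCAB)[np.array(X)]            # one-hot: (samples, SEQ_LEN, VOCAB)
    y = np.eye(VOCAB)[np.array(y)]

    model = Sequential()
    model.add(LSTM(128, input_shape=(SEQ_LEN, VOCAB)))  # limited model capacity
    model.add(Dropout(0.5))                              # regularization
    model.add(Dense(VOCAB))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    model.fit(X, y, batch_size=128, epochs=20, validation_split=0.1)

Held-out validation loss, plus spot-checking generated samples against the training set, is the minimum you'd want before claiming the output is new music rather than recall.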


Collective improvisation is the core of jazz's identity, more so than any of its other defining traits (swing, syncopation, blues-derived harmony, etc).

Generating random patterns that sound jazz-ish is interesting, but until multiple generators can react to what the other is doing in real time (or to a human participant), it isn't exactly jazz.

I'd equate it to a basketball-playing robot. Teaching it to shoot free throws is interesting, but doesn't really take a step towards approximating what basketball is. Can it call for picks, lead passes to cutting teammates, box out for rebounds, force bad shots, etc?


Well, given enough time and resources, yes, the b-ball-bot could, and probably better than a human could. I know this is a cop-out answer, but look at the DeepMind Go games. The computer beat a top-100 (I don't know the rankings, actually) Go player, something that was thought to be nearly impossible in this decade.

The most interesting thing was the commentary on the matches. The announcers were mystified by the computer's moves; 'alien' comes up a lot in describing the play-style. We humans can't play Go by evaluating each stone in the game. We have to 'chunk' the game. E.g.: these 3 stones are a 'wall' or a 'platoon', this stone is 'hot' and can take your stones, this stone is 'down' and will be used in 3 turns, etc. The computer doesn't have to do that chunking; each stone is evaluated individually. As such, the play-style was totally foreign to people. It did things no player had tried or, importantly, could have thought of, given the limits of our brains having to 'chunk' the information.

I would predict that a b-ball-bot would play the same way, in totally strange ways that a human can't think of. E.g.: calculating a reasonably high probability that the ball will bounce off your nose and into the left hand of its teammate, throwing the ball as hard as it can at its own head to make a shot, not trying to get past just 1 opponent but the entire team's right thighs 57 seconds from now, etc.

Similarly with jazz, the computer is a dumb machine that will just do strange things because humans have to 'chunk'. In music, we play in chords and notes and with rhythm and timing. The computer can evaluate the whole song, and every other song at the same time, and can borrow from all of those. You and I can pull in the feelings of loss of a child, or the joy of strawberry ice-cream bars in a Memphis summer, things a computer never will. But we cannot pull in the obscure Tuvan throat-singing techno remixes on YouTube, the Afro-Thai heavy metal Vimeo channels, or the terrible pre-teen angst poems set to crappy guitar, all at once. It can only see what you feed it, but you can feed it the life-outputted-into-music of billions of humans, with live updates. The computer will know more.

But music is emotional and about feelings. The feel of music is most important to us. And I think that a human songwriter is therefore essential, one that cares and puts effort into the work. It connects us, and that is what is important, not the sounds.


> But music is emotional and about feelings. The feel of music is most important to us. And I think that a human songwriter is therefore essential, one that cares and puts effort into the work. It connects us, and that is what is important, not the sounds.

Children can play music very emotionally (or rather, in a way that adults associate with emotion) without any real experience or comprehension of the emotions. Imitation and training are sufficient to be convincing. A program doesn't need to experience emotion, only to know that certain characteristics of the sound are associated with certain emotions.


There's a little work on that part of jazz too, though not much. My undergrad supervisor built a solo-trading jazz system about 15 years ago for her PhD thesis. Solo trading isn't the full extent of jazz improvisation, but is at least getting at one of the collaborative parts of jazz. It's now slightly dated, but I think still good work, and could probably sound a lot better if the same basic approach was taken but updated with today's hardware and algorithms.

Thesis: http://reports-archive.adm.cs.cmu.edu/anon/anon/2001/abstrac...

A conference paper: http://www.aaai.org/Papers/AAAI/2000/AAAI00-100.pdf


> Generating random patterns that sound jazz-ish is interesting, but until multiple generators can react to what the other is doing in real time (or to a human participant), it isn't exactly jazz.

An arbitrarily complex network could internally develop independent generators that react to each other.

However, the likelihood of common optimization strategies used for training RNNs (back-propagation through time, foveation/attention, etc.) developing a network like this is probably quite small.

It would be possible for a network designer to come up with an architecture (as Hochreiter did with LSTMs) that lends itself to this sort of structure, but then you're baking in assumptions about how humans accomplish a task (which comes with trade-offs).


Even then, true jazz composition would not involve only jazz training data, even if it's thousands or hundreds of thousands of songs. Wouldn't you just be diversifying the source of your statistical reproduction?

A human composing new, creative jazz draws on a much wider set of sources for creativity, not just existing jazz songs.


I think then the question becomes: are humans doing anything different from that?

If so, why do you believe the network is only reproducing statistics rather than having learned the same circuitry humans have when improvising/composing jazz? It's hard to show that it's doing one thing or the other. In this case with n=1, it seems pretty clear it's doing the former.

If not, then it doesn't seem to matter since that's what humans are doing.


Is it though?

Most musicians learn from others and thus develop a style semi-inspired by what they listen to.

So add 50,000 more songs and you have something.

Perception is reality.


This is the commenter's point. Because the model is currently trained by comparing its own output to the Pat Metheny tune, the approach doesn't work once you add more than a single song.

Musicians do learn from each other, but then they learn how to play what they like, or what sounds good. To this model, "what it likes" is a 100% representation of 'And Then I Knew'. You could swap the target song for another, but not for multiple targets at the same time without reworking the logic.


Isn't that just a matter of expanding the feedback loop to include other things than just the music?

I.e., aren't the mechanics there, and primarily limited by the size of the feedback loop?

I am genuinely interested in the answer.


@rryan mind sharing methods other than char-RNN? Thanks!


This sounds to me like the "uncanny valley" of music. It's close to being pleasant, but it's very discordant and hard to listen to…


>It's close to being pleasant, but it's very discordant and hard to listen to…

You've probably never listened to free jazz (or 100 other genres besides)...


That's jazz for you.


Not really.


I think he is confusing jazz with Bartok.


God I love Bartok. When I played piano in high school I basically only played Bartok and Brubeck.

I think people find unfamiliar music difficult to listen to. I don't think it's really about genre or artist.

I suppose some genres are trying to be difficult on some level (rock and roll, punk, metal, and rap each took up that mantle) but all of those were meant to be easy to listen to for a target audience.

Bartok never struck me as super combative. Brainy, perhaps.


Yes, I agree with you about people not liking the unfamiliar. My comment was tongue-in-cheek. Though, you must admit, badly played Bartok is gruesome. My daughter plays violin, and discovered his 44 violin duets and also the Hungarian Dances suite. Luckily she is good enough that it is fun to listen to. But my standard joke is: "All teenagers seek out music that will drive their parents crazy. Mine found Bartok."


Truth. The combo of amateur violin and Bartok must be another level of torture.


There's jazz which is more "non pleasant" than anything Bartok did.


Coming from an avid Jazz listener, this is awful. Not even close.

I don't mean this as a slight at all, but definitely raise the bar on your experiments.


I thought it was neat for a few seconds, but then it got stale really quickly.

But then I listened to the original (the track used to train the network) and realized the problem: the network only knows how to write one song. What you hear on SoundCloud is the equivalent of giving someone a 5 paragraph essay, and then telling them to write a 10,000 word paper using only sentences contained in that essay.

Supposing that this program can accept more than 1 song in its training data, I expect it could produce really interesting stuff.


Yeah, it's kind of like pixelizing an existing song. What I really want to do is teach a program to jam and know what sounds good -- which is a lot harder.

But there's a part of music where the human soul needs to be, and that is interesting too. Some of the expression stuff is harder to do in MIDI land -- you can modulate a filter cutoff or velocity or something -- but compared to a live player there is a LOT of work to do.


I wonder what T.S. Eliot would think of this? Where would it fall between tradition and the individual talent?[1] He discusses how a poet (or musician) takes notions and feelings that a reader (or listener) already knows and experiences and combines them in new ways to lead the reader (or listener) to experience new feelings they haven't experienced yet. In T.S. Eliot's case this was the point of transcendental poetry: to go beyond. Or Rainer Maria Rilke on what art is: to him [2], art is the reflection of experience. Go out into the world, live, and that is the base of the art. There is a connectedness to the shared suffering experienced listening to jazz. I don't experience it in this music.

[1] http://www.bartleby.com/200/sw4.html

[2] http://www.carrothers.com/rilke1.htm


One of the central features of jazz (or any music) is rhythm. In the case of swing-based jazz, including bebop, you have the upbeats of 2 and 4 emphasized. It's the opposite of rock. The Metheny track here has a typical rock beat, so it's a very odd target.

Also, unless I missed something, the clips just play the network's attempt at duplicating the "head" of the track, not the soloing.

As a jazz musician I find this cool but I also feel safe that it won't be stealing gigs from me anytime soon.


To clarify your first paragraph, rock and jazz both emphasize 2 and 4. Swing is about the relative duration and weight of the first and second eighth notes within a single beat.

In fast-tempo bebop they tend to have relatively equal durations, and in other jazz styles they trend closer to 2/3 and 1/3 of the beat respectively.


That's not correct.

In a typical jazz swing drum beat the high-hat is closing on 2 and 4 (the upbeats).

In a typical rock drum beat the bass drum is on 1 and the snare drum is on 3 (the downbeats). There's almost never emphasis on the upbeats.

The two styles are almost completely opposite in feel and that Metheny track is using the rock style.


Both jazz and rock (and virtually all popular American styles... country, hip hop, etc) put the "boom" on 1 and the "clap" on 2. Regardless of whether that "clap" is a high-hat, snare, rim, etc, it's the beat you would clap your hands on.

The two approaches you describe ("emphasis on 2 and 4" and "emphasis on 3") are actually the same thing. You're just counting twice as fast when you identify 3 as the back beat vs 2 & 4. To say that another way, any song that can be notated with the snare on 3 could also be notated with the snare on 2 & 4 by halving the tempo. I think most musicians would notate "And Then I Knew" such that the snare falls on 2 and 4, but that could probably go either way.

I'd contrast this with classical music (classical as in Mozart), where the emphasis is on beats 1 and 3.


Huh? In typical rock, the bass drum is on 1 and 3, and the snare drum is on 2 and 4. Think "boom...bap...boom...bap". 1 and 3 are the downbeats and 2 and 4 are the upbeats.

Maybe you are thinking about counting eighth notes on the high hat -- in that case the bass drum would be on the first high hat hit, and the snare would be on the third. However, the counting should always be on the quarter note, i.e., two high hat hits per count -- "One and two and three and four and."


No, I'm talking quarter notes in 4/4 time here. Bass drum on one, snare drum on 3 and timekeeping hand playing on all four quarter notes.

In your "boom...bap...boom...bap", the ellipses are quarter-note rests. Listen to any simple rock tune, say, AC/DC's Back in Black. With BD == Bass Drum, SD == Snare Drum, HH == High Hat, what you get is:

    Note | 1    2    3    4
    -----|-----------------
    HH   | x    x    x    x
    BD   | x
    SD   |           x


That's only half a measure. What you transcribed as 1, 2, 3 and 4 are eighth notes.


>One of the central features of jazz (or any music) is rhythm.

Not really. There are lots of jazz styles not dependent on rhythm, and a whole lot of genres having little to do with rhythm either (e.g. ambient music).


As a card-carrying jazz nerd, I am impressed. If there were more dynamics, some of these SoundCloud examples would sound significantly better.

ETA: The default MIDI soundfont doesn't do it any favors, either. I have some software instruments I could throw at this that would make it sound a whole lot better.


Anyone interested in algorithmic jazz should check out Al Biles:

http://igm.rit.edu/~jabics/


More specifically, GenJam:

"GenJam (short for Genetic Jammer) is an interactive genetic algorithm that learns to improvise jazz."

http://igm.rit.edu/~jabics/GenJam.html


The best part is that the resultant "jazz" sounds more like vaporwave[1].

[1] https://www.youtube.com/watch?v=PdpP0mXOlWM


That's funny, I was just researching this last week.

I stumbled across some music generators. A downloadable one: http://duion.com/link/cgmusic-computer-generated-music

And http://www.abundant-music.com/

Both are "procedurally generated music" so I'm not sure where that falls in the AI spectrum.

I found the quality interesting and there was some potential there, but at least in these cases there were some issues with the quality of the MIDI instruments, and the song structure was very "same-y".

Anyway, looking forward to poking around in the deepjazz code.


Always good to see more computer music projects.

I started one recently -- and need to do more work on it -- to do some things in a bit more of an object-oriented way, trying to model more music theory concepts (like scales) as objects; not so much analyzing existing files, but making the primitives you might need to build a sequencer (and eventually some generative stuff).

If people are interested check out:

https://github.com/mpdehaan/camp (in the README, there is mailing list info).

The next thing for me is to make an ASCII sequencer so it's a program that can also be used by people who can't code, and then I'll get back more into the generative parts.


George Lewis wrote a real-time improv AI in Forth back in the 90s. It used MIDI, so the sounds were General MIDI quality for the time, but the interplay between the human trombone and the machine listening to his playing on the fly was amazing given the limitations of the machines back then. To be AI jazz, it has to be able to jam with humans or other machines. https://en.wikipedia.org/wiki/George_Lewis_(trombonist)



I'd be more impressed if they had trained it on Pat Metheny and then given it "Mary Had a Little Lamb" and said "jazz this up"


I'd be more impressed if they had trained it on Kenny G and Pat Metheny liked the results.


I emailed the author about a week ago about using this work to (re)harmonise melodies, but as others have noted, the network with one training piece doesn't generalise.

I'd be surprised if a current-gen LSTM could generalise music or language rules well enough to piece together music or sentences long enough for a coherent story, or a jazz piece that matches the work of a competent human author.


Serious question: Who is the copyright holder on generated works? The program author? The person who wrote it? Do you have to give any sort of authorship credit to those who created the works in the mined data set? Copyright law in the 21st century is just getting more and more complicated...


I read an interesting argument related to this topic in a Jehovah's Witness pamphlet. There was an article about how human inventions mimic God's creation, and the silliness of our squabbles over copyrights and patents.


Quite interesting. If I ever get in a situation where one believer would like to engage in a dialog with me, that sounds like a good subject to discuss.


There's an enjoyable summary of some other efforts in neural network music synthesis here:

https://highnoongmt.wordpress.com/2015/08/11/deep-learning-f...

The same author's Endless Traditional Music Session supplies all the Irish session music you could ever need, by mechanical means:

http://www.eecs.qmul.ac.uk/~sturm/research/RNNIrishTrad/inde...


Awesome work, and this is quite interesting -- something worth exploring with more depth than a hackathon can provide.

Having said that, as a jazz fan, I find the generated music horrible. Keep feeding it more jazz tunes :P


One thing that comes to mind is that, to me, it sounds like all of the notes' velocities are equal. It would sound a lot more natural if volume differences were incorporated.
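
Even a dumb post-processing pass over the generated MIDI would help with that. A minimal sketch using music21 (the file names are placeholders and the velocity numbers are just a guess at something less mechanical):

    # Sketch: randomize note velocities so the output sounds less mechanical.
    import random
    from music21 import converter

    s = converter.parse('deepjazz_output.mid')   # hypothetical file name
    for n in s.recurse().notes:
        # accent notes that land on a beat, soften the rest, plus a little noise
        base = 90 if float(n.offset) % 1.0 == 0 else 70
        n.volume.velocity = max(1, min(127, base + random.randint(-12, 12)))
    s.write('midi', fp='deepjazz_output_humanized.mid')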


I built a very similar project for classical music using Theano and MusicXML for a Sound Capstone Project at UW.

Blogpost + music: https://medium.com/@granttimmerman/algo-rhythm-music-composi...

GitHub: https://github.com/grant/algo-rhythm


I respect the criticism of people who love and listen to jazz quite a bit.

As someone whose taste in jazz is maybe not as sophisticated, this sounds good enough to me. It could especially pass as elevator music.

On the other hand, it would be more valuable if there were more than a single file used for seeding. As it is, this is a theme that is listenable but will always have the same style as its seed.

I intend to play with it and see if I can get more interesting melodies.


It's rendered with some really shitty-sounding instruments. Run it through Ableton Live at least. Or even better, a specialized piano engine.


When human composers attempt to compose original music, they have immediate access to their own subjective judgement of the quality of the music.

Until such time as we discover an algorithm that replicates human taste in music, any AI-based approach to composing music will fail because it will not have any feedback about the quality of the music.


It sounds like with a few epochs it captured some rhythmicity. The notes still sound random, but overall it's promising. This is only a hackathon project; I'm pretty sure we'll see more elaborate networks in the future that make acceptable jazz. It's gonna be a bit more difficult for other kinds of music, I guess.


Can someone explain to me the difference between this and the computer-generated music of David Cope from the early 1990s? https://youtu.be/yFImmDsNGdE?t=44s

It seems like the word 'AI' is getting thrown around.


An improvement that should be quite straightforward and take you no more than a couple of hours is to use sampled sounds when rendering the output.

It would massively improve the quality of the output and make it sound more "human", IMO.

You can use the samples from www.freesound.org for instance.
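
Or, as a related shortcut, rendering the MIDI through a decent SoundFont already helps a lot. A possible route (this assumes FluidSynth is installed and you have some General MIDI .sf2 file; all the paths are placeholders):

    # Sketch: render generated MIDI to audio with FluidSynth via subprocess.
    import subprocess

    subprocess.run([
        'fluidsynth', '-ni',
        'FluidR3_GM.sf2',             # placeholder SoundFont path
        'deepjazz_output.mid',        # placeholder input MIDI
        '-F', 'deepjazz_output.wav',  # render to this WAV file
        '-r', '44100',                # sample rate
    ])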


Was expecting to hear some Blue Note, got frantic muzak. Humans are safe... for now it seems.



That's better than the jazz works (which is fine, considering the jazz works came out of a hackathon). But I wouldn't call these masterpieces.

From my perspective, AI-generated music at present often falls short in two areas. The first is instrumentation and dynamics. AI music often sounds "robotic". Better soundsets would probably help some AI examples, but beyond that, I find a lot of AI music "overly quantized" sounding. Humans often don't play the music exactly as written (see: https://en.wikipedia.org/wiki/Expressive_timing); this "non-perfect timing" is a large part of many works' expressive element.
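
As a toy illustration of the timing point, here's a crude de-quantizer in music21 (the jitter amount and file names are made up; real expressive timing is structured, not random, so this only scratches the surface):

    # Sketch: nudge note onsets slightly off the grid to fight the
    # "overly quantized" feel. Jitter is in quarter-note units.
    import random
    from music21 import converter

    s = converter.parse('generated.mid')   # hypothetical file name
    for n in s.recurse().notes:
        n.offset = max(0.0, float(n.offset) + random.uniform(-0.05, 0.05))
    s.write('midi', fp='generated_loose.mid')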

The second problem to me is that AI music often falls short on overall coherent musical themes. A lot of AI pieces tend to sound "structureless", with no real direction, no thematic elements, nothing that could be called a motif or hook, etc. There are definitely some established "rules and patterns" for music, so it's not as if none of this could be fed into the AI. The best composers, however, bend and play with convention a bit.


Thanks for posting this, it's very cool.

I don't like the first one very much; it resembles me improvising sometimes: randomly repeating patterns without any direction or structure, and without going anywhere.

The second one is better -- it has some good moments -- but it still has the same problem: it lacks general structure and just seems to go from pattern to pattern.

I like the third one; it may help that the form is very formulaic. Some rhythms that it makes are weird, but in a good, interesting way. The structure is better and seems to be going somewhere, but unfortunately it doesn't finish.

EDIT: Conclusion: if the program could incorporate structure in some way it would make for passable music, but I would say the humans are still safe ;)


Yes, interesting concept but far from enjoyable jazz. It's the equivalent of using GitHub as a training set and calling the generated output software. It would at most resemble code.



Even if it is a very limited model and the tracks get boring quickly like everyone is saying, this is still extremely cool. I really need to buy a new GPU that I can run Theano on.


Knowing next to nothing about musical terms, I couldn't figure out the workflow of the AI. Does it generate note after note, trying to follow the learned "structure"?


This reminds me of AWK Music http://kmkeen.com/awk-music/


I like this because I don't like jazz.


Hook it up to a speech synthesizer, to make Deep Scat!

I played around with looping different speech synthesizers back into different speech recognizers, kind of like audio or video feedback, but with chaotic noise injected -- quirks of the synthesizer, the voice, speech speed and pitch, and the audio environment around the microphone (you could talk over it to interfere with the words it was speaking and lay down new words in the loop) -- working against the lawful pattern-matching and error-correction behavior of the speech recognizer and the HMM language model it was trained with.

It was a lot like beat poetry, in that it tended to rhyme and have the same number of syllables and use plausible sounding sequences of words that didn't actually make any sense, like Sarah Palin.

You can start it out with a sensible sentence, and it will play the telephone game, distorting it again and again. If you slow down the speech rate, words will split into more words or syllables, and if you speed it up, words will collapse into fewer words or syllables; or you can tune the speech rate to maintain the same number of syllables. It's analogous to zooming the video camera in and out with video feedback.

It would wander aimlessly randomly around poetic landscapes, sometimes falling into strange attractors in the speech recognizer's hidden markov model and repeating itself with little or no variation.

At any time you can join in with your own voice and add words during the pause at the end of the loop, or talk over its voice, much the way you can hold things in front of the camera during video feedback to mix them in.

Different speech recognizers are better at recognizing different vocabularies, and therefore like to babble about different topics, depending on which data they were trained on -- which we could guess by attempting to psychoanalyze their incoherent babbling.

IBM's ViaVoice was apparently trained on a lot of newspaper articles about the Watergate hearings, as it was quite paranoid, but business like, as if it were dictating a memo, and would start chanting and fixating on phrases like "congressional investigation," and "burglary and wiretapping," and "convicted of conspiracy".

Microsoft's speech recognizer had obviously been trained on newspaper articles about the Clinton Lewinsky scandal, since it was quite obsessed with repeatedly chanting about blow jobs (just like the news of the time), and whenever you mentioned Clinton this or Clinton that, it would rapidly converge on Clinton Lewinsky, Clinton presidency, Clinton impeachment, etc.

What I'd love to have would be a speech recognizer that returns a pitch envelope and timing that you could apply back on the synthesized words, then it could sing to you!


If you're interested in making deep-jazz more discoverable, consider applying to our Search team! :)

https://soundcloud.com/jobs/2016-02-19-search-engineer-berli...


Job posts aren't allowed in regular threads on HN.


Sorry, honestly didn't remember that. Thought it was just usually frowned upon.


I think this is relevant enough to be interesting even if it is an "ad".

Have my upvote. (It was downvoted when I wrote this.)


[flagged]


By far the worst first post I've seen on HN so far. Very Reddit-esque. If you wish to have a long future on HN without getting downvoted to oblivion, try to write more substantive, well thought out comments. Also, your "joke" doesn't really work, as there are more than a few black devs investigating interesting deep learning project ideas.


You mean that black culture can simply be replaced with an AI?


Sorry. Not impressed.


People need to stop knee-jerkedly downvoting stuff. The above comment might not sound very civil or friendly in response to posting about an AI project -- but it's a perfectly reasonable gut-level reaction to have to an (alleged) piece of music. Particularly this "music".

And it happens to be mine, also, in regard to the SoundCloud samples. Sure, the project behind it might be mathematically interesting and all... but really now, this ain't music, let alone jazz. In fact, if I came across those samples whilst flipping between radio stations, I would probably hover for at most a second or two, before giving the dial another turn... or turning the damn thing off.

Absolutely unlistenable, in other words.


No one downvoted the OP because they thought the music was good - they downvoted because the comment was garbage and didn't express any reasoning. It was a useless "-1" reply.

Your comment, on the other hand (aside from the complaining about downvoting, which is discouraged in the HN guidelines), was fair and interesting -- and I personally upvoted it.


It may be true, but to me the parent comment came across as smug and condescending, and didn't really have anything to contribute to the conversation.

Your comment, on the other hand, provided some more insight into why this might not be notable or impressive.


People are down-voting it because it is exactly what you said: a gut-level reaction. While it may be a perfectly reasonable reaction, I would think some elaboration on why they didn't like it would be far more informative and probably more in line with the posting style this site is trying to foster.


I respectfully disagree. Yes it's going to sound terrible being played by the default midi soundfont, but what it's playing is actually quite interesting.


My reaction was based on several things:

1. I heard more convincing music composed by AI a decade ago.

2. It suffers from the same pointlessness that seems to plague these attempts: disjointed, monotonous, doesn't go anywhere.

3. The exercise did not place itself in the context of other approaches to AI music composition.

I apologize for my rudeness and brevity.


Can you elaborate? It seemed like reasonable music from a first listen


It seems like it's just kind of randomly playing chord inversions & lines without any real voice leading or melodic sense.


Have you listened to any jazz from 1980 onward? Haha... it pretty much all sounds like this!

(and yet is still more human-sounding than the atonalism that dominates modern orchestra works)


Yea, a lot of people tried to sound like Ornette back then and couldn't quite pull it off.


It learned from Pat Metheny.


No, it didn't. It failed to produce competent pastiche of any of the small or large scale structures in the original track. Some of the medium scale structures are almost passable, but not with any reliability or consistency.

So it does the usual expert system/AI thing of cycling between "Almost music" and "And... lost the plot" over and over.


It didn't learn from Pat Metheny. It "learned" (curve-fit rather) from a MIDI file or two, apparently.

If it learned from a real musician, one of the first things it would have been taught is how to listen to other musicians' music. And then, how to listen to itself. And then, how to play a much simpler piece, and then play it right, so it sounds like music... not like someone typing (which is what the SoundCloud samples sound like, to my ears).

But of course machines don't "listen" in any meaningful sense. And they certainly can't tell what it is they like about Pat Metheny's music; or why they "like" his music, but not the music of Billy Joel or Anthony Braxton.

So maybe that's where these researchers should start -- by creating systems that (at least attempt to) understand and evaluate music. And to tell good from bad.

Then, maybe, they can toy around with systems that generate music.


Hard to put my finger on it. Robotic, soulless?

Like pretty much exactly what you'd expect coming from an entity or a thing that thinks music is just about notes and mathematical patterns... and not about emotions, or an experience that you feel in your body.

On a less hand-wavy level, I suspect there are issues of tone and timing embedded in human-generated musical expression that these algorithms don't begin to capture. In the same way that -- no matter how much they keep tweaking their Markov chains and phoneme scales -- we can always tell that a machine-generated voice sounds "off" somehow, literally within tens of milliseconds.

As the saying goes: it don't mean a thing if it ain't got that je ne sais quoi.


lol wow



