Former film pro here. I don't see anything 'impossible' here; most of this is stuff I could do on my desk 10 years ago with a modest setup - but only with a lot of patience and manual labor.
If these demos fairly represent the user experience then it's slick as hell and further blurs the line between editor, compositor, DI specialist etc. Much will depend on whether the ML marketplace can be competitive with many mature commercial offerings from software studios that have no intention of letting their lunch be eaten. Video production involves a lot of bleeding edge technology but the client base is also suspicious of new providers or do-it-all solutions at first, and very loyal to products and tech support offerings that have got them through difficult projects in the past, so there will be a big hill to climb between people acknowledging it's cool as hell and their willingness to sell it to a producer who is making a 5, 6, or 7 figure bet on what some editor/VFX geek is telling them.
The web browser/cloud storage aspect is an issue. It could be a plus in many circumstances, but it's also a big barrier to any production that's working in a remote location without reliable internet, especially given the massive data volumes involved. That limits a lot of the use cases to post production, and many producers and directors are going to be wary of starting on one platform and finishing on something else; nobody likes workflow changes unless they can be shown something seamless, like having your Avid/Final Cut/Premiere/* project leave your machine and show up frame-perfect in Runway, along with a definite answer about export time, like being able to take in 12 hours of video and have it online within 24h. There will also have to be a lot of questions about security, downtime, and being able to get your project back out of Runway if money runs out, the editor gets fired, creative differences rear their heads, etc.
Looks like an instant win for short-form projects like personal shorts, music videos, commercials, corporate, demo reels, spec pieces. Potentially very good for reality TV, indie, and low-to-mid-budget films once the above questions are answered. Toughest nut to crack is large-budget or episodic TV where there's very stiff competition and contractual or professional commitments already in place, but doable within 5 years.
I've just given this a go; with very little work I was able to produce a near-seamless mask that adapted to a dynamic, changing pose - interesting stuff.
The main drawback I observed was computation speed: most edits required >20 seconds of loading/buffering (with a 1Gbps internet connection). I presume the computation occurs on their own servers, so with beefier hardware the performance could be increased; however, my intuition is that the app would perform faster if running natively (rather than on a shared resource quota on a remote server).
In this regard, the lack of performance could be a major productivity killer. My hypothesis is that it is the video processing/manipulation which is taking a long time (and not the model classification). Many companies have tackled this problem space with dedicated video transcoding chips (such as YouTube), however this generally still incurs long periods of waiting.
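One way to sanity-check that hypothesis locally is to time decoding and inference separately. A rough sketch, assuming OpenCV for decoding and a torchvision segmentation model as a stand-in (Runway's actual pipeline is unknown, and the clip name is made up):

```python
# Rough sketch: is per-frame decode or per-frame inference the bottleneck?
# torchvision's DeepLabV3 is only a stand-in for whatever Runway actually runs.
import time
import cv2
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT").eval()
cap = cv2.VideoCapture("clip.mp4")   # hypothetical local test clip
decode_s, infer_s, frames = 0.0, 0.0, 0

with torch.no_grad():
    while frames < 100:
        t0 = time.perf_counter()
        ok, frame = cap.read()       # decode one frame
        if not ok:
            break
        decode_s += time.perf_counter() - t0

        t0 = time.perf_counter()
        x = torch.from_numpy(frame[:, :, ::-1].copy()).permute(2, 0, 1).float() / 255
        model(x.unsqueeze(0))        # segmentation forward pass
        infer_s += time.perf_counter() - t0
        frames += 1

print(f"decode {decode_s / frames * 1000:.1f} ms/frame, "
      f"inference {infer_s / frames * 1000:.1f} ms/frame")
```

Obviously the server-side numbers will differ, but splitting the two stages like this is the quickest way to see which one you're actually waiting on.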
I would imagine it could be slightly easier to manage models deployed on their servers, but Adobe manages to package and ship its tools so they run natively.
My best guess is that the web was the path of least resistance and that's why this launched on the web, but as soon as they have the resources, they could package and ship a native app (hopefully one that can take advantage of modern on-device ML hardware).
I'm so glad to see this on the front page -- this website, in conjunction with a video from Two Minute Papers, is what convinced me to teach myself machine learning. I come from a filmmaking background, and have used green screen extensively. It's big, bulky, has to be set up correctly, lit correctly, and is expensive and just generally a pain to work with.
I'm currently working on a fully-automatic version of this based off some excellent research from the University of Washington. It's still in its infancy, but if you'd like to follow my progress, I post occasional updates over on https://nomoregreenscreen.com
This is just the beginning of what's possible in the intersection between DL and creative filmmaking, and I'm really excited to get to be in this field at a time when compute is cheaper than ever and all the information I could want is available for free on the internet!
I started taking Coursera courses back in late fall/early winter 2020. I discovered RunwayML through my subscription to Corridor Crew on YouTube, where they demoed the tech back in January of this year.
Forgive my ignorance, but that looks like someone created a GUI for TensorFlow Hub, which is a public collection of pretrained AI models.
AI Masking: check
AI Depth estimation: check
AI Flow estimation: check
Flicker artifacts just like the public AI models: check
EDIT: Also, this isn't actually new anymore? A quick check found two very similar startups: unscreen.com and vfx.comixify.ai. And AI rotoscoping has been part of DaVinci Resolve 17 since February:
https://www.blackmagicdesign.com/products/davinciresolve/wha...
Creating a GUI for TensorFlow (or MediaPipe) is not a bad thing in itself, yes? For one thing, we need more After-Effects-like tools - particularly if they can be browser based and cheap/free to use.
The ML people don't make it easy to work with the output their models produce. I was playing with TensorFlow/MediaPipe last month, to see if I could get them to play nicely with my canvas library. The results were quite promising[1][2][3]. Still, I think making it easier for devs to use these ML models in various ways needs to be prioritized.
CodePen links (all request access to the device camera):
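For what it's worth, even the packaged models leave most of the glue work to you. A minimal sketch of the kind of post-processing involved, assuming MediaPipe's selfie segmentation solution (Python rather than the browser API, purely for brevity; the file names are made up):

```python
# Minimal sketch: get a segmentation mask out of MediaPipe and composite it
# yourself. Uses the (legacy) Python "solutions" API; file names are made up.
import cv2
import numpy as np
import mediapipe as mp

frame = cv2.imread("frame.png")        # hypothetical input frame
bg = cv2.imread("background.png")      # hypothetical replacement background
bg = cv2.resize(bg, (frame.shape[1], frame.shape[0]))

with mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=1) as seg:
    result = seg.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

# The model hands back a raw float mask; smoothing, thresholding and
# compositing are all left to you.
mask = cv2.GaussianBlur(result.segmentation_mask, (7, 7), 0)
alpha = np.clip(mask, 0, 1)[..., None]
out = (alpha * frame + (1 - alpha) * bg).astype(np.uint8)
cv2.imwrite("composited.png", out)
```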
Does this require you to upload your video assets to their servers over the internet? That seems extremely impractical for 4k or 8k footage. Even with gigabit upload speeds a clip could take several times its playback duration to transfer.
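A quick back-of-envelope check - the bitrates and the effective-throughput figure below are assumptions based on typical ProRes numbers, not anything Runway publishes:

```python
# Back-of-envelope: how long does uploading a clip take vs. its playback length?
# Bitrates are rough ProRes figures; the 70% effective-throughput number is an
# assumption, not a measurement.
CLIP_MBPS = {
    "UHD ProRes 422 HQ": 707,     # ~707 Mbit/s at 3840x2160, 29.97 fps
    "UHD ProRes 4444 XQ": 1900,   # rough figure
    "8K acquisition codec": 4000, # purely illustrative
}
upload_mbps = 1000 * 0.70         # "gigabit" upload at ~70% effective throughput

for name, mbps in CLIP_MBPS.items():
    print(f"{name}: upload takes ~{mbps / upload_mbps:.1f}x playback duration")
```

So even a best-case gigabit line is roughly real-time for 4K ProRes 422 HQ, and several times real-time for heavier formats.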
OTOH, if it's targeting (at least in part) the TikTok crowd or busy people throwing together a video slideshow for a wedding, you might be dealing with a situation where most of the source video is already in datacenters, whether FB, GPhotos, etc.
The TikTok crowd also doesn't need 4k or 8k video. Automatically downscaling the source material before uploading will be an acceptable tradeoff for the vast majority of users.
For gphotos at least, could I not just put them in an album that's shareable by link and then paste that link into Runway for it to ingest from? There'd still be an "upload" stage, of course, but not involving someone's home internet.
I believe the comment you replied to is talking about peering agreements, or some other way of coming to terms with the potentially huge amount of traffic this could generate.
This does seem like a 'bring the mountain to Muhammad' situation: upload and download many GB of video every time you want to use it, rather than downloading the program once.
People who do a lot of video editing usually already have decent PCs on which to do it.
My guess is that they don't want to run this on the user's computer (even though video editors have fairly beefy machines) because it would be easy to extract the ML models and use them, destroying their business model.
Exactly. The cloud is DRM. It’s really the only DRM that works; don’t give the user the code at all.
There’s also the data play of making the user keep data there and pay rent forever to keep it accessible. You can download your videos but you lose edit history etc.
I think hosting the software might also be a way to defuse the software patent minefield - by running the algorithms in a country which doesn't have software patents. This might (arguably) keep the embodiment of the invention in said jurisdiction. Or at least muddy the waters somewhat.
Once the clips are on their servers, editing is probably fine, as long as they're using some kind of edge compute for low latency. In fact that's great for people who don't want to buy a more powerful computer for video editing. But if that means you have to saturate your uplink for 24 hours before you even start editing then that's borderline unusable.
Commenting to say this was a beautiful, straight-to-the-point landing page. We see lots of landing pages with sections of copy that just don't look great.
I have no real idea if this is good, because the landing page has only some images, even though it's supposed to be about video. Show me some video, then! There's a slider gallery showing a series of still images of an editing interface -- why not show it in action?
Huh. Apparently Brave was blocking whatever they use, but there was no clue that anything was missing... it just degraded to still images and called it a day.
Those videos or GIFs or whatever they are work fine in Chrome and Safari, so it turned out that I was missing something. :)
It looks like normal <video> tags that attempt to load webm, then mp4, then fall back to a still image. All assets are loaded from their own webserver too (also my PiHole didn't block anything.)
Perhaps you have autoplay blocked? They are set to autoplay by the looks of things.
I do indeed have autoplay blocked, but I would expect controls to appear if something could be played. Maybe the weirdest part of this personal revelation is that it hasn't happened more often (that I know of).
It is good, but it took me a few minutes to realize it was talking about two different products. (And I did literally lol at the comically-large "Get Started Today →" button.)
Most SaaS landing pages sell products that are boring and non-visual. It's hard to get excited about a CRM or an IDE or a Jira clone or CI/CD pipeline tool.
It's conspicuously lacking testimonials. I personally hate testimonials and never have understood their appeal. I don't care what some person I've likely never heard of has to say about this product. I'd rather see what it does.
Relatedly, I've never understood why some books have 4 pages of testimonials in the front. Once you get past 5 of these, are you really more likely to buy the book?
This is weird, because some high-end, expensive things that are great usually lack testimonials, since their buyer demographics would never care about them anyway. However, it would be nice to have them for new ML services, because you don't know how good it could possibly be.
Many of these ML features exist in regular video editing/grading software like the excellent DaVinci Resolve - which is free for the non-studio version and lightning fast.
I can see this could be appealing for people who don't want to learn or install professional software; I think there is value in that. I'd like to see a client-side version of this app - many people have beefy gaming GPUs that could run the models client side.
I've done professional video editing and VFX work. This will not get used among professionals. This machine learning tech is better than what's built into After Effects but that's irrelevant because the time it takes me to fix a mistake my software made is far smaller than the time it takes to learn new software that has a far smaller scope.
You're not going to beat Fusion, Nuke, or even consumer tools like Hitfilm at the VFX game. A better use of this tech would be to take the area you improved in and turn it into a plugin for all or one of these.
And you know the tech geeks in VFX tried this type of thing concurrently with the research, and probably have mature in-house tools already. I know from my time at R&H that, back in 2005, there was an ML effort to support compositing and color timing TDs.
What is "impossible video"? As someone who doesn't know much about video editing, none of what they're showing looks "impossible" to me. I'm pretty sure I've seen videos where someone added clones of themselves, like the skateboarder sample.
The idea is a bit different. Filming clones usually requires keeping the camera "still" without moving it; then you re-film the same scene from the same position multiple times, so you can drop the backdrop very easily using very basic cutting or masking.
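For anyone who hasn't done it, the "old way" really is that mechanical - a minimal sketch, assuming two locked-off takes and a crude static split (the file names, frame rate, and split position are all made up):

```python
# Traditional locked-camera clone trick: two takes from an identical camera
# position, combined with a crude static mask. Works only because the camera
# never moves and the subjects never cross the split line.
import cv2
import numpy as np

take_a = cv2.VideoCapture("take_left.mp4")   # subject stays on the left
take_b = cv2.VideoCapture("take_right.mp4")  # subject stays on the right

w = int(take_a.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(take_a.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("clones.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 24, (w, h))

# Static mask: left half from take A, right half from take B.
mask = np.zeros((h, w, 1), dtype=np.uint8)
mask[:, : w // 2] = 1

while True:
    ok_a, fa = take_a.read()
    ok_b, fb = take_b.read()
    if not (ok_a and ok_b):
        break
    out.write(np.where(mask == 1, fa, fb))

out.release()
```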
This is different though. No cutting or masking, the actual A.I. itself is doing it automatically. No filming the same takes with the same camera position 3 times in a row then doing tedious editing that takes hours.
I have my doubts as to how well it would work... But assuming it did then it's a real game-changer in the editing world because it could not only save a ton of time, but actually produce new abilities, like cloning with object crossover, and while the camera is moving around erratically.
I think needing a super-fast internet connection is a huge barrier to adoption. Even in India - say they had their backend servers here - expecting everyone doing video to have a 1Gbps connection is not easy. [1]
Also, the demo tells me the target audience is not pro video folks (film/TV) but rather individuals. They might find it harder to keep spending so much on shuttling data. The power of ML in video has been clear to many, but we need these tools to work offline.
Are there great ML specific chips for PC/laptops? I mean like the ones that Apple keeps talking about? I am not in the domain so I don't know, but I guess GPUs are the best chips for this, is there any reason this software would not work on a beefy RTX 30X0 based device?
Update
1. I mention "even in India" because I keep seeing bandwidth pricing here is still relatively cheap, globally speaking. OK, not cheap by typical Indian household measures but if you are doing pro video then yeah, cheap infrastructure.
Why would a 1Gbps connection be required for streaming video? Netflix, YouTube, and Twitch seem to be able to service most of the world.
I haven’t actually read the article, so perhaps they say as much. But if so, that’s surprising.
EDIT: thinking a bit more carefully, the limitation is upload speed, not download speed. In that area, the US has been lagging behind in tech.
Still, I wouldn't bet against it being viable. I remember the first few months after YouTube launched. The video quality was atrocious, the worst on the internet by far - this isn't revisionism; other platforms tried to compete on quality.
Didn't matter; YouTube won. And now we enjoy 4K streaming.
From what I hear from my friends who edit video, pro video editing workflows can involve some pretty hefty bitrates beyond what 1gbit networking can handle.
I'm not sure I understand why that would be true. If all of the assets are on the server, why wouldn't the server send down a compressed view of the video?
It can do progressive sharpening too, so that e.g. if you pause the video you can see the full quality.
I agree that it's not ideal, but a lot of people would use it if the price was right. I would.
As far as the video assets being on the server, sure, it may suck to upload those. But people do (sometimes) upload HQ assets to youtube, so it's not unheard of.
If it takes multiple hours to upload footage, because upload speeds are generally horrible and not symmetric even on gigabit lines, I can definitely see that being a massive barrier.
I too think that's the most likely twist. Adobe is like the 1990s Microsoft of that space - big pool of committed users, very solid and mature technology stack, fought really hard for its current market share, extremely aggressive competitors, and deep pockets.
Is this any better than the current After Effects auto-roto feature? (Real question... is it?!) I thought they were already using ML models to do rotoscoping.
Does anyone have any resources on how you might make this yourself from scratch using machine learning? Obviously not this polished, but just a really rough-and-dodgy version to learn how it works.
Check out the YouTube channel Two Minute Papers if you haven't already. He covers a lot of stuff like this, and much of it comes with Google Colab demo scripts ready to run. Of course they're often poorly tested/maintained, and may take work to finish out into a tool for your specific use case, but you do at least get to touch the actual code and see how it works.
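If you want something even rougher to poke at right away, a single-frame person mask can be hacked together from a pretrained segmentation model. A sketch assuming torchvision's DeepLabV3 and the PASCAL VOC "person" class - nowhere near Runway's quality, but it shows the shape of the problem:

```python
# Rough-and-dodgy person mask from a pretrained model (torchvision DeepLabV3).
# Nothing here is Runway's actual approach; it's just a starting point.
import cv2
import numpy as np
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT").eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

frame = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2RGB)  # made-up file
with torch.no_grad():
    out = model(preprocess(frame).unsqueeze(0))["out"][0]

PERSON = 15  # "person" class index in the PASCAL VOC label set
mask = (out.argmax(0) == PERSON).numpy().astype(np.uint8) * 255
cv2.imwrite("mask.png", mask)  # run per frame for a (flickery) video matte
```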
Reply to self! In this thread the user `mkaic` linked to their own work/study at: https://nomoregreenscreen.com/ ... perfect "getting started" material.
... "all on the web", says their main page. I'm not sure that this a great selling point... web applications are easy to start with (no installation), but it's very hard to have the same reactivity of a local native application; plus, not everyone can have a 1 Gbps connection...
And as another commenter pointed out, it only works on Chrome and not any of the other browsers, so a lot of people are going to have to install something anyway.
The automated masking does look impressive, but you can do the same with DaVinci Resolve, which charges a one-off £250 for a lifetime license with free upgrades.
At $35/month Davinci works out cheaper in less than 10 months...
Is this stuff not possible with existing editing/compositing tools? Obviously the ease of use here is going to make it mainstream, but is ML changing the game in those other spaces yet?
If you don't mind a video that's a bit "Youtube-y", Corridor Crew had a great example of the situation here (including Runway!): https://www.youtube.com/watch?v=fmJ74774RO8. The short answer is that AI isn't mostly making new things possible, so much as it's making old things, like compositing, much MUCH faster. 10x so in some cases. Other videos have shown similar stuff getting gradually integrated into the larger production software.
Machine learning is also genuinely making new things possible. Two Minute Papers has a video here: https://www.youtube.com/watch?v=22Sojtv4gbg demonstrating a model that trains on driving footage, then adjusts video game footage to look almost indistinguishable from a real-world shot (and in real time!). This is technically a video game application, but it's not hard to imagine how this could be used in a video or movie context.
The extent to which this site/software actually captures these broader trends is probably minimal, but they're definitely out there. I think we're on the edge of another huge shift, like the one where CG became more practical and down-to-earth so it could be used anywhere. A lot of the same stuff will be done, but by a couple of people rather than an army of editors.
I forget where I heard it (might be a McLuhan thing), but any new medium/technology that comes out is initially used to produce the content of the old - so movies recreated plays, TV recreated radio, the internet recreated prior media forms - and it takes time for the medium to come into its own, for creators and practitioners to understand the new tool in its own light and to learn what they can do with it. I think ML is still in that space for creativity here - it's much better at doing the old stuff, but we're not quite sure yet what the new stuff it does will look like.
The auto-rotoscoping feature seems very useful if it works well. Auto-rotoscoping in After Effects can be extremely finicky and usually requires frame-by-frame touch ups. This would be very useful for people without green screens.
I just tried their software and it's... uhhh, no way this is better than AE. Unless (and it's very possible) I'm doing something wrong, this isn't fully baked software. I tried playing with it, it glitched out like crazy, and then eventually Chrome on my M1 MBP locked up; tried again, same deal. My degree is in digital imaging technology, so I was pretty curious to try it.
Edit: I just spent the last 25 minutes persisting with it, I'd say their auto tools are actually worse than AE. I guess it's a consumer product, but even then... seems like you'd be spending a lot of time for little reward, for example, it's not clear to me how I might refine this mask: https://share.getcloudapp.com/rRujBABR
Edit 2: I'm gonna be a little more generous and say I think the mask itself is pretty good - it can deal with some pretty complicated situations well - but the tooling is very rudimentary.
These foreground/background separations look like magic. However, they also look cherry-picked. The first skater boy is shot with a fixed camera and a figure strongly contrasted against the bg. The woman walking against a misty bg is also a soft target for separation. The figure removal in the car park would have been easy enough even ten years ago. The depth maps look more promising. They seem high enough quality for a decent fog pass.
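For the fog-pass idea, the blend itself is trivial once you have a depth map - a minimal sketch, assuming a normalized depth image where brighter means farther (file names, fog colour, and density are made up):

```python
# Minimal depth-based fog pass: blend the frame toward a fog colour by depth.
# Assumes "depth.png" is a normalized depth map (far = bright); values below
# are arbitrary placeholders.
import cv2
import numpy as np

frame = cv2.imread("frame.png").astype(np.float32)
depth = cv2.imread("depth.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

fog_colour = np.array([200.0, 200.0, 200.0], dtype=np.float32)  # neutral grey (BGR)
density = 1.5                                                    # arbitrary strength

# Exponential falloff: more fog for pixels that are further away.
weight = (1.0 - np.exp(-density * depth))[..., None]
fogged = (1.0 - weight) * frame + weight * fog_colour
cv2.imwrite("fogged.png", fogged.astype(np.uint8))
```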
Not sure about the cloud-based aspect. I am teaching video online right now, and bandwidth issues are making many students drop the course.
While this is impressive, the foundation of the tech was already there for quite some time.
Any modern mirrorless camera is doing the same analysis on the scene with object detection and tracking. So, they're just doing it on a video stream without phase/depth data.
Sony, Nikon, Canon, Panasonic and Fuji have similar technologies built in their cameras, but having this on your desktop for using with your videos is a nice step forward.
No. The IDEA and what they're proposing it can do is incredible, not the actual product. As far as I can tell it doesn't really work that well, and that's me being generous. It has a LONG way to go, and a massive amount of development before it's anywhere near ready for real production. In about a decade tools like this will be practical and ready using A.I. Right now we just aren't there yet.
Can you expand on what you think is not working well? From the couple of videos I tried, it worked extraordinarily well. Even though it's quite slow, the results are still very impressive.
Every time I tried to do anything useful, it crashed, though that could have to do with my high-bitrate 4K source footage. While attempting to use some tools I noticed the quality of the footage was degraded around the actual pixels where masks normally would be. I really just don't think tools like this are ready for mainstream professional use, but that's just my opinion. I still LOVE the idea though, and I hope this company keeps developing it; it could be great one day. I HATE working with web-based tools though - I wish they just produced a proper application.