Xiph needs to start a public prototyping playground that extends unpatented video and image codec research to create islands of techniques that are not patentable. They don't have to work, just be ideas.
Like if I said, "I'd like to use cellular automata and voronoi segmentation to do multiscale texture extraction and motion representation."
Someone else could possibly do a 500-1000 line python program that implemented something like that.
Create a thousand of those ideas and drop them into the public domain (we need something like the GPL for ideas) so that there is a large body of techniques and work that is unpatentable.
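To make the shape of such a throwaway prototype concrete, here is a minimal Python sketch of just the Voronoi-segmentation half of that example (the seed counts, functions, and statistics are all made up for illustration, not anyone's actual method, and the cellular-automata part is omitted): label each pixel by its nearest seed, then summarize per-region texture and motion with crude statistics.

```python
# Minimal, illustrative sketch: Voronoi segmentation of a frame plus
# per-region texture/motion statistics. Not a real codec tool.
import numpy as np
from scipy.spatial import cKDTree

def voronoi_labels(height, width, seeds):
    """Label every pixel with the index of its nearest seed point."""
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.column_stack([ys.ravel(), xs.ravel()])
    _, labels = cKDTree(seeds).query(pixels)
    return labels.reshape(height, width)

def region_features(frame, prev_frame, labels, n_regions):
    """Crude per-region texture (variance) and motion (mean abs diff) stats."""
    diff = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32))
    feats = []
    for r in range(n_regions):
        mask = labels == r
        feats.append((frame[mask].var(), diff[mask].mean()))
    return feats

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w, n = 120, 160, 32
    seeds = rng.uniform([0, 0], [h, w], size=(n, 2))
    prev = rng.integers(0, 256, (h, w), dtype=np.uint8)
    cur = np.roll(prev, 2, axis=1)          # fake "motion": shift right by 2
    labels = voronoi_labels(h, w, seeds)
    print(region_features(cur, prev, labels, n)[:3])
```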
> "I'd like to use cellular automata and voronoi segmentation to do multiscale texture extraction and motion representation."
Are you a mindreader? I've been thinking about using Adrian Secord's stippling work[0] in a motion picture context for years, but more for fun than anything serious.
I'm afraid the existing body of software patents dictates not placing such ideas in the public domain, but rather holding them in a kind of "patent left" foundation that can force cross-licensing access.
It would work for future patents, but I'm afraid it wouldn't work for future software -- the theory being that any non-trivial software will likely infringe on any number of patents. A collection of patents might help force cross-licensing, and so protect new (Free or not) software against existing patents (the assumption being that also holders of non-trivial software patents develop new software, and might infringe on one or more patents in the collection).
I think that's because everyone is free to extend the public domain via patents. So a researcher at Panasonic or Bosch could tweak something in the codec corpus and patent the tweak.
Monty links to his demo pages [1] at the end of the slides, with a good introduction to the codec if you are new to Daala (like I was.) Also, these demos are linked from the main xiph.org page as well [2]. There's more context and explanation there than in the slide deck ... I assume close to what you would get if you were actually at the talk. :)
> Building a new codec from scratch may cost less than licensing
Samsung is already participating in many open source projects: F2FS, Tizen, they are part of the Linux Foundation, and I believe they're even helping Mozilla with Servo. So why not try to get them to commit to adopting Daala in all of their devices as soon as it's stable and out, and perhaps even help with funding a bit? There might be other companies out there willing to do it, too. They need to reach out to them.
Something you may not know: Samsung had their own internal project to build a royalty-free codec, but they gave up because it was "too hard" (which for a large corporation usually means "costs too much"). The annual caps for H.264 are just low enough to discourage that kind of activity. The currently proposed caps for HEVC are much higher, and that may change the equation for some people.
There's no real guarantee that Daala won't be covered, at least partially, by some submarine patents. I don't think you can work on a years-long R&D effort and make that promise.
IMHO, you work on stuff with the assumption that if you're a big success, someone's gonna sue you, and plan for it accordingly.
In the meantime, HEVC is going to be widely deployed eventually. VP8 and VP9 are valid alternatives for some, and I see no reason not to support them while we wait years for a moonshot codec to save us. I have to wonder if there isn't some NIH fear that if VP8/9 get any degree of success it would make it that much harder to switch to Daala later.
That is, it's better to keep things in a "bad" state (H264/HEVC) because as the situation gets worse, it would be easier to justify Daala adoption later. Similar to how if you're waiting for Healthcare reform, and you want Single Payer, supporting a partial solution (e.g. health exchanges) might mitigate the worst pain, and make it harder to argue for Single Payer later.
> I have to wonder if there isn't some NIH fear that if VP8/9 get any degree of success it would make it that much harder to switch to Daala later.
As the person who
A) leads the Daala project, and
B) made the decision to ship VP9 (a conversation that went approximately like this: My Boss: "Should we support VP9 in Firefox?" Me: "Yes. Duh."), and
C) has been fighting hard to make VP8 Mandatory To Implement for WebRTC...
I can tell you that nothing would make me happier than to see VP8 and VP9 be wildly successful. Hell, I'd've been ecstatic if we'd successfully managed to get H.264 Baseline made RF (there was an effort to do so a couple of years ago: it failed by 2 votes). See also OpenH264.
I disagree. Most people need the choice first to see that change is possible. They don't go out of their way to make those choices. That's the harder alternative.
I think the people behind it know it might fail miserably, but they also believe it's a risk worth taking, and a worthy endeavor.
At the end there is a request for niche areas that Daala could target. Here's my crazy idea:
Mozilla is adding WebRTC into the browser, and I'm sure the basic case of video chat is being thought about. But another use case is screen sharing, and in particular sharing a web page. How much better/faster could a video encode be if you could feed it live information from the system that was drawing the page? E.g. knowing that nothing has changed without having to compare one picture to another, knowing that a certain area contains text, that another area contains a gradient, that another area is animated with a repeating animation, that the screen is being scrolled up/down at a certain speed, that the repeating background is composed of a specific repeating png, and so on.
No idea if that's a valid idea, but it's what popped into my head on reading the question.
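For concreteness, a hedged sketch of what such compositor-supplied hints might look like if handed to an encoder alongside each captured frame (every name and field here is invented for illustration; this is not any real WebRTC or encoder API):

```python
# Illustrative "hints from the system drawing the page" structure.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height

@dataclass
class FrameHints:
    dirty_rects: List[Rect] = field(default_factory=list)   # only these changed
    text_regions: List[Rect] = field(default_factory=list)  # keep edges sharp here
    scroll: Optional[Tuple[int, int]] = None                 # (dx, dy) of a scroll
    unchanged: bool = False                                   # nothing changed at all

def encode_frame(frame_bytes: bytes, hints: FrameHints) -> bytes:
    """Toy dispatcher: a real encoder could skip work based on the hints."""
    if hints.unchanged:
        return b""            # emit a "repeat previous frame" marker instead
    if hints.scroll is not None:
        pass                  # could seed motion search with the scroll vector
    return frame_bytes        # placeholder for actual encoding

# e.g. a browser compositor reporting a 3-pixel upward scroll:
print(len(encode_frame(b"\x00" * 100, FrameHints(scroll=(0, -3)))))
```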
With proper SIMD optimizations, the analysis to determine "nothing has changed" is so ridiculously fast that even direct XDamage output (or comparable things on other systems) has a hard time competing with it, and that data is not really in the format an encoder wants anyway.
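For a sense of how little work that check is, here is a vectorized numpy sketch of per-block change detection (numpy standing in for the hand-written SIMD a real encoder would use; the block size and frame dimensions are arbitrary):

```python
# Vectorized per-16x16-block change detection between two frames.
import numpy as np

def changed_blocks(prev, cur, block=16):
    """Return a boolean map of which 16x16 blocks differ between two frames."""
    h, w = prev.shape
    diff = prev != cur
    # Reshape into (rows, block, cols, block) tiles and test each tile.
    tiles = diff[: h - h % block, : w - w % block]
    tiles = tiles.reshape(h // block, block, w // block, block)
    return tiles.any(axis=(1, 3))

prev = np.zeros((720, 1280), dtype=np.uint8)
cur = prev.copy()
cur[100:110, 200:210] = 255              # dirty a small area
print(changed_blocks(prev, cur).sum())   # -> 2 blocks flagged
```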
Not saying there's no gains here, but people have proposed this idea before, and then given up on it after actually sitting down to implement it. It's also mostly an encoder optimization, and thus doesn't have much influence on the standard.
What's more interesting is adding special tools to the bitstream to represent things like text, which do not compress well with typical block transforms. This is certainly something we've spent some time thinking about, but there's no code committed for it yet.
Did anything else come out of this summit? I was trying to google info on it the other day and it was so invisible on the web I was beginning to doubt my recollection of the date that it was scheduled for.
It perhaps didn't help that someone just released a new gun model called VP9 which was filling up all the recent google results.
I'm only 20% through, but I thought I'd comment on the first section of "Don'ts". It's basically saying that Google's strategy with VP8/VP9 sucked. I don't agree. They took on a massive task and have had some small successes. It could have gone much worse. It reminds me a bit of those business books that look at firms that succeeded and cargo-cult everything they did or didn't do. But many of the factors that hold you back are random contingencies.
The big stumbling blocks for open web codecs are Microsoft/IE (on the desktop) and Apple/Safari (on mobile). VP8/VP9 has failed miserably on this score. But Opus, the amazing new audio codec developed by Xiph/Mozilla/IETF/etc. in the manner suggested by the "Do's", is also notably absent from iOS and IE (the latter of which could be considered particularly galling since it was co-developed with Microsoft subsidiary Skype).
Not that I think multiple approaches aren't a good thing, and you've got to sell what you're doing. It just seems a bit negative when Google has VP8 shipping on Android (and Android seems like the only mobile OS likely to ship Opus any time soon too).
The thing is MS and Apple won't negotiate on any of this. They have never been involved with open standards and respectively pushed WMV and MOV formats instead.
You have to move forward under the presumption that neither will ever cooperate in a macroscopic sense, especially after Google caved and gave up making youtube webm based.
You still have to make the best solutions, explain why they are the best, and hope everyone else will switch so that inevitably the dinosaurs do too. Because you can't reason with a T. rex.
> especially after Google caved and gave up making youtube webm based
They didn't remove h264 from Chrome, but when I watch videos on youtube, right click -> stats for nerds it lists the video as 'video/webm; codecs="vp9"'. That includes videos with ads, which didn't use to be the case.
Any idea when this happened? I only noticed a few days ago and there doesn't seem to be any official announcements or even blog posts about the switch.
I think they also used to prompt you to install Flash if you didn't have it, which they didn't seem to do earlier this week; it just played adverts and content via HTML5/H.264 (in IE11) and HTML5/VP8 (in Firefox).
After this the next step is probably to start delivering HTML5 in preference to Flash, where possible. Wonder what the plan is for that transition?
No, I hadn't noticed it until testing it for that comment (actually I didn't notice an ad started playing for the first video I clicked on and even the ad itself was webm), though I do feel like I've gotten almost all html5 videos for a while now (I have click to play on for plugins, so the ones that are ready to go stand out).
> After this the next step is probably to start delivering HTML5 in preference to Flash, where possible. Wonder what the plan is for that transition?
Notably, if you go to the html5 youtube page (http://www.youtube.com/html5) in incognito mode in Chrome, it actually says "The HTML5 player is currently used when possible", so I assume this is the default now. That isn't the case in Firefox, where the option to request the html5 player to be the default is still there. Maybe it's a vp9 thing? I don't believe Firefox has included vp9 yet (and the player says: 'codecs="vp8.0, vorbis"')
Ah that's interesting. I actually opted out of HTML5 to see what would happen, but only in Firefox.
Firefox does have VP9, but it doesn't yet have the MSE (Media Source Extension) support that YouTube requires to deliver VP9, though a semi-functional version is available behind a preference in the nightly builds.
That slide was just noting factors that Daala would have to beware of or overcome to succeed. While those certainly apply to VP8/VP9, Theora had many of the same problems.
VP8 was always on tenuous ground both politically and technologically. VP9 is not in a better position, either. Xiph doesn't carry political baggage -- just a reputation for building solid codecs.
Hopefully Xiph's re-thinking of codec design will pay dividends. The fact that Daala's design is coming from left-field is a real risk for adoption, but is nevertheless very welcome -- there has been only incremental progress since research into wavelet codecs in the '90s, and every single DCT-based codec in use today has been stuck in a local optimum since MPEG-1.
Well, being DCT-based isn't that important, but they all still use block-based transforms. It doesn't help that all wavelet codecs have failed, because video researchers only test with PSNR and not with human subjects.
H.264 doesn't use the DCT, but still uses the same size macroblocks, which is what made it unsuitable for very high-res video. HEVC uses up to 32x32 sized transforms.
H.264's transform is an integer DCT approximation, optimized to use only small multiplies and shifts. Some may say it's far enough off that it ought not be called a DCT anymore, though at least after correcting for scaling I wouldn't agree. In any case, it shares the DCT's advantages and disadvantages.
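A quick numpy check of that point, comparing the well-known 4x4 integer core transform used by H.264 against the true 4-point DCT-II basis (in the real codec the remaining scaling difference is folded into quantization):

```python
import numpy as np

# H.264's 4x4 core transform: entries are only +/-1 and +/-2, so it needs
# nothing but adds and shifts.
H = np.array([[1,  1,  1,  1],
              [2,  1, -1, -2],
              [1, -1, -1,  1],
              [1, -2,  2, -1]], dtype=float)

# True orthonormal 4-point DCT-II basis: D[k, n] = c_k * cos(pi*(2n+1)*k/8).
k, n = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
c = np.full(4, np.sqrt(2 / 4)); c[0] = np.sqrt(1 / 4)
D = c[:, None] * np.cos(np.pi * (2 * n + 1) * k / 8)

# Compare the direction of each integer-transform row to the DCT basis vector.
Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
for row in range(4):
    print(row, round(np.dot(Hn[row], D[row]), 4))  # rows 0,2: 1.0; rows 1,3: ~0.9975
```

So the even basis vectors match the DCT exactly and the odd ones are within a fraction of a percent, which is why calling it "a DCT" after correcting for scaling is defensible.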
It's a hard problem making something unique enough to avoid current patents while still being a nice fit for a hardware pipeline. Things like overlapping transforms may end up creating new minimum values for the vertical context needed.
New ITU/MPEG standards typically have large involvement from hardware companies.
HW decoders will (typically) not send the deblocking barrier to external memory. Instead you buffer the required rows until you have decoded the next line of macroblocks. As long as the filter margins are the same this works fine in itself, but you will not have done the buffering at the same place you did deblocking or any of the simpler overlap filters. In that way you force larger changes to your architecture. Similarly, your transform unit is likely not set up to handle this, so it requires redesign. Running into too many of these hurdles can be the death of adoption for the codec.
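A back-of-the-envelope sketch of that line-buffer cost, with illustrative numbers only (the actual margins depend on the filter designs, which I'm not claiming to know):

```python
# On-chip line-buffer bytes needed to hold the rows a vertical filter still
# needs when decoding proceeds one macroblock row at a time.
def line_buffer_bytes(frame_width, filter_rows, bytes_per_pixel=1.5):
    # 1.5 bytes/pixel roughly corresponds to 8-bit 4:2:0 (luma + subsampled chroma).
    return int(frame_width * filter_rows * bytes_per_pixel)

# e.g. a 3840-wide frame: 4 buffered rows (a deblocking-style margin,
# illustrative) vs. 16 rows (a larger overlapped-transform support, illustrative).
print(line_buffer_bytes(3840, 4))    # 23040 bytes
print(line_buffer_bytes(3840, 16))   # 92160 bytes
```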
Too bad they're not making this plain open source. They should copyleft all the IP in this, which would foster innovation on the algorithm instead of just basing it on customer needs. Open algorithms are much easier to extend.
Read the fourth bullet: "Use a patent license that encourages adoption and discourages defection."
It sounds like they are planning on a copyleft-style defensive patent licensing system.
Right now, one of the big reasons that open codecs like VP8 and VP9 don't see wider adoption is that there's a lot of FUD thrown around by people saying "they might infringe H.264/HEVC patents, it's too risky to use them without a patent licensing pool."
By filing patents of their own, and offering them under a free (libre) license that has a clause causing it to be revoked if you engage in some other related patent litigation, this adds pressure against anyone trying to use their patents against someone using Daala.
Not sure how well this would work out in practice; I know that some people have gotten patents under these kinds of defensive open patent licenses before, but I don't know if they've ever actually been exercised defensively.
So you're saying that in practice copyleft, and hence open source, doesn't really work; you have to patent and then license for free to protect your users. How does this impact innovation based on the ideas you have patented?
We've talked about making a GPLv3 release to at least open up the patents we control to all copyleft software. But there are important details to work through, and like most FLOSS projects we have more things to do than people to do them, so it hasn't happened yet. If someone needed it to happen it should be a pretty easy conversation to have.
Keep in mind also, our goal is generally to stay out of court. See above about having too much to do.