
> ... very little that wasn't in VP10 made it into AV1.

I am not sure I would say that is true.

The entire entropy coder, used by every tool, came from Daala (with changes, made in collaboration with others, to reduce hardware complexity), as did some major tools like Chroma from Luma and the Constrained Directional Enhancement Filter (a merger of Daala's deringing filter and Thor's CLPF). There were plenty of other improvements from the Daala team as well: structural things like pulling the entropy coder and other inter-frame state from reference frames instead of from abstract "slots" as VP9 does (important in real-time contexts, where you can lose frames and not know which slots they would have updated), and better spatial prediction and coding for segment indices (important for block-level quantizer adjustments for better visual tuning). And that does not even touch on all of the contributions from other AOM members (scalable coding, the entire high-level syntax...).
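
To make the "slots" point concrete, here is a minimal sketch of the two designs. The types, names, and counts are invented for illustration; this is not the actual VP9/AV1 code, only the shape of the difference:

    /* Illustrative only -- what matters is which object owns the state. */
    typedef struct {
        unsigned short cdf[16]; /* stand-in for entropy-coder CDFs, etc. */
    } CodingState;

    /* VP9 style: frames refresh abstract context slots named by header
       flags. Lose a frame and you no longer know which slots it would
       have updated, so later frames that load from those slots are
       undecodable even though they arrived intact. */
    CodingState slot_state[4];

    /* AV1 style: the state is saved with each decoded reference frame,
       and a new frame inherits state from a reference it names
       explicitly. If you still have the reference, you still have the
       state that goes with it. */
    typedef struct {
        /* ...decoded pixels would live here... */
        CodingState state; /* saved alongside the frame itself */
    } RefFrame;
    RefFrame ref_pool[8];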

Were there other things I wish we could have gotten in? Absolutely. But "done" is a feature.




Some "didn't make it in" things that looked promising were the perceptual vector quantization[1], and a butterfly transform that Monty was working on, IIRC as an occasional spectator to the process.

[1] https://jmvalin.ca/daala/pvq_demo/
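
For the curious, the core of PVQ is a gain-shape decomposition: code the energy of a band of transform coefficients separately from its direction, then code the direction as an integer vector with a fixed L1 norm K (a point on a "pyramid"), with K derived from the coded gain. A rough sketch of the split; gain_shape_split is a made-up helper for illustration, not the Daala code:

    #include <math.h>

    /* Split x into a gain (the L2 norm) and a unit-norm shape vector.
       The gain and the shape are then quantized and coded separately.
       Illustrative only. */
    double gain_shape_split(const double *x, double *shape, int n) {
        double g = 0.0;
        for (int i = 0; i < n; i++) g += x[i] * x[i];
        g = sqrt(g);
        for (int i = 0; i < n; i++) shape[i] = g > 0.0 ? x[i] / g : 0.0;
        return g;
    }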


Dropping PVQ was a hard choice. We did an initial integration into libaom, but because libaom differs substantially from the way Daala was designed, the results were not outstanding [1]. Subsequent changes to the codebase made PVQ regress significantly from there, for reasons that were never entirely clear. When we sat down and detailed all of the work necessary for it to have a chance of being adopted, we concluded we would need to put the whole team on it for the entire remainder of the project. These were not straightforward engineering tasks, but open problems with no known solutions, and changes from other experiments being adopted could have complicated the picture further. So we would have had to drop everything else, and the risk that something would not work out and PVQ still would not have gotten in was very high.

The primary benefit of PVQ is the side-information-free activity masking. That is the sort of thing that cannot be judged via PSNR and requires careful subjective testing with human viewers. Not something you want to be rushing at the last minute. After gauging the rest of AOM's enthusiasm for the work, we decided instead to improve the existing segmentation coding to make it easier for encoders to do visual tuning after standardization. That was a much simpler task with much less risk, and it was adopted relatively easily. I still think it was the right call.

[1] https://datatracker.ietf.org/doc/html/draft-cho-netvc-applyp...
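
The "side-information-free" part falls out of how the gain is quantized: on a power-law (companded) scale, the step size grows with the gain itself, so busy, high-energy blocks automatically get coarser quantization, and the decoder derives the same step from the coded gain index, so no bits are spent signaling it. A sketch with invented parameters (alpha controls the masking strength; this is not the Daala code):

    #include <math.h>

    /* Compand the gain before uniform quantization; 0 < alpha < 1, so
       the effective step size in the gain domain grows like g^alpha. */
    int quantize_gain(double gain, double q0, double alpha) {
        return (int)lrint(pow(gain, 1.0 - alpha) / q0);
    }

    /* The decoder inverts the companding to recover the gain, and can
       derive everything else (e.g., the shape resolution K) from it. */
    double dequantize_gain(int idx, double q0, double alpha) {
        return pow(idx * q0, 1.0 / (1.0 - alpha));
    }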



