path: root/libavcodec/vp9block.c
Commit message  Author  Age
* avcodec/thread: Move ff_thread_(await|report)_progress to new header  Andreas Rheinhardt  2022-02-09
|   This is in preparation for further commits that will stop using ThreadFrame for frame-threaded codecs that don't use ff_thread_(await|report)_progress(); the API for those codecs having inter-frame dependencies will live in threadframe.h.
|
|   Reviewed-by: Anton Khirnov <anton@khirnov.net>
|   Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
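The await/report pattern named above lets a thread decoding one frame block until the rows it needs from a reference frame have been decoded by another thread. The sketch below illustrates that pattern with plain POSIX threads. It is not FFmpeg's ThreadFrame code, and `Progress`, `progress_report()`, `progress_await()`, `decode_reference()` and `REF_ROWS` are hypothetical names chosen for illustration.

```c
#include <pthread.h>

/* Hypothetical per-frame progress tracker (not FFmpeg's ThreadFrame):
 * the thread decoding a reference frame publishes how many rows are done,
 * and a thread decoding a later frame blocks until the rows it needs exist. */
typedef struct Progress {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             rows_done;  /* last fully decoded row, -1 = none yet */
} Progress;

static void progress_init(Progress *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cond, NULL);
    p->rows_done = -1;
}

static void progress_destroy(Progress *p)
{
    pthread_cond_destroy(&p->cond);
    pthread_mutex_destroy(&p->lock);
}

/* "Report" side: rows 0..row are finished; wake every waiting thread. */
static void progress_report(Progress *p, int row)
{
    pthread_mutex_lock(&p->lock);
    if (row > p->rows_done) {
        p->rows_done = row;
        pthread_cond_broadcast(&p->cond);
    }
    pthread_mutex_unlock(&p->lock);
}

/* "Await" side: sleep until rows 0..row are finished. The while loop
 * guards against spurious wakeups. */
static void progress_await(Progress *p, int row)
{
    pthread_mutex_lock(&p->lock);
    while (p->rows_done < row)
        pthread_cond_wait(&p->cond, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

/* Demo producer (hypothetical): pretends to decode an 8-row reference frame. */
#define REF_ROWS 8
static Progress ref_progress;

static void *decode_reference(void *arg)
{
    (void)arg;
    for (int row = 0; row < REF_ROWS; row++)
        progress_report(&ref_progress, row);
    return NULL;
}
```

A consumer would call `progress_init(&ref_progress)`, start `decode_reference` with `pthread_create()`, and call `progress_await(&ref_progress, 5)` before reading rows 0..5. Broadcasting rather than signalling matters when several later frames wait on the same reference.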
* Remove/replace some unnecessary avcodec.h inclusions  Andreas Rheinhardt  2021-07-22
|   Also remove other unnecessary headers and include headers directly while at it.
|
|   Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* vp9dec: support exporting QP tables through the AVVideoEncParams API  Anton Khirnov  2020-05-12
|
* Merge commit 'fd9212f2edfe9b107c3c08ba2df5fd2cba5ab9e3'  James Almer  2017-09-26
|\    * commit 'fd9212f2edfe9b107c3c08ba2df5fd2cba5ab9e3': Mark some arrays that never change as const.
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * Mark some arrays that never change as const.  Anton Khirnov  2017-02-01
| |
| * aarch64: vp9: Add NEON optimizations of VP9 MC functions  Martin Storsjö  2016-11-10
| |   This work is sponsored by, and copyright, Google.
| |
| |   These are ported from the ARM version; it is essentially a 1:1 port with no extra added features, but with some hand tuning (especially for the plain copy/avg functions). The ARM version isn't very register starved to begin with, so there's not much to be gained from having more spare registers here - we only avoid having to clobber callee-saved registers.
| |
| |   Examples of runtimes vs the 32 bit version, on a Cortex A53:
| |                                        ARM  AArch64
| |   vp9_avg4_neon:                      27.2     23.7
| |   vp9_avg8_neon:                      56.5     54.7
| |   vp9_avg16_neon:                    169.9    167.4
| |   vp9_avg32_neon:                    585.8    585.2
| |   vp9_avg64_neon:                   2460.3   2294.7
| |   vp9_avg_8tap_smooth_4h_neon:       132.7    125.2
| |   vp9_avg_8tap_smooth_4hv_neon:      478.8    442.0
| |   vp9_avg_8tap_smooth_4v_neon:       126.0     93.7
| |   vp9_avg_8tap_smooth_8h_neon:       241.7    234.2
| |   vp9_avg_8tap_smooth_8hv_neon:      690.9    646.5
| |   vp9_avg_8tap_smooth_8v_neon:       245.0    205.5
| |   vp9_avg_8tap_smooth_64h_neon:    11273.2  11280.1
| |   vp9_avg_8tap_smooth_64hv_neon:   22980.6  22184.1
| |   vp9_avg_8tap_smooth_64v_neon:    11549.7  10781.1
| |   vp9_put4_neon:                      18.0     17.2
| |   vp9_put8_neon:                      40.2     37.7
| |   vp9_put16_neon:                     97.4     99.5
| |   vp9_put32_neon/armv8:              346.0    307.4
| |   vp9_put64_neon/armv8:             1319.0   1107.5
| |   vp9_put_8tap_smooth_4h_neon:       126.7    118.2
| |   vp9_put_8tap_smooth_4hv_neon:      465.7    434.0
| |   vp9_put_8tap_smooth_4v_neon:       113.0     86.5
| |   vp9_put_8tap_smooth_8h_neon:       229.7    221.6
| |   vp9_put_8tap_smooth_8hv_neon:      658.9    621.3
| |   vp9_put_8tap_smooth_8v_neon:       215.0    187.5
| |   vp9_put_8tap_smooth_64h_neon:    10636.7  10627.8
| |   vp9_put_8tap_smooth_64hv_neon:   21076.8  21026.9
| |   vp9_put_8tap_smooth_64v_neon:     9635.0   9632.4
| |
| |   These are generally about as fast as the corresponding ARM routines on the same CPU (at least on the A53), in most cases marginally faster.
| |
| |   The speedup vs C code is pretty much the same as for the 32 bit case; on the A53 it's around 6-13x for the larger 8tap filters. The exact speedup varies a little, since the C versions generally don't end up exactly as slow/fast as on 32 bit.
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| * arm: vp9: Add NEON optimizations of VP9 MC functions  Martin Storsjö  2016-11-03
| |   This work is sponsored by, and copyright, Google.
| |
| |   The filter coefficients are signed values, where the product of the multiplication with one individual filter coefficient doesn't overflow a 16 bit signed value (the largest filter coefficient is 127). But when the products are accumulated, the resulting sum can overflow the 16 bit signed range. Instead of accumulating in 32 bit, we accumulate the largest product (either index 3 or 4) last with a saturated addition.
| |
| |   (The VP8 MC asm does something similar, but slightly simpler, by accumulating each half of the filter separately. In the VP9 MC filters, each half of the filter can also overflow though, so the largest component has to be handled individually.)
| |
| |   Examples of relative speedup compared to the C version, from checkasm:
| |   Cortex                              A7     A8     A9    A53
| |   vp9_avg4_neon:                    1.71   1.15   1.42   1.49
| |   vp9_avg8_neon:                    2.51   3.63   3.14   2.58
| |   vp9_avg16_neon:                   2.95   6.76   3.01   2.84
| |   vp9_avg32_neon:                   3.29   6.64   2.85   3.00
| |   vp9_avg64_neon:                   3.47   6.67   3.14   2.80
| |   vp9_avg_8tap_smooth_4h_neon:      3.22   4.73   2.76   4.67
| |   vp9_avg_8tap_smooth_4hv_neon:     3.67   4.76   3.28   4.71
| |   vp9_avg_8tap_smooth_4v_neon:      5.52   7.60   4.60   6.31
| |   vp9_avg_8tap_smooth_8h_neon:      6.22   9.04   5.12   9.32
| |   vp9_avg_8tap_smooth_8hv_neon:     6.38   8.21   5.72   8.17
| |   vp9_avg_8tap_smooth_8v_neon:      9.22  12.66   8.15  11.10
| |   vp9_avg_8tap_smooth_64h_neon:     7.02  10.23   5.54  11.58
| |   vp9_avg_8tap_smooth_64hv_neon:    6.76   9.46   5.93   9.40
| |   vp9_avg_8tap_smooth_64v_neon:    10.76  14.13   9.46  13.37
| |   vp9_put4_neon:                    1.11   1.47   1.00   1.21
| |   vp9_put8_neon:                    1.23   2.17   1.94   1.48
| |   vp9_put16_neon:                   1.63   4.02   1.73   1.97
| |   vp9_put32_neon:                   1.56   4.92   2.00   1.96
| |   vp9_put64_neon:                   2.10   5.28   2.03   2.35
| |   vp9_put_8tap_smooth_4h_neon:      3.11   4.35   2.63   4.35
| |   vp9_put_8tap_smooth_4hv_neon:     3.67   4.69   3.25   4.71
| |   vp9_put_8tap_smooth_4v_neon:      5.45   7.27   4.49   6.52
| |   vp9_put_8tap_smooth_8h_neon:      5.97   8.18   4.81   8.56
| |   vp9_put_8tap_smooth_8hv_neon:     6.39   7.90   5.64   8.15
| |   vp9_put_8tap_smooth_8v_neon:      9.03  11.84   8.07  11.51
| |   vp9_put_8tap_smooth_64h_neon:     6.78   9.48   4.88  10.89
| |   vp9_put_8tap_smooth_64hv_neon:    6.99   8.87   5.94   9.56
| |   vp9_put_8tap_smooth_64v_neon:    10.69  13.30   9.43  14.34
| |
| |   For the larger 8tap filters, the speedup vs C code is around 5-14x. This is significantly faster than libvpx's implementation of the same functions, at least when comparing the put_8tap_smooth_64 functions (compared to vpx_convolve8_horiz_neon and vpx_convolve8_vert_neon from libvpx).
| |
| |   Absolute runtimes from checkasm:
| |   Cortex                                  A7       A8       A9      A53
| |   vp9_put_8tap_smooth_64h_neon:      20150.3  14489.4  19733.6  10863.7
| |   libvpx vpx_convolve8_horiz_neon:   52623.3  19736.4  21907.7  25027.7
| |   vp9_put_8tap_smooth_64v_neon:      14455.0  12303.9  13746.4   9628.9
| |   libvpx vpx_convolve8_vert_neon:    42090.0  17706.2  17659.9  16941.2
| |
| |   Thus, on the A9, the horizontal filter is only marginally faster than libvpx, while our version is significantly faster on the other cores, and the vertical filter is significantly faster on all cores. The difference is especially large on the A7. The libvpx implementation does the accumulation in 32 bit, which probably explains most of the differences.
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
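The accumulation order described in the arm commit above can be modelled in scalar C. This is a sketch, not the NEON code: `filter8_16bit()` mimics a 16-bit accumulator (plain adds for seven taps, a saturating add for the largest tap), and `filter8_32bit()` is a 32-bit reference. Both round with `(sum + 64) >> 7` and clip to 8 bits, assuming 7-bit filter precision. The coefficients in the test, {-1, 6, -19, 78, 78, -19, 6, -1}, are an illustrative kernel summing to 128, not quoted from VP9's tables. The scheme relies on the seven smaller products never leaving the int16 range, which holds for that kernel with 8-bit input (the partial sum stays within [-10200, 22950]).

```c
#include <stdint.h>

/* Saturating signed 16-bit add, the scalar equivalent of NEON vqadd.s16. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Reference: accumulate all 8 products in 32 bits, round by 7 bits. */
static uint8_t filter8_32bit(const uint8_t *src, const int8_t *f)
{
    int32_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += src[i] * f[i];
    return clip_u8((sum + 64) >> 7);
}

/* 16-bit model: each product fits in int16 (255 * 127 = 32385). The seven
 * smaller products are summed with plain adds; the product for the largest
 * coefficient (index 3 or 4) is added last with saturation. If the true sum
 * leaves the int16 range, the saturated value still clips to the same pixel. */
static uint8_t filter8_16bit(const uint8_t *src, const int8_t *f)
{
    int big = f[4] > f[3] ? 4 : 3;
    int16_t acc = 0;
    for (int i = 0; i < 8; i++) {
        if (i == big)
            continue;
        acc = (int16_t)(acc + src[i] * f[i]);  /* assumed to stay in range */
    }
    acc = sat_add16(acc, (int16_t)(src[big] * f[big]));
    return clip_u8((acc + 64) >> 7);
}
```

With input {0, 255, 0, 255, 255, 0, 255, 0} the true sum is 42840, beyond the int16 range; the 16-bit model saturates to 32767, which rounds to 256 and clips to 255, the same pixel the 32-bit reference produces.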
| * vp9: Flip the order of arguments in MC functions  Martin Storsjö  2016-11-03
| |   This makes it match the pattern already used for VP8 MC functions.
| |
| |   This also makes the signature match ffmpeg's version of these functions, easing porting of code in both directions.
| |
| |   Signed-off-by: Martin Storsjö <martin@martin.st>
| * vp9: ignore reference segmentation map if error_resilience flag is set.  Ronald S. Bultje  2016-10-04
| |   Fixes ffvp9_fails_where_libvpx.succeeds.webm.
| |
| |   Bug-Id: ffmpeg/3849.
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * vp9: add frame threading  Ronald S. Bultje  2016-08-11
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * vp9: allocate 'b', 'block/uvblock' and 'eob/uveob' dynamically.  Ronald S. Bultje  2016-08-11
| |   This will be needed for frame threading.
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * vp9: split last/cur_frame from the reference buffers.  Ronald S. Bultje  2016-08-11
| |   We need more information from last/cur_frame than from reference buffers, so we can use a simplified structure for reference buffers, and then store mvs and segmentation map information in last/cur. This prepares the decoder for frame threading support.
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * Remove unnecessary get_bits.h #includes  Diego Biurrun  2016-06-07
| |
| * vp9: Use the correct upper bound for seg_id  Luca Barbato  2014-11-21
| |   And use a macro to make it apparent why that value is used.
| |
| |   Bug-Id: CID 1108595
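The log message does not show the code itself. As a hypothetical illustration of the pattern it describes, a named constant documents where a bound comes from: VP9 allows at most 8 segments, so a segment ID must lie in 0..7. `MAX_SEGMENTS`, `seg_qdelta` and `segment_qdelta()` are names made up for this sketch, and the delta values are arbitrary.

```c
#include <stdint.h>

/* VP9 supports at most 8 segments, so segment IDs are 0..7 (3 bits). */
#define MAX_SEGMENTS 8

/* Hypothetical per-segment quantizer deltas (arbitrary illustrative values). */
static const int8_t seg_qdelta[MAX_SEGMENTS] = { 0, -4, -8, 4, 8, 12, 16, 20 };

/* The bounds check is written against the macro instead of a bare 7 or 8,
 * so both the limit and its origin are visible where the array is indexed. */
static int segment_qdelta(int seg_id)
{
    if (seg_id < 0 || seg_id >= MAX_SEGMENTS)
        return 0;  /* out-of-range ID: apply no adjustment */
    return seg_qdelta[seg_id];
}
```

Sizing the table with the same macro keeps the array length and the check from drifting apart if one of them is edited.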
| * vp9: drop support for real (non-emulated) edges  Anton Khirnov  2014-01-09
| |   They are not measurably faster on x86; they might be somewhat faster on other platforms due to missing emu edge SIMD, but the gain is not large enough to justify the added complexity.
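"Emulated" edges means copying the pixels a motion-compensated block needs into a scratch buffer, replicating the nearest border pixel wherever the reference block extends past the frame, instead of allocating frames with real padded borders. FFmpeg provides an `emulated_edge_mc` routine in its videodsp code for this; the function below is a hypothetical, unoptimized scalar sketch of the same idea, not that implementation.

```c
#include <stddef.h>
#include <stdint.h>

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Hypothetical helper: copy a bw x bh block whose top-left corner is (x, y)
 * in a w x h plane into dst. Coordinates outside the plane are clamped, so
 * out-of-frame pixels repeat the nearest edge pixel. */
static void emulate_edge(uint8_t *dst, ptrdiff_t dst_stride,
                         const uint8_t *src, ptrdiff_t src_stride,
                         int bw, int bh, int x, int y, int w, int h)
{
    for (int j = 0; j < bh; j++) {
        const uint8_t *row = src + clampi(y + j, 0, h - 1) * src_stride;
        for (int i = 0; i < bw; i++)
            dst[j * dst_stride + i] = row[clampi(x + i, 0, w - 1)];
    }
}
```

Motion compensation then reads from the scratch buffer as if the frame had a border. The trade-off weighed in the commit is an extra copy for edge blocks only versus padding every frame and maintaining a second code path.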
| * lavc: VP9 decoder  Ronald S. Bultje  2013-11-15
| |   Originally written by Ronald S. Bultje <rsbultje@gmail.com> and Clément Bœsch <u@pkh.me>
| |
| |   Further contributions by:
| |   Anton Khirnov <anton@khirnov.net>
| |   Diego Biurrun <diego@biurrun.de>
| |   Luca Barbato <lu_zero@gentoo.org>
| |   Martin Storsjö <martin@martin.st>
| |
| |   Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
* avcodec/vp9: Add tile threading support  Ilia Valiakhmetov  2017-09-08
|   Signed-off-by: Ilia Valiakhmetov <zakne0ne@gmail.com>
|   Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* avcodec/vp9block: fix runtime error: signed integer overflow: 196675 * 20670 cannot be represented in type 'int'  Michael Niedermayer  2017-05-21
|   Fixes: 1710/clusterfuzz-testcase-minimized-4837032931098624
|
|   Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
|   Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
|   Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
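The numbers in the error message show why the product overflows: 196675 * 20670 = 4,065,272,250, while INT_MAX on platforms with a 32-bit `int` is 2,147,483,647. The entry does not say how the code was changed. One common fix is to widen an operand to 64 bits before multiplying, shown here in a hypothetical helper `mul_wide()`.

```c
#include <stdint.h>

/* Hypothetical helper: widen one operand first so the multiplication is done
 * in 64 bits. For 32-bit operands |a * b| <= 2^62, which always fits in
 * int64_t, whereas an overflowing int * int is undefined behaviour in C. */
static int64_t mul_wide(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}
```

Casting only the result, as in `(int64_t)(a * b)`, would not help: the multiplication would still happen in `int` and overflow before the conversion.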
* vp9: split out reconstruction functions in their own source file.  Ronald S. Bultje  2017-03-28
|
* vp9: re-split the decoder/format/dsp interface header files.  Ronald S. Bultje  2017-03-28
|   The advantage here is that the internal software decoder interface is not exposed to the DSP functions or the hardware accelerations.
* lavc/vp9: consistent use of typedef instead of struct  Clément Bœsch  2017-03-27
|
* lavc/vp9: misc cosmetics  Clément Bœsch  2017-03-27
|   Imported from Libav
* lavc/vp9: rename ctx to avctx  Clément Bœsch  2017-03-27
|   This reduces the diff with Libav. It also prevents potential confusion between the private context and the AVCodecContext.
* lavc/vp9: split into vp9{block,data,mvs}  Clément Bœsch  2017-03-27
    This follows the Libav layout to ease merges.