Commit message | Author | Age
* cabac: x86: Give optimizations header a more meaningful name | Diego Biurrun | 2016-12-01
* aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 | Martin Storsjö | 2016-11-30
    This work is sponsored by, and copyright, Google.

    Previously all subpartitions except the eob=1 (DC) case ran with
    the same runtime:

    vp9_inv_dct_dct_16x16_sub16_add_neon:   1373.2
    vp9_inv_dct_dct_32x32_sub32_add_neon:   8089.0

    By skipping individual 8x16 or 8x32 pixel slices in the first pass,
    we reduce the runtime of these functions like this:

    vp9_inv_dct_dct_16x16_sub1_add_neon:     235.3
    vp9_inv_dct_dct_16x16_sub2_add_neon:    1036.7
    vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
    vp9_inv_dct_dct_16x16_sub8_add_neon:    1036.7
    vp9_inv_dct_dct_16x16_sub12_add_neon:   1372.1
    vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.1
    vp9_inv_dct_dct_32x32_sub1_add_neon:     555.1
    vp9_inv_dct_dct_32x32_sub2_add_neon:    5190.2
    vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
    vp9_inv_dct_dct_32x32_sub8_add_neon:    5183.1
    vp9_inv_dct_dct_32x32_sub12_add_neon:   6161.5
    vp9_inv_dct_dct_32x32_sub16_add_neon:   6155.5
    vp9_inv_dct_dct_32x32_sub20_add_neon:   7136.3
    vp9_inv_dct_dct_32x32_sub24_add_neon:   7128.4
    vp9_inv_dct_dct_32x32_sub28_add_neon:   8098.9
    vp9_inv_dct_dct_32x32_sub32_add_neon:   8098.8

    I.e. in general a very minor overhead for the full subpartition case
    due to the additional cmps, but a significant speedup for the cases
    when we only need to process a small part of the actual input data.

    Signed-off-by: Martin Storsjö <martin@martin.st>
* arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 | Martin Storsjö | 2016-11-30
    This work is sponsored by, and copyright, Google.

    Previously all subpartitions except the eob=1 (DC) case ran with
    the same runtime:

                                         Cortex    A7       A8       A9      A53
    vp9_inv_dct_dct_16x16_sub16_add_neon:      3188.1   2435.4   2499.0   1969.0
    vp9_inv_dct_dct_32x32_sub32_add_neon:     18531.7  16582.3  14207.6  12000.3

    By skipping individual 4x16 or 4x32 pixel slices in the first pass,
    we reduce the runtime of these functions like this:

    vp9_inv_dct_dct_16x16_sub1_add_neon:        274.6    189.5    211.7    235.8
    vp9_inv_dct_dct_16x16_sub2_add_neon:       2064.0   1534.8   1719.4   1248.7
    vp9_inv_dct_dct_16x16_sub4_add_neon:       2135.0   1477.2   1736.3   1249.5
    vp9_inv_dct_dct_16x16_sub8_add_neon:       2446.7   1828.7   1993.6   1494.7
    vp9_inv_dct_dct_16x16_sub12_add_neon:      2832.4   2118.3   2266.5   1735.1
    vp9_inv_dct_dct_16x16_sub16_add_neon:      3211.7   2475.3   2523.5   1983.1
    vp9_inv_dct_dct_32x32_sub1_add_neon:        756.2    456.7    862.0    553.9
    vp9_inv_dct_dct_32x32_sub2_add_neon:      10682.2   8190.4   8539.2   6762.5
    vp9_inv_dct_dct_32x32_sub4_add_neon:      10813.5   8014.9   8518.3   6762.8
    vp9_inv_dct_dct_32x32_sub8_add_neon:      11859.6   9313.0   9347.4   7514.5
    vp9_inv_dct_dct_32x32_sub12_add_neon:     12946.6  10752.4  10192.2   8280.2
    vp9_inv_dct_dct_32x32_sub16_add_neon:     14074.6  11946.5  11001.4   9008.6
    vp9_inv_dct_dct_32x32_sub20_add_neon:     15269.9  13662.7  11816.1   9762.6
    vp9_inv_dct_dct_32x32_sub24_add_neon:     16327.9  14940.1  12626.7  10516.0
    vp9_inv_dct_dct_32x32_sub28_add_neon:     17462.7  15776.1  13446.2  11264.7
    vp9_inv_dct_dct_32x32_sub32_add_neon:     18575.5  17157.0  14249.3  12015.1

    I.e. in general a very minor overhead for the full subpartition case
    due to the additional loads and cmps, but a significant speedup for
    the cases when we only need to process a small part of the actual
    input data.

    In common VP9 content in a few inspected clips, 70-90% of the
    non-dc-only 16x16 and 32x32 IDCTs only have nonzero coefficients in
    the upper left 8x8 or 16x16 subpartitions respectively.

    Signed-off-by: Martin Storsjö <martin@martin.st>
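The slice-skipping idea in the two commits above can be sketched in scalar C. This is only an illustration under stated assumptions, not the NEON assembly from the commits: `transform_1d` is a stand-in linear transform (a running sum), not the VP9 IDCT, and the names `transform_2d`, `nonzero_cols`, `BLK` and `SLICE` are hypothetical. The sketch assumes the caller already knows how many leading columns can hold nonzero coefficients. Because the transform is linear, an all-zero slice produces all-zero first-pass output, so that work can be replaced by zero-filling.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { BLK = 16, SLICE = 4 }; /* block size and first-pass slice width (assumed) */

/* Stand-in 1-D linear transform (running sum); NOT the real VP9 IDCT. */
static void transform_1d(const int32_t *in, int in_stride,
                         int32_t *out, int out_stride)
{
    int32_t acc = 0;
    for (int i = 0; i < BLK; i++) {
        acc += in[i * in_stride];
        out[i * out_stride] = acc;
    }
}

/* Two-pass separable transform that skips empty column slices in pass one.
 * nonzero_cols is a hypothetical hint: every column at or beyond it must be
 * entirely zero. */
static void transform_2d(const int32_t *coef, int nonzero_cols, int32_t *out)
{
    int32_t tmp[BLK * BLK];

    assert(nonzero_cols >= 0 && nonzero_cols <= BLK);

    /* Skipped slices keep these zeros, which is exactly what a linear
     * transform would have produced for all-zero input. */
    memset(tmp, 0, sizeof(tmp));

    /* First pass: column transforms, one slice of SLICE columns at a time. */
    for (int x0 = 0; x0 < nonzero_cols; x0 += SLICE)
        for (int x = x0; x < x0 + SLICE; x++)
            transform_1d(coef + x, BLK, tmp + x, BLK);

    /* Second pass: row transforms always cover the full block. */
    for (int y = 0; y < BLK; y++)
        transform_1d(tmp + y * BLK, 1, out + y * BLK, 1);
}
```

Calling `transform_2d(coef, 4, out)` on a block whose coefficients are confined to the first four columns gives the same result as `transform_2d(coef, BLK, out)`, while running a quarter of the first-pass transforms.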
* arm: vp9itxfm: Only reload the idct coeffs for the iadst_idct combination | Martin Storsjö | 2016-11-30
    This avoids reloading them if they haven't been clobbered, if the
    first pass also was idct.

    This is similar to what was done in the aarch64 version.

    Signed-off-by: Martin Storsjö <martin@martin.st>
* vp9dsp: add DC only versions for idct/idct. | Clément Bœsch | 2016-11-30
    before:
    time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null -
    real    0m11.125s
    user    0m11.059s
    sys     0m0.050s
    time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null -
    real    0m10.944s
    user    0m10.819s
    sys     0m0.064s

    after:
    time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null -
    real    0m8.153s
    user    0m8.034s
    sys     0m0.050s
    time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null -
    real    0m8.038s
    user    0m7.980s
    sys     0m0.039s

    Signed-off-by: Martin Storsjö <martin@martin.st>
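To illustrate why a DC-only path is cheap: when only the DC coefficient is nonzero, the inverse transform reduces to adding one constant to every pixel of the block. The following is a minimal sketch under assumptions, not the code from this commit. The helper names `clip_u8` and `dc_only_add` are hypothetical, and the scaling constant 11585 (roughly cos(pi/4) in Q14) and the final rounding shift are assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical DC-only add for an n x n block of 8-bit pixels.
 * The constant 11585 and the shift parameter are assumptions. */
static void dc_only_add(uint8_t *dst, ptrdiff_t stride, int dc, int n, int shift)
{
    assert(n > 0 && shift > 0);

    /* Scale once per 1-D pass, then apply the final rounding shift.
     * For negative dc this relies on arithmetic right shift, which is
     * implementation-defined in C but universal on common compilers. */
    int t = (dc * 11585 + (1 << 13)) >> 14;
    t = (t * 11585 + (1 << 13)) >> 14;
    t = (t + (1 << (shift - 1))) >> shift;

    /* Every pixel receives the same offset; no per-row transform is run. */
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            dst[y * stride + x] = clip_u8(dst[y * stride + x] + t);
}
```

For example, `dc_only_add(block, 16, 64, 16, 6)` adds 1 to each pixel of a 16x16 block and saturates at 255.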
* hevc: Eliminate pointless variable indirection | Diego Biurrun | 2016-11-30
* hevc: Drop pointless av_unused attribute | Diego Biurrun | 2016-11-30
* metasound: Drop unused tables | Diego Biurrun | 2016-11-30
* configure: Integrate X11 checks into vaapi/vdpau checks | Diego Biurrun | 2016-11-29
* configure: Do not add newlines in filter()/filter_out() functions | Diego Biurrun | 2016-11-29
* configure: Move hardware-accelerated codec deps out of hwaccel section | Diego Biurrun | 2016-11-29
* configure: MMAL-related decoders should depend on, not select, mmal | Diego Biurrun | 2016-11-29
* mjpegdec: Check return values of functions that may fail | Diego Biurrun | 2016-11-29
* dxva2: Adjust printf length modifiers where appropriate | Diego Biurrun | 2016-11-29
* avisynth: Cast to the right type when loading avisynth library functions | Diego Biurrun | 2016-11-29
    Fixes a number of related warnings.
* lavc: move decoding-related code from utils.c to a new file | Anton Khirnov | 2016-11-29
* lavc: move encoding-related code from utils.c to a new file | Anton Khirnov | 2016-11-29
* aac_adtstoasc_bsf: validate and forward extradata if the stream is already ASC | James Almer | 2016-11-29
    Fixes AAC AudioSpecificConfig passthrough.

    Signed-off-by: James Almer <jamrial@gmail.com>
    Signed-off-by: Anton Khirnov <anton@khirnov.net>
* mss2: only use error correction for matching block counts | Andreas Cadhalpun | 2016-11-29
    This fixes a heap-buffer-overflow in ff_er_frame_end when decoding
    mss2 with coded_width/coded_height larger than width/height.

    Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
    Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* avconv: Fix the audio next dts computation | Luca Barbato | 2016-11-29
    Use the correct timebase.

    CC: libav-stable@libav.org
* ac3enc: Avoid unnecessary macro indirections | Diego Biurrun | 2016-11-28
* ac3enc: Reshuffle functions to avoid forward declarations | Diego Biurrun | 2016-11-28
* ac3enc: Reshuffle some float/fixed-mode ifdefs to avoid a dummy function | Diego Biurrun | 2016-11-28
* hwcontext_vaapi: Don't abort on failing to allocate from a fixed-size pool | Mark Thompson | 2016-11-26
* tta: avoid undefined shifts | Anton Khirnov | 2016-11-25
    Signed-off-by: Diego Biurrun <diego@biurrun.de>
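The commit message does not say how the shifts were fixed. As general background, left-shifting a negative signed value, or shifting by at least the width of the type, is undefined behaviour in C. One common way to avoid the first case is to shift in unsigned arithmetic, as in this sketch. The helper name `shl_wrap` is hypothetical and is not taken from the tta decoder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: left shift with well-defined wraparound.
 * The shift itself happens on an unsigned value, where overflow wraps
 * modulo 2^32. Converting the result back to int32_t is
 * implementation-defined for values above INT32_MAX, but yields the
 * two's complement result on common compilers. */
static int32_t shl_wrap(int32_t v, unsigned n)
{
    assert(n < 32); /* shifting by >= 32 would still be undefined */
    return (int32_t)((uint32_t)v << n);
}
```

With this helper, `shl_wrap(-1, 4)` evaluates to -16 on two's complement targets, where the plain expression `-1 << 4` would be undefined.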
* tta: use get_unary() instead of a custom implementation | Anton Khirnov | 2016-11-25
    Signed-off-by: Diego Biurrun <diego@biurrun.de>
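For readers unfamiliar with unary codes: a value n is stored as n one-bits followed by a terminating zero-bit. The sketch below shows the idea with a tiny standalone bit reader. It is not libav's bitstream API or its `get_unary()`; the `BitReader` type, `read_bit`, `read_unary`, and the MSB-first bit order are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal MSB-first bit reader. */
typedef struct BitReader {
    const uint8_t *buf;
    size_t size_bits; /* total number of readable bits */
    size_t pos;       /* index of the next bit to read */
} BitReader;

static int read_bit(BitReader *br)
{
    assert(br->pos < br->size_bits);
    int bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Count one-bits until a zero-bit is consumed, stopping early after
 * max_len ones or at the end of the buffer. */
static int read_unary(BitReader *br, int max_len)
{
    int n = 0;
    while (n < max_len && br->pos < br->size_bits && read_bit(br))
        n++;
    return n;
}
```

Reading the byte 0xE8 (bits 1110 1000) with this sketch yields the values 3, 1 and 0 in sequence.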
* build: Drop gcrypt support | Diego Biurrun | 2016-11-25
    GnuTLS in combination with gcrypt has been deprecated since 2010.
* configure: Use correct libm linker flag during math function checks | Diego Biurrun | 2016-11-25
* configure: Add missing asyncts filter, movie filter, and output example deps | Diego Biurrun | 2016-11-25
    Also add a missing avcodec.h #include in the movie filter.
* configure: Use correct variable name in libsnappy test | Diego Biurrun | 2016-11-25
* configure: Remove old avisynth support leftover | Diego Biurrun | 2016-11-25
* arm: warn/error on movrelx usage problematic with PIC on ELF | Janne Grunau | 2016-11-24
    The warning has false positives but our asm does not trigger it. For
    new code false positives can only be avoided by changing the register
    allocation.
* configure: Disable warning C4703 with MSVC | Diego Biurrun | 2016-11-24
    This disables warnings about potentially uninitialized local pointer
    variables. Disabling the warning is in line with what we do for gcc.
* w32pthreads: Fix function pointer casts | Diego Biurrun | 2016-11-24
    This eliminates a handful of warnings at every inclusion of the header.
* qt-faststart: Do not try to use fancy 64-bit seeking functions on mingw32ce | Martin Storsjö | 2016-11-24
    These functions are not available on mingw32ce.

    Signed-off-by: Diego Biurrun <diego@biurrun.de>
* rtmpdh: Do global initialization before running the test | Martin Storsjö | 2016-11-24
    The rtmpdh code can use crypto libraries which may require a process
    global init. (gcrypt is one of the libraries where the rtmpdh test
    code can fail if global init hasn't been done, depending on gcrypt
    version.)

    Signed-off-by: Martin Storsjö <martin@martin.st>
* aarch64: vp9itxfm: Don't repeatedly set x9 when nothing overwrites it | Martin Storsjö | 2016-11-24
    Signed-off-by: Martin Storsjö <martin@martin.st>
* rdt: Convert to the new bitstream reader | Alexandra Hájková | 2016-11-24
* ogg: Convert to the new bitstream reader | Alexandra Hájková | 2016-11-24
* mpegts: Convert to the new bitstream reader | Alexandra Hájková | 2016-11-24
* xsubdec: Convert to the new bitstream reader | Alexandra Hájková | 2016-11-24
* xan: Convert to the new bitstream reader | Alexandra Hájková | 2016-11-24
* wnv1: Convert to the new bitstream reader | Alexandra Hájková | 2016-11-24
* vima: Convert to the new bitstream reader | Alexandra Hájková | 2016-11-24
* vble: Convert to the new bitstream reader | Alexandra Hájková | 2016-11-24
* utvideodec: Convert to the new bitstream reader | Alexandra Hájková | 2016-11-24
* twinvq: Convert to the new bitstream reader | Alexandra Hájková | 2016-11-24
* tscc2: Convert to the new bitstream reader | Alexandra Hájková | 2016-11-24
* truespeech: Convert to the new bitstream reader | Alexandra Hájková | 2016-11-24
* tiertex: Convert to the new bitstream reader | Alexandra Hájková | 2016-11-24