| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Performance improvements:
quant_bands:
with: 681 decicycles in quant_bands, 8388453 runs, 155 skips
without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips
Around 42% for the function
Twoloop coder:
abs_pow34:
with/without: 7.82s/8.17s
Around 4% for the entire encoder
Both:
with/without: 7.15s/8.17s
Around 12% for the entire encoder
Fast coder:
abs_pow34:
with/without: 3.40s/3.77s
Around 10% for the entire encoder
Both:
with/without: 3.02s/3.77s
Around 20% faster for the entire encoder
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
| |
Fixes trac 5579
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Acked-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
| |
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
|
| |
Clean some header includes and use the same naming scheme as
in ttaencdsp
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|\
| |
| |
| |
| |
| |
| | |
* commit '9df889a5f116c1ee78c2f239e0ba599c492431aa':
h264: rename h264.[ch] to h264dec.[ch]
Merged-by: Clément Bœsch <u@pkh.me>
|
| |
| |
| |
| | |
This is more consistent with the naming of other decoders.
|
| |
| |
| |
| |
| | |
Each takes about 0.1% of runtime in my profiles, and they didn't have
any SIMD yet so far (we only had simd for npx=16 double-block versions).
|
| |
| |
| |
| |
| | |
Each takes about 0.5% of runtime in my profiles, and they didn't have
any SIMD yet so far (we only had simd for npx=16 double-block versions).
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
About 1.8x speedup compared to AVX version for full IDCT. Other
sub-IDCT scenarios also see speedups. Full --bench output for
idct_32x32_add_{bpp}_${subidct}_${opt} (50k cycles):
nop: 16.5
vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4
vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0
vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4
vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1
vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2
vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8
vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2
vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9
vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5
vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2
vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1
vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1
vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7
vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7
vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1
vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4
vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8
vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5
vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0
vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4
vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7
vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7
vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4
vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7
vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5
vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6
vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6
vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9
vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6
vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0
|
| |
| |
| |
| |
| | |
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| | |
That SIMD is still x86_64 only for now.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
|
| |
| |
| |
| | |
Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
|
| |
| |
| |
| |
| |
| | |
Currently unused, to be used in the following commits.
Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows
that it's about 1.65x as fast as the AVX version for the full IDCT, and
similar speedups for the sub-IDCTs:
nop: 24.6
vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'f1a9eee41c4b5ea35db9ff0088ce4e6f1e187f2c':
x86: Add missing movsxd for the int stride parameter
Merged-by: Clément Bœsch <u@pkh.me>
|
| |
| |
| |
| | |
Signed-off-by: Martin Storsjö <martin@martin.st>
|
| |
| |
| |
| |
| | |
These warnings conflict with system macros on Solaris, producing
truckloads of warnings about macro redefinition.
|
| |
| |
| |
| |
| |
| | |
About 10% faster.
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| | |
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| | |
See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'dc40a70c5755bccfb1a1349639943e1f408bea50':
Drop unnecessary libavutil/x86/asm.h #includes
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
| | |
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'a6a750c7ef240b72ce01e9653343a0ddf247d196':
tests: Move all test programs to a subdirectory
Merged-by: Clément Bœsch <clement@stupeflix.com>
|
| | |
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb':
cosmetics: Fix spelling mistakes
Merged-by: Clément Bœsch <u@pkh.me>
|
| |
| |
| |
| | |
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
| |
| |
| |
| |
| |
| | |
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
|
| | |
|
| |
| |
| |
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The unique user so far is wmalossless 24bits. The few samples tested show an
order of 8, so more unrolling or an avx2 version do not make sense.
Timings: 68 -> 49 cycles
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '73ff983e8dd22ccee166403d0bbbc9c1cd543622':
fft: x86: cosmetics: Drop silly comments, add comment, whitespace
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
| | |
|
| |
| |
| |
| |
| | |
Some optimized functions reference optimized symbols, so the functions
must be explicitly disabled when those symbols are unavailable.
|
| |
| |
| |
| |
| |
| | |
x86 optimizations are used only for the cases they support (<=65536 samples)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
| |
| |
| |
| |
| |
| | |
Asked-for-by: durandal_1707
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
| | |
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c':
build: Add vc1dsp component for more fine-grained dependencies
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
| | |
|
| |
| |
| |
| |
| | |
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'e280fe13291e9c712a5f4aa13b5263f3e8afed45':
v210: Use separate sample_factors
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
| |
| |
| |
| |
| |
| |
| | |
The 10bit and the 8bit functions can now be implemented to process
a different amount of samples.
And while at it simplify a little the code.
|
| |
| |
| |
| |
| |
| | |
Around 25% faster than the ssse3 version.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
| |
| |
| |
| |
| |
| |
| | |
Around 35% faster than the avx version.
Signed-off-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'eafb05fcf37cd19a910ca3b17824384f9006bc0a':
v210: x86: Add the correct guards around the asm code
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
| |
| |
| |
| | |
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they get can confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
Currently only implemented for ELF.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|