path: root/libavcodec/x86/rv40dsp.asm
Commit message (Author, Age)
* avcodec/x86/rv40dsp_init: Remove obsolete MMX(EXT), 3dnow functions (Andreas Rheinhardt, 2022-06-22)
|   x64 always has MMX, MMXEXT, SSE and SSE2, which means that some MMX, MMXEXT and 3dnow functions are always overridden by other functions on x64 (unless one e.g. explicitly disables SSE2). Given that the only systems that benefit from these functions are truly ancient 32-bit x86s, they are removed.
|   Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
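Why the MMX versions become dead code on x64 can be seen in a minimal C sketch of a capability-ordered init, assuming (as the commit message implies) that function pointers are assigned in order of increasing CPU capability so that a later assignment overrides an earlier one. The flag names `CPU_MMX`, `CPU_MMXEXT`, `CPU_SSE2`, the `enum impl` values and `pick_weight_impl` are hypothetical illustrations, not FFmpeg's actual API.

```c
/* Hypothetical CPU feature bits, for illustration only. */
enum {
    CPU_MMX    = 1 << 0,
    CPU_MMXEXT = 1 << 1,
    CPU_SSE2   = 1 << 2,
};

/* Hypothetical identifiers for the available implementations. */
enum impl { IMPL_C, IMPL_MMX, IMPL_MMXEXT, IMPL_SSE2 };

/* Assign in order of increasing capability: each later match overrides the
 * previous choice, so the most capable supported version wins. */
static enum impl pick_weight_impl(int cpu_flags)
{
    enum impl choice = IMPL_C;
    if (cpu_flags & CPU_MMX)
        choice = IMPL_MMX;
    if (cpu_flags & CPU_MMXEXT)
        choice = IMPL_MMXEXT;
    if (cpu_flags & CPU_SSE2)
        choice = IMPL_SSE2;
    return choice;
}
```

Because every x86-64 CPU reports SSE2, the MMX and MMXEXT assignments are always overwritten there; they only survive on 32-bit CPUs without SSE2 (or when SSE2 is explicitly disabled).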
* Merge commit '6eef263aca281fb582e1fa3d841ac20ef747a252' (James Almer, 2017-10-12)
|\
| |   * commit '6eef263aca281fb582e1fa3d841ac20ef747a252':
| |     x86: Merge align directives into SECTION_RODATA declarations where possible
| |   Merged-by: James Almer <jamrial@gmail.com>
| * x86: Merge align directives into SECTION_RODATA declarations where possible (Diego Biurrun, 2017-03-05)
| |
* | Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' (Clément Bœsch, 2016-06-21)
|\|
| |   * commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb':
| |     cosmetics: Fix spelling mistakes
| |   Merged-by: Clément Bœsch <u@pkh.me>
| * cosmetics: Fix spelling mistakes (Vittorio Giovara, 2016-05-04)
| |   Signed-off-by: Diego Biurrun <diego@biurrun.de>
* | Merge commit '79793f833784121d574454af4871866576c0749d' (Michael Niedermayer, 2014-07-01)
|\|
| |   * commit '79793f833784121d574454af4871866576c0749d':
| |     Update Fiona's name in copyright statements.
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * Update Fiona's name in copyright statements. (Diego Biurrun, 2014-07-01)
| |
* | Merge commit '55519926ef855c671d084ccc151056de9e3d3a77' (Michael Niedermayer, 2014-03-14)
|\|
| |   * commit '55519926ef855c671d084ccc151056de9e3d3a77':
| |     x86: Make function prototype comments in assembly code consistent
| |   Conflicts:
| |     libavcodec/x86/sbrdsp.asm
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: Make function prototype comments in assembly code consistent (Diego Biurrun, 2014-03-13)
| |   This helps grepping for functions, among other things.
* | Merge commit 'e2b5b097898c9155f4bdff4d83cdc54d5eef6930' (Michael Niedermayer, 2013-11-05)
|\|
| |   * commit 'e2b5b097898c9155f4bdff4d83cdc54d5eef6930':
| |     x86: rv40dsp: Use PAVGB instruction macro where appropriate
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: rv40dsp: Use PAVGB instruction macro where appropriate (Diego Biurrun, 2013-11-04)
| |
* | Reinstate proper FFmpeg license for all files. (Thilo Borgmann, 2013-08-30)
|/
* x86: mmx2 ---> mmxext in asm constructs (Diego Biurrun, 2012-11-14)
|
* x86: yasm: Use complete source path for macro helper %includes (Diego Biurrun, 2012-10-31)
|   This is more consistent with the way we handle C #includes and it simplifies the build system.
* x86: include x86inc.asm in x86util.asm (Diego Biurrun, 2012-10-31)
|   This is necessary to allow refactoring some x86util macros with cpuflags.
* x86: use 32-bit source registers with movd instruction (Mans Rullgard, 2012-08-07)
|   yasm tolerates a mismatch between movd/movq and the source register size, adjusting the instruction according to the register. nasm is stricter.
|   Signed-off-by: Mans Rullgard <mans@mansr.com>
* x86: rv40: Mark rv40_weight functions as MMX2; they use MMX2 instructions. (Michael Kostylev, 2012-05-15)
|
* rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC (Christophe Gisquet, 2012-05-10)
|   Code mostly inspired by vp8's MC, however:
|   - its MMX2 horizontal filter is worse because it can't take advantage of the coefficient redundancy
|   - that same coefficient redundancy allows better code for non-SSSE3 versions
|
|   Benchmark (rounded to tens of units):
|               V8x8    H8x8   2D8x8  V16x16  H16x16 2D16x16
|   C            445     358     985    1785    1559    3280
|   MMX*         219     271     478     714     929    1443
|   SSE2         131     158     294     425     515     892
|   SSSE3        120     122     248     387     390     763
|
|   End result is an overall speedup of around 15% for the SSSE3 version (on 6 sequences); all loop filter functions now take around 55% of decoding time, while luma MC dsp functions take around 6%, chroma ones 1.3% and biweight around 2.3%.
|   Signed-off-by: Diego Biurrun <diego@biurrun.de>
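The "coefficient redundancy" mentioned in this commit can be illustrated with a symmetric 6-tap kernel: mirrored taps share a coefficient, so sample pairs can be added before multiplying, which halves the multiplies. The kernel (1, -5, 20, 20, -5, 1) with rounding `(sum + 16) >> 5` is an assumption chosen for illustration (the familiar H.264-style half-pel filter), and the function names are hypothetical; this is a sketch of the idea, not the code from the commit.

```c
#include <stdint.h>

/* Assumed symmetric kernel: taps (1, -5, 20, 20, -5, 1), summing to 32. */
static const int taps[6] = { 1, -5, 20, 20, -5, 1 };

/* Round by 32 and clamp to 0..255 without right-shifting a negative value. */
static uint8_t round_clip(int sum)
{
    int v = sum + 16;
    if (v < 0)
        return 0;
    v >>= 5;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Direct form: six multiplies per output pixel; p points at src[x - 2]. */
static uint8_t filter6_direct(const uint8_t *p)
{
    int sum = 0;
    for (int k = 0; k < 6; k++)
        sum += taps[k] * p[k];
    return round_clip(sum);
}

/* Paired form: mirrored taps share a coefficient, so add each sample pair
 * first and multiply once per pair (three multiplies). */
static uint8_t filter6_paired(const uint8_t *p)
{
    int sum = (p[0] + p[5]) - 5 * (p[1] + p[4]) + 20 * (p[2] + p[3]);
    return round_clip(sum);
}
```

Both forms compute the same integer sum, so their outputs are identical; in SIMD code the paired form saves multiply instructions on paths without a byte-wise multiply-add.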
* rv40dsp x86: use only one register, for both increment and loop counter (Christophe GISQUET, 2012-04-10)
|   Around 10 cycles faster for luma.
|   Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
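One common way to let a single register serve as both index and loop counter is to move the base pointers past the end of the block and run a negative offset up to zero, so the `add` that advances the offset also sets the flags for the loop branch. The C sketch below shows that idiom for a row-by-row block copy; `copy_block_neg_offset` is a hypothetical name, and this is not necessarily the exact change made in the commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a w x h block row by row. Instead of a separate row counter, both
 * base pointers are moved to the end of the block and one offset runs from
 * -h * stride up to 0: it addresses each row and also terminates the loop.
 * Both buffers must hold at least h * stride bytes. */
static void copy_block_neg_offset(uint8_t *dst, const uint8_t *src,
                                  ptrdiff_t stride, int w, int h)
{
    uint8_t *dst_end = dst + (ptrdiff_t)h * stride;
    const uint8_t *src_end = src + (ptrdiff_t)h * stride;

    for (ptrdiff_t off = -(ptrdiff_t)h * stride; off != 0; off += stride)
        memcpy(dst_end + off, src_end + off, (size_t)w);
}
```

In assembly the loop body then needs one `add off, stride` followed by a conditional jump on the result, rather than separate pointer increments plus a counter decrement.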
* rv40dsp: implement prescaled versions for biweight. (Christophe GISQUET, 2012-04-10)
|   Quite often, the original weights are multiples of 512. By prescaling them by 1/512 when they are computed (once per frame), no intermediate shifting is needed, and no prescaling on each call either. The x86 code already used that trick.
|   Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
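The prescaling trick rests on one exact identity: when a weight `w` equals `512 * k`, `(w * s) >> 9` equals `k * s`, so shifting every product can be replaced by shifting the weights once per frame. A minimal C sketch follows; the final rounding `(v + 16) >> 5` and all function names are assumptions for illustration, not taken from FFmpeg's rv40dsp.c.

```c
#include <stdint.h>

/* Assumed final rounding: (v + 16) >> 5, clamped to 8 bits. */
static uint8_t round_u8(unsigned v)
{
    v = (v + 16) >> 5;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* General path: full-scale weights, every product shifted right by 9 (/512). */
static uint8_t biweight_general(uint8_t s1, uint8_t s2, unsigned w1, unsigned w2)
{
    return round_u8(((w1 * s1) >> 9) + ((w2 * s2) >> 9));
}

/* Fast path: weights were already prescaled by 1/512, so no per-product shift. */
static uint8_t biweight_prescaled(uint8_t s1, uint8_t s2, unsigned k1, unsigned k2)
{
    return round_u8(k1 * s1 + k2 * s2);
}

/* Run once per frame: if both weights are multiples of 512, divide them by
 * 512 in place and return 1 (fast path usable); otherwise leave them as is. */
static int prescale_weights(unsigned *w1, unsigned *w2)
{
    if ((*w1 | *w2) & 511)
        return 0;
    *w1 >>= 9;
    *w2 >>= 9;
    return 1;
}
```

For example, weights 6144 and 10240 (12 * 512 and 20 * 512) prescale to 12 and 20, and both paths then give the same result for every pair of 8-bit samples.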
* rv40: x86 SIMD for biweight (Christophe Gisquet, 2012-01-30)
  Provide MMX, SSE2 and SSSE3 versions, with a fast path when the weights are multiples of 512 (which is often the case when the values round up nicely).
  *_TIMER report for the 16x16 and 8x8 cases (16x16 and 8x8 columns in decicycles):

  Version             16x16    runs skips    8x8    runs skips
  C                    9015  524257    31   2656  524271    17
  MMX                  4156  262090    54   1206  262131    13
  MMX, fast path       2760  524222    66    995  524252    36
  SSE2                 2163  262131    13    832  262137     7
  SSE2, fast path      1783  524276    12    711  524283     5
  SSSE3                2117  262136     8    814  262143     1
  SSSE3, fast path     1315  524285     3    578  524286     2

  This means a speedup of around 4% for some sequences.
  Signed-off-by: Diego Biurrun <diego@biurrun.de>