path: root/libavutil/x86
Commit message [Author, Age]
* x86/float_dsp: add ff_vector_fmul_reverse_avx2 [James Almer, 2017-04-11]
|   ~20% faster than AVX.
|
|   Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: add ff_vector_dmac_scalar_{sse2,avx,fma3} [James Almer, 2017-04-10]
|
* Merge commit '99434f4df81b6801b2b535d5b9143305595784f6' [Clément Bœsch, 2017-03-30]
|\
| |   * commit '99434f4df81b6801b2b535d5b9143305595784f6':
| |     float_dsp: Have implementation match function pointer prototype
| |
| |   Merged-by: Clément Bœsch <cboesch@gopro.com>
| * float_dsp: Have implementation match function pointer prototype [Diego Biurrun, 2016-11-03]
| |   libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 1 different from declaration
| |   libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 2 different from declaration
* | Merge commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8' [James Almer, 2017-03-23]
|\|
| |   * commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8':
| |     emms: Give apriv_emms_yasm() a more general name
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * emms: Give apriv_emms_yasm() a more general name [Diego Biurrun, 2016-10-18]
| |
* | Merge commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4' [James Almer, 2017-03-23]
|\|
| |   * commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4':
| |     x86: Add missing colons after assembly labels
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * x86: Add missing colons after assembly labels [Diego Biurrun, 2016-10-17]
| |   This fixes many warnings of the sort
| |
| |   warning: label alone on a line without a colon might be in error
* | avutil/x86util: don't use movss in VBROADCASTSS macro when src and dst args are the same [James Almer, 2017-03-21]
| |   Reviewed-by: Henrik Gramner <henrik@gramner.com>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | Merge commit '07e1f99a1bb41d1a615676140eefc85cf69fa793' [Clément Bœsch, 2017-03-20]
|\|
| |   * commit '07e1f99a1bb41d1a615676140eefc85cf69fa793':
| |     x86util: Document SBUTTERFLY macro
| |
| |   Merged-by: Clément Bœsch <u@pkh.me>
| * x86util: Document SBUTTERFLY macro [Alexandra Hájková, 2016-09-19]
| |   Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* | Merge commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5' [Clément Bœsch, 2017-03-20]
|\|
| |   * commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5':
| |     imgutils: add a function for copying image data from GPU mapped memory
| |
| |   Merged-by: Clément Bœsch <u@pkh.me>
| * imgutils: add a function for copying image data from GPU mapped memory [Anton Khirnov, 2016-08-31]
| |   See https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers
* | avcodec/h264: sse2, avx h luma mbaff deblock/loop filter [James Darnley, 2017-02-18]
| |   x86-64 only
| |
| |   Yorkfield:
| |   - sse2: ~2.17x (434 vs. 200 cycles)
| |
| |   Nehalem:
| |   - sse2: ~2.94x (409 vs. 139 cycles)
| |
| |   Skylake:
| |   - sse2: ~3.10x (370 vs. 119 cycles)
| |   - avx: ~3.29x (370 vs. 112 cycles)
* | x86util: import MOVHL macro [James Darnley, 2017-02-18]
| |   Originally committed to x264 in 1637239a by Henrik Gramner who has
| |   agreed to re-license it as LGPL. Original commit message follows.
| |
| |   x86: Avoid some bypass delays and false dependencies
| |
| |   A bypass delay of 1-3 clock cycles may occur on some CPUs when
| |   transitioning between int and float domains, so try to avoid that if
| |   possible.
* | avcodec/x86: deduplicate PASS8ROWS macro [James Darnley, 2017-02-18]
| |
* | Merge commit '8e9cd81d291b1010c625b2766058aadf4affb537' [James Almer, 2017-01-31]
|\|
| |   * commit '8e9cd81d291b1010c625b2766058aadf4affb537':
| |     x86: cpu: Detect Conroe CPUs and their slow shuffle unit
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * x86: cpu: Detect Conroe CPUs and their slow shuffle unit [Fiona Glaser, 2016-07-20]
| |
* | Merge commit '7d7355aa92bb36ca0765c49a569a999bcb96f332' [James Almer, 2017-01-31]
|\|
| |   * commit '7d7355aa92bb36ca0765c49a569a999bcb96f332':
| |     x86: Add SSSE3_SLOW CPU flag and related convenience macros
| |
| |   Merged-by: James Almer <jamrial@gmail.com>
| * x86: Add SSSE3_SLOW CPU flag and related convenience macros [Diego Biurrun, 2016-07-20]
| |
| * x86util: Extend SPLATW for avx2 [James Almer, 2016-07-18]
| |   Integration to Libav by Josh de Kock <josh@itanimul.li>.
| |
| |   Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
| * asm: FF_-prefix internal macros used in inline assembly [Diego Biurrun, 2016-05-28]
| |   These macros conflict with system macros on Solaris, producing
| |   truckloads of warnings about macro redefinition.
| * x86inc: Enable AVX emulation in additional cases [Anton Mitrofanov, 2016-05-16]
| |   Allows emulation to work when dst is equal to src2 as long as the
| |   instruction is commutative, e.g. `addps m0, m1, m0`.
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Improve handling of %ifid with multi-token parameters [Anton Mitrofanov, 2016-05-16]
| |   The yasm/nasm preprocessor only checks the first token, which means
| |   that parameters such as `dword [rax]` are treated as identifiers,
| |   which is generally not what we want.
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Fix AVX emulation of some instructions [Anton Mitrofanov, 2016-05-16]
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Fix AVX emulation of scalar float instructions [Henrik Gramner, 2016-05-16]
| |   Those instructions are not commutative since they only change the
| |   first element in the vector and leave the rest unmodified.
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
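The non-commutativity described in the commit above can be illustrated with a small C model. This is only a sketch: `vec4`, `add_scalar` and `add_packed` are hypothetical names invented for this illustration, not anything from x86inc, and the model assumes the usual scalar-add behaviour where lane 0 is computed and the remaining lanes are taken from the first source.

```c
/* Four-lane float vector, standing in for an XMM register. */
typedef struct { float v[4]; } vec4;

/* Scalar add model: only lane 0 is summed; lanes 1..3 are copied
 * unchanged from the FIRST source operand. */
static vec4 add_scalar(vec4 a, vec4 b)
{
    vec4 r = a;
    r.v[0] = a.v[0] + b.v[0];
    return r;
}

/* Packed add model: every lane is summed, so swapping a and b
 * gives the same result. */
static vec4 add_packed(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}
```

Swapping the sources of `add_scalar` changes lanes 1..3 of the result, which is why an emulation that reorders operands is safe for the packed form but not for the scalar one.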
* | x86inc: Avoid using eax/rax for storing the stack pointer [Henrik Gramner, 2017-01-09]
| |   When allocating stack space with an alignment requirement that is
| |   larger than the current stack alignment we need to store a copy of
| |   the original stack pointer in order to be able to restore it later.
| |
| |   If we choose to use another register for this purpose we should not
| |   pick eax/rax since it can be overwritten as a return value.
* | avutil/x86/emms: Document the emms_c() vs alloc/free relation. [Michael Niedermayer, 2016-10-23]
| |   Reviewed-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
| |   Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* | vp9: add 16x16 idct avx2 (8-bit). [Ronald S. Bultje, 2016-07-11]
| |   checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows
| |   that it's about 1.65x as fast as the AVX version for the full IDCT,
| |   and similar speedups for the sub-IDCTs:
| |
| |   nop: 24.6
| |
| |   vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
| |   vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
| |   vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
| |   vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
| |   vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
| |
| |   vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
| |   vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
| |   vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
| |   vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
| |   vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
| |
| |   vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
| |   vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
| |   vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
| |   vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
| |   vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
| |
| |   vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
| |   vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
| |   vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
| |   vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
| |   vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
| |
| |   vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
| |   vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
| |   vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
| |   vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
| |   vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
* | asm: FF_-prefix internal macros used in inline assembly [Matthieu Bouron, 2016-06-27]
| |   See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.
* | Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' [Clément Bœsch, 2016-06-21]
|\|
| |   * commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb':
| |     cosmetics: Fix spelling mistakes
| |
| |   Merged-by: Clément Bœsch <u@pkh.me>
| * cosmetics: Fix spelling mistakes [Vittorio Giovara, 2016-05-04]
| |   Signed-off-by: Diego Biurrun <diego@biurrun.de>
| * x86: Add ymm_reg struct [James Almer, 2016-01-28]
| |   Needed to declare 32-byte long constants
| |
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
| * x86inc: Add debug symbols indicating sizes of compiled functions [Geza Lore, 2016-01-23]
| |   Some debuggers/profilers use this metadata to determine which
| |   function a given instruction is in; without it they can get confused
| |   by local labels (if you haven't stripped those). On the other hand,
| |   some tools are still confused even with this metadata. e.g. this
| |   fixes `gdb`, but not `perf`.
| |
| |   Currently only implemented for ELF.
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Avoid creating unnecessary local labels [Henrik Gramner, 2016-01-23]
| |   The REP_RET workaround is only needed on old AMD cpus, and the labels
| |   clutter up the symbol table and confuse debugging/profiling tools, so
| |   use EQU to create SHN_ABS symbols instead of creating local labels.
| |   Furthermore, skip the workaround completely in functions that
| |   definitely won't run on such cpus.
| |
| |   Note that EQU is just creating a local label when using nasm instead
| |   of yasm. This is probably a bug, but at least it doesn't break
| |   anything.
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Simplify AUTO_REP_RET [Henrik Gramner, 2016-01-23]
| |   cpuflags is never undefined any more, it's set to 0 instead.
| |
| |   Also fix an incorrect comment.
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Use more consistent indentation [Henrik Gramner, 2016-01-23]
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Preserve arguments when allocating stack space [Henrik Gramner, 2016-01-23]
| |   When allocating stack space with a larger alignment than the known
| |   stack alignment a temporary register is used for storing the stack
| |   pointer. Ensure that this isn't one of the registers used for passing
| |   arguments.
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Improve FMA instruction handling [Henrik Gramner, 2016-01-23]
| |   * Correctly handle FMA instructions with memory operands.
| |   * Print a warning if FMA instructions are used without the correct
| |     cpuflag.
| |   * Simplify the instantiation code.
| |   * Clarify documentation.
| |
| |   Only the last operand in FMA3 instructions can be a memory operand.
| |   When converting FMA4 instructions to FMA3 instructions we can utilize
| |   the fact that multiply is a commutative operation and reorder
| |   operands if necessary to ensure that a memory operand is used only as
| |   the last operand.
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
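The operand reordering described in the commit above can be modelled numerically in C. This is a hedged sketch, not the x86inc macro: `fma3_eval`, `fma3_choice` and `pick_fma3` are hypothetical names, and the example only covers the case where the destination aliases the first multiplicand (dst = dst * b + c). It assumes the standard meaning of the 132/213/231 suffixes, where the digits name which operands are multiplied and which is added.

```c
#include <stdbool.h>

/* The three FMA3 operand orderings. */
enum fma3_form { FMA3_132, FMA3_213, FMA3_231 };

/* Reference arithmetic for each form. op1 is both destination and
 * source; in the real encoding only op3 may be a memory operand. */
static double fma3_eval(enum fma3_form form, double op1, double op2, double op3)
{
    switch (form) {
    case FMA3_132: return op1 * op3 + op2;
    case FMA3_213: return op2 * op1 + op3;
    default:       return op2 * op3 + op1; /* FMA3_231 */
    }
}

/* Chosen form plus the values placed in the op2 and op3 slots. */
struct fma3_choice {
    enum fma3_form form;
    double op2, op3;
};

/* For dst = dst * b + c: if the multiplicand b lives in memory, use the
 * 132 form so b lands in the last slot; otherwise use the 213 form,
 * whose last slot holds the addend c. */
static struct fma3_choice pick_fma3(double b, bool b_is_mem, double c)
{
    struct fma3_choice ch;
    if (b_is_mem) {
        ch.form = FMA3_132; ch.op2 = c; ch.op3 = b;
    } else {
        ch.form = FMA3_213; ch.op2 = b; ch.op3 = c;
    }
    return ch;
}
```

Both choices compute the same value because the product does not depend on the order of its factors; only the slot that holds the memory operand changes.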
| * x86inc: Be more verbose in assertion failures [Henrik Gramner, 2016-01-23]
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Make cpuflag() and notcpuflag() return 0 or 1 [Henrik Gramner, 2016-01-23]
| |   Makes it possible to use them in arithmetic expressions.
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
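A minimal C sketch of why a strict 0-or-1 result matters for arithmetic use. It is an illustration under stated assumptions, not the x86inc preprocessor macros: the flag bits, the C functions `cpuflag`/`notcpuflag` and the helper `vector_bytes` are all hypothetical.

```c
/* Hypothetical flag bits, for illustration only. */
#define FLAG_SSE2 0x1u
#define FLAG_AVX  0x2u

/* Returns exactly 1 when every bit in mask is set in flags, else 0.
 * A raw (flags & mask) would return the mask value instead, which is
 * unusable as a multiplier. */
static int cpuflag(unsigned flags, unsigned mask)
{
    return (flags & mask) == mask;
}

static int notcpuflag(unsigned flags, unsigned mask)
{
    return !cpuflag(flags, mask);
}

/* Arithmetic use: 16 bytes per vector, plus 16 more when AVX is set. */
static int vector_bytes(unsigned flags)
{
    return 16 + 16 * cpuflag(flags, FLAG_AVX);
}
```

With a normalised result, `cpuflag(...) + notcpuflag(...)` is always 1, so the two can be combined freely in expressions.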
| * x86inc: Various minor backports from x264 [Henrik Gramner, 2015-08-13]
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Drop SECTION_TEXT macro [Henrik Gramner, 2015-08-11]
| |   The .text section is already 16-byte aligned by default on all
| |   supported platforms so `SECTION_TEXT` isn't any different from
| |   `SECTION .text`.
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Disable vpbroadcastq workaround in newer yasm versions [Henrik Gramner, 2015-08-11]
| |   The bug was fixed in 1.3.0, so only perform the workaround in earlier
| |   versions.
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Fix instantiation of YMM registers [Christophe Gisquet, 2015-08-11]
| |   Signed-off-by: Henrik Gramner <henrik@gramner.com>
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: warn when instructions incompatible with current cpuflags are used [Anton Mitrofanov, 2015-08-11]
| |   Signed-off-by: Henrik Gramner <henrik@gramner.com>
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Support arbitrary stack alignments [Henrik Gramner, 2015-08-11]
| |   Change ALLOC_STACK to always align the stack before allocating stack
| |   space for consistency. Previously alignment would occur either before
| |   or after allocating stack space depending on whether manual alignment
| |   was required or not.
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
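The "align first, then allocate" order from the commit above can be sketched with plain pointer arithmetic in C. This is an assumption-laden model, not the ALLOC_STACK macro: `stack_frame` and `alloc_stack` are hypothetical names, the stack is assumed to grow downwards, the alignment is assumed to be a power of two, and the size is padded up to a multiple of the alignment so the result stays aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result: the new stack pointer plus the original one,
 * which must be kept somewhere so it can be restored later. */
struct stack_frame {
    uintptr_t sp;
    uintptr_t saved_sp;
};

static struct stack_frame alloc_stack(uintptr_t sp, uintptr_t size,
                                      uintptr_t alignment)
{
    struct stack_frame f;

    /* Only power-of-two alignments work with the mask trick below. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    f.saved_sp = sp;

    /* Step 1: align by rounding the stack pointer down. */
    sp &= ~(alignment - 1);

    /* Step 2: allocate, with size rounded up to a multiple of the
     * alignment so that sp remains aligned afterwards. */
    sp -= (size + alignment - 1) & ~(alignment - 1);

    f.sp = sp;
    return f;
}
```

For example, `alloc_stack(0x1008, 40, 32)` rounds the pointer down to `0x1000`, pads the 40-byte request to 64 bytes, and yields `0xFC0` while keeping `0x1008` as the value to restore.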
| * x86inc: warn if XOP integer FMA instruction emulation is impossible [Anton Mitrofanov, 2015-08-11]
| |   Emulation requires a temporary register if arguments 1 and 4 are the
| |   same; this doesn't obey the semantics of the original instruction, so
| |   we can't emulate that in x86inc.
| |
| |   Also add pmacsdql emulation.
| |
| |   Signed-off-by: Henrik Gramner <henrik@gramner.com>
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
* | lavu/intmath.h: fix compilation with msvc10. [Matt Oliver, 2016-06-13]
| |   Signed-off-by: Matt Oliver <protogonoi@gmail.com>
* | x86/showcqt: use three operand format for some instructions [James Almer, 2016-06-08]
| |   Fixes failures with yasm 1.1.0 and older
| |
| |   Signed-off-by: James Almer <jamrial@gmail.com>