path: root/libavutil/x86
Commit message (Author, Age)
* lavu/intmath.h: Move x86 only msvc/icl functions to x86 specific header. (Matt Oliver, 2015-10-19)
| | | | Signed-off-by: Matt Oliver <protogonoi@gmail.com>
* lavu/intmath.h: Add msvc/icl ctzll optimisations. (Matt Oliver, 2015-10-19)
| | | | Signed-off-by: Matt Oliver <protogonoi@gmail.com>
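For readers unfamiliar with the operation: ctzll counts the trailing zero bits of a 64-bit value. The portable loop below is only an illustrative sketch of what the optimised versions compute. The name `sketch_ctzll` is hypothetical and this is not the code the commit adds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not FFmpeg's): index of the lowest set bit of v.
 * The result is undefined for v == 0, so the sketch asserts against it. */
static int sketch_ctzll(uint64_t v)
{
    int n = 0;

    assert(v != 0);
    while (!(v & 1)) {   /* shift right until the lowest bit is set */
        v >>= 1;
        n++;
    }
    return n;
}
```

On MSVC the same result is typically obtained from an intrinsic such as `_BitScanForward64`; whether the commit uses exactly that intrinsic is not stated in this log.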
* x86inc: Make cpuflag() and notcpuflag() return 0 or 1 (Henrik Gramner, 2015-10-01)
| | | | Makes it possible to use them in arithmetic expressions.
* avutil/attributes: add AV_GCC_VERSION_AT_MOST (James Almer, 2015-09-18)
| | | | | Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>
* x86: port PSIGNW to cpuflags (James Almer, 2015-09-11)
| | | | | Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* avutil/x86/asm: rename REG_SP to REG_sp (Ganesh Ajjanagadde, 2015-08-22)
| REG_SP is defined by Solaris system headers. This fixes a sea of warnings while building on Solaris:
| http://fate.ffmpeg.org/report.cgi?time=20150820233505&slot=x86-opensolaris-gcc4.3
|
| Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
| Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* x86inc: warn if XOP integer FMA instruction emulation is impossible (Anton Mitrofanov, 2015-08-05)
| | | | Signed-off-by: Henrik Gramner <henrik@gramner.com>
* x86inc: Drop SECTION_TEXT macro (Henrik Gramner, 2015-08-04)
| | | | | The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
* x86inc: Support arbitrary stack alignments (Henrik Gramner, 2015-08-04)
| | | | | | Change ALLOC_STACK to always align the stack before allocating stack space for consistency. Previously alignment would occur either before or after allocating stack space depending on whether manual alignment was required or not.
* x86: move XOP emulation code back to x86inc (James Almer, 2015-08-03)
| | | | | | | | | | Only two functions that use xop multiply-accumulate instructions where the first operand is the same as the fourth actually took advantage of the macros. This further reduces differences with x264's x86inc. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* x86inc: Various minor backports from x264 (Henrik Gramner, 2015-08-03)
| | | | | Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* x86inc: Disable vpbroadcastq workaround in newer yasm versions (Henrik Gramner, 2015-08-03)
| | | | | | | The bug was fixed in 1.3.0, so only perform the workaround in earlier versions. Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* x86/float_dsp: add missing colon to labels (James Almer, 2015-07-26)
| | | | | | Silences warnings with Nasm Signed-off-by: James Almer <jamrial@gmail.com>
* avutil/x86/bswap: force inline asm versions with ICC (James Almer, 2015-07-18)
| | | | | | | | Recent ICC versions that define GCC as >= 4.5 (like ICC 13) apparently can't optimize the generic C versions of av_bswap*() on their own. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>
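As background, a generic C byte swap of the kind the message refers to can be sketched as follows. This is an illustration only: `sketch_bswap32` and `sketch_bswap32_check` are hypothetical names, and the body is a plain shift-and-mask swap rather than a copy of av_bswap32().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic byte swap: reverses the four bytes of x. */
static uint32_t sketch_bswap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u)
         | (x << 24);
}

/* Small self-check on a known value and on the round trip. */
static void sketch_bswap32_check(void)
{
    assert(sketch_bswap32(0x11223344u) == 0x44332211u);
    assert(sketch_bswap32(sketch_bswap32(0xdeadbeefu)) == 0xdeadbeefu);
}
```

A compiler that recognises this pattern can emit a single bswap instruction; the commit forces the inline asm versions for ICC builds where, per the message, that apparently does not happen.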
* Merge commit 'd1a6cb195f610978ba5d2351e60f938f7f261d59' (Michael Niedermayer, 2015-07-09)
|\ | | | | | | | | | | | | * commit 'd1a6cb195f610978ba5d2351e60f938f7f261d59': x86: Serialize rdtsc in read_time() Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: Serialize rdtsc in read_time() (Henrik Gramner, 2015-07-09)
| | Improves the accuracy of measurements, especially in short sections.
| |
| | To quote the Intel 64 and IA-32 Architectures Software Developer's Manual:
| | "The RDTSC instruction is not a serializing instruction. It does not necessarily wait until all previous instructions have been executed before reading the counter. Similarly, subsequent instructions may begin execution before the read operation is performed. If software requires RDTSC to be executed only after all previous instructions have completed locally, it can either use RDTSCP (if the processor supports that instruction) or execute the sequence LFENCE;RDTSC."
| |
| | SSE2 is a requirement for lfence so only use it on SSE2-capable systems.
| | Prefer lfence;rdtsc over rdtscp since rdtscp is supported on fewer systems.
| |
| | Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
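The LFENCE;RDTSC sequence described above can be sketched in C with compiler intrinsics. This is a minimal sketch under stated assumptions, not the code from the commit: `sketch_read_time` and `SKETCH_X86` are hypothetical names, the intrinsics path assumes GCC or Clang on x86, and any other target falls back to `clock()` so the example still builds.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>   /* __rdtsc() and _mm_lfence() on GCC/Clang */
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

/* Hypothetical timer read: fence first, then read the time-stamp counter. */
static uint64_t sketch_read_time(void)
{
#if SKETCH_X86
#if defined(__SSE2__)
    /* lfence requires SSE2; it makes rdtsc wait until earlier
     * instructions have completed locally. */
    _mm_lfence();
#endif
    return __rdtsc();
#else
    /* Assumption: non-x86 or non-GNU build, use the C library clock. */
    clock_t c = clock();

    assert(c != (clock_t)-1);
    return (uint64_t)c;
#endif
}
```

A measurement would call `sketch_read_time()` before and after the section of interest and subtract the two values.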
| * x86: check for AV_CPU_FLAG_AVXSLOW where useful (James Almer, 2015-05-31)
| | | | | | | | | | Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* | avutil/x86/intmath: add missing check for inline assembly (James Almer, 2015-06-27)
| | | | | | | | Signed-off-by: James Almer <jamrial@gmail.com>
* | avutil/x86/intmath: use bzhi gcc builtin in av_mod_uintp2() (James Almer, 2015-06-27)
| | | | | | | | Signed-off-by: James Almer <jamrial@gmail.com>
* | x86: check for AV_CPU_FLAG_AVXSLOW where useful (James Almer, 2015-06-01)
| | | | | | | | | | Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge commit 'cae39851201b7781f1262e1c23627b45e6e80bb4' (Michael Niedermayer, 2015-05-31)
|\| | | | | | | | | | | | | * commit 'cae39851201b7781f1262e1c23627b45e6e80bb4': x86: Add helper macros to check for slow cpuflags Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: Add helper macros to check for slow cpuflags (James Almer, 2015-05-31)
| | | | | | | | | | Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
| * x86: add AV_CPU_FLAG_AVXSLOW flag (James Almer, 2015-05-31)
| | | | | | | | | | Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
| * x86inc: Clear __SECT__ (Timothy Gu, 2015-05-28)
| | Silences warning(s) like:
| | libavcodec/x86/fft.asm:93: warning: section flags ignored on section redeclaration
| |
| | The cause of this warning is that `struc` and `endstruc` attempt to revert to the previous section state [1]. The section state is stored in the macro `__SECT__`, defined by x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION` directive [2]. Thus, the `.note.GNU-stack` section is defined twice (once in x86inc.asm, once during `endstruc`), causing the warning.
| |
| | That is the first part of the commit: using the primitive `[section]` format for .note.GNU-stack etc., which does not update `__SECT__` [2].
| |
| | That fixes only half of the problem. Even without any `SECTION` directives, `__SECT__` is predefined as `.text`, which conflicts with the later `SECTION_TEXT` (which expands to `.text align=16`).
| |
| | [1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
| | [2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3
| |
| | Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
| * v210enc: Add SIMD optimised 8-bit and 10-bit encoders (Kieran Kunhya, 2014-12-05)
| | | | | | | | | | Signed-off-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
| * x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags (Henrik Gramner, 2014-09-09)
| | | | | | | | | | | | Previously there was a limit of two cpuflags. Signed-off-by: Diego Biurrun <diego@biurrun.de>
| * x86inc: Free up variable name "n" in global namespace (Loren Merritt, 2014-09-09)
| | | | | | | | Signed-off-by: Diego Biurrun <diego@biurrun.de>
| * x86inc: Make ym# behave the same way as xm# (Henrik Gramner, 2014-09-09)
| | | | | | | | | | | | This makes more sense for future implementations of templates with zmm registers. Signed-off-by: Diego Biurrun <diego@biurrun.de>
* | x86inc: Clear __SECT__ (Timothy Gu, 2015-05-28)
| | This commit silences warning(s) like:
| | libavcodec/x86/fft.asm:93: warning: section flags ignored on section redeclaration
| |
| | The cause of this warning is that `struc` and `endstruc` attempt to revert to the previous section state [1]. The section state is stored in the macro `__SECT__`, defined by x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION` directive [2]. Thus, the `.note.GNU-stack` section is defined twice (once in x86inc.asm, once during `endstruc`), causing the warning.
| |
| | That is the first part of the commit: using the primitive `[section]` format for .note.GNU-stack etc., which does not update `__SECT__` [2].
| |
| | That fixes only half of the problem. Even without any `SECTION` directives, `__SECT__` is predefined as `.text`, which conflicts with the later `SECTION_TEXT` (which expands to `.text align=16`).
| |
| | [1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
| | [2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3
| |
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/cpu: add AV_CPU_FLAG_AVXSLOW flag (James Almer, 2015-05-27)
| | | | | | | | | | Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>
* | avutil/x86/Makefile: fix conditional x86/emms.o build (Michael Niedermayer, 2015-04-09)
| | | | | | | | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | avutil/x86/Makefile: Make building and linking of emms.c conditional (Ronald S. Bultje, 2015-04-08)
| | | | | | | | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | libavutil: add bmi2 optimized av_mod_uintp2 (James Almer, 2015-03-20)
| | | | | | | | | | Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>
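For context, av_mod_uintp2(a, p) reduces `a` modulo 2^p, which amounts to keeping only the low p bits. The sketch below shows that operation in portable C. The name `sketch_mod_uintp2` is hypothetical, and the `p < 32` precondition is an assumption of the sketch rather than something stated in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a modulo 2^p, i.e. the low p bits of a.
 * Assumes p < 32 so the shift is well defined. */
static uint32_t sketch_mod_uintp2(uint32_t a, unsigned p)
{
    assert(p < 32);
    return a & ((UINT32_C(1) << p) - 1);
}
```

The BMI2 instruction BZHI zeroes all bits from a given index upwards, so it produces the same result in one instruction, which is presumably why it suits this function.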
* | pixelutils: Comment on (lack of) sad_8x8_sse2 (Peter Cordes, 2015-03-04)
| | | | | | | | Signed-off-by: Peter Cordes <peter@cordes.ca>
* | libavutil: add x86 optimized av_popcount (James Almer, 2015-02-25)
| | | | | | | | | | Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
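As a reminder of what a population count computes, here is a portable sketch that counts set bits by repeatedly clearing the lowest one. The name `sketch_popcount` is hypothetical, and this is neither FFmpeg's generic implementation nor the x86 version the commit adds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of set bits in x. */
static int sketch_popcount(uint32_t x)
{
    int n = 0;

    while (x) {
        x &= x - 1;   /* clears the lowest set bit */
        n++;
    }
    assert(n <= 32);  /* a 32-bit word has at most 32 set bits */
    return n;
}
```

On x86 CPUs with the POPCNT instruction the whole loop collapses to a single instruction, which is the kind of gain an x86-optimized version would target.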
* | x86inc: Correctly warn on use of SSE2 instructions in SSE functions (Christophe Gisquet, 2015-02-17)
| | | | | | | | | | | | | | | | SSE2 instructions that are XMM-implementations of pre-existing MMX/MMX2 instructions did not issue warnings when used in SSE functions. Handle it by also checking the register type when such instructions are used. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: lavu/x264asm: fix ymm register instantiation (Christophe Gisquet, 2015-02-04)
| | This mimics what is done for the other instruction sets.
| |
| | Tested-by: James Almer <jamrial@gmail.com>
| | Tested-by: Mickaël Raulet <mraulet@gmail.com>
| | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | lavu/x86/x86inc: deprecate INIT_AVX (James Darnley, 2015-02-02)
| | | | | | | | | | | | The same can be done with INIT_XMM avx Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x264asm: warn when inappropriate instruction used in function with specified cpuflags (Anton Mitrofanov, 2015-02-02)
| | Requested-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| | Requested-by: "Ronald S. Bultje" <rsbultje@gmail.com>
* | x86/swr: add SSE2/AVX pack_8ch functions (James Almer, 2014-12-30)
| | | | | | | | | | | | Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* | v210enc: Add SIMD optimised 8-bit and 10-bit encoders (Kieran Kunhya, 2014-11-26)
| | | | | | | | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | avutil/lls: Make unchanged function arguments const (Michael Niedermayer, 2014-09-28)
| | | | | | | | | | Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | avutil/x86/cpu: fix cpuid sub-leaf selection (lvqcl, 2014-09-27)
| | | | | | | | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags (Henrik Gramner, 2014-09-05)
| | | | | | | | | | | | Previously there was a limit of two cpuflags. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86inc: Make ym# behave the same way as xm# (Henrik Gramner, 2014-09-05)
| | | | | | | | | | | | This makes more sense for future implementations of templates with zmm registers. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86inc: free up variable name "n" in global namespace (Loren Merritt, 2014-09-05)
| | | | | | | | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | avutil/pixelutils: faster pixelutils_sad_16x16 (Clément Bœsch, 2014-08-23)
| | | | | | | | | | | | 501 to 439 decicycles. See 45c7f3997ea11c3d1007b2126b1c0049a8c27105.
* | avutil/pixelutils: faster pixelutils_sad_[au]_16x16 (Clément Bœsch, 2014-08-23)
| | ~560 → ~500 decicycles
| |
| | This is following the comments from Michael in https://ffmpeg.org/pipermail/ffmpeg-devel/2014-August/160599.html
| |
| | Using 2 registers for accumulator didn't help. On the other hand, some re-ordering between the movs and psadbw allowed going from ~538 to ~500.
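For reference, the quantity these routines compute is the sum of absolute differences (SAD) over a 16x16 block of bytes. The scalar sketch below illustrates it. The name `sketch_sad_16x16` and its signature are hypothetical and are not taken from the pixelutils API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar SAD over a 16x16 block.
 * stride1/stride2 are the distances in bytes between consecutive rows. */
static int sketch_sad_16x16(const uint8_t *src1, ptrdiff_t stride1,
                            const uint8_t *src2, ptrdiff_t stride2)
{
    int sum = 0;

    assert(src1 && src2);
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(src1[x] - src2[x]);
        src1 += stride1;   /* advance both blocks to the next row */
        src2 += stride2;
    }
    return sum;
}
```

The psadbw instruction mentioned in the message performs the inner absolute-difference-and-sum step on many bytes at once, which is why reordering loads around it affects the cycle counts.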
* | drop LLS1, rename LLS2 to LLS (Michael Niedermayer, 2014-08-09)
| | | | | | | | Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | avutil: add pixelutils API (Clément Bœsch, 2014-08-05)
| |