path: root/libavutil/x86
Commit message (Author, Date)
* x86inc: Enable AVX emulation in additional cases (Anton Mitrofanov, 2016-05-16)
| Allows emulation to work when dst is equal to src2 as long as the instruction is commutative, e.g. `addps m0, m1, m0`.
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
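The operand rule described in this commit can be modelled as a small rewriting function. This is an illustrative Python sketch, not the x86inc macro itself: the name `emulate_3op`, the `commutative` flag and the use of `movaps` for the register copy are assumptions made for the example.

```python
def emulate_3op(op, dst, src1, src2, commutative):
    """Rewrite a 3-operand `op dst, src1, src2` as 2-operand SSE instructions."""
    if dst == src1:
        # The destination already holds src1: one 2-operand instruction suffices.
        return [f"{op} {dst}, {src2}"]
    if dst == src2:
        if not commutative:
            # Copying src1 into dst would destroy src2 before it is read.
            raise ValueError(f"cannot emulate {op} {dst}, {src1}, {src2}")
        # a op b == b op a, so the value already in dst can act as the first source.
        return [f"{op} {dst}, {src1}"]
    # General case: copy src1 into dst, then apply the 2-operand form.
    return [f"movaps {dst}, {src1}", f"{op} {dst}, {src2}"]
```

With this model, `emulate_3op("addps", "m0", "m1", "m0", True)` yields a single `addps m0, m1`, while the same operands with a non-commutative instruction are rejected.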
* x86inc: Improve handling of %ifid with multi-token parameters (Anton Mitrofanov, 2016-05-16)
| The yasm/nasm preprocessor only checks the first token, which means that parameters such as `dword [rax]` are treated as identifiers, which is generally not what we want.
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86inc: Fix AVX emulation of some instructions (Anton Mitrofanov, 2016-05-16)
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86inc: Fix AVX emulation of scalar float instructions (Henrik Gramner, 2016-05-16)
| Those instructions are not commutative since they only change the first element in the vector and leave the rest unmodified.
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
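Why swapping the sources is unsafe for scalar instructions can be seen in a toy model. The sketch below represents a register as a Python list of four floats; the helper `addss` is hypothetical and only mimics the "add element 0, keep the rest of the first operand" behaviour.

```python
def addss(a, b):
    """Model of a scalar add: add only element 0, keep a's other elements."""
    return [a[0] + b[0]] + a[1:]

m0 = [1.0, 2.0, 3.0, 4.0]
m1 = [10.0, 20.0, 30.0, 40.0]

# The 3-operand form `addss m0, m1, m0` takes the upper elements from src1 (m1).
wanted = addss(m1, m0)
# The swapped 2-operand form `addss m0, m1` keeps the upper elements of m0 instead.
swapped = addss(m0, m1)
```

Element 0 is the same in both results, but the remaining elements differ, so the swap that is valid for `addps` gives a wrong result here.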
* cosmetics: Fix spelling mistakes (Vittorio Giovara, 2016-05-04)
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
* x86: Add ymm_reg struct (James Almer, 2016-01-28)
| Needed to declare 32-byte long constants
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* x86inc: Add debug symbols indicating sizes of compiled functions (Geza Lore, 2016-01-23)
| Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata, e.g. this fixes `gdb`, but not `perf`.
| Currently only implemented for ELF.
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86inc: Avoid creating unnecessary local labels (Henrik Gramner, 2016-01-23)
| The REP_RET workaround is only needed on old AMD cpus, and the labels clutter up the symbol table and confuse debugging/profiling tools, so use EQU to create SHN_ABS symbols instead of creating local labels. Furthermore, skip the workaround completely in functions that definitely won't run on such cpus.
| Note that EQU is just creating a local label when using nasm instead of yasm. This is probably a bug, but at least it doesn't break anything.
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86inc: Simplify AUTO_REP_RET (Henrik Gramner, 2016-01-23)
| cpuflags is never undefined any more, it's set to 0 instead. Also fix an incorrect comment.
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86inc: Use more consistent indentation (Henrik Gramner, 2016-01-23)
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86inc: Preserve arguments when allocating stack space (Henrik Gramner, 2016-01-23)
| When allocating stack space with a larger alignment than the known stack alignment a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments.
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86inc: Improve FMA instruction handling (Henrik Gramner, 2016-01-23)
| * Correctly handle FMA instructions with memory operands.
| * Print a warning if FMA instructions are used without the correct cpuflag.
| * Simplify the instantiation code.
| * Clarify documentation.
| Only the last operand in FMA3 instructions can be a memory operand. When converting FMA4 instructions to FMA3 instructions we can utilize the fact that multiply is a commutative operation and reorder operands if necessary to ensure that a memory operand is used only as the last operand.
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
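The operand reordering described for the FMA4 to FMA3 conversion can be sketched as follows. This is a hedged Python model under stated assumptions, not the x86inc implementation: `fma4_to_fma3`, `run_fma3` and the convention that memory operands are written in brackets are all invented for the example. It relies only on the standard FMA3 forms (132: x = x*z + y, 213: x = y*x + z, 231: x = y*z + x).

```python
def is_mem(operand):
    # Assumption for this sketch: memory operands are written in brackets.
    return operand.startswith("[")

def fma4_to_fma3(dst, a, b, c):
    """Pick an FMA3 form computing dst = a*b + c, with dst equal to a, b or c."""
    if dst == c:
        # 231 form: x = y*z + x. Multiplication commutes, so put memory last.
        y, z = (b, a) if is_mem(a) else (a, b)
        form, ops = "231", (dst, y, z)
    elif dst in (a, b):
        other = b if dst == a else a
        if is_mem(c):
            form, ops = "213", (dst, other, c)  # x = y*x + z
        else:
            form, ops = "132", (dst, c, other)  # x = x*z + y
    else:
        raise ValueError("FMA3 needs dst to equal one of the sources")
    if is_mem(ops[0]) or is_mem(ops[1]):
        raise ValueError("only the last FMA3 operand can be a memory operand")
    return form, ops

def run_fma3(form, ops, values):
    """Evaluate one FMA3 form on scalar stand-ins for the registers."""
    x, y, z = (values[o] for o in ops)
    return {"132": x * z + y, "213": y * x + z, "231": y * z + x}[form]
```

In every accepted case the memory operand ends up last, and evaluating the chosen form gives the same value as `a*b + c`.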
* x86inc: Be more verbose in assertion failures (Henrik Gramner, 2016-01-23)
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86inc: Make cpuflag() and notcpuflag() return 0 or 1 (Henrik Gramner, 2016-01-23)
| Makes it possible to use them in arithmetic expressions.
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86inc: Various minor backports from x264 (Henrik Gramner, 2015-08-13)
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86inc: Drop SECTION_TEXT macro (Henrik Gramner, 2015-08-11)
| The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86inc: Disable vpbroadcastq workaround in newer yasm versions (Henrik Gramner, 2015-08-11)
| The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86inc: Fix instantiation of YMM registers (Christophe Gisquet, 2015-08-11)
| Signed-off-by: Henrik Gramner <henrik@gramner.com>
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86inc: warn when instructions incompatible with current cpuflags are used (Anton Mitrofanov, 2015-08-11)
| Signed-off-by: Henrik Gramner <henrik@gramner.com>
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86inc: Support arbitrary stack alignments (Henrik Gramner, 2015-08-11)
| Change ALLOC_STACK to always align the stack before allocating stack space for consistency. Previously alignment would occur either before or after allocating stack space depending on whether manual alignment was required or not.
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
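The "align first, then allocate" order can be illustrated with plain address arithmetic. This is a minimal Python sketch under assumptions: `alloc_stack` is a hypothetical helper, the alignment is taken to be a power of two, and rounding the requested size up to the alignment is an assumption of the example rather than something the commit message states.

```python
def alloc_stack(sp, size, align):
    """Align the stack pointer first, then reserve `size` bytes (padded)."""
    assert align & (align - 1) == 0, "alignment must be a power of two"
    original_sp = sp                  # kept so the caller's stack can be restored
    sp &= ~(align - 1)                # step 1: round down to the alignment
    padded = (size + align - 1) & ~(align - 1)
    sp -= padded                      # step 2: allocate, preserving the alignment
    return sp, original_sp
```

Because the pointer is aligned before the padded size is subtracted, the resulting stack pointer is aligned regardless of the incoming value, and at least `size` bytes lie between the new and the original pointer.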
* x86inc: warn if XOP integer FMA instruction emulation is impossible (Anton Mitrofanov, 2015-08-11)
| Emulation requires a temporary register if arguments 1 and 4 are the same; this doesn't obey the semantics of the original instruction, so we can't emulate that in x86inc.
| Also add pmacsdql emulation.
| Signed-off-by: Henrik Gramner <henrik@gramner.com>
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
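The restriction on arguments 1 and 4 follows from how a multiply-accumulate is split into two instructions. The Python sketch below uses `pmacsdd` (dst = a*b + c) as an example; the helper name `emulate_pmacsdd` and the choice of `pmulld`/`paddd` as the replacement pair are assumptions made for illustration.

```python
def emulate_pmacsdd(dst, a, b, c):
    """Emulate `pmacsdd dst, a, b, c` (dst = a*b + c) as a multiply then an add."""
    if dst == c:
        # The multiply writes dst first, which would overwrite the addend c.
        raise ValueError("emulation would need a temporary register")
    return [f"pmulld {dst}, {a}, {b}", f"paddd {dst}, {c}"]
```

When `dst` and `c` name the same register, the product would destroy the addend before the add runs, so the sketch rejects that case instead of silently using a scratch register.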
* x86: Serialize rdtsc in read_time() (Henrik Gramner, 2015-07-09)
| Improves the accuracy of measurements, especially in short sections.
| To quote the Intel 64 and IA-32 Architectures Software Developer's Manual:
| "The RDTSC instruction is not a serializing instruction. It does not necessarily wait until all previous instructions have been executed before reading the counter. Similarly, subsequent instructions may begin execution before the read operation is performed. If software requires RDTSC to be executed only after all previous instructions have completed locally, it can either use RDTSCP (if the processor supports that instruction) or execute the sequence LFENCE;RDTSC."
| SSE2 is a requirement for lfence so only use it on SSE2-capable systems. Prefer lfence;rdtsc over rdtscp since rdtscp is supported on fewer systems.
| Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* x86: check for AV_CPU_FLAG_AVXSLOW where useful (James Almer, 2015-05-31)
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* x86: Add helper macros to check for slow cpuflags (James Almer, 2015-05-31)
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* x86: add AV_CPU_FLAG_AVXSLOW flag (James Almer, 2015-05-31)
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* x86inc: Clear __SECT__ (Timothy Gu, 2015-05-28)
| Silences warning(s) like:
|     libavcodec/x86/fft.asm:93: warning: section flags ignored on section redeclaration
| The cause of this warning is that `struc` and `endstruc` attempt to revert to the previous section state [1]. The section state is stored in the macro __SECT__, defined by x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION` directive [2]. Thus, the `.note.GNU-stack` section is defined twice (once in x86inc.asm, once during `endstruc`), causing the warning.
| That is the first part of the commit: using the primitive `[section]` format for .note.GNU-stack etc., which does not update `__SECT__` [2].
| That fixes only half of the problem. Even without any `SECTION` directives, `__SECT__` is predefined as `.text`, which conflicts with the later `SECTION_TEXT` (which expands to `.text align=16`).
| [1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
| [2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3
| Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* v210enc: Add SIMD optimised 8-bit and 10-bit encoders (Kieran Kunhya, 2014-12-05)
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
| Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
* x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags (Henrik Gramner, 2014-09-09)
| Previously there was a limit of two cpuflags.
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
* x86inc: Free up variable name "n" in global namespace (Loren Merritt, 2014-09-09)
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
* x86inc: Make ym# behave the same way as xm# (Henrik Gramner, 2014-09-09)
| This makes more sense for future implementations of templates with zmm registers.
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
* Update Fiona's name in copyright statements. (Diego Biurrun, 2014-07-01)
* x86: add detection for Bit Manipulation Instruction sets (James Almer, 2014-02-23)
| Based on x264 code
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86: add detection for FMA3 instruction set (James Almer, 2014-02-23)
| Based on x264 code
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86: add missing XOP checks and macros (James Almer, 2014-02-23)
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86: float dsp: unroll SSE versions (Christophe Gisquet, 2014-02-20)
| vector_fmul and vector_fmac_scalar are guaranteed to be able to process in batches of 16 elements, but their SSE versions only do 8 at a time. Therefore, unroll them a bit.
| 299 to 261 cycles for 256 elements in vector_fmac_scalar on Arrandale/Win64.
| Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* x86inc: Speed up assembling with Yasm (Loren Merritt, 2014-01-26)
| Work around Yasm's inefficiency with handling large numbers of variables in the global scope.
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
* libavutil: x86: Add AVX2 capable CPU detection. (Kieran Kunhya, 2013-10-25)
| Patch based on x264's AVX2 detection
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86: more AVX2 framework (Jason Garrett-Glaser, 2013-10-14)
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: FMA3/4 Support (Jason Garrett-Glaser, 2013-10-14)
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: Remove our FMA4 support (Derek Buitenhuis, 2013-10-14)
| This is so we can sync to x264's version of FMA4 support.
| This partially reverts commit 79687079a97a039c325ab79d7a95920d800b791f.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: Use VEX-encoded instructions in AVX functions (Henrik Gramner, 2013-10-14)
| Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4 functions for all instructions that exist in a VEX-encoded version. This change makes it easier to extend existing code to use AVX2.
| Also add support for AVX emulation of a few instructions that were missing before.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: Remove .rodata kludges (Henrik Gramner, 2013-10-09)
| The Mach-O bug was fixed in yasm 0.8.0 and we don't support versions that old anymore.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: remove misaligned cpu flag (Henrik Gramner, 2013-10-07)
| Prevents a crash if the misaligned exception mask bit is cleared for some reason.
| Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is minuscule. They also require modifying the MXCSR control register and by removing those functions we can get rid of that complexity altogether.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: various minor backports from x264 (Jason Garrett-Glaser, 2013-10-07)
| Small backports that sneaked into other asm commits in x264.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64" (Derek Buitenhuis, 2013-10-07)
| This is also a valid value for WIN64.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: Utilize the shadow space on 64-bit Windows (Henrik Gramner, 2013-10-07)
| Store XMM6 and XMM7 in the shadow space in functions that clobber them. This way we don't have to adjust the stack pointer as often, reducing the number of instructions as well as code size.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: create xm# and ym#, analogous to m# (Loren Merritt, 2013-10-07)
| For when we want to mix simd sizes within one function.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: fix some corner cases of SWAP (Loren Merritt, 2013-10-07)
| Two cases used to produce the wrong permutation: SWAP with >=3 named (rather than numbered) args, and PERMUTE followed by SWAP with 2 named args.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: Use SSE instead of SSE2 for copying data (Henrik Gramner, 2013-10-07)
| Reduces code size because movaps/movups is one byte shorter than movdqa/movdqu.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
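The one-byte difference comes from the mandatory prefix that the SSE2 integer moves carry. The Python table below lists the legacy (non-VEX) register-to-register encodings of `<mov> xmm0, xmm1`; the names `ENCODINGS` and `saved_bytes` are invented for this illustration.

```python
# Legacy encodings of `<mov> xmm0, xmm1`: opcode bytes followed by ModRM 0xC1.
ENCODINGS = {
    "movaps": bytes([0x0F, 0x28, 0xC1]),
    "movups": bytes([0x0F, 0x10, 0xC1]),
    "movdqa": bytes([0x66, 0x0F, 0x6F, 0xC1]),  # 0x66 prefix
    "movdqu": bytes([0xF3, 0x0F, 0x6F, 0xC1]),  # 0xF3 prefix
}

def saved_bytes(sse_op, sse2_op):
    """Bytes saved per instruction by using the SSE form instead of the SSE2 one."""
    return len(ENCODINGS[sse2_op]) - len(ENCODINGS[sse_op])
```

Both pairs (`movaps`/`movdqa` and `movups`/`movdqu`) differ by exactly the prefix byte, so every replaced copy saves one byte.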
* x86inc: Set ELF hidden visibility for global constants (Henrik Gramner, 2013-10-07)
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>