path: root/libavutil/x86/x86inc.asm
Commit message (Author, Age)
* avutil/x86inc: fix warnings when assembling with Nasm 2.15 (Henrik Gramner, 2020-07-12)
|   Some new warnings regarding use of empty macro parameters have been added, so adjust some x86inc code to silence those.
|   Fixes part of ticket #8771
|   Signed-off-by: James Almer <jamrial@gmail.com>
* x86inc: Drop cpuflags_slowctz (Henrik Gramner, 2018-01-20)
|
* x86inc: Correctly set mmreg variables (Henrik Gramner, 2018-01-20)
|
* x86inc: Support creating global symbols from local labels (Henrik Gramner, 2018-01-20)
|   On ELF platforms such symbols need to be flagged as functions with the correct visibility to please certain linkers in some scenarios.
* x86inc: Use .rdata instead of .rodata on Windows (Henrik Gramner, 2018-01-20)
|   The standard section for read-only data on Windows is .rdata. Nasm will flag non-standard sections as executable by default which isn't ideal.
* x86inc: Enable AVX emulation for floating-point pseudo-instructions (Henrik Gramner, 2018-01-20)
|   There are 32 pseudo-instructions for each floating-point comparison instruction, but only 8 of them are actually valid in legacy-encoded mode. The remaining 24 require the use of VEX-encoded (v-prefixed) instructions and can therefore be disregarded for this purpose.
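The 8-versus-24 split in the entry above can be sketched in Python. The eight predicate names are the standard legacy comparison predicates (immediates 0 through 7); the helper name is hypothetical and this is an illustration, not the x86inc macro code.

```python
# The eight comparison predicates that fit the legacy (non-VEX) encoding,
# listed in order of their imm8 value (0..7).
LEGACY_PREDICATES = ["eq", "lt", "le", "unord", "neq", "nlt", "nle", "ord"]

# VEX-encoded compares take a 5-bit predicate, hence 32 pseudo-instructions.
VEX_PREDICATE_COUNT = 32


def legacy_cmp_pseudo_ops(suffix):
    """Map pseudo-instruction names (e.g. 'cmpltps') to (base mnemonic, imm8).

    `suffix` is one of 'ps', 'pd', 'ss', 'sd'. Hypothetical helper, for
    illustration only.
    """
    base = "cmp" + suffix
    return {
        "cmp" + pred + suffix: (base, imm)
        for imm, pred in enumerate(LEGACY_PREDICATES)
    }


ops = legacy_cmp_pseudo_ops("ps")
legacy_count = len(ops)                             # pseudo-ops usable without VEX
vex_only_count = VEX_PREDICATE_COUNT - legacy_count  # the ones that need VEX
```

Here `ops["cmpltps"]` is `("cmpps", 1)`, meaning `cmpltps a, b` is shorthand for `cmpps a, b, 1`.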
* x86inc: set the correct amount of simd regs in x86_64 when avx512 is enabled but not used (James Almer, 2017-12-24)
|   Fixes compilation of libavresample/x86/audio_mix.asm
|   Reviewed-by: Gramner
|   Signed-off-by: James Almer <jamrial@gmail.com>
* x86inc: AVX-512 support (Henrik Gramner, 2017-12-24)
|   AVX-512 consists of a plethora of different extensions, but in order to keep things a bit more manageable we group together the following extensions under a single baseline cpu flag which should cover SKL-X and future CPUs:
|     * AVX-512 Foundation (F)
|     * AVX-512 Conflict Detection Instructions (CD)
|     * AVX-512 Byte and Word Instructions (BW)
|     * AVX-512 Doubleword and Quadword Instructions (DQ)
|     * AVX-512 Vector Length Extensions (VL)
|   On x86-64 AVX-512 provides 16 additional vector registers; prefer using those over existing ones since it allows us to avoid using `vzeroupper` unless more than 16 vector registers are required. They also happen to be volatile on Windows which means that we don't need to save and restore existing xmm register contents unless more than 22 vector registers are required.
|   Big thanks to Intel for their support.
* Merge commit '7abdd026df6a9a52d07d8174505b33cc89db7bf6' (James Almer, 2017-09-26)
|\
| |   * commit '7abdd026df6a9a52d07d8174505b33cc89db7bf6':
| |     asm: Consistently uppercase SECTION markers
| |   Merged-by: James Almer <jamrial@gmail.com>
| * asm: Consistently uppercase SECTION markers (Diego Biurrun, 2017-02-03)
| |
| * x86inc: Avoid using eax/rax for storing the stack pointer (Henrik Gramner, 2017-01-09)
| |   When allocating stack space with an alignment requirement that is larger than the current stack alignment we need to store a copy of the original stack pointer in order to be able to restore it later. If we choose to use another register for this purpose we should not pick eax/rax since it can be overwritten as a return value.
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Enable AVX emulation in additional cases (Anton Mitrofanov, 2016-05-16)
| |   Allows emulation to work when dst is equal to src2 as long as the instruction is commutative, e.g. `addps m0, m1, m0`.
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
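A rough Python sketch of the lowering this entry describes, assuming a three-operand `op dst, src1, src2` has to be expressed with two-operand legacy instructions. The function name is hypothetical, and `movaps` stands in for whichever register move suits the data type; the real macros in x86inc are more involved.

```python
def emulate_three_operand(op, dst, src1, src2, commutative):
    """Lower `op dst, src1, src2` to a list of two-operand instructions.

    Hypothetical helper for illustration. Raises ValueError when the
    lowering would need a temporary register.
    """
    if dst == src1:
        # Already in two-operand shape: dst = dst op src2.
        return [f"{op} {dst}, {src2}"]
    if dst == src2:
        if not commutative:
            raise ValueError(
                f"cannot emulate {op}: dst == src2 and op is not commutative"
            )
        # a op b == b op a, so the sources can simply be swapped.
        return [f"{op} {dst}, {src1}"]
    # General case: copy src1 into dst first, then apply the operation.
    return [f"movaps {dst}, {src1}", f"{op} {dst}, {src2}"]


swapped = emulate_three_operand("addps", "m0", "m1", "m0", commutative=True)
general = emulate_three_operand("addps", "m2", "m0", "m1", commutative=True)
```

With these inputs `swapped` is `["addps m0, m1"]`. As another entry in this log points out, scalar float instructions such as `addss` would have to be passed `commutative=False`, because they only change the first element of the vector.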
| * x86inc: Improve handling of %ifid with multi-token parameters (Anton Mitrofanov, 2016-05-16)
| |   The yasm/nasm preprocessor only checks the first token, which means that parameters such as `dword [rax]` are treated as identifiers, which is generally not what we want.
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Fix AVX emulation of some instructions (Anton Mitrofanov, 2016-05-16)
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Fix AVX emulation of scalar float instructions (Henrik Gramner, 2016-05-16)
| |   Those instructions are not commutative since they only change the first element in the vector and leave the rest unmodified.
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Add debug symbols indicating sizes of compiled functions (Geza Lore, 2016-01-23)
| |   Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata, e.g. this fixes `gdb`, but not `perf`.
| |   Currently only implemented for ELF.
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Avoid creating unnecessary local labels (Henrik Gramner, 2016-01-23)
| |   The REP_RET workaround is only needed on old AMD cpus, and the labels clutter up the symbol table and confuse debugging/profiling tools, so use EQU to create SHN_ABS symbols instead of creating local labels. Furthermore, skip the workaround completely in functions that definitely won't run on such cpus.
| |   Note that EQU is just creating a local label when using nasm instead of yasm. This is probably a bug, but at least it doesn't break anything.
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Simplify AUTO_REP_RET (Henrik Gramner, 2016-01-23)
| |   cpuflags is never undefined any more, it's set to 0 instead.
| |   Also fix an incorrect comment.
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Use more consistent indentation (Henrik Gramner, 2016-01-23)
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Preserve arguments when allocating stack space (Henrik Gramner, 2016-01-23)
| |   When allocating stack space with a larger alignment than the known stack alignment a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments.
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Improve FMA instruction handling (Henrik Gramner, 2016-01-23)
| |   * Correctly handle FMA instructions with memory operands.
| |   * Print a warning if FMA instructions are used without the correct cpuflag.
| |   * Simplify the instantiation code.
| |   * Clarify documentation.
| |   Only the last operand in FMA3 instructions can be a memory operand. When converting FMA4 instructions to FMA3 instructions we can utilize the fact that multiply is a commutative operation and reorder operands if necessary to ensure that a memory operand is used only as the last operand.
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
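The operand reordering described in this entry can be illustrated with a small Python sketch. It assumes the FMA4 form `fmadd dst, a, b, c` computes `dst = a*b + c` and picks one of the three FMA3 encodings so that any memory operand lands last. The function names and the bracket test for memory operands are hypothetical; the selection logic in x86inc itself may differ.

```python
def is_mem(operand):
    """Treat anything written as `[...]` as a memory operand (assumption)."""
    return operand.startswith("[")


def fma4_to_fma3(dst, a, b, c, suffix="ps"):
    """Rewrite FMA4 `fmadd dst, a, b, c` (dst = a*b + c) in FMA3 form.

    Returns (mnemonic, op1, op2, op3), where the FMA3 forms compute:
      vfmadd132: op1 = op1*op3 + op2
      vfmadd213: op1 = op2*op1 + op3
      vfmadd231: op1 = op2*op3 + op1
    Only op3 may be a memory operand.
    """
    if dst == b:
        a, b = b, a  # multiplication is commutative, so dst == a from here on
    if dst == a:
        if is_mem(b):
            result = ("vfmadd132" + suffix, dst, c, b)  # dst*b + c
        else:
            result = ("vfmadd213" + suffix, dst, b, c)  # b*dst + c
    elif dst == c:
        if is_mem(a):
            a, b = b, a  # move the memory factor to the last position
        result = ("vfmadd231" + suffix, dst, a, b)      # a*b + dst
    else:
        raise ValueError("FMA3 requires dst to match one of the sources")
    if is_mem(result[2]):
        raise ValueError("memory operand cannot be placed last")
    return result


reg_form = fma4_to_fma3("m0", "m0", "m1", "[r0]")
mem_factor = fma4_to_fma3("m0", "m0", "[r0]", "m1")
```

In the second call the memory operand is a multiplicand, so the sketch switches to the 132 form rather than the 213 form in order to keep `[r0]` in the last slot.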
| * x86inc: Be more verbose in assertion failures (Henrik Gramner, 2016-01-23)
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Make cpuflag() and notcpuflag() return 0 or 1 (Henrik Gramner, 2016-01-23)
| |   Makes it possible to use them in arithmetic expressions.
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Various minor backports from x264 (Henrik Gramner, 2015-08-13)
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Drop SECTION_TEXT macro (Henrik Gramner, 2015-08-11)
| |   The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Disable vpbroadcastq workaround in newer yasm versions (Henrik Gramner, 2015-08-11)
| |   The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Fix instantiation of YMM registers (Christophe Gisquet, 2015-08-11)
| |   Signed-off-by: Henrik Gramner <henrik@gramner.com>
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: warn when instructions incompatible with current cpuflags are used (Anton Mitrofanov, 2015-08-11)
| |   Signed-off-by: Henrik Gramner <henrik@gramner.com>
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Support arbitrary stack alignments (Henrik Gramner, 2015-08-11)
| |   Change ALLOC_STACK to always align the stack before allocating stack space for consistency. Previously alignment would occur either before or after allocating stack space depending on whether manual alignment was required or not.
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
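The "align first, then allocate" order can be modelled with plain integer arithmetic. The Python sketch below is an illustration only: the function name is hypothetical, and rounding the size up to a multiple of the alignment is an assumption made here so that the pointer stays aligned after the subtraction.

```python
def alloc_stack(sp, size, align):
    """Align the stack pointer first, then allocate `size` bytes.

    Returns (new_sp, saved_sp). `align` must be a power of two.
    Hypothetical model, not the ALLOC_STACK macro itself.
    """
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a power of two")
    saved_sp = sp                         # original value, needed to undo the alignment
    sp &= -align                          # round down to the alignment boundary
    padded = (size + align - 1) & -align  # assumption: pad size to a multiple of align
    return sp - padded, saved_sp


new_sp, saved_sp = alloc_stack(0x7FFC1238, 40, 32)
```

Because aligning discards the low bits of the pointer, the original value has to be kept somewhere until the function returns; `saved_sp` plays the role of the stack pointer copy that other entries in this log discuss.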
| * x86inc: warn if XOP integer FMA instruction emulation is impossible (Anton Mitrofanov, 2015-08-11)
| |   Emulation requires a temporary register if arguments 1 and 4 are the same; this doesn't obey the semantics of the original instruction, so we can't emulate that in x86inc.
| |   Also add pmacsdql emulation.
| |   Signed-off-by: Henrik Gramner <henrik@gramner.com>
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86inc: Clear __SECT__ (Timothy Gu, 2015-05-28)
| |   Silences warning(s) like:
| |     libavcodec/x86/fft.asm:93: warning: section flags ignored on section redeclaration
| |   The cause of this warning is that `struc` and `endstruc` attempt to revert to the previous section state [1]. The section state is stored in the macro __SECT__, defined by x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION` directive [2]. Thus, the `.note.GNU-stack` section is defined twice (once in x86inc.asm, once during `endstruc`), causing the warning.
| |   That is the first part of the commit: using the primitive `[section]` format for .note.GNU-stack etc., which does not update `__SECT__` [2].
| |   That fixes only half of the problem. Even without any `SECTION` directives, `__SECT__` is predefined as `.text`, which conflicts with the later `SECTION_TEXT` (which expands to `.text align=16`).
| |   [1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
| |   [2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3
| |   Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
| * x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags (Henrik Gramner, 2014-09-09)
| |   Previously there was a limit of two cpuflags.
| |   Signed-off-by: Diego Biurrun <diego@biurrun.de>
| * x86inc: Free up variable name "n" in global namespace (Loren Merritt, 2014-09-09)
| |   Signed-off-by: Diego Biurrun <diego@biurrun.de>
| * x86inc: Make ym# behave the same way as xm# (Henrik Gramner, 2014-09-09)
| |   This makes more sense for future implementations of templates with zmm registers.
| |   Signed-off-by: Diego Biurrun <diego@biurrun.de>
* | x86inc: don't use read-only data sections on COFF targets (James Almer, 2017-06-27)
| |   Yasm:
| |     src/libavfilter/x86/af_volume.asm:24: warning: Standard COFF does not support read-only data sections
| |     src/libavfilter/x86/af_volume.asm:24: warning: Unrecognized qualifier `align'
| |   Nasm:
| |     src/libavfilter/x86/af_volume.asm:24: error: standard COFF does not support section alignment specification
| |     src/libavutil/x86/x86inc.asm:92: ... from macro `SECTION_RODATA' defined here
| |   Tested-by: Clément Bœsch <u@pkh.me>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86inc: Add some additional cpuflag relations (Henrik Gramner, 2017-06-12)
| |   Simplifies writing assembly code that depends on available instructions.
| |     LZCNT implies SSE2
| |     BMI1 implies AVX+LZCNT
| |     AVX2 implies BMI2
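One common way to express "X implies Y" relations is to OR the implied flag's bits into the implying flag's mask. The Python sketch below does that for the three relations listed in this entry. The bit positions and the shape of `cpuflag()` are hypothetical; the actual flag values and the full set of relations live in x86inc.asm.

```python
# Hypothetical bit positions. Only the three "implies" relations marked
# below are taken from the commit message.
SSE2 = 1 << 0
AVX = 1 << 1
BMI2 = 1 << 2
LZCNT = (1 << 3) | SSE2         # LZCNT implies SSE2
BMI1 = (1 << 4) | AVX | LZCNT   # BMI1 implies AVX+LZCNT
AVX2 = (1 << 5) | BMI2          # AVX2 implies BMI2


def cpuflag(cpuflags, flag):
    """Return 1 if every bit of `flag` is set in `cpuflags`, else 0."""
    return int((cpuflags & flag) == flag)


# Returning 0 or 1 lets the result be used in arithmetic expressions.
implied_by_bmi1 = cpuflag(BMI1, AVX) + cpuflag(BMI1, LZCNT) + cpuflag(BMI1, SSE2)
```

A function declared for BMI1 can then test for LZCNT or SSE2 without listing them separately, which appears to be the simplification the entry refers to.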
* | x86inc: Remove argument from WIN64_RESTORE_XMM (Anton Mitrofanov, 2017-06-09)
| |   The use of rsp was pretty much hardcoded there and probably didn't work otherwise with stack_size > 0.
* | x86inc: Prefer r14/r15 over r12/r13 on x86-64 (Henrik Gramner, 2017-06-09)
| |   Due to a peculiarity in the ModR/M addressing encoding, the r12 and r13 registers sometimes require an additional byte when used as a base register. r14 and r15 don't have that issue, so prefer using them.
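The peculiarity comes from two escape values in the ModR/M byte: an r/m field of 100 means "a SIB byte follows" and, with mod=00, an r/m field of 101 means "no base register, 32-bit displacement". Registers whose low three bits match those values (rsp/r12 and rbp/r13) therefore need extra bytes. The Python sketch below counts them for a plain `[base + disp]` operand; the function name is hypothetical, and the REX prefix that r8 to r15 need in any case is not counted.

```python
# Register numbers as used in x86-64 instruction encodings.
REG_NUM = {
    "rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
    "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7,
    "r8": 8, "r9": 9, "r10": 10, "r11": 11,
    "r12": 12, "r13": 13, "r14": 14, "r15": 15,
}


def extra_addressing_bytes(base, disp=0):
    """Bytes needed after the ModR/M byte for `[base + disp]` (no index).

    Hypothetical helper for illustration only.
    """
    low3 = REG_NUM[base] & 7
    extra = 0
    if low3 == 4:
        extra += 1  # r/m=100 is the SIB escape: rsp/r12 need a SIB byte
    if disp == 0 and low3 != 5:
        return extra  # mod=00: no displacement bytes at all
    # mod=00 with base bits 101 does not mean "[rbp]"/"[r13]", so those
    # registers need at least a zero 8-bit displacement.
    extra += 1 if -128 <= disp <= 127 else 4
    return extra


sizes = {reg: extra_addressing_bytes(reg) for reg in ("r12", "r13", "r14", "r15")}
```

With a zero displacement, `sizes` comes out as one extra byte each for r12 and r13 and none for r14 and r15. With a small non-zero displacement r13 costs the same as r14, which matches the "sometimes" in the entry.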
* | x86inc: Make REP_RET identical to RET in SSSE3+ functions (Henrik Gramner, 2017-06-09)
| |   There's no point in emitting a rep prefix before ret on modern CPUs.
* | x86inc: Fix call with memory operands (Henrik Gramner, 2017-06-09)
| |   We overload the `call` instruction with a macro, but it would misbehave when the macro argument wasn't a valid identifier. Fix it by explicitly checking if the argument is an identifier.
* | x86inc: Avoid using eax/rax for storing the stack pointer (Henrik Gramner, 2017-01-09)
| |   When allocating stack space with an alignment requirement that is larger than the current stack alignment we need to store a copy of the original stack pointer in order to be able to restore it later. If we choose to use another register for this purpose we should not pick eax/rax since it can be overwritten as a return value.
* | x86inc: Enable AVX emulation in additional cases (Anton Mitrofanov, 2016-04-20)
| |   Allows emulation to work when dst is equal to src2 as long as the instruction is commutative, e.g. `addps m0, m1, m0`.
* | x86inc: Improve handling of %ifid with multi-token parameters (Anton Mitrofanov, 2016-04-20)
| |   The yasm/nasm preprocessor only checks the first token, which means that parameters such as `dword [rax]` are treated as identifiers, which is generally not what we want.
* | x86inc: Fix AVX emulation of some instructions (Anton Mitrofanov, 2016-04-20)
| |
* | x86inc: Fix AVX emulation of scalar float instructions (Henrik Gramner, 2016-04-20)
| |   Those instructions are not commutative since they only change the first element in the vector and leave the rest unmodified.
* | x86inc: Add debug symbols indicating sizes of compiled functions (Geza Lore, 2016-01-21)
| |   Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata, e.g. this fixes `gdb`, but not `perf`.
| |   Currently only implemented for ELF.
* | x86inc: Avoid creating unnecessary local labels (Henrik Gramner, 2016-01-21)
| |   The REP_RET workaround is only needed on old AMD cpus, and the labels clutter up the symbol table and confuse debugging/profiling tools, so use EQU to create SHN_ABS symbols instead of creating local labels. Furthermore, skip the workaround completely in functions that definitely won't run on such cpus.
| |   Note that EQU is just creating a local label when using nasm instead of yasm. This is probably a bug, but at least it doesn't break anything.
* | x86inc: Simplify AUTO_REP_RET (Henrik Gramner, 2016-01-21)
| |   cpuflags is never undefined any more, it's set to 0 instead.
| |   Also fix an incorrect comment.
* | x86inc: Use more consistent indentation (Henrik Gramner, 2016-01-21)
| |
* | x86inc: Preserve arguments when allocating stack space (Henrik Gramner, 2016-01-21)
| |   When allocating stack space with a larger alignment than the known stack alignment a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments.