path: root/libavutil/x86
Commit message | Author | Age
* v210enc: Add SIMD optimised 8-bit and 10-bit encoders | Kieran Kunhya | 2014-12-05
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
| Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
* x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags | Henrik Gramner | 2014-09-09
| Previously there was a limit of two cpuflags.
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
* x86inc: Free up variable name "n" in global namespace | Loren Merritt | 2014-09-09
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
* x86inc: Make ym# behave the same way as xm# | Henrik Gramner | 2014-09-09
| This makes more sense for future implementations of templates with zmm registers.
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
* Update Fiona's name in copyright statements. | Diego Biurrun | 2014-07-01
|
* x86: add detection for Bit Manipulation Instruction sets | James Almer | 2014-02-23
| Based on x264 code
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86: add detection for FMA3 instruction set | James Almer | 2014-02-23
| Based on x264 code
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86: add missing XOP checks and macros | James Almer | 2014-02-23
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86: float dsp: unroll SSE versions | Christophe Gisquet | 2014-02-20
| vector_fmul and vector_fmac_scalar are guaranteed to be able to process elements in batches of 16, but their SSE versions only do 8 at a time. Therefore, unroll them a bit.
| 299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
| Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* x86inc: Speed up assembling with Yasm | Loren Merritt | 2014-01-26
| Work around Yasm's inefficiency with handling large numbers of variables in the global scope.
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
* libavutil: x86: Add AVX2 capable CPU detection. | Kieran Kunhya | 2013-10-25
| Patch based on x264's AVX2 detection
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86: more AVX2 framework | Jason Garrett-Glaser | 2013-10-14
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: FMA3/4 Support | Jason Garrett-Glaser | 2013-10-14
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: Remove our FMA4 support | Derek Buitenhuis | 2013-10-14
| This is so we can sync to x264's version of FMA4 support.
| This partially reverts commit 79687079a97a039c325ab79d7a95920d800b791f.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: Use VEX-encoded instructions in AVX functions | Henrik Gramner | 2013-10-14
| Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4 functions for all instructions that exist in a VEX-encoded version.
| This change makes it easier to extend existing code to use AVX2.
| Also add support for AVX emulation of a few instructions that were missing before.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: Remove .rodata kludges | Henrik Gramner | 2013-10-09
| The Mach-O bug was fixed in yasm 0.8.0 and we don't support versions that old anymore.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: remove misaligned cpu flag | Henrik Gramner | 2013-10-07
| Prevents a crash if the misaligned exception mask bit is cleared for some reason.
| Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is minuscule. They also require modifying the MXCSR control register, and by removing those functions we can get rid of that complexity altogether.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: various minor backports from x264 | Jason Garrett-Glaser | 2013-10-07
| Small backports that sneaked into other asm commits in x264.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64" | Derek Buitenhuis | 2013-10-07
| This is also a valid value for WIN64.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: Utilize the shadow space on 64-bit Windows | Henrik Gramner | 2013-10-07
| Store XMM6 and XMM7 in the shadow space in functions that clobber them. This way we don't have to adjust the stack pointer as often, reducing the number of instructions as well as code size.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: create xm# and ym#, analogous to m# | Loren Merritt | 2013-10-07
| For when we want to mix simd sizes within one function.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: fix some corner cases of SWAP | Loren Merritt | 2013-10-07
| SWAP with >=3 named (rather than numbered) args
| PERMUTE followed by SWAP with 2 named args
| used to produce the wrong permutation
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: Use SSE instead of SSE2 for copying data | Henrik Gramner | 2013-10-07
| Reduces code size because movaps/movups is one byte shorter than movdqa/movdqu.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: Set ELF hidden visibility for global constants | Henrik Gramner | 2013-10-07
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86inc: activate REP_RET automatically | Loren Merritt | 2013-10-07
| Now RET checks whether it immediately follows a branch, so the programmer doesn't have to keep track of that condition.
| REP_RET is still needed manually when it's a branch target, but that's much rarer.
| The implementation involves lots of spurious labels, but that's OK because we strip them.
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* avutil: Fix compilation with inline asm disabled on mingw | Alex Smith | 2013-09-22
| Because of -Werror=implicit-function-declaration the build will fail.
| Signed-off-by: Martin Storsjö <martin@martin.st>
* x86: Add and use more convenience macros to check CPU extension availability | Diego Biurrun | 2013-08-29
|
* avutil: Refactor CPU extension availability macros | Diego Biurrun | 2013-08-28
|
* avutil: Move internal CPU detection function declarations to private header | Diego Biurrun | 2013-08-28
|
* Consistently use "cpu_flags" as variable/parameter name for CPU flags | Diego Biurrun | 2013-07-18
|
* lls/x86: use 3-operator vaddpd in ADDPD_MEM | Loren Merritt | 2013-07-02
| Fixes build with yasm-1.1
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86: lpc: fix a segfault in av_evaluate_lls_sse2() | Loren Merritt | 2013-06-30
|
* x86: lpc: simd av_evaluate_lls | Loren Merritt | 2013-06-29
| 1.5x-1.8x faster on sandybridge
| Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* x86: lpc: simd av_update_lls | Loren Merritt | 2013-06-29
| 4x-6x faster on sandybridge
| Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* avutil: Add av_cold attributes to init functions missing them | Diego Biurrun | 2013-05-04
|
* x86: float dsp: butterflies_float SSE | Christophe Gisquet | 2013-05-03
| 97c -> 49c
| Some codecs could benefit from more unrolling, but AAC doesn't.
* dsputil: Make dsputil selectable | Ronald S. Bultje | 2013-04-10
| Signed-off-by: Martin Storsjö <martin@martin.st>
* x86inc: Fix number of operands for cmp* instructions | Christophe Gisquet | 2013-04-09
| cmp{p,s}{s,d} instructions do take an imm8 operand.
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
* cosmetics: Remove unnecessary extern keywords from function declarations | Diego Biurrun | 2013-03-27
|
* x86: Use simple nop codes for <= sse (rather than <= mmx) | Ronald S. Bultje | 2013-02-19
| The "CentaurHauls family 6 model 9 stepping 8" family of CPUs (flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse up rng rng_en ace ace_en) SIGILLs on long nop codes.
| Signed-off-by: Martin Storsjö <martin@martin.st>
* avutil: Ensure that emms_c is always defined, even on non-x86 | Diego Biurrun | 2013-02-14
|
* avutil: Move emms code to x86-specific header | Diego Biurrun | 2013-02-14
|
* floatdsp: move scalarproduct_float from dsputil to avfloatdsp. | Ronald S. Bultje | 2013-01-22
| This makes the aac decoder and all voice codecs independent of dsputil.
* floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp. | Ronald S. Bultje | 2013-01-22
| Now, nellymoserenc and aacenc no longer depend on dsputil. Independent of this patch, wmaprodec also does not depend on dsputil, so I removed it from there also.
* floatdsp: move vector_fmul_add from dsputil to avfloatdsp. | Ronald S. Bultje | 2013-01-22
|
* x86: Add a Yasm-based emms() replacement | Martin Storsjö | 2013-01-18
| This provides a fallback when building with Yasm enabled, but neither inline assembly nor the _mm_empty intrinsic is available or enabled.
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
* x86inc: Add cvisible macro for C functions with public prefix | Diego Biurrun | 2013-01-18
| This allows defining externally visible library symbols.
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
* x86inc: Rename "program_name" to "private_prefix" | Diego Biurrun | 2013-01-18
| The new name is more descriptive and will allow defining a separate public prefix for externally visible library symbols.
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
* float_dsp: Add #ifdef HAVE_INLINE_ASM around vector_fmul_window | Martin Storsjö | 2013-01-17
| This fixes builds on 64bit MSVC.
| Signed-off-by: Martin Storsjö <martin@martin.st>
* lavc: Move vector_fmul_window to AVFloatDSPContext | Justin Ruggles | 2013-01-16
| Signed-off-by: Luca Barbato <lu_zero@gentoo.org>