path: root/libavutil/x86
Commit message [Author, Age]
* x86: get_cpu_flags: add necessary ifdefs around function body [Diego Biurrun, 2012-10-04]
| ff_get_cpu_flags_x86() requires cpuid(), which is conditionally defined elsewhere in the file. Surrounding the function body with ifdefs allows building even when cpuid is not defined. An empty cpuflags mask is returned in this case.
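The pattern described in this commit can be sketched as follows. This is a minimal illustration, not the actual libavutil code: `HAVE_CPUID_SKETCH`, `cpuid_sketch()`, and `get_cpu_flags_sketch()` are hypothetical names, and the flag value `0x0001` is made up for the example.

```c
#include <stdint.h>

/* Hypothetical configuration switch standing in for "cpuid is defined". */
#ifndef HAVE_CPUID_SKETCH
#define HAVE_CPUID_SKETCH 0
#endif

#if HAVE_CPUID_SKETCH
/* Hypothetical helper, assumed to be provided elsewhere when enabled. */
void cpuid_sketch(uint32_t leaf, uint32_t regs[4]);
#endif

/* The whole detection body sits inside the #if, so the function still
 * compiles and links when no cpuid helper exists; it then returns an
 * empty flag mask. */
int get_cpu_flags_sketch(void)
{
    int rval = 0;
#if HAVE_CPUID_SKETCH
    uint32_t regs[4];
    cpuid_sketch(1, regs);
    if (regs[3] & (1u << 23))   /* CPUID leaf 1, EDX bit 23: MMX */
        rval |= 0x0001;         /* made-up flag value for this sketch */
#endif
    return rval;
}
```

With the switch left at 0, callers simply see a mask of 0 and fall back to plain C code paths.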
* x86: Drop CPU detection intrinsics [Diego Biurrun, 2012-10-04]
| Now that there is CPU detection in YASM, there will always be one of inline or external assembly enabled, which obviates the need to fall back on CPU detection through compiler intrinsics.
* x86: Add YASM implementations of cpuid and xgetbv from x264 [Diego Biurrun, 2012-10-04]
| This allows detecting CPU features with builds that have neither gcc inline assembly nor the right compiler intrinsics enabled.
* x86: cpu: Break out test for cpuid capabilities into separate function [Diego Biurrun, 2012-10-04]
|
* x86: ff_get_cpu_flags_x86(): Avoid a pointless variable indirection [Diego Biurrun, 2012-10-04]
|
* x86: Replace checks for CPU extensions and flags by convenience macros [Diego Biurrun, 2012-09-08]
| This separates code relying on inline from that relying on external assembly and fixes instances where the coalesced check was incorrect.
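A rough sketch of what such convenience macros might look like is given below. All names and values here (`SKETCH_*`, `pick_impl_sketch()`) are hypothetical and chosen for illustration; the point is only that each macro couples a build-time switch (inline or external assembly available) with a runtime CPU flag, so the two kinds of assembly are checked separately.

```c
/* Made-up runtime CPU flag bits for this sketch. */
#define SKETCH_CPU_FLAG_MMX  0x0001
#define SKETCH_CPU_FLAG_SSE2 0x0010

/* Made-up build-time switches: external (assembler) MMX/SSE2 code is
 * available in this build, inline MMX assembly is not. */
#define SKETCH_HAVE_MMX_EXTERNAL  1
#define SKETCH_HAVE_MMX_INLINE    0
#define SKETCH_HAVE_SSE2_EXTERNAL 1

/* One macro per (extension, assembly kind) pair. */
#define SKETCH_EXTERNAL_MMX(flags) \
    (SKETCH_HAVE_MMX_EXTERNAL && ((flags) & SKETCH_CPU_FLAG_MMX))
#define SKETCH_INLINE_MMX(flags) \
    (SKETCH_HAVE_MMX_INLINE && ((flags) & SKETCH_CPU_FLAG_MMX))
#define SKETCH_EXTERNAL_SSE2(flags) \
    (SKETCH_HAVE_SSE2_EXTERNAL && ((flags) & SKETCH_CPU_FLAG_SSE2))

/* Returns 2 for the SSE2 path, 1 for the MMX path, 0 for plain C. */
int pick_impl_sketch(int flags)
{
    if (SKETCH_EXTERNAL_SSE2(flags))
        return 2;
    if (SKETCH_EXTERNAL_MMX(flags))
        return 1;
    return 0;
}
```

Because the inline and external variants are distinct macros, a build without inline assembly cannot accidentally select an inline-only code path just because the CPU flag is set.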
* x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64 [Justin Ruggles, 2012-09-07]
| The SWAP macro does not work for explicit xmm/ymm usage, so instead just move the scalar value from xmm2 to xmm0.
* x86: Add convenience macros to check for CPU extensions and flags [Diego Biurrun, 2012-09-04]
|
* x86: Split inline and external assembly #ifdefs [Diego Biurrun, 2012-08-31]
|
* x86: cosmetics: Comment some #endifs for better readability [Diego Biurrun, 2012-08-30]
|
* vf_hqdn3d: x86 asm [Loren Merritt, 2012-08-26]
| 13% faster on penryn, 16% on sandybridge, 15% on bulldozer.
| Not simd; a compiler should have generated this, but gcc didn't.
* lavr: x86: optimized 6-channel s16 to fltp conversion [Justin Ruggles, 2012-08-23]
|
* x86: remove FASTDIV inline asm [Mans Rullgard, 2012-08-22]
| GCC 4.3 and later do the right thing with the plain C code. Earlier versions in 32-bit mode generate one extra instruction, needlessly zeroing what would be the high half of the shifted value.
|
| At least two gcc configurations miscompile the inline asm in some situations.
|
| In 64-bit mode, all gcc versions generate imul r64, r64 followed by shr. On Intel i7 and later, this imul is faster than the 32-bit mul. On older Intel and all AMD, it is slightly slower. On Atom it is much slower.
|
| Considering where the FASTDIV macro is used, any overall negative performance impact of this change should be negligible. If anyone cares, they should file a bug against gcc and get the instruction selection fixed.
|
| Signed-off-by: Mans Rullgard <mans@mansr.com>
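The "plain C code" in question is a multiply by a reciprocal followed by a shift, which is where the imul/shr pair mentioned above comes from. The following is a minimal sketch of that technique, not the actual FASTDIV macro: `inverse_sketch()` and `fastdiv_sketch()` are hypothetical names, and the reciprocal is computed on the fly here, whereas a real implementation would presumably read it from a precomputed table.

```c
#include <stdint.h>

/* ceil(2^32 / b) for b >= 1; kept in 64 bits because b == 1 yields 2^32.
 * Assumption: a real implementation would look this up in a table. */
uint64_t inverse_sketch(uint32_t b)
{
    return ((UINT64_C(1) << 32) + b - 1) / b;
}

/* Approximates a / b as (a * ceil(2^32 / b)) >> 32. The result equals
 * the true quotient as long as a * b stays well below 2^32, which holds
 * for the small operands this kind of macro is typically used with. */
uint32_t fastdiv_sketch(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * inverse_sketch(b)) >> 32);
}
```

For example, `fastdiv_sketch(100, 7)` multiplies 100 by 613566757 and shifts right by 32, giving 14.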
* Add more missing includes after removing the implicit common.h [Martin Storsjö, 2012-08-16]
| Signed-off-by: Martin Storsjö <martin@martin.st>
* Add some more missing includes after removing the implicit common.h [Martin Storsjö, 2012-08-15]
| Signed-off-by: Martin Storsjö <martin@martin.st>
* x86: move MANGLE() and related macros to libavutil/x86/asm.h [Mans Rullgard, 2012-08-09]
| These x86-specific macros do not belong in generic code.
| Signed-off-by: Mans Rullgard <mans@mansr.com>
* x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h [Mans Rullgard, 2012-08-09]
| This puts x86-specific things in the x86/ subdirectory where they belong.
| Signed-off-by: Mans Rullgard <mans@mansr.com>
* x86: fix build with nasm 2.08 [Mans Rullgard, 2012-08-07]
| It appears that something goes wrong in old nasm versions when the %+ operator is used in the last argument of a macro invocation and this argument is tested with %ifdef within the macro. This patch rearranges the macro arguments such that the %+ operator is never used in the last argument.
* x86: use nop cpu directives only if supported [Mans Rullgard, 2012-08-07]
| nasm does not support 'CPU foonop' directives. This adds a configure test for the directive and uses it only if supported.
| Signed-off-by: Mans Rullgard <mans@mansr.com>
* x86: fix rNmp macros with nasm [Mans Rullgard, 2012-08-07]
| For some reason, nasm requires this. No harm done to yasm.
| Signed-off-by: Mans Rullgard <mans@mansr.com>
* x86: add colons after labels [Mans Rullgard, 2012-08-07]
| nasm prints a warning if the colon is missing.
| Signed-off-by: Mans Rullgard <mans@mansr.com>
* x86: build: replace mmx2 by mmxext [Diego Biurrun, 2012-08-03]
| Refactoring mmx2/mmxext YASM code with cpuflags will force renames. So switching to a consistent naming scheme beforehand is sensible. The name "mmxext" is more official and widespread and also the name of the CPU flag, as reported e.g. by the Linux kernel.
* x86: Use consistent 3dnowext function and macro name suffixes [Diego Biurrun, 2012-08-03]
| Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to "3dnowext", which is a more common name of the CPU flag, as reported e.g. by the Linux kernel, unifies this.
* x86inc: clip num_args to 7 on x86-32. [Loren Merritt, 2012-07-28]
| This allows us to unconditionally set the cglobal num_args parameter to a bigger value, thus making writing yasm code even easier than before.
| Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* x86inc: sync to latest version from x264. [Ronald S. Bultje, 2012-07-28]
|
* x86: add support for fmaddps fma4 instruction with abstraction to avx/sse [Justin Ruggles, 2012-07-27]
|
* x86inc: automatically insert vzeroupper for YMM functions. [Ronald S. Bultje, 2012-07-26]
|
* dsputil: x86: add SHUFFLE_MASK_W macro [Jason Garrett-Glaser, 2012-07-22]
| Simplifies pshufb masks that operate on words.
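To illustrate why such a macro helps: pshufb takes a mask of 16 byte indices, so shuffling eight 16-bit words means writing every index twice, once per byte. The C sketch below shows that expansion under the assumption that word index n maps to byte indices 2n and 2n+1 (little-endian word layout). `shuffle_mask_w_sketch()` is a hypothetical name, and the actual macro is written in assembler syntax, not C.

```c
#include <stdint.h>

/* Expands eight word indices (each 0..7) into the 16 byte indices a
 * pshufb-style byte shuffle would need to move whole 16-bit words. */
void shuffle_mask_w_sketch(const uint8_t words[8], uint8_t mask[16])
{
    for (int i = 0; i < 8; i++) {
        mask[2 * i]     = (uint8_t)(words[i] * 2);     /* low byte of word  */
        mask[2 * i + 1] = (uint8_t)(words[i] * 2 + 1); /* high byte of word */
    }
}
```

Reversing the word order, for instance, takes the eight indices 7,6,5,4,3,2,1,0 instead of the sixteen byte indices 14,15,12,13 and so on.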
* x86/cpu: implement get/set_eflags using intrinsics [Ronald S. Bultje, 2012-07-10]
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
| Signed-off-by: Martin Storsjö <martin@martin.st>
* x86/cpu: implement support for cpuid through intrinsics [Ronald S. Bultje, 2012-07-10]
| Signed-off-by: Martin Storsjö <martin@martin.st>
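As a rough illustration of querying cpuid without hand-written assembly, the sketch below uses the `__get_cpuid()` helper from the `<cpuid.h>` header shipped with GCC and Clang. This is an assumption made for the example: the commit may well target a different compiler's intrinsics, and `has_sse2_sketch()` is a hypothetical name.

```c
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#define SKETCH_HAVE_CPUID_H 1
#else
#define SKETCH_HAVE_CPUID_H 0
#endif

/* Returns 1 if CPUID leaf 1 reports SSE2 (EDX bit 26), else 0.
 * On non-x86 targets, or compilers without <cpuid.h>, it returns 0. */
int has_sse2_sketch(void)
{
#if SKETCH_HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;               /* leaf 1 not supported */
    return (int)((edx >> 26) & 1);
#else
    return 0;
#endif
}
```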
* x86/cpu: implement support for xgetbv through intrinsics [Ronald S. Bultje, 2012-07-10]
| Signed-off-by: Martin Storsjö <martin@martin.st>
* x86/timer: implement an intrinsic-based version for rdtsc (AV_READ_TIME). [Ronald S. Bultje, 2012-07-07]
|
* x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros [Loren Merritt, 2012-07-05]
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
* x86inc: modify ALIGN to not generate long nops on i586 [Loren Merritt, 2012-07-05]
| Signed-off-by: Diego Biurrun <diego@biurrun.de>
* x86: cpu: clean up check for cpuid instruction support [Mans Rullgard, 2012-07-01]
| This adds macros for accessing the EFLAGS register and uses these instead of coding the entire check in inline asm.
| Signed-off-by: Mans Rullgard <mans@mansr.com>
* x86: cpu: whitespace (mostly) cosmetics [Mans Rullgard, 2012-06-25]
| This adds whitespace around operators, aligns line continuation backslashes, and breaks long lines.
| Also fixes an ifdef halfway through a statement. The one line of duplication this saved is not worth the ugliness.
| Signed-off-by: Mans Rullgard <mans@mansr.com>
* x86: place some inline asm under #if HAVE_INLINE_ASM [Ronald S. Bultje, 2012-06-25]
| Signed-off-by: Mans Rullgard <mans@mansr.com>
* x86: Add CPU flag for the i686 cmov instruction [Diego Biurrun, 2012-06-23]
|
* float_dsp: add x86-optimized functions for vector_fmac_scalar() [Justin Ruggles, 2012-06-18]
|
* Add a float DSP framework to libavutil [Justin Ruggles, 2012-06-08]
| Move vector_fmul() from DSPContext to AVFloatDSPContext.
* x86: Avoid movs on BUTTERFLYPS when in AVX mode [Vitor Sessak, 2012-05-29]
| Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* lavr: replace the SSE version of ff_conv_fltp_to_flt_6ch() with SSE4 and AVX [Justin Ruggles, 2012-05-09]
| The current SSE version is slower than the MMX version on Athlon64 and Sandy Bridge, but the SSE4 and AVX versions are faster on Sandy Bridge.
* Add libavresample [Justin Ruggles, 2012-04-24]
| This is a new library for audio sample format, channel layout, and sample rate conversion.
* x86inc: support AVX abstraction for 2-operand instructions [Loren Merritt, 2012-04-18]
| Add cvtdq2ps and cvtps2dq to the AVX instruction list.
| Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
* build: Move all arch OBJS declarations into arch subdirectory Makefiles. [Diego Biurrun, 2012-04-12]
|
* x86inc improvements for 64-bit [Henrik Gramner, 2012-04-11]
| Add support for all x86-64 registers
| Prefer caller-saved register over callee-saved on WIN64
| Support up to 15 function arguments
|
| Also (by Ronald S. Bultje)
| Fix up our asm to work with new x86inc.asm.
|
| Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
| Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
* x86inc: add *mp named argument support to DEFINE_ARGS. [Ronald S. Bultje, 2012-03-14]
|
* x86inc: don't "bake" stack_offset in named arguments. [Loren Merritt, 2012-03-03]
| Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* x86inc: support yasm -f win64 flag also. [Haruhiko Yamagata, 2012-02-08]
| This sets __OUTPUT_FORMAT__ to win64 instead of win32, even though both (through -m amd64) produce 64-bit binary code.
| Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* x86inc: allow manual use of WIN64_SPILL_XMM. [Henrik Gramner, 2012-02-08]
| Functions using INIT_MMX may still access XMM registers through direct means (xmm0-15). Therefore, they still need to be marked for clobber so they can be properly saved/restored.
| Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>