| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
| |
ff_get_cpu_flags_x86() requires cpuid(), which is conditionally defined
elsewhere in the file. Surrounding the function body with ifdefs allows
building even when cpuid is not defined. An empty cpuflags mask is
returned in this case.
|
|
|
|
|
|
| |
Now that there is CPU detection in YASM, there will always be one of
inline or external assembly enabled, which obviates the need to fall
back on CPU detection through compiler intrinsics.
|
|
|
|
|
| |
This allows detecting CPU features with builds that have neither
gcc inline assembly nor the right compiler intrinsics enabled.
|
| |
|
| |
|
|
|
|
|
| |
This separates code relying on inline from that relying on external
assembly and fixes instances where the coalesced check was incorrect.
|
|
|
|
|
| |
The SWAP macro does not work for explicit xmm/ymm usage, so instead just move
the scalar value from xmm2 to xmm0.
|
| |
|
| |
|
| |
|
|
|
|
|
| |
13% faster on penryn, 16% on sandybridge, 15% on bulldozer
Not simd; a compiler should have generated this, but gcc didn't.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GCC 4.3 and later do the right thing with the plain C code. Earlier
versions in 32-bit mode generate one extra instruction, needlessly
zeroing what would be the high half of the shifted value. At least
two gcc configurations miscompile the inline asm in some situations.
In 64-bit mode, all gcc versions generate imul r64, r64 followed by
shr. On Intel i7 and later, this imul is faster 32-bit mul. On
older Intel and all AMD, it is slightly slower. On Atom it is much
slower.
Considering where the FASTDIV macro is used, any overall negative
performance impact of this change should be negligible. If anyone
cares, they should file a bug against gcc and get the instruction
selection fixed.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
|
|
| |
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
| |
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
| |
These x86-specific macros do not belong in generic code.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
|
|
|
|
|
| |
This puts x86-specific things in the x86/ subdirectory where they
belong.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
|
|
|
|
|
|
| |
It appears that something goes wrong in old nasm versions when the
%+ operator is used in the last argument of a macro invocation and
this argument is tested with %ifdef within the macro. This patch
rearranges the macro arguments such that the %+ operator is never
used in the last argument.
|
|
|
|
|
|
|
| |
nasm does not support 'CPU foonop' directives. This adds a configure
test for the directive and uses it only if supported.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
|
|
|
|
| |
For some reason, nasm requires this. No harm done to yasm.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
|
|
|
|
| |
nasm prints a warning if the colon is missing.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
|
|
|
|
|
| |
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
|
|
|
|
|
|
| |
Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
|
|
|
|
|
|
|
|
| |
This allows us to unconditionally set the cglobal num_args
parameter to a bigger value, thus making writing yasm code
even easier than before.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
| |
|
| |
|
| |
|
|
|
|
| |
Simplifies pshufb masks that operate on words.
|
|
|
|
|
| |
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
| |
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
| |
Signed-off-by: Martin Storsjö <martin@martin.st>
|
| |
|
|
|
|
| |
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
|
|
|
| |
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
|
|
|
|
|
|
| |
This adds macros for accessing the EFLAGS register and uses
these instead of coding the entire check in inline asm.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
|
|
|
|
|
|
|
| |
This adds whitespace around operators, aligns line continuation
backslashes, and breaks long lines. Also fixes an ifdef halfway
through a statement. The one line of duplication this saved is
not worth the ugliness.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
|
|
| |
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
| |
|
| |
|
|
|
|
| |
Move vector_fmul() from DSPContext to AVFloatDSPContext.
|
|
|
|
| |
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
|
|
|
|
| |
The current SSE version is slower than the MMX version on Athlon64 and Sandy
Bridge, but the SSE4 and AVX versions are faster on Sandy Bridge.
|
|
|
|
|
| |
This is a new library for audio sample format, channel layout, and sample rate
conversion.
|
|
|
|
|
|
| |
Add cvtdq2ps and cvtps2dq to the AVX instruction list.
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments
Also (by Ronald S. Bultje)
Fix up our asm to work with new x86inc.asm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
|
| |
|
|
|
|
| |
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
|
|
|
|
|
| |
This sets __OUTPUT_FORMAT__ to win64 instead of win32, even though both
(through -m amd64) produce 64-bit binary code.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
|
|
|
|
|
|
| |
Functions using INIT_MMX may still access XMM registers through direct
means (xmm0-15). Therefore, they still need to be marked for clobber
so they can be properly saved/restored.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|