~3x to 5x faster.
Signed-off-by: James Almer <jamrial@gmail.com>

Signed-off-by: James Almer <jamrial@gmail.com>

Should fix compilation with old yasm/nasm
Signed-off-by: James Almer <jamrial@gmail.com>

add ff_pixelutils_sad_32x32_sse2, ff_pixelutils_sad_{a,u}_32x32_sse2,
ff_pixelutils_sad_32x32_avx2, ff_pixelutils_sad_{a,u}_32x32_avx2
Profiled with perf record/report; instructions:u for avx2 sad_32x32:
72.05% pixelutils pixelutils [.] block_sad_32x32_c
18.50% pixelutils pixelutils [.] block_sad_16x16_c
4.78% pixelutils pixelutils [.] block_sad_8x8_c
2.69% pixelutils pixelutils [.] block_sad_4x4_c
0.89% pixelutils pixelutils [.] block_sad_2x2_c
0.16% pixelutils pixelutils [.] ff_pixelutils_sad_32x32_avx2
0.16% pixelutils pixelutils [.] ff_pixelutils_sad_u_32x32_avx2
0.12% pixelutils pixelutils [.] ff_pixelutils_sad_a_32x32_avx2
instructions:u for sse2 sad_32x32:
71.86% pixelutils pixelutils [.] block_sad_32x32_c
18.42% pixelutils pixelutils [.] block_sad_16x16_c
4.81% pixelutils pixelutils [.] block_sad_8x8_c
2.68% pixelutils pixelutils [.] block_sad_4x4_c
0.88% pixelutils pixelutils [.] block_sad_2x2_c
0.29% pixelutils pixelutils [.] ff_pixelutils_sad_32x32_sse2
0.26% pixelutils pixelutils [.] ff_pixelutils_sad_u_32x32_sse2
0.23% pixelutils pixelutils [.] ff_pixelutils_sad_a_32x32_sse2
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
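For reference, these SSE2/AVX2 routines compute a plain sum of absolute differences over a 32x32 block of 8-bit pixels. The C sketch below is a scalar equivalent. It assumes the pixelutils convention of two source pointers, each with its own stride. `sad_32x32_ref` is a hypothetical name, not a function from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference: sum of absolute differences (SAD)
 * between two 32x32 blocks of 8-bit pixels, each with its own stride. */
static int sad_32x32_ref(const uint8_t *src1, ptrdiff_t stride1,
                         const uint8_t *src2, ptrdiff_t stride2)
{
    int sum = 0; /* at most 32 * 32 * 255 = 261120, fits in int */
    for (int y = 0; y < 32; y++) {
        for (int x = 0; x < 32; x++)
            sum += abs(src1[x] - src2[x]); /* |a - b| per pixel */
        src1 += stride1;                   /* advance both blocks one row */
        src2 += stride2;
    }
    return sum;
}
```

SIMD versions of SAD usually build on `psadbw`, which sums the absolute differences of eight unsigned bytes into each 64-bit lane. One 32-pixel row therefore maps onto a single 32-byte AVX2 load, or two 16-byte SSE2 loads, per source.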
Signed-off-by: Jun Zhao <mypopydev@gmail.com>

* commit '4cf84e254ae75b524e1cacae499a97d7cc9e5906':
Drop some unnecessary config.h #includes
Merged-by: James Almer <jamrial@gmail.com>

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>

On ELF platforms such symbols need to be flagged as functions with the
correct visibility to please certain linkers in some scenarios.

The standard section for read-only data on Windows is .rdata. Nasm will
flag non-standard sections as executable by default, which isn't ideal.

There are 32 pseudo-instructions for each floating-point comparison
instruction, but only 8 of them are actually valid in legacy-encoded mode.
The remaining 24 require the use of VEX-encoded (v-prefixed) instructions
and can therefore be disregarded for this purpose.
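To make the numbers concrete, the legacy SSE encodings accept only predicate immediates 0-7. These correspond to the cmpeq/cmplt/cmple/cmpunord/cmpneq/cmpnlt/cmpnle/cmpord forms. Immediates 8-31 (for example cmpeq_uqps = 8 or cmpge_osps = 13) exist only for the VEX-encoded vcmp* instructions. The C sketch below models one result lane of the eight legacy predicates, including their NaN behavior. The enum and function names are illustrative and are not part of x86inc.

```c
#include <math.h>
#include <stdbool.h>

/* The 8 predicate immediates encodable in legacy (non-VEX)
 * cmpps/cmppd/cmpss/cmpsd; values 8-31 require VEX-encoded vcmp*. */
enum legacy_cmp_pred {
    CMP_EQ    = 0, /* a == b, false if either operand is NaN  */
    CMP_LT    = 1, /* a <  b, false if either operand is NaN  */
    CMP_LE    = 2, /* a <= b, false if either operand is NaN  */
    CMP_UNORD = 3, /* true if either operand is NaN           */
    CMP_NEQ   = 4, /* !(a == b), true if either is NaN        */
    CMP_NLT   = 5, /* !(a <  b), true if either is NaN        */
    CMP_NLE   = 6, /* !(a <= b), true if either is NaN        */
    CMP_ORD   = 7, /* true if neither operand is NaN          */
};

/* Scalar model of one lane: true means the lane is set to all ones. */
static bool legacy_cmp(float a, float b, enum legacy_cmp_pred p)
{
    bool unord = isnan(a) || isnan(b);
    switch (p) {
    case CMP_EQ:    return !unord && a == b;
    case CMP_LT:    return !unord && a < b;
    case CMP_LE:    return !unord && a <= b;
    case CMP_UNORD: return unord;
    case CMP_NEQ:   return unord || a != b;
    case CMP_NLT:   return unord || !(a < b);
    case CMP_NLE:   return unord || !(a <= b);
    case CMP_ORD:   return !unord;
    }
    return false;
}
```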
but not used
Fixes compilation of libavresample/x86/audio_mix.asm
Reviewed-by: Gramner
Signed-off-by: James Almer <jamrial@gmail.com>

AVX-512 consists of a plethora of different extensions, but in order to keep
things a bit more manageable we group together the following extensions
under a single baseline cpu flag which should cover SKL-X and future CPUs:
* AVX-512 Foundation (F)
* AVX-512 Conflict Detection Instructions (CD)
* AVX-512 Byte and Word Instructions (BW)
* AVX-512 Doubleword and Quadword Instructions (DQ)
* AVX-512 Vector Length Extensions (VL)
On x86-64, AVX-512 provides 16 additional vector registers. Prefer using
those over existing ones, since this allows us to avoid using `vzeroupper`
unless more than 16 vector registers are required. They also happen to
be volatile on Windows, which means that we don't need to save and restore
existing xmm register contents unless more than 22 vector registers are
required (xmm0-xmm5 and xmm16-xmm31 are volatile in the Win64 ABI, while
xmm6-xmm15 are callee-saved).
Big thanks to Intel for their support.
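The grouping can be illustrated with a small C sketch that tests all the relevant CPUID leaf 7 feature bits at once. It also checks that the OS has enabled the AVX-512 register state in XCR0. Reading the raw CPUID and XGETBV values is left to the caller. The function and macro names are hypothetical and do not come from FFmpeg's CPU detection code.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EBX bits for the grouped AVX-512 subsets. */
#define AVX512F_BIT  (UINT32_C(1) << 16)
#define AVX512DQ_BIT (UINT32_C(1) << 17)
#define AVX512CD_BIT (UINT32_C(1) << 28)
#define AVX512BW_BIT (UINT32_C(1) << 30)
#define AVX512VL_BIT (UINT32_C(1) << 31)

/* XCR0 state the OS must save/restore: SSE (bit 1), AVX (bit 2),
 * opmask (bit 5), ZMM_Hi256 (bit 6) and Hi16_ZMM (bit 7). */
#define XCR0_AVX512_STATE UINT64_C(0xE6)

/* True only if every subset in the group is present and usable. */
static bool has_avx512_baseline(uint32_t cpuid7_ebx, uint64_t xcr0)
{
    const uint32_t need = AVX512F_BIT | AVX512DQ_BIT | AVX512CD_BIT |
                          AVX512BW_BIT | AVX512VL_BIT;
    return (cpuid7_ebx & need) == need &&
           (xcr0 & XCR0_AVX512_STATE) == XCR0_AVX512_STATE;
}
```

Requiring all five bits together means a CPU with only part of the set, such as F and CD without BW/DQ/VL, is treated as having no AVX-512 baseline at all.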
each part of a ymm register in order to simplify avx2 asm functions

Technically _tzcnt* intrinsics are only available when the BMI
instruction set is present. However the instruction encoding
degrades to "rep bsf" on older processors.
Clang for Windows debatably restricts the _tzcnt* intrinsics behind
the __BMI__ architecture define, so check for its presence or
exclude the usage of these intrinsics when clang is present.
See also:
https://ffmpeg.org/pipermail/ffmpeg-devel/2015-November/183404.html
https://bugs.llvm.org/show_bug.cgi?id=30506
http://lists.llvm.org/pipermail/cfe-dev/2016-October/051034.html
Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Matt Oliver <protogonoi@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
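Here is a minimal sketch of the guard described above, written as a hypothetical `ctz32()` helper. The `_tzcnt_u32` path is taken only for MSVC when clang is absent or `__BMI__` is defined. Other compilers fall back to a builtin or a loop. A zero input is excluded because TZCNT returns 32 for it, while BSF leaves the result undefined.

```c
#include <assert.h>

#if defined(_MSC_VER) && (!defined(__clang__) || defined(__BMI__))
#include <immintrin.h>
/* MSVC, or clang-cl with BMI enabled: _tzcnt_u32 is usable. On CPUs
 * without BMI the TZCNT encoding executes as "rep bsf", which gives
 * the same result for non-zero inputs. */
static inline int ctz32(unsigned v)
{
    assert(v != 0);
    return (int)_tzcnt_u32(v);
}
#elif defined(__GNUC__) || defined(__clang__)
/* GCC/Clang (including clang-cl without BMI): builtin, undefined for 0. */
static inline int ctz32(unsigned v)
{
    assert(v != 0);
    return __builtin_ctz(v);
}
#else
/* Portable fallback: count zero bits from the least significant end. */
static inline int ctz32(unsigned v)
{
    int n = 0;
    assert(v != 0);
    while (!(v & 1u)) {
        v >>= 1;
        n++;
    }
    return n;
}
#endif
```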
* commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2':
x86util: Port all macros to cpuflags
See d5f8a642f6eb1c6e305c41dabddd0fd36ffb3f77
Merged-by: James Almer <jamrial@gmail.com>

Also do some small cosmetic changes: Drop pointless _MMX suffix from ABSD2
macro name, drop pointless check for MMX support (we always assume MMX is
available in our SIMD code), fix spelling.

None of them are specific to the YASM assembler.

Signed-off-by: James Almer <jamrial@gmail.com>

* commit '7abdd026df6a9a52d07d8174505b33cc89db7bf6':
asm: Consistently uppercase SECTION markers
Merged-by: James Almer <jamrial@gmail.com>

When allocating stack space with an alignment requirement that is larger
than the current stack alignment we need to store a copy of the original
stack pointer in order to be able to restore it later.
If we choose to use another register for this purpose, we should not pick
eax/rax since it can be overwritten as a return value.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
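The arithmetic behind this can be shown with a small C simulation of a downward-growing stack. After subtracting the requested size and rounding down to the alignment, the distance back to the original stack pointer depends on where the stack happened to be. Adding `size` back therefore does not restore it, and a saved copy is needed. The struct and function names are hypothetical. This is not x86inc's macro code.

```c
#include <stddef.h>
#include <stdint.h>

/* Simulated frame setup for a stack that grows downwards. */
struct frame {
    uintptr_t saved_sp;   /* original stack pointer, kept for restoring */
    uintptr_t aligned_sp; /* stack pointer after allocation + alignment */
};

/* align must be a power of two. */
static struct frame alloc_aligned_stack(uintptr_t sp, size_t size, size_t align)
{
    struct frame f;
    f.saved_sp = sp;
    /* Reserve `size` bytes, then round down to the alignment boundary;
     * the rounding removes a data-dependent number of extra bytes. */
    f.aligned_sp = (sp - size) & ~(uintptr_t)(align - 1);
    return f;
}

/* Restoring must use the saved copy, not aligned_sp + size. */
static uintptr_t release_aligned_stack(struct frame f)
{
    return f.saved_sp;
}
```

For example, with sp = 0x1008, size = 0x40 and 32-byte alignment, the aligned stack pointer is 0xFC0. Adding 0x40 back gives 0x1000, not 0x1008. In the asm the saved copy lives in a register, and per the commit message that register should not be eax/rax, since the return value is placed there.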
Improved version of VBROADCASTSS that works like the avx2 instruction.
Emulation of vpbroadcastd.
Horizontal sum HSUMPS that places the result in all elements.
Emulation of blendvps and pblendvb.
Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
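Two of these helpers, the all-lanes horizontal sum and the blendvps emulation, can be sketched with SSE2 intrinsics. This is an illustrative C equivalent for 4-wide float vectors, not the x86util macros themselves. `hsum_ps_all` and `blendv_ps_sse2` are hypothetical names.

```c
#include <emmintrin.h> /* SSE2 */

/* Horizontal sum with the result in all four lanes (HSUMPS-style). */
static __m128 hsum_ps_all(__m128 x)
{
    /* Swap 64-bit halves: t = {x0+x2, x1+x3, x2+x0, x3+x1}. */
    __m128 t = _mm_add_ps(x, _mm_shuffle_ps(x, x, _MM_SHUFFLE(1, 0, 3, 2)));
    /* Swap neighbours within each half: every lane = x0+x1+x2+x3. */
    return _mm_add_ps(t, _mm_shuffle_ps(t, t, _MM_SHUFFLE(2, 3, 0, 1)));
}

/* blendvps emulation: lanes whose mask sign bit is set take b, others a.
 * blendvps only inspects the sign bit, so it is first spread over the
 * whole lane with an arithmetic shift before the and/andnot/or select. */
static __m128 blendv_ps_sse2(__m128 a, __m128 b, __m128 mask)
{
    __m128 m = _mm_castsi128_ps(_mm_srai_epi32(_mm_castps_si128(mask), 31));
    return _mm_or_ps(_mm_and_ps(m, b), _mm_andnot_ps(m, a));
}
```

Placing the sum in every lane avoids a separate broadcast when the total is used as a vector afterwards.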
Yasm:
src/libavfilter/x86/af_volume.asm:24: warning: Standard COFF does not support read-only data sections
src/libavfilter/x86/af_volume.asm:24: warning: Unrecognized qualifier `align'
Nasm:
src/libavfilter/x86/af_volume.asm:24: error: standard COFF does not support section alignment specification
src/libavutil/x86/x86inc.asm:92: ... from macro `SECTION_RODATA' defined here
Tested-by: Clément Bœsch <u@pkh.me>
Signed-off-by: James Almer <jamrial@gmail.com>

None of them are specific to the YASM assembler.
(Cherry-picked from libav commit 39e208f4d4756367c7cd2d581847e0c1b8a429c1)
Signed-off-by: James Almer <jamrial@gmail.com>

About 2x faster than the C version.

Simplifies writing assembly code that depends on available instructions.
LZCNT implies SSE2
BMI1 implies AVX+LZCNT
AVX2 implies BMI2
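One way to picture these implications, loosely modeled on how x86inc cpuflags work, is to give each flag a value that already includes the bits of the flags it implies. Testing for a flag then means checking that all of its bits are set. The flag names and bit positions are hypothetical, and only the three implications listed above are encoded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cumulative flag values: each one carries the bits of
 * every flag it implies, so enabling a flag is a single assignment. */
#define CPUFLAG_SSE2  (UINT32_C(1) << 0)
#define CPUFLAG_AVX   (UINT32_C(1) << 1)
#define CPUFLAG_LZCNT ((UINT32_C(1) << 2) | CPUFLAG_SSE2)
#define CPUFLAG_BMI1  ((UINT32_C(1) << 3) | CPUFLAG_AVX | CPUFLAG_LZCNT)
#define CPUFLAG_BMI2  (UINT32_C(1) << 4)
#define CPUFLAG_AVX2  ((UINT32_C(1) << 5) | CPUFLAG_BMI2)

/* A flag counts as available only if all of its bits are present. */
static bool has_cpuflag(uint32_t cpuflags, uint32_t flag)
{
    return (cpuflags & flag) == flag;
}
```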
The use of rsp was pretty much hardcoded there and probably didn't work
otherwise with stack_size > 0.

Due to a peculiarity in the ModR/M addressing encoding, the r12 and r13
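registers sometimes require an additional byte when used as a base register:
r12 (like rsp) always needs a SIB byte, and r13 (like rbp) needs an explicit
displacement byte even for a zero offset. r14 and r15 don't have that issue,
so prefer using them.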
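The rule can be written as a small C function that counts the bytes a base register costs on top of the ModR/M byte. Only the low three register bits go into ModR/M.rm, with REX.B supplying the fourth, so r12 and r13 inherit the special cases of rsp and rbp. This is an illustrative model of the encoding with a hypothetical name, not assembler code.

```c
/* Register numbers as in x86-64 encodings: rax=0 ... rdi=7, r8=8 ... r15=15.
 * Returns the extra bytes beyond ModR/M that `reg` costs as a base register
 * compared with an ordinary base, for displacement `disp`. */
static int base_reg_extra_bytes(int reg, int disp)
{
    int low3 = reg & 7; /* REX.B is not visible in ModR/M.rm */
    int extra = 0;
    if (low3 == 4)      /* rsp/r12: rm=100 means "SIB byte follows" */
        extra += 1;
    if (low3 == 5 && disp == 0) /* rbp/r13: mod=00 rm=101 is RIP/disp32, */
        extra += 1;             /* so an explicit disp8 of 0 is needed  */
    return extra;
}
```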
There's no point in emitting a rep prefix before ret on modern CPUs.

We overload the `call` instruction with a macro, but it would misbehave when
the macro argument wasn't a valid identifier. Fix it by explicitly checking
if the argument is an identifier.

~20% faster than AVX.
Signed-off-by: James Almer <jamrial@gmail.com>

* commit '99434f4df81b6801b2b535d5b9143305595784f6':
float_dsp: Have implementation match function pointer prototype
Merged-by: Clément Bœsch <cboesch@gopro.com>

libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 1 different from declaration
libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 2 different from declaration

* commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8':
emms: Give avpriv_emms_yasm() a more general name
Merged-by: James Almer <jamrial@gmail.com>

* commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4':
x86: Add missing colons after assembly labels
Merged-by: James Almer <jamrial@gmail.com>

This fixes many warnings of the sort
warning: label alone on a line without a colon might be in error

are the same
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>

* commit '07e1f99a1bb41d1a615676140eefc85cf69fa793':
x86util: Document SBUTTERFLY macro
Merged-by: Clément Bœsch <u@pkh.me>
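SBUTTERFLY interleaves two registers with a punpckl*/punpckh* pair. The low-half interleave ends up in the first register and the high-half interleave in the second. The C intrinsics sketch below models the dword ("dq") case, assuming the macro follows that usual punpckl/punpckh pattern. `sbutterfly_dq` is a hypothetical helper name.

```c
#include <emmintrin.h> /* SSE2 */

/* C model of a dword-granularity SBUTTERFLY: afterwards *a holds the
 * interleaved low halves and *b the interleaved high halves. */
static void sbutterfly_dq(__m128i *a, __m128i *b)
{
    __m128i lo = _mm_unpacklo_epi32(*a, *b); /* a0 b0 a1 b1 (punpckldq) */
    __m128i hi = _mm_unpackhi_epi32(*a, *b); /* a2 b2 a3 b3 (punpckhdq) */
    *a = lo;
    *b = hi;
}
```

Repeating the butterfly at word, dword and qword granularity is the usual building block for in-register matrix transposes.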
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>

* commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5':
imgutils: add a function for copying image data from GPU mapped memory
Merged-by: Clément Bœsch <u@pkh.me>