| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
| |
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). So given that the only systems which benefit
from the 8x8 MMX (overridden by MMXEXT) or the 16x16 MMXEXT
(overridden by SSE2) are truely ancient 32bit x86s they are removed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
|
|
|
|
| |
Otherwise nasm writes the full host-specific paths into .o
output, which breaks binary reproducibility.
Signed-off-by: Alexander Kanavin <alex.kanavin@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
|
| |
Should fix compilation with old yasm/nasm
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
add ff_pixelutils_sad_32x32_sse2, ff_pixelutils_sad_{a,u}_32x32_sse2,
ff_pixelutils_sad_32x32_avx22, ff_pixelutils_sad_{a,u}_32x32_avx2
use perf record/report profiling, get instructions:u for avx2 sad_32x32:
72.05% pixelutils pixelutils [.] block_sad_32x32_c
18.50% pixelutils pixelutils [.] block_sad_16x16_c
4.78% pixelutils pixelutils [.] block_sad_8x8_c
2.69% pixelutils pixelutils [.] block_sad_4x4_c
0.89% pixelutils pixelutils [.] block_sad_2x2_c
0.16% pixelutils pixelutils [.] ff_pixelutils_sad_32x32_avx2
0.16% pixelutils pixelutils [.] ff_pixelutils_sad_u_32x32_avx2
0.12% pixelutils pixelutils [.] ff_pixelutils_sad_a_32x32_avx2
sse2 sad_32x32 instructions:u like:
71.86% pixelutils pixelutils [.] block_sad_32x32_c
18.42% pixelutils pixelutils [.] block_sad_16x16_c
4.81% pixelutils pixelutils [.] block_sad_8x8_c
2.68% pixelutils pixelutils [.] block_sad_4x4_c
0.88% pixelutils pixelutils [.] block_sad_2x2_c
0.29% pixelutils pixelutils [.] ff_pixelutils_sad_32x32_sse2
0.26% pixelutils pixelutils [.] ff_pixelutils_sad_u_32x32_sse2
0.23% pixelutils pixelutils [.] ff_pixelutils_sad_a_32x32_sse2
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
|
|
|
|
| |
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
|
|
|
|
|
| |
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
|
|
|
|
|
|
| |
501 to 439 decicycles.
See 45c7f3997ea11c3d1007b2126b1c0049a8c27105.
|
|
|
|
|
|
|
|
|
|
| |
~560 → ~500 decicycles
This is following the comments from Michael in
https://ffmpeg.org/pipermail/ffmpeg-devel/2014-August/160599.html
Using 2 registers for accumulator didn't help. On the other hand,
some re-ordering between the movs and psadbw allowed going ~538 to ~500.
|
|
|