| Commit message (Collapse) | Author | Age |
|
|
|
|
| |
They were superseded with their integer equivalents. Rename integer
decode_hf to decode_hf.
|
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
|
|
|
|
|
|
| |
This fixes compilation failures with --disable-fma3
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sandy Bridge Win64:
180 cycles in ff_synth_filter_inner_sse2
150 cycles in ff_synth_filter_inner_avx
Also switch some instructions to a three operand format to avoid
assembly errors with Yasm 1.1.0 or older.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
|
|
| |
Build only on x86_32 targets.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
|
|
|
| |
Some optimized functions reference optimized symbols, so the functions
must be explicitly disabled when those symbols are unavailable.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The vector dequantization has a test in a loop preventing effective SIMD
implementation. By moving it out of the loop, this loop can be DSPized.
Therefore, modify the current DSP implementation. In particular, the
DSP implementation no longer has to handle null loop sizes.
The decode_hf implementations have following timings:
For x86 Arrandale:
C SSE SSE2 SSE4
win32: 260 162 119 104
win64: 242 N/A 89 72
The arm NEON optimizations follow in a later patch as external asm. The
now unused check for the y modifier in arm inline asm is removed from
configure.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Timings for Arrandale:
C SSE
win32: 2108 334
win64: 1152 322
Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
the jmp destination being aligned.
Unrolling for ARCH_X86_64 is a 20 cycles gain.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
|
|
|
|
|
|
|
| |
Results for Arrandale/Windows:
32: 1670 -> 316
64: 728 -> 298
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|
|
For the callable function (as opposed to the inline one):
C SSE SSE2 SSE4
Win32: 47 42 29 26
Win64: 30 33 25 23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
|