| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
| |
The unique user so far is wmalossless 24bits. The few samples tested show an
order of 8, so more unrolling or an avx2 version do not make sense.
Timings: 68 -> 49 cycles
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|\
| |
| |
| |
| |
| |
| | |
* commit '73ff983e8dd22ccee166403d0bbbc9c1cd543622':
fft: x86: cosmetics: Drop silly comments, add comment, whitespace
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
| | |
|
| |
| |
| |
| |
| | |
Some optimized functions reference optimized symbols, so the functions
must be explicitly disabled when those symbols are unavailable.
|
| |
| |
| |
| |
| |
| | |
x86 optimizations are used only for the cases they support (<=65536 samples)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
| |
| |
| |
| |
| |
| | |
Asked-for-by: durandal_1707
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
| | |
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c':
build: Add vc1dsp component for more fine-grained dependencies
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
| | |
|
| |
| |
| |
| |
| | |
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'e280fe13291e9c712a5f4aa13b5263f3e8afed45':
v210: Use separate sample_factors
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
| |
| |
| |
| |
| |
| |
| | |
The 10bit and the 8bit functions can now be implemented to process
a different amount of samples.
And while at it simplify a little the code.
|
| |
| |
| |
| |
| |
| | |
Around 25% faster than the ssse3 version.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
| |
| |
| |
| |
| |
| |
| | |
Around 35% faster than the avx version.
Signed-off-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'eafb05fcf37cd19a910ca3b17824384f9006bc0a':
v210: x86: Add the correct guards around the asm code
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
| |
| |
| |
| | |
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they get can confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
Currently only implemented for ELF.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
| | |
|
| |
| |
| |
| |
| | |
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| | |
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
|
| | |
|
| |
| |
| |
| |
| |
| |
| | |
Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| | |
Using rNm and x86inc's stack allocation with a negative value at the same
time isn't supported, and caused the original stack pointer to be clobbered
when using a compiler that doesn't support stack alignment.
|
| |
| |
| |
| | |
2.6 times faster (366 vs. 142 cycles)
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| |
| | |
Remove all files and functions which are not going to be reused,
and disable all functions and FATE tests temporarily which will be.
|
| |
| |
| |
| |
| |
| | |
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|\|
| |
| |
| | |
Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
| | |
|
| |
| |
| |
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they get can confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
Currently only implemented for ELF.
|
| |
| |
| |
| |
| |
| |
| | |
This can overread (either before start or beyond end) of the buffer in
Nx1 (i.e. height=1) images.
Fixes mozilla bug 1240080.
|
| | |
|
| |
| |
| |
| | |
Around 25% faster than the ssse3 version.
|
| |
| |
| |
| |
| |
| | |
Around 35% faster than the avx version.
Signed-off-by: Henrik Gramner <henrik@gramner.com>
|
| |
| |
| |
| |
| |
| | |
this should fix checkasm on x86_64-archlinux-gcc-valgrind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '8563f9887194b07c972c3475d6b51592d77f73f7':
x86: use emms after ff_int32_to_float_fmul_scalar_sse
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Intel's Instruction Set Reference (as of September 2015) clearly states
that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the
source is a memory location. The Instruction Set Reference from 1999
(Order Number 243191) describes this behaviour but all later versions
I've seen have make no distinction whether MMX registers or memory is
used as source.
The documentation for the matching SSE2 instruction to convert to double
(cvtpi2pd) was fixed (see the valgrind bug
https://bugs.kde.org/show_bug.cgi?id=210264).
It will take time to get a clarification and fixes in place. In the
meantime it makes sense to change ff_int32_to_float_fmul_scalar_sse to
be correct according to the documentation. The vast majority of users
will have SSE2 so a change to the SSE version has little effect.
Fixes fate-checkasm on x86 valgrind targets.
Valgrind 'bug' reported as https://bugs.kde.org/show_bug.cgi?id=357059
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'f4f27e4cf1013c55b2c7df359ce8d58ee922662c':
x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
| |
| |
| |
| | |
This reverts commit 5dfe4edad63971d669ae456b0bc40ef9364cca80.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '2008f76054906e9ff6bf744800af0e5a5bfe61be':
dca: remove unused decode_hf function and quant_d tables
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
| |
| |
| |
| |
| | |
They were superseded with their integer equivalents. Rename integer
decode_hf to decode_hf.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '5dfe4edad63971d669ae456b0bc40ef9364cca80':
x86_64: int32_to_float_fmul_scalar sign extend integer length
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
| | |
|