path: root/libavcodec/x86
Commit message (Author, Date)
* x86: lossless audio: SSE4 madd 32bits (Christophe Gisquet, 2016-05-07)
|   The unique user so far is wmalossless 24bits. The few samples tested show an order of 8, so more unrolling or an avx2 version do not make sense.
|
|   Timings: 68 -> 49 cycles
|
|   Reviewed-by: Paul B Mahol <onemda@gmail.com>
|   Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
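For orientation, here is a scalar sketch of what a combined "scalar product and multiply-add" over 32-bit values could look like, in the style of the adaptive filters used by lossless audio decoders. The commit above does not spell out the function's signature, so the name `scalarproduct_and_madd32_ref`, the operand widths, and the argument order are all assumptions made for illustration; this is not the FFmpeg implementation.

```c
#include <stdint.h>

/*
 * Hypothetical scalar reference (name and signature are assumptions):
 * returns sum(v1[i] * v2[i]) over `order` elements and, in the same pass,
 * updates v1[i] += mul * v3[i].
 *
 * The arithmetic is done in uint32_t so that overflow wraps around in a
 * well-defined way; converting the result back to int32_t is
 * implementation-defined in C but behaves as two's-complement wraparound
 * on common compilers.
 */
static int32_t scalarproduct_and_madd32_ref(int32_t *v1, const int32_t *v2,
                                            const int32_t *v3, int order,
                                            int mul)
{
    uint32_t res = 0;

    for (int i = 0; i < order; i++) {
        /* Accumulate the dot product using the value of v1 before the update. */
        res += (uint32_t)v1[i] * (uint32_t)v2[i];
        /* Multiply-add step applied to v1 in place. */
        v1[i] = (int32_t)((uint32_t)v1[i] + (uint32_t)mul * (uint32_t)v3[i]);
    }
    return (int32_t)res;
}
```

With an order of 8, as reported for the tested samples, the loop body runs only eight times per call, which is consistent with the commit's remark that further unrolling has little left to gain.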
* Merge commit '73ff983e8dd22ccee166403d0bbbc9c1cd543622' (Derek Buitenhuis, 2016-04-12)
|\
| |   * commit '73ff983e8dd22ccee166403d0bbbc9c1cd543622':
| |     fft: x86: cosmetics: Drop silly comments, add comment, whitespace
| |
| |   Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
| * fft: x86: cosmetics: Drop silly comments, add comment, whitespace (Diego Biurrun, 2016-02-26)
| |
| * x86: hevc: Fix linking with both yasm and optimizations disabled (Diego Biurrun, 2016-02-23)
| |   Some optimized functions reference optimized symbols, so the functions must be explicitly disabled when those symbols are unavailable.
* | avcodec/fft: Add revtab32 for FFTs with more than 65536 samples (Michael Niedermayer, 2016-03-04)
| |   x86 optimizations are used only for the cases they support (<=65536 samples)
| |
| |   Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
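The need for a second, wider table can be sketched as follows: a permutation table with 16-bit entries can only hold indices up to 65535, so a transform with more than 65536 points needs 32-bit entries. The sketch below uses plain bit reversal for illustration (FFmpeg's actual permutation may differ), and the struct, field, and function names are hypothetical rather than taken from the FFmpeg sources.

```c
#include <stdint.h>
#include <stdlib.h>

/* Reverse the low `nbits` bits of i (plain bit reversal, for illustration). */
static uint32_t bit_reverse(uint32_t i, int nbits)
{
    uint32_t r = 0;
    for (int b = 0; b < nbits; b++) {
        r = (r << 1) | (i & 1);
        i >>= 1;
    }
    return r;
}

/* Hypothetical context: exactly one of the two tables is allocated. */
typedef struct RevTabSketch {
    int       nbits;
    uint16_t *revtab;   /* used when 2^nbits <= 65536 */
    uint32_t *revtab32; /* used when 2^nbits >  65536 */
} RevTabSketch;

/* Build the table for a 2^nbits-point transform (1 <= nbits <= 30 assumed). */
static int revtab_init(RevTabSketch *s, int nbits)
{
    size_t n = (size_t)1 << nbits;

    s->nbits    = nbits;
    s->revtab   = NULL;
    s->revtab32 = NULL;

    if (nbits <= 16) {
        /* Largest index is 65535, which still fits in 16 bits. */
        s->revtab = malloc(n * sizeof(*s->revtab));
        if (!s->revtab)
            return -1;
        for (size_t i = 0; i < n; i++)
            s->revtab[i] = (uint16_t)bit_reverse((uint32_t)i, nbits);
    } else {
        /* Indices reach 65536 and above, so 32-bit entries are required. */
        s->revtab32 = malloc(n * sizeof(*s->revtab32));
        if (!s->revtab32)
            return -1;
        for (size_t i = 0; i < n; i++)
            s->revtab32[i] = bit_reverse((uint32_t)i, nbits);
    }
    return 0;
}

/* Look up an entry from whichever table is in use. */
static uint32_t revtab_get(const RevTabSketch *s, uint32_t i)
{
    return s->revtab ? s->revtab[i] : s->revtab32[i];
}

static void revtab_free(RevTabSketch *s)
{
    free(s->revtab);
    free(s->revtab32);
    s->revtab   = NULL;
    s->revtab32 = NULL;
}
```

Code that only understands the 16-bit table, such as the x86 optimizations mentioned in the commit, can then be restricted to the case where the 16-bit table is the one allocated.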
* | avcodec: Extend fft to size 2^17 (Michael Niedermayer, 2016-03-04)
| |   Asked-for-by: durandal_1707
| |   Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* | x86/vc1dsp: Split the file into MC and loopfilter (Timothy Gu, 2016-02-29)
| |
* | Merge commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c' (Derek Buitenhuis, 2016-02-24)
|\|
| |   * commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c':
| |     build: Add vc1dsp component for more fine-grained dependencies
| |
| |   Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
| * build: Add vc1dsp component for more fine-grained dependencies (Diego Biurrun, 2016-02-19)
| |
* | x86/dcadec: add ff_lfe_fir1_float_{sse3,avx} (James Almer, 2016-02-22)
| |   Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | Merge commit 'e280fe13291e9c712a5f4aa13b5263f3e8afed45' (Derek Buitenhuis, 2016-02-16)
|\|
| |   * commit 'e280fe13291e9c712a5f4aa13b5263f3e8afed45':
| |     v210: Use separate sample_factors
| |
| |   Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
| * v210: Use separate sample_factors (Luca Barbato, 2016-02-01)
| |   The 10bit and the 8bit functions can now be implemented to process a different number of samples.
| |
| |   And while at it, simplify the code a little.
| * v210: Add avx2 version of the 10-bit line encoder (James Darnley, 2016-02-01)
| |   Around 25% faster than the ssse3 version.
| |
| |   Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
| * v210: Add avx2 version of the 8-bit line encoder (James Darnley, 2016-02-01)
| |   Around 35% faster than the avx version.
| |
| |   Signed-off-by: Henrik Gramner <henrik@gramner.com>
| |   Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* | Merge commit 'eafb05fcf37cd19a910ca3b17824384f9006bc0a' (Derek Buitenhuis, 2016-02-16)
|\|
| |   * commit 'eafb05fcf37cd19a910ca3b17824384f9006bc0a':
| |     v210: x86: Add the correct guards around the asm code
| |
| |   Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
| * v210: x86: Add the correct guards around the asm code (Luca Barbato, 2016-01-26)
| |   Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
| * x86inc: Add debug symbols indicating sizes of compiled functions (Geza Lore, 2016-01-23)
| |   Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata. For example, this fixes `gdb`, but not `perf`.
| |
| |   Currently only implemented for ELF.
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * x86: build: Group all encoder objects together (Diego Biurrun, 2016-01-18)
| |
* | x86: use the new helper macros where useful (James Almer, 2016-02-14)
| |   Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86/vc1dsp: Port vc1_*_hor_16b_shift2 to NASM format (Timothy Gu, 2016-02-14)
| |   Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
* | huffyuvencdsp: Undefine "i" macro after each use (Timothy Gu, 2016-02-07)
| |
* | x86/dcadec: add ff_lfe_fir0_float_{sse,sse2,avx,fma3} (James Almer, 2016-02-06)
| |   Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.
| |
| |   Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | dirac_dwt: Make x86 files/functions names consistent (Timothy Gu, 2016-02-05)
| |
* | diracdsp: Make x86 files/functions names consistent (Timothy Gu, 2016-02-05)
| |
* | avcodec/h264: Fix segfault in 4:2:2 chroma deblock with 32-bit msvc (Henrik Gramner, 2016-02-05)
| |   Using rNm and x86inc's stack allocation with a negative value at the same time isn't supported, and caused the original stack pointer to be clobbered when using a compiler that doesn't support stack alignment.
* | avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter (James Darnley, 2016-02-05)
| |   2.6 times faster (366 vs. 142 cycles)
* | diracdsp_mmx: Fix some more indentations (Timothy Gu, 2016-02-01)
| |
* | diracdsp_mmx: Fix indentation (Timothy Gu, 2016-02-01)
| |
* | x86: vc1dsp: Convert vc1_inv_trans_*_dc to NASM format (Timothy Gu, 2016-02-01)
| |
* | all: Make header guard names consistent (Timothy Gu, 2016-01-31)
| |
* | avcodec/dca: add new decoder based on libdcadec (foo86, 2016-01-31)
| |
* | avcodec/dca: remove old decoder (foo86, 2016-01-31)
| |   Remove all files and functions which are not going to be reused, and temporarily disable all functions and FATE tests which will be.
* | x86/imdct36: use extractps inside the STORE macro (James Almer, 2016-01-28)
| |   Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
| |   Reviewed-by: Henrik Gramner <henrik@gramner.com>
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | Merge commit '4f22b138886e29f7fffa8c715673951e51be9f32' (Derek Buitenhuis, 2016-01-27)
|\|
| |   Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
| * x86: ac3dsp: Drop forward declaration for nonexisting function (Diego Biurrun, 2016-01-18)
| |
* | avcodec/synth_filter: split off remaining code from dcadec files (James Almer, 2016-01-25)
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | x86inc: Add debug symbols indicating sizes of compiled functions (Geza Lore, 2016-01-21)
| |   Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata. For example, this fixes `gdb`, but not `perf`.
| |
| |   Currently only implemented for ELF.
* | videodsp: fix 1-byte overread in top/bottom READ_NUM_BYTES iterations. (Ronald S. Bultje, 2016-01-18)
| |   This can overread the buffer (either before its start or beyond its end) in Nx1 (i.e. height=1) images.
| |
| |   Fixes mozilla bug 1240080.
* | avcodec/v210: guard new avx2 functions from old assemblers (James Darnley, 2016-01-17)
| |
* | avcodec/v210: add avx2 version of the 10-bit line encoder (James Darnley, 2016-01-17)
| |   Around 25% faster than the ssse3 version.
* | avcodec/v210: add avx2 version of the 8-bit line encoder (James Darnley, 2016-01-17)
| |   Around 35% faster than the avx version.
| |
| |   Signed-off-by: Henrik Gramner <henrik@gramner.com>
* | avcodec/x86/fmtconvert: Add emms to int32_to_float_fmul_array8_sse() (Michael Niedermayer, 2016-01-15)
| |   this should fix checkasm on x86_64-archlinux-gcc-valgrind
| |
| |   Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* | Merge commit '8563f9887194b07c972c3475d6b51592d77f73f7' (Hendrik Leppkes, 2016-01-02)
|\|
| |   * commit '8563f9887194b07c972c3475d6b51592d77f73f7':
| |     x86: use emms after ff_int32_to_float_fmul_scalar_sse
| |
| |   Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
| * x86: use emms after ff_int32_to_float_fmul_scalar_sse (Janne Grunau, 2015-12-30)
| |   Intel's Instruction Set Reference (as of September 2015) clearly states that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the source is a memory location. The Instruction Set Reference from 1999 (Order Number 243191) describes this behaviour, but all later versions I've seen make no distinction whether MMX registers or memory is used as source.
| |
| |   The documentation for the matching SSE2 instruction to convert to double (cvtpi2pd) was fixed (see the valgrind bug https://bugs.kde.org/show_bug.cgi?id=210264).
| |
| |   It will take time to get a clarification and fixes in place. In the meantime it makes sense to change ff_int32_to_float_fmul_scalar_sse to be correct according to the documentation. The vast majority of users will have SSE2, so a change to the SSE version has little effect.
| |
| |   Fixes fate-checkasm on x86 valgrind targets.
| |
| |   Valgrind 'bug' reported as https://bugs.kde.org/show_bug.cgi?id=357059
* | Merge commit 'f4f27e4cf1013c55b2c7df359ce8d58ee922662c' (Hendrik Leppkes, 2016-01-02)
|\|
| |   * commit 'f4f27e4cf1013c55b2c7df359ce8d58ee922662c':
| |     x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly
| |
| |   Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
| * x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly (Janne Grunau, 2015-12-29)
| |   This reverts commit 5dfe4edad63971d669ae456b0bc40ef9364cca80.
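As background for the two `int32_to_float_fmul_scalar` length fixes in this log: on x86-64 a 32-bit integer argument arrives in a 64-bit register whose upper half is not guaranteed to be clean, so the callee has to extend it before using it as a 64-bit count. Writing to a 32-bit register zero-extends into the full register, which is what "implicitly" refers to. The C sketch below only models the two choices; the helper names are hypothetical and the code is not taken from the FFmpeg sources.

```c
#include <stdint.h>

/*
 * `reg` models a 64-bit register whose low 32 bits hold an `int` argument
 * and whose upper 32 bits may contain garbage. Helper names are hypothetical.
 */

/* Sign extension: keep the low 32 bits and replicate bit 31 upwards.
 * (The uint32_t -> int32_t conversion is implementation-defined in C but
 * is a plain reinterpretation on common two's-complement targets.) */
static int64_t sign_extend32(uint64_t reg)
{
    return (int64_t)(int32_t)(uint32_t)reg;
}

/* Zero extension: keep the low 32 bits and clear the upper half. */
static uint64_t zero_extend32(uint64_t reg)
{
    return (uint64_t)(uint32_t)reg;
}
```

For a non-negative length the two agree, so either approach removes the garbage; they differ only when bit 31 is set, where sign extension yields a negative value and zero extension yields a large positive one.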
* | Merge commit '2008f76054906e9ff6bf744800af0e5a5bfe61be' (Hendrik Leppkes, 2016-01-02)
|\|
| |   * commit '2008f76054906e9ff6bf744800af0e5a5bfe61be':
| |     dca: remove unused decode_hf function and quant_d tables
| |
| |   Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
| * dca: remove unused decode_hf function and quant_d tables (Alexandra Hájková, 2015-12-24)
| |   They were superseded by their integer equivalents. Rename integer decode_hf to decode_hf.
* | Merge commit '5dfe4edad63971d669ae456b0bc40ef9364cca80' (Hendrik Leppkes, 2016-01-02)
|\|
| |   * commit '5dfe4edad63971d669ae456b0bc40ef9364cca80':
| |     x86_64: int32_to_float_fmul_scalar sign extend integer length
| |
| |   Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
| * x86_64: int32_to_float_fmul_scalar sign extend integer length (Janne Grunau, 2015-12-14)
| |