path: root/libavcodec/x86
Commit message (Author, Age)
* dcadec: simplify decoding of VQ high frequencies (Christophe Gisquet, 2014-02-28)
    The vector dequantization has a test inside a loop that prevents an
    effective SIMD implementation. Moving the test out of the loop allows
    the loop to be DSPized, so the current DSP implementation is modified
    accordingly. In particular, the DSP implementation no longer has to
    handle null loop sizes.

    The decode_hf implementations have the following timings on x86
    Arrandale:

               C    SSE  SSE2  SSE4
    win32:   260    162   119   104
    win64:   242    N/A    89    72

    The ARM NEON optimizations follow in a later patch as external asm.
    The now unused check for the y modifier in ARM inline asm is removed
    from configure.
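The restructuring described in this commit message can be sketched in C. This is a minimal illustration under stated assumptions, not the dcadec code: the function names `dequant_checked` and `dequant_hoisted`, the `active` flag, and the parameter layout are all hypothetical. It only shows why moving a loop-invariant test out of the loop leaves a branch-free body that is easier to hand over to a SIMD routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" shape: the test on the loop-invariant flag
 * `active` is evaluated on every iteration. */
void dequant_checked(float *dst, const int8_t *vq, float scale,
                     size_t n, int active)
{
    for (size_t i = 0; i < n; i++) {
        if (active)
            dst[i] = vq[i] * scale;
    }
}

/* Hypothetical "after" shape: the test is done once, up front.
 * The remaining loop has no branch in its body, so it can be
 * replaced by a DSP (SIMD) function. */
void dequant_hoisted(float *dst, const int8_t *vq, float scale,
                     size_t n, int active)
{
    if (!active)
        return;
    for (size_t i = 0; i < n; i++)
        dst[i] = vq[i] * scale;
}
```

Both functions produce the same output for the same arguments; only the placement of the test differs.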
* x86: synth filter float: implement SSE2 version (Christophe Gisquet, 2014-02-28)
    Timings for Arrandale:

                C   SSE
    win32:   2108   334
    win64:   1152   322

    Factorizing the inner loop with a call/jmp costs more than 15 cycles,
    even with the jmp destination being aligned. Unrolling for
    ARCH_X86_64 gains 20 cycles.

    Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* x86: dcadsp: implement SSE lfe_dir (Christophe Gisquet, 2014-02-28)
    Results for Arrandale/Windows:

    32: 1670 -> 316
    64:  728 -> 298

    Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* prores: Use consistent names for DSP arch initialization functions (Diego Biurrun, 2014-02-28)
* x86: dsputil: Use correct file name as multiple inclusion guard (Diego Biurrun, 2014-02-20)
* x86: dca: Add missing multiple inclusion guards (Diego Biurrun, 2014-02-19)
* dca: include dcadsp.h in {arm,x86}/dca.h for checkheaders (Janne Grunau, 2014-02-08)
* x86: use the inline int8x8_fmul_int32 only if inline SSE2 is available (Janne Grunau, 2014-02-08)
    Fixes compilation with MSVC. Also does not rely on an earlier
    config.h include but includes it directly.
* x86: dcadsp: implement int8x8_fmul_int32 (Christophe Gisquet, 2014-02-07)
    For the callable function (as opposed to the inline one):

              C   SSE  SSE2  SSE4
    Win32:   47    42    29    26
    Win64:   30    33    25    23

    The SSE version is neither compiled nor set for ARCH_X86_64, as the
    inlinable function takes over.

    Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* x86: videodsp: Fix a bug in a %if statement where we used '%%' instead of '&&'. (Ronald S. Bultje, 2014-01-30)
    Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* x86: videodsp: Properly mark sse2 instructions in emulated_edge_mc as such. (Ronald S. Bultje, 2014-01-30)
    Should fix crashes or corrupt output on pre-SSE2 CPUs (e.g. AMD
    Athlon XP 2400+ or Intel Pentium III) when they were using SSE2 code
    in the hfix or hvar single-edge (left/right) extension functions.

    Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* x86: dsputil: Simplify xvmc deprecation conditional (Diego Biurrun, 2014-01-15)
* x86: Consistently use cpu flag detection macros in places that still miss it (Diego Biurrun, 2014-01-14)
* x86: hpeldsp: Add missing av_cold attribute to init function (Diego Biurrun, 2014-01-09)
* x86: avcodec: Add a bunch of missing #includes for av_cold (Diego Biurrun, 2014-01-09)
* h264: do not use 422 functions for monochrome (Anton Khirnov, 2014-01-06)
    Fixes invalid memory access.

    Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
    CC: libav-stable@libav.org
* x86: mpegvideo: move denoise_dct asm to mpegvideoenc (Anton Khirnov, 2013-12-20)
    This function is encoding-only.

    Signed-off-by: Diego Biurrun <diego@biurrun.de>
* dsputil: Move apply_window_int16 to ac3dsp (Diego Biurrun, 2013-12-08)
    The (optimized) functions are used nowhere else.
* x86: Initialize mmxext after amd3dnow optimizations (Diego Biurrun, 2013-12-04)
    The mmxext optimizations should be at least equally fast if available
    and amd3dnow optimizations are being deprecated. Thus the former
    should override the latter, not the other way around.
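The effect of the initialization order can be sketched as follows. This is a hypothetical illustration, not the libavcodec init code: the struct `DSPSketch`, the flag values, and the `avg_*` functions are made-up names. The point is only that when function pointers are assigned in sequence, the last matching assignment wins, so placing the mmxext block after the amd3dnow block makes mmxext take precedence on CPUs that support both.

```c
/* Hypothetical CPU flag bits, for illustration only. */
#define SKETCH_FLAG_3DNOW  0x1
#define SKETCH_FLAG_MMXEXT 0x2

typedef int (*avg_fn)(int a, int b);

/* Stand-ins for the C, 3DNow! and MMXEXT variants of one routine.
 * All three compute the same rounded average. */
static int avg_c(int a, int b)      { return (a + b + 1) >> 1; }
static int avg_3dnow(int a, int b)  { return (a + b + 1) >> 1; }
static int avg_mmxext(int a, int b) { return (a + b + 1) >> 1; }

typedef struct DSPSketch {
    avg_fn avg;
} DSPSketch;

void dsp_sketch_init(DSPSketch *c, int cpu_flags)
{
    c->avg = avg_c;                       /* portable fallback */
    if (cpu_flags & SKETCH_FLAG_3DNOW)
        c->avg = avg_3dnow;               /* assigned first ... */
    if (cpu_flags & SKETCH_FLAG_MMXEXT)
        c->avg = avg_mmxext;              /* ... overridden when mmxext is present */
}
```

With both flags set, `c->avg` ends up pointing at `avg_mmxext`; with only the 3DNow! flag set, it stays at `avg_3dnow`.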
* dsputil: x86: Move ff_inv_zigzag_direct16 table init to mpegvideo (Diego Biurrun, 2013-12-02)
    The table is MMX-specific and used nowhere else.
* x86: dsputil: Suppress deprecation warnings for XvMC bits (Diego Biurrun, 2013-11-28)
    These parts are scheduled for removal on the next version bump.

    Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
* lavc: VP9 decoder (Ronald S. Bultje, 2013-11-15)
    Originally written by Ronald S. Bultje <rsbultje@gmail.com> and
    Clément Bœsch <u@pkh.me>

    Further contributions by:
    Anton Khirnov <anton@khirnov.net>
    Diego Biurrun <diego@biurrun.de>
    Luca Barbato <lu_zero@gentoo.org>
    Martin Storsjö <martin@martin.st>

    Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
    Signed-off-by: Anton Khirnov <anton@khirnov.net>
* lavc: Edge emulation with dst/src linesize (Ronald S. Bultje, 2013-11-15)
    Allow supporting files for which the image stride is smaller than the
    maximum block size + number of subpel mc taps, e.g. a 64x64 VP9 file
    or a 16x16 VP8 file with -fflags +emu_edge.
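The idea of edge emulation with independent destination and source line sizes can be sketched in plain C. This is a simplified illustration under assumptions, not the libavcodec `emulated_edge_mc` implementation: the name `emulated_edge_sketch`, its parameter order, and the convention that `src` points at the image origin are all hypothetical. It copies a block that may extend past the picture, replicating the nearest border pixel, and addresses the two buffers with separate strides so that the destination can be wider than the source image.

```c
#include <stddef.h>
#include <stdint.h>

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Copy a block_w x block_h block whose top-left corner lies at
 * (src_x, src_y) in a w x h image. Coordinates outside the image are
 * clamped, which replicates the border pixels. `src` is assumed to
 * point at pixel (0, 0) of the image. */
void emulated_edge_sketch(uint8_t *dst, ptrdiff_t dst_linesize,
                          const uint8_t *src, ptrdiff_t src_linesize,
                          int block_w, int block_h,
                          int src_x, int src_y, int w, int h)
{
    for (int y = 0; y < block_h; y++) {
        int sy = clamp_int(src_y + y, 0, h - 1);
        for (int x = 0; x < block_w; x++) {
            int sx = clamp_int(src_x + x, 0, w - 1);
            /* Each buffer is indexed with its own line size. */
            dst[y * dst_linesize + x] = src[sy * src_linesize + sx];
        }
    }
}
```

For a 2x2 image with a source line size of 2, a 4x4 block starting at (-1, -1) can be written into a buffer with a destination line size of 4, which would not be possible if both buffers had to share the source stride.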
* Deprecate obsolete XvMC hardware decoding support (Diego Biurrun, 2013-11-13)
    XvMC has long been superseded by newer acceleration APIs, such as
    VDPAU, and few downstreams still support it. Furthermore, XvMC is not
    implemented within the hwaccel framework, but requires its own
    specific code in the MPEG-1/2 decoder, which is a maintenance burden.
* dsputil: Split off H.263 bits into their own H263DSPContext (Diego Biurrun, 2013-11-08)
* x86: rv40dsp: Use PAVGB instruction macro where appropriate (Diego Biurrun, 2013-11-04)
* x86: hpeldsp: Use PAVGB instruction macro where necessary (Mikulas Patocka, 2013-11-04)
    Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
    Signed-off-by: Diego Biurrun <diego@biurrun.de>
* x86: vp8dsp: Split loopfilter code into a separate file (Diego Biurrun, 2013-11-01)
* x86: h264_idct: Update comments to match 8/10-bit depth optimization split (Diego Biurrun, 2013-10-07)
* x86inc: Utilize the shadow space on 64-bit Windows (Henrik Gramner, 2013-10-07)
    Store XMM6 and XMM7 in the shadow space in functions that clobber
    them. This way we don't have to adjust the stack pointer as often,
    reducing the number of instructions as well as code size.

    Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86: fdct: Employ more specific ifdefs (Diego Biurrun, 2013-10-06)
    This avoids building mmxext and sse2 code when disabled by configure.
* x86: dsputil: Separate ff_add_hfyu_median_prediction_cmov from dsputil_mmx (Diego Biurrun, 2013-10-05)
    The function does not depend on MMX, and compilation without MMX
    enabled fails if the function is compiled conditionally on MMX
    availability.
* x86: fdct: Initialize optimized fdct implementations in the standard way (Diego Biurrun, 2013-10-05)
* x86: xviddct: Employ more specific ifdefs (Diego Biurrun, 2013-10-05)
    This avoids building mmxext and sse2 code when disabled by configure.
* x86: fdct: Only build fdct code if encoders have been enabled (Diego Biurrun, 2013-10-04)
    fdct is only initialized if encoders are enabled.
* x86: Add an xmm clobbering wrapper for avcodec_encode_video2 (Martin Storsjö, 2013-09-16)
    This is required since 187105ff8 when we started trying to wrap this
    function as well.

    Signed-off-by: Martin Storsjö <martin@martin.st>
* mathops/x86: work around inline asm miscompilation with GCC 4.8.1 (Hendrik Leppkes, 2013-09-15)
    The volatile is not required here, and removing it prevents a
    miscompilation with GCC 4.8.1 when building on x86 with --cpu=i686.

    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
    Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* x86: avcodec: Consistently structure CPU extension initialization (Diego Biurrun, 2013-08-29)
* x86: avcodec: Use convenience macros to check for CPU flags (Diego Biurrun, 2013-08-29)
* x86: rv40dsp: Move inline assembly optimizations out of YASM init section (Diego Biurrun, 2013-08-28)
* dsputil: x86: Hide arch-specific initialization details (Diego Biurrun, 2013-08-28)
    Also give consistent names to init functions.
* vp56: Mark VP6-only optimizations as such. (Diego Biurrun, 2013-08-23)
    Most of our VP56 optimizations are VP6-only and will stay that way.
    So avoid compiling them for VP5-only builds.
* x86: Split DCT and FFT initialization into separate files (Diego Biurrun, 2013-08-21)
* x86: h264_idct: Remove incorrect comment (Diego Biurrun, 2013-08-21)
* Consistently use "cpu_flags" as variable/parameter name for CPU flags (Diego Biurrun, 2013-07-18)
* fmtconvert: Explicitly use int32_t instead of int (Christophe Gisquet, 2013-07-17)
    Signed-off-by: Martin Storsjö <martin@martin.st>
* mlpdsp: x86: Respect cpuflags (Luca Barbato, 2013-07-12)
* cabac: x86 version of get_cabac_bypass (Jason Garrett-Glaser, 2013-07-04)
    Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* build: cosmetics: Place unconditional before conditional OBJS lines (Diego Biurrun, 2013-05-30)
    Signed-off-by: Martin Storsjö <martin@martin.st>
* mpegvideo: Remove commented-out PARANOID debug cruft (Diego Biurrun, 2013-05-15)