libav.git - [no description]

	Commit message	Author	Age
*	x86: mmx2 ---> mmxext in asm constructs	Diego Biurrun	2012-11-14
\|
*	x86: Move optimization suffix to end of function names	Diego Biurrun	2012-10-31
\| \| \| \|	This simplifies cpuflags porting.
*	x86: mmx2 ---> mmxext in function names	Diego Biurrun	2012-10-31
\|
*	x86: MMX2 ---> MMXEXT in macro names	Diego Biurrun	2012-10-31
\|
*	x86: mmx2 ---> mmxext in comments and messages	Diego Biurrun	2012-10-31
\|
*	x86: dsputil: kill VLA in gmc_mmx()	Mans Rullgard	2012-10-05
\| \| \| \| \| \| \| \|	Instead of using an evil VLA, fall back to C version when edge emulation is needed. MPEG4 GMC is a rarely used fringe feature so the speed loss is an acceptable cost for safer code. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	x86: dsputil: Move Xvid IDCT put/add functions to a more suitable place	Diego Biurrun	2012-09-14
\|
*	ac3: move ac3_downmix() from dsputil to ac3dsp	Mans Rullgard	2012-09-12
\| \| \| \|	Signed-off-by: Mans Rullgard <mans@mansr.com>
*	x86: dsputil: Move specific optimization settings out of global init function	Diego Biurrun	2012-09-11
\| \| \| \|	They belong in the init functions specific to each CPU capability.
*	x86: avcodec: Drop silly "_mmx" suffix from dsputil template names	Diego Biurrun	2012-09-07
\|
*	cavsdsp: set idct permutation independently of dsputil	Mans Rullgard	2012-09-07
\| \| \| \| \| \| \|	CAVS uses its own idct so using dsputil to set the permutation is fragile. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	x86: allow using add_hfyu_median_prediction_cmov on any cpu with cmov	Mans Rullgard	2012-09-07
\| \| \| \| \| \| \| \| \|	For some reason add_hfyu_median_prediction_cmov is only selected on 3Dnow-capable CPUs, even though it uses no 3Dnow instructions. This patch allows it to be selected on any cpu with cmov with the possibility of being overridden by the mmxext version. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	x86: dsputil: Do not redundantly check for CPU caps before calling init funcs	Diego Biurrun	2012-09-06
\| \| \| \|	The init functions check for CPU capabilities on their own already.
*	x86: Split inline and external assembly #ifdefs	Diego Biurrun	2012-08-31
\|
*	x86: cosmetics: Comment some #endifs for better readability	Diego Biurrun	2012-08-30
\|
*	x86: avcodec: Drop silly "_mmx" suffixes from filenames	Diego Biurrun	2012-08-28
\|
*	x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h	Mans Rullgard	2012-08-09
\| \| \| \| \| \| \|	This puts x86-specific things in the x86/ subdirectory where they belong. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	Replace all CODEC_ID_* with AV_CODEC_ID_*	Anton Khirnov	2012-08-07
\|
*	x86: build: replace mmx2 by mmxext	Diego Biurrun	2012-08-03
\| \| \| \| \| \| \|	Refactoring mmx2/mmxext YASM code with cpuflags will force renames. So switching to a consistent naming scheme beforehand is sensible. The name "mmxext" is more official and widespread and also the name of the CPU flag, as reported e.g. by the Linux kernel.
*	x86: Use consistent 3dnowext function and macro name suffixes	Diego Biurrun	2012-08-03
\| \| \| \| \| \|	Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to "3dnowext", which is a more common name of the CPU flag, as reported e.g. by the Linux kernel, unifies this.
*	x86: remove libmpeg2 mmx(ext) idct functions	Mans Rullgard	2012-08-02
\| \| \| \| \| \| \| \|	These functions are not faster than other mmx implementations on any hardware I have been able to test on, and they are horribly inaccurate. There is thus no reason to ever use them. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	h264_chromamc_10bit: port x86 simd to cpuflags.	Ronald S. Bultje	2012-07-27
\|
*	x86/dsputil: put inline asm under HAVE_INLINE_ASM.	Ronald S. Bultje	2012-07-25
\| \| \| \| \| \| \|	This allows compiling with compilers that don't support gcc-style inline assembly. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
*	dsputil_mmx: fix incorrect assembly code	Yang Wang	2012-07-25
\|
\|	In ff_put_pixels_clamped_mmx(), there are two assembly code blocks. In the first block (the unrolled loop), instructions such as "movq 8%3, %%mm1 \n\t" are incorrect.
\|
\|	From the instruction above, it is clear what the programmer wants: a load from p + 8. But the assembly code does not guarantee that. It only works if the compiler puts p in a register and produces an instruction like "movq 8(%edi), %mm1".
\|
\|	During optimization, the compiler may be able to constant-propagate into p. Suppose p = &x[10000]. Then operand 3 can become 10000(%edi), where %edi holds &x, and the instruction becomes "movq 810000(%edi)". That is, the offset is 810000 instead of 8, which causes a segmentation fault.
\|
\|	This error was fixed in the second block of the assembly code, but not in the unrolled loop.
\|
\|	How to reproduce: the error is exposed when building with the Intel C++ Compiler with IPO+PGO optimization enabled. The build crashed when decoding an MJPEG video.
\|
\|	Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
\|	Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
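The failure described in this commit message comes from purely textual operand substitution. The following is a minimal sketch of that mechanism, not the dsputil code itself: the helper name paste_offset() is hypothetical, and snprintf() stands in for the compiler replacing "%3" in the template "8%3".

```c
#include <stdio.h>

/* Hypothetical helper (not part of libav): imitates how the compiler
 * replaces "%3" in the asm template "movq 8%3, %%mm1" with the text of
 * operand 3. The substitution is textual, so the literal "8" is glued
 * onto whatever string the operand expands to. */
static void paste_offset(char *out, size_t size, const char *operand)
{
    snprintf(out, size, "movq 8%s, %%mm1", operand);
}
```

With the operand text "(%edi)" the result is "movq 8(%edi), %mm1", which is the intended load from p + 8. With "10000(%edi)" the result is "movq 810000(%edi), %mm1". One common way to avoid this class of bug is to pass the pointer with a register constraint and write the displacement as "8(%3)", so that the "8" is always a complete displacement in front of a register.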
*	x86: dsputil: drop some unused CPU flag debug code	Diego Biurrun	2012-07-19
\|
*	vp3: move idct and loop filter pointers to new vp3dsp context	Mans Rullgard	2012-07-18
\| \| \| \| \| \| \| \|	This moves all VP3-specific function pointers from dsputil to a new vp3dsp context. There is no reason to ever use the VP3 IDCT where an MPEG2 IDCT is expected or vice versa. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	x86: Only use optimizations with cmov if the CPU supports the instruction	Diego Biurrun	2012-06-23
\|
*	x86: move some inline asm macros to the only places they are used	Mans Rullgard	2012-06-23
\| \| \| \|	Signed-off-by: Mans Rullgard <mans@mansr.com>
*	Add a float DSP framework to libavutil	Justin Ruggles	2012-06-08
\| \| \| \|	Move vector_fmul() from DSPContext to AVFloatDSPContext.
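As a rough illustration of what a function-pointer DSP context looks like, here is a minimal sketch. FloatDSPSketch, float_dsp_sketch_init() and vector_fmul_ref() are hypothetical names chosen for this example; they are not the AVFloatDSPContext API, and the element-wise product is the assumed semantics of vector_fmul().

```c
/* Scalar reference: element-wise product of two float vectors. */
static void vector_fmul_ref(float *dst, const float *src0,
                            const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}

/* Hypothetical context: callers go through the pointer, so an init
 * function can later swap in an optimized version for the host CPU. */
typedef struct FloatDSPSketch {
    void (*vector_fmul)(float *dst, const float *src0,
                        const float *src1, int len);
} FloatDSPSketch;

static void float_dsp_sketch_init(FloatDSPSketch *c)
{
    c->vector_fmul = vector_fmul_ref; /* portable default */
}
```

Keeping the pointer in a small context of its own means code that only needs float vector operations does not have to initialize a large codec-oriented context.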
*	Convert vector_fmul range of functions to YASM and add AVX versions	Kieran Kunhya	2012-05-21
\| \| \| \|	Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
*	rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC	Christophe Gisquet	2012-05-10
\|
\|	Code mostly inspired by vp8's MC, however:
\|	- its MMX2 horizontal filter is worse because it can't take advantage of the coefficient redundancy
\|	- that same coefficient redundancy allows better code for non-SSSE3 versions
\|
\|	Benchmark (rounded to tens of units):
\|	          V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
\|	C          445   358    985    1785    1559     3280
\|	MMX*       219   271    478     714     929     1443
\|	SSE2       131   158    294     425     515      892
\|	SSSE3      120   122    248     387     390      763
\|
\|	End result is overall around a 15% speedup for SSSE3 version (on 6 sequences); all loop filter functions now take around 55% of decoding time, while luma MC dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.
\|
\|	Signed-off-by: Diego Biurrun <diego@biurrun.de>
*	dsputil x86: revert a test back to its previous value	Christophe GISQUET	2012-04-28
\| \| \| \| \| \|	Commit 356ee8d caused the initial inversion. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	Remove lowres video decoding	Mans Rullgard	2012-04-21
\| \| \| \| \| \| \|	This feature is complex, of questionable utility, and slows down normal decoding. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	avcodec: remove AVCodecContext.dsp_mask	Mans Rullgard	2012-04-21
\| \| \| \| \| \| \| \|	This removes all references to AVCodecContext.dsp_mask and marks it for eviction at the next version bump. It has been superseded by av_set_cpu_flag_mask() which, unlike this field, works everywhere. Signed-off-by: Mans Rullgard <mans@mansr.com>
*	dsputil x86: remove deprecated parameter from scalarproduct_int16 prototype	Christophe GISQUET	2012-04-04
\| \| \| \|	Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	x86: dsputil: prettyprint gcc inline asm	Diego Biurrun	2012-03-25
\|
*	x86: K&R prettyprinting cosmetics for dsputil_mmx.c	Diego Biurrun	2012-03-25
\|
*	x86: conditionally compile H.264 QPEL optimizations	Diego Biurrun	2012-03-25
\|
*	dsputil_mmx: Surround QPEL macros by "do { } while (0);" blocks.	Diego Biurrun	2012-03-25
\| \| \| \|	This makes them safe to use in non-fully braced if-blocks and similar.
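The idiom named in this commit can be sketched as follows. SWAP_INT and order_pair() are hypothetical examples, not the QPEL macros from dsputil_mmx.c.

```c
/* Hypothetical multi-statement macro wrapped in do { } while (0).
 * The wrapper turns the body into a single statement that is closed by
 * the caller's ';', so it behaves like a function call inside an
 * unbraced if/else. */
#define SWAP_INT(a, b)      \
    do {                    \
        int tmp_ = (a);     \
        (a) = (b);          \
        (b) = tmp_;         \
    } while (0)

/* Puts the pair in ascending order; returns 1 if a swap happened. */
static int order_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_INT(*lo, *hi); /* one statement, so the else still binds */
    else
        return 0;
    return 1;
}
```

If the macro body were a plain { ... } block instead, the ';' after SWAP_INT(...) would form an empty statement and the following else would no longer attach to the if, which is a compile error.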
*	x86: clean up ff_dsputil_init_mmx()	Mans Rullgard	2012-03-05
\| \| \| \| \| \| \| \|	This splits ff_dsputil_init_mmx() into multiple functions, one for each MMX/SSE level, somewhat simplifying the nested conditions. Signed-off-by: Mans Rullgard <mans@mansr.com> Signed-off-by: Diego Biurrun <diego@biurrun.de>
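The shape of such a split can be sketched as below. All names here (DSPSketch, the SK_FLAG_* bits, the init_* functions) are hypothetical and only illustrate the structure; they are not the actual dsputil symbols or the real AV_CPU_FLAG_* values.

```c
/* Hypothetical capability bits and implementation tags. */
enum { SK_FLAG_MMX = 1, SK_FLAG_MMXEXT = 2, SK_FLAG_SSE2 = 4 };
enum { IMPL_C, IMPL_MMX, IMPL_MMXEXT, IMPL_SSE2 };

typedef struct DSPSketch {
    int put_pixels; /* which implementation was selected */
    int avg_pixels;
} DSPSketch;

/* One init function per instruction-set level. Each one checks its own
 * capability bit and overrides only what it implements. */
static void init_mmx(DSPSketch *c, int flags)
{
    if (!(flags & SK_FLAG_MMX))
        return;
    c->put_pixels = IMPL_MMX;
    c->avg_pixels = IMPL_MMX;
}

static void init_mmxext(DSPSketch *c, int flags)
{
    if (!(flags & SK_FLAG_MMXEXT))
        return;
    c->avg_pixels = IMPL_MMXEXT;
}

static void init_sse2(DSPSketch *c, int flags)
{
    if (!(flags & SK_FLAG_SSE2))
        return;
    c->put_pixels = IMPL_SSE2;
}

/* The global init stays flat: C defaults first, then each level in
 * increasing order, so later levels override earlier ones. */
static void dsp_sketch_init(DSPSketch *c, int flags)
{
    c->put_pixels = IMPL_C;
    c->avg_pixels = IMPL_C;
    init_mmx(c, flags);
    init_mmxext(c, flags);
    init_sse2(c, flags);
}
```

Because every level function tests its own flag, the caller does not need a nested chain of conditions around the calls.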
*	dsputil: Add ff_ prefix to the dsputil_init functions	Martin Storsjö	2012-02-15
\| \| \| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	x86 dsputil: provide SSE2/SSSE3 versions of bswap_buf	Christophe Gisquet	2012-01-30
\|
\|	While pshufb allows emulating bswap on XMM registers for SSSE3, more shuffling is needed for SSE2. Alignment is critical, so specific codepaths are provided for this case.
\|
\|	For the huffyuv sequence "angels_480-huffyuvcompress.avi":
\|	C (using bswap instruction): ~ 55k cycles
\|	SSE2:                        ~ 40k cycles
\|	SSSE3 using unaligned loads: ~ 35k cycles
\|	SSSE3 using aligned loads:   ~ 30k cycles
\|
\|	Signed-off-by: Diego Biurrun <diego@biurrun.de>
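For reference, the operation being optimized here can be written in portable C as in the sketch below. The names bswap32_ref() and bswap_buf_ref() are hypothetical, and the (dst, src, count) parameter order is an assumption made for the example.

```c
#include <stdint.h>

/* Reverse the byte order of one 32-bit word. */
static uint32_t bswap32_ref(uint32_t x)
{
    return (x >> 24) |
           ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) |
           (x << 24);
}

/* Scalar baseline: byte-swap w 32-bit words from src into dst. A SIMD
 * version would process several words per iteration instead. */
static void bswap_buf_ref(uint32_t *dst, const uint32_t *src, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = bswap32_ref(src[i]);
}
```

A scalar loop like this is also a convenient oracle when checking that an optimized version produces identical output.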
*	png: move DSP functions to their own DSP context.	Ronald S. Bultje	2012-01-29
\|
*	dsputil: use vertical component for drawing bottom edge.	Ronald S. Bultje	2012-01-25
\| \| \| \| \|	Current code only writes 8 pixels of vertical edge for YUV422, which causes MC artifacts when subsequent frames use data from that edge.
*	build: conditionally compile x86 H.264 chroma optimizations	Diego Biurrun	2011-12-14
\|
*	dsputil: use movups instead of movdqu in ff_emu_edge_core_sse()	Justin Ruggles	2011-11-22
\| \| \| \| \|	This allows emulated_edge_mc_sse() and gmc_sse() to be used under AV_CPU_FLAG_SSE.
*	twinvq: add SSE/AVX optimized sum/difference stereo interleaving	Justin Ruggles	2011-11-11
\|
*	dsputil: use cpuflags in x86 versions of vector_clip_int32()	Justin Ruggles	2011-11-06
\|
*	H.264: Cometics to dsputil_mmx.c	Daniel Kang	2011-10-26
\| \| \| \| \| \|	Add whitespace. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
*	prores: idct sse2/sse4 optimizations.	Ronald S. Bultje	2011-10-11
\| \| \| \|	~3.0-3.5x as fast as original C version, 1.6x as fast overall.