path: root/libswresample/x86
Commit message (Author, Age)
* swresample/x86: add support for exact_rational (Muhammad Faiz, 2016-06-21)
| phase_shift and phase_mask are removed.
| Generally, exact_rational=on is faster than exact_rational=off.
| Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
* swresample: add exact_rational option (Muhammad Faiz, 2016-06-13)
| Gives high-quality resampling: as good as with linear_interp=on, as fast as without linear_interp=on.
|
| Tested visually with ffplay:
|   ffplay -f lavfi "aevalsrc='sin(10000*t*t)', aresample=osr=48000, showcqt=gamma=5"
|   ffplay -f lavfi "aevalsrc='sin(10000*t*t)', aresample=osr=48000:linear_interp=on, showcqt=gamma=5"
|   ffplay -f lavfi "aevalsrc='sin(10000*t*t)', aresample=osr=48000:exact_rational=on, showcqt=gamma=5"
|
| Slight speed improvement (for a fair comparison, with -cpuflags 0).
| audio.wav is a ~1 hour, 44100 Hz, stereo, 16-bit WAV file:
|   ffmpeg -i audio.wav -af aresample=osr=48000 -f null -
|
|           old        new
|   real    13.498s    13.121s
|   user    13.364s    12.987s
|   sys      0.131s     0.129s
|
| linear_interp=on:
|           old        new
|   real    23.035s    23.050s
|   user    22.907s    22.917s
|   sys      0.119s     0.125s
|
| exact_rational=on:
|   real    12.418s
|   user    12.298s
|   sys      0.114s
|
| Possibility to decrease memory usage if soft compensation is ignored.
| Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
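The arithmetic behind an exact rational ratio can be sketched in a few lines of C. This is only an illustration of the idea, not the code from the commit: the helper names `gcd_i64` and `exact_phase_count` are hypothetical, and the assumption is that the number of filter phases can be set to the output rate divided by the greatest common divisor of the two rates, so that every output sample falls exactly on a stored phase.

```c
#include <stdint.h>

/* Greatest common divisor by Euclid's algorithm. */
static int64_t gcd_i64(int64_t a, int64_t b)
{
    while (b) {
        int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper (not a libswresample function): number of distinct
 * filter phases needed when in_rate -> out_rate is treated as an exact
 * fraction, i.e. out_rate / gcd(in_rate, out_rate). */
static int exact_phase_count(int in_rate, int out_rate)
{
    return (int)(out_rate / gcd_i64(in_rate, out_rate));
}
```

For the 44100 Hz to 48000 Hz case used in the benchmark, the ratio reduces to 147/160, so `exact_phase_count(44100, 48000)` yields 160 phases under this assumption.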
* x86: use the new helper macros where useful (James Almer, 2016-02-14)
| Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/audio_convert: fix clobbering of xmm registers (James Almer, 2015-10-01)
| Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86: move XOP emulation code back to x86inc (James Almer, 2015-08-03)
| Only two functions that use xop multiply-accumulate instructions where the first operand is the same as the fourth actually took advantage of the macros.
| This further reduces differences with x264's x86inc.
| Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* swresample/x86: add missing colon to labels (James Almer, 2015-07-26)
| Silences warnings with Nasm
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86: check for AV_CPU_FLAG_AVXSLOW where useful (James Almer, 2015-06-01)
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swresample: add av_cold to init functions (Michael Niedermayer, 2015-02-21)
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/swr: make pack_8ch functions work with compilers without aligned stack (James Almer, 2015-02-15)
| Signed-off-by: James Almer <jamrial@gmail.com>
* swresample/x86/rematrix_init: Check av_malloc* return codes, forward errors (Michael Niedermayer, 2015-02-09)
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swresample/x86/rematrix_init: Use av_mallocz_array() (Michael Niedermayer, 2015-02-09)
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/swr: add SSE/AVX unpack_6ch functions (James Almer, 2015-01-12)
| int32/float only
| Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/swr: load constants outside the loop in pack_6ch functions (James Almer, 2015-01-11)
| Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/swr: disable pack_8ch functions on msvc/icl x86_32 (James Almer, 2014-12-31)
| Until a proper fix is committed.
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/swr: add missing alignment check to pack_6ch functions (James Almer, 2014-12-31)
| Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/swr: add SSE2/AVX pack_8ch functions (James Almer, 2014-12-30)
| Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/swr: add ff_float_to_int32_a_avx2 (James Almer, 2014-11-07)
| 13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips
| 8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips
| Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/swr: replace sse4 instructions in pack_6ch with sse ones (James Almer, 2014-11-06)
| There's no benefit from using blendps here except on CPUs with AVX, where it's faster than shufps according to Intel's documentation.
| As such, rename the sse4 functions to sse/sse2 and use shufps instead.
| Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/swr: use lavu helper macros to check CPU extensions (James Almer, 2014-07-04)
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/swr: split audioconvert and rematrix DSP into separate files (James Almer, 2014-07-04)
| Also rename resample_x86_dsp.c to resample_init.c
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swr: initialize only the necessary resample dsp functions (James Almer, 2014-07-04)
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swr: rename swresample_dsp init functions to swri_resample_dsp (James Almer, 2014-07-02)
| The swresample_ prefix is not for internal functions
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/swr: add ff_resample_{common, linear}_int16_xop (James Almer, 2014-07-02)
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/swr: add ff_resample_{common, linear}_float_fma (James Almer, 2014-07-02)
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/swr: convert resample_{common, linear}_double_sse2 to yasm (James Almer, 2014-07-01)
| Signed-off-by: James Almer <jamrial@gmail.com>
| 312531 -> 311528 dezicycles
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swr: convert resample_common/linear_int16_mmx2/sse2 to yasm. (Ronald S. Bultje, 2014-06-30)
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swr: rewrite resample_common/linear_float_sse/avx in yasm. (Ronald S. Bultje, 2014-06-28)
| Linear interpolation goes from 63 (llvm) or 58 (gcc) to 48 (yasm) cycles/sample on 64bit, or from 66 (llvm/gcc) to 52 (yasm) cycles/sample on 32bit.
| Non-linear goes from 43 (llvm) or 38 (gcc) to 32 (yasm) cycles/sample on 64bit, or from 46 (llvm) or 44 (gcc) to 38 (yasm) cycles/sample on 32bit
| (all testing on OSX 10.9.2, llvm 5.1 and gcc 4.8/9).
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
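The gap between the "linear" and "non-linear" cycle counts is easier to read with the two variants side by side. The following is a scalar C sketch, not the yasm code from the commit; the function names, the filter-bank layout (one row of `filter_len` coefficients per phase, with one extra row after the last phase) and the `frac / src_incr` weighting are assumptions made for illustration.

```c
/* Dot product of one filter phase with the input window. */
static double dot(const float *coef, const float *src, int len)
{
    double acc = 0.0;
    for (int i = 0; i < len; i++)
        acc += (double)coef[i] * src[i];
    return acc;
}

/* Hypothetical "common" variant: one dot product with the selected phase. */
static double resample_one_common(const float *filter, int filter_len,
                                  int phase, const float *src)
{
    return dot(filter + phase * filter_len, src, filter_len);
}

/* Hypothetical "linear" variant: two dot products, blended by frac/src_incr.
 * The caller must ensure that row phase + 1 exists in the filter bank. */
static double resample_one_linear(const float *filter, int filter_len,
                                  int phase, int frac, int src_incr,
                                  const float *src)
{
    double v0 = dot(filter + phase * filter_len, src, filter_len);
    double v1 = dot(filter + (phase + 1) * filter_len, src, filter_len);
    return v0 + (v1 - v0) * frac / src_incr;
}
```

In this sketch the linear variant does roughly twice the multiply-accumulate work per output sample, which is at least consistent with the higher cycle counts reported for linear interpolation.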
* swr: compile mmx2 s16p functions only on x86-32. (Ronald S. Bultje, 2014-06-15)
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swr: add prototypes for resample dsp functions (James Almer, 2014-06-15)
| Should fix compilation failures with MSVC and any other compiler without inline asm support.
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swr: remove obsolete function prototypes. (Ronald S. Bultje, 2014-06-15)
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swr: split out DSP functions. (Ronald S. Bultje, 2014-06-14)
| DSP bits of swri_resample go into their own mini-DSP functions;
| DSP init goes from a per-call branch in multiple_resample to a proper DSP init routine;
| x86 bits go into x86/;
| swri_resample() moves out of resample_template.c into resample.c because it's independent of DSP code or sample type;
| multiple_resample() is simplified.
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
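The move from a per-call branch to a DSP init routine follows a common pattern: choose a function pointer once, based on CPU capabilities, and call through it afterwards. The C sketch below illustrates that pattern only; `DemoDSP`, `demo_dsp_init`, `DEMO_CPU_FLAG_FAST` and the two summing functions are hypothetical names and do not come from libswresample.

```c
/* Hypothetical capability flag standing in for a real CPU feature bit. */
#define DEMO_CPU_FLAG_FAST 1

typedef int (*sum_fn)(const int *src, int len);

/* Hypothetical DSP context: the implementation is picked once at init. */
typedef struct DemoDSP {
    sum_fn sum;
} DemoDSP;

/* Portable reference implementation. */
static int sum_c(const int *src, int len)
{
    int acc = 0;
    for (int i = 0; i < len; i++)
        acc += src[i];
    return acc;
}

/* Unrolled version standing in for an optimized (e.g. SIMD) routine. */
static int sum_unrolled(const int *src, int len)
{
    int acc = 0, i = 0;
    for (; i + 4 <= len; i += 4)
        acc += src[i] + src[i + 1] + src[i + 2] + src[i + 3];
    for (; i < len; i++)
        acc += src[i];
    return acc;
}

/* Init routine: start from the C version, override when the flag is set. */
static void demo_dsp_init(DemoDSP *dsp, int cpu_flags)
{
    dsp->sum = sum_c;
    if (cpu_flags & DEMO_CPU_FLAG_FAST)
        dsp->sum = sum_unrolled;
}
```

After `demo_dsp_init()` has run, the hot path calls `dsp->sum(...)` without testing CPU flags on every call.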
* swresample: add swri_resample_float_avx (James Almer, 2014-05-16)
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* inline asm: fix arrays as named constraints. (Matt Oliver, 2014-05-07)
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swresample/resample: add missing xmm clobbers (James Almer, 2014-05-07)
| Might fix fate-swr on ICL
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swresample: add swri_resample_double_sse2 (James Almer, 2014-04-25)
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swresample/resample: sse float linear interpolation (James Almer, 2014-03-24)
| About two times faster
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swresample/resample: mmx2/sse2 int16 linear interpolation (James Almer, 2014-03-24)
| About three times faster
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swresample: add swri_resample_float_sse (James Almer, 2014-03-20)
| At least two times faster than the C version.
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* Automatically change MANGLE() into named inline asm operands when direct symbol references in inline asm are not supported. (Matt Oliver, 2014-03-18)
| This is part of the patch-set for intel C inline asm on windows support
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swresample: change COMMON_CORE_INT16 asm from SSSE3 to SSE2 (James Almer, 2014-03-18)
| pshuf+paddd is slightly faster than phaddd.
| The real gain is in pre-ssse3 processors like AMD K8 and K10, which get a big boost in performance compared to the mmxext version
| Signed-off-by: James Almer <jamrial@gmail.com>
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
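The shuffle-and-add approach mentioned above is a general SSE2 idiom for summing the four 32-bit lanes of a register without the SSSE3 `phaddd` instruction. The C sketch below shows that idiom with compiler intrinsics; it is not the assembly from the commit, the function name `hsum_epi32` is hypothetical, and a scalar fallback is included for targets where SSE2 is unavailable.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: sum of four 32-bit integers. */
static int32_t hsum_epi32(const int32_t v[4])
{
#if defined(__SSE2__)
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    /* Swap the two 64-bit halves (pshufd) and add (paddd):
     * lane 0 now holds v0+v2, lane 1 holds v1+v3. */
    x = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
    /* Swap adjacent 32-bit lanes and add: every lane holds the total. */
    x = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(x);
#else
    /* Scalar fallback for non-SSE2 targets. */
    return v[0] + v[1] + v[2] + v[3];
#endif
}
```

Because only SSE2 instructions are involved, a routine built this way can run on processors that lack SSSE3, which matches the motivation given for K8 and K10.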
* swresample: Add arm&x86 clobber tests (Martin Storsjö, 2014-01-18)
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* Avoid using empty macro arguments. (Reimar Döffinger, 2013-12-31)
| These are not supported by all compilers (gcc 2.95 but also older SPARC compilers, see gcc bug #33304 for example), and there is no real need for them.
| One use of this feature remains in libavdevice/v4l2.c which can't be replaced quite as easily.
| Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
* x86: Fix compilation with nasm on PPC & OS/2 (Ronald S. Bultje, 2013-10-08)
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swresample/x86/audio_convert: add emms to CONV (Michael Niedermayer, 2013-06-18)
| Might fix Ticket1874
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* swr: add native_simd_one (Michael Niedermayer, 2013-06-04)
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73' (Michael Niedermayer, 2012-10-31)
| * commit '6860b4081d046558c44b1b42f22022ea341a2a73':
|   x86: include x86inc.asm in x86util.asm
|   cng: Reindent some incorrectly indented lines
|   cngdec: Allow flushing the decoder
|   cngdec: Make the dbov variable have the right unit
|   cngdec: Fix the memset size to cover the full array
|   cngdec: Update the LPC coefficients after averaging the reflection coefficients
|   configure: fix print_config() with broke awks
|
| Conflicts:
|   libavcodec/x86/ac3dsp.asm
|   libavcodec/x86/dct32.asm
|   libavcodec/x86/deinterlace.asm
|   libavcodec/x86/dsputil.asm
|   libavcodec/x86/dsputilenc.asm
|   libavcodec/x86/fft.asm
|   libavcodec/x86/fmtconvert.asm
|   libavcodec/x86/h264_chromamc.asm
|   libavcodec/x86/h264_deblock.asm
|   libavcodec/x86/h264_deblock_10bit.asm
|   libavcodec/x86/h264_idct.asm
|   libavcodec/x86/h264_idct_10bit.asm
|   libavcodec/x86/h264_intrapred.asm
|   libavcodec/x86/h264_intrapred_10bit.asm
|   libavcodec/x86/h264_weight.asm
|   libavcodec/x86/vc1dsp.asm
|   libavcodec/x86/vp3dsp.asm
|   libavcodec/x86/vp56dsp.asm
|   libavcodec/x86/vp8dsp.asm
|
| Merged-by: Michael Niedermayer <michaelni@gmx.at>
* swr: add av_cold to init/free functions (Michael Niedermayer, 2012-09-09)
| Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* Fix compilation with yasm-0.6.2. (Carl Eugen Hoyos, 2012-09-01)
* Add some missing _EXTERNAL suffixes to yasm source files. (Carl Eugen Hoyos, 2012-08-31)
* Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2012-08-09)
| * qatar/master:
|   mpegvideo: reduce excessive inlining of mpeg_motion()
|   mpegvideo: convert mpegvideo_common.h to a .c file
|   build: factor out mpegvideo.o dependencies to CONFIG_MPEGVIDEO
|   Move MASK_ABS macro to libavcodec/mathops.h
|   x86: move MANGLE() and related macros to libavutil/x86/asm.h
|   x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
|   aacdec: Don't fall back to the old output configuration when no old configuration is present.
|   rtmp: Add message tracking
|   rtsp: Support mpegts in raw udp packets
|   rtsp: Support receiving plain data over UDP without any RTP encapsulation
|   rtpdec: Remove an unused include
|   rtpenc: Remove an av_abort() that depends on user-supplied data
|   vsrc_movie: discourage its use with avconv.
|   avconv: allow no input files.
|   avconv: prevent invalid reads in transcode_init()
|   avconv: rename OutputStream.is_past_recording_time to finished.
|
| Conflicts:
|   configure
|   doc/filters.texi
|   ffmpeg.c
|   ffmpeg.h
|   libavcodec/Makefile
|   libavcodec/aacdec.c
|   libavcodec/mpegvideo.c
|   libavformat/version.h
|
| Merged-by: Michael Niedermayer <michaelni@gmx.at>