summaryrefslogtreecommitdiff
path: root/libavutil/x86/float_dsp.asm
Commit message (Collapse)AuthorAge
* avutil/x86/float_dsp: Remove obsolete 3dnowext functionAndreas Rheinhardt2022-06-22
| | | | | | | | | | | x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT, SSE and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2). So given that the only systems which benefit from ff_vector_fmul_window_3dnowext are truely ancient 32bit AMD x86s it is removed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* libavutil: include assembly with full path from source rootAlexander Kanavin2022-02-08
| | | | | | | | Otherwise nasm writes the full host-specific paths into .o output, which breaks binary reproducibility. Signed-off-by: Alexander Kanavin <alex.kanavin@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86/float_dsp: add ff_vector_dmul_{sse2,avx}James Almer2018-09-14
| | | | | | ~3x to 5x faster. Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: remove usage of integer instructionsJames Almer2017-05-12
|
* x86/float_dsp: add ff_vector_fmul_reverse_avx2James Almer2017-04-11
| | | | | | ~20% faster than AVX. Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: add ff_vector_dmac_scalar_{sse2,avx,fma3}James Almer2017-04-10
|
* x86/float_dsp: zero extend offset from ff_scalarproduct_float_sseJames Almer2016-01-08
| | | | | Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: zero extend len from ff_butterflies_float_sse implicitlyJames Almer2016-01-08
| | | | | Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: remove len check from ff_butterflies_float_sseJames Almer2016-01-08
| | | | | | | The function documentation explicitly mentions it needs to be a multiple of 4. Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: add missing colon to labelsJames Almer2015-07-26
| | | | | | Silences warnings with Nasm Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: add missing femmsJames Almer2014-06-08
| | | | | | | | It was lost during the port. Should fix fate on 3dnowext machines. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/float_dsp: port vector_fmul_window to yasmJames Almer2014-06-08
| | | | | Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/float_dsp: remove duplicated code from vector_dmul_scalarJames Almer2014-04-19
| | | | | | | | Use the xm# and ym# aliases as they remain in sync with m# after a SWAP. No actual changes to the assembly. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/float_dsp: unroll loop in vector_fmac_scalarJames Almer2014-04-16
| | | | | | | | ~6% faster SSE2 performance. AVX/FMA3 are unaffected. Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/float_dsp: use SWAP in vector_fmac_scalar Win64James Almer2014-04-16
| | | | | | | The mova is unnecessary Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/float_dsp: add ff_vector_{fmul_add, fmac_scalar}_fma3James Almer2014-03-13
| | | | | | | ~7% faster than AVX Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: float dsp: unroll SSE versionsChristophe Gisquet2014-02-15
| | | | | | | | | | vector_fmul and vector_fmac_scalar are guaranteed that they can process in batch of 16 elements, but their SSE versions only does 8 at a time. Therefore, unroll them a bit. 299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* Merge commit '566b7a20fd0cab44d344329538d314454a0bcc2f'Michael Niedermayer2013-05-03
|\ | | | | | | | | | | | | | | | | | | * commit '566b7a20fd0cab44d344329538d314454a0bcc2f': x86: float dsp: butterflies_float SSE Conflicts: libavutil/x86/float_dsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: float dsp: butterflies_float SSEChristophe Gisquet2013-05-03
| | | | | | | | | | 97c -> 49c Some codecs could benefit from more unrolling, but AAC doesn't.
* | butterflies_float: replace 2 lea by 2 addMichael Niedermayer2013-04-17
| | | | | | | | | | | | | | adds are simpler instructions and should be faster or equally fast on all cpus Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: float dsp: butterflies_float SSEChristophe Gisquet2013-04-17
| | | | | | | | | | | | | | 97c -> 49c Some codecs could benefit from more unrolling, but AAC doesn't. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge commit '73b704ac609d83e0be124589f24efd9b94947cf9'Michael Niedermayer2013-01-23
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * commit '73b704ac609d83e0be124589f24efd9b94947cf9': arm: Add some missing header #includes floatdsp: move scalarproduct_float from dsputil to avfloatdsp. Conflicts: libavcodec/acelp_pitch_delay.c libavcodec/amrnbdec.c libavcodec/amrwbdec.c libavcodec/ra288.c libavcodec/x86/dsputil_mmx.c libavutil/x86/float_dsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * floatdsp: move scalarproduct_float from dsputil to avfloatdsp.Ronald S. Bultje2013-01-22
| | | | | | | | This makes the aac decoder and all voice codecs independent of dsputil.
* | Merge commit '42d324694883cdf1fff1612ac70fa403692a1ad4'Michael Niedermayer2013-01-23
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | * commit '42d324694883cdf1fff1612ac70fa403692a1ad4': floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp. Conflicts: libavcodec/arm/dsputil_init_vfp.c libavcodec/arm/dsputil_vfp.S libavcodec/dsputil.c libavcodec/ppc/float_altivec.c libavcodec/x86/dsputil.asm libavutil/x86/float_dsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp.Ronald S. Bultje2013-01-22
| | | | | | | | | | | | Now, nellymoserenc and aacenc no longer depends on dsputil. Independent of this patch, wmaprodec also does not depend on dsputil, so I removed it from there also.
* | Merge commit '55aa03b9f8f11ebb7535424cc0e5635558590f49'Michael Niedermayer2013-01-23
|\| | | | | | | | | | | | | | | | | | | | | * commit '55aa03b9f8f11ebb7535424cc0e5635558590f49': floatdsp: move vector_fmul_add from dsputil to avfloatdsp. Conflicts: libavcodec/dsputil.c libavcodec/x86/dsputil.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * floatdsp: move vector_fmul_add from dsputil to avfloatdsp.Ronald S. Bultje2013-01-22
| |
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-12-08
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: golomb: use unsigned arithmetics in svq3_get_ue_golomb() x86: float_dsp: fix loading of the len parameter on x86-32 takdec: fix initialisation of LOCAL_ALIGNED array takdec: fix initialisation of LOCAL_ALIGNED array Conflicts: libavcodec/rv30.c libavcodec/svq3.c libavcodec/takdec.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: float_dsp: fix loading of the len parameter on x86-32Justin Ruggles2012-12-07
| |
* | Merge commit 'c25fc5c2bb6ae8c93541c9427df3e47206d95152'Michael Niedermayer2012-12-07
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * commit 'c25fc5c2bb6ae8c93541c9427df3e47206d95152': fate: dpcm: Add dependencies SBR DSP x86: implement SSE sbr_hf_gen AAC SBR: use AVFloatDSPContext's vector_fmul fate: image: Add dependencies Changelog: add an entry for deprecating the avconv -vol option x86: float_dsp: fix compilation of ff_vector_dmul_scalar_avx() on x86-32 Conflicts: Changelog libavutil/x86/float_dsp.asm tests/fate/image.mak Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: float_dsp: fix compilation of ff_vector_dmul_scalar_avx() on x86-32Justin Ruggles2012-12-06
| | | | | | | | Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* | Merge commit '9d5c62ba5b586c80af508b5914934b1c439f6652'Michael Niedermayer2012-12-06
|\| | | | | | | | | | | | | | | | | | | | | | | | | * commit '9d5c62ba5b586c80af508b5914934b1c439f6652': lavu/opt: do not filter out the initial sign character except for flags eval: treat dB as decibels instead of decibytes float_dsp: add vector_dmul_scalar() to multiply a vector of doubles Conflicts: libavutil/eval.c tests/ref/fate/eval Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * float_dsp: add vector_dmul_scalar() to multiply a vector of doublesJustin Ruggles2012-12-05
| | | | | | | | Include x86-optimized versions for SSE2 and AVX.
* | Merge commit '3c370f5abc55739a261534b9f9bdc739cedbbbb9'Michael Niedermayer2012-11-27
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | * commit '3c370f5abc55739a261534b9f9bdc739cedbbbb9': riff: only warn on a bad INFO chunk code size instead of failing configure: Add separate list for libraries and use where appropriate x86: float_dsp: add SSE version of vector_fmul_scalar() Conflicts: configure libavformat/riff.c libavutil/x86/float_dsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: float_dsp: add SSE version of vector_fmul_scalar()Justin Ruggles2012-11-26
| |
| * build: Drop AVX assembly ifdefsDiego Biurrun2012-11-11
| | | | | | | | An assembler able to cope with AVX instructions is now required.
* | Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73'Michael Niedermayer2012-10-31
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * commit '6860b4081d046558c44b1b42f22022ea341a2a73': x86: include x86inc.asm in x86util.asm cng: Reindent some incorrectly indented lines cngdec: Allow flushing the decoder cngdec: Make the dbov variable have the right unit cngdec: Fix the memset size to cover the full array cngdec: Update the LPC coefficients after averaging the reflection coefficients configure: fix print_config() with broke awks Conflicts: libavcodec/x86/ac3dsp.asm libavcodec/x86/dct32.asm libavcodec/x86/deinterlace.asm libavcodec/x86/dsputil.asm libavcodec/x86/dsputilenc.asm libavcodec/x86/fft.asm libavcodec/x86/fmtconvert.asm libavcodec/x86/h264_chromamc.asm libavcodec/x86/h264_deblock.asm libavcodec/x86/h264_deblock_10bit.asm libavcodec/x86/h264_idct.asm libavcodec/x86/h264_idct_10bit.asm libavcodec/x86/h264_intrapred.asm libavcodec/x86/h264_intrapred_10bit.asm libavcodec/x86/h264_weight.asm libavcodec/x86/vc1dsp.asm libavcodec/x86/vp3dsp.asm libavcodec/x86/vp56dsp.asm libavcodec/x86/vp8dsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: include x86inc.asm in x86util.asmDiego Biurrun2012-10-31
| | | | | | | | This is necessary to allow refactoring some x86util macros with cpuflags.
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-09-08
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: mov_chan: Only set the channel_layout if setting it to a nonzero value mov_chan: Reindent an incorrectly indented line mp2 muxer: mark as AVFMT_NOTIMESTAMPS. x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64 x86: more specific checks for availability of required assembly capabilities x86: avcodec: Drop silly "_mmx" suffix from dsputil template names fate: Drop redundant setting of FUZZ to 1 cavsdsp: set idct permutation independently of dsputil x86: allow using add_hfyu_median_prediction_cmov on any cpu with cmov Conflicts: libavcodec/x86/dsputil_mmx.c libavformat/mp3enc.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64Justin Ruggles2012-09-07
| | | | | | | | | | The SWAP macro does not work for explicit xmm/ymm usage, so instead just move the scalar value from xmm2 to xmm0.
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-08-31
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: MSS1 and MSS2: set final pixel format after common stuff has been initialised MSS2 decoder configure: handle --disable-asm before check_deps x86: Split inline and external assembly #ifdefs configure: x86: Separate inline from standalone assembler capabilities pktdumper: Use a custom define instead of PATH_MAX for buffers pktdumper: Use av_strlcpy instead of strncpy pktdumper: Use sizeof(variable) instead of the direct buffer length Conflicts: Changelog configure libavcodec/allcodecs.c libavcodec/avcodec.h libavcodec/codec_desc.c libavcodec/dct-test.c libavcodec/imgconvert.c libavcodec/mss12.c libavcodec/version.h libavfilter/x86/gradfun.c libswscale/x86/yuv2rgb.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: Split inline and external assembly #ifdefsDiego Biurrun2012-08-31
| |
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-08-07
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: x86: fix build with nasm 2.08 x86: use nop cpu directives only if supported x86: fix rNmp macros with nasm build: add trailing / to yasm/nasm -I flags x86: use 32-bit source registers with movd instruction x86: add colons after labels Conflicts: Makefile libavutil/x86/x86inc.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: add colons after labelsMans Rullgard2012-08-07
| | | | | | | | | | | | nasm prints a warning if the colon is missing. Signed-off-by: Mans Rullgard <mans@mansr.com>
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-07-27
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: proresdsp: port x86 assembly to cpuflags. lavr: x86: improve non-SSE4 version of S16_TO_S32_SX macro lavfi: better channel layout negotiation alac: check for truncated packets alac: reverse lpc coeff order, simplify filter lavr: add x86-optimized mixing functions x86: add support for fmaddps fma4 instruction with abstraction to avx/sse tscc2: fix typo in array index build: use COMPILE template for HOSTOBJS build: do full flag handling for all compiler-type tools eval: fix printing of NaN in eval fate test. build: Rename aandct component to more descriptive aandcttables mpegaudio: bury inline asm under HAVE_INLINE_ASM. x86inc: automatically insert vzeroupper for YMM functions. rtmp: Check the buffer length of ping packets rtmp: Allow having more unknown data at the end of a chunk size packet without failing rtmp: Prevent reading outside of an allocate buffer when receiving server bandwidth packets Conflicts: Makefile configure libavcodec/x86/proresdsp.asm libavutil/eval.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86inc: automatically insert vzeroupper for YMM functions.Ronald S. Bultje2012-07-26
| |
* | Merge remote-tracking branch 'qatar/master'Michael Niedermayer2012-06-19
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * qatar/master: (24 commits) flvdec: remove incomplete, disabled seeking code mem: add support for _aligned_malloc() as found on Windows lavc: Extend the documentation for avcodec_init_packet flvdec: remove incomplete, disabled seeking code http: replace atoll() with strtoll() mpegts: remove unused/incomplete/broken seeking code af_amix: allow float planar sample format as input af_amix: use AVFloatDSPContext.vector_fmac_scalar() float_dsp: add x86-optimized functions for vector_fmac_scalar() float_dsp: Move vector_fmac_scalar() from libavcodec to libavutil lavr: Add x86-optimized function for flt to s32 conversion lavr: Add x86-optimized function for flt to s16 conversion lavr: Add x86-optimized functions for s32 to flt conversion lavr: Add x86-optimized functions for s32 to s16 conversion lavr: Add x86-optimized functions for s16 to flt conversion lavr: Add x86-optimized function for s16 to s32 conversion rtpenc: Support packetizing iLBC rtpdec: Add a depacketizer for iLBC Implement the iLBC storage file format mov: Support muxing/demuxing iLBC ... Conflicts: Changelog configure libavcodec/avcodec.h libavcodec/dsputil.c libavcodec/version.h libavformat/movenc.c libavformat/mpegts.c libavformat/version.h libavutil/mem.c Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * float_dsp: add x86-optimized functions for vector_fmac_scalar()Justin Ruggles2012-06-18
| |
* | x86/float_dsp.asm: restore author attributionMichael Niedermayer2012-06-09
|/ | | | | | | | | | | | | | | | | | | | | | | The attribution was removed by libav while moving the code to libavutil The original code is from commit eb4825b5d43bb6ecfae4d64688f9e2d2ac075263 Author: Loren Merritt <lorenm@u.washington.edu> Date: Thu Aug 10 19:06:25 2006 +0000 sse and 3dnow implementations of float->int conversion and mdct windowing. 15% faster vorbis. and commit 069720565ce0f2cc94fa2474f30d155b2755e350 Author: Loren Merritt <lorenm@u.washington.edu> Date: Fri Aug 11 18:19:37 2006 +0000 vorbis simd tweaks Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* Add a float DSP framework to libavutilJustin Ruggles2012-06-08
Move vector_fmul() from DSPContext to AVFloatDSPContext.