path: root/libavutil/x86/float_dsp.asm
Commit message  Author  Age
* x86/float_dsp: add missing colon to labels  James Almer  2015-07-26
|   Silences warnings with Nasm
|   Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: add missing femms  James Almer  2014-06-08
|   It was lost during the port. Should fix fate on 3dnowext machines.
|   Signed-off-by: James Almer <jamrial@gmail.com>
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/float_dsp: port vector_fmul_window to yasm  James Almer  2014-06-08
|   Signed-off-by: James Almer <jamrial@gmail.com>
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
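The yasm port above follows the scalar behaviour of `vector_fmul_window` closely. A minimal C sketch of that behaviour follows. It assumes that `dst` and `win` each hold `2*len` floats and that `src0` and `src1` each hold `len` floats. The name `ref_vector_fmul_window` is hypothetical, and this is not the assembly from the commit.

```c
/* Hypothetical scalar sketch of vector_fmul_window (assumed semantics).
 * Assumption: dst and win hold 2*len floats; src0 and src1 hold len floats.
 * src0 is walked forward and src1 backward, and each step writes one
 * element in the first half of dst and its mirror in the second half. */
void ref_vector_fmul_window(float *dst, const float *src0, const float *src1,
                            const float *win, int len)
{
    /* Re-center the pointers so negative indices address the first half. */
    dst  += len;
    win  += len;
    src0 += len;
    for (int i = -len, j = len - 1; i < 0; i++, j--) {
        float s0 = src0[i];
        float s1 = src1[j];
        float wi = win[i];
        float wj = win[j];
        dst[i] = s0 * wj - s1 * wi;
        dst[j] = s0 * wi + s1 * wj;
    }
}
```

With `len = 2`, `src0 = {1, 2}`, `src1 = {3, 4}` and `win = {1, 2, 3, 4}`, the sketch writes `{0, 0, 13, 17}` to `dst`.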
* x86/float_dsp: remove duplicated code from vector_dmul_scalar  James Almer  2014-04-19
|   Use the xm# and ym# aliases as they remain in sync with m# after a SWAP.
|   No actual changes to the assembly.
|   Signed-off-by: James Almer <jamrial@gmail.com>
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/float_dsp: unroll loop in vector_fmac_scalar  James Almer  2014-04-16
|   ~6% faster SSE2 performance. AVX/FMA3 are unaffected.
|   Signed-off-by: James Almer <jamrial@gmail.com>
|   Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/float_dsp: use SWAP in vector_fmac_scalar Win64  James Almer  2014-04-16
|   The mova is unnecessary
|   Signed-off-by: James Almer <jamrial@gmail.com>
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/float_dsp: add ff_vector_{fmul_add, fmac_scalar}_fma3  James Almer  2014-03-13
|   ~7% faster than AVX
|   Signed-off-by: James Almer <jamrial@gmail.com>
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
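The FMA3 variant above targets a multiply followed by an add. A minimal scalar sketch of `vector_fmul_add` is shown below, assuming the semantics `dst[i] = src0[i] * src1[i] + src2[i]`. The name `ref_vector_fmul_add` is hypothetical. An FMA3 instruction computes this expression with a single rounding, so its results may differ from a separately rounded multiply and add in the last bit.

```c
/* Hypothetical scalar sketch of vector_fmul_add (assumed semantics):
 *     dst[i] = src0[i] * src1[i] + src2[i]
 * No explicit fused multiply-add is used here. An FMA3 version performs
 * the multiply and add as one fused operation with a single rounding. */
void ref_vector_fmul_add(float *dst, const float *src0, const float *src1,
                         const float *src2, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i] + src2[i];
}
```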
* x86: float dsp: unroll SSE versions  Christophe Gisquet  2014-02-15
|   vector_fmul and vector_fmac_scalar are guaranteed that they can process in
|   batches of 16 elements, but their SSE versions only do 8 at a time.
|   Therefore, unroll them a bit.
|   299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
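The unrolling argument can be shown in scalar C. Because callers guarantee that `len` is a multiple of 16, each loop iteration can handle 16 elements instead of 8, which halves the loop overhead. This is a sketch under that assumption, with no remainder loop. The name `ref_vector_fmac_scalar_unrolled` is hypothetical, and the split into two 8-element halves only mirrors the old and new batch sizes. It is not the SSE code.

```c
/* Hypothetical sketch of vector_fmac_scalar (dst[i] += src[i] * mul)
 * processing 16 elements per loop iteration.
 * Assumption: len % 16 == 0, as the commit says callers guarantee;
 * there is deliberately no remainder loop. */
void ref_vector_fmac_scalar_unrolled(float *dst, const float *src, float mul,
                                     int len)
{
    for (int i = 0; i < len; i += 16) {
        /* First 8 elements: what the previous SSE loop did per iteration. */
        for (int k = 0; k < 8; k++)
            dst[i + k] += src[i + k] * mul;
        /* Second 8 elements: the extra work added by unrolling. */
        for (int k = 8; k < 16; k++)
            dst[i + k] += src[i + k] * mul;
    }
}
```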
* Merge commit '566b7a20fd0cab44d344329538d314454a0bcc2f'  Michael Niedermayer  2013-05-03
|\    * commit '566b7a20fd0cab44d344329538d314454a0bcc2f':
| |       x86: float dsp: butterflies_float SSE
| |   Conflicts:
| |       libavutil/x86/float_dsp.asm
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: float dsp: butterflies_float SSE  Christophe Gisquet  2013-05-03
| |   97c -> 49c
| |   Some codecs could benefit from more unrolling, but AAC doesn't.
* | butterflies_float: replace 2 lea by 2 add  Michael Niedermayer  2013-04-17
| |   adds are simpler instructions and should be faster or equally fast on all cpus
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: float dsp: butterflies_float SSE  Christophe Gisquet  2013-04-17
| |   97c -> 49c
| |   Some codecs could benefit from more unrolling, but AAC doesn't.
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
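`butterflies_float` replaces two arrays in place with their element-wise sum and difference, the same sum/difference step used in mid/side stereo. The sketch below assumes these semantics: `v1[i]` becomes `v1[i] + v2[i]` and `v2[i]` becomes `v1[i] - v2[i]`. The name `ref_butterflies_float` is hypothetical and this is not the SSE code.

```c
/* Hypothetical scalar sketch of butterflies_float (assumed semantics):
 * in place, v1[i] <- v1[i] + v2[i] and v2[i] <- v1[i] - v2[i]. */
void ref_butterflies_float(float *v1, float *v2, int len)
{
    for (int i = 0; i < len; i++) {
        float t = v1[i] - v2[i]; /* keep the difference before v1 changes */
        v1[i] += v2[i];
        v2[i] = t;
    }
}
```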
* | Merge commit '73b704ac609d83e0be124589f24efd9b94947cf9'  Michael Niedermayer  2013-01-23
|\|   * commit '73b704ac609d83e0be124589f24efd9b94947cf9':
| |       arm: Add some missing header #includes
| |       floatdsp: move scalarproduct_float from dsputil to avfloatdsp.
| |   Conflicts:
| |       libavcodec/acelp_pitch_delay.c libavcodec/amrnbdec.c libavcodec/amrwbdec.c libavcodec/ra288.c libavcodec/x86/dsputil_mmx.c libavutil/x86/float_dsp.asm
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * floatdsp: move scalarproduct_float from dsputil to avfloatdsp.  Ronald S. Bultje  2013-01-22
| |   This makes the aac decoder and all voice codecs independent of dsputil.
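`scalarproduct_float`, moved here from dsputil, is a dot product. A minimal sketch is below, assuming it returns the sum of `v1[i] * v2[i]`. The name `ref_scalarproduct_float` is hypothetical. SIMD versions usually accumulate several partial sums and add them at the end. Float addition is not associative, so their result can differ slightly from this sequential loop.

```c
/* Hypothetical scalar sketch of scalarproduct_float (assumed semantics):
 * returns sum over i of v1[i] * v2[i], accumulated sequentially. */
float ref_scalarproduct_float(const float *v1, const float *v2, int len)
{
    float p = 0.0f;
    for (int i = 0; i < len; i++)
        p += v1[i] * v2[i];
    return p;
}
```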
* | Merge commit '42d324694883cdf1fff1612ac70fa403692a1ad4'  Michael Niedermayer  2013-01-23
|\|   * commit '42d324694883cdf1fff1612ac70fa403692a1ad4':
| |       floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp.
| |   Conflicts:
| |       libavcodec/arm/dsputil_init_vfp.c libavcodec/arm/dsputil_vfp.S libavcodec/dsputil.c libavcodec/ppc/float_altivec.c libavcodec/x86/dsputil.asm libavutil/x86/float_dsp.asm
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp.  Ronald S. Bultje  2013-01-22
| |   Now, nellymoserenc and aacenc no longer depend on dsputil. Independent of
| |   this patch, wmaprodec also does not depend on dsputil, so I removed it
| |   from there also.
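`vector_fmul_reverse` multiplies one vector by another that is read back to front. The sketch below assumes the semantics `dst[i] = src0[i] * src1[len - 1 - i]`. The name `ref_vector_fmul_reverse` is hypothetical.

```c
/* Hypothetical scalar sketch of vector_fmul_reverse (assumed semantics):
 * dst[i] = src0[i] * src1[len - 1 - i], i.e. src1 is read in reverse. */
void ref_vector_fmul_reverse(float *dst, const float *src0, const float *src1,
                             int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[len - 1 - i];
}
```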
* | Merge commit '55aa03b9f8f11ebb7535424cc0e5635558590f49'  Michael Niedermayer  2013-01-23
|\|   * commit '55aa03b9f8f11ebb7535424cc0e5635558590f49':
| |       floatdsp: move vector_fmul_add from dsputil to avfloatdsp.
| |   Conflicts:
| |       libavcodec/dsputil.c libavcodec/x86/dsputil.asm
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * floatdsp: move vector_fmul_add from dsputil to avfloatdsp.  Ronald S. Bultje  2013-01-22
| |
* | Merge remote-tracking branch 'qatar/master'  Michael Niedermayer  2012-12-08
|\|   * qatar/master:
| |       golomb: use unsigned arithmetics in svq3_get_ue_golomb()
| |       x86: float_dsp: fix loading of the len parameter on x86-32
| |       takdec: fix initialisation of LOCAL_ALIGNED array
| |       takdec: fix initialisation of LOCAL_ALIGNED array
| |   Conflicts:
| |       libavcodec/rv30.c libavcodec/svq3.c libavcodec/takdec.c
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: float_dsp: fix loading of the len parameter on x86-32  Justin Ruggles  2012-12-07
| |
* | Merge commit 'c25fc5c2bb6ae8c93541c9427df3e47206d95152'  Michael Niedermayer  2012-12-07
|\|   * commit 'c25fc5c2bb6ae8c93541c9427df3e47206d95152':
| |       fate: dpcm: Add dependencies
| |       SBR DSP x86: implement SSE sbr_hf_gen
| |       AAC SBR: use AVFloatDSPContext's vector_fmul
| |       fate: image: Add dependencies
| |       Changelog: add an entry for deprecating the avconv -vol option
| |       x86: float_dsp: fix compilation of ff_vector_dmul_scalar_avx() on x86-32
| |   Conflicts:
| |       Changelog libavutil/x86/float_dsp.asm tests/fate/image.mak
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: float_dsp: fix compilation of ff_vector_dmul_scalar_avx() on x86-32  Justin Ruggles  2012-12-06
| |   Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* | Merge commit '9d5c62ba5b586c80af508b5914934b1c439f6652'  Michael Niedermayer  2012-12-06
|\|   * commit '9d5c62ba5b586c80af508b5914934b1c439f6652':
| |       lavu/opt: do not filter out the initial sign character except for flags
| |       eval: treat dB as decibels instead of decibytes
| |       float_dsp: add vector_dmul_scalar() to multiply a vector of doubles
| |   Conflicts:
| |       libavutil/eval.c tests/ref/fate/eval
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * float_dsp: add vector_dmul_scalar() to multiply a vector of doubles  Justin Ruggles  2012-12-05
| |   Include x86-optimized versions for SSE2 and AVX.
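`vector_dmul_scalar` is the double-precision counterpart of scaling a vector by a constant. A minimal sketch is below, assuming the semantics `dst[i] = src[i] * mul` on `double`. The name `ref_vector_dmul_scalar` is hypothetical. An SSE2 register holds two doubles and a 256-bit AVX register holds four, so the SIMD versions cover two or four elements per multiply instruction.

```c
/* Hypothetical scalar sketch of vector_dmul_scalar (assumed semantics):
 * dst[i] = src[i] * mul, on doubles rather than floats. */
void ref_vector_dmul_scalar(double *dst, const double *src, double mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src[i] * mul;
}
```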
* | Merge commit '3c370f5abc55739a261534b9f9bdc739cedbbbb9'  Michael Niedermayer  2012-11-27
|\|   * commit '3c370f5abc55739a261534b9f9bdc739cedbbbb9':
| |       riff: only warn on a bad INFO chunk code size instead of failing
| |       configure: Add separate list for libraries and use where appropriate
| |       x86: float_dsp: add SSE version of vector_fmul_scalar()
| |   Conflicts:
| |       configure libavformat/riff.c libavutil/x86/float_dsp.asm
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: float_dsp: add SSE version of vector_fmul_scalar()  Justin Ruggles  2012-11-26
| |
| * build: Drop AVX assembly ifdefs  Diego Biurrun  2012-11-11
| |   An assembler able to cope with AVX instructions is now required.
* | Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73'  Michael Niedermayer  2012-10-31
|\|   * commit '6860b4081d046558c44b1b42f22022ea341a2a73':
| |       x86: include x86inc.asm in x86util.asm
| |       cng: Reindent some incorrectly indented lines
| |       cngdec: Allow flushing the decoder
| |       cngdec: Make the dbov variable have the right unit
| |       cngdec: Fix the memset size to cover the full array
| |       cngdec: Update the LPC coefficients after averaging the reflection coefficients
| |       configure: fix print_config() with broke awks
| |   Conflicts:
| |       libavcodec/x86/ac3dsp.asm libavcodec/x86/dct32.asm libavcodec/x86/deinterlace.asm libavcodec/x86/dsputil.asm libavcodec/x86/dsputilenc.asm libavcodec/x86/fft.asm libavcodec/x86/fmtconvert.asm libavcodec/x86/h264_chromamc.asm libavcodec/x86/h264_deblock.asm libavcodec/x86/h264_deblock_10bit.asm libavcodec/x86/h264_idct.asm libavcodec/x86/h264_idct_10bit.asm libavcodec/x86/h264_intrapred.asm libavcodec/x86/h264_intrapred_10bit.asm libavcodec/x86/h264_weight.asm libavcodec/x86/vc1dsp.asm libavcodec/x86/vp3dsp.asm libavcodec/x86/vp56dsp.asm libavcodec/x86/vp8dsp.asm
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: include x86inc.asm in x86util.asm  Diego Biurrun  2012-10-31
| |   This is necessary to allow refactoring some x86util macros with cpuflags.
* | Merge remote-tracking branch 'qatar/master'  Michael Niedermayer  2012-09-08
|\|   * qatar/master:
| |       mov_chan: Only set the channel_layout if setting it to a nonzero value
| |       mov_chan: Reindent an incorrectly indented line
| |       mp2 muxer: mark as AVFMT_NOTIMESTAMPS.
| |       x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64
| |       x86: more specific checks for availability of required assembly capabilities
| |       x86: avcodec: Drop silly "_mmx" suffix from dsputil template names
| |       fate: Drop redundant setting of FUZZ to 1
| |       cavsdsp: set idct permutation independently of dsputil
| |       x86: allow using add_hfyu_median_prediction_cmov on any cpu with cmov
| |   Conflicts:
| |       libavcodec/x86/dsputil_mmx.c libavformat/mp3enc.c
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64  Justin Ruggles  2012-09-07
| |   The SWAP macro does not work for explicit xmm/ymm usage, so instead just
| |   move the scalar value from xmm2 to xmm0.
* | Merge remote-tracking branch 'qatar/master'  Michael Niedermayer  2012-08-31
|\|   * qatar/master:
| |       MSS1 and MSS2: set final pixel format after common stuff has been initialised
| |       MSS2 decoder
| |       configure: handle --disable-asm before check_deps
| |       x86: Split inline and external assembly #ifdefs
| |       configure: x86: Separate inline from standalone assembler capabilities
| |       pktdumper: Use a custom define instead of PATH_MAX for buffers
| |       pktdumper: Use av_strlcpy instead of strncpy
| |       pktdumper: Use sizeof(variable) instead of the direct buffer length
| |   Conflicts:
| |       Changelog configure libavcodec/allcodecs.c libavcodec/avcodec.h libavcodec/codec_desc.c libavcodec/dct-test.c libavcodec/imgconvert.c libavcodec/mss12.c libavcodec/version.h libavfilter/x86/gradfun.c libswscale/x86/yuv2rgb.c
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: Split inline and external assembly #ifdefs  Diego Biurrun  2012-08-31
| |
* | Merge remote-tracking branch 'qatar/master'  Michael Niedermayer  2012-08-07
|\|   * qatar/master:
| |       x86: fix build with nasm 2.08
| |       x86: use nop cpu directives only if supported
| |       x86: fix rNmp macros with nasm
| |       build: add trailing / to yasm/nasm -I flags
| |       x86: use 32-bit source registers with movd instruction
| |       x86: add colons after labels
| |   Conflicts:
| |       Makefile libavutil/x86/x86inc.asm
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: add colons after labels  Mans Rullgard  2012-08-07
| |   nasm prints a warning if the colon is missing.
| |   Signed-off-by: Mans Rullgard <mans@mansr.com>
* | Merge remote-tracking branch 'qatar/master'  Michael Niedermayer  2012-07-27
|\|   * qatar/master:
| |       proresdsp: port x86 assembly to cpuflags.
| |       lavr: x86: improve non-SSE4 version of S16_TO_S32_SX macro
| |       lavfi: better channel layout negotiation
| |       alac: check for truncated packets
| |       alac: reverse lpc coeff order, simplify filter
| |       lavr: add x86-optimized mixing functions
| |       x86: add support for fmaddps fma4 instruction with abstraction to avx/sse
| |       tscc2: fix typo in array index
| |       build: use COMPILE template for HOSTOBJS
| |       build: do full flag handling for all compiler-type tools
| |       eval: fix printing of NaN in eval fate test.
| |       build: Rename aandct component to more descriptive aandcttables
| |       mpegaudio: bury inline asm under HAVE_INLINE_ASM.
| |       x86inc: automatically insert vzeroupper for YMM functions.
| |       rtmp: Check the buffer length of ping packets
| |       rtmp: Allow having more unknown data at the end of a chunk size packet without failing
| |       rtmp: Prevent reading outside of an allocate buffer when receiving server bandwidth packets
| |   Conflicts:
| |       Makefile configure libavcodec/x86/proresdsp.asm libavutil/eval.c
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86inc: automatically insert vzeroupper for YMM functions.  Ronald S. Bultje  2012-07-26
| |
* | Merge remote-tracking branch 'qatar/master'  Michael Niedermayer  2012-06-19
|\|   * qatar/master: (24 commits)
| |       flvdec: remove incomplete, disabled seeking code
| |       mem: add support for _aligned_malloc() as found on Windows
| |       lavc: Extend the documentation for avcodec_init_packet
| |       flvdec: remove incomplete, disabled seeking code
| |       http: replace atoll() with strtoll()
| |       mpegts: remove unused/incomplete/broken seeking code
| |       af_amix: allow float planar sample format as input
| |       af_amix: use AVFloatDSPContext.vector_fmac_scalar()
| |       float_dsp: add x86-optimized functions for vector_fmac_scalar()
| |       float_dsp: Move vector_fmac_scalar() from libavcodec to libavutil
| |       lavr: Add x86-optimized function for flt to s32 conversion
| |       lavr: Add x86-optimized function for flt to s16 conversion
| |       lavr: Add x86-optimized functions for s32 to flt conversion
| |       lavr: Add x86-optimized functions for s32 to s16 conversion
| |       lavr: Add x86-optimized functions for s16 to flt conversion
| |       lavr: Add x86-optimized function for s16 to s32 conversion
| |       rtpenc: Support packetizing iLBC
| |       rtpdec: Add a depacketizer for iLBC
| |       Implement the iLBC storage file format
| |       mov: Support muxing/demuxing iLBC
| |       ...
| |   Conflicts:
| |       Changelog configure libavcodec/avcodec.h libavcodec/dsputil.c libavcodec/version.h libavformat/movenc.c libavformat/mpegts.c libavformat/version.h libavutil/mem.c
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * float_dsp: add x86-optimized functions for vector_fmac_scalar()  Justin Ruggles  2012-06-18
| |
* | x86/float_dsp.asm: restore author attribution  Michael Niedermayer  2012-06-09
|/  The attribution was removed by libav while moving the code to libavutil
|   The original code is from
|   commit eb4825b5d43bb6ecfae4d64688f9e2d2ac075263
|   Author: Loren Merritt <lorenm@u.washington.edu>
|   Date:   Thu Aug 10 19:06:25 2006 +0000
|       sse and 3dnow implementations of float->int conversion and mdct windowing.
|       15% faster vorbis.
|   and
|   commit 069720565ce0f2cc94fa2474f30d155b2755e350
|   Author: Loren Merritt <lorenm@u.washington.edu>
|   Date:   Fri Aug 11 18:19:37 2006 +0000
|       vorbis simd tweaks
|   Reviewed-by: Paul B Mahol <onemda@gmail.com>
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* Add a float DSP framework to libavutil  Justin Ruggles  2012-06-08
Move vector_fmul() from DSPContext to AVFloatDSPContext.
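The root commit introduces `AVFloatDSPContext` and moves `vector_fmul()` into it. The general pattern behind a context like this is a struct of function pointers. An init routine fills each pointer with a plain C version and can override it with a SIMD version when the CPU supports one. The sketch below shows only the C path, in reduced form. `FloatDSPSketch`, `vector_fmul_c_sketch` and `float_dsp_sketch_init` are hypothetical names, not libavutil's definitions, and `vector_fmul` is assumed to compute `dst[i] = src0[i] * src1[i]`.

```c
/* Hypothetical, much-reduced stand-in for a float DSP context:
 * callers go through function pointers chosen once at init time. */
typedef struct FloatDSPSketch {
    void (*vector_fmul)(float *dst, const float *src0, const float *src1,
                        int len);
} FloatDSPSketch;

/* Plain C fallback, assumed semantics: dst[i] = src0[i] * src1[i]. */
static void vector_fmul_c_sketch(float *dst, const float *src0,
                                 const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}

void float_dsp_sketch_init(FloatDSPSketch *fdsp)
{
    fdsp->vector_fmul = vector_fmul_c_sketch;
    /* An init of this kind would typically check CPU feature flags here
     * and replace the pointer with an SSE/AVX version when available. */
}
```

Callers invoke `fdsp.vector_fmul(...)` without knowing which implementation was selected. The per-architecture assembly in this file can therefore change without touching codec code.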