summaryrefslogtreecommitdiff
path: root/libavcodec/mips
Commit message (Collapse)AuthorAge
* avcodec/mips: Refine ff_h264_h_lpf_luma_inter_msagxw2021-05-07
| | | | | | | | Using mask to avoid judgment, H264 4K decoding speed improved about 0.1fps tested on 3A4000 Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Optimize function ff_h264_loop_filter_strength_msa.gxw2021-05-07
| | | | | | | Speed of decoding H264 1080P: 5.05x ==> 5.13x Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Refine get_cabac_inline_mips.Shiyou Yin2021-05-07
| | | | | | | | | 1. Refined function get_cabac_inline_mips. 2. Optimize function get_cabac_bypass and get_cabac_bypass_sign. Speed of decoding h264: 4.89x ==> 5.05x(tested on 3A4000). Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Restore the initialization sequence of MSA and MMI in ↵Shiyou Yin2021-05-07
| | | | | | | | | | ff_h264chroma_init_mips. The MSA optimization has been refined in commit 93218c2 and ce0a52e. It is better than MMI version now. Speed of decoding H264: 4.83x ==> 4.89x (tested on 3A4000). Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* Include attributes.h directlyAndreas Rheinhardt2021-04-19
| | | | | | | | Some files currently rely on libavutil/cpu.h to include it for them; yet said file won't use include it any more after the currently deprecated functions are removed, so include attributes.h directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bumpAnton Khirnov2021-01-01
| | | | They are not properly namespaced and not intended for public use.
* lavu: move LOCAL_ALIGNED from internal.h to mem_internal.hAnton Khirnov2021-01-01
| | | | That is a more appropriate place for it.
* avcodec/mpegaudiodec: Hardcode tables to save spaceAndreas Rheinhardt2020-12-08
| | | | | | | | | | | | | | | | | | | The csa_tables (which always consist of 32 entries of four byte each, but the type depends upon whether the decoder is fixed or floating-point) are currently initialized once during decoder initialization; yet it turns out that this is actually no benefit: The code used to initialize these tables takes up 153 (fixed point) and 122 (floating point) bytes when compiled with GCC 9.3 with -O3 on x64, so it is better to just hardcode these tables. Essentially the same applies to the is_tables: They have a size of 128B each and the code to initialize them occupies 149 (fixed point) resp. 140 (floating point) bytes. So hardcode them, too. To make the origin of the tables clear, references to the code used to create them have been added. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
* avcodec/fft_template, fft_init_table: Make ff_fft_init() thread-safeAndreas Rheinhardt2020-11-24
| | | | | | | | | | | | | | | | | | | | | | | | Commit 1af615683e4a1a858407afbaa2fd686842da7e49 put initializing the ff_fft_offsets_lut (which is typically used if FFT_FIXED_32) behind an ff_thread_once() to make ff_fft_init() thread-safe; yet there is a second place where said table may be initialized which is not guarded by this AVOnce: ff_fft_init_mips(). MIPS uses this LUT even for ordinary floating point FFTs, so that ff_fft_init() is not thread-safe (on MIPS) for both 32bit fixed-point as well as floating-point FFTs; e.g. ff_mdct_init() inherits this flaw and therefore initializing e.g. the AAC decoders is not thread-safe (on MIPS) despite them having FF_CODEC_CAP_INIT_CLEANUP set. This commit fixes this by moving the AVOnce to fft_init_table.c and using it to guard all initializations of ff_fft_offsets_lut. (It is not that bad in practice, because every entry of ff_fft_offsets_lut is never read during initialization and is only once ever written to (namely to its final value); but even these are conflicting actions which are (by definition) data races and lead to undefined behaviour.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
* avcodec/mips: [loongson] Fixed mmi optimizationgxw2020-09-10
| | | | | | | | Test case fate-checkasm-h264pred failed in latest community code. This patch fixed the bug. Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Fix segfault in imdct36_mips_float.Shiyou Yin2020-07-30
| | | | | | | 'li.s' is a synthesized instruction, it does not work properly when compiled with clang on mips, and A segfault occurred. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips/cabac: Fix a bug in get_cabac_inline_mips.Shiyou Yin2020-07-30
| | | | | | | Failed fate case: fate-h264-conformance-caba2_sony_e Clang is more strict in the use of register constraint. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Fix register constraint error reported by clang.Shiyou Yin2020-07-30
| | | | | | | | | Clang report following error in aacsbr_mips.c,ac3dsp_mips.c and aacdec_mips.c: "couldn't allocate output register for constraint 'r'" Use 'f' constraint for float variable. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* libavcodec: MIPS: MMI: Move sp out of the clobber listJiaxun Yang2020-07-23
| | | | | | | | | | | | | | | GCC complains: warning: listing the stack pointer register ‘$29’ in a clobber list is deprecated [-Wdeprecated] Actually stack pointer was restored at the end of the inline assembly so there is no reason to add it to the clobber list. Also use $sp insted of $29 to make our intention much more clear. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* libavcodec: MIPS: MMI: Fix type mismatchesJiaxun Yang2020-07-23
| | | | | | | | GCC complains about them. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* libavcodec: Enable runtime detection for MIPS MMI & MSAJiaxun Yang2020-07-23
| | | | | | | | | Apply optimized functions according to cpuflags. MSA is usually put after MMI as it's generally faster than MMI. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* ffbuild: Refine MIPS handlingJiaxun Yang2020-07-23
| | | | | | | | | | | | | | | | To enable runtime detection for MIPS, we need to refine ffbuild part to support buildding these feature together. Firstly, we fixed configure, let it probe native ability of toolchain to decide wether a feature can to be enabled, also clearly marked the conflictions between loongson2 & loongson3 and Release 6 & rest. Secondly, we compile MMI and MSA C sources with their own flags to ensure their flags won't pollute the whole program and generate illegal code. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: fix type mismatch in h264dsp_msa.cShiyou Yin2020-07-19
| | | | | | gcc warning: assignment from incompatible pointer type. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: fix get_cabac_inline_mips function nameRosen Penev2020-04-12
| | | | | | | | On other platforms, the functions are named get_cabac_inline_xxx but not this one. There's also a define. Signed-off-by: Rosen Penev <rosenp@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/aacdec: fix compilation under soft float MIPSRosen Penev2020-04-11
| | | | | | | | Place HAVE_MIPSFPU further up so that functions that use floating point ASM are defined away. Otherwise compilation failures result when soft float in enabled on the toolchain. Signed-off-by: Rosen Penev <rosenp@gmail.com>
* lavc/mips: simplify the switch codeLinjie Fu2019-12-12
| | | | | Signed-off-by: Linjie Fu <linjie.fu@intel.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: msa optimizations for vc1dspgxw2019-10-30
| | | | | | | Performance of WMV3 decoding has speed up from 3.66x to 5.23x tested on 3A4000. Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Fixed four warnings in vc1dspgxw2019-10-14
| | | | | | | | | Change the stride argument to ptrdiff_t in the following functions: ff_put_no_rnd_vc1_chroma_mc8_mmi, ff_put_no_rnd_vc1_chroma_mc4_mmi, ff_avg_no_rnd_vc1_chroma_mc8_mmi, ff_avg_no_rnd_vc1_chroma_mc4_mmi. Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avutil/mips: refactor msa SLDI_Bn_0 and SLDI_Bn macros.gxw2019-09-16
| | | | | | | | | | | | Changing details as following: 1. The previous order of parameters are irregular and difficult to understand. Adjust the order of the parameters according to the rule: (RTYPE, input registers, input mask/input index/..., output registers). Most of the existing msa macros follow the rule. 2. Remove the redundant macro SLDI_Bn_0 and use SLDI_Bn instead. Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Fix a warnning of indentation not reflect the block structure.Shiyou Yin2019-09-10
| | | | | | | | The indentation of code dose not reflect the if block structure in 'apply_ltp_mips', and this will generate a warnning when build with '-Wall' or '-Wmisleading-indentation'. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avutil/mips: refine msa macros CLIP_*.gxw2019-08-13
| | | | | | | | | | | | | | | Changing details as following: 1. Remove the local variable 'out_m' in 'CLIP_SH' and store the result in source vector. 2. Refine the implementation of macro 'CLIP_SH_0_255' and 'CLIP_SW_0_255'. Performance of VP8 decoding has speed up about 1.1%(from 7.03x to 7.11x). Performance of H264 decoding has speed up about 0.5%(from 4.35x to 4.37x). Performance of Theora decoding has speed up about 0.7%(from 5.79x to 5.83x). 3. Remove redundant macro 'CLIP_SH/Wn_0_255_MAX_SATU' and use 'CLIP_SH/Wn_0_255' instead, because there are no difference in the effect of this two macros. Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avutil/mips: Avoid instruction exception caused by gssqc1/gslqc1.Shiyou Yin2019-08-02
| | | | Ensure the address accesed by gssqc1/gslqc1 are 16-byte aligned.
* avcodec/mips: [loongson] refine process of setting block as 0 in h264dsp_mmi.Shiyou Yin2019-07-28
| | | | | | | | | | | | | | | | | | | | | | In function ff_h264_add_pixels4_8_mmi, there is no need to reset '%[ftmp0]' to 0, because it's value has never changed since the start of the asm block. This patch remove the redundant 'xor' and set src to zero once it was loaded. In function ff_h264_idct_add_8_mmi, 'block' is seted to zero twice. This patch removed the first setting zero operation and move the second one after the load operation of block. In function ff_h264_idct8_add_8_mmi, 'block' is seted to zero twice too. This patch just removed the second setting zero operation. This patch mainly simplifies the implementation of functions above, the effect on the performance of whole h264 decoding process is not obvious. According to the perf data, proportion of ff_h264_idct_add_8_mmi decreased from 0.29% to 0.26% and ff_h264_idct8_add_8_mmi decreased from 0.62% to 0.59% when decoding H264 format on loongson 3A3000(For reference only , not very stable.). Reviewed-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avutil/mips: refactor msa load and store macros.Shiyou Yin2019-07-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace STnxm_UB and LDnxm_SH with new macros ST_{H/W/D}{1/2/4/8}. The old macros are difficult to use because they don't follow the same parameter passing rules. Changing details as following: 1. remove LD4x4_SH. 2. replace ST2x4_UB with ST_H4. 3. replace ST4x2_UB with ST_W2. 4. replace ST4x4_UB with ST_W4. 5. replace ST4x8_UB with ST_W8. 6. replace ST6x4_UB with ST_W2 and ST_H2. 7. replace ST8x1_UB with ST_D1. 8. replace ST8x2_UB with ST_D2. 9. replace ST8x4_UB with ST_D4. 10. replace ST8x8_UB with ST_D8. 11. replace ST12x4_UB with ST_D4 and ST_W4. Examples of new macro: ST_H4(in, idx0, idx1, idx2, idx3, pdst, stride) ST_H4 store four half-word elements in vector 'in' to pdst with stride. About the macro name: 1) 'ST' means store operation. 2) 'H/W/D' means type of vector element is 'half-word/word/double-word'. 3) Number '1/2/4/8' means how many elements will be stored. About the macro parameter: 1) 'in0, in1...' 128-bits vector. 2) 'idx0, idx1...' elements index. 3) 'pdst' destination pointer to store to 4) 'stride' stride of each store operation. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips/cabac: replace addi with addiuYunQiang Su2019-07-10
| | | | | | | | | | | | addi/daddi are deprecated by MIPS for years, and MIPS r6 remove them. They should be replace with addiu: ADDIU performs the same arithmetic operation but does not trap on overflow. Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] fix mpeg4 decoding error on loongson platform.Shiyou Yin2019-05-26
| | | | | | | In function ff_dct_unquantize_mpeg2_intra_mmi, addr0 shoudn't be changed before storage operation. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] mmi optimizations for VP9 put and avg functionsgxw2019-02-27
| | | | | | | VP9 decoding speed improved about 60.5%(from 38fps to 61fps, tested on loongson 3A3000). Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] optimize theora decoding with mmi.gxw2019-02-16
| | | | | | | | | | | | | Optimize theora decoding with mmi in functions: 1. ff_vp3_idct_add_mmi 2. ff_vp3_idct_put_mmi 3. ff_vp3_idct_dc_add_mmi 4. ff_put_no_rnd_pixels_l2_mmi Theora decoding speed improved about 32%(from 88fps to 116fps, Tested on loongson 3A3000). Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] optimize put_hevc_qpel_h_8 with mmi.Shiyou Yin2019-02-02
| | | | | | | Optimize put_hevc_qpel_h_8 with mmi in the case width=4/8/12/16/24/32/48/64. This optimization improved HEVC decoding performance 2%(2.39x to 2.44x, tested on loongson 3A3000). Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] optimize put_hevc_qpel_bi_h_8 with mmi.Shiyou Yin2019-02-02
| | | | | | | Optimize put_hevc_qpel_bi_h_8 with mmi in the case width=4/8/12/16/24/32/48/64. This optimization improved HEVC decoding performance 2.1%(2.34x to 2.39x, tested on loongson 3A3000). Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] optimize put_hevc_epel_bi_hv_8 with mmi.Shiyou Yin2019-02-02
| | | | | | | Optimize put_hevc_epel_bi_hv_8 with mmi in the case width=4/8/12/16/24/32. This optimization improved HEVC decoding performance 1.7%(2.30x to 2.34x, tested on loongson 3A3000). Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] optimize put_hevc_qpel_uni_hv_8 with mmi.Shiyou Yin2019-02-02
| | | | | | | Optimize put_hevc_qpel_uni_hv_8 with mmi in the case width=4/8/12/16/24/32/48/64. This optimization improved HEVC decoding performance 2.7%(2.24x to 2.30x, tested on loongson 3A3000). Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] optimize put_hevc_qpel_bi_hv_8 with mmi.Shiyou Yin2019-01-22
| | | | | | | Optimize put_hevc_qpel_bi_hv_8 with mmi in the case width=4/8/12/16/24/32/48/64. This optimization improved HEVC decoding performance 11.4%(2.01x to 2.24x, tested on loongson 3A3000). Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] optimize put_hevc_qpel_hv_8 with mmi.Shiyou Yin2019-01-22
| | | | | | | Optimize put_hevc_qpel_hv_8 with mmi in the case width=4/8/12/16/24/32/48/64. This optimization improved HEVC decoding performance 11%(1.81x to 2.01x, tested on loongson 3A3000). Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] optimize put_hevc_pel_bi_pixels_8 with mmi.Shiyou Yin2019-01-20
| | | | | | | Optimize put_hevc_pel_bi_pixels_8 with mmi in the case width=8/16/24/32/48/64. This optimization improved HEVC decoding performance 2%(1.77x to 1.81x, tested on loongson 3A3000). Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] optimize theora decoding in vp3dsp.gxw2018-12-27
| | | | | | | | | | | | | | Optimize theora decoding with msa in functions: 1. ff_vp3_idct_add_msa 2. ff_vp3_idct_put_msa 3. ff_vp3_idct_dc_add_msa 4. ff_vp3_v_loop_filter_msa 5. ff_vp3_h_loop_filter_msa 6. ff_put_no_rnd_pixels_l2_msa Theora decoding speed improved about 36%(from 22fps to 30fps, Tested on loongson 2K1000). Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Fix failed case: hevc-conformance-AMP_A_Samsung_* when enable msagxw2018-12-24
| | | | | | | | | The AV_INPUT_BUFFER_PADDING_SIZE has been increased to 64, but the value is still 32 in function ff_hevc_sao_edge_filter_8_msa. So, use AV_INPUT_BUFFER_PADDING_SIZE directly. Also, use MAX_PB_SIZE directly instead of 64. Fate tests passed. Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] enable MSA optimization for loongson platform.Shiyou Yin2018-12-18
| | | | | | Set initialization order of MSA after MMI to make it work on loongson platform(msa is supported by loongson2k、3a4000 etc.). Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] refine optimization in h264_chroma.Shiyou Yin2018-12-01
| | | | | | | Remove invalid operation in the case x and y all equal 0, this refine made about 2% speedup for H264 decode on loongson platform. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec: [loongson] optimize get_cabac_inline.Shiyou Yin2018-09-19
| | | | | | This optimization improved h264 decoding performance about 4%(from 74fps to 77fps, tested on loongson 3A3000). Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] refine ff_vc1_inv_trans_8x8_mmi.Shiyou Yin2018-09-19
| | | | | | | Combined 1st and 2nd loop into one inline asm in function ff_vc1_inv_trans_8x8_mmi to reduce memory operation, and made some small optimization in ff_vc1_inv_trans_4x8_mmi. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] fix bug of svq3-watermark failed in fate test.Shiyou Yin2018-09-14
| | | | | | | | | | | Failed case: svq3-watermark When minimum loop count of following functions are greater than parameter h passed to them, svq3-watermark failed. 1. ff_put_pixels4_8_mmi 2. ff_avg_pixels4_8_mmi 3. ff_put_pixels4_l2_8_mmi 4. ff_avg_pixels4_l2_8_mmi Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avutil/mips: [loongson] simplify macro TRANSPOSE_4H and TRANSPOSE_8BShiyou Yin2018-09-09
| | | | | | Simplify macro TRANSPOSE_4H in mmiutils.h and add TRANSPOSE_8B as a common macro. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] optimize vp8 decoding in vp8dsp.gxw2018-09-09
| | | | | | | | | | | | | Optimize vp8 loop filter with mmi, four functions optimized: 1. ff_vp8_h_loop_filter8uv_mmi. 2. ff_vp8_v_loop_filter8uv_mmi. 3. ff_vp8_h_loop_filter16_mmi. 4. ff_vp8_v_loop_filter16_mmi. Vp8 decoding speed improved about 50%(from 73fps to 110fps, Tested on loongson 3A3000). Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] fix improper use of register constraints.Shiyou Yin2018-09-07
| | | | | | | | Constraint "g" means compiler can store variable in memory or register. When we use constraint "g" for a variable and this variable was operated by instruction which only support register operands may lead "invalid operands" error. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>