path: root/libavcodec

* Add Apple Pixlet decoder (Paul B Mahol, 2017-03-01)
    Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>

* build: Generalize yasm/nasm-related variable names (Diego Biurrun, 2017-03-01)
    None of them are specific to the YASM assembler.

* x86: hevc: Add missing colons after assembly labels (Diego Biurrun, 2017-03-01)
    This fixes several warnings of the sort:
    warning: label alone on a line without a colon might be in error

* h264_sei: Check actual presence of picture timing SEI message (Michael Niedermayer, 2017-02-28)
    Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
    Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>

* nvenc: Fix the preset mapping list (Ben Chang, 2017-02-28)
    The map is a sparse array and does not need an empty element to
    terminate it. The empty element is stored after the last one inserted
    in the list, overwriting whichever element was next with zeros.
    Bug-Id: 1029
    Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
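
As a rough illustration of the terminator-free pattern the commit describes (hypothetical struct and preset names, not the actual nvenc code): the array is sized by its element count, so no empty sentinel entry is needed and none can be written past the last element.

    #include <string.h>

    /* Hypothetical preset map entries; illustrative only, not the real
     * nvenc structures or preset names. */
    typedef struct PresetMapEntry {
        const char *name;
        int         value;
    } PresetMapEntry;

    static const PresetMapEntry preset_map[] = {
        { "default", 0 },
        { "hq",      1 },
        { "ll",      2 },
        /* No empty terminator element: writing one past the last entry
         * would overwrite whatever object follows the array with zeros. */
    };

    static int lookup_preset(const char *name)
    {
        /* Iterate by element count instead of scanning for a sentinel. */
        for (size_t i = 0; i < sizeof(preset_map) / sizeof(preset_map[0]); i++)
            if (!strcmp(preset_map[i].name, name))
                return preset_map[i].value;
        return -1;
    }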

* lavc: make sure not to return EAGAIN from codecs (Anton Khirnov, 2017-02-25)
    This error is treated specially by the API.
    CC: libav-stable@libav.org
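
A hedged sketch of the general idea, not the actual lavc change: AVERROR(EAGAIN) has reserved meaning in the send/receive API, so an internal EAGAIN is translated to a different error before it reaches the caller. The helper name and the substitute error code are assumptions.

    #include <errno.h>
    #include "libavutil/error.h"

    /* Hypothetical helper: sanitize a decoder-internal return value before
     * it is handed back through avcodec_send_packet()/avcodec_receive_frame(),
     * where AVERROR(EAGAIN) means "feed more input" / "no output yet". */
    static int sanitize_codec_error(int ret)
    {
        if (ret == AVERROR(EAGAIN))
            return AVERROR_INVALIDDATA;  /* assumed substitute; illustrative */
        return ret;
    }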

* svq3: fix the slice size check (Anton Khirnov, 2017-02-25)
    Currently it incorrectly compares bits with bytes. Also, move the
    check right before where it's relevant, so that the correct number of
    remaining bits is used.
    CC: libav-stable@libav.org
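
A minimal sketch of the unit mismatch being fixed, with illustrative names rather than the actual svq3 code: the slice size is a byte count, while the bitstream reader reports remaining bits, so the byte count has to be scaled before the comparison.

    #include "libavutil/error.h"

    /* Illustrative check, not the real svq3 one: slice_size_bytes comes
     * from the bitstream, bits_left from the bit reader. */
    static int check_slice_size(int slice_size_bytes, int bits_left)
    {
        /* Comparing bits_left against slice_size_bytes directly would mix
         * units; convert bytes to bits first. */
        if (bits_left < slice_size_bytes * 8)
            return AVERROR_INVALIDDATA;
        return 0;
    }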

* h264dec: fix dropped initial SEI recovery point (John Stebbins, 2017-02-24)

* aarch64: vp9itxfm: Reorder iadst16 coeffs (Martin Storsjö, 2017-02-24)
    This matches the order they are in the 16 bpp version. There they are
    in this order, to make sure we access them in the same order they are
    declared, easing loading only half of the coefficients at a time.
    This makes the 8 bpp version match the 16 bpp version better.
    Signed-off-by: Martin Storsjö <martin@martin.st>

* arm: vp9itxfm: Reorder iadst16 coeffs (Martin Storsjö, 2017-02-24)
    This matches the order they are in the 16 bpp version. There they are
    in this order, to make sure we access them in the same order they are
    declared, easing loading only half of the coefficients at a time.
    This makes the 8 bpp version match the 16 bpp version better.
    Signed-off-by: Martin Storsjö <martin@martin.st>

* aarch64: vp9itxfm: Reorder the idct coefficients for better pairing (Martin Storsjö, 2017-02-24)
    All elements are used pairwise, except for the first one. Previously,
    the 16th element was unused. Move the unused element to the second
    slot, to make the later element pairs not split across registers.
    This simplifies loading only parts of the coefficients, reducing the
    difference to the 16 bpp version.
    Signed-off-by: Martin Storsjö <martin@martin.st>

* arm: vp9itxfm: Reorder the idct coefficients for better pairing (Martin Storsjö, 2017-02-24)
    All elements are used pairwise, except for the first one. Previously,
    the 16th element was unused. Move the unused element to the second
    slot, to make the later element pairs not split across registers.
    This simplifies loading only parts of the coefficients, reducing the
    difference to the 16 bpp version.
    Signed-off-by: Martin Storsjö <martin@martin.st>

* aarch64: vp9itxfm: Avoid reloading the idct32 coefficients (Martin Storsjö, 2017-02-24)
    The idct32x32 function actually pushed d8-d15 onto the stack even
    though it didn't clobber them; there are plenty of registers that can
    be used to allow keeping all the idct coefficients in registers
    without having to reload different subsets of them at different
    stages in the transform.

    After this, we still can skip pushing d12-d15.

    Before: vp9_inv_dct_dct_32x32_sub32_add_neon: 8128.3
    After:  vp9_inv_dct_dct_32x32_sub32_add_neon: 8053.3

    Signed-off-by: Martin Storsjö <martin@martin.st>

* arm: vp9itxfm: Avoid reloading the idct32 coefficients (Martin Storsjö, 2017-02-24)
    The idct32x32 function actually pushed q4-q7 onto the stack even
    though it didn't clobber them; there are plenty of registers that can
    be used to allow keeping all the idct coefficients in registers
    without having to reload different subsets of them at different
    stages in the transform.

    Since the idct16 core transform avoids clobbering q4-q7 (but clobbers
    q2-q3 instead, to avoid needing to back up and restore q4-q7 at all
    in the idct16 function), and the lanewise vmul needs a register in
    the q0-q3 range, we move the stored coefficients from q2-q3 into
    q4-q5 while doing idct16.

    While keeping these coefficients in registers, we still can skip
    pushing q7.

    Before:                               Cortex A7      A8      A9     A53
    vp9_inv_dct_dct_32x32_sub32_add_neon:   18553.8 17182.7 14303.3 12089.7
    After:
    vp9_inv_dct_dct_32x32_sub32_add_neon:   18470.3 16717.7 14173.6 11860.8

    Signed-off-by: Martin Storsjö <martin@martin.st>

* arm: vp9lpf: Implement the mix2_44 function with one single filter pass (Martin Storsjö, 2017-02-24)
    For this case, with 8 inputs but only changing 4 of them, we can fit
    all 16 input pixels into a q register, and still have enough
    temporary registers for doing the loop filter.

    The wd=8 filters would require too many temporary registers for
    processing all 16 pixels at once though.

    Before:                            Cortex A7     A8     A9    A53
    vp9_loop_filter_mix2_v_44_16_neon:     289.7  256.2  237.5  181.2
    After:
    vp9_loop_filter_mix2_v_44_16_neon:     221.2  150.5  177.7  138.0

    Signed-off-by: Martin Storsjö <martin@martin.st>

* aarch64: vp9lpf: Use dup+rev16+uzp1 instead of dup+lsr+dup+trn1 (Martin Storsjö, 2017-02-24)
    This is one cycle faster in total, and three instructions fewer.

    Before: vp9_loop_filter_mix2_v_44_16_neon: 123.2
    After:  vp9_loop_filter_mix2_v_44_16_neon: 122.2

    Signed-off-by: Martin Storsjö <martin@martin.st>

* arm/aarch64: vp9lpf: Keep the comparison to E within 8 bit (Martin Storsjö, 2017-02-24)
    The theoretical maximum value of E is 193, so we can just saturate
    the addition to 255.

    Before:                       Cortex A7     A8     A9    A53  A53/AArch64
    vp9_loop_filter_v_4_8_neon:       143.0  127.7  114.8   88.0         87.7
    vp9_loop_filter_v_8_8_neon:       241.0  197.2  173.7  140.0        136.7
    vp9_loop_filter_v_16_8_neon:      497.0  419.5  379.7  293.0        275.7
    vp9_loop_filter_v_16_16_neon:     965.2  818.7  731.4  579.0        452.0
    After:
    vp9_loop_filter_v_4_8_neon:       136.0  125.7  112.6   84.0         83.0
    vp9_loop_filter_v_8_8_neon:       234.0  195.5  171.5  136.0        133.7
    vp9_loop_filter_v_16_8_neon:      490.0  417.5  377.7  289.0        271.0
    vp9_loop_filter_v_16_16_neon:     951.2  814.7  732.3  571.0        446.7

    Signed-off-by: Martin Storsjö <martin@martin.st>
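
A scalar C sketch of the trick, with illustrative names (the real code uses NEON saturating vector adds): since E is at most 193, clamping the sum at 255 cannot change the outcome of the comparison against E, so the whole check fits in 8 bits.

    #include <stdint.h>

    /* Scalar stand-in for an unsigned 8-bit saturating add. */
    static uint8_t sat_add_u8(uint8_t a, uint8_t b)
    {
        unsigned sum = (unsigned)a + b;
        return sum > 255 ? 255 : (uint8_t)sum;
    }

    /* If the true sum exceeds 255 it also exceeds E (E <= 193), and the
     * saturated value 255 still compares as greater than E, so the
     * result of the test is unchanged by the clamping. */
    static int within_edge_limit(uint8_t a, uint8_t b, uint8_t E)
    {
        return sat_add_u8(a, b) <= E;
    }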

* Place attribute_deprecated in the right position for struct declarations (Diego Biurrun, 2017-02-23)
    libavcodec/vaapi.h:58:1: warning: attribute 'deprecated' is ignored,
    place it after "struct" to apply attribute to type declaration
    [-Wignored-attributes]

* nvenc: Fix nvec vs. nvenc typo (Diego Biurrun, 2017-02-20)

* webp: Fix alpha decoding (Mark Thompson, 2017-02-18)
    This was broken by 4e528206bc4d968706401206cf54471739250ec7 - the
    webp decoder was assuming that it could set the output pixfmt of the
    vp8 decoder directly, but after that change it no longer could
    because ff_get_format() was used instead. This adds an internal
    get_format() callback to the webp use of the vp8 decoder to override
    the pixfmt appropriately.
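
A sketch of the mechanism using the public get_format() callback signature; the chosen pixel format and the way the callback is attached are assumptions for illustration, not a copy of the webp decoder's code.

    #include "libavcodec/avcodec.h"

    /* Hypothetical override: the wrapping decoder forces the pixel format
     * it needs instead of letting the inner decoder's format negotiation
     * pick one. AV_PIX_FMT_YUVA420P is an assumed choice here. */
    static enum AVPixelFormat override_get_format(AVCodecContext *avctx,
                                                  const enum AVPixelFormat *fmt)
    {
        (void)avctx;
        (void)fmt;  /* fmt is an AV_PIX_FMT_NONE-terminated list of choices */
        return AV_PIX_FMT_YUVA420P;
    }

The wrapper would then install the callback on the inner context (roughly, inner_avctx->get_format = override_get_format;) before decoding starts, so the override is picked up during format negotiation.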

* vaapi_encode: Discard output buffer if picture submission fails (Mark Thompson, 2017-02-16)
    Previously this was leaking, though it actually hit an assert making
    sure that the buffer had already been cleared when freeing the
    picture.

* libopenh264dec: Let the framework use the h264_mp4toannexb bitstream filter (Martin Storsjö, 2017-02-15)
    This avoids a lot of boilerplate code within the decoder wrapper
    itself.
    Signed-off-by: Martin Storsjö <martin@martin.st>

* vaapi: Implement device-only setup (Mark Thompson, 2017-02-13)
    In this case, the user only supplies a device and the frame context
    is allocated internally by lavc.

* lavc: Add device context field to AVCodecContext (Mark Thompson, 2017-02-13)
    For use by codec implementations which can allocate frames internally.

* aarch64: vp9lpf: Fix broken indentation/vertical alignment (Martin Storsjö, 2017-02-12)
    Signed-off-by: Martin Storsjö <martin@martin.st>

* aarch64: vp9lpf: Interleave the start of flat8in into the calculation above (Martin Storsjö, 2017-02-11)
    This adds lots of extra .ifs, but speeds it up by a couple cycles,
    by avoiding stalls.
    Signed-off-by: Martin Storsjö <martin@martin.st>

* arm: vp9lpf: Interleave the start of flat8in into the calculation above (Martin Storsjö, 2017-02-11)
    This adds lots of extra .ifs, but speeds it up by a couple cycles,
    by avoiding stalls.
    Signed-off-by: Martin Storsjö <martin@martin.st>

* dv: Convert to the new bitstream reader (Luca Barbato, 2017-02-11)

* aac: Validate the sbr sample rate before using the value (Luca Barbato, 2017-02-11)
    Avoid a floating point exception.
    Bug-Id: 1027
    CC: libav-stable@libav.org
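
A hedged sketch of the kind of guard this implies; the function and parameter names are illustrative, not the SBR code: an unvalidated sample rate of zero used as a divisor raises a floating point exception (SIGFPE), so it is rejected first.

    #include "libavutil/error.h"

    /* Illustrative guard: reject a zero or negative rate from the
     * bitstream before it is used as a divisor. */
    static int compute_rate_ratio(int sbr_sample_rate, int output_rate)
    {
        if (sbr_sample_rate <= 0)
            return AVERROR_INVALIDDATA;
        return output_rate / sbr_sample_rate;
    }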

* lavc: use av_cpu_max_align() instead of hardcoding alignment requirements (Anton Khirnov, 2017-02-11)

* arm: vp9lpf: Use orrs instead of orr+cmp (Martin Storsjö, 2017-02-11)
    Signed-off-by: Martin Storsjö <martin@martin.st>

* arm/aarch64: vp9lpf: Calculate !hev directly (Martin Storsjö, 2017-02-11)
    Previously we first calculated hev, and then negated it. Since we
    were able to schedule the negation in the middle of another
    calculation, we don't see any gain in all cases.

    Before:                       Cortex A7     A8     A9    A53  A53/AArch64
    vp9_loop_filter_v_4_8_neon:       147.0  129.0  115.8   89.0         88.7
    vp9_loop_filter_v_8_8_neon:       242.0  198.5  174.7  140.0        136.7
    vp9_loop_filter_v_16_8_neon:      500.0  419.5  382.7  293.0        275.7
    vp9_loop_filter_v_16_16_neon:     971.2  825.5  731.5  579.0        453.0
    After:
    vp9_loop_filter_v_4_8_neon:       143.0  127.7  114.8   88.0         87.7
    vp9_loop_filter_v_8_8_neon:       241.0  197.2  173.7  140.0        136.7
    vp9_loop_filter_v_16_8_neon:      497.0  419.5  379.7  293.0        275.7
    vp9_loop_filter_v_16_16_neon:     965.2  818.7  731.4  579.0        452.0

    Signed-off-by: Martin Storsjö <martin@martin.st>

* aarch64: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling (Martin Storsjö, 2017-02-11)
    This work is sponsored by, and copyright, Google.

    Before:                              Cortex A53
    vp9_inv_dct_dct_16x16_sub1_add_neon:      235.3
    vp9_inv_dct_dct_32x32_sub1_add_neon:      555.1
    After:
    vp9_inv_dct_dct_16x16_sub1_add_neon:      180.2
    vp9_inv_dct_dct_32x32_sub1_add_neon:      475.3

    Signed-off-by: Martin Storsjö <martin@martin.st>

* arm: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling (Martin Storsjö, 2017-02-11)
    This work is sponsored by, and copyright, Google.

    Before:                              Cortex A7     A8     A9    A53
    vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0  189.5  211.7  235.8
    vp9_inv_dct_dct_32x32_sub1_add_neon:     752.0  459.2  862.2  553.9
    After:
    vp9_inv_dct_dct_16x16_sub1_add_neon:     226.5  145.0  225.1  171.8
    vp9_inv_dct_dct_32x32_sub1_add_neon:     721.2  415.7  727.6  475.0

    Signed-off-by: Martin Storsjö <martin@martin.st>

* aarch64: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter (Martin Storsjö, 2017-02-11)
    No measured speedup on a Cortex A53, but other cores might benefit.
    Signed-off-by: Martin Storsjö <martin@martin.st>

* arm: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter (Martin Storsjö, 2017-02-11)
    Before:                       Cortex A7     A8     A9    A53
    vp9_put_8tap_smooth_4h_neon:      378.1  273.2  340.7  229.5
    After:
    vp9_put_8tap_smooth_4h_neon:      352.1  222.2  290.5  229.5

    Signed-off-by: Martin Storsjö <martin@martin.st>

* aarch64: vp9mc: Simplify the extmla macro parameters (Martin Storsjö, 2017-02-11)
    Fold the field lengths into the macro. This makes the macro
    invocations much more readable, when the lines are shorter. This also
    makes it easier to use only half the registers within the macro.
    Signed-off-by: Martin Storsjö <martin@martin.st>

* utvideodec: Add a missing include (Martin Storsjö, 2017-02-10)
    This was missing from 77c23704c76, fixing the build.
    Signed-off-by: Martin Storsjö <martin@martin.st>

* nvenc: make gpu indices independent of supported capabilities (Timo Rothenpieler, 2017-02-09)
    Do not allocate a CUDA context for every available gpu.
    Signed-off-by: Luca Barbato <lu_zero@gentoo.org>

* avcodec: Mark some codecs with threadsafe init as such (Derek Buitenhuis, 2017-02-09)
    Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
    Signed-off-by: Luca Barbato <lu_zero@gentoo.org>

* aarch64: vp9itxfm: Fix incorrect vertical alignment (Martin Storsjö, 2017-02-09)
    Signed-off-by: Martin Storsjö <martin@martin.st>

* aarch64: vp9itxfm: Update a comment to refer to a register with a different name (Martin Storsjö, 2017-02-09)
    Signed-off-by: Martin Storsjö <martin@martin.st>

* aarch64: vp9itxfm: Use the right lane sizes in 8x8 for improved readability (Martin Storsjö, 2017-02-09)
    Signed-off-by: Martin Storsjö <martin@martin.st>

* aarch64: vp9itxfm: Use a single lane ld1 instead of ld1r where possible (Martin Storsjö, 2017-02-09)
    The ld1r is a leftover from the arm version, where this trick is
    beneficial on some cores. Use a single-lane load where we don't need
    the semantics of ld1r.
    Signed-off-by: Martin Storsjö <martin@martin.st>

* aarch64: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function (Martin Storsjö, 2017-02-09)
    Signed-off-by: Martin Storsjö <martin@martin.st>

* arm: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function (Martin Storsjö, 2017-02-09)
    Signed-off-by: Martin Storsjö <martin@martin.st>

* aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 (Martin Storsjö, 2017-02-09)
    This work is sponsored by, and copyright, Google.

    This avoids loading and calculating coefficients that we know will be
    zero, and avoids filling the temp buffer with zeros in places where
    we know the second pass won't read.

    This gives a pretty substantial speedup for the smaller
    subpartitions.

    The code size increases from 14740 bytes to 24292 bytes.

    The idct16/32_end macros are moved above the individual functions;
    the instructions themselves are unchanged, but since new functions
    are added at the same place where the code is moved from, the diff
    looks rather messy.

    Before:
    vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
    vp9_inv_dct_dct_16x16_sub2_add_neon:    1051.0
    vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
    vp9_inv_dct_dct_16x16_sub8_add_neon:    1051.0
    vp9_inv_dct_dct_16x16_sub12_add_neon:   1387.4
    vp9_inv_dct_dct_16x16_sub16_add_neon:   1387.6
    vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
    vp9_inv_dct_dct_32x32_sub2_add_neon:    5198.5
    vp9_inv_dct_dct_32x32_sub4_add_neon:    5198.6
    vp9_inv_dct_dct_32x32_sub8_add_neon:    5196.3
    vp9_inv_dct_dct_32x32_sub12_add_neon:   6183.4
    vp9_inv_dct_dct_32x32_sub16_add_neon:   6174.3
    vp9_inv_dct_dct_32x32_sub20_add_neon:   7151.4
    vp9_inv_dct_dct_32x32_sub24_add_neon:   7145.3
    vp9_inv_dct_dct_32x32_sub28_add_neon:   8119.3
    vp9_inv_dct_dct_32x32_sub32_add_neon:   8118.7

    After:
    vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
    vp9_inv_dct_dct_16x16_sub2_add_neon:     640.8
    vp9_inv_dct_dct_16x16_sub4_add_neon:     639.0
    vp9_inv_dct_dct_16x16_sub8_add_neon:     842.0
    vp9_inv_dct_dct_16x16_sub12_add_neon:   1388.3
    vp9_inv_dct_dct_16x16_sub16_add_neon:   1389.3
    vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
    vp9_inv_dct_dct_32x32_sub2_add_neon:    3685.5
    vp9_inv_dct_dct_32x32_sub4_add_neon:    3685.1
    vp9_inv_dct_dct_32x32_sub8_add_neon:    3684.4
    vp9_inv_dct_dct_32x32_sub12_add_neon:   5312.2
    vp9_inv_dct_dct_32x32_sub16_add_neon:   5315.4
    vp9_inv_dct_dct_32x32_sub20_add_neon:   7154.9
    vp9_inv_dct_dct_32x32_sub24_add_neon:   7154.5
    vp9_inv_dct_dct_32x32_sub28_add_neon:   8126.6
    vp9_inv_dct_dct_32x32_sub32_add_neon:   8127.2

    Signed-off-by: Martin Storsjö <martin@martin.st>

* arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible (Martin Storsjö, 2017-02-09)
    This work is sponsored by, and copyright, Google.

    This avoids loading and calculating coefficients that we know will be
    zero, and avoids filling the temp buffer with zeros in places where
    we know the second pass won't read.

    This gives a pretty substantial speedup for the smaller
    subpartitions.

    The code size increases from 12388 bytes to 19784 bytes.

    The idct16/32_end macros are moved above the individual functions;
    the instructions themselves are unchanged, but since new functions
    are added at the same place where the code is moved from, the diff
    looks rather messy.

    Before:                                Cortex A7      A8      A9     A53
    vp9_inv_dct_dct_16x16_sub1_add_neon:       273.0   189.5   212.0   235.8
    vp9_inv_dct_dct_16x16_sub2_add_neon:      2102.1  1521.7  1736.2  1265.8
    vp9_inv_dct_dct_16x16_sub4_add_neon:      2104.5  1533.0  1736.6  1265.5
    vp9_inv_dct_dct_16x16_sub8_add_neon:      2484.8  1828.7  2014.4  1506.5
    vp9_inv_dct_dct_16x16_sub12_add_neon:     2851.2  2117.8  2294.8  1753.2
    vp9_inv_dct_dct_16x16_sub16_add_neon:     3239.4  2408.3  2543.5  1994.9
    vp9_inv_dct_dct_32x32_sub1_add_neon:       758.3   456.7   864.5   553.9
    vp9_inv_dct_dct_32x32_sub2_add_neon:     10776.7  7949.8  8567.7  6819.7
    vp9_inv_dct_dct_32x32_sub4_add_neon:     10865.6  8131.5  8589.6  6816.3
    vp9_inv_dct_dct_32x32_sub8_add_neon:     12053.9  9271.3  9387.7  7564.0
    vp9_inv_dct_dct_32x32_sub12_add_neon:    13328.3 10463.2 10217.0  8321.3
    vp9_inv_dct_dct_32x32_sub16_add_neon:    14176.4 11509.5 11018.7  9062.3
    vp9_inv_dct_dct_32x32_sub20_add_neon:    15301.5 12999.9 11855.1  9828.2
    vp9_inv_dct_dct_32x32_sub24_add_neon:    16482.7 14931.5 12650.1 10575.0
    vp9_inv_dct_dct_32x32_sub28_add_neon:    17589.5 15811.9 13482.8 11333.4
    vp9_inv_dct_dct_32x32_sub32_add_neon:    18696.2 17049.2 14355.6 12089.7

    After:
    vp9_inv_dct_dct_16x16_sub1_add_neon:       273.0   189.5   211.7   235.8
    vp9_inv_dct_dct_16x16_sub2_add_neon:      1203.5   998.2  1035.3   763.0
    vp9_inv_dct_dct_16x16_sub4_add_neon:      1203.5   998.1  1035.5   760.8
    vp9_inv_dct_dct_16x16_sub8_add_neon:      1926.1  1610.6  1722.1  1271.7
    vp9_inv_dct_dct_16x16_sub12_add_neon:     2873.2  2129.7  2285.1  1757.3
    vp9_inv_dct_dct_16x16_sub16_add_neon:     3221.4  2520.3  2557.6  2002.1
    vp9_inv_dct_dct_32x32_sub1_add_neon:       753.0   457.5   866.6   554.6
    vp9_inv_dct_dct_32x32_sub2_add_neon:      7554.6  5652.4  6048.4  4920.2
    vp9_inv_dct_dct_32x32_sub4_add_neon:      7549.9  5685.0  6046.9  4925.7
    vp9_inv_dct_dct_32x32_sub8_add_neon:      8336.9  6704.5  6604.0  5478.0
    vp9_inv_dct_dct_32x32_sub12_add_neon:    10914.0  9777.2  9240.4  7416.9
    vp9_inv_dct_dct_32x32_sub16_add_neon:    11859.2 11223.3  9966.3  8095.1
    vp9_inv_dct_dct_32x32_sub20_add_neon:    15237.1 13029.4 11838.3  9829.4
    vp9_inv_dct_dct_32x32_sub24_add_neon:    16293.2 14379.8 12644.9 10572.0
    vp9_inv_dct_dct_32x32_sub28_add_neon:    17424.3 15734.7 13473.0 11326.9
    vp9_inv_dct_dct_32x32_sub32_add_neon:    18531.3 17457.0 14298.6 12080.0

    Signed-off-by: Martin Storsjö <martin@martin.st>

* aarch64: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function (Martin Storsjö, 2017-02-09)
    This allows reusing the macro for a separate implementation of the
    pass2 function.
    Signed-off-by: Martin Storsjö <martin@martin.st>

* arm: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function (Martin Storsjö, 2017-02-09)
    This allows reusing the macro for a separate implementation of the
    pass2 function.
    Signed-off-by: Martin Storsjö <martin@martin.st>