From 2ab4d61cd4b632c0e991c781f3c15f3b054d1bbd Mon Sep 17 00:00:00 2001
From: eschnett
Date: Mon, 6 Jun 2011 10:11:44 +0000
Subject: Introduce Cactus options for vectorisation

Introduce configuration-time options for vectorisation, including
options to allow architecture-specific choices that may influence
performance.

Introduce "middle" masked stores for large vector sizes and small
loops.

Clean up and simplify some of the implementation code.

git-svn-id: https://svn.cct.lsu.edu/repos/numrel/LSUThorns/Vectors/trunk@10 105869f7-3296-0410-a4ea-f4349344b45a
---
 README | 47 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/README b/README
index a49408d..40a19a7 100644
--- a/README
+++ b/README
@@ -6,4 +6,49 @@ Licence : GPL
 
 1. Purpose
 
-Provide a C++ class template that helps vectorisation.
+Provide C macro definitions and a C++ class template that help
+vectorisation.
+
+
+
+2. Build-time choices
+
+Several choices can be made via configuration options, which can be
+set to "yes" or "no":
+
+VECTORISE (default "no"): Vectorise. Otherwise, scalar code is
+generated, and the other options have no effect.
+
+
+
+VECTORISE_ALIGNED_ARRAYS (default "no", experimental): Assume that all
+arrays have an extent in the x direction that is a multiple of the
+vector size. This allows aligned load operations e.g. for finite
+differencing operators in the y and z directions. (Setting this
+produces faster code, but may lead to segfaults if the assumption is
+not true.)
+
+VECTORISE_ALWAYS_USE_UNALIGNED_LOADS (default "no", experimental):
+Replace all aligned load operations with unaligned load operations.
+This may simplify some code where alignment is unknown at compile
+time. This should never lead to better code, since the default is to
+use aligned load operations iff the alignment is known to permit this
+at build time. This option is probably useless.
+
+VECTORISE_ALWAYS_USE_ALIGNED_LOADS (default "no", experimental):
+Replace all unaligned load operations by (multiple) aligned load
+operations and corresponding vector-gather operations. This may be
+beneficial if unaligned load operations are slow, and if vector-gather
+operations are fast.
+
+VECTORISE_INLINE (default "yes"): Inline functions into the loop body
+as much as possible. (Disabling this may reduce code size, which can
+improve performance if the instruction cache is small.)
+
+VECTORISE_STREAMING_STORES (default "yes"): Use streaming stores, i.e.
+use store operations that bypass the cache. (Disabling this produces
+slower code.)
+
+VECTORISE_EMULATE_AVX (default "no", experimental): Emulate AVX
+instructions with SSE2 instructions. This produces slower code, but
+can be used to test AVX code on systems that don't support AVX.
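The VECTORISE_ALIGNED_ARRAYS entry relies on a short argument. With the x index varying fastest, moving one point in y shifts the linear index by nx, and one point in z by nx*ny. So if nx is a multiple of the vector size, a y or z neighbour of an aligned vector also starts on an aligned boundary, and aligned loads are safe for finite differencing in those directions. The C sketch below checks this for one stencil point in each direction. It is illustrative only: VECTOR_SIZE, linear_index, is_aligned_index and yz_neighbours_aligned are hypothetical names, not part of the Vectors thorn, and the sketch assumes the array base address is itself vector-aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vector size in elements (e.g. 4 doubles per AVX register). */
enum { VECTOR_SIZE = 4 };

/* Linear index of grid point (i, j, k) in an array with x varying fastest. */
static size_t linear_index(size_t i, size_t j, size_t k, size_t nx, size_t ny)
{
    assert(i < nx && j < ny);
    return i + nx * (j + ny * k);
}

/* A vector load starting at element n is aligned iff n is a multiple of
   VECTOR_SIZE, assuming (hypothetically) that the array base is aligned. */
static bool is_aligned_index(size_t n)
{
    return n % VECTOR_SIZE == 0;
}

/* For every aligned start i in the x direction, check that the centre
   point (i, 1, 1) and its +y and +z neighbours all start on an aligned
   element. Requires ny >= 3 so that j = 2 is a valid index. */
static bool yz_neighbours_aligned(size_t nx, size_t ny)
{
    for (size_t i = 0; i < nx; i += VECTOR_SIZE) {
        size_t centre = linear_index(i, 1, 1, nx, ny);
        size_t y_plus = linear_index(i, 2, 1, nx, ny);
        size_t z_plus = linear_index(i, 1, 2, nx, ny);
        if (!is_aligned_index(centre) || !is_aligned_index(y_plus) ||
            !is_aligned_index(z_plus))
            return false;
    }
    return true;
}
```

For example, yz_neighbours_aligned(16, 10) returns true, while yz_neighbours_aligned(10, 10) returns false because 10 is not a multiple of 4. The nx = 10 case is exactly the situation in which the README warns that setting VECTORISE_ALIGNED_ARRAYS may lead to segfaults.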