author eschnett <eschnett@105869f7-3296-0410-a4ea-f4349344b45a> 2011-06-06 10:11:44 +0000
committer eschnett <eschnett@105869f7-3296-0410-a4ea-f4349344b45a> 2011-06-06 10:11:44 +0000
commit 2ab4d61cd4b632c0e991c781f3c15f3b054d1bbd
tree 6664b1e9ee360ee0abf9df6b9a5562eb5bdc88c5 /README
parent 5d4858e0736a0c0881c65b9e9ac0983d3b5bb24b
Introduce Cactus options for vectorisation
Introduce configuration-time options for vectorisation, including options to allow architecture-specific choices that may influence performance.
Introduce "middle" masked stores for large vector sizes and small loops.
Clean up and simplify some of the implementation code.

git-svn-id: https://svn.cct.lsu.edu/repos/numrel/LSUThorns/Vectors/trunk@10 105869f7-3296-0410-a4ea-f4349344b45a
Diffstat (limited to 'README')
-rw-r--r-- README 47
1 files changed, 46 insertions, 1 deletions
diff --git a/README b/README
index a49408d..40a19a7 100644
--- a/README
+++ b/README
@@ -6,4 +6,49 @@ Licence : GPL
1. Purpose
-Provide a C++ class template that helps vectorisation.
+Provide C macro definitions and a C++ class template that help
+vectorisation.
+
+
+
+2. Build-time choices
+
+Several choices can be made via configuration options, which can be
+set to "yes" or "no":
+
+VECTORISE (default "no"): Vectorise. Otherwise, scalar code is
+generated, and the other options have no effect.
+
+
+
+VECTORISE_ALIGNED_ARRAYS (default "no", experimental): Assume that all
+arrays have an extent in the x direction that is a multiple of the
+vector size. This allows aligned load operations e.g. for finite
+differencing operators in the y and z directions. (Setting this
+produces faster code, but may lead to segfaults if the assumption is
+not true.)
+
+VECTORISE_ALWAYS_USE_UNALIGNED_LOADS (default "no", experimental):
+Replace all aligned load operations with unaligned load operations.
+This may simplify some code where alignment is unknown at compile
+time. This should never lead to better code, since the default is to
+use aligned load operations iff the alignment is known to permit this
+at build time. This option is probably useless.
+
+VECTORISE_ALWAYS_USE_ALIGNED_LOADS (default "no", experimental):
+Replace all unaligned load operations with (multiple) aligned load
+operations and corresponding vector-gather operations. This may be
+beneficial if unaligned load operations are slow, and if vector-gather
+operations are fast.
+
+VECTORISE_INLINE (default "yes"): Inline functions into the loop body
+as much as possible. (Disabling this may reduce code size, which can
+improve performance if the instruction cache is small.)
+
+VECTORISE_STREAMING_STORES (default "yes"): Use streaming stores, i.e.
+use store operations that bypass the cache. (Disabling this produces
+slower code.)
+
+VECTORISE_EMULATE_AVX (default "no", experimental): Emulate AVX
+instructions with SSE2 instructions. This produces slower code, but
+can be used to test AVX code on systems that don't support AVX.
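
The alignment assumption behind VECTORISE_ALIGNED_ARRAYS can be illustrated with a small sketch. It is not taken from the thorn's source: the names VEC_SIZE, grid_index and offset_keeps_alignment are hypothetical, a vector size of 2 doubles (as with SSE2) is assumed, and the array is assumed to be stored with the x index varying fastest. The sketch only checks the index arithmetic: a finite differencing stencil that moves by whole grid points in the y or z direction lands on a vector-aligned element exactly when the resulting linear offset is a multiple of the vector size, which holds for every such move if the x extent is a multiple of the vector size.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed vector size in elements (2 doubles, as with SSE2). */
#define VEC_SIZE 2

/* Linear index of grid point (i,j,k) in an array of extent ni x nj x nk,
   stored with the x index i varying fastest (an assumption of this sketch). */
static size_t grid_index(size_t ni, size_t nj, size_t i, size_t j, size_t k)
{
  assert(ni > 0 && nj > 0);
  assert(i < ni && j < nj);
  return i + ni * (j + nj * k);
}

/* Return 1 if stepping dj points in y and dk points in z from an aligned
   element reaches another aligned element, i.e. if the linear offset is a
   multiple of VEC_SIZE; return 0 otherwise. */
static int offset_keeps_alignment(size_t ni, size_t nj, size_t dj, size_t dk)
{
  size_t offset = grid_index(ni, nj, 0, dj, dk);
  return offset % VEC_SIZE == 0;
}
```

With ni = 8, a step of one point in y gives an offset of 8 elements, so an aligned load remains valid. With ni = 7 the same step gives an offset of 7 elements, so an aligned load there would touch a misaligned address. This is the situation in which the README warns of segfaults.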