Introduce Cactus options for vectorisation

Introduce configuration-time options for vectorisation, including options to allow architecture-specific choices that may influence performance. Introduce "middle" masked stores for large vector sizes and small loops. Clean up and simplify some of the implementation code.

git-svn-id: https://svn.cct.lsu.edu/repos/numrel/LSUThorns/Vectors/trunk@10 105869f7-3296-0410-a4ea-f4349344b45a
author: eschnett <eschnett@105869f7-3296-0410-a4ea-f4349344b45a> 2011-06-06 10:11:44 +0000
committer: eschnett <eschnett@105869f7-3296-0410-a4ea-f4349344b45a> 2011-06-06 10:11:44 +0000
commit: 2ab4d61cd4b632c0e991c781f3c15f3b054d1bbd (patch)
tree: 6664b1e9ee360ee0abf9df6b9a5562eb5bdc88c5 /README
parent: 5d4858e0736a0c0881c65b9e9ac0983d3b5bb24b (diff)
1 files changed, 46 insertions, 1 deletions
diff --git a/README b/README
index a49408d..40a19a7 100644
--- a/README
+++ b/README
@@ -6,4 +6,49 @@ Licence      : GPL
 
 1. Purpose
 
-Provide a C++ class template that helps vectorisation.
+Provide C macro definitions and a C++ class template that help
+vectorisation.
+
+
+
+2. Build-time choices
+
+Several choices can be made via configuration options, which can be
+set to "yes" or "no":
+
+VECTORISE (default "no"): Vectorise. Otherwise, scalar code is
+generated, and the other options have no effect.
+
+
+
+VECTORISE_ALIGNED_ARRAYS (default "no", experimental): Assume that all
+arrays have an extent in the x direction that is a multiple of the
+vector size. This allows aligned load operations e.g. for finite
+differencing operators in the y and z directions. (Setting this
+produces faster code, but may lead to segfaults if the assumption is
+not true.)
+
+VECTORISE_ALWAYS_USE_UNALIGNED_LOADS (default "no", experimental):
+Replace all aligned load operations with unaligned load operations.
+This may simplify some code where alignment is unknown at compile
+time. This should never lead to better code, since the default is to
+use aligned load operations iff the alignment is known to permit this
+at build time. This option is probably useless.
+
+VECTORISE_ALWAYS_USE_ALIGNED_LOADS (default "no", experimental):
+Replace all unaligned load operations by (multiple) aligned load
+operations and corresponding vector-gather operations. This may be
+beneficial if unaligned load operations are slow, and if vector-gather
+operations are fast.
+
+VECTORISE_INLINE (default "yes"): Inline functions into the loop body
+as much as possible. (Disabling this may reduce code size, which can
+improve performance if the instruction cache is small.)
+
+VECTORISE_STREAMING_STORES (default "yes"): Use streaming stores, i.e.
+use store operations that bypass the cache. (Disabling this produces
+slower code.)
+
+VECTORISE_EMULATE_AVX (default "no", experimental): Emulate AVX
+instructions with SSE2 instructions. This produces slower code, but
+can be used to test AVX code on systems that don't support AVX.
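
The interplay between the scalar fallback and the aligned/unaligned load options described above can be sketched in C. This is only an illustration under stated assumptions, not the thorn's implementation: every name starting with DEMO_ or demo_ is hypothetical, the two DEMO_ switches merely stand in for VECTORISE and VECTORISE_ALWAYS_USE_UNALIGNED_LOADS, and SSE2 intrinsics are used as one possible vector back end. If SSE2 is unavailable or vectorisation is switched off, the same loop compiles to scalar code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the build-time options. In a real build
   these would be set by the configuration system, not in this file. */
#ifndef DEMO_VECTORISE
#  define DEMO_VECTORISE 1
#endif
#ifndef DEMO_ALWAYS_USE_UNALIGNED_LOADS
#  define DEMO_ALWAYS_USE_UNALIGNED_LOADS 0
#endif

#if DEMO_VECTORISE && defined(__SSE2__)
/* Vector path: two doubles per SSE2 register. */
#  include <emmintrin.h>
#  define DEMO_VEC_SIZE 2
typedef __m128d demo_vec;
#  if DEMO_ALWAYS_USE_UNALIGNED_LOADS
     /* Alignment is ignored: every load is an unaligned load. */
#    define demo_load_aligned(p) _mm_loadu_pd(p)
#  else
     /* Default: use an aligned load where alignment is known. */
#    define demo_load_aligned(p) _mm_load_pd(p)
#  endif
#  define demo_load_unaligned(p) _mm_loadu_pd(p)
#  define demo_store(p, v)       _mm_storeu_pd((p), (v))
#  define demo_add(x, y)         _mm_add_pd((x), (y))
#else
/* Scalar path: a "vector" is a single double. */
#  define DEMO_VEC_SIZE 1
typedef double demo_vec;
#  define demo_load_aligned(p)   (*(p))
#  define demo_load_unaligned(p) (*(p))
#  define demo_store(p, v)       (*(p) = (v))
#  define demo_add(x, y)         ((x) + (y))
#endif

/* Compute out[i] = a[i] + a[i+1] for 0 <= i < n.
   Assumptions: a holds at least n+1 elements and starts on a 16-byte
   boundary, and n is a multiple of DEMO_VEC_SIZE. The load of a[i] is
   then aligned, while the load of a[i+1] is shifted by one element
   and must be unaligned. */
static void demo_pair_sum(const double *a, double *out, size_t n)
{
    assert(n % DEMO_VEC_SIZE == 0);
    for (size_t i = 0; i < n; i += DEMO_VEC_SIZE) {
        demo_vec lo = demo_load_aligned(&a[i]);
        demo_vec hi = demo_load_unaligned(&a[i + 1]);
        demo_store(&out[i], demo_add(lo, hi));
    }
}
```

Calling demo_pair_sum on a 16-byte-aligned array holding 1, 2, 3, 4, 5 with n = 4 fills the output with 3, 5, 7, 9 on both the vector and the scalar path. Keeping the choice of load instruction behind a macro means the loop body stays the same whichever option is selected.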