SPRAAK has the following requirements:
tar zxvf scons-0.98.5.tar.gz
cd scons-0.98.5
python setup.py install --prefix=$SPRAAK_BASE_DIR/scons_base
For building the documentation (optional), SPRAAK also relies on:
The functionality of SPRAAK can be further extended when the link libraries and header files for the following components are available:
These additional libraries are available in most Linux distributions and in the Cygwin environment for Windows. Whether or not they should be used, and any unusual installation directory, can be specified in config.py.
SPRAAK uses 'scons' (a Python build tool) to compile and install the software.
All user configurable settings (location of the different files, compiler options, ...) are grouped in the file 'config.py' in the root directory of the SPRAAK package. See the comments in config.py and the section Package structure for more info.
The default configuration of SPRAAK allows multiple versions (32 and 64 bits versions; different operating systems and/or processors; debug, release and profile versions; ...) to be installed and maintained in parallel on a single shared file system. Hence, the default directory structure differs somewhat from what is typical in packages meant for local installation only. See Package structure for more details.
To see all possible build options, run
scons -h
Calling
scons install
will make a build using the default options (CONFIG=release, EXPORT=developer). This results in a library and executables that are intended for final release (they contain no debug, profile, ... information). At the same time, the header files and the documentation are aimed at developers, i.e. they expose everything.
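As an illustration of how a non-default build might be requested, the sketch below assembles a scons command line. It assumes that the build variables are passed as KEY=value arguments, which is how SCons handles command-line variables in general; the helper scons_install_command is hypothetical, and the list of configuration names is taken from the keys used in config.py. The output of 'scons -h' remains the authority on which options and values are accepted.

```python
# Hypothetical helper, not part of SPRAAK. Only the variable names CONFIG and
# EXPORT and their defaults (release, developer) come from the text above.
KNOWN_CONFIGS = ("release", "debug", "regression", "profile", "coverage")


def scons_install_command(config="release", export="developer"):
    """Return the argument list for an install build with the given settings."""
    if config not in KNOWN_CONFIGS:
        raise ValueError("unknown configuration: %r" % (config,))
    # Assumption: scons receives build variables as KEY=value arguments.
    return ["scons", "install", "CONFIG=%s" % config, "EXPORT=%s" % export]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(scons_install_command("debug")))
```

The resulting list can be handed to subprocess.call() from a wrapper script, or simply typed at the shell prompt.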
In order to be able to run the programs, users have to adjust their login scripts so that the PATH environment variable includes the location of the SPRAAK executables. On systems that do not support locating dynamic library files relative to the executables, the LD_LIBRARY_PATH environment variable (or equivalent) must be adjusted to include the location of the SPRAAK library as well.
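The adjustment described above can be sketched as follows. The helper spraak_environment is hypothetical; it assumes the executables and the library live in bin, bin/<arch> and lib/<arch> below the installation prefix, mirroring the bindir, bindir_arch and libdir_arch defaults in config.py. The actual locations depend on the local configuration.

```python
import os


def spraak_environment(prefix, arch, environ):
    """Return a copy of environ with the SPRAAK directories prepended.

    Hypothetical helper: the directory layout (bin, bin/<arch>, lib/<arch>)
    is an assumption based on the defaults in config.py.
    """
    env = dict(environ)
    bin_dirs = [os.path.join(prefix, "bin"), os.path.join(prefix, "bin", arch)]
    lib_dirs = [os.path.join(prefix, "lib", arch)]
    for var, extra in (("PATH", bin_dirs), ("LD_LIBRARY_PATH", lib_dirs)):
        # Keep whatever was already set, but search the SPRAAK directories first.
        old = [p for p in env.get(var, "").split(os.pathsep) if p]
        env[var] = os.pathsep.join(extra + old)
    return env


if __name__ == "__main__":
    new_env = spraak_environment("/opt/spraak", "Linux_x86_64", os.environ)
    print(new_env["PATH"])
    print(new_env["LD_LIBRARY_PATH"])
```

The same two assignments would normally be written as export statements in a login script; the Python form is used here only to keep all examples in one language.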
The examples directory contains a series of example experiments that may be used to test if the software behaves correctly. See spr_train_eval for more details on the different examples and the expected outcome.
The source files for the SPRAAK package are organized as follows:
Other files, related to but not an integrated part of SPRAAK, can be found in:
Upon building (using the default settings), the following directory structure will be created:
scons install) the desired export level <lvl> for his environment. If one wants to install multiple export levels (or make sure only a given export level can be installed), the file config.py must be adjusted! The export level <lvl> can have the following values:
The <arch> infix combines information about the operating system, CPU and configuration (debug, release, ...) into a single name.
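The rule for composing <arch> is stated in the comments of config.py: the operating system and CPU names are joined with an underscore, and the configuration is appended unless it is "release". A minimal sketch of that rule follows; the function name arch_name is hypothetical, and the example values are only illustrations.

```python
def arch_name(os_name, cpu, config="release"):
    """Compose the <arch> infix as described in the config.py comments."""
    # The release configuration gets no suffix; every other one is appended.
    suffix = "" if config == "release" else "_" + config
    return os_name + "_" + cpu + suffix


if __name__ == "__main__":
    print(arch_name("Linux", "x86_64"))
    print(arch_name("Linux", "x86_64", "debug"))
```

Because the configuration is part of the name, a debug and a release build for the same machine end up in different directories and can coexist.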
In order to facilitate package management, SPRAAK also allows for setting symbolic links from the 'export' directories to some final destination directories (e.g. /usr/bin, /usr/lib and /usr/include). This way, all files from the SPRAAK package can be kept in a single directory while users still have access to the full functionality without having to expand their PATH, LD_LIBRARY_PATH, ... Furthermore, when SPRAAK is removed, upgraded, ... the worst that can happen is some dangling links in the /usr/bin, /usr/lib, and /usr/include directories. See the comments in config.py for more info.
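The idea behind these links can be sketched as below. This is not the code used by the SPRAAK build scripts; link_or_copy is a hypothetical helper that exposes every file of one directory in another, falling back to copying when the file system has no symbolic links (the case that the symlink_support setting in config.py covers).

```python
import os
import shutil


def link_or_copy(src_dir, dst_dir, symlink_support=True):
    """Expose every regular file of src_dir in dst_dir; return the new paths.

    Hypothetical helper, written only to illustrate the mechanism.
    """
    os.makedirs(dst_dir, exist_ok=True)
    created = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.abspath(os.path.join(src_dir, name))
        dst = os.path.join(dst_dir, name)
        if not os.path.isfile(src):
            continue  # sub-directories are left alone in this sketch
        if os.path.lexists(dst):
            continue  # never overwrite something that is already there
        if symlink_support:
            os.symlink(src, dst)    # the file itself stays inside the package
        else:
            shutil.copy2(src, dst)  # no symlinks available: copy instead
        created.append(dst)
    return created
```

With links, removing the package directory leaves at most dangling entries in the destination directory, which matches the worst case described above.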
# This file defines all user configurable options.
# Since all options are defined using standard python code, platform specific
# code, complex customization and even cross-compiling should be possible.

# Some useful variables which are set when calling this code:
#  - spraak.os      operating system (linux, windows, osx, ...)
#  - spraak.cpu     cpu family (x86_64, i686, power, ...)
#  - spraak.cc      default compiler for this platform (gcc, icc, cc, ...)
#  - spraak.cc_ver  version number of the default compiler [major minor ...]
#  - spraak.config  string describing the requested configuration, e.g. "release", "debug", "regression", "profile", ...
#  - spraak.arch    string describing the target architecture+config
#                   set to spraak.os+'_'+spraak.cpu+('' if(spraak.config=="release") else '_'+spraak.config)
# Note: this configuration script is allowed to change the above
#       variables (e.g. forcing 32bits executables on a 64bit platform) but
#       incorrect/inconsistent values are likely to lead to unexpected behavior

##
## destination directories for the final install
##

# Windows is always a bit different :-)
not_windows = (spraak.os!="Cygwin")

# base installation directory
spraak.prefix = '.'
# directory for architecture independent executables (scripts)
spraak.bindir = spraak.prefix+'/bin'
# directory for architecture dependent executables
spraak.bindir_arch = (spraak.prefix+'/bin/'+spraak.arch if(not_windows) else spraak.bindir)
# directory for architecture independent libraries (scripts)
spraak.libdir = (spraak.prefix+'/lib' if(not_windows) else spraak.bindir)
# directory for architecture dependent libraries; must be identical to 'bindir_arch' on some platforms (e.g. Windows)
spraak.libdir_arch = (spraak.prefix+'/lib/'+spraak.arch if(not_windows) else spraak.bindir)
# directory for include files
spraak.includedir = spraak.prefix+'/include'
# documentation directories (two formats)
spraak.docdir_html = spraak.prefix+'/doc/html'
spraak.docdir_pdf = spraak.prefix+'/doc/pdf'
# read-only architecture-independent data
spraak.datadir = spraak.prefix+'/data'

# In order to keep all files from spraak in a single subdirectory, the build
# process allows setting symbolic links from the default system bin, lib and
# include directories to the package specific bin(_arch), lib(_arch) and
# include directories specified above.
# Set to None if not wanted.
spraak.bindir_link = None
spraak.libdir_link = None
spraak.includedir_link = None
# some file systems do not support symbolic links, and hence the files must be copied
spraak.symlink_support = not_windows

##
## build directory (the build sub-directory structure is not configurable)
##
spraak.builddir = spraak.prefix+'/build'

##
## configurable options
##

# typical number of threads (CPU cores available)
#   0 : compile single threaded (no support for multi-threading at all)
#   1 : optimize for single threaded operation but allow multiple threads
#   N : optimize for N concurrent threads
spraak.Nthreads = 2

##
## external libraries that provide additional functionality
##
# possible values:
#  - None               : functionality is not desired
#  - "auto"             : automatic scan for the library and/or headers
#  - full/path/name     : exact location (directory) of the header file(s)
#  - full/path/name/lib : exact location (file without extension and without 'lib' prefix) of the library

# python: use python functionality in the C-code such as python pre-processing
#         or lattice processing modules (NIY)
spraak.python = None
spraak.python_h = None
# zlib: handling gzipped files
spraak.zlib = "auto"
spraak.zlib_h = "auto"
# bzlib: handling bzip2-compressed files (NIY)
spraak.bzlib = None
spraak.bzlib_h = None
# flac: handling flac-compressed audio (NIY)
spraak.flac = None
spraak.flac_h = None
# gcrypt: gnu privacy guard cryptography -- strong encryption of data (NIY)
spraak.gcrypt = None
spraak.gcrypt_h = None
# arpack: handling larger sparse (or structured) eigen/singular value problems (NIY)
spraak.arpack = "auto"
spraak.arpack_h = "auto"
# readline: read lines from the terminal with editing (NIY)
spraak.readline = None
spraak.readline_h = None

##
## compiler & linker flags
##

def select_best_match(opt_dict,spraak):
  # the compiler options are typically architecture dependent and hence are best
  # specified using a dictionary having as key a string containing the following
  # fields separated with a single space:
  #   spraak.os spraak.cpu spraak.cc spraak.cc_ver[0] spraak.config
  # Don't cares are indicated with '*'.
  sel = [spraak.os,spraak.cpu,spraak.cc,str(spraak.cc_ver[0]),spraak.config];
  best_opt = None;
  best_cnt = 9;
  for opt_str,opt_val in opt_dict.iteritems():
    cnt = 6;
    for ndx,val in enumerate(opt_str.split()):
      if(val == sel[ndx]):
        cnt -= 1;
      elif(val != "*"):
        opt_val = None;
    if((opt_val!=None) and (cnt<best_cnt)):
      best_opt = opt_val;
      best_cnt = cnt;
  if(best_opt == None):
    print "ERROR No match found for %s %s %s %s %s"%(spraak.os,spraak.cpu,spraak.cc,str(spraak.cc_ver[0]),spraak.config)
    spraak.signal_error = 1;
  return(best_opt);

# path to a special version of the compiler, e.g. an experimental branch of the gcc compiler
if(os.path.isdir("/freeware/bin/gnu-tools")):
  env.Replace(CC="gcc.unsupported");
  spraak.cc_path = '/freeware/bin/gnu-tools';
  spraak.cc_prog = env['CC'];
  spraak.cc_ver = os.popen(os.path.normpath(os.path.join(spraak.cc_path,env['CC']))+" --version","r").readline().split()[-1].split(".");
  env.Replace(CCVERSION=".".join(spraak.cc_ver));

# Optimization & configuration flags
# Note: Currently -pic (or -fpie) MUST be specified when compiling programs.
#       If not, the programs will duplicate the public variables from the libraries
#       resulting in the program routines using one version of the public variables
#       and the library routines using another version.
cflags_opt = {
  "* * gcc * release"      : "-O2",
  "* * gcc * debug"        : "-O2 -g",
  "* * gcc * regression"   : "-O2 -g -DSPR_REGRESSION=1",
  "* * gcc * profile"      : "-O2 -pg",
  "* * gcc * coverage"     : "-O1 -ftest-coverage -fprofile-arcs",
  "* x86_64 gcc 3 release" : "-O3 -fno-keep-static-consts -fprefetch-loop-arrays -maccumulate-outgoing-args -ffast-math -mieee-fp -funroll-loops -fpeel-loops -funswitch-loops -finline-limit=250 --param max-unroll-times=4",
  "* x86_64 gcc * debug"   : "-g -O3 -fprefetch-loop-arrays -maccumulate-outgoing-args -ffast-math -mieee-fp -funroll-loops -fpeel-loops -funswitch-loops -finline-limit=250 --param max-unroll-times=4",
  "* x86_64 gcc * profile" : "-pg -g -O3 -fprefetch-loop-arrays -maccumulate-outgoing-args -ffast-math -mieee-fp -funroll-loops -fpeel-loops -funswitch-loops -finline-limit=250 --param max-unroll-times=4",
  "* i686 gcc * release"   : "-O3 -fno-keep-static-consts -maccumulate-outgoing-args -fstrict-aliasing -fomit-frame-pointer -ffast-math -mieee-fp -funroll-loops -fpeel-loops -funswitch-loops -finline-limit=250 --param max-unroll-times=4",
  "* i686 gcc * debug"     : "-g -O3 -maccumulate-outgoing-args -fstrict-aliasing -fomit-frame-pointer -ffast-math -mieee-fp -funroll-loops -fpeel-loops -funswitch-loops -finline-limit=250 --param max-unroll-times=4",
  "* i686 gcc * profile"   : "-pg -g -O3 -maccumulate-outgoing-args -fstrict-aliasing -ffast-math -mieee-fp -funroll-loops -fpeel-loops -funswitch-loops -finline-limit=250 --param max-unroll-times=4"
}
cflags_cfg = {
  "* * * * *"             : "",
  "* x86_64 gcc 3 *"      : "-std=gnu99 -fpic -m64 -mcmodel=small -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char -fasynchronous-unwind-tables",
  "* x86_64 gcc 4 *"      : "-std=gnu99 -fpic -march=opteron -mtune=native -m64 -mcmodel=small -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char -fasynchronous-unwind-tables",
  "* i686 gcc 3 *"        : "-std=gnu99 -fpic -march=pentium3 -mcpu=pentium4 -malign-double -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char -fasynchronous-unwind-tables",
  "* i686 gcc 4 *"        : "-std=gnu99 -fpic -march=pentium4 -mtune=core2 -malign-double -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char -fasynchronous-unwind-tables",
  "Darwin x86_64 gcc 4 *" : "-m64 -std=gnu99 -fpic -mtune=native -mcmodel=small -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char -fasynchronous-unwind-tables",
  "Darwin i686 gcc 4 *"   : "-m32 -std=gnu99 -fpic -mtune=native -malign-double -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char -fasynchronous-unwind-tables",
  "Cygwin i686 gcc 3 *"   : "-std=gnu99 -fpic -mno-cygwin -march=pentium3 -mcpu=pentium4 -malign-double -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char",
  "Cygwin i686 gcc 4 *"   : "-std=gnu99 -fpic -mno-cygwin -march=pentium3 -mtune=pentium4 -malign-double -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char"
}
cflags_shobj = {
  "* * * * *"        : "-DSPR_INCL_LVL=-1",
  "* * gcc * *"      : "-DSPR_INCL_LVL=-1 -fvisibility=hidden -fno-common",
  "Darwin * gcc * *" : "-DSPR_INCL_LVL=-1 -fno-common",
  "Cygwin * gcc * *" : "-DSPR_INCL_LVL=-1 -fno-common"
}
spraak.cflags = select_best_match(cflags_cfg,spraak)+' '+select_best_match(cflags_opt,spraak)
spraak.shcflags = select_best_match(cflags_shobj,spraak)+' '+spraak.cflags

# extra libraries (and header files) for the linker (and compiler)
ldlibs = {
  "Linux * * * *"    : "m dl rt pthread",
  "* * gcc * *"      : "m dl rt pthread",
  "Linux * gcc * *"  : "m dl rt pthread gcc_s",
  "Darwin * gcc * *" : "m dl pthread",
  "Cygwin * gcc * *" : "m rt pthreadGC2 wsock32 winmm"
}
ldpath = {
  "* * * * *"        : "",
  "Cygwin * gcc * *" : "extern/lib"
}
ldcpph = {
  "* * * * *"        : "",
  "Cygwin * gcc * *" : "extern/include"
}
spraak.ldlibs = select_best_match(ldlibs,spraak)
spraak.ldpath = select_best_match(ldpath,spraak)
spraak.ldcpph = select_best_match(ldcpph,spraak)

# other linker flags for linking programs / making the dynamic library
ldflags = {
  "Linux * gcc * *"        : "-rdynamic -Wl,-z,origin -Wl,-rpath,'$$ORIGIN/lib'",
  "Linux * gcc * profile"  : "-pg -rdynamic -Wl,-z,origin -Wl,-rpath,'$$ORIGIN/lib'",
  "Linux * gcc * coverage" : "-ftest-coverage -fprofile-arcs -rdynamic -Wl,-z,origin -Wl,-rpath,'$$ORIGIN/lib'",
  "Darwin x86_64 gcc * *"  : "-m64 -rdynamic",
  "Darwin i686 gcc * *"    : "-m32 -rdynamic",
  "* * gcc * profile"      : "-pg",
  "* * gcc * coverage"     : "-ftest-coverage -fprofile-arcs",
  "Cygwin * gcc * *"       : "-mno-cygwin -Wl,--disable-stdcall-fixup -Wl,--enable-auto-import",
  "* * * * *"              : ""
}
spraak.ldflags = select_best_match(ldflags,spraak)
ldlibflags = {
  "* * * * *"             : "",
  "Linux * gcc * *"       : "-rdynamic -Wl,-O -Wl,--enable-new-dtags -Wl,--as-needed",
  "Darwin x86_64 gcc * *" : "-m64 -dynamiclib -install_name @executable_path/lib/libspraak.dylib",
  "Darwin i686 gcc * *"   : "-m32 -dynamiclib -install_name @executable_path/lib/libspraak.dylib",
  "Cygwin * gcc * *"      : "-mno-cygwin -Wl,--out-implib=spraak.dll.a -Wl,--enable-stdcall-fixup -Wl,--export-all-symbols -Wl,--enable-auto-import"
}
spraak.ldlibflags = select_best_match(ldlibflags,spraak)

# C preprocessor flags, e.g. -I<include_dir> if you have headers in a nonstandard directory <include_dir>
cppflags = {
  "* * * * *"        : "",
  "* * gcc * *"      : "-D_linux_86_ -D_XOPEN_SOURCE=600 -D_ISOC99_SOURCE -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_REENTRANT",
  "Darwin * gcc * *" : "-D_mac_osx_ -D_XOPEN_SOURCE=600 -D_ISOC99_SOURCE -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_REENTRANT",
  "Cygwin * gcc * *" : "-D_cygwin_ -D_XOPEN_SOURCE=600 -D_ISOC99_SOURCE -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_REENTRANT"
}
spraak.cppflags = select_best_match(cppflags,spraak)

##
## bugs that escape auto-detection (see the bug-specific comments below)
##
bugs = {
  "* * * * *"      : [],
  "* x86_64 * * *" : ["x86_twin48_18"]
}
for bug in select_best_match(bugs,spraak):
  setattr(spraak,'sysdep_bug_'+bug,True);

# x86_twin48_18, x86_no_twin64
#   Most 64 bits processors with the x86 instruction set made by AMD lack
#   support for twin 64bits atomic operations (load, store, cas).
#   A work-around is using single (no twin) 64bits atomic operations and
#   subdividing the 64bits value in 48 bits for a pointer and 18 bits for a
#   transaction counter (the missing 2 bits result from a required 4byte
#   pointer alignment). This is however not 100% safe (only 18 bits for the
#   transaction counter, so overrun problems cannot be ruled out).
#   The end result is that users have the following options:
#    - if you only use Intel core2 or AMD barcelona/phenom processors, then
#      all is OK (there are no bugs).
#    - else, if you can live with the extremely small chance that a
#      multi-threaded application goes wrong (requires a heavily over-loaded
#      computer), then specify the 'x86_twin48_18' option
#    - else, specify the 'x86_no_twin64' option (will result in locked instead
#      of lock-free implementations of some core routines).

##
## other bugs & features
##
# uncomment those that are relevant to your setup
#  - sphere_shortpack_v0: used in the very first releases of wsj0, outmoded
spraak.sysdep_bug_sphere_shortpack_v0 = True;
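The matching rule implemented by select_best_match in the configuration file above may be easier to follow on a small example. The sketch below is a stand-alone Python 3 re-implementation for illustration only (the original is Python 2 code that reads its fields from the spraak object); here the five selector fields are passed in as a plain list, which is an assumption made to keep the example self-contained.

```python
def select_best_match(opt_dict, sel):
    """Return the value whose key matches sel with the fewest wildcards.

    Illustrative re-implementation. Each key holds five space-separated
    fields (os cpu cc cc_major config); '*' matches anything.
    """
    best_opt = None
    best_cnt = len(sel) + 1
    for pattern, value in opt_dict.items():
        cnt = len(sel)            # lower count means more exact field matches
        matched = True
        for want, have in zip(pattern.split(), sel):
            if want == have:
                cnt -= 1          # exact match on this field
            elif want != "*":
                matched = False   # explicit field that differs: reject the key
                break
        if matched and cnt < best_cnt:
            best_opt, best_cnt = value, cnt
    return best_opt


if __name__ == "__main__":
    opts = {
        "* * gcc * release": "-O2",
        "* * gcc * debug": "-O2 -g",
        "* x86_64 gcc * debug": "-g -O3",
    }
    # The x86_64 entry matches three fields exactly and therefore wins.
    print(select_best_match(opts, ["Linux", "x86_64", "gcc", "4", "debug"]))
    # No x86_64-specific release entry exists, so the generic gcc one is used.
    print(select_best_match(opts, ["Linux", "x86_64", "gcc", "4", "release"]))
```

When two keys match with the same number of exact fields, the first one encountered during iteration wins. Under Python 2 the iteration order of a dictionary is arbitrary, so it seems safest to avoid equally specific keys in one table.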