Installation Guide

Requirements

SPRAAK has the following requirements:

A C99 compliant compiler
Note:
None of the current compilers (gcc, visual-C, ...) are 100% C99 compliant. However, since SPRAAK avoids the more 'difficult' C99 constructs, this is not a problem.
SPRAAK works best in combination with gcc (even on Windows; see the Cygwin notes under Python below for how to install gcc on a Windows system).
Python (version 2.5 or higher)
  • For Windows, we advise the Cygwin environment. This environment contains Python, gcc and a fair amount of other libraries and utilities that are useful to SPRAAK (see the optional list below). The easiest way to install Cygwin is by following the 'Install Cygwin now' quick link located at the top right of the http://www.cygwin.com/ page.
    If you only want Python and none of the additional functionality provided by Cygwin, the 'Windows installer' quick link on the http://www.python.org/ page can be used.
  • For Linux systems, we advise using the Python package that comes with the distribution (provided it is at least Python 2.5).
  • Alternatively, one can download the Python binaries and/or source files from http://www.python.org/
Scons (version 0.98 or higher)
Scons can be downloaded from http://www.scons.org. Choose the appropriate download type in the right column. If you choose the tarball, you can proceed as follows:
tar zxvf scons-0.98.5.tar.gz
cd scons-0.98.5
Read the README file. It explains the installation procedure and prerequisites.
As a quick start, you can type
python setup.py install --prefix=$SPRAAK_BASE_DIR/scons_base
to install scons in the directory $SPRAAK_BASE_DIR/scons_base.
Posix thread library
The Posix thread library is available by default on all Unix systems. However, building on Windows with the Cygwin package requires an extra library for multi-threading support. Without this library it is not possible to create executables that are independent of the Cygwin DLLs.
To download the latest version of the library, visit http://sourceware.org/pthreads-win32/ . If you only need the binary files and the include files, you can get them from ftp://sourceware.org/pub/pthreads-win32/pthreads-w32-2-8-0-release.exe .

For building the documentation (optional), SPRAAK also relies on:

Doxygen (1.5.x or higher)
Doxygen is used to create the documentation. For Linux systems, we advise using the Doxygen package that comes with the distribution. Doxygen is available as a Cygwin package for those who are building the package on Windows. Alternatively, one can download Doxygen directly from http://www.doxygen.org
Graphviz (dot)
Doxygen can use 'dot' to draw dependency graphs. If 'dot' cannot be found, the dependency graphs are omitted from the documentation. For Linux systems, we advise using the Graphviz package that comes with the distribution. Alternatively, one can download Graphviz from http://www.graphviz.org/ . At the time of writing, no Cygwin package for Graphviz was available. It should be possible to build the Graphviz tools manually from source in Cygwin, but your mileage may vary.
pdflatex
Pdflatex can be used for creating the manuals in pdf format.

The functionality of SPRAAK can be further extended when the link libraries and header files for the following components are available:

python
When not only the Python executable but also the Python header files and library are installed, Python functionality can be made available from within the SPRAAK system (e.g. a Python signal pre-processing block, a Python lattice post-processing block, ...).
zlib
This library allows reading and writing gzip-compressed files.
libbz2
This library allows reading and writing bzip2-compressed files.
gcrypt
This library allows reading and writing encrypted files.

These additional libraries are available in most Linux distributions and in the Cygwin environment for Windows. Whether or not they are used, as well as any non-standard installation directory, can be specified in config.py.
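In config.py each optional library is controlled by a pair of settings (e.g. spraak.zlib and spraak.zlib_h) that take None, "auto", or an explicit full path; for the library, the path names the file without extension and without the 'lib' prefix. The sketch below shows how such a pair could map onto standard gcc -I/-L/-l flags. The helper lib_setting_to_flags and the example paths are hypothetical, not part of SPRAAK; the real translation happens in the SPRAAK build files.

```python
import os


def lib_setting_to_flags(lib, lib_h):
    """Translate a (library, header) setting pair into illustrative gcc flags.

    Hypothetical helper following the value conventions documented in config.py:
    None (not desired), "auto" (let the build scan for it), or a full path.
    Returns None when the functionality is disabled.
    """
    if lib is None:
        return None  # functionality not desired
    flags = []
    if lib_h not in (None, "auto"):
        # explicit directory holding the header file(s)
        flags.append("-I" + lib_h)
    if lib == "auto":
        return flags  # leave the library search to the build system
    # explicit library: directory + name without 'lib' prefix and extension
    libdir, name = os.path.split(lib)
    if libdir:
        flags.append("-L" + libdir)
    flags.append("-l" + name)
    return flags
```

For example, zlib installed under a hypothetical /opt/zlib would be described as spraak.zlib = "/opt/zlib/lib/z" and spraak.zlib_h = "/opt/zlib/include", which this sketch turns into -I/opt/zlib/include -L/opt/zlib/lib -lz.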

Configuring & installing

SPRAAK uses 'scons' (a Python build tool) to compile and install the software.

All user configurable settings (location of the different files, compiler options, ...) are grouped in the file 'config.py' in the root directory of the SPRAAK package. See the comments in config.py and the section Package structure for more info.
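Most platform-dependent settings in config.py (compiler flags, libraries, linker flags, known bugs) are stored in dictionaries whose keys contain five space-separated fields (os cpu cc cc_major config, '*' meaning don't care), and the select_best_match() function in config.py picks the entry that applies. The following is a Python 3 restatement of that rule written for this guide: a key is rejected when a concrete field disagrees with the actual value, and among the remaining keys the one with the most concrete fields wins. The dictionary is an abbreviated copy of cflags_opt (the real x86_64 entry carries many more flags).

```python
def select_best_match(opt_dict, sel):
    """Return the value of the most specific key in opt_dict that matches sel.

    Each key holds five space-separated fields (os cpu cc cc_major config),
    '*' meaning "don't care"; sel holds the actual values in the same order.
    Returns None when no key matches.
    """
    best_val, best_exact = None, -1
    for key, val in opt_dict.items():
        fields = key.split()
        # a concrete field that differs from the actual value disqualifies the key
        if any(f != "*" and f != s for f, s in zip(fields, sel)):
            continue
        # the more concrete (non-'*') fields, the more specific the key
        exact = sum(1 for f in fields if f != "*")
        if exact > best_exact:
            best_val, best_exact = val, exact
    return best_val


# Abbreviated copy of cflags_opt from config.py
cflags_opt = {
    "* * gcc * release": "-O2",
    "* * gcc * debug": "-O2 -g",
    "* x86_64 gcc 3 release": "-O3",
}

print(select_best_match(cflags_opt, ["Linux", "x86_64", "gcc", "4", "release"]))  # -O2
print(select_best_match(cflags_opt, ["Linux", "x86_64", "gcc", "3", "release"]))  # -O3
```

A consequence worth knowing: with the table as shipped, a gcc 4 release build on x86_64 falls back to the generic "* * gcc * release" entry (-O2), because the aggressive x86_64 release entry is restricted to gcc 3.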

The default configuration of SPRAAK allows multiple versions (32- and 64-bit versions; different operating systems and/or processors; debug, release and profile versions; ...) to be installed and maintained in parallel on a single shared file system. Hence, the default directory structure differs somewhat from what is typical for packages meant for local installation only. See Package structure for more details.

To see all possible build options, run

scons -h

Calling

scons install

will make a build using the default options (CONFIG=release, EXPORT=developer). This results in a library and executables aimed at final release (they contain no debug, profile, ... information). At the same time, the header files and the documentation are aimed at developers, i.e. they expose everything.

In order to run the programs, users have to adjust their login scripts so that the PATH environment variable includes the location of the SPRAAK executables. On systems that do not support locating dynamic library files relative to the executables, the LD_LIBRARY_PATH environment variable (or equivalent) must also be adjusted to include the location of the SPRAAK library.
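As a rough illustration, the sketch below computes the directories to add, assuming the default ./export layout described under Package structure (scripts in export/bin, executables in export/bin/<arch>, library in export/lib/<arch>). The function spraak_env, the export path and the arch value are hypothetical; adapt them to your installation and to your shell's login script syntax.

```python
import os


def spraak_env(export_dir, arch, environ):
    """Return updated PATH and LD_LIBRARY_PATH values for a SPRAAK export tree.

    Hypothetical helper assuming the default export layout; existing entries
    from 'environ' are kept after the SPRAAK directories.
    """
    bindir = os.path.join(export_dir, "bin")          # Python scripts
    bindir_arch = os.path.join(bindir, arch)          # executables
    libdir_arch = os.path.join(export_dir, "lib", arch)  # shared library

    def prepend(dirs, old):
        return os.pathsep.join(dirs + ([old] if old else []))

    return {
        "PATH": prepend([bindir, bindir_arch], environ.get("PATH", "")),
        "LD_LIBRARY_PATH": prepend([libdir_arch], environ.get("LD_LIBRARY_PATH", "")),
    }


env = spraak_env("/opt/spraak/export", "Linux_x86_64", os.environ)
for name, value in env.items():
    print('export %s="%s"' % (name, value))  # lines for a bash login script
```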

Testing the software

The examples directory contains a series of example experiments that may be used to test if the software behaves correctly. See spr_train_eval for more details on the different examples and the expected outcome.

Package structure

The source files for the SPRAAK package are organized as follows:

./src/lib/
The C-code that will be compiled to form the SPRAAK library. The documentation and some additional information for constructing header files are embedded in the C-code.
./src/prog/
The C-code that will be compiled to form the SPRAAK executables. The documentation is embedded in the C-code.
./src/doc/
Free-standing documentation (i.e. documentation that has no 1-to-1 link to any of the files in ./src/{lib,prog}).
./scripts/spr_parser.py
The parser that extracts header files and documentation from the C source files and creates the object-oriented glue code for each class.
./scripts/spr_*
High-level Python scripts in the SPRAAK framework.
./scripts/spraak/
A library of Python code, used by the SPRAAK scripts.
./SConstruct
The master build-file (using scons).
./config.py
Build-file containing all user configurable options.
./sysdep.py
Build-file containing the code to detect system dependent features, bugs, libraries, ...

Other files, related to but not an integrated part of SPRAAK, can be found in:

./src_ext/
Extra C-code for interfacing SPRAAK (library and/or file IO) with other environments such as Matlab.
./scripts/alien/
High level scripts that have been found to be useful in combination with SPRAAK

Upon building (using the default settings), the following directory structure will be created:

./build/
All items derived by the build process: header files, documentation in intermediate form, object files, libraries, ...
./build/include/<lvl>/spraak/
The include files with all items at export level <lvl> or higher.
./build/include/<lvl>/spraak_classdef/
The class-specific definitions derived by the parser, with all items at export level <lvl> or higher.
./build/include/<arch>/spraak/
Architecture dependent include files.
./build/parse_tmp/
All information extracted by the parser (spr_parser.py) stored in raw format.
./build/doc_tmp/<lvl>/
The documentation in Doxygen format as extracted by the parser; contains only the items at export level <lvl> or higher.
./build/doc/<lvl>/html/
The documentation in html format as derived by Doxygen.
./build/doc/<lvl>/latex/
The documentation in LaTeX format (to be processed with pdflatex) as derived by Doxygen.
./build/<arch>/obj/
Object files created during the build process.
./build/<arch>/lib/
The library file.
./build/<arch>/bin/
The executables.
./export
All installed items (scripts, executables, libraries, header files and documentation).
Note:
The default setup expects the package maintainer to install (scons install) only the desired export level <lvl> for their environment. If one wants to install multiple export levels (or make sure that only a given export level can be installed), the file config.py must be adjusted!
./export/bin/spr_*
The (Python) scripts.
./export/bin/spraak/
The Python library used by the scripts.
./export/bin/<arch>/
The executables.
./export/lib/<arch>
The library file.
./export/include/
Include files.
./export/doc/html/
Documentation in html format.

The export level <lvl> can have the following values:

pub_hi
header files, documentation, library aimed at users.
pub_lo
header files, documentation, library aimed at programmers.
devel
header files, documentation, library aimed at developers.
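The rule "all items at export level <lvl> or higher" can be pictured as a simple filter. The sketch assumes an ordering in which pub_hi is the most public level and devel the least, so that a devel export exposes everything (consistent with the developer export described under Configuring & installing); the item names are hypothetical and the actual filtering is done by spr_parser.py.

```python
# Assumed ordering, from least public to most public.
EXPORT_LEVELS = ["devel", "pub_lo", "pub_hi"]


def visible_items(items, lvl):
    """Return the names of the (name, level) items exported at level lvl.

    An item is kept when its own level is lvl or higher (i.e. more public).
    """
    threshold = EXPORT_LEVELS.index(lvl)
    return [name for name, item_lvl in items
            if EXPORT_LEVELS.index(item_lvl) >= threshold]


# Hypothetical items tagged with an export level
items = [("func_user", "pub_hi"), ("func_prog", "pub_lo"), ("func_dev", "devel")]
print(visible_items(items, "pub_lo"))  # ['func_user', 'func_prog']
```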

The <arch> infix combines information about the operating system, CPU and configuration (debug, release, ...) into a single name.
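The comments in config.py spell out how this name is formed: spraak.os + '_' + spraak.cpu, followed by '_' + spraak.config for every configuration except release. The small function below reproduces that expression; the os and cpu strings passed to it are illustrative, since the actual values come from the platform detection.

```python
def arch_name(os_name, cpu, config):
    """Build the <arch> infix as described in the config.py comments.

    Release builds get no suffix; any other configuration is appended.
    """
    suffix = "" if config == "release" else "_" + config
    return os_name + "_" + cpu + suffix


print(arch_name("Linux", "x86_64", "release"))  # Linux_x86_64
print(arch_name("Linux", "x86_64", "debug"))    # Linux_x86_64_debug
```

Because each configuration yields a distinct name, release, debug and profile builds end up in separate ./build/<arch>/ and ./export/bin/<arch>/ directories and can coexist.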

In order to facilitate package management, SPRAAK also allows setting symbolic links from the 'export' directories to some final destination directories (e.g. /usr/bin, /usr/lib and /usr/include). This way, all files from the SPRAAK package can be kept in a single directory while users still have access to the full functionality without having to expand their PATH, LD_LIBRARY_PATH, ... Furthermore, when SPRAAK is removed, upgraded, ..., the worst that can happen is some dangling links in the /usr/bin, /usr/lib and /usr/include directories. See the comments in config.py for more info.
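The linking scheme and its failure mode can be sketched as follows. This is not SPRAAK's build code: link_tree and dangling_links are hypothetical helpers that mirror the idea behind spraak.bindir_link and spraak.symlink_support in config.py (link when possible, copy otherwise) and show how leftover links can be detected after SPRAAK has been removed.

```python
import os
import shutil


def link_tree(src_dir, dst_dir, symlink_support=True):
    """Expose every entry of src_dir inside dst_dir; return the created names.

    Uses symbolic links when supported, otherwise copies regular files.
    Existing entries in dst_dir are never overwritten.
    """
    created = []
    os.makedirs(dst_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        src = os.path.abspath(os.path.join(src_dir, name))
        dst = os.path.join(dst_dir, name)
        if os.path.lexists(dst):
            continue  # belongs to another package (or already linked)
        if symlink_support:
            os.symlink(src, dst)
        elif os.path.isfile(src):
            shutil.copy2(src, dst)
        else:
            continue  # this sketch does not copy directories
        created.append(name)
    return created


def dangling_links(dst_dir):
    """List the symbolic links in dst_dir whose target no longer exists."""
    return sorted(n for n in os.listdir(dst_dir)
                  if os.path.islink(os.path.join(dst_dir, n))
                  and not os.path.exists(os.path.join(dst_dir, n)))
```

After removing SPRAAK, the names returned by dangling_links() on the destination directories are exactly the leftovers that can be deleted safely.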

config.py

# This file defines all user configurable options.
# Since all options are defined using standard python code, platform specific
# code, complex customization and even cross-compiling should be possible.

# Some useful variables which are set when calling this code:
#  - spraak.os                  operating system (linux, windows, osx, ...)
#  - spraak.cpu                 cpu family (x86_64, i686, power, ...)
#  - spraak.cc                  default compiler for this platform (gcc, icc, cc, ...)
#  - spraak.cc_ver              version number of the default compiler [major minor ...]
#  - spraak.config              string describing the requested configuration, e.g. "release", "debug", "regression", "profile", ...
#  - spraak.arch                string describing the target architecture+config
#                               set to spraak.os+'_'+spraak.cpu+('' if(spraak.config=="release") else '_'+spraak.config)
# Note: this configuration script is allowed to change the above
# variables (e.g. forcing 32bits executables on a 64bit platform) but
# incorrect/inconsistent values are likely to lead to unexpected behavior


##
## destination directories for the final install
##
# Windows is always a bit different :-)
not_windows = (spraak.os!="Cygwin")
# base installation directory
spraak.prefix                   = '.'
# directory for architecture independent executables (scripts)
spraak.bindir                   = spraak.prefix+'/bin'
# directory for architecture dependent executables
spraak.bindir_arch              = (spraak.prefix+'/bin/'+spraak.arch if(not_windows) else spraak.bindir)
# directory for architecture independent libraries (scripts)
spraak.libdir                   = (spraak.prefix+'/lib' if(not_windows) else spraak.bindir)
# directory for architecture dependent libraries; must be identical to 'bindir_arch' on some platforms (e.g. Windows)
spraak.libdir_arch              = (spraak.prefix+'/lib/'+spraak.arch if(not_windows) else spraak.bindir)
# directory for include files
spraak.includedir               = spraak.prefix+'/include'
# documentation directories (two formats)
spraak.docdir_html              = spraak.prefix+'/doc/html'
spraak.docdir_pdf               = spraak.prefix+'/doc/pdf'
# read-only architecture-independent data
spraak.datadir                  = spraak.prefix+'/data'
# In order to keep all files from spraak in a single subdirectory, the build
# process allows setting symbolic links from the default system bin, lib and
# include directories to the package specific bin(_arch), lib(_arch) and
# include directories specified above.
# Set to None if not wanted.
spraak.bindir_link              = None
spraak.libdir_link              = None
spraak.includedir_link          = None
# some file systems do not support symbolic links, and hence the files must be copied
spraak.symlink_support = not_windows


##
## build directory (the build sub-directory structure is not configurable)
##
spraak.builddir = spraak.prefix+'/build'


##
## configurable options
##

# typical number of threads (CPU cores available)
# 0 : compile single threaded (no support for multi-threading at all)
# 1 : optimize for single threaded operation but allow multiple threads
# N : optimize for N concurrent threads
spraak.Nthreads = 2


##
## external libraries that provide additional functionality
##

# possible values:
#  - None               : functionality is not desired
#  - "auto"             : automatic scan for the library and/or headers
#  - full/path/name     : exact location (directory) of the header file(s)
#  - full/path/name/lib : exact location (file without extension and without 'lib' prefix) of the library

# python: use python functionality in the C-code such as python pre-processing
#         or lattice processing modules (NIY)
spraak.python   = None
spraak.python_h = None

# zlib: handling gzipped files
spraak.zlib   = "auto"
spraak.zlib_h = "auto"

# bzlib: handling bzip2-compressed files (NIY)
spraak.bzlib   = None
spraak.bzlib_h = None

# flac: handling flac-compressed audio (NIY)
spraak.flac   = None
spraak.flac_h = None

# gcrypt: gnu privacy guard cryptography -- strong encryption of data (NIY)
spraak.gcrypt   = None
spraak.gcrypt_h = None

# arpack: handling larger sparse (or structured) eigen/singular value problems (NIY)
spraak.arpack   = "auto"
spraak.arpack_h = "auto"

# readline: read lines from the terminal with editing (NIY)
spraak.readline   = None
spraak.readline_h = None


##
## compiler & linker flags
##

def select_best_match(opt_dict,spraak):
        # the compiler options are typically architecture dependent and hence are best
        # specified using a dictionary having as key a string containing the following
        # fields separated with a single space:
        #    spraak.os spraak.cpu spraak.cc spraak.cc_ver[0] spraak.config
        # Don't cares are indicated with '*'.
        # A key is rejected as soon as one of its fields is neither '*' nor the
        # actual value; among the remaining keys the one with the most exact
        # matches (i.e. the lowest 'cnt') wins.
        sel = [spraak.os,spraak.cpu,spraak.cc,str(spraak.cc_ver[0]),spraak.config]
        best_opt = None
        best_cnt = 9
        for opt_str,opt_val in opt_dict.items():
                cnt = 6
                for ndx,val in enumerate(opt_str.split()):
                        if(val == sel[ndx]):
                                cnt -= 1
                        elif(val != "*"):
                                opt_val = None
                if((opt_val is not None) and (cnt<best_cnt)):
                        best_opt = opt_val
                        best_cnt = cnt
        if(best_opt is None):
                print("ERROR No match found for %s %s %s %s %s"%(spraak.os,spraak.cpu,spraak.cc,str(spraak.cc_ver[0]),spraak.config))
                spraak.signal_error = 1
        return(best_opt)

# path to a special version of the compiler, e.g. an experimental branch of the gcc compiler
if(os.path.isdir("/freeware/bin/gnu-tools")):
        env.Replace(CC="gcc.unsupported");
        spraak.cc_path = '/freeware/bin/gnu-tools';
        spraak.cc_prog = env['CC'];
        spraak.cc_ver  = os.popen(os.path.normpath(os.path.join(spraak.cc_path,env['CC']))+" --version","r").readline().split()[-1].split(".");
        env.Replace(CCVERSION=".".join(spraak.cc_ver));

# Optimization & configuration flags
# Note: Currently -fpic (or -fpie) MUST be specified when compiling programs.
# If not, the programs will duplicate the public variables from the libraries
# resulting in the program routines using one version of the public variables
# and the library routines using another version.

cflags_opt = {  "* * gcc * release"             : "-O2",
                "* * gcc * debug"               : "-O2 -g",
                "* * gcc * regression"          : "-O2 -g -DSPR_REGRESSION=1",
                "* * gcc * profile"             : "-O2 -pg",
                "* * gcc * coverage"            : "-O1 -ftest-coverage -fprofile-arcs",
                "* x86_64 gcc 3 release"        : "-O3 -fno-keep-static-consts -fprefetch-loop-arrays -maccumulate-outgoing-args -ffast-math -mieee-fp -funroll-loops -fpeel-loops -funswitch-loops -finline-limit=250 --param max-unroll-times=4",
                "* x86_64 gcc * debug"          : "-g -O3 -fprefetch-loop-arrays -maccumulate-outgoing-args -ffast-math -mieee-fp -funroll-loops -fpeel-loops -funswitch-loops -finline-limit=250 --param max-unroll-times=4",
                "* x86_64 gcc * profile"        : "-pg -g -O3 -fprefetch-loop-arrays -maccumulate-outgoing-args -ffast-math -mieee-fp -funroll-loops -fpeel-loops -funswitch-loops -finline-limit=250 --param max-unroll-times=4",
                "* i686 gcc * release"          : "-O3 -fno-keep-static-consts -maccumulate-outgoing-args -fstrict-aliasing -fomit-frame-pointer -ffast-math -mieee-fp -funroll-loops -fpeel-loops -funswitch-loops -finline-limit=250 --param max-unroll-times=4",
                "* i686 gcc * debug"            : "-g -O3 -maccumulate-outgoing-args -fstrict-aliasing -fomit-frame-pointer -ffast-math -mieee-fp -funroll-loops -fpeel-loops -funswitch-loops -finline-limit=250 --param max-unroll-times=4",
                "* i686 gcc * profile"          : "-pg -g -O3 -maccumulate-outgoing-args -fstrict-aliasing -ffast-math -mieee-fp -funroll-loops -fpeel-loops -funswitch-loops -finline-limit=250 --param max-unroll-times=4"
             }
cflags_cfg = {  "* * * * *"             : "",
                "* x86_64 gcc 3 *"      : "-std=gnu99 -fpic -m64 -mcmodel=small -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char -fasynchronous-unwind-tables",
                "* x86_64 gcc 4 *"      : "-std=gnu99 -fpic -march=opteron  -mtune=native -m64 -mcmodel=small -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char -fasynchronous-unwind-tables",
                "* i686 gcc 3 *"        : "-std=gnu99 -fpic -march=pentium3 -mcpu=pentium4 -malign-double -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char -fasynchronous-unwind-tables",
                "* i686 gcc 4 *"        : "-std=gnu99 -fpic -march=pentium4 -mtune=core2 -malign-double -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char -fasynchronous-unwind-tables",
                "Darwin x86_64 gcc 4 *" : "-m64 -std=gnu99 -fpic -mtune=native -mcmodel=small -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char -fasynchronous-unwind-tables",
                "Darwin i686 gcc 4 *"   : "-m32 -std=gnu99 -fpic -mtune=native -malign-double -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char -fasynchronous-unwind-tables",
                "Cygwin i686 gcc 3 *"   : "-std=gnu99 -fpic -mno-cygwin -march=pentium3 -mcpu=pentium4 -malign-double -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char",
                "Cygwin i686 gcc 4 *"   : "-std=gnu99 -fpic -mno-cygwin -march=pentium3 -mtune=pentium4 -malign-double -W -Wall -Wno-uninitialized -Wno-unused-value -fsigned-char"
             }
cflags_shobj ={ "* * * * *"             : "-DSPR_INCL_LVL=-1",
                "* * gcc * *"           : "-DSPR_INCL_LVL=-1 -fvisibility=hidden -fno-common",
                "Darwin * gcc * *"      : "-DSPR_INCL_LVL=-1 -fno-common",
                "Cygwin * gcc * *"      : "-DSPR_INCL_LVL=-1 -fno-common"
             }
spraak.cflags  = select_best_match(cflags_cfg,spraak)+' '+select_best_match(cflags_opt,spraak)
spraak.shcflags = select_best_match(cflags_shobj,spraak)+' '+spraak.cflags

# extra libraries (and header files) for the linker (and compiler)
ldlibs = {      "Linux * * * *"         : "m dl rt pthread",
                "* * gcc * *"           : "m dl rt pthread",
                "Linux * gcc * *"       : "m dl rt pthread gcc_s",
                "Darwin * gcc * *"      : "m dl pthread",
                "Cygwin * gcc * *"      : "m rt pthreadGC2 wsock32 winmm"
         }
ldpath = {      "* * * * *"             : "",
                "Cygwin * gcc * *"      : "extern/lib"
         }
ldcpph = {      "* * * * *"             : "",
                "Cygwin * gcc * *"      : "extern/include"
         }
spraak.ldlibs = select_best_match(ldlibs,spraak)
spraak.ldpath = select_best_match(ldpath,spraak)
spraak.ldcpph = select_best_match(ldcpph,spraak)

# other linker flags for linking programs / making the dynamic library
ldflags = {     "Linux * gcc * *"       : "-rdynamic -Wl,-z,origin -Wl,-rpath,'$$ORIGIN/lib'",
                "Linux * gcc * profile" : "-pg -rdynamic -Wl,-z,origin -Wl,-rpath,'$$ORIGIN/lib'",
                "Linux * gcc * coverage": "-ftest-coverage -fprofile-arcs -rdynamic -Wl,-z,origin -Wl,-rpath,'$$ORIGIN/lib'",
                "Darwin x86_64 gcc * *" : "-m64 -rdynamic",
                "Darwin i686 gcc * *"   : "-m32 -rdynamic",
                "* * gcc * profile"     : "-pg",
                "* * gcc * coverage"    : "-ftest-coverage -fprofile-arcs",
                "Cygwin * gcc * *"      : "-mno-cygwin -Wl,--disable-stdcall-fixup -Wl,--enable-auto-import",
                "* * * * *"             : ""
          }
spraak.ldflags = select_best_match(ldflags,spraak)
ldlibflags = {  "* * * * *"             : "",
                "Linux * gcc * *"       : "-rdynamic -Wl,-O -Wl,--enable-new-dtags -Wl,--as-needed",
                "Darwin x86_64 gcc * *" : "-m64 -dynamiclib -install_name @executable_path/lib/libspraak.dylib",
                "Darwin i686 gcc * *"   : "-m32 -dynamiclib -install_name @executable_path/lib/libspraak.dylib",
                "Cygwin * gcc * *"      : "-mno-cygwin -Wl,--out-implib=spraak.dll.a -Wl,--enable-stdcall-fixup -Wl,--export-all-symbols -Wl,--enable-auto-import"
             }
spraak.ldlibflags = select_best_match(ldlibflags,spraak)

# C preprocessor flags, e.g. -I<include_dir> if you have headers in a nonstandard directory <include_dir>
cppflags = {    "* * * * *"             : "",
                "* * gcc * *"           : "-D_linux_86_ -D_XOPEN_SOURCE=600 -D_ISOC99_SOURCE -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_REENTRANT",
                "Darwin * gcc * *"      : "-D_mac_osx_ -D_XOPEN_SOURCE=600 -D_ISOC99_SOURCE -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_REENTRANT",
                "Cygwin * gcc * *"      : "-D_cygwin_ -D_XOPEN_SOURCE=600 -D_ISOC99_SOURCE -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_REENTRANT"
           }
spraak.cppflags = select_best_match(cppflags,spraak)


##
## bugs that escape auto-detection (see the bug-specific comments below)
##

bugs = {        "* * * * *"             : [],
                "* x86_64 * * *"        : ["x86_twin48_18"]
       }
for bug in select_best_match(bugs,spraak):
        setattr(spraak,'sysdep_bug_'+bug,True);
# x86_twin48_18, x86_no_twin64
#   Most 64-bit x86 processors made by AMD lack support for twin 64-bit
#   atomic operations (load, store, cas).
#   A work-around is to use single (non-twin) 64-bit atomic operations and to
#   subdivide the 64-bit value into a 48-bit pointer and an 18-bit
#   transaction counter; this fits because the 2 lowest pointer bits are
#   always zero due to the required 4-byte pointer alignment (46+18 = 64).
#   This is, however, not 100% safe: with only 18 bits for the transaction
#   counter, overrun problems cannot be ruled out.
#   The end result is that users have the following options:
#    - if you only use Intel core2 or AMD barcelona/phenom processors, then
#      all is OK (there are no bugs).
#    - else, if you can live with the extremely small chance that a
#      multi-threaded application goes wrong (requires a heavily over-loaded
#      computer), then specify the 'x86_twin48_18' option
#    - else, specify the 'x86_no_twin64' option (will result in locked instead
#      of lock-free implementations of some core routines).

##
## other bugs & features
##

# uncomment those that are relevant to your setup
#  - sphere_shortpack_v0: used in the very first releases of wsj0, outmoded

spraak.sysdep_bug_sphere_shortpack_v0 = True;
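The x86_twin48_18 work-around described in the config.py comments packs a pointer and a transaction counter into a single 64-bit word. The sketch below illustrates the arithmetic only; the bit layout (pointer bits high, counter bits low) is an assumption, and SPRAAK's actual C implementation may order the fields differently. It also makes the stated risk concrete: the 18-bit counter wraps around after 2**18 = 262144 updates.

```python
COUNTER_BITS = 18
COUNTER_MASK = (1 << COUNTER_BITS) - 1
PTR_BITS = 48
ALIGN_BITS = 2  # 4-byte alignment: the two lowest pointer bits are always zero


def pack(ptr, counter):
    """Pack a 4-byte aligned 48-bit pointer and an 18-bit counter into 64 bits.

    Assumed layout: bits 63..18 hold ptr >> 2, bits 17..0 hold the counter.
    """
    if ptr >> PTR_BITS:
        raise ValueError("pointer does not fit in 48 bits")
    if ptr & ((1 << ALIGN_BITS) - 1):
        raise ValueError("pointer is not 4-byte aligned")
    return ((ptr >> ALIGN_BITS) << COUNTER_BITS) | (counter & COUNTER_MASK)


def unpack(word):
    """Inverse of pack(): return (pointer, counter)."""
    return (word >> COUNTER_BITS) << ALIGN_BITS, word & COUNTER_MASK


p = 0x7FFFFFFFFFFC
print(unpack(pack(p, 5)) == (p, 5))                 # True
print(unpack(pack(p, COUNTER_MASK + 1)) == (p, 0))  # True: the counter wrapped
```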
Generated on Thu May 10 14:56:31 2012 for SPRAAK by  doxygen 1.6.3