NetWorkSpaces FAQ
General Questions
What can I do with NetWorkSpaces?
Here are some ideas:
- Perform administrative tasks on a cluster
- Run a large set of unit tests in parallel
- Monitor activity on a cluster or set of workstations
- Perform a statistical study using Monte Carlo techniques
- Write a simple chat/talk program for use on your network
- Prototype MPI programs
- Write a simple client/server program between Windows and Unix machines
- Allow scripts to cooperatively lock resources, rather than using traditional lock files
What is the relationship between NetWorkSpaces and Sleigh?
Sleigh is the part of NetWorkSpaces that allows you to execute
tasks in parallel. NetWorkSpaces also includes other classes and methods,
which are used, for example, for communicating between different scripts.
How does NetWorkSpaces compare to Linda?
NetWorkSpaces is rather similar to Linda, but was generally
designed to be simpler, and to work well with scripting
languages in particular.
The primary simplification in NetWorkSpaces is in the matching rules.
Linda has powerful, but somewhat complex rules for tuple matching.
NetWorkSpaces uses named variables that can have zero or more values.
The only "matching" that NetWorkSpaces uses is on the name of the variable.
Also, NetWorkSpaces allows you to easily define the order of the values of
a variable. In Linda, if a tuple query can match multiple tuples, the
actual value returned is not defined, requiring the programmer to include
an extra field or fields to define the order.
Linda's matching rules are very powerful, but NetWorkSpaces
makes it trivial to do simple things, like returning the values in "first-in first-out"
order, for example.
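To make the contrast concrete, here is a toy, in-process model of the variable semantics described above. It is not the NetWorkSpaces client API: the class ToyWorkSpace and its method names are assumptions chosen for illustration only. Each variable name maps to a queue of values, the only matching is on the name, and values come back in first-in first-out order.

```python
from collections import defaultdict, deque


class ToyWorkSpace:
    """Toy in-process model: each variable name holds zero or more values."""

    def __init__(self):
        # name -> queue of values, oldest first
        self._vars = defaultdict(deque)

    def store(self, name, value):
        # Add one more value to the named variable.
        self._vars[name].append(value)

    def fetch_try(self, name, default=None):
        # Remove and return the oldest value (first-in first-out),
        # or return the default when the variable has no values.
        values = self._vars[name]
        return values.popleft() if values else default

    def find_try(self, name, default=None):
        # Return the oldest value without removing it.
        values = self._vars[name]
        return values[0] if values else default


ws = ToyWorkSpace()
for task in ("a", "b", "c"):
    ws.store("task", task)

first = ws.find_try("task")  # look, but leave the value in place
drained = [ws.fetch_try("task") for _ in range(4)]  # fourth call finds nothing
print(first, drained)
```

No extra ordering field is needed here, because the order of the values is a property of the variable itself.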
NetWorkSpaces also tries to simplify running tasks in parallel by
providing high-level methods for executing tasks: the Sleigh methods,
eachElem and eachWorker. These methods make writing embarrassingly
parallel programs trivial, often without having to modify your existing
code at all. And yet, you can also use eachWorker in much the same way
that you use eval in Linda, allowing you to write more sophisticated
parallel programs in NetWorkSpaces, as well.
How long do NetWorkSpace operations take to execute?
The best way to find that out is to time it yourself on your network and
your machines.
There are two example programs that act as simple benchmarks. One passes
a token around a ring of all of the workers (ring.py) and one passes data
in a star pattern (pping.py). The operations are timed, and a per-operation
time is printed at the end of the program.
(Probably ought to include some example results)
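If you want to time a single kind of operation yourself, a minimal sketch of the idea looks like the following. This is not the code of ring.py or pping.py. The helper name time_per_operation is an assumption, and the timed operation is a local stand-in so that the sketch runs anywhere. Replace it with a callable that performs one workspace operation against your own server.

```python
import time


def time_per_operation(operation, count=1000):
    """Call operation() count times and return the mean seconds per call."""
    start = time.perf_counter()
    for _ in range(count):
        operation()
    elapsed = time.perf_counter() - start
    return elapsed / count


# Stand-in operation (a local list append), used only so the sketch is
# runnable without a server. Substitute your own workspace operation.
values = []
per_op = time_per_operation(lambda: values.append(1), count=1000)
print("%.3f microseconds per operation" % (per_op * 1e6))
```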
How do I pass data between different types of clients (Python to R, R to Matlab, etc)?
Use strings. Strings are sent to the NWS server as plain ASCII text,
and all clients can read them. That allows you to use XML or YAML to pass
other data types between clients, since they can be encoded as strings.
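As an illustration of the string-encoding idea, the following sketch turns a list of numbers into an XML string and back, using only the Python standard library. The element names "numbers" and "n" and both function names are assumptions made for this example. Any format works as long as the receiving client (R, Matlab, and so on) parses the same layout. Storing the string in a workspace is left out, so only the encoding step is shown.

```python
import xml.etree.ElementTree as ET


def encode_numbers(numbers):
    """Encode a list of numbers as an XML string."""
    root = ET.Element("numbers")
    for number in numbers:
        ET.SubElement(root, "n").text = repr(float(number))
    return ET.tostring(root, encoding="unicode")


def decode_numbers(text):
    """Decode a string produced by encode_numbers back into floats."""
    return [float(node.text) for node in ET.fromstring(text).findall("n")]


payload = encode_numbers([1, 2.5, -3])  # plain text, safe to pass as a string
restored = decode_numbers(payload)
print(payload)
print(restored)
```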
Why do I get this message when I try to start the NWS server?
Failed to load application: No module named web
You installed twisted, but not twisted-web. The NWS server needs both.
See the INSTALL file for more information.
How do I install NetWorkSpaces for R?
The simplest way is to start R, and type in the command:
> install.packages("nws")
On the Windows version, you can use the Packages > Install Package(s)...
menu item to install NetWorkSpaces.
Just make sure that you do this from an account with the privileges to write
to the R installation.
If you don't have permission to write to the directory, please see the answer to
"I don't have permission to write to R-2.2.1\library directory"
I don't have permission to write to R-2.2.1\library directory
For Linux platforms
For Windows platforms
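(Reference from R for Windows FAQ, http://cran.r-project.org/bin/windows/base/rw-FAQ.html)
Point R_LIBS at a library directory that you can write to. You can set it in any of the following ways: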
- On the command line as name=value pairs. For example in the shortcut to RGui you could have
"path_to_R\bin\Rgui.exe" HOME=p:/ R_LIBS=p:/myRlib
- In an environment file .Renviron in the working directory or your home directory, for example containing the line
R_LIBS=p:/myRlib
- Set the R_LIBS environment variable. In Windows 9x you can set environment variables
in autoexec.bat or in an MS-DOS window from which you launch Rgui / Rterm.
Under Windows NT/2000/XP/2003 you can use the control panel or the properties
of 'My Computer'. Under Windows ME you can use the System Configuration Utility
(under Programs, Accessories, System Tools on the Start menu). You may have to
log out or reboot for such changes to take effect.
NetWorkSpaces Server
How do I start the NetWorkSpaces Server on a different port?
The simplest way is to set the NWS_SERVER_PORT environment variable when you
start the server.
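If you start the server from a Python script rather than from a shell, one way to set the variable is to build a modified copy of the environment and hand it to the child process. This is only a sketch: the helper name server_environment and the port number 9000 are assumptions for illustration, and the command used to start the server is deliberately left out.

```python
import os


def server_environment(port, base=None):
    """Return a copy of the environment with NWS_SERVER_PORT set."""
    env = dict(os.environ if base is None else base)
    env["NWS_SERVER_PORT"] = str(port)  # environment values must be strings
    return env


# A small base environment is passed here only to keep the demo output short.
env = server_environment(9000, base={"PATH": "/usr/bin"})
print(env["NWS_SERVER_PORT"])
# Pass env to subprocess.Popen(..., env=env) together with whatever
# command you normally use to start the server.
```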
What environment variables are used to configure the server?
NWS_SERVER_PORT, NWS_WEB_PORT, NWS_INTERFACE, NWS_WEB_SERVED_DIR, and NWS_TMP_DIR.
What is this nws.tac file?
It is a Twisted Application Configuration file that tells the Twisted
framework how to construct an application. Twisted considers this to be a configuration
file, although TAC files are actually Python code. To configure the NetWorkSpaces
server, you can edit nws.tac if you wish, but many users will prefer to use
environment variables.
How can I start the NetWorkSpaces server when my machine starts up?
On POSIX systems, install the nws script as an init script.
On Windows systems, install the NWS Service, using the
NwsService.py script.
Sleigh Programming
How do I initialize my workers once before executing many tasks on them?
Use eachWorker. Give it a function that does the initialization.
After eachWorker returns, you can execute eachElem, knowing that every
worker has executed your initialization function.
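The pattern can be sketched as follows. The functions load_table, lookup, and run_job are hypothetical, and InProcessSleigh is a stand-in written for this example (it is not part of NetWorkSpaces) so that the sketch runs without a server. It assumes that eachElem calls the function once per element with that element as its argument. With a real Sleigh, the module that defines these functions must also be importable by the workers.

```python
def load_table():
    """Initialization task: build a lookup table once per worker."""
    global TABLE
    TABLE = {n: n * n for n in range(10)}
    return len(TABLE)


def lookup(n):
    """Per-element task: relies on TABLE created by load_table."""
    return TABLE[n]


def run_job(sleigh):
    sleigh.eachWorker(load_table)  # runs once on every worker
    return sleigh.eachElem(lookup, [1, 2, 3])  # one task per element


class InProcessSleigh:
    """Hypothetical stand-in that runs tasks in this process, as if it
    were a sleigh with a single worker. Not part of NetWorkSpaces."""

    def eachWorker(self, fun):
        return [fun()]

    def eachElem(self, fun, elements):
        return [fun(element) for element in elements]


results = run_job(InProcessSleigh())
print(results)
```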
How do I get the workers to import a module?
Execute an "eval" task using eachWorker. In Python, it would look like:
>>> s.eachWorker("import time")
How do I access the arguments from an "eval" task?
Use the global variable "SleighArgs". For example:
>>> s.eachElem("'argument: ' + str(SleighArgs[0])", range(4))
Does the Sleigh worker have an ID of some kind?
Yes. There is a global variable called SleighRank that is available
to your task function.
>>> s.eachWorker("'My rank is: %d' % SleighRank")
How does the Sleigh worker determine the total number of workers in the Sleigh?
That isn't usually needed, but can be important for MPI-style parallel programs.
One method is to fetch the nodeList variable, split it into a list by whitespace,
and get the size of the resulting list. But it's much easier to simply pass the
number of workers as one of the arguments to the worker function, since the
master process knows (or can easily find out) the number of workers associated
with a Sleigh.
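The first method amounts to a one-line string operation. In this sketch the function name count_workers and the sample string are assumptions. The real string would be the value you fetch from the nodeList variable.

```python
def count_workers(node_list_value):
    """Count the whitespace-separated entries in a nodeList string."""
    return len(node_list_value.split())


# Example value only; the real string comes from the nodeList variable.
worker_count = count_workers("node1 node2  node2\nnode3")
print(worker_count)
```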
Starting Workers
How do I use rsh to start Sleigh workers, rather than ssh?
Please refer to the user manual's "Getting Started" chapter to set up an rsh server on Windows.
Once that's done, you specify an appropriate launch function, using the "launch" argument
to the Sleigh constructor.
In R:
> s = sleigh(launch=rshcmd)
In Python:
>>> from nws.sleigh import rshcmd
>>> s = Sleigh(launch=rshcmd)
How do I start Sleigh workers that are outside my firewall?
Use the sshforwardcmd launch function.
(More details, please)
How do I run Sleigh programs using PBS (or LSF)?
Just submit your Sleigh program using qsub (or bsub), specifying
the number of nodes to use. When the batch queueing system runs your
script, it tells you what nodes it has allocated for you to run on using
an environment variable (PBS_NODEFILE for PBS, LSB_HOSTS for LSF).
Use that information to compute the appropriate nodeList when constructing
your Sleigh. See the batchqueueing.py example program for an example of
this technique.
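A minimal sketch of that computation follows. It is not the code of batchqueueing.py, and the function name batch_node_list is an assumption. It relies on PBS_NODEFILE naming a file with one host name per line and on LSB_HOSTS holding host names separated by whitespace. A host may appear more than once in either. The demo writes a fake node file so that the sketch runs outside a batch system.

```python
import os
import tempfile


def batch_node_list(environ=None):
    """Return allocated host names from PBS_NODEFILE or LSB_HOSTS."""
    environ = os.environ if environ is None else environ
    node_file = environ.get("PBS_NODEFILE")
    if node_file:
        # PBS: the variable is a path to a file, one host name per line.
        with open(node_file) as handle:
            return [line.strip() for line in handle if line.strip()]
    # LSF: the variable itself holds whitespace-separated host names.
    return environ.get("LSB_HOSTS", "").split()


# Demo with a fake PBS node file; under PBS the file already exists.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nodes")
    with open(path, "w") as handle:
        handle.write("n1\nn1\nn2\n")
    pbs_nodes = batch_node_list({"PBS_NODEFILE": path})

lsf_nodes = batch_node_list({"LSB_HOSTS": "n3 n4"})
print(pbs_nodes, lsf_nodes)
```

Inside a real batch job you would call batch_node_list() with no argument and pass the result as the nodeList when constructing your Sleigh.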
How do I run Sleigh workers on Windows?
You can use rsh, ssh, or the "web launch" method to start Sleigh workers.
To use rsh or ssh to start Sleigh workers, you have to run an rsh or ssh server on each of the worker machines.
However, we don't recommend using ssh to start Sleigh workers on Windows: UNIX-like ssh servers such as Cygwin or copSSH
use a different quoting style than the Windows platform, which can cause conflicts that are hard to debug.
Can I run Sleigh master on a Windows desktop/laptop and Sleigh workers on a Linux cluster?
Yes, you can. You need to have an SSH client installed on the Windows machine, and to set up
password-less login to the Linux cluster. Instructions for setting up password-less login are available in
the User Guide's "Getting Started" chapter, or see "How do I stop ssh from asking me for my password?".
Once these two steps are done, you can simply create a sleigh using
the ssh launch method. For example, in R:
> s = sleigh(launch=sshcmd, nodeList=c('linux1'), scriptName='RNWSSleighWorker.sh',
+ scriptDir='/usr/local/lib64/R/library/nws/bin', scriptExec=envcmd, workingDir='/home/user', rprog='R')
How do I set the working directory that my Sleigh worker should use?
Use the workingDir argument to the Sleigh constructor:
>>> s = Sleigh(workingDir='/tmp')
Currently, there isn't a simple method of setting the working directory
differently on different workers.
How do I get the module that defines my execution function in my PYTHONPATH for all of my Sleigh workers?
Use the modulePath argument to the Sleigh constructor, or set PYTHONPATH
in your shell startup script on the worker machines (which is pretty simple if
you have a common, NFS-mounted home directory on each of the worker machines).
Sleigh Troubleshooting
Why is my Sleigh program hanging?
- Remote workers do not have Python, Matlab, or R installed, or they are not in the PATH.
Instead of setting PATH, you can also use the environment variables PythonProg, MatlabProg, and RProg
to indicate the locations of Python, Matlab, and R, respectively.
- The ssh command cannot find the worker script on the remote nodes.
How do I stop ssh from asking me for my password?
Set up password-less ssh login.
The following shows one way to do it.
- ssh-keygen -t rsa
- cd ~/.ssh (the .ssh directory is located in your HOME directory)
- cp id_rsa.pub authorized_keys (this step allows password-less login to the local machine; if authorized_keys already exists, use cat id_rsa.pub >> authorized_keys instead, so that you don't overwrite it)
- On every remote machine where you want password-less login, append the contents of id_rsa.pub to its authorized_keys file.
To test the password-less login, type the following command:
% ssh hostname date
If everything is set up correctly, you should not be asked for a password, and the current date on the remote machine will be returned.
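The same check can be scripted, which is convenient when you have many nodes to verify. This is a sketch under stated assumptions: the function names are hypothetical, and it adds the standard ssh option BatchMode=yes, which makes ssh fail instead of prompting when a password would be required.

```python
import subprocess


def ssh_test_command(hostname):
    """Build the 'ssh hostname date' command with prompting disabled."""
    return ["ssh", "-o", "BatchMode=yes", hostname, "date"]


def passwordless_login_works(hostname, timeout=15):
    """Return True if 'ssh hostname date' succeeds without prompting."""
    try:
        result = subprocess.run(
            ssh_test_command(hostname), capture_output=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # ssh is missing, or the host did not answer in time
    return result.returncode == 0


# "node1" is a placeholder host name; nothing is contacted here.
print(" ".join(ssh_test_command("node1")))
```

Calling passwordless_login_works on each entry of your node list before creating a Sleigh gives an early warning about nodes that would otherwise stop and wait for a password.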
Why do I get a SleighOccupiedException when executing eachElem (or eachWorker) when using non-blocking mode?
Currently, you cannot execute more than one eachElem or eachWorker job at the
same time on the same Sleigh object, even using non-blocking mode.
Non-blocking mode is only intended to allow the script to perform other operations
while a long running job executes, not to allow multiple jobs.
To execute multiple jobs concurrently, you must create multiple
Sleigh objects.
How do I get debug messages for my Sleigh program?
Set the verbose argument to true when constructing your Sleigh.
See "Where are the debug/log files created for the Sleigh workers?"
for more information.
>>> s = Sleigh(verbose=True)
Where are the debug/log files created for the Sleigh workers?
If the verbose argument is set to true in the Sleigh constructor, the
workers will create log files in the directory specified by the logDir
argument. If logDir is not set, it defaults to a system-specific
temporary directory.
On POSIX systems, this is /tmp, but on Windows, the easiest thing to do is to
look at the "worker info" variable in the sleigh workspace using the web
interface, which includes among other things the full path of the log file
for each worker.
Actually, the log messages for each of the workers are also put into the
"logDebug" variable in the sleigh workspace, so you can view them directly
from the web interface (even if the babelfish isn't running). Also, error
messages are put in the "logError" variable, even if verbose
is false.
>>> s = Sleigh(verbose=True, logDir="/home/joe/tmp")