NetWorkSpaces FAQ


General Questions

What can I do with NetWorkSpaces?

Here are some ideas:


What is the relationship between NetWorkSpaces and Sleigh?

Sleigh is the part of NetWorkSpaces that lets you execute tasks in parallel. NetWorkSpaces also includes other classes and methods, for example for communicating between different scripts.


How does NetWorkSpaces compare to Linda?

NetWorkSpaces is similar to Linda, but it was designed to be simpler and to work well with scripting languages in particular. The primary simplification is in the matching rules. Linda has powerful but somewhat complex rules for tuple matching; NetWorkSpaces instead uses named variables that can hold zero or more values, and the only "matching" it does is on the name of the variable.

Also, NetWorkSpaces allows you to easily define the order of the values of a variable. In Linda, if a tuple query can match multiple tuples, the actual value returned is not defined, requiring the programmer to include an extra field or fields to define the order. Linda's matching rules are very powerful, but NetWorkSpaces makes it trivial to do simple things, like returning the values in "first-in first-out" order, for example.
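
To make the contrast concrete, here is a toy, in-memory Python model of a workspace of named variables whose values come back in first-in first-out order. It is not the NetWorkSpaces client API: the class name and the store/find/fetch method names are hypothetical, and the toy raises LookupError on an empty variable rather than modelling any blocking behaviour.

```python
from collections import defaultdict, deque


class ToyWorkSpace:
    """In-memory toy: each named variable holds zero or more values in FIFO order."""

    def __init__(self):
        self._vars = defaultdict(deque)

    def store(self, name, value):
        # The variable name is the only thing that is ever "matched".
        self._vars[name].append(value)

    def find(self, name):
        # Return the oldest value without removing it.
        values = self._vars[name]
        if not values:
            raise LookupError("variable %r has no values" % name)
        return values[0]

    def fetch(self, name):
        # Remove and return the oldest value: first-in, first-out.
        values = self._vars[name]
        if not values:
            raise LookupError("variable %r has no values" % name)
        return values.popleft()


ws = ToyWorkSpace()
for task in ["a", "b", "c"]:
    ws.store("tasks", task)
print(ws.find("tasks"), ws.fetch("tasks"), ws.fetch("tasks"))  # a a b
```

No extra ordering field is needed: the order in which values were stored is the order in which they come back.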

NetWorkSpaces also tries to simplify running tasks in parallel by providing high-level methods for executing tasks: the Sleigh methods eachElem and eachWorker. These methods make writing embarrassingly parallel programs trivial, often without modifying your existing code at all. Yet you can also use eachWorker much as you would use eval in Linda, which lets you write more sophisticated parallel programs in NetWorkSpaces.


How long do NetWorkSpace operations take to execute?

The best way to find that out is to time it yourself on your network and your machines. There are two example programs that act as simple benchmarks. One passes a token around a ring of all of the workers (ring.py) and one passes data in a star pattern (pping.py). The operations are timed, and a per-operation time is printed at the end of the program.
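
The sketch below reproduces only the timing methodology of a ring benchmark (pass a token around, divide elapsed time by the number of hops), using Python threads and queue.Queue objects in place of Sleigh workers and workspace variables. It is not ring.py, its numbers say nothing about your network, and the function name and parameters are hypothetical.

```python
import queue
import threading
import time


def ring_benchmark(n_workers=4, trips=200):
    """Pass a counter around a ring of threads; return (final token, seconds per hop)."""
    # queues[i] feeds worker i; the last queue carries the token back to the master.
    queues = [queue.Queue() for _ in range(n_workers + 1)]

    def worker(i):
        while True:
            token = queues[i].get()
            if token is None:              # shutdown signal: pass it on and stop
                queues[i + 1].put(None)
                return
            queues[i + 1].put(token + 1)   # one hop: receive, increment, forward

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()

    token = 0
    start = time.perf_counter()
    for _ in range(trips):
        queues[0].put(token)
        token = queues[n_workers].get()
    elapsed = time.perf_counter() - start

    queues[0].put(None)
    queues[n_workers].get()                # wait for the sentinel to come around
    for t in threads:
        t.join()
    return token, elapsed / (trips * n_workers)


token, per_hop = ring_benchmark()
print("token=%d, %.1f microseconds per hop" % (token, per_hop * 1e6))
```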

(Probably ought to include some example results)


How do I pass data between different types of clients (Python to R, R to Matlab, etc)?

Use strings. Strings are sent to the NWS server as plain ASCII text, and all clients can read them. Because other data types can be encoded as strings, this lets you use a text format such as XML or YAML to pass them between clients.
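
As an illustration, the Python side of such an exchange might encode a record as an XML string before storing it in a workspace variable, and decode strings it fetches. The record and field element names are hypothetical; an R or Matlab client would use its own XML parser to read the same string. Non-ASCII characters are written as numeric character references so the string stays plain ASCII.

```python
import xml.etree.ElementTree as ET


def encode_record(record):
    # Turn a flat dict into an XML string; non-ASCII characters become
    # numeric character references, so the result is plain ASCII.
    root = ET.Element("record")
    for key, value in record.items():
        field = ET.SubElement(root, "field", name=str(key))
        field.text = str(value)
    return ET.tostring(root, encoding="us-ascii").decode("ascii")


def decode_record(text):
    # Parse the string back into a dict; every value comes back as a string.
    root = ET.fromstring(text)
    return {field.get("name"): field.text or "" for field in root.findall("field")}


message = encode_record({"n": 3, "label": "café"})
print(message)
print(decode_record(message))
```

Note that numbers come back as strings; the receiving client decides how to convert them.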


Why do I get this message when I try to start the NWS server?

        Failed to load application: No module named web
        

You installed Twisted, but not twisted-web. The NWS server needs both. See the INSTALL file for more information.


How do I install NetWorkSpaces for R?

The simplest way is to start R and enter the command:

        > install.packages("nws")
        

On Windows, you can also use the Packages > Install Package(s)... menu item to install NetWorkSpaces.

Make sure that you do this from an account with permission to write to the R installation.

If you don't have permission to write to that directory, see the answer to "I don't have permission to write to R-2.2.1\library directory".


I don't have permission to write to R-2.2.1\library directory

For Linux platforms

For Windows platforms

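Install packages into a library directory that you can write to, and point R at it by setting R_LIBS in one of the following ways (from the R for Windows FAQ, http://cran.r-project.org/bin/windows/base/rw-FAQ.html):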
  1. On the command line as name=value pairs. For example in the shortcut to RGui you could have
                "path_to_R\bin\Rgui.exe" HOME=p:/ R_LIBS=p:/myRlib
                
  2. In an environment file .Renviron in the working directory or your home directory, for example containing the line
                R_LIBS=p:/myRlib
                
  3. Set the R_LIBS environment variable. In Windows 9x you can set environment variables in autoexec.bat or in an MS-DOS window from which you launch Rgui / Rterm. Under Windows NT/2000/XP/2003 you can use the control panel or the properties of 'My Computer'. Under Windows ME you can use the System Configuration Utility (under Programs, Accessories, System Tools on the Start menu). You may have to log out or reboot for such changes to take effect.

NetWorkSpaces Server

How do I start the NetWorkSpaces Server on a different port?

The simplest way is to set the NWS_SERVER_PORT environment variable when you start the server.


What environment variables are used to configure the server?

NWS_SERVER_PORT, NWS_WEB_PORT, NWS_INTERFACE, NWS_WEB_SERVED_DIR, and NWS_TMP_DIR.
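
A minimal sketch of starting the server with a non-default configuration from Python. It assumes the server is started by running twistd on nws.tac (-n keeps twistd in the foreground, -y names the .tac file); the helper names, the port numbers, the meaning given to NWS_INTERFACE, and the nws.tac path are all hypothetical, so adapt them to your installation.

```python
import os
import subprocess


def server_env(port, web_port=None, interface=None):
    """Return a copy of the current environment with NWS server settings added."""
    env = dict(os.environ)
    env["NWS_SERVER_PORT"] = str(port)      # environment values must be strings
    if web_port is not None:
        env["NWS_WEB_PORT"] = str(web_port)
    if interface is not None:
        env["NWS_INTERFACE"] = interface    # assumed: the address to listen on
    return env


def start_server(tac_path, env):
    # Assumption: the server runs under twistd; -n = don't daemonize, -y = .tac file.
    return subprocess.Popen(["twistd", "-n", "-y", tac_path], env=env)


env = server_env(9000, web_port=9001)
# proc = start_server("/path/to/nws.tac", env)   # hypothetical path
```

Passing a modified copy of the environment leaves the calling script's own environment untouched.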


What is this nws.tac file?

It is a Twisted Application Configuration (TAC) file, which tells the Twisted framework how to construct an application. Twisted treats it as a configuration file, although TAC files are actually Python code. You can configure the NetWorkSpaces server by editing nws.tac if you wish, but many users may prefer to use environment variables.


How can I start the NetWorkSpaces server when my machine starts up?

On POSIX systems, install the nws script as an init script.

On Windows systems, install the NWS Service, using the NwsService.py script.


Sleigh Programming

How do I initialize my workers once before executing many tasks on them?

Use eachWorker. Give it a function that does the initialization. After eachWorker returns, you can execute eachElem, knowing that every worker has executed your initialization function.


How do I get the workers to import a module?

Execute an "eval" task using eachWorker. In Python, it would look like:

        >>> s.eachWorker("import time")
        

How do I access the arguments from an "eval" task?

Use the global variable "SleighArgs". For example:

        >>> s.eachElem("'argument: ' + str(SleighArgs[0])", range(4))
        

Does the Sleigh worker have an ID of some kind?

Yes. There is a global variable called SleighRank that is available to your task function.

        >>> s.eachWorker("'My rank is: %d' % SleighRank")
        
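To see what SleighArgs and SleighRank hold, the toy below evaluates an "eval" task string locally with both globals bound. It is a simplified model, not the Sleigh worker: the function name is hypothetical, each element is treated as a single argument, and the round-robin assignment of tasks to ranks is only for illustration.

```python
def run_eval_tasks(expr, elements, n_workers=2):
    """Toy model of eval tasks: evaluate expr once per element.

    SleighArgs holds the task's arguments (here, just the element) and
    SleighRank is the simulated worker's rank, assigned round-robin.
    """
    results = []
    for i, element in enumerate(elements):
        task_globals = {"SleighArgs": [element], "SleighRank": i % n_workers}
        results.append(eval(expr, task_globals))
    return results


print(run_eval_tasks("'argument: ' + str(SleighArgs[0])", range(4)))
# ['argument: 0', 'argument: 1', 'argument: 2', 'argument: 3']
print(run_eval_tasks("'rank %d got %s' % (SleighRank, SleighArgs[0])", "abc"))
# ['rank 0 got a', 'rank 1 got b', 'rank 0 got c']
```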

How does the Sleigh worker determine the total number of workers in the Sleigh?

Workers don't usually need to know this, but it can be important for MPI-style parallel programs. One method is to fetch the nodeList variable, split it on whitespace, and take the length of the resulting list. It is much easier, though, to pass the number of workers as one of the arguments to the worker function, since the master process knows (or can easily find out) how many workers are associated with a Sleigh.
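
Both approaches are sketched below. worker_count assumes the fetched nodeList value is a single whitespace-separated string, and my_share is a hypothetical task that receives the worker count as an ordinary argument, together with the worker's rank (for example SleighRank), and uses them to pick its own slice of the work.

```python
def worker_count(node_list):
    # Assumes nodeList is a whitespace-separated string of host names.
    return len(node_list.split())


def my_share(items, rank, n_workers):
    # Hypothetical MPI-style task: worker `rank` takes every n_workers-th item.
    return items[rank::n_workers]


print(worker_count("node1 node2\tnode3\n"))   # 3
print(my_share(list(range(10)), 1, 3))         # [1, 4, 7]
```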


Starting Workers

How do I use rsh to start Sleigh workers, rather than ssh?

Please refer to the "Getting Started" chapter of the user manual to set up an rsh server on Windows. Once that's done, specify an appropriate launch function using the "launch" argument to the Sleigh constructor.

In R:

        > s = sleigh(launch=rshcmd)
        

In Python:

        >>> from nws.sleigh import Sleigh, rshcmd
        >>> s = Sleigh(launch=rshcmd)
        

How do I start Sleigh workers that are outside my firewall?

Use the sshforwardcmd launch function.

(More details, please)


How do I run Sleigh programs using PBS (or LSF)?

Just submit your Sleigh program using qsub (or bsub), specifying the number of nodes to use. When the batch queueing system runs your script, it tells you what nodes it has allocated for you to run on using an environment variable (PBS_NODEFILE for PBS, LSB_HOSTS for LSF). Use that information to compute the appropriate nodeList when constructing your Sleigh. See the batchqueueing.py example program for an example of this technique.
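
A minimal sketch of that computation, not the batchqueueing.py example itself. It reads PBS_NODEFILE (a file listing one host per line, repeated once per allocated processor) or LSB_HOSTS (a space-separated host list, repeated once per slot). The function name is hypothetical, and passing the result as nodeList to the Python Sleigh constructor is an assumption based on the nodeList argument used elsewhere in this FAQ.

```python
import os


def batch_node_list(environ=None):
    """Return the hosts allocated by PBS or LSF as a list, or None if neither is set."""
    environ = os.environ if environ is None else environ
    nodefile = environ.get("PBS_NODEFILE")
    if nodefile:
        # PBS: the named file lists one host per line, once per allocated processor.
        with open(nodefile) as f:
            return [line.strip() for line in f if line.strip()]
    hosts = environ.get("LSB_HOSTS")
    if hosts:
        # LSF: a space-separated host list, with a host repeated once per slot.
        return hosts.split()
    return None


nodes = batch_node_list()
# s = Sleigh(nodeList=nodes)   # assumed usage: one worker per processor/slot
```

Keeping the duplicates starts one worker per allocated processor; use list(dict.fromkeys(nodes)) instead if you want one worker per host.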


How do I run Sleigh workers on Windows?

You can use rsh, ssh, or the "web launch" method to start Sleigh workers. To use rsh or ssh, you have to run an rsh or ssh server on each of the worker machines. However, we don't recommend using ssh to start Sleigh workers on Windows, because the quoting conventions of UNIX-like ssh servers such as Cygwin or copSSH differ from those of the Windows platform, which can cause conflicts that are hard to debug.


Can I run Sleigh master on a Windows desktop/laptop and Sleigh workers on a Linux cluster?

Yes, you can. You need an SSH client installed on the Windows machine, and password-less login set up to the Linux cluster. Instructions for setting up password-less login are available in the "Getting Started" chapter of the User Guide, or see "How do I stop ssh from asking me for my password?". Once these two steps are done, you can simply create a sleigh using the ssh launch method. For example, in R:

        > s = sleigh(launch=sshcmd, nodeList=c('linux1'), scriptName='RNWSSleighWorker.sh',
        +     scriptDir='/usr/local/lib64/R/library/nws/bin', scriptExec=envcmd, workingDir='/home/user', rprog='R')
        

How do I set the working directory that my Sleigh worker should use?

Use the workingDir argument to the Sleigh constructor:

        >>> s = Sleigh(workingDir='/tmp')
        

Currently, there isn't a simple method of setting the working directory differently on different workers.


How do I get the module that defines my execution function in my PYTHONPATH for all of my Sleigh workers?

Use the modulePath argument to the Sleigh constructor, or set PYTHONPATH in your shell startup script on the worker machines (which is pretty simple if you have a common, NFS-mounted home directory on each of the worker machines).
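
If you would rather handle it from your script, one hypothetical approach is a helper that each worker runs once, for example as an eachWorker initialization task: it puts the directory on sys.path and imports the module. This is a sketch, not how modulePath is implemented, and the directory must be visible from every worker machine (for example over NFS).

```python
import importlib
import sys


def import_from(module_dir, module_name):
    """Put module_dir on sys.path (if needed) and import module_name from it."""
    if module_dir not in sys.path:
        sys.path.insert(0, module_dir)
    importlib.invalidate_caches()   # pick up files created since the last import
    return importlib.import_module(module_name)


# mod = import_from("/shared/home/joe/lib", "mytasks")   # hypothetical path and module
```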


Sleigh Troubleshooting

Why is my Sleigh program hanging?


How do I stop ssh from asking me for my password?

Set up password-less ssh login.

The following shows one way to set up password-less ssh login.

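  1. ssh-keygen -t rsa (accept an empty passphrase, or ssh will prompt for the key's passphrase instead of your password)
  2. cd ~/.ssh (the .ssh directory is located in your HOME directory)
  3. cat id_rsa.pub >> authorized_keys, then chmod 600 authorized_keys. This step allows password-less login to the local machine; appending, rather than copying over an existing authorized_keys file, keeps any keys already listed there.
  4. For all remote machines that you want password-less login to, append the contents of id_rsa.pub to their authorized_keys file.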

To test the password-less login, type the following command:

        % ssh hostname date
        

If everything is set up correctly, you should not be asked for a password, and the current date on the remote machine will be printed.
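
The append-and-protect part of steps 3 and 4 can be sketched in Python as below, for a machine whose .ssh directory you can reach (for a remote machine you would first copy id_rsa.pub there). The function name is hypothetical; the key pair itself still comes from ssh-keygen.

```python
import os
from pathlib import Path


def authorize_key(pub_key_path, ssh_dir):
    """Append a public key to ssh_dir/authorized_keys unless it is already listed."""
    ssh_dir = Path(ssh_dir)
    key = Path(pub_key_path).read_text().strip()
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    auth = ssh_dir / "authorized_keys"
    text = auth.read_text() if auth.exists() else ""
    if key not in (line.strip() for line in text.splitlines()):
        with auth.open("a") as f:
            if text and not text.endswith("\n"):
                f.write("\n")          # don't glue the new key onto the last line
            f.write(key + "\n")
    # sshd usually refuses authorized_keys files that others can write to.
    os.chmod(auth, 0o600)
    return auth


# authorize_key(Path.home() / ".ssh" / "id_rsa.pub", Path.home() / ".ssh")
```

Running it twice is harmless: a key that is already listed is not appended again.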


Why do I get a SleighOccupiedException when executing eachElem (or eachWorker) when using non-blocking mode?

Currently, you cannot execute more than one eachElem or eachWorker job at the same time on the same Sleigh object, even using non-blocking mode. Non-blocking mode is only intended to allow the script to perform other operations while a long running job executes, not to allow multiple jobs. To execute multiple jobs concurrently, you must create multiple Sleigh objects.


How do I get debug messages for my Sleigh program?

Set the verbose argument to true when constructing your Sleigh. See "Where are the debug/log files created for the Sleigh workers?" for more information.

        >>> s = Sleigh(verbose=True)
        

Where are the debug/log files created for the Sleigh workers?

If the verbose argument is set to true in the Sleigh constructor, the workers create log files in the directory specified by the logDir argument. If logDir is not set, it defaults to a system-specific temporary directory. On POSIX systems this is /tmp; on Windows, the easiest way to find the log files is to look at the "worker info" variable in the sleigh workspace using the web interface, which shows, among other things, the full path of each worker's log file.

In addition, the log messages for each worker are put into the "logDebug" variable in the sleigh workspace, so you can view them directly from the web interface (even if the babelfish isn't running). Error messages are put in the "logError" variable, even if verbose is false.

        >>> s = Sleigh(verbose=True, logDir="/home/joe/tmp")