Getting Started
This chapter provides step-by-step procedures for setting up NetWorkSpaces and Sleigh for R.
2.1 Prerequisites
NetWorkSpaces and Sleigh run on all Linux and Windows platforms.
To use them, you must install the following software:
- R 2.1.1 or above
- NetWorkSpaces server http://nws-r.sourceforge.net
The NetWorkSpaces server also requires the following:
- Python 2.4 http://www.python.org
  (We recommend ActiveState Python http://www.activestate.com on Windows.)
- Twisted Framework http://twistedmatrix.com
2.2 NetWorkSpaces Server
2.2.1 Starting the Server
There are three ways to start a NetWorkSpaces server:
- Use the twistd command:
  Open a shell on UNIX, or a Twisted command prompt on Windows, and type the following:
  % twistd -y nws.tac
  Note: nws.tac can reside in different directories, depending on
  the platform and the type of installation (root versus non-root).
  For a root installation, nws.tac is located in /etc on UNIX,
  and in the PYTHON24 directory on Windows.
- Execute the nws script (UNIX only):
  % nws start
- Start a Windows service (Windows only):
  - Open a command prompt.
  - Install NwsService by executing NwsService.py, which is located in Python's scripts directory:
    % python NwsService.py install
  - Start NwsService:
    % python NwsService.py start
This starts a NetWorkSpaces server on localhost at port 8765.
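Because the location of nws.tac varies with the platform and the type of installation, it can be convenient to search a few candidate directories before invoking twistd. The following is a minimal sketch, not part of the NetWorkSpaces distribution: the function name find_nws_tac is hypothetical, and apart from /etc (the root-installation location on UNIX noted above) the directories to search are whatever you pass in.

```shell
#!/bin/sh
# find_nws_tac DIR...
# Hypothetical helper: print the path of the first nws.tac found in the
# given directories, or return non-zero if none of them contains one.
find_nws_tac() {
    for dir in "$@"; do
        if [ -f "$dir/nws.tac" ]; then
            printf '%s\n' "$dir/nws.tac"
            return 0
        fi
    done
    return 1
}

# Example use (not executed here). Searching $HOME for a non-root
# installation is an assumption; adjust the list to your setup:
#   tac=$(find_nws_tac /etc "$HOME") && twistd -y "$tac"
```

The function only locates the file; starting the server is still done with the twistd command shown above.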
2.2.2 Stopping the Server
There are also three ways to stop a NetWorkSpaces server:
2.3 NetWorkSpaces Client
2.3.1 Installing the Client
To install the NetWorkSpaces source distribution on UNIX:
- R CMD INSTALL nws_version.tar.gz
To install NetWorkSpaces on Windows XP:
- For the binary distribution (in zip file format):
  - Start RGui.
  - Select the Packages menu.
  - Choose to install package(s) from a local zip file.
  - Select the nws_version.zip file that you obtained.
- For the source distribution:
  - Follow the instructions at
    http://www.murdoch-sutherland.com/Rtools/
  - R CMD INSTALL nws
2.3.2 Start NetWorkSpaces Client
Once you have a NetWorkSpaces server up and running, you're ready to use NetWorkSpaces.
- Start an R session.
- Type the following:
  > library(nws)
  > ws = netWorkSpace('R space')
  > nwsStore(ws, 'x', 1)
  This step creates a workspace named 'R space' and stores a variable x with value 1 in the workspace.
You can also view what's in the workspace using a web interface.
To do this, point your browser to http://server_host_name:8766, where
server_host_name is the machine on which the NetWorkSpaces server runs.
To examine values that you've created in a workspace using the server's web
interface, you also need a babelfish. The babelfish translates values into a
human-readable format so they can be displayed in a web browser. If a value is a string,
the web interface simply displays the contents of the string, without any help from the
babelfish. But if the value is any other type of R object, the web interface needs help from the
R babelfish. To start the babelfish, execute the following command in another
terminal:
% R CMD BATCH babelfish.R
Note: this command will not return until you exit R.
On Windows, you can also start the babelfish as a Windows service by following these steps:
- Open a command prompt.
- cd to the directory where the NWS client package is installed.
- cd to the bin directory.
- Install the R Babelfish Service:
  python RBabelfishService.py install
- Start the R Babelfish Service:
  python RBabelfishService.py start
For more examples on using NetWorkSpaces, see the Tutorials chapter.
2.4 Starting Sleigh
Sleigh is an R class, built on top of NetWorkSpaces, that
makes it very easy to write simple parallel programs. Sleigh has the concept
of one master and multiple workers. The master sends jobs to workers,
which may or may not be on the same machine as the master. To enable the
master to communicate with workers, Sleigh supports several
mechanisms for launching workers to run jobs. For the remote launch mechanisms,
make sure R is in the PATH on the worker machines.
2.4.1 Local Launch Mechanism
The local launch mechanism is the default option for starting workers. It starts
three workers on the local machine by default. You can change the number of
workers by setting the workerCount argument in the sleigh constructor.
Local launch is useful on SMP machines, where multiple cores or processors are
available. It is also useful for debugging parallel programs locally
before running them on a large cluster.
To create a sleigh object, simply load the nws package and type the following:
> s = sleigh()
This is equivalent to:
> s = sleigh(launch='local')
2.4.2 SSH Launch Mechanism
To start workers on different machines, a remote login mechanism, such as
an SSH client, is needed. Remote worker machines also need to run an SSH server in order to
accept requests.
For Windows users, we recommend Cygwin's ssh server or copSSH from ITef!x, http://www.itefix.no/phpws/.
For setting up the Cygwin ssh server, http://ncyoung.com/entry/389 provides a nice tutorial.
Installing copSSH is straightforward. After installation, remember to activate users.
Next, set up password-less ssh login.
Setting Up a Password-less SSH Login
- To generate public and private keys, follow the steps below.
  - Open a terminal (a shell on UNIX or a DOS prompt on Windows).
  - ssh-keygen -t rsa (this assumes ssh-keygen is in your PATH)
  - cd .ssh (the .ssh directory is located in your HOME directory:
    /home/user on UNIX or Cygwin, and C:/Program Files/copssh/home/user on Windows)
  - cp id_rsa.pub authorized_keys
    This step allows password-less login to the local machine.
- For each remote machine on which you want password-less login, append the contents of
  id_rsa.pub to that machine's authorized_keys file.
- To test the password-less login, type the following command:
  % ssh hostname date
  If everything is set up correctly, you should not be asked for a password, and the current date on
  the remote machine will be returned.
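The key setup and the login test can be scripted. The sketch below is illustrative and rests on a few assumptions: the function names append_pubkey and check_nodes and the SSH_BIN override are hypothetical, and the BatchMode and ConnectTimeout options assume an OpenSSH client. Unlike the cp command above, which replaces any existing authorized_keys file, append_pubkey adds the key only if it is not already present.

```shell
#!/bin/sh
# append_pubkey PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Hypothetical helper: add a public key to an authorized_keys file
# without overwriting entries that are already there.
append_pubkey() {
    pub=$1
    auth=$2
    [ -r "$pub" ] || { echo "cannot read $pub" >&2; return 1; }
    mkdir -p "$(dirname "$auth")"
    touch "$auth"
    # -x: whole-line match, -F: fixed strings, -f: take patterns from the key file
    if ! grep -qxFf "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    # Restrict the file to its owner; ssh servers are often configured
    # to ignore key files with looser permissions.
    chmod 600 "$auth"
}

# check_nodes HOST...
# Hypothetical helper: run "date" on each host without allowing a password
# prompt, and report which hosts accept key-based login.
# SSH_BIN is an assumed override for the ssh command (defaults to ssh).
check_nodes() {
    failed=0
    for node in "$@"; do
        if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$node" date >/dev/null 2>&1; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Example use (not executed here):
#   append_pubkey "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
#   check_nodes node1 node2
```

With BatchMode enabled, ssh fails instead of asking for a password, so a host that still needs one is reported as FAILED rather than blocking the script.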
To start up sleigh workers using SSH, simply load the nws package and type the following:
> s = sleigh(launch=sshcmd)
This creates three workers on the local machine.
To start sleigh workers on multiple machines, type the following:
> s = sleigh(nodeList=c('node1', 'node2'), launch=sshcmd)
2.4.3 RSH Launch Mechanism on Windows
On Windows, where SSH is not available, users can use RSH instead. Windows 2000/XP comes
with an RSH client, but in order to communicate with another Windows machine, an RSH server
must be running on that machine. To set one up, you must first download and
install a copy of Windows Services for UNIX (SFU), which is available for free from the Microsoft website.
SFU allows users to run the RSH server as a Windows service or as a UNIX daemon.
This section focuses on how to run the RSH server as a Windows service.
- Install SFU on the local machine.
- Log in to the machine as an Administrator.
- Open a command prompt.
- Go to SFU's common directory.
- Install the Windows Remote Shell Service:
  rshsvc.exe -install
- Start the Windows Remote Shell Service:
  rshsvc.exe -start
  Users can also enable automatic or manual start of the Windows Remote Shell Service
  through the Services console in Administrative Tools.
- Add a .rhosts file.
  The .rhosts file resides in the location specified by the registry entry:
  HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\RshSvc\RhostsPath
  This value is usually C:\Windows\System32\Drivers\etc
  The format of the .rhosts file is:
  machine1 user1         # user1 can log in from machine1
  +        user2         # user2 can log in from any machine
  machine2 user1 user2   # user1 and user2 can log in from machine2
  machine3 +             # all users on machine3 can log in
  +        +             # any user from any machine can log in
- Test RSH from a command prompt:
  rsh localhost set
  If everything is set up correctly, this command should return a list of environment variables set locally.
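To sanity-check a .rhosts file before relying on it, the format described above can be modeled in a few lines of shell. This is only a sketch of the rules as stated in this section (first field is a machine or +, remaining fields are users or +, and # starts a comment). It is not how the Remote Shell Service itself parses the file, and the function name rhosts_allows is hypothetical.

```shell
#!/bin/sh
# rhosts_allows FILE HOST USER
# Hypothetical helper: succeed if, under the format described above,
# some line in FILE lets USER log in from HOST.
rhosts_allows() {
    awk -v host="$2" -v user="$3" '
        { sub(/#.*/, "") }              # drop trailing comments
        NF < 2 { next }                 # skip blank or incomplete lines
        $1 == host || $1 == "+" {       # machine matches, or any machine
            for (i = 2; i <= NF; i++)
                if ($i == user || $i == "+") { found = 1; exit }
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example use (not executed here); the path is a placeholder:
#   rhosts_allows /path/to/.rhosts machine1 user1 && echo allowed
```

A check like this is handy for spotting an overly permissive entry such as "+ +" before the service is started.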
To start up sleigh workers using RSH, simply load the nws package and type the following:
> s = sleigh(launch=rshcmd)
2.4.4 Web Launch Mechanism
The web launch mechanism lets users start workers on different machines
in an ad hoc way, without setting up a remote login mechanism such as SSH or RSH.
To use the web launch option, follow the steps below.
1. Create an instance of Sleigh:
   > s = sleigh(launch='web')
   The Sleigh constructor does not return until it gets a signal that all workers have started and are ready to accept jobs.
2. Log in to a remote machine.
3. Start an R session.
4. Open a web browser and point it to http://server_host_name:8766
5. Click on the newly created Sleigh workspace, and read the value of the variable 'runMe'. It usually has a value similar to:
   webLaunch('sleigh_ride_0000000004_tmp1a6c0h', 'mercury', 8765);
6. Copy the 'runMe' value into the R session.
7. Repeat steps 2-6 for each worker that needs to be started.
8. Once all workers have started, delete the 'DeleteMeWhenAllWorkersStarted' variable from the Sleigh workspace. This signals
   to the Sleigh master that the workers have started and are ready to accept work.
Now you're ready to send jobs to remote workers. See the Sleigh for R Tutorial section in the Tutorials chapter for more information.