PHOSPHORUS: Shared Variables on Top of PVM

a

a An Overview of Phosphorus

a Data Sharing Patterns a User Interface Pointers:
a Sources: if interested contact Isabelle Demeure (demeure@inf.enst.fr)
a Article written for Euromicro'95 (not available on the net).
a tech_report.ps: Technical report (Postcript file).
a Article written for Euromicro'95 (not available on the net).

a

OVERVIEW OF PHOSPHORUS


Distributed Shared Memory (DSM) systems provide a virtual address space shared among processes running on distributed systems (where processors do not share a common physical address space).

Phosphorus is a DSM system developed on top of the Parallel Virtual Machine system (PVM). PVM supports the message-passing paradigm. Phosphorus supports the shared data paradigm. As shown in the figure below, the two systems can be used together for the support of parallel, distributed computations involving both paradigms.

     a

a

Goals of the Phosphorus Project

a The Phosphorus project was started with three goals in mind:
a We made the following choices:

Phosphorus Characteristics

a Like PVM, Phosphorus is comprised of a daemon (called phosd) and a library of functions. A daemon resides on each machine making use of shared data. This daemon corresponds to the shared memory manager who is in charge of keeping the shared data coherent.

a The unit of sharing is the variable. We support the types supported by PVM through the packing/unpacking functions. Shared arrays of these types may also be declared.
a The management of the shared variables is distributed among a collection of servers running on the various hosts of the supporting network; each shared variable is ``owned'' by a server and the ownership changes dynamically (which corresponds to the dynamic distributed scheme described by Li and Hudak).
a The challenge when building a DSM system is to provide the programmer with a shared virtual address space without impacting too much the performance of the system. Phosphorus was carefully designed to reduce the network traffic necessary to maintain data coherent. Four data sharing patterns (or protocols) are adapted to suit four different variable access behaviors, one of them corresponding to a relaxed memory consistency protocol (following what was done in the Munin system).
a The interface consists in a simple set of primitives for declaring, reading, writing and synchronizing accesses to shared variables.

a

DATA SHARING PATTERNS

The design of our system was inspired by the Munin System. In particular we chose to implement similar data sharing protocols as in Munin, namely:

a Read Only (Multiple Readers/Write once),
a Migratory (Single Reader/Single Writer),
a Conventional (Multiple Readers/Single Write) and
a Write Shared (Multiple Readers/Multiple Writers).
The way a shared variable is accessed determines the protocol that must be associated to it when declared. The protocol determines how the variable is replicated and/or migrated.

The following table shows four combinations and the corresponding Phosphorus protocol. Two of the algorithms migrate data to the requeting host, and the two others replicate data so that multiple readers can access data locally (note that the "central" protocol is not implemented in Phosphorus).

	a
a

READ_ONLY

Once a read-only variable has been initialized, no further updates occur.

Thus, the protocol simply consists of replication on demand with no write access rights. READ_ONLY protocol may also be classified as a Non-Migrating and Replicated algorithm.

A runtime error is generated when a task attempts to write to a READ_ONLY variable, and the variable has already been initialized.

MIGRATORY

A migratory variable is always migrated to the host where it is accessed. There is a unique owner of the variable at a time, and the owner is the only one who may read from or write to it. This protocol, also called ``single reader/single writer'' (SRSW), is the access pattern typical of shared data that is accessed only inside a critical section.

The consistency protocol for migratory data, forwards the data to the next task that requested access to it, provides this task with read and write access, i.e. ownership, and invalidates the original copy.

CONVENTIONAL

At any time there can be several read-only copies of a conventional variable, but only one read-write copy of it. This protocol is also called ``multiple readers/single writer'' (MRSW).

A read operation on a host with no read access rights, causes a read fault. In this case, the faulting host has to communicate with the owner host to acquire a read-only copy. Following, the owner changes the access rights to read-only, if necessary, and sends a read-only copy to the reader hosts.

A write operation on a host with no write access rights, causes a write fault. In this case the ownership is transferred to the host where the write fault occurred. At this time, an invalidate message is sent to every host keeping a read-only copy before the write operation can complete.

Note that this protocol enforces sequential consistency.

Full-Replication algorithm: WRITE_SHARED

Multiple copies with read-write access rights may reside on different hosts. This protocol, also called ``multiple readers/multiple writers'' (MRMW), is implemented with a Release Consistency Protocol: Eager Update Release Consistency Protocol (see below).

Each task may modify locally a portion of the variable. It is important for the programmer to be aware that each task must write to independent portions of the variable. Updates occur whenever a task acquires a lock or arrives at a barrier.

With the WRITE_SHARED protocol, the programmer must be aware that he or she is in charge of triggering the updates of the shared variables by calling phos_lock on a synchronisation variable or by forcing a rendez-vous at a barrier. Also note that locks should not be performed on WRITE-SHARED variables.

The Release Consistency Model

One of the problems raised by DSM systems is that of maintaining consistency between the various copies of a variable shared by several processes


The first (rather intuitive) way to maintain consistency is to ensure that every read operation will return the last written value. The problem with this approach is that it is very costly because it requires many messages to maintain consistency.

To address this problem, new approaches have been proposed. Release Consistency is one of them. The principle behind Release Consistency is that the modifications made to a variable by a process are propagated to the other processes at given synchronisation points, usually termed acquire and release operations. The synchronisation points can be implemented as lock-unlock pairs, or as barrier-arrival and barrier departure pairs.

Note that it is the programmer's job to place the synchronisation points in his application by invoking the acquire and release operations. However, it has been shown that most of the parallel programs naturally contain enough synchronization (lock-unlock and barriers); it should therefore not be necessary to modify them when release consistency is used.

The great strength of Release Consistency is that it allows multiple processes to write simultaneously to distinct parts of their local copy of a shared variable. The propagation of the modifications is postponed until the next synchronisation operation.

This protocol has been implemented in various systems such as TreadMarks; the resulting performance is much better than for conventional models of consistency because it involves less messages among the processes sharing a variable.

a

USER INTERFACE

This interface was inspired by the Munin system interface, which we found simple and complete.
Every function returns 0 if successful.
Upon failure, a value different from 0 is returned with an error message.

Process Control

a int phos_init(void)

This routine allows processes to enter the sharing service. It must be called after enrolling in PVM. The proper way to start a Phosphorus application is the following:

    mytid = pvm_mytid();   /* enroll in pvm */
    info = phos_init();    /* enroll in Phosphorus */

This call creates a new server phosd on the local host, if it was not already created by another task.


a int phos_end(void)

This routine informs the local phosd that the calling task no longer needs the sharing service. The phosd is aware of the number of tasks making use of the sharing service on the local host and will die as soon as no more tasks are enrolled into Phosphorus. It is important to call phos_end() at the end of the application, otherwise the local phosd would believe there are still tasks running. Even if the user has forgotten to free its shared variables (phos_free() see below), phos_end() cleans up all the variables belonging to this task. The proper way to end up a Phosphorus application is the following:

    phos_end();    /* exit DSM service */  
    pvm_exit();    /* exit PVM before stopping */ 
    exit(0);

Variable Declaration

a int phos_declare(int desc, int type, int count, int protocol)

This routine allows the declaration of a shared variable. The argument desc is a unique identifier given by the programmer as a descriptor for the shared variable; it must be common to every task sharing it. type corresponds to the PVM types, which are redefined in Phosphorus. Possible values are enumerated below.

BYTE=0 - CPLX=1 - DCPLX=2 - DOUBLE=3 - FLOAT=4 - INT=5

LONG=6 - SHORT=7 - UINT=8 - ULONG=9 - USHORT=10

count refers to the number of items of type type. And the argument protocol determines the protocol associated with this variable (see the values below). A variable may be shared (READ_ONLY, CONVENTIONAL, MIGRATORY or WRITE_SHARED) or a synchronization variable (SYNCH).

READ_ONLY=1 - MIGRATORY=2 - CONVENTIONAL=3 - WRITE-SHARED=4

SYNCH=5


a int phos_free(int desc)

This routine frees the shared variable identified by the descriptor desc. phos_free() is to be called when a variable in a task is no longer needed. Further accesses to it will result in a runtime error.

Access to Variables

a int phos_read(int desc, void *buffer)

The routine phos_read() is called to obtain a coherent value of the variable associated to the desc descriptor. It is the user's responsibility to ensure that enough storage in available in buffer. For WRITE_SHARED variables use the phos_ws_read call.

a int phos_write(int desc, void *buffer)

This routine is called to write a shared variable identified by the desc descriptor. It is the user's responsibility to ensure that enough storage in available in buffer. For WRITE_SHARED variables use the phos_ws_write call.

a int phos_ws_read(int desc, int offset, int length, void *buffer)

where offset indicates the position of the first byte to be read, length indicates the number of bytes to be read. The routine phos_ws_read() is called to obtain a coherent value of the WRITE_SHARED variable associated to the desc descriptor. It is the user's responsibility to ensure that enough storage in available in buffer. For READ_ONLY, MIGRATORY and CONVENTIONAL variables use the phos_read call.

a int phos_ws_write(int desc, int offset, int length, void *buffer)

where offset indicates the position of the first byte to be written, length indicates the number of bytes to be written. The routine phos_ws_write() is called to write part of the WRITE_SHARED variable associated to the desc descriptor. For READ_ONLY, MIGRATORY and CONVENTIONAL variables use the phos_write call.

Synchronization

a int phos_lock(int desc)

This routine may be called either with the descriptor of a synchronization variable (SYNCH), or with a shared variable descriptor. READ_ONLY and WRITE-SHARED variables may not be locked.

A phos_lock() performed on a synchronization variable is used for inter-task synchronization. In addition, if the user needs to access to a set of shared variables in mutual exclusion, a sequence of phos_lock()-phos_unlock() can be performed every time an access is to be done, to protect them. An example of a critical section is illustrated as follows:

    phos_lock(SYNCH_VAR);
    phos_read(VAR_1, &buffer_1);
    phos_read(VAR_2, &buffer_2);
    compute(buffer_1, buffer_2);
    phos_write(VAR_1, &buffer_1);
    phos_write(VAR_2, &buffer_2);
    phos_unlock(SYNCH_VAR);

If the user needs only to protect one variable at a time, it is possible to lock a single shared variable. This guarantees that read and write operations are performed in sequence. For example:

    phos_lock(SHARED_VAR);
    phos_read(SHARED_VAR, &buffer);
    compute(buffer);
    phos_write(SHARED_VAR, &buffer);
    phos_unlock(SHARED_VAR);

The user must make sure that a lock acquired by calling (phos_lock()) is always released by calling (phos_unlock()) in the same task that acquired it.

With the WRITE_SHARED protocol, the programmer must be aware that he or she is in charge of triggering the updates of the shared variables by calling phos_lock on a synchronisation variable or by forcing a rendez-vous at a barrier. Also note that locks should not be performed on WRITE-SHARED variables.

a int phos_unlock(int desc)

Liberates the previously acquired lock. A runtime error will be raised if the variable with the descriptor desc has not been previously acquired in the same task.

a

Phosphorus project - Page maintained by Isabelle Demeure (demeure@inf.enst.fr)
Students: Philippe Meunier (94-95), Rocío Cabrera (94-95), Vincent Bartro (95), Jean Sini (96).
Page last updated January 13, 1997.