3.3.1 Checkpointing Mapped Segments

To find all active segments, the checkpoint library uses the ioctl() interface to the /proc file system, available on a number of UNIX variants. In the /proc directory, there is a file for each running process, named by process-ID, which gives access to the process's memory contents. There is an ioctl() call to find the number of mapped memory segments in use by a given process and another to find information about each segment (virtual start address, size of segment in bytes, protection and attribute flags). Note that this call returns all process segments, including stack, data, and text segments.
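The ioctl() requests themselves are specific to each UNIX variant, so the sketch below substitutes a different, portable-to-Linux technique: Linux exposes the same per-segment information (start address, size, protection flags) as text in /proc/self/maps rather than through ioctl(). The struct segment type and the function names are hypothetical and are not part of the Condor checkpoint library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record holding the fields named in the text. */
struct segment {
    unsigned long start;                 /* virtual start address */
    unsigned long size;                  /* size of segment in bytes */
    int readable, writable, executable;  /* protection flags */
};

/* Parse one line of Linux /proc/<pid>/maps, for example
 * "00400000-0040b000 r-xp 00000000 08:01 1234 /bin/cat".
 * Returns 0 on success, -1 if the line is not a mapping entry. */
int parse_maps_line(const char *line, struct segment *seg)
{
    unsigned long start, end;
    char perms[5];

    if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
        return -1;
    if (end < start || strlen(perms) < 3)
        return -1;
    seg->start = start;
    seg->size = end - start;
    seg->readable = (perms[0] == 'r');
    seg->writable = (perms[1] == 'w');
    seg->executable = (perms[2] == 'x');
    return 0;
}

/* Store up to max segments of the calling process in out.
 * Returns the number stored, or -1 if /proc/self/maps cannot be opened.
 * Like the call described above, this reports every segment,
 * including stack, data, and text. */
int list_segments(struct segment *out, int max)
{
    char line[4096 + 128];
    int n = 0;
    FILE *fp = fopen("/proc/self/maps", "r");

    if (fp == NULL)
        return -1;
    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        if (parse_maps_line(line, &out[n]) == 0)
            n++;
    }
    fclose(fp);
    return n;
}
```

A caller would pass a fixed-size array to list_segments() and then examine each entry in turn; on a system without /proc/self/maps the function simply returns -1.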

On systems where the /proc file system is not available or does not provide the needed interface, the library could instead record the needed information at the time of mmap() calls. The mmap() function is used by the dynamic linker to create new segments and map shared libraries into the segments. Our method for augmenting system calls to record needed information is described below in section 3.4.1. Dynamic library support is currently only provided on systems which have the needed /proc interface.

Once the segment information is obtained, Condor must identify each segment, because some of the segments must be treated specially. The text and data segments are identified by comparing the segment addresses to the address of a static function (in the Condor checkpoint library) and the address of a global variable, respectively. The stack segment(s) are identified by comparing the segment addresses to the stack pointer value and a system-defined constant for the stack ending address. All other segments are assumed to contain dynamic library text or data.
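The identification step can be sketched as a small classification function. The exact comparisons are not spelled out above, so this version makes two assumptions: the stack grows downward, and a segment counts as stack if it overlaps the range from the stack pointer up to (but not including) the stack ending address. The type and function names are hypothetical.

```c
#include <stdint.h>

enum segment_kind { SEG_TEXT, SEG_DATA, SEG_STACK, SEG_DYNAMIC };

/* Addresses known to lie in the static text, static data, and stack. */
struct probes {
    uintptr_t text_addr;  /* address of a static function */
    uintptr_t data_addr;  /* address of a global variable */
    uintptr_t stack_ptr;  /* current stack pointer value */
    uintptr_t stack_end;  /* system-defined stack ending address */
};

/* True if addr lies in [start, start + size). */
static int contains(uintptr_t start, uintptr_t size, uintptr_t addr)
{
    return addr >= start && addr - start < size;
}

/* Decide what a segment holds from its start address and size. */
enum segment_kind classify_segment(uintptr_t start, uintptr_t size,
                                   const struct probes *p)
{
    if (contains(start, size, p->text_addr))
        return SEG_TEXT;
    if (contains(start, size, p->data_addr))
        return SEG_DATA;
    /* Assumed test: segment overlaps [stack_ptr, stack_end). */
    if (size > 0 && start < p->stack_end &&
        start + size - 1 >= p->stack_ptr)
        return SEG_STACK;
    /* Everything else is taken to be dynamic library text or data. */
    return SEG_DYNAMIC;
}
```

In use, text_addr and data_addr would be obtained by casting the address of a function and of a global variable to uintptr_t; the stack values are system dependent and are left to the caller.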

Once all segments are identified, Condor saves every segment except the static text segment (because the text can be retrieved from the a.out file at restart). The write() system call is used to write bytes from addresses in memory to the checkpoint file. Note that Condor saves the dynamic library text in addition to the dynamic library data. There are two reasons for doing this. First, it is a simple way for Condor to ensure that the library text matches the library data on restart when migrating to a new machine, which may have different versions of the system libraries. If Condor did not ensure this, the library text could look for variables in incorrect locations. Second, Condor must ensure that the library text is mapped back into the same location on restart, so that dynamic links are still valid. This behavior negates one of the benefits of dynamic linking (smaller executables) by increasing the size of the checkpoint file. However, we felt that this cost was acceptable compared to the complexity of comparing library versions on the checkpoint and restart machines and only moving libraries when necessary. At worst, it should be no more expensive than checkpointing a statically linked executable.
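Writing a segment straight from memory with write() might look like the sketch below. The on-disk layout is not described above, so the struct seg_header record (start address plus size, which a restart would need in order to map the bytes back to the same location) is purely an assumption for illustration, as are the function names.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical per-segment header written before the segment bytes. */
struct seg_header {
    uintptr_t start;  /* virtual address the bytes were copied from */
    size_t    size;   /* number of bytes that follow */
};

/* Write exactly len bytes, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0 && errno == EINTR)
            continue;          /* interrupted before writing: retry */
        if (n <= 0)
            return -1;         /* real error, or no progress */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Save one segment: header first, then the raw memory contents. */
int save_segment(int fd, const void *start, size_t size)
{
    struct seg_header h = { (uintptr_t)start, size };

    if (write_all(fd, &h, sizeof h) != 0)
        return -1;
    return write_all(fd, start, size);
}
```

The caller would invoke save_segment() once for each segment other than the static text, and must skip or temporarily re-protect any segment that is not readable, since write() fails with EFAULT on inaccessible memory.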

condor-admin@cs.wisc.edu