nordugrid-arc-nox  1.1.0~rc6
Arc::FileCache Class Reference

FileCache provides an interface to all cache operations to be used by external classes.

#include <FileCache.h>

Public Member Functions

 FileCache (std::string cache_path, std::string id, uid_t job_uid, gid_t job_gid)
 Create a new FileCache instance.
 FileCache (std::vector< std::string > caches, std::string id, uid_t job_uid, gid_t job_gid)
 Create a new FileCache instance with multiple cache dirs.
 FileCache (std::vector< std::string > caches, std::vector< std::string > remote_caches, std::vector< std::string > draining_caches, std::string id, uid_t job_uid, gid_t job_gid, int cache_max=100, int cache_min=100)
 Create a new FileCache instance with multiple cache dirs, remote caches and draining cache directories.
 FileCache ()
 Default constructor.
bool Start (std::string url, bool &available, bool &is_locked, bool use_remote=true)
 Prepare cache for downloading file, and lock the cached file.
bool Stop (std::string url)
 This method (or StopAndDelete()) must be called after the file has been downloaded or the download has failed, to release the lock on the cache file.
bool StopAndDelete (std::string url)
 Release the cache file and delete it, because for example a failed download left an incomplete copy, or it has expired.
std::string File (std::string url)
 Returns the full pathname of the file in the cache which corresponds to the given url.
bool Link (std::string link_path, std::string url)
 Create a hard-link to the per-job dir from the cache dir, and then a soft-link from here to the session directory.
bool Copy (std::string dest_path, std::string url, bool executable=false)
 Copy the cache file corresponding to url to the dest_path.
bool Release ()
 Release claims on input files for the job specified by id.
bool AddDN (std::string url, std::string DN, Time expiry_time)
 Add the given DN to the list of cached DNs with the given expiry time.
bool CheckDN (std::string url, std::string DN)
 Check if the given DN is cached for authorisation.
bool CheckCreated (std::string url)
 Check if there is information about the creation time.
Time GetCreated (std::string url)
 Get the creation time of a cached file.
bool CheckValid (std::string url)
 Check if there is information about the expiry time.
Time GetValid (std::string url)
 Get expiry time of a cached file.
bool SetValid (std::string url, Time val)
 Set expiry time.
 operator bool ()
 Returns true if the object is usable.
bool operator== (const FileCache &a)
 Return true if all attributes are equal.

Private Member Functions

bool _init (std::vector< std::string > caches, std::vector< std::string > remote_caches, std::vector< std::string > draining_caches, std::string id, uid_t job_uid, gid_t job_gid, int cache_max=100, int cache_min=100)
 Common code for constructors.
std::string _getLockFileName (std::string url)
 Return the filename of the lock file associated to the given url.
std::string _getMetaFileName (std::string url)
 Return the filename of the meta file associated to the given url.
bool _checkLock (std::string url)
 Return true if the lock on the cache file corresponding to this url exists and is owned by this process.
bool _cacheMkDir (std::string dir, bool all_read)
 Generic method to make directories.
int _chooseCache (std::string url)
 Choose a cache directory to use for this url, based on the free space in the cache directories and the cache_size limit set in arc.conf. Returns the index of the chosen cache in the list of caches.
std::pair< unsigned long long, unsigned long long > _getCacheInfo (std::string path)
 Return the cache info <total space in KB, free space in KB>.

Private Attributes

std::map< std::string, int > _cache_map
 Map of the cache files and the cache directories.
std::vector< struct CacheParameters > _caches
 Vector of caches.
std::vector< struct CacheParameters > _remote_caches
 Vector of remote caches.
std::vector< struct CacheParameters > _draining_caches
 Vector of caches to be drained.
std::string _id
 Identifier used to claim files, i.e. the job id.
uid_t _uid
 owner:group corresponding to the user running the job.
gid_t _gid
std::string _hostname
 Our hostname (same as given by uname -n)
std::string _pid
 Our pid.
int _max_used
 The max and min used space for caches, as a percentage of the file system.
int _min_used

Static Private Attributes

static const std::string CACHE_DATA_DIR = "data"
 The sub-dir of the cache for data.
static const std::string CACHE_JOB_DIR = "joblinks"
 The sub-dir of the cache for per-job links.
static const int CACHE_DIR_LENGTH = 2
 The length of each cache subdirectory.
static const int CACHE_DIR_LEVELS = 1
 The number of levels of cache subdirectories.
static const std::string CACHE_LOCK_SUFFIX = ".lock"
 The suffix to use for lock files.
static const std::string CACHE_META_SUFFIX = ".meta"
 The suffix to use for meta files.
static const int CACHE_DEFAULT_AUTH_VALIDITY = 86400
 Default validity time of cached DNs.
static Logger logger
 Logger for messages.

Detailed Description

FileCache provides an interface to all cache operations to be used by external classes.

An instance should be created per job, and all files within the job are managed by that instance. When it is decided that a file should be downloaded to the cache, Start() should be called, so that the cache file can be prepared and locked. When a transfer has finished successfully, Link() or Copy() should be called so that the file can be accessed from the user's job: Link() creates a hard link in a per-job directory in the cache and then a soft link to it from the session directory, while Copy() copies the file directly to the session directory. Stop() must then be called to release any locks on the cache file.

The cache directory(ies) and the optional directory to link to when the soft-links are made are set in the global configuration file. The names of cache files are formed from a hash of the URL specified as input to the job. To ease the load on the file system, the cache files are split into subdirectories based on the first two characters in the hash. For example the file with hash 76f11edda169848038efbd9fa3df5693 is stored in 76/f11edda169848038efbd9fa3df5693. A cache filename can be found by passing the URL to File(). For more information on the structure of the cache, see the Grid Manager Administration Guide.
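
As a rough illustration of the naming scheme described above, the following sketch splits a hash string into its subdirectory and file name parts. The helper name hashToRelativePath and its default arguments are assumptions made for this example (the defaults mirror CACHE_DIR_LENGTH = 2 and CACHE_DIR_LEVELS = 1); it is not part of the FileCache interface, and the hash is taken as given rather than computed.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Arc::FileCache): insert a '/' after each
// group of dir_length characters, dir_levels times, in the same way as the
// loop at the start of FileCache::File().
std::string hashToRelativePath(std::string hash,
                               int dir_length = 2,
                               int dir_levels = 1) {
  std::string::size_type index = 0;
  for (int level = 0; level < dir_levels; ++level) {
    // stop splitting if the hash is too short for another level
    if (index + dir_length > hash.size())
      break;
    hash.insert(index + dir_length, "/");
    // move past the characters just used and the slash just inserted
    index += dir_length + 1;
  }
  return hash;
}
```

With the default arguments, the hash 76f11edda169848038efbd9fa3df5693 maps to 76/f11edda169848038efbd9fa3df5693, which matches the example in the text.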

A metadata file with the '.meta' suffix is stored next to each cache file. This contains the URL corresponding to the cache file and the expiry time, if it is available. For example lfc://lfc1.ndgf.org//grid/atlas/test/test1 20081007151045Z
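
The first line of a meta file can be read back with a few string operations. The sketch below is a minimal example under the assumption that the expiry time, when present, is the last space-separated token on the line and that a line without any space holds only the URL. The MetaEntry struct and parseMetaLine function are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical container for the first line of a .meta file.
struct MetaEntry {
  std::string url;
  std::string expiry;  // empty if no expiry time was recorded
};

// Split "<url>[ <expiry>]" at the last space. A URL that itself contains a
// space and has no expiry time would be misparsed; this is a known limit of
// the sketch.
MetaEntry parseMetaLine(std::string line) {
  // drop the trailing newline, if any
  std::string::size_type nl = line.find('\n');
  if (nl != std::string::npos)
    line.resize(nl);

  MetaEntry entry;
  std::string::size_type space_pos = line.rfind(' ');
  if (space_pos == std::string::npos) {
    entry.url = line;
  } else {
    entry.url = line.substr(0, space_pos);
    entry.expiry = line.substr(space_pos + 1);
  }
  return entry;
}
```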

While cache files are downloaded, they are locked by creating a lock file with the '.lock' suffix next to the cache file. Calling Start() creates this lock and Stop() releases it. All processes calling Start() must wait until they successfully obtain the lock before downloading can begin.

Definition at line 65 of file FileCache.h.


Constructor & Destructor Documentation

Arc::FileCache::FileCache ( std::string  cache_path,
std::string  id,
uid_t  job_uid,
gid_t  job_gid 
)

Create a new FileCache instance.

Parameters:
cache_path  The format is "cache_dir[ link_path]". cache_dir is the path to the cache directory, and the optional link_path is used to create a link in case the cache directory is visible under a different name during actual usage. When linking from the session dir, this path is used instead of cache_dir.
id  The job id. This is used to create the per-job dir from which the job's cache files will be hard linked.
job_uid  Owner of the job. The per-job dir will only be readable by this user.
job_gid  Owner group of the job.
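
The "cache_dir[ link_path]" format can be illustrated with a small stand-alone parser. This is a sketch that loosely follows the string handling in _init(); the CacheSpec struct and the parseCacheSpec and stripTrailingSlash functions are hypothetical names for this example and do not exist in the library.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for one cache description string.
struct CacheSpec {
  std::string cache_path;
  std::string link_path;  // empty if no link path was given
};

// Remove a single trailing slash, if present.
static std::string stripTrailingSlash(std::string p) {
  if (!p.empty() && p[p.size() - 1] == '/')
    p.erase(p.size() - 1);
  return p;
}

// Parse "cache_dir[ link_path]". Returns false if no cache directory is given.
bool parseCacheSpec(const std::string& spec, CacheSpec& out) {
  std::string::size_type first_space = spec.find(' ');
  std::string raw_path = spec.substr(0, first_space);
  if (raw_path.empty())
    return false;
  out.cache_path = stripTrailingSlash(raw_path);
  out.link_path.clear();
  if (first_space != std::string::npos)
    // the link path is whatever follows the last space
    out.link_path = stripTrailingSlash(spec.substr(spec.rfind(' ') + 1));
  return true;
}
```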

Definition at line 39 of file FileCache.cpp.

                                      {

    // make a vector of one item and call _init
    std::vector<std::string> caches;
    std::vector<std::string> remote_caches;
    std::vector<std::string> draining_caches;
    if (!cache_path.empty()) 
      caches.push_back(cache_path);

    // if problem in init, clear _caches so object is invalid
    if (!_init(caches, remote_caches, draining_caches, id, job_uid, job_gid))
      _caches.clear();
  }

Arc::FileCache::FileCache ( std::vector< std::string >  caches,
std::string  id,
uid_t  job_uid,
gid_t  job_gid 
)

Create a new FileCache instance with multiple cache dirs.

Parameters:
caches  A vector of strings describing caches. The format of each string is "cache_dir[ link_path]".
id  The job id. This is used to create the per-job dir from which the job's cache files will be hard linked.
job_uid  Owner of the job. The per-job dir will only be readable by this user.
job_gid  Owner group of the job.

Definition at line 56 of file FileCache.cpp.

                                      {

    std::vector<std::string> remote_caches;
    std::vector<std::string> draining_caches;

    // if problem in init, clear _caches so object is invalid
    if (!_init(caches, remote_caches, draining_caches, id, job_uid, job_gid))
      _caches.clear();
  }

Arc::FileCache::FileCache ( std::vector< std::string >  caches,
std::vector< std::string >  remote_caches,
std::vector< std::string >  draining_caches,
std::string  id,
uid_t  job_uid,
gid_t  job_gid,
int  cache_max = 100,
int  cache_min = 100 
)

Create a new FileCache instance with multiple cache dirs, remote caches and draining cache directories.

Parameters:
caches  A vector of strings describing caches. The format of each string is "cache_dir[ link_path]".
remote_caches  Same format as caches. These are the paths to caches which are under the control of other Grid Managers and are read-only for this process.
draining_caches  Same format as caches. These are the paths to caches which are to be drained.
id  The job id. This is used to create the per-job dir from which the job's cache files will be hard linked.
job_uid  Owner of the job. The per-job dir will only be readable by this user.
job_gid  Owner group of the job.
cache_max  Maximum space used by the cache, as a percentage of the file system.
cache_min  Minimum space used by the cache, as a percentage of the file system.

Definition at line 69 of file FileCache.cpp.

                                      {
  
    // if problem in init, clear _caches so object is invalid
    if (! _init(caches, remote_caches, draining_caches, id, job_uid, job_gid, cache_max, cache_min))
      _caches.clear();
  }

Arc::FileCache::FileCache ( )

Default constructor.

Invalid cache.

Definition at line 247 of file FileCache.h.

                {
      _caches.clear();
    }

Member Function Documentation

bool Arc::FileCache::_cacheMkDir ( std::string  dir,
bool  all_read 
) [private]

Generic method to make directories.

Parameters:
dir  Directory to create.
all_read  If true, make the directory readable by all users; if false, it is readable only by the user who created it.

Definition at line 1270 of file FileCache.cpp.

                                                          {

    struct stat fileStat;
    int err = stat(dir.c_str(), &fileStat);
    if (0 != err) {
      logger.msg(VERBOSE, "Creating directory %s", dir);
      std::string::size_type slashpos = 0;

      // set perms based on all_read
      mode_t perm = S_IRWXU;
      if (all_read)
        perm |= S_IRGRP | S_IROTH | S_IXGRP | S_IXOTH;

      do {
        slashpos = dir.find("/", slashpos + 1);
        std::string dirname = dir.substr(0, slashpos);
        // list dir to see if it exists (we can't tell the difference between
        // dir already exists and permission denied)
        struct stat statbuf;
        if (stat(dirname.c_str(), &statbuf) == 0)
          continue;

        if (mkdir(dirname.c_str(), perm) != 0)
          if (errno != EEXIST) {
            logger.msg(ERROR, "Error creating required dirs: %s", strerror(errno));
            return false;
          }
        // chmod to get around GM umask setting
        if (chmod(dirname.c_str(), perm) != 0) {
          logger.msg(ERROR, "Error changing permission of dir %s: %s", dirname, strerror(errno));
          return false;
        }
      } while (slashpos != std::string::npos);
    }
    return true;
  }

bool Arc::FileCache::_checkLock ( std::string  url) [private]

Return true if the lock on the cache file corresponding to this url exists and is owned by this process.

Definition at line 1212 of file FileCache.cpp.

                                          {

    std::string filename = File(url);
    std::string lock_file = _getLockFileName(url);

    // check for existence of lock file
    struct stat fileStat;
    int err = stat(lock_file.c_str(), &fileStat);
    if (0 != err) {
      if (errno == ENOENT)
        logger.msg(ERROR, "Lock file %s doesn't exist", lock_file);
      else
        logger.msg(ERROR, "Error listing lock file %s: %s", lock_file, strerror(errno));
      return false;
    }

    // check the lock file's pid and hostname matches ours
    FILE *pFile;
    char lock_info[100]; // should be long enough for a pid + hostname
    pFile = fopen((char*)lock_file.c_str(), "r");
    if (pFile == NULL) {
      logger.msg(ERROR, "Error opening lock file %s: %s", lock_file, strerror(errno));
      return false;
    }
    if (fgets(lock_info, 100, pFile) == NULL) {
      logger.msg(ERROR, "Error reading lock file %s: %s", lock_file, strerror(errno));
      fclose(pFile);
      return false;
    }
    fclose(pFile);

    std::string lock_info_s(lock_info);
    std::string::size_type index = lock_info_s.find("@", 0);
    if (index == std::string::npos) {
      logger.msg(ERROR, "Error with formatting in lock file %s: %s", lock_file, lock_info_s);
      return false;
    }

    if (lock_info_s.substr(index + 1) != _hostname) {
      logger.msg(VERBOSE, "Lock is owned by a different host");
      // TODO: here do ssh login and check
      return false;
    }
    if (lock_info_s.substr(0, index) != _pid) {
      logger.msg(ERROR, "Another process owns the lock on file %s. Must go back to Start()", filename);
      return false;
    }
    return true;
  }
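
Judging from the listing above, a lock file holds a single string of the form pid@hostname. The sketch below reproduces just the ownership decision on such a string, without any file access. The LockOwner enum and classifyLock function are hypothetical names for this example; stripping a trailing newline is an addition of the sketch and is not done in the listing.

```cpp
#include <cassert>
#include <string>

// Hypothetical outcome codes for the ownership check.
enum LockOwner {
  LOCK_MALFORMED,      // no '@' separator found
  LOCK_OTHER_HOST,     // lock written on a different host
  LOCK_OTHER_PROCESS,  // same host, different pid
  LOCK_OURS            // same host and same pid
};

// Classify the content of a lock file ("pid@hostname") against our own
// hostname and pid, both given as strings.
LockOwner classifyLock(std::string lock_info,
                       const std::string& hostname,
                       const std::string& pid) {
  // drop a trailing newline so that the hostname compares cleanly
  std::string::size_type nl = lock_info.find('\n');
  if (nl != std::string::npos)
    lock_info.resize(nl);

  std::string::size_type at = lock_info.find('@');
  if (at == std::string::npos)
    return LOCK_MALFORMED;
  if (lock_info.substr(at + 1) != hostname)
    return LOCK_OTHER_HOST;
  if (lock_info.substr(0, at) != pid)
    return LOCK_OTHER_PROCESS;
  return LOCK_OURS;
}
```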

int Arc::FileCache::_chooseCache ( std::string  url) [private]

Choose a cache directory to use for this url, based on the free space in the cache directories and the cache_size limit set in arc.conf. Returns the index of the chosen cache in the list of caches.

Definition at line 1307 of file FileCache.cpp.

                                           {
    
    // get the hash of the url
    std::string hash = FileCacheHash::getHash(url);
    int index = 0;
    for (int level = 0; level < CACHE_DIR_LEVELS; level ++) {
       hash.insert(index + CACHE_DIR_LENGTH, "/");
       // go to next slash position, add one since we just inserted a slash
       index += CACHE_DIR_LENGTH + 1;
    }
  
    int caches_size = _caches.size();
  
    // When there is only one cache directory   
    if (caches_size == 1) {
      return 0;
    }
    // check the fs to see if the file is already there
    for (int i = 0; i < caches_size ; i++) { 
      struct stat fileStat;  
      std::string c_file = _caches[i].cache_path + "/" + CACHE_DATA_DIR +"/" + hash;  
      if (stat(c_file.c_str(), &fileStat) == 0) {
        return i; 
      }  
    }
  
    // find the cache with the most unused space, also taking into account the cache_size parameter defined in "arc.conf"
    std::map<int ,std::pair<unsigned long long, float> > cache_map;
    // caches which are under the usage percent of the "arc.conf": < cache number, chance to select this cache >
    std::map <int, int>  under_limit;
    // caches which are over the usage percent of the "arc.conf" < cache free space, cache number> 
    std::map<unsigned long long, int> over_limit;
    // sum of all caches 
    long total_size = 0; 
    // get the free spaces of the caches 
    for (int i = 0; i < caches_size; i++ ) {
      std::pair <unsigned long long, unsigned long long> p = _getCacheInfo(_caches[i].cache_path);
      cache_map.insert(std::make_pair(i, p));
      total_size = total_size + p.first;
    }
    for ( std::map< int, std::pair<unsigned long long,float> >::iterator cache_it = cache_map.begin(); cache_it != cache_map.end(); cache_it++) {
      // check if the usage percent is passed
      if ((100 - (100 * cache_it->second.second)/ cache_it->second.first) < _max_used) {                       
        // caches which are under the defined percentage 
        under_limit.insert(std::make_pair(cache_it->first, roundf((float) cache_it->second.first/total_size*10)));
      } else {
        // caches which have passed the defined percentage
        over_limit.insert(std::make_pair(cache_it->second.second, cache_it->first));
      }
    }
    int cache_no = 0;
    if (under_limit.size() > 0) {
      std::vector<int> utility_cache;
      for ( std::map<int,int> ::iterator cache_it = under_limit.begin(); cache_it != under_limit.end(); cache_it++) {
        // fill the vector with the frequency of cache number according to the cache size. 
        // for instance, a cache with 70% of the total cache space will appear 7 times in this vector and a cache with 30% will appear 3 times.           
        if (cache_it->second == 0) {
          utility_cache.push_back(cache_it->first);
        } else { 
          for (int i = 0; i < cache_it->second; i++) {
            utility_cache.push_back(cache_it->first);
          }
        }
      } 
      // choose a cache from the weighted list   
      cache_no = utility_cache.at((int)rand()%(utility_cache.size()));
    } else {
      // find the cache with the most free space among the caches that have passed the usage limit
      cache_no = max_element(over_limit.begin(), over_limit.end(), over_limit.value_comp())->second;
    }
    return cache_no;
  }
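
The weighting step in the listing above can be looked at in isolation. The following sketch builds the weighted list from a vector of cache sizes and then picks an entry from it. The function names buildWeightedList and pickCache are hypothetical, and the random value is passed in as a parameter (rather than calling rand() inside) so that the behaviour is reproducible.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Build a list in which cache i appears round(10 * size_i / total) times,
// and at least once, so that larger caches are chosen more often.
std::vector<int> buildWeightedList(const std::vector<unsigned long long>& sizes_kb) {
  unsigned long long total = 0;
  for (std::size_t i = 0; i < sizes_kb.size(); ++i)
    total += sizes_kb[i];

  std::vector<int> weighted;
  if (total == 0)
    return weighted;  // nothing to weight
  for (std::size_t i = 0; i < sizes_kb.size(); ++i) {
    int share = (int)std::floor(10.0 * sizes_kb[i] / total + 0.5);
    if (share == 0)
      share = 1;  // very small caches still get one entry
    for (int n = 0; n < share; ++n)
      weighted.push_back((int)i);
  }
  return weighted;
}

// Pick a cache index from the weighted list; returns -1 for an empty list.
int pickCache(const std::vector<int>& weighted, unsigned int random_value) {
  if (weighted.empty())
    return -1;
  return weighted[random_value % weighted.size()];
}
```

For two caches holding 70% and 30% of the total space, the list contains cache 0 seven times and cache 1 three times, as described in the comment in the listing.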

std::pair< unsigned long long, unsigned long long > Arc::FileCache::_getCacheInfo ( std::string  path) [private]

Return the cache info <total space in KB, free space in KB>.

Definition at line 1380 of file FileCache.cpp.

                                                                                          {
  
    struct statvfs info;
    if (statvfs(path.c_str(), &info) != 0) {
      logger.msg(ERROR, "Error getting info from statvfs for the path: %s", path);
    }
    // return a pair of <cache total size (KB), cache free space (KB)>
    return std::make_pair((info.f_blocks * info.f_bsize)/1024, (info.f_bfree * info.f_bsize)/1024); 
  }
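
Note that the listing above still reads the statvfs structure when the call has failed. A more defensive variant might report the failure to the caller instead, as in the sketch below. The function name getCacheInfoKB and its out-parameter signature are assumptions for this example, and the use of f_frsize (the unit in which POSIX defines f_blocks) with a fallback to f_bsize is a choice of the sketch, not the behaviour of the listing.

```cpp
#include <cassert>
#include <string>
#include <sys/statvfs.h>

// Fill total_kb and free_kb for the file system holding path.
// Returns false, leaving the outputs untouched, if statvfs() fails.
bool getCacheInfoKB(const std::string& path,
                    unsigned long long& total_kb,
                    unsigned long long& free_kb) {
  struct statvfs info;
  if (statvfs(path.c_str(), &info) != 0)
    return false;
  // block counts are expressed in units of f_frsize; fall back to f_bsize
  // if the fragment size is reported as zero
  unsigned long long unit = info.f_frsize ? info.f_frsize : info.f_bsize;
  total_kb = (unsigned long long)info.f_blocks * unit / 1024;
  free_kb = (unsigned long long)info.f_bfree * unit / 1024;
  return true;
}
```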

std::string Arc::FileCache::_getLockFileName ( std::string  url) [private]

Return the filename of the lock file associated to the given url.

Definition at line 1262 of file FileCache.cpp.

                                                     {
    return File(url) + CACHE_LOCK_SUFFIX;
  }

std::string Arc::FileCache::_getMetaFileName ( std::string  url) [private]

Return the filename of the meta file associated to the given url.

Definition at line 1266 of file FileCache.cpp.

                                                     {
    return File(url) + CACHE_META_SUFFIX;
  }

bool Arc::FileCache::_init ( std::vector< std::string >  caches,
std::vector< std::string >  remote_caches,
std::vector< std::string >  draining_caches,
std::string  id,
uid_t  job_uid,
gid_t  job_gid,
int  cache_max = 100,
int  cache_min = 100 
) [private]

Common code for constructors.

Definition at line 83 of file FileCache.cpp.

                                       {

    _id = id;
    _uid = job_uid;
    _gid = job_gid;
    _max_used = cache_max;
    _min_used = cache_min;

    // for each cache
    for (int i = 0; i < (int)caches.size(); i++) {
      std::string cache = caches[i];
      std::string cache_path = cache.substr(0, cache.find(" "));
      if (cache_path.empty()) {
        logger.msg(ERROR, "No cache directory specified");
        return false;
      }
      std::string cache_link_path = "";
      if (cache.find(" ") != std::string::npos)
        cache_link_path = cache.substr(cache.find_last_of(" ") + 1, cache.length() - cache.find_last_of(" ") + 1);

      // tidy up paths - take off any trailing slashes
      if (cache_path.rfind("/") == cache_path.length() - 1)
        cache_path = cache_path.substr(0, cache_path.length() - 1);
      if (cache_link_path.rfind("/") == cache_link_path.length() - 1)
        cache_link_path = cache_link_path.substr(0, cache_link_path.length() - 1);

      // create cache dir and subdirs
      if (!_cacheMkDir(cache_path + "/" + CACHE_DATA_DIR, true)) {
        logger.msg(ERROR, "Cannot create directory \"%s\" for cache", cache_path + "/" + CACHE_DATA_DIR);
        return false;
      }
      if (!_cacheMkDir(cache_path + "/" + CACHE_JOB_DIR, true)) {
        logger.msg(ERROR, "Cannot create directory \"%s\" for cache", cache_path + "/" + CACHE_JOB_DIR);
        return false;
      }
      // add this cache to our list
      struct CacheParameters cache_params;
      cache_params.cache_path = cache_path;
      cache_params.cache_link_path = cache_link_path;
      _caches.push_back(cache_params);
    }
  
    // add remote caches
    for (int i = 0; i < (int)remote_caches.size(); i++) {
      std::string cache = remote_caches[i];
      std::string cache_path = cache.substr(0, cache.find(" "));
      if (cache_path.empty()) {
        logger.msg(ERROR, "No remote cache directory specified");
        return false;
      }
      std::string cache_link_path = "";
      if (cache.find(" ") != std::string::npos) cache_link_path = cache.substr(cache.find_last_of(" ")+1, cache.length()-cache.find_last_of(" ")+1);
      
      // tidy up paths - take off any trailing slashes
      if (cache_path.rfind("/") == cache_path.length()-1) cache_path = cache_path.substr(0, cache_path.length()-1);
      if (cache_link_path.rfind("/") == cache_link_path.length()-1) cache_link_path = cache_link_path.substr(0, cache_link_path.length()-1);
  
      // add this cache to our list
      struct CacheParameters cache_params;
      cache_params.cache_path = cache_path;
      cache_params.cache_link_path = cache_link_path;
      _remote_caches.push_back(cache_params);
    }
  
    // for each draining cache
    for (int i = 0; i < (int)draining_caches.size(); i++) {
      std::string cache = draining_caches[i];
      std::string cache_path = cache.substr(0, cache.find(" "));
      if (cache_path.empty()) {
        logger.msg(ERROR, "No cache directory specified");
        return false;
      }
      // tidy up paths - take off any trailing slashes
      if (cache_path.rfind("/") == cache_path.length()-1) cache_path = cache_path.substr(0, cache_path.length()-1);
  
      // add this cache to our list
      struct CacheParameters cache_params;
      cache_params.cache_path = cache_path;
      cache_params.cache_link_path = "";
      _draining_caches.push_back(cache_params);
    }
    // our hostname and pid
    struct utsname buf;
    if (uname(&buf) != 0) {
      logger.msg(ERROR, "Cannot determine hostname from uname()");
      return false;
    }
    _hostname = buf.nodename;
    int pid_i = getpid();
    std::stringstream ss;
    ss << pid_i;
    ss >> _pid;
    return true;
  }

bool Arc::FileCache::AddDN ( std::string  url,
std::string  DN,
Time  expiry_time 
)

Add the given DN to the list of cached DNs with the given expiry time.

Parameters:
url  The url corresponding to the cache file to which we want to add a cached DN.
DN  The DN of the user.
expiry_time  The expiry time of this DN in the DN cache.

Definition at line 975 of file FileCache.cpp.

                                                                     {

    if (DN.empty())
      return false;
    if (expiry_time == Time(0))
      expiry_time = Time(time(NULL) + CACHE_DEFAULT_AUTH_VALIDITY);

    // add DN to the meta file. If already there, renew the expiry time
    std::string meta_file = _getMetaFileName(url);
    struct stat fileStat;
    int err = stat(meta_file.c_str(), &fileStat);
    if (0 != err) {
      logger.msg(ERROR, "Error reading meta file %s: %s", meta_file, strerror(errno));
      return false;
    }
    FILE *pFile;
    char mystring[fileStat.st_size + 1];
    pFile = fopen(meta_file.c_str(), "r");
    if (pFile == NULL) {
      logger.msg(ERROR, "Error opening meta file %s: %s", meta_file, strerror(errno));
      return false;
    }
    // get the first line
    fgets(mystring, sizeof(mystring), pFile);

    // check for correct formatting and possible hash collisions between URLs
    std::string first_line(mystring);
    if (first_line.find('\n') == std::string::npos)
      first_line += '\n';
    std::string::size_type space_pos = first_line.rfind(' ');
    if (space_pos == std::string::npos)
      space_pos = first_line.length() - 1;

    if (first_line.substr(0, space_pos) != url) {
      logger.msg(ERROR, "Error: File %s is already cached at %s under a different URL: %s - will not add DN to cached list", url, File(url), first_line.substr(0, space_pos));
      fclose(pFile);
      return false;
    }

    // read in list of DNs
    std::vector<std::string> dnlist;
    dnlist.push_back(DN + ' ' + expiry_time.str(MDSTime) + '\n');

    char *res = fgets(mystring, sizeof(mystring), pFile);
    while (res) {
      std::string dnstring(mystring);
      space_pos = dnstring.rfind(' ');
      if (space_pos == std::string::npos) {
        logger.msg(WARNING, "Bad format detected in file %s, in line %s", meta_file, dnstring);
        res = fgets (mystring, sizeof(mystring), pFile);
        continue;
      }
      // remove expired DNs (after some grace period)
      if (dnstring.substr(0, space_pos) != DN) {
        if (dnstring.find('\n') != std::string::npos)
          dnstring.resize(dnstring.find('\n'));
        Time exp_time(dnstring.substr(space_pos + 1));
        if (exp_time > Time(time(NULL) - CACHE_DEFAULT_AUTH_VALIDITY))
          dnlist.push_back(dnstring + '\n');
      }
      res = fgets(mystring, sizeof(mystring), pFile);
    }
    fclose(pFile);

    // write everything back to the file
    pFile = fopen(meta_file.c_str(), "w");
    if (pFile == NULL) {
      logger.msg(ERROR, "Error opening meta file for writing %s: %s", meta_file, strerror(errno));
      return false;
    }
    fputs((char*)first_line.c_str(), pFile);
    for (std::vector<std::string>::iterator i = dnlist.begin(); i != dnlist.end(); i++)
      fputs((char*)i->c_str(), pFile);
    fclose(pFile);
    return true;
  }

bool Arc::FileCache::CheckCreated ( std::string  url)

Check if there is information about the creation time.

Returns true if the file exists in the cache, since the creation time is the creation time of the cache file.

Parameters:
url  The url corresponding to the cache file for which we want to know if the creation date exists.

Definition at line 1100 of file FileCache.cpp.

                                            {

    // check the cache file exists - if so we can get the creation date
    // follow symlinks
    std::string cache_file = File(url);
    struct stat fileStat;
    return (stat(cache_file.c_str(), &fileStat) == 0) ? true : false;
  }

bool Arc::FileCache::CheckDN ( std::string  url,
std::string  DN 
)

Check if the given DN is cached for authorisation.

Parameters:
url  The url corresponding to the cache file for which we want to check the cached DN.
DN  The DN of the user.

Definition at line 1052 of file FileCache.cpp.

                                                     {

    if (DN.empty())
      return false;

    std::string meta_file = _getMetaFileName(url);
    struct stat fileStat;
    int err = stat(meta_file.c_str(), &fileStat);
    if (0 != err) {
      if (errno != ENOENT)
        logger.msg(ERROR, "Error reading meta file %s: %s", meta_file, strerror(errno));
      return false;
    }
    FILE *pFile;
    char mystring[fileStat.st_size + 1];
    pFile = fopen(meta_file.c_str(), "r");
    if (pFile == NULL) {
      logger.msg(ERROR, "Error opening meta file %s: %s", meta_file, strerror(errno));
      return false;
    }
    fgets(mystring, sizeof(mystring), pFile); // first line

    // read in list of DNs
    char *res = fgets(mystring, sizeof(mystring), pFile);
    while (res) {
      std::string dnstring(mystring);
      std::string::size_type space_pos = dnstring.rfind(' ');
      if (dnstring.substr(0, space_pos) == DN) {
        if (dnstring.find('\n') != std::string::npos)
          dnstring.resize(dnstring.find('\n'));
        std::string exp_time = dnstring.substr(space_pos + 1);
        if (Time(exp_time) > Time()) {
          logger.msg(VERBOSE, "DN %s is cached and is valid until %s for URL %s", DN, Time(exp_time).str(), url);
          fclose(pFile);
          return true;
        }
        else {
          logger.msg(VERBOSE, "DN %s is cached but has expired for URL %s", DN, url);
          fclose(pFile);
          return false;
        }
      }
      res = fgets(mystring, sizeof(mystring), pFile);
    }
    fclose(pFile);
    return false;
  }
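
The DN lookup in the listing above can be sketched on in-memory lines, leaving the file handling aside. Each DN line is split at its last space because DNs may themselves contain spaces. The function name isDnCached is hypothetical. The sketch also assumes that expiry times are stored in the fixed-width form seen in the meta file example (such as 20081007151045Z), so that a plain string comparison orders them chronologically; the listing itself compares Time objects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return true if dn appears in dn_lines with an expiry time later than now.
// Both the stored expiry and now are expected as "YYYYMMDDhhmmssZ" strings.
bool isDnCached(const std::vector<std::string>& dn_lines,
                const std::string& dn,
                const std::string& now) {
  for (std::size_t i = 0; i < dn_lines.size(); ++i) {
    std::string line = dn_lines[i];
    // drop the trailing newline, if any
    std::string::size_type nl = line.find('\n');
    if (nl != std::string::npos)
      line.resize(nl);
    // split at the last space: "<DN> <expiry>"
    std::string::size_type space_pos = line.rfind(' ');
    if (space_pos == std::string::npos)
      continue;  // badly formatted line, skip it
    if (line.substr(0, space_pos) != dn)
      continue;
    // the first matching DN decides the result, as in the listing
    return line.substr(space_pos + 1) > now;
  }
  return false;
}
```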

bool Arc::FileCache::CheckValid ( std::string  url)

Check if there is information about the expiry time.

Parameters:
url  The url corresponding to the cache file for which we want to know if the expiration time exists.

Definition at line 1129 of file FileCache.cpp.

                                          {
    return (GetValid(url) != Time(0));
  }

bool Arc::FileCache::Copy ( std::string  dest_path,
std::string  url,
bool  executable = false 
)

Copy the cache file corresponding to url to the dest_path.

Definition at line 840 of file FileCache.cpp.

                                                                          {

    if (!(*this))
      return false;

    // check the original file exists
    std::string cache_file = File(url);
    struct stat fileStat;
    if (stat(cache_file.c_str(), &fileStat) != 0) {
      if (errno == ENOENT)
        logger.msg(ERROR, "Cache file %s does not exist", cache_file);
      else
        logger.msg(ERROR, "Error accessing cache file %s: %s", cache_file, strerror(errno));
      return false;
    }

    // make necessary dirs for the copy
    // this probably should have already been done... somewhere...
    std::string dest_dir = dest_path.substr(0, dest_path.rfind("/"));
    if (!_cacheMkDir(dest_dir, true))
      return false;
    if (chown(dest_dir.c_str(), _uid, _gid) != 0) {
      logger.msg(ERROR, "Failed to change owner of destination dir to %i: %s", _uid, strerror(errno));
      return false;
    }
    if (chmod(dest_dir.c_str(), S_IRWXU) != 0) {
      logger.msg(ERROR, "Failed to change permissions of session dir to 0700: %s", strerror(errno));
      return false;
    }

    // do the copy - taken directly from old datacache.cc
    char buf[65536];
    mode_t perm = S_IRUSR | S_IWUSR;
    if (executable)
      perm |= S_IXUSR;
    int fdest = open(dest_path.c_str(), O_WRONLY | O_CREAT | O_EXCL, perm);
    if (fdest == -1) {
      logger.msg(ERROR, "Failed to create file %s for writing: %s", dest_path, strerror(errno));
      return false;
    }
    if (fchown(fdest, _uid, _gid) == -1) {
      logger.msg(ERROR, "Failed change ownership of destination file %s: %s", dest_path, strerror(errno));
      close(fdest);
      return false;
    }

    int fsource = open(cache_file.c_str(), O_RDONLY);
    if (fsource == -1) {
      close(fdest);
      logger.msg(ERROR, "Failed to open file %s for reading: %s", cache_file, strerror(errno));
      return false;
    }

    // source and dest opened ok - copy in chunks
    for (;;) {
      ssize_t lin = read(fsource, buf, sizeof(buf));
      if (lin == -1) {
        close(fdest);
        close(fsource);
        logger.msg(ERROR, "Failed to read file %s: %s", cache_file, strerror(errno));
        return false;
      }
      if (lin == 0)
        break;          // eof

      for (ssize_t lout = 0; lout < lin;) {
        ssize_t lwritten = write(fdest, buf + lout, lin - lout);
        if (lwritten == -1) {
          close(fdest);
          close(fsource);
          logger.msg(ERROR, "Failed to write file %s: %s", dest_path, strerror(errno));
          return false;
        }
        lout += lwritten;
      }
    }
    close(fdest);
    close(fsource);
    return true;
  }

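The copy loop in the listing above has to cope with write() accepting fewer bytes than requested. A minimal sketch of that pattern follows, working on already-open file descriptors. The names `write_all` and `copy_fd` are hypothetical helpers for illustration and are not part of Arc::FileCache.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: write all `len` bytes of `buf` to `fd`,
// continuing after short writes. Returns false on a write error.
bool write_all(int fd, const char *buf, std::size_t len) {
  std::size_t done = 0;
  while (done < len) {
    ssize_t n = write(fd, buf + done, len - done);
    if (n == -1)
      return false;
    done += static_cast<std::size_t>(n);
  }
  return true;
}

// Hypothetical helper: copy everything readable from `src` to `dst`
// in fixed-size chunks. Returns true once end of file is reached.
bool copy_fd(int src, int dst) {
  char buf[65536];
  for (;;) {
    ssize_t n = read(src, buf, sizeof(buf));
    if (n == -1)
      return false;
    if (n == 0)
      return true; // eof
    if (!write_all(dst, buf, static_cast<std::size_t>(n)))
      return false;
  }
}
```

Like the listing, this sketch treats any error from read() or write() as fatal, including an interrupted call; the caller remains responsible for closing both descriptors.
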

std::string Arc::FileCache::File ( std::string  url)

Returns the full pathname of the file in the cache which corresponds to the given url.

Definition at line 678 of file FileCache.cpp.

                                         {

    if (!(*this))
      return "";

    // get the hash of the url
    std::string hash = FileCacheHash::getHash(url);

    int index = 0;
    for (int level = 0; level < CACHE_DIR_LEVELS; level++) {
      hash.insert(index + CACHE_DIR_LENGTH, "/");
      // go to next slash position, add one since we just inserted a slash
      index += CACHE_DIR_LENGTH + 1;
    }
    // look up the cache map to see if the file is already in
    std::map <std::string, int>::iterator iter = _cache_map.find(hash) ;
    if (iter != _cache_map.end()) {
      return _caches[iter->second].cache_path + "/" + CACHE_DATA_DIR + "/" + hash;
    } 
  
    // else choose a new cache and assign the file to it
    int chosen_cache = _chooseCache(url);
    std::string path  = _caches[chosen_cache].cache_path + "/" + CACHE_DATA_DIR + "/" + hash;
  
    // update the cache map with the new file
    _cache_map.insert(std::make_pair(hash, chosen_cache));
    return path;
  }

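The loop that turns the URL hash into a relative path can be tried in isolation. The sketch below uses a hypothetical helper `split_hash`; the directory length and number of levels are passed as arguments because the values of CACHE_DIR_LENGTH and CACHE_DIR_LEVELS are not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Arc::FileCache: turn the first
// `levels` groups of `dir_length` characters of `hash` into directories.
std::string split_hash(std::string hash, std::size_t dir_length, int levels) {
  std::size_t index = 0;
  for (int level = 0; level < levels; ++level) {
    // the hash must be long enough to take a slash at this position
    assert(index + dir_length <= hash.size());
    hash.insert(index + dir_length, "/");
    // skip the directory name and the slash that was just inserted
    index += dir_length + 1;
  }
  return hash;
}
```

Assuming a directory length of 2 and a single level, `split_hash("0a1b2c3d", 2, 1)` gives `0a/1b2c3d`, which File() would then append to the cache path and data directory.
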

Time Arc::FileCache::GetCreated ( std::string  url)

Get the creation time of a cached file.

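If the cache file does not exist, 0 is returned. The value is read from the st_ctime field filled in by stat(), which POSIX defines as the time of the last status change of the file rather than a true creation time.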

Parameters:
    url  the url corresponding to the cache file for which we want to know the creation date

Definition at line 1109 of file FileCache.cpp.

                                          {

    // check the cache file exists
    std::string cache_file = File(url);
    // follow symlinks
    struct stat fileStat;
    if (stat(cache_file.c_str(), &fileStat) != 0) {
      if (errno == ENOENT)
        logger.msg(ERROR, "Cache file %s does not exist", cache_file);
      else
        logger.msg(ERROR, "Error accessing cache file %s: %s", cache_file, strerror(errno));
      return 0;
    }

    time_t ctime = fileStat.st_ctime;
    if (ctime <= 0)
      return Time(0);
    return Time(ctime);
  }

Time Arc::FileCache::GetValid ( std::string  url)

Get expiry time of a cached file.

If the time is not available, a time equivalent to 0 is returned.

Parameters:
    url  the url corresponding to the cache file for which we want to know the expiry time

Definition at line 1133 of file FileCache.cpp.

                                        {

    // open meta file and pick out expiry time if it exists

    FILE *pFile;
    char mystring[1024]; // should be long enough for a pid or url...
    pFile = fopen((char*)_getMetaFileName(url).c_str(), "r");
    if (pFile == NULL) {
      logger.msg(ERROR, "Error opening meta file %s: %s", _getMetaFileName(url), strerror(errno));
      return Time(0);
    }
    if (fgets(mystring, sizeof(mystring), pFile) == NULL) {
      logger.msg(ERROR, "Error reading meta file %s: %s", _getMetaFileName(url), strerror(errno));
      fclose(pFile);
      return Time(0);
    }
    fclose(pFile);

    std::string meta_str(mystring);
    // get the first line
    if (meta_str.find('\n') != std::string::npos)
      meta_str.resize(meta_str.find('\n'));

    // if the file contains only the url, we don't have an expiry time
    if (meta_str == url)
      return Time(0);

    // check sensible formatting - should be like "rls://rls1.ndgf.org/file1 20080101123456Z"
    if (meta_str.substr(0, url.length() + 1) != url + " ") {
      logger.msg(ERROR, "Mismatching url in file %s: %s Expected %s", _getMetaFileName(url), meta_str, url);
      return Time(0);
    }
    if (meta_str.length() != url.length() + 16) {
      logger.msg(ERROR, "Bad format in file %s: %s", _getMetaFileName(url), meta_str);
      return Time(0);
    }
    if (meta_str.substr(url.length(), 1) != " ") {
      logger.msg(ERROR, "Bad separator in file %s: %s", _getMetaFileName(url), meta_str);
      return Time(0);
    }
    if (meta_str.substr(url.length() + 1).length() != 15) {
      logger.msg(ERROR, "Bad value of expiry time in %s: %s", _getMetaFileName(url), meta_str);
      return Time(0);
    }

    // convert to Time object
    return Time(meta_str.substr(url.length() + 1));
  }

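The format checks in the listing above amount to requiring the first line of the meta file to be the url, one space, and a 15-character timestamp. A minimal sketch of the same validation on a string follows; `expiry_field` is a hypothetical helper and returns the raw timestamp text instead of an Arc::Time.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: return the timestamp from a meta line of the form
// "<url> <YYYYMMDDHHMMSSZ>", or "" if the line carries no valid expiry field.
std::string expiry_field(const std::string &line, const std::string &url) {
  const std::string::size_type stamp_len = 15; // e.g. 20080101123456Z
  if (line.compare(0, url.length(), url) != 0)
    return ""; // line does not start with the expected url
  if (line.length() != url.length() + 1 + stamp_len)
    return ""; // url only, or wrong timestamp length
  if (line[url.length()] != ' ')
    return ""; // bad separator
  return line.substr(url.length() + 1);
}
```

A line that contains only the url yields an empty result, which corresponds to the case where GetValid() returns Time(0) because no expiry time was stored.
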

bool Arc::FileCache::Link ( std::string  link_path,
std::string  url 
)

Create a hard-link to the per-job dir from the cache dir, and then a soft-link from here to the session directory.

This effectively 'claims' the file for the job: even if the original cache file is deleted, e.g. by some external process, the hard link still exists until it is explicitly released by calling Release().

If cache_link_path is set to "." then files will be copied directly to the session directory rather than via the hard link.

Parameters:
    link_path  path to the session dir for soft-link or new file
    url        url of file to link to or copy

Definition at line 707 of file FileCache.cpp.

                                                         {

    if (!(*this))
      return false;

    // check the original file exists
    std::string cache_file = File(url);
    struct stat fileStat;
  
    if (lstat(cache_file.c_str(), &fileStat) != 0) {
      if (errno == ENOENT)
        logger.msg(ERROR, "Error: Cache file %s does not exist", cache_file);
      else
        logger.msg(ERROR, "Error accessing cache file %s: %s", cache_file, strerror(errno));
      return false;
    }
  
    // get the hash of the url
    std::string hash = FileCacheHash::getHash(url);
    int index = 0;
    for (int level = 0; level < CACHE_DIR_LEVELS; level ++) {
      hash.insert(index + CACHE_DIR_LENGTH, "/");
      // go to next slash position, add one since we just inserted a slash
      index += CACHE_DIR_LENGTH + 1;
    }
  
    // look up the map file to see if the file is already mapped with a cache  
    std::map <std::string, int>::iterator iter = _cache_map.find(hash);
    int cache_no = 0;
    if (iter != _cache_map.end()) {
      cache_no = iter->second;}
    else {
      logger.msg(ERROR, "Error: Cache not found for file %s", cache_file);
      return false;
    }

    // choose cache
    struct CacheParameters cache_params = _caches[cache_no];
    std::string hard_link_path = cache_params.cache_path + "/" + CACHE_JOB_DIR + "/" +_id;
    std::string cache_link_path = cache_params.cache_link_path;

    // check if cached file is a symlink - if so get link path from the remote cache
    if (S_ISLNK(fileStat.st_mode)) {
      char link_target_buf[1024];
      int link_size = readlink(cache_file.c_str(), link_target_buf, sizeof(link_target_buf));
      if (link_size == -1) {
        logger.msg(ERROR, "Could not read target of link %s: %s", cache_file, strerror(errno));
        return false;
      }
      // need to match the symlink target against the list of remote caches
      std::string link_target(link_target_buf); link_target.resize(link_size);
      for (std::vector<struct CacheParameters>::iterator it = _remote_caches.begin(); it != _remote_caches.end(); it++) {
        std::string remote_data_dir = it->cache_path+"/"+CACHE_DATA_DIR;
        if (link_target.find(remote_data_dir) == 0) {
          hard_link_path = it->cache_path+"/"+CACHE_JOB_DIR + "/" + _id;
          cache_link_path = it->cache_link_path;
          cache_file = link_target;
          break;
        }
      }
      if (hard_link_path == cache_params.cache_path + "/" + CACHE_JOB_DIR + "/" +_id) {
        logger.msg(ERROR, "Couldn't match link target %s to any remote cache", link_target);
        return false;
      }
    }

    // if _cache_link_path is '.' then copy instead, bypassing the hard-link
    if (cache_params.cache_link_path == ".")
      return Copy(link_path, url);

    // create per-job hard link dir if necessary, making the final dir readable only by the job user
    if (!_cacheMkDir(hard_link_path, true)) {
      logger.msg(ERROR, "Cannot create directory \"%s\" for per-job hard links", hard_link_path);
      return false;
    }
    if (chown(hard_link_path.c_str(), _uid, _gid) != 0) {
      logger.msg(ERROR, "Cannot change owner of %s", hard_link_path);
      return false;
    }
    if (chmod(hard_link_path.c_str(), S_IRWXU) != 0) {
      logger.msg(ERROR, "Cannot change permissions of \"%s\" to 0700", hard_link_path);
      return false;
    }

    std::string filename = link_path.substr(link_path.rfind("/") + 1);
    std::string hard_link_file = hard_link_path + "/" + filename;
    std::string session_dir = link_path.substr(0, link_path.rfind("/"));

    // make the hard link
    if (link(cache_file.c_str(), hard_link_file.c_str()) != 0) {
      logger.msg(ERROR, "Failed to create hard link from %s to %s: %s", hard_link_file, cache_file, strerror(errno));
      return false;
    }
    // ensure the hard link is readable by all and owned by root (or GM user)
    // (to make cache file immutable but readable by all)
    if (chown(hard_link_file.c_str(), getuid(), getgid()) != 0) {
      logger.msg(ERROR, "Failed to change owner of hard link to %i: %s", getuid(), strerror(errno));
      return false;
    }
    if (chmod(hard_link_file.c_str(), S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH) != 0) {
      logger.msg(ERROR, "Failed to change permissions of hard link to 0644: %s", strerror(errno));
      return false;
    }

    // make necessary dirs for the soft link
    // this probably should have already been done... somewhere...
    if (!_cacheMkDir(session_dir, true))
      return false;
    if (chown(session_dir.c_str(), _uid, _gid) != 0) {
      logger.msg(ERROR, "Failed to change owner of session dir to %i: %s", _uid, strerror(errno));
      return false;
    }
    if (chmod(session_dir.c_str(), S_IRWXU) != 0) {
      logger.msg(ERROR, "Failed to change permissions of session dir to 0700: %s", strerror(errno));
      return false;
    }

    // make the soft link, changing the target if cache_link_path is defined
    if (!cache_params.cache_link_path.empty())
      hard_link_file = cache_params.cache_link_path + "/" + CACHE_JOB_DIR + "/" + _id + "/" + filename;
    if (symlink(hard_link_file.c_str(), link_path.c_str()) != 0) {
      logger.msg(ERROR, "Failed to create soft link: %s", strerror(errno));
      return false;
    }

    // change the owner of the soft link to the job user
    if (lchown(link_path.c_str(), _uid, _gid) != 0) {
      logger.msg(ERROR, "Failed to change owner of session dir to %i: %s", _uid, strerror(errno));
      return false;
    }
    return true;
  }

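The effect of claiming a file through a hard link can be demonstrated with plain POSIX calls. The sketch below is an illustration only: `claim_file` and `claim_survives_delete` are hypothetical names, the paths live in a temporary directory under /tmp (an assumption), and the ownership and permission handling of Link() is left out.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: claim `cache_file` by hard-linking it to `claim_path`,
// then expose the claim at `link_path` through a symbolic link.
bool claim_file(const std::string &cache_file, const std::string &claim_path,
                const std::string &link_path) {
  if (link(cache_file.c_str(), claim_path.c_str()) != 0)
    return false;
  if (symlink(claim_path.c_str(), link_path.c_str()) != 0) {
    unlink(claim_path.c_str()); // undo the claim if the soft link fails
    return false;
  }
  return true;
}

// Demonstration in a fresh temporary directory: the data stays reachable
// through the soft link after the original cache entry has been deleted.
bool claim_survives_delete() {
  char tmpl[] = "/tmp/claim_demo.XXXXXX";
  if (mkdtemp(tmpl) == NULL)
    return false;
  const std::string dir(tmpl);
  const std::string cache = dir + "/cache_file";
  const std::string claim = dir + "/job_claim";
  const std::string session = dir + "/session_link";

  FILE *f = fopen(cache.c_str(), "w");
  if (f == NULL)
    return false;
  fputs("data", f);
  fclose(f);

  bool ok = claim_file(cache, claim, session);
  ok = ok && unlink(cache.c_str()) == 0; // cache entry removed externally
  struct stat st;
  ok = ok && stat(session.c_str(), &st) == 0 && st.st_size == 4;

  // clean up whatever is left
  unlink(session.c_str());
  unlink(claim.c_str());
  unlink(cache.c_str());
  rmdir(dir.c_str());
  return ok;
}
```

Because a hard link is a second name for the same data, removing the cache entry only drops one name; the data disappears once the claim is removed as well, which is what Release() does for the per-job directory.
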

Arc::FileCache::operator bool ( void  ) [inline]

Returns true if object is useable.

Definition at line 373 of file FileCache.h.

                    {
      return (_caches.size() != 0);
    };

bool Arc::FileCache::operator== ( const FileCache a)

Return true if all attributes are equal.

Definition at line 1197 of file FileCache.cpp.

                                               {
    if (a._caches.size() != _caches.size())
      return false;
    for (int i = 0; i < (int)a._caches.size(); i++) {
      if (a._caches.at(i).cache_path != _caches.at(i).cache_path)
        return false;
      if (a._caches.at(i).cache_link_path != _caches.at(i).cache_link_path)
        return false;
    }
    return (
             a._id == _id &&
             a._uid == _uid &&
             a._gid == _gid
             );
  }

Release claims on input files for the job specified by id.

For each cache directory the per-job directory with the hard-links will be deleted.

Definition at line 921 of file FileCache.cpp.

                          {

    // go through all caches (including remote caches and draining caches)
    // and remove per-job dirs for our job id
    std::vector<std::string> job_dirs;
    for (int i = 0; i < (int)_caches.size(); i++)
      job_dirs.push_back(_caches[i].cache_path + "/" + CACHE_JOB_DIR + "/" + _id);
    for (int i = 0; i < (int)_remote_caches.size(); i++)
      job_dirs.push_back(_remote_caches[i].cache_path + "/" + CACHE_JOB_DIR + "/" + _id);
    for (int i = 0; i < (int)_draining_caches.size(); i++)
      job_dirs.push_back(_draining_caches[i].cache_path + "/" + CACHE_JOB_DIR + "/" + _id); 

    for (int i = 0; i < (int)job_dirs.size(); i++) {
      std::string job_dir = job_dirs[i];
      // check if job dir exists
      DIR *dirp = opendir(job_dir.c_str());
      if (dirp == NULL) {
        if (errno == ENOENT)
          continue;
        logger.msg(ERROR, "Error opening per-job dir %s: %s", job_dir, strerror(errno));
        return false;
      }

      // list all files in the dir and delete them
      struct dirent *dp;
      errno = 0;
      while ((dp = readdir(dirp))) {
        if (strcmp(dp->d_name, ".") == 0 || strcmp(dp->d_name, "..") == 0)
          continue;
        std::string to_delete = job_dir + "/" + dp->d_name;
        logger.msg(VERBOSE, "Removing %s", to_delete);
        if (remove(to_delete.c_str()) != 0) {
          logger.msg(ERROR, "Failed to remove hard link %s: %s", to_delete, strerror(errno));
          closedir(dirp);
          return false;
        }
      }
      closedir(dirp);

      if (errno != 0) {
        logger.msg(ERROR, "Error listing dir %s: %s", job_dir, strerror(errno));
        return false;
      }

      // remove now-empty dir
      logger.msg(VERBOSE, "Removing %s", job_dir);
      if (rmdir(job_dir.c_str()) != 0) {
        logger.msg(ERROR, "Failed to remove cache per-job dir %s: %s", job_dir, strerror(errno));
        return false;
      }
    }
    return true;
  }

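Emptying and removing a single directory, as Release() does for each per-job directory, can be sketched as follows. The helper name `remove_flat_dir` is hypothetical. Unlike the listing, this sketch resets errno before every readdir() call, so that an errno value left behind by another call is not mistaken for a listing error.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <dirent.h>
#include <string>
#include <unistd.h>

// Hypothetical helper: delete every entry of `dir` (non-recursively) and
// then the directory itself. A directory that does not exist is a success.
bool remove_flat_dir(const std::string &dir) {
  DIR *dirp = opendir(dir.c_str());
  if (dirp == NULL)
    return errno == ENOENT;
  bool ok = true;
  for (;;) {
    errno = 0; // readdir() reports errors only through errno
    struct dirent *dp = readdir(dirp);
    if (dp == NULL) {
      if (errno != 0)
        ok = false; // listing failed
      break;        // otherwise: end of directory
    }
    if (strcmp(dp->d_name, ".") == 0 || strcmp(dp->d_name, "..") == 0)
      continue;
    const std::string path = dir + "/" + dp->d_name;
    if (remove(path.c_str()) != 0) {
      ok = false;
      break;
    }
  }
  closedir(dirp);
  return ok && rmdir(dir.c_str()) == 0;
}
```

Calling the helper a second time on the same path succeeds, mirroring the way Release() skips caches that hold no per-job directory for the job.
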

bool Arc::FileCache::SetValid ( std::string  url,
Time  val 
)

Set expiry time.

Parameters:
    url  the url corresponding to the cache file for which we want to set the expiry time
    val  expiry time

Definition at line 1182 of file FileCache.cpp.

                                                  {

    std::string meta_file = _getMetaFileName(url);
    FILE *pFile;
    pFile = fopen((char*)meta_file.c_str(), "w");
    if (pFile == NULL) {
      logger.msg(ERROR, "Error opening meta file %s: %s", meta_file, strerror(errno));
      return false;
    }
    std::string file_data = url + " " + val.str(MDSTime);
    fputs((char*)file_data.c_str(), pFile);
    fclose(pFile);
    return true;
  }

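The line written to the meta file is the url, a space, and the expiry time as text. The sketch below builds such a line from a time_t, assuming that the MDSTime format is the YYYYMMDDHHMMSSZ form shown in the GetValid() source comment. `mds_time_string` and `meta_line` are hypothetical helpers and do not use Arc::Time.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Hypothetical helper: format `t` in UTC as YYYYMMDDHHMMSSZ.
// Returns "" if the time cannot be converted.
std::string mds_time_string(std::time_t t) {
  struct tm parts;
  if (gmtime_r(&t, &parts) == NULL)
    return "";
  char buf[16]; // 15 characters plus the terminating NUL
  if (std::strftime(buf, sizeof(buf), "%Y%m%d%H%M%SZ", &parts) != 15)
    return "";
  return std::string(buf);
}

// Hypothetical helper: the text that would be stored for `url`.
std::string meta_line(const std::string &url, std::time_t expiry) {
  return url + " " + mds_time_string(expiry);
}
```

The resulting string has exactly the length that GetValid() checks for, namely the url length plus 16 characters.
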

bool Arc::FileCache::Start ( std::string  url,
bool &  available,
bool &  is_locked,
bool  use_remote = true 
)

Prepare cache for downloading file, and lock the cached file.

On success returns true. If another process is downloading the same url, false is returned and is_locked is set to true; in this case the client should wait and retry later. If the lock has expired, this process takes over the lock and the method returns as if no lock was present, i.e. available and is_locked are false.

Parameters:
    url         url that is being downloaded
    available   true on exit if the file is already in cache
    is_locked   true on exit if the file is already locked, i.e. cannot be used by this process
    use_remote  if true (the default), the remote caches are searched when the file is not in the local cache. Remote caches are under the control of other Grid Managers and are read-only for this process.

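A caller might combine Start(), Stop() and StopAndDelete() roughly as in the following sketch. It is an assumption about typical use, not code taken from the library: `Outcome`, `fetch_via_cache`, the `download` callable and the stand-in `FakeCache` are all hypothetical and exist only so that the control flow can be compiled without ARC.

```cpp
#include <cassert>
#include <string>

// Hypothetical result of one attempt to obtain a file through the cache.
enum Outcome { USE_CACHED, DOWNLOADED, RETRY_LATER, FAILED };

// Sketch of the calling protocol. `Cache` is any type offering Start(),
// Stop() and StopAndDelete() as documented here; `download` is a
// hypothetical callable that fetches the url into the cache file.
template <typename Cache, typename Download>
Outcome fetch_via_cache(Cache &cache, const std::string &url, Download download) {
  bool available = false;
  bool is_locked = false;
  if (!cache.Start(url, available, is_locked))
    return is_locked ? RETRY_LATER : FAILED; // locked: wait and call again
  if (available) {
    // a real caller would Link() or Copy() the file before releasing the lock
    return cache.Stop(url) ? USE_CACHED : FAILED;
  }
  if (!download(url)) {
    cache.StopAndDelete(url); // do not leave an incomplete copy behind
    return FAILED;
  }
  return cache.Stop(url) ? DOWNLOADED : FAILED;
}

// Stand-in used only to exercise the sketch; not the real Arc::FileCache.
struct FakeCache {
  bool cached; // pretend the file is already in the cache
  bool locked; // pretend another process holds the lock
  bool Start(const std::string &, bool &available, bool &is_locked, bool = true) {
    available = false;
    is_locked = locked;
    if (locked)
      return false;
    available = cached;
    return true;
  }
  bool Stop(const std::string &) { return true; }
  bool StopAndDelete(const std::string &) { return true; }
};

// Hypothetical download callbacks for the demonstration.
bool download_ok(const std::string &) { return true; }
bool download_fails(const std::string &) { return false; }
```

The important point is that every successful Start() is paired with exactly one Stop() or StopAndDelete(), whichever path the download takes.
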
Definition at line 185 of file FileCache.cpp.

                                                                                        {

    if (!(*this))
      return false;

    available = false;
    is_locked = false;
    std::string filename = File(url);
    std::string lock_file = _getLockFileName(url);

    // create directory structure if required, only readable by GM user
    if (!_cacheMkDir(lock_file.substr(0, lock_file.rfind("/")), false))
      return false;

    int lock_timeout = 86400; // one day timeout on lock TODO: make configurable?

    // locking mechanism:
    // - check if lock is there
    // - if not, create tmp file and check again
    // - if lock is still not there copy tmp file to cache lock file
    // - check pid inside lock file matches ours

    struct stat fileStat;
    int err = stat(lock_file.c_str(), &fileStat);
    if (0 != err) {
      if (errno == EACCES) {
        logger.msg(ERROR, "EACCES Error opening lock file %s: %s", lock_file, strerror(errno));
        return false;
      }
      else if (errno != ENOENT) {
        // some other error occurred opening the lock file
        logger.msg(ERROR, "Error opening lock file %s in initial check: %s", lock_file, strerror(errno));
        return false;
      }
      // lock does not exist - create tmp file
      std::string tmpfile = lock_file + ".XXXXXX";
      int h = Glib::mkstemp(tmpfile);
      if (h == -1) {
        logger.msg(ERROR, "Error creating file %s with mkstemp(): %s", tmpfile, strerror(errno));
        return false;
      }
      // write pid@hostname to the lock file
      std::string buf = _pid + "@" + _hostname;
      if (write(h, buf.c_str(), buf.length()) == -1) {
        logger.msg(ERROR, "Error writing to tmp lock file %s: %s", tmpfile, strerror(errno));
        // not much we can do if this doesn't work, but it is only a tmp file
        remove(tmpfile.c_str());
        close(h);
        return false;
      }
      if (close(h) != 0)
        // not critical as file will be removed after we are done
        logger.msg(WARNING, "Warning: closing tmp lock file %s failed", tmpfile);
      // check again if lock exists, in case creating the tmp file took some time
      err = stat(lock_file.c_str(), &fileStat);
      if (0 != err) {
        if (errno == ENOENT) {
          // ok, we can create lock
          if (rename(tmpfile.c_str(), lock_file.c_str()) != 0) {
            logger.msg(ERROR, "Error renaming tmp file %s to lock file %s: %s", tmpfile, lock_file, strerror(errno));
            remove(tmpfile.c_str());
            return false;
          }
          // check it's really there
          err = stat(lock_file.c_str(), &fileStat);
          if (0 != err) {
            logger.msg(ERROR, "Error renaming lock file, even though rename() did not return an error");
            return false;
          }
          // check the pid inside the lock file, just in case...
          if (!_checkLock(url)) {
            is_locked = true;
            return false;
          }
        }
        else if (errno == EACCES) {
          logger.msg(ERROR, "EACCES Error opening lock file %s: %s", lock_file, strerror(errno));
          remove(tmpfile.c_str());
          return false;
        }
        else {
          // some other error occurred opening the lock file
          logger.msg(ERROR, "Error opening lock file we just renamed successfully %s: %s", lock_file, strerror(errno));
          remove(tmpfile.c_str());
          return false;
        }
      }
      else {
        logger.msg(VERBOSE, "The file is currently locked with a valid lock");
        remove(tmpfile.c_str());
        is_locked = true;
        return false;
      }
    }
    else {
      // the lock already exists, check if it has expired
      // look at modification time
      time_t mod_time = fileStat.st_mtime;
      time_t now = time(NULL);
      logger.msg(VERBOSE, "%li seconds since lock file was created", now - mod_time);

      if ((now - mod_time) > lock_timeout) {
        logger.msg(VERBOSE, "Timeout has expired, will remove lock file");
        // TODO: kill the process holding the lock, only if we know it was the original
        // process which created it
        if (remove(lock_file.c_str()) != 0 && errno != ENOENT) {
          logger.msg(ERROR, "Failed to unlock file %s: %s", lock_file, strerror(errno));
          return false;
        }
        // lock has expired and has been removed. Try to remove cache file and call Start() again
        if (remove(filename.c_str()) != 0 && errno != ENOENT) {
          logger.msg(ERROR, "Error removing cache file %s: %s", filename, strerror(errno));
          return false;
        }
        return Start(url, available, is_locked, use_remote);
      }

      // lock is still valid, check if we own it
      FILE *pFile;
      char lock_info[100]; // should be long enough for a pid + hostname
      pFile = fopen((char*)lock_file.c_str(), "r");
      if (pFile == NULL) {
        // lock could have been released by another process, so call Start again
        if (errno == ENOENT) {
          logger.msg(VERBOSE, "Lock that recently existed has been deleted by another process, calling Start() again");
          return Start(url, available, is_locked, use_remote);
        }
        logger.msg(ERROR, "Error opening valid and existing lock file %s: %s", lock_file, strerror(errno));
        return false;
      }
      if (fgets(lock_info, 100, pFile) == NULL) {
        logger.msg(ERROR, "Error reading valid and existing lock file %s: %s", lock_file, strerror(errno));
        fclose(pFile);
        return false;
      }
      fclose(pFile);

      std::string lock_info_s(lock_info);
      std::string::size_type index = lock_info_s.find("@", 0);
      if (index == std::string::npos) {
        logger.msg(ERROR, "Error with formatting in lock file %s: %s", lock_file, lock_info_s);
        return false;
      }

      if (lock_info_s.substr(index + 1) != _hostname) {
        logger.msg(VERBOSE, "Lock is owned by a different host");
        // TODO: here do ssh login and check
        is_locked = true;
        return false;
      }
      std::string lock_pid = lock_info_s.substr(0, index);
      if (lock_pid == _pid)
        // safer to wait until lock expires than use cached file or re-download
        logger.msg(WARNING, "Warning: This process already owns the lock");
      else {
        // check if the pid owning the lock is still running - if not we can claim the lock
        // this is not really portable... but no other way to do it
        std::string procdir("/proc/");
        procdir = procdir.append(lock_pid);
        if (stat(procdir.c_str(), &fileStat) != 0 && errno == ENOENT) {
          logger.msg(VERBOSE, "The process owning the lock is no longer running, will remove lock");
          if (remove(lock_file.c_str()) != 0) {
            logger.msg(ERROR, "Failed to unlock file %s: %s", lock_file, strerror(errno));
            return false;
          }
          // lock has been removed. try to delete cache file and call Start() again
          if (remove(filename.c_str()) != 0 && errno != ENOENT) {
            logger.msg(ERROR, "Error removing cache file %s: %s", filename, strerror(errno));
            return false;
          }
          return Start(url, available, is_locked, use_remote);
        }
      }

      logger.msg(VERBOSE, "The file is currently locked with a valid lock");
      is_locked = true;
      return false;
    }

    // if we get to here we have acquired the lock

    // create the meta file to store the URL, if it does not exist
    std::string meta_file = _getMetaFileName(url);
    err = stat(meta_file.c_str(), &fileStat);
    if (0 == err) {
      // check URL inside file for possible hash collisions
      FILE *pFile;
      char mystring[fileStat.st_size+1];
      pFile = fopen((char*)_getMetaFileName(url).c_str(), "r");
      if (pFile == NULL) {
        logger.msg(ERROR, "Error opening meta file %s: %s", _getMetaFileName(url), strerror(errno));
        remove(lock_file.c_str());
        return false;
      }
      if (fgets(mystring, sizeof(mystring), pFile) == NULL) {
        logger.msg(ERROR, "Error reading valid and existing lock file %s: %s", lock_file, strerror(errno));
        fclose(pFile);
        return false;
      }
      fclose(pFile);

      std::string meta_str(mystring);
      // get the first line
      if (meta_str.find('\n') != std::string::npos)
        meta_str.resize(meta_str.find('\n'));

      std::string::size_type space_pos = meta_str.find(' ', 0);
      if (meta_str.substr(0, space_pos) != url) {
        logger.msg(ERROR, "Error: File %s is already cached at %s under a different URL: %s - this file will not be cached", url, filename, meta_str.substr(0, space_pos));
        remove(lock_file.c_str());
        return false;
      }
    }
    else if (errno == ENOENT) {
      // create new file
      FILE *pFile;
      pFile = fopen((char*)meta_file.c_str(), "w");
      if (pFile == NULL) {
        logger.msg(ERROR, "Failed to create info file %s: %s", meta_file, strerror(errno));
        remove(lock_file.c_str());
        return false;
      }
      fputs((char*)url.c_str(), pFile);
      fputs("\n", pFile);
      fclose(pFile);
      // make read/writeable only by GM user
      chmod(meta_file.c_str(), S_IRUSR | S_IWUSR);
    }
    else {
      logger.msg(ERROR, "Error looking up attributes of meta file %s: %s", meta_file, strerror(errno));
      remove(lock_file.c_str());
      return false;
    }
    // now check if the cache file is there already
    err = stat(filename.c_str(), &fileStat);
    if (0 == err)
      available = true;
      
    // if the file is not there. check remote caches
    else if (errno == ENOENT) {
      if (!use_remote) return true;    
      // get the hash of the url
      std::string hash = FileCacheHash::getHash(url);
    
      int index = 0;
      for(int level = 0; level < CACHE_DIR_LEVELS; level ++) {
        hash.insert(index + CACHE_DIR_LENGTH, "/");
        // go to next slash position, add one since we just inserted a slash
        index += CACHE_DIR_LENGTH + 1;
      }
      std::string remote_cache_file;
      std::string remote_cache_link;
      for (std::vector<struct CacheParameters>::iterator it = _remote_caches.begin(); it != _remote_caches.end(); it++) {
        std::string remote_file = it->cache_path+"/"+CACHE_DATA_DIR+"/"+hash;
        if (stat(remote_file.c_str(), &fileStat) == 0) {
          remote_cache_file = remote_file;
          remote_cache_link = it->cache_link_path;
          break;
        }
      }
      if (remote_cache_file.empty()) return true;
      
      logger.msg(INFO, "Found file %s in remote cache at %s", url, remote_cache_file);
      // if found, create lock file in remote cache
      std::string remote_lock_file = remote_cache_file+".lock";
      err = stat( remote_lock_file.c_str(), &fileStat );
      // if lock exists, exit
      if (0 == err) {
        logger.msg(VERBOSE, "File exists in remote cache at %s but is locked. Will download from source", remote_cache_file);
        return true;
      }
    
      // lock does not exist - create tmp file
      std::string remote_tmpfile = remote_lock_file + ".XXXXXX";
      int h = Glib::mkstemp(remote_tmpfile);
      if (h == -1) {
        logger.msg(WARNING, "Error creating tmp file %s for remote lock with mkstemp(): %s", remote_tmpfile, strerror(errno));
        return true;
      }
      // write pid@hostname to the lock file
      std::string buf2 = _pid + "@" + _hostname;
      if (write(h, buf2.c_str(), buf2.length()) == -1) {
        logger.msg(WARNING, "Error writing to tmp lock file for remote lock %s: %s", remote_tmpfile, strerror(errno));
        // not much we can do if this doesn't work, but it is only a tmp file
        remove(remote_tmpfile.c_str());
        close(h);
        return true;
      }
      if (close(h) != 0) {
        // not critical as file will be removed after we are done
        logger.msg(WARNING, "Warning: closing tmp lock file for remote lock %s failed", remote_tmpfile);
      }
      // check again if lock exists, in case creating the tmp file took some time
      err = stat( remote_lock_file.c_str(), &fileStat ); 
      if (0 != err) {
        if (errno == ENOENT) {
          // ok, we can create lock
          if (rename(remote_tmpfile.c_str(), remote_lock_file.c_str()) != 0) {
            logger.msg(WARNING, "Error renaming tmp file %s to lock file %s for remote lock: %s", remote_tmpfile, remote_lock_file, strerror(errno));
            remove(remote_tmpfile.c_str());
            return true;
          }
          // check it's really there
          err = stat( remote_lock_file.c_str(), &fileStat ); 
          if (0 != err) {
            logger.msg(WARNING, "Error renaming lock file for remote lock, even though rename() did not return an error: %s", strerror(errno));
            return true;
          }
        }
        else {
          // some error occurred opening the lock file
          logger.msg(WARNING, "Error opening lock file for remote lock we just renamed successfully %s: %s", remote_lock_file, strerror(errno));
          remove(remote_tmpfile.c_str());
          return true;
        }
      }
      else {
        logger.msg(VERBOSE, "The remote cache file is currently locked with a valid lock, will download from source");
        remove(remote_tmpfile.c_str());
        return true;
      }
      
      // we have locked the remote file - so find out what to do with it
      if (remote_cache_link == "replicate") {
        // copy the file to the local cache, remove remote lock and exit with available=true
        logger.msg(VERBOSE, "Replicating file %s to local cache file %s", remote_cache_file, filename);
          // do the copy - taken directly from old datacache.cc
        char copybuf[65536];
        int fdest = open(filename.c_str(), O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
        if(fdest == -1) {
          logger.msg(ERROR, "Failed to create file %s for writing: %s",filename, strerror(errno));
          return false;
        };
        
        int fsource = open(remote_cache_file.c_str(), O_RDONLY);
        if(fsource == -1) {
          close(fdest);
          logger.msg(ERROR, "Failed to open file %s for reading: %s", remote_cache_file, strerror(errno));
          return false;
        };
        
        // source and dest opened ok - copy in chunks
        for(;;) {
          ssize_t lin = read(fsource, copybuf, sizeof(copybuf));
          if(lin == -1) {
            close(fdest); close(fsource);
            logger.msg(ERROR, "Failed to read file %s: %s", remote_cache_file, strerror(errno));
            return false;
          };
          if(lin == 0) break; // eof
          
          for(ssize_t lout = 0; lout < lin;) {
            ssize_t lwritten = write(fdest, copybuf+lout, lin-lout);
            if(lwritten == -1) {
              close(fdest); close(fsource);
              logger.msg(ERROR, "Failed to write file %s: %s", filename, strerror(errno));
              return false;
            };
            lout += lwritten;
          };
        };
        close(fdest); close(fsource);
        if (remove(remote_lock_file.c_str()) != 0) {
          logger.msg(ERROR, "Failed to remove remote lock file %s: %s. Some manual intervention may be required", remote_lock_file, strerror(errno));
          return true;
        }
      }
      // create symlink from file in this cache to other cache
      else {
        logger.msg(VERBOSE, "Creating temporary link from %s to remote cache file %s", filename, remote_cache_file);
        if (symlink(remote_cache_file.c_str(), filename.c_str()) != 0) {
          logger.msg(ERROR, "Failed to create soft link to remote cache: %s. Will download %s from source", strerror(errno), url);
          if (remove(remote_lock_file.c_str()) != 0) {
            logger.msg(ERROR, "Failed to remove remote lock file %s: %s. Some manual intervention may be required", remote_lock_file, strerror(errno));
          }
          return true;
        }
      }
      available = true;
    }
    else {
      // this is ok, we will download again
      logger.msg(WARNING, "Warning: error looking up attributes of cached file: %s", strerror(errno));
    }
    return true;
  }
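
The replication branch above copies the remote cache file in fixed-size chunks and retries short writes. A minimal standalone sketch of that pattern follows. The helper name `copy_file_exclusive` is hypothetical and not part of Arc::FileCache. Unlike the listing, the sketch opens the source first, retries on EINTR, and removes a partial destination on failure; these are assumptions of the example, not behaviour of the library.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not part of Arc::FileCache): copies src to dest and
// fails if dest already exists. Returns true on success.
bool copy_file_exclusive(const std::string& src, const std::string& dest) {
  int fsource = open(src.c_str(), O_RDONLY);
  if (fsource == -1) return false;

  // O_EXCL makes creation fail if another process already created dest
  int fdest = open(dest.c_str(), O_WRONLY | O_CREAT | O_EXCL,
                   S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
  if (fdest == -1) { close(fsource); return false; }

  char buf[65536];
  bool ok = true;
  while (ok) {
    ssize_t lin = read(fsource, buf, sizeof(buf));
    if (lin == -1) {
      if (errno == EINTR) continue; // interrupted, try again
      ok = false;
      break;
    }
    if (lin == 0) break; // end of file

    // write() may accept fewer bytes than requested, so loop until done
    for (ssize_t lout = 0; lout < lin;) {
      ssize_t lwritten = write(fdest, buf + lout, lin - lout);
      if (lwritten == -1) {
        if (errno == EINTR) continue;
        ok = false;
        break;
      }
      lout += lwritten;
    }
  }
  close(fsource);
  if (close(fdest) != 0) ok = false;
  if (!ok) unlink(dest.c_str()); // do not leave a partial copy behind
  return ok;
}
```

A caller would invoke it as `copy_file_exclusive(remote_cache_file, filename)` and treat a false return as "download from source instead".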

bool Arc::FileCache::Stop ( std::string  url)

This method (or StopAndDelete()) must be called after the file was downloaded or the download failed, to release the lock on the cache file.

Stop() does not delete the cache file. It returns false if the lock file does not exist, if another pid was found inside the lock file (meaning another process took over the lock, so this process must go back to Start()), or if it fails to delete the lock file.

Parameters:
url   the url of the file that was downloaded

Definition at line 572 of file FileCache.cpp.
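
The three failure cases described for Stop() can be illustrated with a small standalone sketch. The function name `release_lock_if_owner` and the lock file layout (a single line holding an owner token such as `pid@hostname`) are assumptions made for this example; the actual check is done by the private `_checkLock()` method, whose implementation is not shown here.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: removes lock_path only if its first line equals owner.
// The lock file format is an assumption for illustration.
bool release_lock_if_owner(const std::string& lock_path, const std::string& owner) {
  std::ifstream in(lock_path.c_str());
  if (!in) return false;              // case 1: lock file does not exist

  std::string found;
  std::getline(in, found);
  in.close();
  if (found != owner) return false;   // case 2: another process owns the lock

  // case 3: false if the lock file cannot be deleted
  return std::remove(lock_path.c_str()) == 0;
}
```

In the second case the caller no longer holds the lock and, as the description says, has to start again from Start().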

                                    {

    if (!(*this))
      return false;

    // if cache file is a symlink, remove remote cache lock and symlink
    std::string filename = File(url);
    struct stat fileStat;
    if (lstat(filename.c_str(), &fileStat) == 0 && S_ISLNK(fileStat.st_mode)) {
      char buf[1024];
      int link_size = readlink(filename.c_str(), buf, sizeof(buf));
      if (link_size == -1) {
        logger.msg(ERROR, "Could not read target of link %s: %s. Manual intervention may be required to remove lock in remote cache", filename, strerror(errno));
        return false;
      }
      // readlink() does not null-terminate buf, so construct the string with an explicit length
      std::string remote_lock(buf, link_size); remote_lock += ".lock";
      if (remove(remote_lock.c_str()) != 0 && errno != ENOENT) {
        logger.msg(ERROR, "Failed to unlock remote cache lock %s: %s. Manual intervention may be required", remote_lock, strerror(errno));
        return false;
      }
      if (remove(filename.c_str()) != 0) {
        logger.msg(ERROR, "Error removing file %s: %s. Manual intervention may be required", filename, strerror(errno));
        return false;
      }
    }
    
    // check the lock is ok to delete
    if (!_checkLock(url))
      return false;

    // delete the lock
    if (remove(_getLockFileName(url).c_str()) != 0) {
      logger.msg(ERROR, "Failed to unlock file with lock %s: %s", _getLockFileName(url), strerror(errno));
      return false;
    }
    // get the hash of the url
    std::string hash = FileCacheHash::getHash(url);
    int index = 0;
    for(int level = 0; level < CACHE_DIR_LEVELS; level ++) {
      hash.insert(index + CACHE_DIR_LENGTH, "/");
      // go to next slash position, add one since we just inserted a slash
      index += CACHE_DIR_LENGTH + 1;
    }
    
    // remove the file from the cache map
    _cache_map.erase(hash);
    return true;
  }
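
When the cache file is a symlink into a remote cache, the remote lock name is the link target plus the ".lock" suffix. Because readlink() does not null-terminate its buffer, the returned length has to be used when building the string. A minimal sketch of that step, with hypothetical helper names `read_link_target` and `remote_lock_name` and an assumed 4096-byte buffer:

```cpp
#include <string>
#include <unistd.h>

// Hypothetical helper: returns the target of the symlink at path, or an
// empty string if path is not a readable symlink or the target may have
// been truncated.
std::string read_link_target(const std::string& path) {
  char buf[4096];
  ssize_t len = readlink(path.c_str(), buf, sizeof(buf));
  if (len == -1) return "";
  // a result that fills the buffer may have been cut short
  if (static_cast<size_t>(len) == sizeof(buf)) return "";
  // readlink() does not append '\0', so pass the length explicitly
  return std::string(buf, static_cast<size_t>(len));
}

// Hypothetical helper: derives the remote lock file name for a cache file
// that is a symlink into a remote cache.
std::string remote_lock_name(const std::string& cache_file) {
  std::string target = read_link_target(cache_file);
  if (target.empty()) return "";
  return target + ".lock";
}
```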

bool Arc::FileCache::StopAndDelete ( std::string  url)

Release the cache file and delete it, because for example a failed download left an incomplete copy, or it has expired.

This method also deletes the meta file which contains the url corresponding to the cache file. The logic of the return value is the same as Stop().

Parameters:
url   the url corresponding to the cache file that has to be released and deleted

Definition at line 621 of file FileCache.cpp.

                                             {

    if (!(*this))
      return false;
    
    // if cache file is a symlink, remove remote cache lock
    std::string filename = File(url);
    struct stat fileStat;
    if (lstat(filename.c_str(), &fileStat) == 0 && S_ISLNK(fileStat.st_mode)) {
      char buf[1024];
      int link_size = readlink(filename.c_str(), buf, sizeof(buf));
      if (link_size == -1) {
        logger.msg(ERROR, "Could not read target of link %s: %s. Manual intervention may be required to remove lock in remote cache", filename, strerror(errno));
        return false;
      }
      // readlink() does not null-terminate buf, so construct the string with an explicit length
      std::string remote_lock(buf, link_size); remote_lock += ".lock";
      if (remove(remote_lock.c_str()) != 0 && errno != ENOENT) {
        logger.msg(ERROR, "Failed to unlock remote cache lock %s: %s. Manual intervention may be required", remote_lock, strerror(errno));
        return false;
      }
    }

    // check the lock is ok to delete, and if so, remove the file and the
    // associated lock
    if (!_checkLock(url))
      return false;

    // delete the cache file
    if (remove(filename.c_str()) != 0 && errno != ENOENT) {
      logger.msg(ERROR, "Error removing cache file %s: %s", filename, strerror(errno));
      return false;
    }

    // delete the meta file - not critical so don't fail on error
    if (remove(_getMetaFileName(url).c_str()) != 0)
      logger.msg(ERROR, "Failed to remove meta file %s: %s", _getMetaFileName(url), strerror(errno));

    // delete the lock
    if (remove(_getLockFileName(url).c_str()) != 0) {
      logger.msg(ERROR, "Failed to unlock file with lock %s: %s", _getLockFileName(url), strerror(errno));
      return false;
    }
    
    // get the hash of the url
    std::string hash = FileCacheHash::getHash(url);
    int index = 0;
    for(int level = 0; level < CACHE_DIR_LEVELS; level ++) {
      hash.insert(index + CACHE_DIR_LENGTH, "/");
      // go to next slash position, add one since we just inserted a slash
      index += CACHE_DIR_LENGTH + 1;
    }
  
    // remove the file from the cache map
    _cache_map.erase(hash);
    return true;
  }


Member Data Documentation

std::map<std::string, int> Arc::FileCache::_cache_map [private]

Map of the cache files and the cache directories.

Definition at line 70 of file FileCache.h.

std::vector<struct CacheParameters> Arc::FileCache::_caches [private]

Vector of caches.

Each entry defines a cache and specifies a cache directory and optional link path.

Definition at line 75 of file FileCache.h.

std::vector<struct CacheParameters> Arc::FileCache::_draining_caches [private]

Vector of caches to be drained.

Definition at line 84 of file FileCache.h.

gid_t Arc::FileCache::_gid [private]

Definition at line 94 of file FileCache.h.

std::string Arc::FileCache::_hostname [private]

Our hostname (same as given by uname -n).

Definition at line 98 of file FileCache.h.

std::string Arc::FileCache::_id [private]

Identifier used to claim files, i.e. the job id.

Definition at line 88 of file FileCache.h.

The max and min used space for caches, as a percentage of the file system.

Definition at line 107 of file FileCache.h.

Definition at line 108 of file FileCache.h.

std::string Arc::FileCache::_pid [private]

Our pid.

Definition at line 102 of file FileCache.h.

std::vector<struct CacheParameters> Arc::FileCache::_remote_caches [private]

Vector of remote caches.

Each entry defines a cache and specifies a cache directory, per-job directory and link/copy information.

Definition at line 80 of file FileCache.h.

uid_t Arc::FileCache::_uid [private]

owner:group corresponding to the user running the job.

The directory with hard links to cached files will be searchable only by this user.

Definition at line 93 of file FileCache.h.

const std::string Arc::FileCache::CACHE_DATA_DIR = "data" [static, private]

The sub-dir of the cache for data.

Definition at line 112 of file FileCache.h.

const int Arc::FileCache::CACHE_DEFAULT_AUTH_VALIDITY = 86400 [static, private]

Default validity time of cached DNs.

Definition at line 136 of file FileCache.h.

const int Arc::FileCache::CACHE_DIR_LENGTH = 2 [static, private]

The length of each cache subdirectory.

Definition at line 120 of file FileCache.h.

const int Arc::FileCache::CACHE_DIR_LEVELS = 1 [static, private]

The number of levels of cache subdirectories.

Definition at line 124 of file FileCache.h.
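
CACHE_DIR_LENGTH and CACHE_DIR_LEVELS control how the hash of a url is split into subdirectories: Stop() and StopAndDelete() insert a "/" after every CACHE_DIR_LENGTH characters, CACHE_DIR_LEVELS times. A standalone sketch of that loop follows. The function name `hash_to_relative_path` is hypothetical, and the guard against short hashes is an addition for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the slash-insertion loop in Stop().
// dir_length and dir_levels default to the values of CACHE_DIR_LENGTH
// and CACHE_DIR_LEVELS given in this reference.
std::string hash_to_relative_path(std::string hash,
                                  std::size_t dir_length = 2,
                                  std::size_t dir_levels = 1) {
  std::size_t index = 0;
  for (std::size_t level = 0; level < dir_levels; ++level) {
    // stop if the hash is too short to split further (guard added here)
    if (index + dir_length >= hash.size()) break;
    hash.insert(index + dir_length, "/");
    // go to next slash position, add one since we just inserted a slash
    index += dir_length + 1;
  }
  return hash;
}
```

With the defaults, `hash_to_relative_path("abcdef")` yields `"ab/cdef"`; with two levels it yields `"ab/cd/ef"`.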

const std::string Arc::FileCache::CACHE_JOB_DIR = "joblinks" [static, private]

The sub-dir of the cache for per-job links.

Definition at line 116 of file FileCache.h.

const std::string Arc::FileCache::CACHE_LOCK_SUFFIX = ".lock" [static, private]

The suffix to use for lock files.

Definition at line 128 of file FileCache.h.

const std::string Arc::FileCache::CACHE_META_SUFFIX = ".meta" [static, private]

The suffix to use for meta files.

Definition at line 132 of file FileCache.h.

Logger Arc::FileCache::logger [static, private]

Logger for messages.

Definition at line 181 of file FileCache.h.


The documentation for this class was generated from the following files: