Post the First: Turbocharging MySQL Queries With Memcached

This is a function which I created that leverages¬†memcached to perform automatic caching of MySQL queries. This assumes that you have created functions memcache_get, memcache_set, and memcache_delete — personally, I just have these as wrapper functions for the native memcached functions, with some extra bells and whistles (attempts to connect to memcached server if not already connected, that sort of thing).

This is not quite a drop-in replacement for mysql_query, but it’s close — instead of using mysql_query() and then while (mysql_fetch_array()), you can use memcached_query() and then a foreach(). There are, however, a few other caveats:

Caveat 1: The “is this query currently being re-cached by someone else?” check and set operations are separate. In a very high-concurrency environment, this means that multiple threads could attempt to regenerate the same query simultaneously. I believe there is a way to resolve this using Memcached::cas(), but I found the documentation for that function a bit confusing, and it’s not that important for our environment anyway. If you know a way to do this more properly, let me know!

Caveat 2: This function assumes that queries will take no longer than 5 seconds. This should be a fairly safe assumption, but if not, then the function would need to be modified.

Caveat 3: There is no way to manually invalidate the cache for a given query. If you need to do this, I suggest making a separate function to get the hash for a given query.

/**
* Looks for results of given query in cache, or runs query if not in cache.
* Stores 500 rows max (since these results are being stored in memory, we don't
* want to be storing thousands of rows).
*
* Has mechanism to prevent cache stampedes -- detects if another thread is
* already regenerating this query, and if so waits a bit for it to finish.
*
* @param string $sql_select_query The SELECT query to perform.
* @param int $cache_length Length to keep query results in cache (seconds)
* @param string $custom_key Custom memcache key to use for this query; will
* default to 'query-'.md5($sql_select_query) if not specified.
* @param int $max_timeout_ms In case of cache contention, max time to wait for
* other "thread" to finish executing
* @param string $memcache_var_name Name of the global Memcached object to use for this
* operation.
* @param string $db_link Name of the variable to pass to tep_db_query.
* @return Array of results (or -1 if not cached and timed out waiting for
* another thread to generate cache)
*/
function memcached_query($sql_select_query, $cache_length = 3600,
        $custom_key = NULL, $max_timeout_ms = 1000, $limit_results = 500,
        $memcache_var_name = 'memcache', $db_link = 'db_link') {
    
    global $$memcache_var_name;
    
    // Make sure this is a select statement; cannot cache updates/inserts
    if (strtolower(substr($sql_select_query, 0, 6)) != 'select') {
        return -1;
    }
    
    // Limit to 500 results if no limit specified
    // TODO: Account for "limit" happening to be in query string somewhere else.
    if (!stristr($sql_select_query, 'LIMIT')) {
        $sql_select_query .= ' LIMIT '.(int)$limit_results;
    }
    
    $key = $custom_key;
    if ($key === NULL) {
        $key = 'query-'.md5($sql_select_query);
    }
    
    $ret_array = memcache_get($key, true, $memcache_var);
    if ($ret_array !== false) {
        return $ret_array;
    } else {        
        
        // Check and see if anyone else is caching this
        $caching = memcache_get('query-running-'.md5($sql_select_query), true, $memcache_var);
        
        // If query not already running, or memcached disabled, run query again
        if (!is_object($$memcache_var)
            || $$memcache_var->getResultCode() == Memcached::RES_NOTFOUND
            || ($_GET['force_recache'] == 1 && customer_is_admin())
            ) {           
            
            // Set 'caching' flag, but it expires after 5s, in case this process
            // is interrupted mid-regen
            memcache_set('query-running-'.md5($sql_select_query), '1', 5);
            
            $query = tep_db_query($sql_select_query, $db_link);
            $rows = tep_db_num_rows($query);
            $ret_array = array();
            $count = 0;
            if ($rows > 0) {
                while ($count < $limit_results && $value = tep_db_fetch_array($query)) {
                    $ret_array[] = $value;
                    $count++;
                }
            }
            
            // Store result in cache
            memcache_set($key, $ret_array, $cache_length, true, $memcache_var);
            
            // Un-set "regenerating cache" flag
            memcache_delete('query-running-'.md5($sql_select_query), true, $memcache_var);
    
        } else {
            
            // Someone else already caching; retry for a bit
            $retry_count = 0;
            $step = 100000; // Try another read every 0.1s (100,000 milliseconds)
            do {
                
                usleep($step);
                $ret_array = memcache_get($key, true, $memcache_var);
                $retry_count++;
                
            }

            while ($$memcache_var->getResultCode() == Memcached::RES_NOTFOUND && $retry_count*$step/1000 < $max_timeout_ms) {};
            
            if ($$memcache_var->getResultCode() == Memcached::RES_NOTFOUND) {
                return -1;
            }
            
        }
        
        return $ret_array;
        
    }
    
    
    
}

Category(s): Code Snippets
Tags: , , , ,

2 Responses to Post the First: Turbocharging MySQL Queries With Memcached

    James Stoddern says:

    Hi, thanks for the article.

    You make the assumption that we have written a memcache_get…etc

    ” This assumes that you have created functions memcache_get, memcache_set, and memcache_delete”

    As a newbie to all of this, would you be able to provide this code as well so that it can all be incorporated into one class?

    That would be great.

    Thanks again, James

    • Those are just basic wrapper functions for the memcached::get() etc. functions. So you could just replace those with $$memcache_var->get(), $$memcache_var->set(), and so forth. In our application, we have those functions doing a few extra things, such as checking for memcache connection and attempting to create it if it doesn’t exist, prefacing keys with database names, etc. But none of that is by any means necessary.

Leave a Reply

Your email address will not be published. Required fields are marked *