Caching of blob data
====================

(This test is adapted from ZEO/tests/zeo_blob_cache.test in ZODB3.)

We support 2 modes for providing clients access to blob data:

shared
    Blob data are shared via a network file system. The client shares
    a common blob directory with the server.

non-shared
    Blob data are loaded from the database and cached locally. A
    maximum size for the blob data can be set and data are removed
    when the size is exceeded.

In this test, we'll demonstrate that blob data are removed from a blob
cache when the amount of data stored exceeds a given limit.

Let's start by setting up some data:
>>> blob_storage = create_storage(blob_dir='blobs',
... blob_cache_size=3000, blob_cache_size_check=10)
>>> from ZODB.DB import DB
>>> db = DB(blob_storage)

Here, we passed a blob_cache_size parameter, which specifies a target
blob cache size. This is not a hard limit, but rather a target; it
defaults to no limit. We also passed a blob_cache_size_check option,
which specifies the number of bytes, as a percent of the target, that
can be written or downloaded from the server before the cache size is
checked. The blob_cache_size_check option defaults to 100. We passed
10, to check after writing 10% of the target size.
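
With these settings, the cache size is rechecked after roughly 300 bytes
of blob traffic. A back-of-the-envelope calculation (illustration only,
not part of the test):

    blob_cache_size = 3000        # target cache size in bytes
    blob_cache_size_check = 10    # percent of the target
    check_threshold = blob_cache_size * blob_cache_size_check // 100
    # check_threshold == 300 bytes written/downloaded between size checks
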
.. We're going to wait for any threads we started to finish, so...

>>> import threading
>>> old_threads = list(threading.enumerate())

We want to check for name collisions in the blob cache dir. We'll try
to provoke name collisions by reducing the number of cache directory
subdirectories.

>>> import relstorage.blobhelper
>>> orig_blob_cache_layout_size = relstorage.blobhelper.BlobCacheLayout.size
>>> relstorage.blobhelper.BlobCacheLayout.size = 11
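
With only 11 subdirectories available for 100 blobs, collisions in the
cache layout are guaranteed by the pigeonhole principle. A rough
illustration (the real layout hashes OIDs into buckets; the details
differ):

    buckets = {}
    for oid in range(1, 101):
        buckets.setdefault(oid % 11, []).append(oid)
    # all 11 buckets end up holding several oids apiece, so distinct
    # blobs are forced to share cache subdirectories
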
Now, let's write some data:
>>> import ZODB.blob, transaction, time
>>> conn = db.open()
>>> for i in range(1, 101):
...     conn.root()[i] = ZODB.blob.Blob()
...     with conn.root()[i].open('w') as f: _ = f.write((str(i)*100).encode('latin-1'))
>>> transaction.commit()

We've committed just over 19,000 bytes of blob data (100 blobs of
100-300 bytes each), but our target size is 3000. We expect to have not
much more than the target size in the cache blob directory.
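
The exact total, for reference (an aside, not part of the test):

    sum(len(str(i)) * 100 for i in range(1, 101))   # == 19200 bytes
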
>>> import os
>>> def cache_size(d):
...     # Add up the sizes of all the .blob files under the cache directory.
...     size = 0
...     for base, dirs, files in os.walk(d):
...         for f in files:
...             if f.endswith('.blob'):
...                 try:
...                     size += os.stat(os.path.join(base, f)).st_size
...                 except OSError:
...                     # Cache cleanup may remove a file between walk() and
...                     # stat(); only re-raise if the file is still there.
...                     if os.path.exists(os.path.join(base, f)):
...                         raise
...     return size
>>> cache_size('blobs') > 2000
True
>>> def check():
...     return cache_size('blobs') < 5000

>>> def onfail():
...     return cache_size('blobs')
>>> from ZEO.tests.forker import wait_until
>>> wait_until("size is reduced", check, 99, onfail)
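
wait_until is the ZEO test helper that polls the check callable until it
returns a true value, or fails after the timeout (99 seconds here) and
reports onfail(). A minimal sketch of that polling behaviour (an
approximation, not ZEO's actual implementation):

    import time
    def poll_until(label, func, timeout, onfail):
        deadline = time.time() + timeout
        while not func():
            if time.time() > deadline:
                raise AssertionError(label, onfail())
            time.sleep(0.01)
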
If we read all of the blobs, data will be downloaded again, as
necessary, but the cache size will remain not much bigger than the
target:
>>> for i in range(1, 101):
...     with conn.root()[i].open() as f: data = f.read()
...     if data != (str(i)*100).encode('ascii'):
...         print('bad data', str(i), data)

>>> wait_until("size is reduced", check, 99, onfail)

>>> for i in range(1, 101):
...     with conn.root()[i].open() as f: data = f.read()
...     if data != (str(i)*100).encode('ascii'):
...         print('bad data', str(i), data)

>>> for i in range(1, 101):
...     with conn.root()[i].open('c') as f: data = f.read()
...     if data != (str(i)*100).encode('ascii'):
...         print('bad data', str(i), data)
>>> wait_until("size is reduced", check, 99, onfail)

Now let's see if we can stress things a bit. We'll create many client
connections and get them to pound on the blobs all at once to see if we
can provoke problems:

>>> import threading, random
>>> def run():
...     conn = db.open()
...     for i in range(300):
...         time.sleep(0)
...         i = random.randint(1, 100)
...         with conn.root()[i].open() as f: data = f.read()
...         if data != (str(i)*100).encode('ascii'):
...             print('bad data', str(i), data)
...         i = random.randint(1, 100)
...         with conn.root()[i].open('c') as f: data = f.read()
...         if data != (str(i)*100).encode('ascii'):
...             print('bad data', str(i), data)
...     conn.close()
>>> threads = [threading.Thread(target=run) for i in range(10)]
>>> for thread in threads:
...     thread.daemon = True
>>> for thread in threads:
...     thread.start()
>>> for thread in threads:
...     thread.join(99)
...     if thread.is_alive():
...         print("Can't join thread.")
>>> wait_until("size is reduced", check, 99, onfail)

.. cleanup

>>> for thread in threading.enumerate():
...     if thread not in old_threads:
...         thread.join(33)
>>> db.close()
>>> relstorage.blobhelper.BlobCacheLayout.size = orig_blob_cache_layout_size