MegaMap
A simple, unbounded hashtable for Java
Please be sure to read the Tutorial section before reading this section.

How MegaMap Works
MegaMap employs the following data structure to achieve its ends:

  • A HashSet is used to store all keys for easy access.
  • Each (key, value) pair is inserted in to a ReferenceMap. This special map implementation uses references to allow the garbage collector to reclaim elements as necessary.
  • Each (key, value) pair is also persisted to a disk cache with Ehcache.
The result is a simple look-up algorithm: first the ReferenceMap is checked to see if the object resides in main memory. If it does not, then the object is pulled from the disk cache (after first checking the HashSet to ensure that the disk cache search will not be a waste of effort).

Each MegaMap has a background thread associated with it. This thread handles the task of storing the elements on disk. This ensures that all disk writes happen in the background; that is, the application will not block on cache writes.

The MegaMapManager keeps a directory of all MegaMaps that have been created. This is done to ensure that two MegaMaps with the same name aren't created. This could result in data files being overwritten. It is also done to allow a single method call to shut down all MegaMaps cleanly.

Persistent MegaMaps
MegaMaps can be configured to persist between virtual machine invocations, resulting in a persistent hashtable implementation. However, there is a major caveat that the developer must be aware of before using this feature:

If the virtual machine is not shut down cleanly, then there is a chance that the MegaMap will become corrupted and be unusable.

For this reason, MegaMap should not be used to persist data of lasting value.

There are three ways to ensure that a MegaMap is cleanly shut down. The first is to call MegaMapManager.removeMegaMap(String name). This will ensure that the map is properly persisted. The second is to call MegaMapManager.shutdown(). This shuts down all MegaMaps and ensures that they are properly persisted. The final method is to call java.lang.System.runFinalizersOnExit(true). This will ensure that finalizer is called on all objects before the virtual machine exists. This will include MegaMapManager's finalizer, which will in turn call MegaMapManager.shutdown(). However, System.runFinalizersOnExit has been a deprecated method since Java 1.2. Furthermore, it will not guarantee that finalizers are called if the virtual machine aborts abnormally.

Performance Considerations
This section qualitatively examines the performance of the methods in the MegaMap class.

  • MegaMap.put(Serializable key, Serializable value)
    This results in the placement of the key and value in the ReferenceMap, and the key into the HashSet. It also creates an action notification for the background thread. The background thread will then index the object and serialize it to disk. The immediate performance of this method is quick. Placement of a very large object will eventually result in heavy disk activity, however.
  • MegaMap.get(Serializable key)
    If the value has not been garbage collected, then the get operation will be very quick. Otherwise, it must fetch the object from disk and deserialize it, which will be expensive. It is difficult to make assumptions about the behavior of the garbage collector. For example, it is not necessarily the case that it will have behaivior similar to a "Least Recently Used" algorithm, nor is it a safe assumption that small objects are unlikely to be garbage collected. A future version of MegaMap may attempt to address the unreliability of this method's performance.
  • MegaMap.remove(Serializable key)
    The performance of this method is similar to that of put, only in this case, the background disk activity will be as a result of a delete operation.
  • MegaMap.hasKey(Serializable key)
    This method only uses the in-memory HashSet to determine if a key is present in the data structure. So, its performance is guaranteed to be fast. It should always be used to check the existence of an object in the MegaMap rather than comparing the result of a get to null.
  • MegaMap.getKeys()
    This method returns a set of all the keys. It makes a copy of the original HashSet so as to prevent corruption of its data. So, the time it takes is proportional to the number of keys in the MegaMap.

Credits
MegaMap was developed by John Watkinson.

MegaMap uses Ehcache for its disk storage, developed by Greg Luck and contributors. A list of contributors to Ehcache is available here.

MegaMap also uses Apache Commons Collections and Apache Commons Logging.

The project is hosted by SourceForge.

SourceForge.net Logo