Welcome to the Condensed Collection Library for .NET |
[This is preliminary documentation and is subject to change.]
The Condensed Collection Library for .NET provides the CondensedCollection class, an IList<T> implementation that uses interning (deduplication) for efficient storage of large numbers of immutable elements.
Build from source: https://github.com/markwaterman/CondensedDotNet
The CondensedCollection class provides a generalized form of interning for any immutable .NET type. Use this collection like an ordinary list, keeping the following guidelines in mind:
The type stored in the collection immutable.
The collection must be able to properly compare elements for equality.
If the contents of the collection are subject to change after the initial load, you might need to think about memory reclamation from the collection's intern pool.
// Add 100 million dates spanning December 2016: var cc = new Condensed.CondensedCollection<DateTime>(capacity: 100000000); for (int i = 0; i < 100000000; i++) cc.Add(new DateTime(2016, 12, i % 30 + 1)); // Memory usage: 100 MB (compare to ~780 MB for an ordinary List<DateTime>)
Internally, the CondensedCollection uses a variable-width index to reference the unique values in its intern pool. Given sufficient repetition in your collection, this can provide significant memory savings even when storing small types. In the example above, references to the 31 unique days in a month can be held in an internal index that is just one byte wide.
Interning items can reduce memory consumption and can often improve performance for certain compute-intensive tasks. The trade-off is that modifications to the collection are slower. The Wikipedia article on string interning offers good background on the performance/space tradeoffs involved.
The CondensedCollection provides the following additional features:
Specialized LINQ operators in the Condensed.Linq namespace that are optimized to work on the collection. (Add a "using Condensed.Linq;" statement to source files that need to use these extension methods.)
Optional automatic fallback ("cutover") to non-deduplicated list behavior if the items in the collection aren't sufficiently unique to make the interning overhead worthwhile.
Support for nullable types.
Flexible options for reclaiming memory.
CondensedCollection is not thread-safe and does not perform any internal synchronization. Multiple simultaneous readers are allowed, provided there is no active writer. An exclusive lock or a ReaderWriterLockSlim should be used to synchronize multi-threaded access to each instance of the collection.