Cutover
[This is preliminary documentation and is subject to change.]
Pass a "cutover" predicate to the CondensedCollection's constructor so that the collection reverts to ordinary, non-deduplicated list behavior if its population turns out to have too many unique values.
The Condensed Library provides standard cutover predicates for common types.
By default, the CondensedCollection does not perform cutover.
A CondensedCollection incurs significant overhead for each unique item in its intern pool (roughly 70-80 bytes), plus an additional copy of, or reference to, each interned value in the collection's internal lookup table. This overhead is typically amortized when you're working with a very large collection that has many repeated elements, but too many unique values will cause your memory usage to skyrocket.
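To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It is not the library's actual memory model: the 75-byte figure is simply the midpoint of the range quoted above, the "two copies of each unique value" term is a simplification of the intern pool plus lookup table, and the class and method names are hypothetical.

```csharp
using System;

public static class CondensedMemoryEstimate
{
    // Assumption: midpoint of the 70-80 byte per-unique-item overhead quoted above.
    private const long AssumedOverheadPerUnique = 75;

    // Rough footprint of a plain list: one element-sized slot per item.
    public static long PlainListBytes(long count, int elementSize)
    {
        return count * elementSize;
    }

    // Rough footprint of a deduplicated collection (simplified assumption):
    // one index per item, plus for each unique value the fixed overhead and
    // two copies of the value (intern pool entry and lookup table entry).
    public static long CondensedBytes(long count, long uniqueCount, int elementSize, int indexWidth)
    {
        return count * indexWidth
             + uniqueCount * (AssumedOverheadPerUnique + 2L * elementSize);
    }

    public static void Main()
    {
        const long count = 1000000;   // one million Int32 elements
        const int elementSize = 4;    // sizeof(int)
        const int indexWidth = 2;     // 2-byte index

        // 4,000,000 bytes for the plain list.
        Console.WriteLine("Plain list:              {0:N0} bytes", PlainListBytes(count, elementSize));

        // 1,000 unique values: 2,000,000 + 1,000 * 83 = 2,083,000 bytes.
        Console.WriteLine("Condensed, 1,000 unique: {0:N0} bytes", CondensedBytes(count, 1000, elementSize, indexWidth));

        // 60,000 unique values: 2,000,000 + 60,000 * 83 = 6,980,000 bytes.
        Console.WriteLine("Condensed, 60,000 unique: {0:N0} bytes", CondensedBytes(count, 60000, elementSize, indexWidth));
    }
}
```

Under these assumptions, a million Int32 values with only a thousand distinct values take about half the space of a plain list, while the same population with sixty thousand distinct values takes noticeably more. Real numbers will differ, so treat this only as an illustration of why the unique-value count matters.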
In short, if you expect your collection to always have a lot of unique values, don't use a CondensedCollection--just use a normal List<T> or some other appropriate collection.
However, there may be situations when your app doesn't know anything about its incoming workload, but you want to take advantage of a CondensedCollection's behavior as a nice-to-have optimization if the elements happen to be sufficiently repetitive. In this case, registering a cutover callback can give you the flexibility you need.
A cutover callback is a user-supplied delegate that's occasionally called by a CondensedCollection as items are added/inserted/updated. Statistics are provided to your callback, and if you decide that the population is too diverse, the collection stops performing its internal deduplication and starts storing your objects like an ordinary list.
In the simplest case, say you're using a CondensedCollection to store Int32 values. A reasonable cutover predicate would look like this:
    var cutover = new Predicate<CondensedStats>(delegate(CondensedStats stats)
    {
        // Return true to make a CondensedCollection stop performing deduplication.
        if (stats.UniqueCount > ushort.MaxValue)
            return true;
        else
            return false;
    });

    // Provide the cutover predicate to the CondensedCollection:
    var cc = new CondensedCollection<int>(cutoverPredicate: cutover);
The predicate above stops deduplication once the collection holds more than 65,535 unique values (ushort.MaxValue), which is roughly the limit of what the collection's internal 2-byte index can address. Beyond that point, the collection would start using a 4-byte index to reference your interned values, which is counterproductive because the elements themselves are only 4 bytes wide.
More elaborate cutover rules may be provided to suit the needs of your application or the size of your type. For example, you may not want to consider cutting over until your collection has at least one million elements to examine, at which point you cut over if the ratio of elements to unique values isn't high enough:
    var cutover = new Predicate<CondensedStats>(delegate (CondensedStats stats)
    {
        // Don't consider stopping deduplication until we have
        // at least 1 million elements in the population to look at:
        if (stats.Count < 1000000)
            return false;

        // Stop deduplication if we get less than 4 elements
        // in the collection for every unique value.
        if ((double)stats.Count / stats.UniqueCount < 4)
            return true;
        else
            return false;
    });

    // Provide the cutover predicate to the CondensedCollection:
    var cc = new CondensedCollection<string>(cutoverPredicate: cutover, comparer: StringComparer.Ordinal);
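To see how the ratio rule above behaves, it can help to run the same logic against a few sample populations outside the collection. The sketch below does that with a hypothetical stand-in type, SampleStats, which models only the Count and UniqueCount values used in the example. It is not the library's CondensedStats type, and the class and method names are made up for illustration.

```csharp
using System;

// Hypothetical stand-in for the library's CondensedStats; it models only
// the two values that the rule above reads.
public struct SampleStats
{
    public readonly long Count;
    public readonly long UniqueCount;

    public SampleStats(long count, long uniqueCount)
    {
        Count = count;
        UniqueCount = uniqueCount;
    }
}

public static class CutoverRuleDemo
{
    // Same decision logic as the predicate above, written as a plain method
    // so it can be exercised without a CondensedCollection.
    public static bool ShouldCutOver(SampleStats stats)
    {
        // Too small a population to judge: keep deduplicating.
        if (stats.Count < 1000000)
            return false;

        // Cut over when there are fewer than 4 elements per unique value.
        return (double)stats.Count / stats.UniqueCount < 4;
    }

    public static void Main()
    {
        var samples = new[]
        {
            new SampleStats(500000, 400000),    // below the one-million threshold
            new SampleStats(2000000, 1000000),  // ratio 2: too diverse
            new SampleStats(2000000, 100000),   // ratio 20: repetitive enough
        };

        foreach (var s in samples)
        {
            Console.WriteLine("{0} elements, {1} unique -> cut over: {2}",
                s.Count, s.UniqueCount, ShouldCutOver(s));
        }
    }
}
```

Keeping the decision logic in a small method like this also makes it easy to unit test before wrapping it in a Predicate<CondensedStats> for the collection.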
You can determine whether cutover has occurred by using the IndexType property or the HasCutover property.
To simplify usage, the Condensed Library offers a set of standard cutover predicates for many common types. These can be found in the StandardCutoverPredicates class.
var cc = new CondensedCollection<decimal>(cutoverPredicate: StandardCutoverPredicates.DecimalPredicate);
These predicates offer reasonable behavior for most common types--for example, the standard predicate for an Int32 type is identical to the first example above. Other predicates for variable-sized types (like strings) are heuristics based on expected usage. Custom predicates should be used if these standard heuristics don't meet your needs or you would like more control over cutover behavior.