Making effective use of data collections is an essential skill of software developers. Nearly all computer science programs require students to take one or more courses focused solely on data structures.
The various collection types within the .NET Framework implement many of the most common data structures. The MSDN reference site lists the following as the most commonly used collection types:
*
Array
*
ArrayList and List
*
HashTable and Dictionary
*
SortedList and SortedDictionary
*
Queue
*
Stack
Of course, there are more collection types than these, but for the most part, you can get a lot done with just the types presented here. The big question is, how do we choose one collection over another?
Solutions
Your choice of collection should, of course, be based on the role it will serve. Performance and memory considerations should also influence your decision. Some of the questions you need to ask yourself before you choose a collection include:
*
Do I need to access elements at random, or will I only need to access them sequentially?
*
If I need random access, is it good enough to access them using an index, or will I need to access them with a key?
*
Does the collection need to grow, or do I know its size in advance?
*
Do I need to be able to sort elements?
The Array
The lowly but powerful Array is the simplest of collection types. It’s represented in memory as a sequence of values or references (depending on whether the array is storing value types or reference types, respectively).
An Array is useful when performance is an issue and you know how many elements you will need to store in advance.
The .NET Framework Design Guidelines recommend that you don’t use an Array as the return type of a public property. Most of the time, arrays are used in low-level programming as parameters to methods—for example, when working with streams:
byte[] data = new byte[] {0x0f, 0x0e, 0×13};
data[0] = 0xff; // random access
MemoryStream stream = new MemoryStream(data);
As the above code demonstrates, one benefit of using arrays is that the syntax for instantiating an Array in C# is very human readable.
The ArrayList and the List
The Array quickly loses its charm when you don’t know in advance the number of items that you need to store. The ArrayList object fills this void; it’s a replacement for an Array that’s capable of growing dynamically.
The ArrayList basically implements the same interface as the Array, but includes methods for adding new items:
ArrayList items = new ArrayList(new byte[] { 0x0f, 0x0e, 0×13 });
items[0] = 0×00; // Random access
items.Add(0xff); // Dynamically growing the ArrayList
Notice that instantiating an ArrayList is not as syntactically clean as it is with an Array. However, because the constructor of an ArrayList takes in an ICollection, and Array implements the ICollection interface, you can achieve the desired result with something that vaguely resembles our nice, neat Array syntax, as I’ve done above.
One problem with this code is that the ArrayList accepts items of type Object. This means that storing and retrieving values causes boxing and unboxing to take place, which can create a performance bottleneck. It’s in situations like this that the generic List
In fact, the List
List
new byte[] { 0x0f, 0x0e, 0×13 });
items[0] = 0×00;
items.Add(0xff);
As you can see, the code basically looks the same, but now it’s also type-safe, the result of which is that we’ve avoided the boxing and unboxing of each item in the list.
The Hashtable and the Dictionary
One potential problem with accessing elements via an index is that the index for a given item in the collection can change over time. Consider the following code sample:
List
Console.WriteLine(“At index 0 we have: ” + scores[0]);
scores.Insert(0, 23);
Console.WriteLine(scores[1] + ” is now t index 1.”);
The code adds three integers (in this case, high scores in a video game) to the List instance, then writes out the value at index 0—the value 962—to the console. We then insert another value at index 0, and write out the value stored at index 1. As you can see, the value 962 is now stored at index 1. This situation is problematic if your application relies on a stored value being retrieved from the same index at which it was inserted.
Using a Hashtable instead of a List
Hashtable scores = new Hashtable();
scores.Add(“Phil”, 196); // boxing occurs
scores.Add(“Jon”, 250);
scores.Add(“Scott”, 750);
scores.Add(“Jeff”, 901);
Console.WriteLine(“Phil’s Score is: ” + scores["Phil"]);
While lookups are extremely fast, the speed at which a Hashtable performs the lookup comes at the cost of increased memory usage.
The Dictionary class is the generic equivalent of the Hashtable; while a Hashtable only stores an Object for the key and value, a Dictionary allows us to specify the type of both the key and value, and thereby create a strongly typed hash table. Because storage and retrievals from a Dictionary are not subject to boxing or unboxing, the performance of these operations is greatly increased. Let’s see Dictionary in action:
Dictionary
scores.Add(“Phil”, 196); //no boxing occurs.
scores.Add(“Jon”, 250);
scores.Add(“Scott”, 750);
scores.Add(“Jeff”, 901);
Console.WriteLine(“Phil’s Score is: ” + scores["Phil"]);
SortedList and SortedDictionary
Continuing the example of video game scores that we discussed above, let’s imagine that our application needed to access a score both by index and by key. For example, suppose we wanted to keep the scores in an alphabetical order based on user names—using a Hashtable would give us no guarantee that the items would remain in any particular order.
The SortedList and SortedDictionary classes, both of which come in generic and non-generic flavors, are perfect for such a situation. While the interfaces for these classes are largely the same, for this discussion we’ll focus on SortedList. The following code demonstrates a SortedList in action:
SortedList
scores.Add(“Phil”, 196);
scores.Add(“Jon”, 250);
scores.Add(“Scott”, 750);
scores.Add(“Jeff”, 901);
Console.WriteLine(“Scores in alphabetical order”);
foreach(string key in scores.Keys)
{
Console.WriteLine(“{0}: {1}”, key, scores[key]);
}
// I can still access score by key.
Console.WriteLine(“Phil’s Score is: ” + scores["Phil"]);
Although this code has added our name-score combinations in random order, when we iterate over the sorted list, the scores will be displayed in alphabetical order:
Jeff: 901
Jon: 250
Phil: 196
Scott: 750
We can pass in an IComparer instance to apply a different sort to our items. For example, suppose we wanted to sort the items on the basis of the lengths of the users’ names, rather than in alphabetical order. We could write a quick class that implements IComparer to achieve this:
public class KeyLengthComparer : IComparer
{
public int Compare(string x, string y)
{
return x.Length.CompareTo(y.Length);
}
}
We’d then pass this class into the constructor for SortedList:
SortedList
new SortedList
So, how can you choose between using SortedList and SortedDictionary? According to the MSDN documentation, these two classes have very similar object models, although the SortedDictionary does not support the efficient random access of its Key and Value collections by index.
These classes’ performance in retrieving items is also similar, though the SortedList uses less memory than the SortedDictionary. If your collections are quite large, and memory usage is a concern, the SortedList is the class to use. However, for smaller collections that don’t need to be accessed by an index, a SortedDictionary is the way to go.
Queue
A queue is a First In, First Out (FIFO) collection. It’s easy to think of queues in terms of waiting in line to enter a theater—the first person to get into line is the first person to enter the theater.
In thinking about the circumstances under which we might use a queue, we need to consider why queues form in the real world. Queues usually form because there is more demand for a particular action or item than the system can meet. So there may be fifty people waiting to go into the theater, but only two people selling tickets.
In software development terms, queues are useful when we need to store messages in the order in which they were received so that we can handle them sequentially.
As a demonstration, let’s look at some code for an imaginary blog engine. Every comment in the system needs to be submitted to a third-party web service that will determine whether the comment is spam or not.
First, we instantiate a Queue
static Queue
Our method for adding comments to the Queue needs first to obtain a lock on the Queue for thread safety, because we’ll be using another thread to read from the Queue. Here’s how we obtain that lock:
public void AddToFilterQueue(Comment comment)
{
lock(queue)
{
queue.Enqueue(comment);
}
}
Every time a user submits a comment, the AddToFilterQueue method is called, which immediately adds the comment to the queue. We need to create a method to process this queue in a separate background thread:
public void ProcessQueue()
{
Queue
// Keep the queue locked for as short as possible.
lock (queue)
{
// Put comments from global queue into local queue.
while(queue.Count > 0) localQueue.Enqueue(queue.Dequeue());
// Tell any waiting threads we’re done with the queue.
Monitor.PulseAll(queue);
}
while(localQueue.Count > 0)
{
CommentService.Filter(comment);
}
}
One thing to notice here is that we copy the queue to a separate local Queue instance:
while(queue.Count > 0) localQueue.Enqueue(queue.Dequeue());
We do this because we don’t want to hold the lock on the Queue any longer than we have to, since sending the comment to the comment filter service could take a while. Other parts of our application cannot add new entries to the Queue while we’re holding a lock on it.
After we’ve pulled the comments into the local queue from the global queue, we notify any waiting threads that they can proceed to add new comments to the global Queue:
Monitor.PulseAll(queue);
Stack
In contrast to the queue, which, as we saw, is a First In, First Out (FIFO) collection, a stack implements a Last In, First Out (LIFO) collection. Stacks are used extensively by modern operating systems, as well as by the .NET Framework. For example—and this is a simplification—calling a method involves placing the parameters on the stack one after the other, in the order in which they’re encountered in the code. As the method is executed, each parameter is popped from the top of the stack—starting with the parameter that was added most recently, then working backwards through the added parameters—and processed in turn.
Stacks can also be useful for implementing recursive logic. One problem with recursive method calls is that the code can become to difficult to read. In fact, any recursive algorithm can be rewritten to use a stack. For example, a method I’ve found myself writing countless times is one that finds a control with a specific ID within a nested control hierarchy. Since web controls form a tree structure, one natural way to implement a method to find a specific control would be to use recursion. The following method accomplishes this by using a Stack:
public static Control FindControlUsingStack(Control root,
string id)
{
// Seed it.
Stack
stack.Push(root);
while(stack.Count > 0)
{
Control current = stack.Pop();
if (current.ID == id)
return current;
foreach (Control control in current.Controls)
{
stack.Push(control);
}
}
return null;
}
Using a stack to implement recursion results in code that is much easier to follow than code containing a method that calls itself.