RyanFX's Blog: Lists, Sets, And Maps, Oh My!

Friday, June 28, 2013

Lists, Sets, And Maps, Oh My!

Choosing the correct data structure can be a challenge, even for experienced developers.  If you're here, you may be wondering:

  • What kind of data structure should I use for my current task?
  • When should I use a List?
  • What about a Set?
  • What is a Map and why should I care?



Resources


If you want to learn more about what you read below, I highly recommend one of the following two resources:


Lists


Lists are quite possibly the most commonly used collection in any programming language.  They are easy to insert data into, easy to retrieve data out of, and let's face in...they're convenient!  For the most part, they accomplish everything we need, in an acceptable time frame.

However

Many developers use Lists when they simply want to represent a collection of something, even though a different data structure might do a better job.

When are Lists Useful?


  •   Your data needs to be accessed or inserted in an orderly fashion
Accessing ordered data is an extremely common need.  Whether you keep track of a queue of messages, movies in a recently watched playlist, or requests for your widget factory, they all have one thing in common.  They need to be ordered.

  • Your data needs to be accessed with an index
What does this mean?  You can think of the List's index as an offset into your List.  You can get the first person waiting in line, or the sixth turn necessary to drive to the supermarket.

Sets


Sets are probably my "go-to" data structure.  The majority of Set implementations provide constant time big O for add, remove, and contains.  This means that whether the set contains 1 or 1000 stored objects, it takes just as long to perform the operation.  Contrast that to a List where the contains function runs in linear time.  As the List grows, the iterator spends more time hunting over the entire collection.

Sets give you advantages over Lists, pending that you are okay with two traits
  • The presence of an entity is what matters, not the quantity
By definition sets do not allow duplicates.  As long as this is acceptable, Sets are generally a better choice than Lists.  This can be both a blessing and a curse depending on how it's used.

  • Data is unordered
What does being unordered mean?  The order in which data are inserted is not necessarily the order in which it will be retrieved.  Sure, there are implementations of Sets that also keep an internal List to maintain ordering such as a LinkedHashSet, but that is implementation specific and adds overhead.

When are Sets Useful?


Sets should be used for situations like keeping track of registered users, a collection of visited cities, and a collection of e-books that you can sell.  In all of these scenarios, quantity and order are irrelevant.  In the e-book example, you as the seller would like to know what books you can sell, but you don't have an inventory count.  You never run out of downloadable e-books (unless you're doing something wrong!).  Implementations of a Set such as HashSet will let us look for the existence of an entity nearly instantaneously ( big O(1) ), regardless of the size of the Set!

Maps


Maps are a unique data structure that let you query its contents via a unique identifier (the "key") and retrieve data associated with that key (the "value").  This lets you retrieve a piece of data given a piece of data.  Imagine there exists a Map<String, Book> where the key is of type String and the value is of type Book.  This Map can be queried with an identifier, likely an ISBN, and the Book object itself would be returned.  This happens in O(1) time in implementations like HashMap, which means lookups are extremely fast.  Imagine a List with 2 million Books - how long would it take to find the Book you want?  Maybe as little as one try, maybe as many as 2 million tries!

When are Maps useful?


Maps are useful when you need very fast insertion, deletion, and retrieval of data.  They are perfect for scenarios where you know a little bit about the data you want (the key) and want more! (the value).  As long as one element of your data can serve as a unique key they are wildly fast and extremely efficient structures.

1 comment:

mjmagnets said...
This comment has been removed by the author.