Exchange Server 2010 and Single Instance Storage

Last reviewed on August 23, 2011

Microsoft’s Exchange team invested a great deal of time and effort in both Exchange 2007 and Exchange 2010 to reduce the input-output (I/O) load that Exchange generates on its disk subsystem, and to change the I/O profile so that Exchange’s database engine, the Extensible Storage Engine (ESE), would generate large sequential I/O requests (as opposed to many small random I/O requests). Both versions of Exchange improve the performance and scalability of ESE, at the cost of additional system memory.

When compared to Exchange 2003, Exchange 2007 had an overall I/O reduction of approximately 70% for many workloads. When compared to Exchange 2007, Exchange 2010 also has an overall I/O reduction of approximately 70% for many workloads. So, in total, when compared to Exchange 2003, Exchange 2010 has an overall I/O reduction of approximately 90%, again for many workloads.
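The combined figure comes from multiplying the fractions of I/O that remain after each release, not from adding the percentages. A minimal sketch of that arithmetic, taking the approximate 70% figures above at face value (the function name is made up for illustration):

```python
def combined_reduction(*reductions):
    """Overall fractional reduction after applying each reduction in turn."""
    remaining = 1.0
    for r in reductions:
        remaining *= (1.0 - r)  # each step leaves (1 - r) of the previous I/O
    return 1.0 - remaining

# Exchange 2003 -> 2007 (about 70%), then 2007 -> 2010 (about 70%)
overall = combined_reduction(0.70, 0.70)
print(f"{overall:.0%}")  # prints 91%
```

Two successive 70% reductions leave about 9% of the original I/O, which is where the "approximately 90%" comes from.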

The “for many workloads” qualifier is the “cover myself” statement you’ll see in lots of Microsoft material, because there are certain environments in which later versions of Exchange can actually perform worse than earlier versions. The first one that comes to mind is a memory-constrained environment: later versions of Exchange require comparatively large amounts of memory in order to perform well.

The database changes introduced beginning with Exchange 2007 did come at some cost. Memory, of course, is one that we’ve already mentioned (and that I’ve explored in earlier articles for EMO). Another is additional disk space. By the release of Exchange 2010, Single Instance Storage (SIS) is completely gone from Exchange databases. For some customers, this leads to an expansion of disk space requirements for Exchange databases.

SIS began disappearing in Exchange 2007. Prior to Exchange 2007, SIS was applied to message bodies. That is, if multiple mailboxes in a single mailbox database received a message, only a single physical copy of that message body was actually stored in the mailbox database. Each individual mailbox stored only a pointer to that message body (a rather advanced type of stubbing, a technique used by many message archiving solutions). A message body contains the actual text (or HTML) that is the primary user-readable content. Microsoft humorously defines the message body as the “message payload”: what the message is designed to deliver.
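To make the pointer idea concrete, here is a toy model of single-instance storage for message bodies. It is only a sketch under loud assumptions: the class, its fields, and its methods are all hypothetical, and it says nothing about how ESE actually lays out its tables.

```python
class SisStore:
    """Toy single-instance store: one physical copy per delivered body,
    plus per-mailbox pointers. Hypothetical; not the real ESE schema."""

    def __init__(self):
        self.bodies = {}     # body_id -> body text (the single physical copy)
        self.refcount = {}   # body_id -> number of mailboxes pointing at it
        self.mailboxes = {}  # mailbox name -> list of body_ids (the pointers)
        self._next_id = 0

    def deliver(self, recipients, body):
        """Store the body once and give each recipient a pointer to it."""
        body_id = self._next_id
        self._next_id += 1
        self.bodies[body_id] = body
        self.refcount[body_id] = 0
        for name in recipients:
            self.mailboxes.setdefault(name, []).append(body_id)
            self.refcount[body_id] += 1
        return body_id

    def delete(self, name, body_id):
        """Drop one mailbox's pointer; free the body when nobody points at it."""
        self.mailboxes[name].remove(body_id)
        self.refcount[body_id] -= 1
        if self.refcount[body_id] == 0:
            del self.bodies[body_id]
            del self.refcount[body_id]

    def stored_bytes(self):
        """Characters physically stored, used here as a stand-in for bytes."""
        return sum(len(b) for b in self.bodies.values())


store = SisStore()
msg = store.deliver(["alice", "bob", "carol"], "x" * 1000)
print(store.stored_bytes())  # 1000: one physical copy for three recipients
```

Without SIS, the same delivery would store the body three times. Note that even this toy version needs reference counts and pointer lists alongside the data itself.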

One of the several mechanisms used to attain the reduction in disk I/O requirements in Exchange 2007 was the removal of SIS for message bodies. While conceptually it seems that having less data in the mailbox database would result in less physical disk I/O, the implementation of SIS for message bodies involved lots of indexing and extra tables in the mailbox database, which actually made it significantly more expensive in terms of I/O cost. So in Exchange 2007, every user who received a message was allocated their own unique copy of the message body. At a cost of some additional disk space, this provided a reduction in I/O requirements.

However, in Exchange 2007, if there were attachments present in a message, SIS was still applied to the attachments.

Fast forward to Exchange 2010 and we see that significant I/O reductions have again occurred. One of the ways that this reduction was attained was by eliminating SIS for attachments. Prior to Exchange 2010, if multiple mailboxes in a single mailbox database received an email with an attachment, only a single physical copy of that attachment was stored. In Exchange 2010, every mailbox receives its own copy of the attachment. Again, perhaps counter-intuitively, this does provide for a reduction in the overall I/O requirements of the Exchange database.

However, in some environments, removal of SIS for attachments can cause a significant growth in the size of Exchange databases (just think of all those PowerPoint presentations your sales people email and all those Visio and CAD diagrams your engineering people email). While Microsoft has never recommended depending on SIS for the sizing of Exchange databases, the truth is that many environments have depended on SIS and are shocked at the growth that happens.
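A back-of-the-envelope sketch shows how quickly this can add up. The attachment size and recipient count below are invented for illustration, not measurements, and the function is hypothetical; real database growth also depends on factors this ignores, such as how recipients are spread across databases.

```python
def attachment_storage_mb(size_mb, recipients_in_database, single_instance):
    """Storage used by one attachment sent to several mailboxes in one database."""
    copies = 1 if single_instance else recipients_in_database
    return size_mb * copies

# Hypothetical: a 5 MB presentation sent to 200 mailboxes in the same database
with_sis = attachment_storage_mb(5, 200, single_instance=True)
without_sis = attachment_storage_mb(5, 200, single_instance=False)
print(with_sis, without_sis)  # prints: 5 1000
```

In this made-up case a single widely distributed attachment goes from 5 MB to 1,000 MB of database space once each mailbox holds its own copy.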

If retaining SIS is an important mechanism for your organization to control the overall amount of disk space used by Exchange databases, then you will need to investigate third-party solutions.

Overall, however, removing SIS has allowed Exchange to scale to much larger mailboxes and much larger user populations, and to support the use of less expensive disk to host Exchange databases.
