Version 1.1 is available and migrated to GitHub !! Some improvements were made and added the Continuous Execution…

You can access it on the new site: https://github.com/alepelc/EyeOnSql

If you can donate, please do so using Paypal. It will be greatly appreciated !


After almost 20 years of working in IT, I’m still amazed by how often people confuses High Availability (HA) with Disaster Recovery (DR). And I’m not talking about people on their first job but also professionals that are in the industry for a quite some years now…

Some people tend to minimize the difference, but in my opinion it is not something wise to do as both the architecture of the solution and the cost has a direct impact based on HA and DR options.

So first thing first, what is HA and what is DR. High Availability stands for a solution that will allow a system to be up and running when one or more of its components fails. Generally speaking there is a disruption suffered that tends to be in a low number of seconds and degradation might be noticed, but overall the system keeps working. HA should not involve any manual interaction to be executed.

Disaster Recovery, on the other hand, kicks in when all components fails. At that moment, the recovery of the systems needs to be executed and a disruption is observed, that could be from minutes to days, depending on the RTO (recovery time objective) of the affected systems. DR solution often involves manual operations.

One example of HA is if you have a web site that resides in three web servers that connects to a database in a cluster. If one or two web servers goes down, the remaining will still function. Operations happening on the two faulty web servers will be terminated and will need to be resubmitted. Users might see an error message and after some retry attempts, will be able to keep using the web site. Another example could be a server that has two power cords connected to different electrical phases in a data center. If one phase has a problem, the other one keeps delivering electricity to the server avoiding a power outage.

When we look for Disaster Recovery examples, you need to consider scenarios like, on the examples above, the three web servers goes down, or the database cluster goes down, or the two power phases goes down bringing the server offline, or there’s a flood on the data center and everything goes down. When you suffer from this, then you need to declare DR and start the recovery on a remote location. There are different alternatives for recovering, like recalling tapes from a storage facility and restore them on standby hardware. If replication is the method for DR then you will need to bring the system up on the remote location, etc.

DR is associated with two indicators: RTO (Recovery TIme Objective) and RPO (Recovery Point Objective). RTO is how much time the company is willing to wait for the system to be operational again. For instance, an RTO of one hour means that, once declared the DR, in one hour or less the system should be operational again. RPO is how much data the company is willing to lose. For instance, an RPO of 15 minutes means the company accepts losing the last 15 minutes of business. So in conjunction, you need to bring the system up in one hour or less and loosing 15 minutes of data or less.

HA is associated with one indicator: SLA (Service Level Agreement). When you set your SLA to be %99.9, that means that outside your maintenance window, you will be %99.9 of the time up and running. For instance, if you have a maintenance window of two hours a month, an SLA of %99.9 means that you can be down approx 8.7 hours a year (%99.99 is less than one hour a year).


If you have an SLA of %99.9, you need to invest in robust HA and DR solutions, which translates into bigger costs derived from more servers in the environment, more time is designing and testing the solution before the Go-Live and regular DR execution tests to ensure you meet your metrics. HA needs to be treated differently than DR as the goals they serve are different…

After a while, finally version 1.0 is available !! Improvements were made on the performance checks and also added an execution duration setting in order to test in a loop for a certain period of time.
Documentation is on its way now…

The following is a link to a post I’ve created on SQLServerCentral. The essence of this article is getting the deadlock information from the SQL Server logs in a “readable” way when you have the trace flag 1222 enabled…


Today I published the first version of a SQL Load test tool in CodePlex. This is still an Alpha version, so any feedback will be appreciated !

The link is https://github.com/alepelc/EyeOnSql

On the previous post I covered some suggestions regarding performance related to the disk subsystem. The second part of performance analysis I’ll cover is related to the indexes and its utilization, as that’s also one of the big performance improvements that can easily be done.

Analyzing the Existing Indexes

One of the things you should always look inside SQL is how your indexes are being used (or unused). The best way to achieve this is using the DMVs the SQL 2005 and above provides. Using those DMCs, you can sneak into which indexes are missing, the usage of the existing ones or which ones are unused.
In order to have better results, it’s recommended that you let the server run for a reasonable amount of time first (restarting the SQL engine clears the content of the DMVs), as these are statistics prepared by SQL server itself. It’s hard to put a value to the “Reasonable Time” as that depends on the SQL usage, but as a rule of thumb you can wait for 3~4 weeks before considering using the information DMVs provides.

The Missing Indexes

Perhaps one of the scripts I use the most, the following code will give an idea of the missing indexes, together with the estimated improvement the index can have if created. Of course, you should take care on how you use this, as if you create an index you may introduce a performance issue elsewhere, so it’s really really suggested that you create the indexes on a test DB first and that a smoke test gets executed against that just to confirm, at least, that not only you are improving things but also not adding future nightmares.
So, the script I use if the following:

db_name(database_id) as ‘DBName’
,d.statement as ‘Table’
order by DBName asc, (gs.unique_compiles + gs.user_seeks) desc

The result of the above query can be, for instance, the following:

If you take a look into that results, you can see the engine thinks only three indexes are missing (based on its own statistics) and it shows the columns that it should contain together with and compiles, user seeks and impact (improvement). Next, you can create those indexes and analyze your database performance.

Index Usage

Another interesting script I use is the following. It’ll show how your indexes are used on the database, ordering the results having the less used on top.

object_name(s.object_id) as ‘Table Name’,
i.name as ‘Index Name’,
i.index_id as ‘Index ID’,
from sys.dm_db_index_usage_stats s, sys.indexes i
where objectproperty(s.object_id,’IsUserTable’) = 1
and i.object_id = s.object_id
and i.index_id = s.index_id
order by (user_seeks + user_scans + user_lookups + user_updates) asc

If you run this on your database, you can see how many times your indexes were used, and that will give you an idea of the less used ones. If you analyze that and compare it with the cost of having that index in place, you can end up removing indexes or moving the most used to its own disk system, for instance.

Unused Indexes

Finally, I run the following script to get the list of indexes that we not used at all since the server started:

B.NAME as IndexName

You should take extra and special care on taking decisions based on this results, as it can harm your DB. Here it’s important that you have a lot of statistical information and that you have also some deep knowledge of the application and processes using that DB.
For instance, if you have a process that creates a quarterly report, which of course runs quarterly and use one index of that DB, and you analyze only one month of statistical information, that index will be shown as never used, and if you drop it, when your process runs to generate the report, it’ll not find the index so it may take longer, cost a lot more to the engine and / or even fail.


Indexes are very complex and require a lot of analysis of trained people. This DMVs will give you a lot of information, but people need to make a decision based on that information, so if it’s not well interpreted, then the result won’t be the expected one. Remember to always run the query that will benefit from the index creation before and after the creation of it, showing the Execution Time on both runs so you can compare if the index created really improves your performance. Also use the Database Engine Tuning Advisor (DTA) as that will also give you extra information on indexes, like if it needs to be clustered or non-clustered, the order, etc. This DMVs are not a replacement for the DTA but a complement.

During the last years I was involved in different projects, both as a system admin and as a DBA. When you participate from the beginning, it’s easy to do things right (or tend to do so), but the real problem may occur when you are involved in a project that is already deployed and running. The nightmare there resides on analyzing what was deployed initially, what was intended to be deployed and what’s the way to follow. You can’t imagine how many servers are misconfigured out there. Well, perhaps you can…

Sometimes inexperienced IT guys are told to install and configure one server and no proper guide is there to help them, or the person that do the capacity planning was relaying on previous experiences installing other systems than SQL, which leads to servers with poor performance on SQL. Another thing that you must deal with is the developers and the way they tend to work. They usually have a good understanding of performance but focused on their code. Not all of them think on the databases the way a DBA does, so commonly errors are repeated once and again, like concurrency, volume of data, filegroups, partitions, etc. Those errors are easy to avoid or mitigate with just an intervention on time of the DBA. I’ll try to highlight some of them on this article and will cover others on the following ones.

So, let’s finish the introduction and let’s focus on the real interesting stuff: Performance.

Disk System

When it comes to the server’s initial configuration, you should take a deep look into the disks configuration. One common mistake is the RAID option’s IT admins choose. Usually, the servers to run SQL are configured as an IIS or file system also, so the standard RAID 1 for the OS and a big RAID 5 to the database, IIS and whatever the server has is a temptation. Of course your server will run if you choose this configuration, but far away this is not the best option. Also, I saw some servers with the disks configured as a single RAID 5 array and then two or more partitions are created, simulating different disks. Again, this is a big mistake.

Of course you must configure your server based on the money you have to purchase it, but you should always consider the disks bottlenecks and try to reduce them as much as you can spending the money appropriately. It’s better to lose some GBs but make your server runs faster, so when it comes to configure the server’s disks, you should try to configure at least one RAID 1 array for your OS, one RAID 1 array for your data files and one RAID 1 array for your log files. If money is not an issue, then I recommend using a RAID 10 for your SQL data files and another RAID 10 for the log files. Additionally, you should have a separate subsystem for your OS pagefile, another for your binaries and a separate disk for the tempdb. If you plan to have databases that will have heavy queries, used as transactional and/or for Datawarehouse, then you should consider also configuring an extra array for your indexes and perhaps one or more arrays for some of the biggest or more used tables.

One interesting aspect also is the disk partition alignment for Windows 2003 or prior. I won’t go into really deep details on this as there´s a great article by Jimmy May (http://blogs.msdn.com/jimmymay/archive/2008/10/14/disk-partition-alignment-for-sql-server-slide-deck.aspx) talking about the improvements in SQL (about %20~%30) only by correctly aligning the partitions. If you’re on Windows 2008 then you don’t need to change this, but anyway it’s worthy to take a look at that article.

Another thing to consider when you’re dimensioning your SQL server is the tempdb. Getting the right size for it is tough but not impossible. First of all, you should have one datafile per processor and make all datafiles the same size. Also, disable the autogrowth of the files. This is a way to reduce the contention on the tempdb database, but removing the autogrowth can lead into out of space on the filegroups, so you should definitely talk to the solution’s architect (if available) and discuss the usage of the tempdb by the DBs. If you can’t get that information, then you should rely on the development server (which you should have already) and see what’s the usage there. Having that info, you can estimate the size of the tempdb always leaving enough empty room on it, and you should analyze how it’s impacted during the firsts weeks after the server’s Go-Live date in order to correct the size (if needed) before getting an issue because of space.

About the databases you’ll host, you should analyze the table’s structure and do an estimate based on the number of expected rows added per month or week. Then, do a projection of that, let’s say for a year or two, and there you’ll have a DB big enough host all the data without any growth of the datafiles. Again, this is hard to achieve but not impossible. You can estimate the row size of a table and then it’s as simple as multiplying that for the expected rows per day, week or month. That will give you the basic idea for proper sizing the DB. Also, look at your development or testing servers as they can reveal incredible information.

In all cases, the load you expect on your servers and the response time wanted or desired will determine the type of disk system that you should implement. If you can afford using solid state disks, then go for it, but for a lot of us who can’t get those yet, a proper capacity plan will save you a lot of headaches. Choosing the right disk configuration will depend on factors like money, but will also depend on how good you know the applications that will connect to your SQL, the databases that you’ll host and of course the expected volume of data to be handled. As a rule of thumb, you should always try to separate the different database and OS components into different arrays in order to reduce IO bottlenecks.