About myself and everyone else.
Last week we started looking at this dmv.
As promised I’ll now discuss the code on how to use this view over a period of time and do some analyses on your IO throughput. Why do we do this, we can do this for interest sake or you can do this to spot potential bottlenecks on your system, this can be bottlenecks for a specific file , database or drive.
The logic cover 3 steps.
First we can to grab a snapshot of the view periodically, we do this because the view only stores one row per file and is cumulative from server restart. Thus to grab a point in time we first create the statement to grab to columns of interest to us and secondly we schedule this process, this can be done through a SQL server job.
Once we have this process setup we create a report on the data. I was planning to do this with a recurring CTE. But since it was maxing out the CPU on my test server I decided to rewrite it using a cross join. A cross join will join every row in the table with every other row in the table. Thus a table with 3 rows cross joined to itself will return 9 rows. This can quickly grow into a large operation. If you have issues with performance you can dump this data into a table in order to keep it streamlined.
In the first collection query I also did some of the conversions, I converted KB into MB and milliseconds to seconds.
Some notes: The code does not take into account that you can have multiple log \ data files. This is easy to fix as you will just do a group by when reading from the dmv. You don’t need to use a CTE to collect the data, I just had the query in my notes and didn’t change it.
if OBJECT_ID(‘IOStats’) is not null drop table IOStats
Create table IOStats(Servername varchar(128),Databasename varchar(128),Type varchar(5) ,MB_Reads int,MB_Writes int,Stall_Read_Sec int,Stall_writes_Sec int,Date smalldatetime default getdate())
/* Query to schedule i.e. SQL Agent job.*/
(select @@servername as servername,
db_name(a.database_id) as DBname,
num_of_bytes_read / 1024./ 1024. As num_of_mbytes_read,
num_of_bytes_written /1024./1024. As num_of_mbytes_written,
IO_stall_write_ms / 1000 as IO_stall_write_sec,
IO_stall_read_ms / 1000 as IO_stall_read_sec
from sys.dm_io_virtual_file_stats (null,null) a
inner join sys.master_files b
on a.database_id = b.database_id
And a.file_id = b.file_id)
Insert into IOStats
Select *,getdate() from CTE
Why the Rank CTE ?
I use the CTE to rank my data. This ensures that you compare the correct deltas to one another. You cannot compare your first collection of dmv data with the 3rd collection. Well you could and as long as you ignore the 2nd in this instance you should be fine, but to get nice time sliced info that you can chart in excel I’m using RANK. This also helps me for the cross joins, if I used data I had to be sure of the format. For instance if the server was busy and my collection ran for .006 milliseconds and another collection for 009 milliseconds, unless you convert the date on the join you won’t return the required rows. This cause bad data. If you have large tables you also don’t want to convert the column as it will prevent any indexes from being used as efficiently as possible.
Hence , I created a rank CTE which I use for the cross join.
/* Query to generate report*/
(Select Databasename,type,mb_reads,mb_writes,Stall_Read_Sec,Stall_writes_Sec,rank() over (partition by Databasename,type order by date asc ) as [rank]
,date from IOSTats )
b.mb_reads-a.mb_reads As mb_reads,
b.mb_writes-a.mb_writes AS mb_writes,
b.stall_read_sec-a.stall_read_sec As stall_read_sec,
b.Stall_writes_Sec-a.Stall_writes_Sec As Stall_writes_Sec,
from vw_rank a
cross join vw_rank b
where a.databasename = b.databasename
and a.type = b.type
and a.rank + 1 = b.rank
In the results I’ve taken 6 samples in total. The first sample used for the baseline is not indicated. I can see that I write 1015MB between 10:52 and 11:28. The values are very small and I suggest you take collections over a period of a day or even a week. The more data you collect the better results and conclusions. I can see one thing, that is that if I have writes and no read my hardware seems to perform fine. Even with a 29mb write between 11:34 and 12:34 . As soon as there is reads on the system the write times are impacted.
This could be an indication of incorrect RAID setup of simply just a poorly performing IO. However you need to get a good amount of data to be sure. Once you have an indication in this report you can use it to look for weaknesses or you can use it along with other evidence that agree with the output.
I hope this help you in some way getting more info about your SQL server.