My SQL Server is running out of memory
If I had a buck for every time I’ve read that on a forum, I’d have a couple of hundred bucks. Kind of spoils the punchline, doesn’t it?
There is enough to say about this topic to fill books, and many people have. When I’m confronted with the above statement I try to get to something tangible. Tangible, to me, means proof. Is the symptom seen by a user trying to submit or query a form? Is it experienced by the guys running Cognos on their local machines? Is it a problem on the physical server because someone is trying to stream iTunes from the DB server? And the best one of all: can it be reproduced? If the latter is true you are actually fortunate, ironic as that may be.
“I know what the problem is, but where do I start looking?” I think this is a real problem, because as you get acquainted with all of the available indicators you can spend more time working out which one to use than actually using them. My approach: get the biggest bang for your buck.
The list below is just a few points, but you should know them before you start. This is often why you don’t succeed by downloading the first query Google turns up: troubleshooting is a methodology applied to what you know. If you don’t know your servers, workload and infrastructure, then you are not in the right line of work; but good for you if the salary is good.
Let’s assume the following:
“Yes, I do have access. If I log onto the box I’ll be able to view Task Manager and see the memory usage. I can verify that it’s in fact SQL Server running high on memory and not some rogue process eating all my RAM.” Why not this approach? The next thing that happens is that you will most likely check out SQL Server itself. That means opening SSMS, and there goes another 125 MB of RAM. It’s not a lot, sure, but the next step might lead to Profiler: another 100 MB or so. Other eager DBAs might also log onto the box, so multiply the user-application RAM usage accordingly. You would also be accessing the SQL instance via shared memory, which could give you different results if the error is specific to TCP/IP.
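If you connect remotely instead, you can get the same sanity check from the DMVs without adding local tools to the box’s footprint. A minimal sketch, assuming SQL Server 2008 or later (on 2005 you would fall back to DBCC MEMORYSTATUS):

```sql
-- How much physical memory is the SQL Server process itself using?
-- Requires SQL Server 2008+ (sys.dm_os_process_memory does not exist in 2005).
SELECT physical_memory_in_use_kb / 1024 AS sql_physical_mb,
       memory_utilization_percentage,
       process_physical_memory_low,   -- 1 = the process is signalling physical memory pressure
       process_virtual_memory_low     -- 1 = virtual address space pressure
FROM sys.dm_os_process_memory;
```

If sql_physical_mb is modest but Task Manager shows the box starved, the rogue process is probably not SQL Server.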
x64 Win 2003 (no Resource Monitor). I picked x64 so I don’t have to go into the technicalities of /3GB and AWE in this exercise; however, you might want to check whether your SQL Server service account is allowed to lock pages in memory.
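Granting the privilege and it actually being in effect are two different things. As a quick check, a sketch (the DMV column needs 2008+; the error-log search works on older versions too, though the exact message text varies by build):

```sql
-- Non-zero locked page allocations suggest Lock Pages in Memory is in effect (2008+).
SELECT locked_page_allocations_kb / 1024 AS locked_pages_mb
FROM sys.dm_os_process_memory;

-- Alternatively, search the current error log for the startup message.
EXEC sp_readerrorlog 0, 1, 'locked pages';
```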
Dedicated SQL environment, but I didn’t install SQL Server myself. This means that whoever installed the software could have installed Reporting Services, SSIS and, who knows, maybe even Analysis Services on the box. It’s a bleak outlook, but these things shouldn’t be an issue unless you have terrible configuration management. At this stage you should already have configured the optimal amount of RAM for SQL Server, minus a little for the OS. I usually reserve 2-10 GB for the OS on systems ranging from 8 GB to 64 GB of RAM. You also need to take into account whether you are using this system as a cluster.
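Setting that cap is a one-off sp_configure call. A sketch with hypothetical numbers for a dedicated 32 GB box, leaving roughly 4 GB for the OS (tune the figure to your own system and what else runs on it):

```sql
-- 'max server memory (MB)' is an advanced option, so expose it first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Example value only: 28672 MB = 28 GB for SQL Server on a 32 GB box.
EXEC sp_configure 'max server memory (MB)', 28672;
RECONFIGURE;
```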
Yes, make sure each person knows what everyone else is doing or planning to do. You might duplicate work or impact one another. As strange as this may seem, I’ve worked in places where it’s literally like vultures on a carcass. I assume this is due to poor management, or the need to apply your own logic because you believe your colleagues’ might be flawed. I’m not sure if this is healthy, or whether anyone else has experienced it.
It seems to happen randomly but the severity is increasing.
The server is used for OLTP. The system is heavily used during business hours and processes batches for around 2-4 hours after business hours. Backups run sometime during the evening.
There are plenty of things that can go wrong in the above. I’m not going to log onto the SQL Server through Terminal Services at this stage, because I know what should be running on the box and I know the configuration is correct. If you know this, you are 75% closer to tracking down your problem.
You can verify that nobody changed the configuration settings (sys.configurations: max server memory). This info is literally one right-click or a single SELECT statement away; remember, our goal is the best bang for buck. If you are a GUI DBA, the memory setting is about four clicks away. You might argue that total time is influenced by your typing speed (if you’re pedantic), but since you already have a query window open at this stage, the next bit is also just a SELECT statement away.
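The single SELECT in question looks something like this:

```sql
-- Compare the configured value against the value the server is actually running with.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'max server memory (MB)';
```

If value and value_in_use differ, someone changed the setting without running RECONFIGURE.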
SELECT database_id, COUNT(*) * 8 / 1024. AS cached_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_mb DESC
This view yields a good deal of information for a relatively simple query; everyone knows how to write a GROUP BY statement, right? Immediately you can see which database is using the most memory, and you can quickly sum the totals and compare them to your “max server memory” and OS values (you know these by heart, right?).
The next step is to add more detail to the query so we can spot the table in question. The first query can be run from any database, because we only wanted to find the DB that is using all the memory. Once we have the DB, we want to find the object, so run the following in that database’s context.
SELECT OBJECT_NAME(p.object_id) AS table_name,
       COUNT(*) * 8 / 1024. AS cached_mb
FROM sys.partitions p
INNER JOIN sys.allocation_units a
    ON a.container_id = CASE WHEN a.type = 2 THEN p.partition_id ELSE p.hobt_id END
INNER JOIN sys.dm_os_buffer_descriptors b
    ON b.allocation_unit_id = a.allocation_unit_id
WHERE b.database_id = DB_ID()
GROUP BY OBJECT_NAME(p.object_id)
ORDER BY COUNT(*) DESC
Now we have the detail on which table is using how much memory. Once you start using this to look at your workload during normal operation, you can get an idea of what is out of the ordinary. Again, know your tables, and know which ones are your biggest. In most of the OLTP systems I’ve seen that’s only a few big tables; it seems to follow the 80/20 rule. 20% of the tables (or fewer!) make up 80% of the traffic or size; it’s about as accurate as phi predicting every single flower ratio in a field.
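If you don’t know your biggest tables by heart yet, a quick on-disk size per table is one query away. A sketch using sys.dm_db_partition_stats (run it in the target database):

```sql
-- Approximate used size per table, largest first.
SELECT OBJECT_NAME(object_id) AS table_name,
       SUM(used_page_count) * 8 / 1024. AS used_mb
FROM sys.dm_db_partition_stats
GROUP BY OBJECT_NAME(object_id)
ORDER BY SUM(used_page_count) DESC;
```

Comparing this against the cached-pages query above tells you whether a table is hot out of proportion to its size.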
Now that we have the table, we can look at whether any indexes need to be modified or added.
A note on this: if your page life expectancy is close to zero, this method won’t work, because pages won’t stay in the buffer pool long enough for you to find them using this query. Some query operations might also not be bound to the cache. You may, for instance, run a seriously recursive CTE that exhausts CPU, which holds up processing and reduces the effectiveness of your RAM; the symptom could look like RAM but might be CPU-related. The queries are a guide for you to get familiar with, and to understand the DMVs a little better.
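Page life expectancy itself is also just a SELECT away. A sketch via the performance counter DMV (the object_name prefix varies with the instance name, hence the LIKE):

```sql
-- Page life expectancy in seconds: how long a page is expected to stay in the buffer pool.
SELECT counter_name, cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND object_name LIKE '%Buffer Manager%';
```

If this number is in the low double digits, the buffer pool is churning too fast for the cached-pages queries above to tell you anything stable.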
When was the last time I had a memory issue?
I can’t remember.