Saturday, April 27, 2013

SQL Job performance issue

Problem Statement: A SQL Server job was taking longer than expected to execute. The import-and-load job was scheduled to run every hour. During normal US business hours it completed in approximately 2 hours, but during US night hours it was taking 4+ hours to finish.

Solution: The first question to focus on was why the job takes longer during the night schedule. The next thing that comes to mind is the server load during night hours. We kept the server under observation during the job run and captured a few PerfMon counters; server activity was normal, with no memory pressure or CPU contention on the server.

We then checked what other activity was happening on the server and found that several index defrag and backup jobs were running at the same time. Our first guess was that these were the culprit, perhaps consuming resources or causing locks and blocking. However, those jobs were finishing in seconds and causing no contention on the server, so that option was ruled out as well.

Next I monitored the inactive SQL sessions with sp_who2 and saw a SUSPENDED SPID that belonged to the load job in question, and the trend continued: the job's SPID was frequently getting SUSPENDED. So, what we need to understand here is: why does SQL Server suspend a process? Simply put, SQL Server suspends a process because it has to wait on a resource of some kind. We then dug into what these SPIDs were waiting on. The DMV sys.dm_exec_requests gives the details for a SPID, so we executed the following command:
SELECT * FROM sys.dm_exec_requests WHERE session_id = <suspended SPID>;
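
It can also help to scan for every suspended session at once instead of plugging in one SPID at a time. A minimal sketch of that check (nothing here is specific to our job, just standard columns of the DMV) looks like this:

-- List every request currently in the SUSPENDED state, with its wait details
SELECT session_id, status, command, wait_type, wait_time, last_wait_type, blocking_session_id
FROM sys.dm_exec_requests
WHERE status = 'suspended';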
For our SPID the wait_type column gave us the clue: ASYNC_NETWORK_IO. What do I understand from this? ASYNC_NETWORK_IO is commonly the result of a client application that is slow to receive or process the result set from SQL Server. So we analysed the output of the job and were amazed to see that the query was handling around 2 million rows in a single shot. Hurray, we nailed it. We contacted the application team with these details and advised them to decrease the number of rows the application pulls in each cycle of the run, and they agreed to change the logic to handle a smaller load in each run.
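
One common way to pull a smaller load per run is keyset paging, where the client fetches a bounded batch each cycle instead of the whole result set. The sketch below only illustrates the pattern; the table and column names (dbo.ImportQueue, RowId, Payload) and the batch size are placeholders, not taken from the actual job.

-- Hypothetical keyset-paging pull: fetch a bounded batch per cycle.
-- The client remembers the highest RowId it processed and passes it back next time.
DECLARE @LastRowId BIGINT = 0;     -- highest key already processed (0 on the first cycle)
DECLARE @BatchSize INT = 50000;    -- rows per pull; tune to what the client can consume quickly

SELECT TOP (@BatchSize)
       RowId,
       Payload
FROM   dbo.ImportQueue
WHERE  RowId > @LastRowId
ORDER  BY RowId;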
