|
|
 | |  |
|
ClearCanvas Community Forums
|
Database connection leak and deadlock on large transfer
Last Post 2010-05-06 11:07 AM by jbluks. 17 Replies.
|
Sort:
|
|
Prev Next |
You are not authorized to post a reply. |
|
bmoloney
 Advanced Member Posts:47
 |
| 2010-03-10 01:20 PM |
|
I upgraded our server to the 2.0 release and tried sending a large set of studies (about 140 studies with close to 100GB of data). After a while I start to see the following warning which would seem to indicate a connection leak:
*****************************************************************************************
# Max SqlConnection Pool Size=100, current Db Connections=126
*****************************************************************************************
The number of Db Connections continues to grow during the send. After a while I see the following exceptions thrown (listed in the order they are thrown):
*****************************************************************************************
Exception thrown
Unexpected exception when calling stored procedure: QueryWorkQueue
System.Data.SqlClient.SqlException: Transaction (Process ID 70) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
at System.Data.SqlClient.SqlDataReader.ConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader()
at ClearCanvas.ImageServer.Enterprise.SqlServer2005.ProcedureQueryBroker`2.InternalFind(TInput criteria, Int32 maxResults, ProcedureQueryCallback`1 callback)
Exception thrown
Exception occured when processing WorkQueue item.
ClearCanvas.Enterprise.Core.PersistenceException: Unexpected problem with stored procedure: QueryWorkQueue: Transaction (Process ID 70) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. ---> System.Data.SqlClient.SqlException: Transaction (Process ID 70) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
at System.Data.SqlClient.SqlDataReader.ConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader()
at ClearCanvas.ImageServer.Enterprise.SqlServer2005.ProcedureQueryBroker`2.InternalFind(TInput criteria, Int32 maxResults, ProcedureQueryCallback`1 callback)
--- End of inner exception stack trace ---
at ClearCanvas.ImageServer.Enterprise.SqlServer2005.ProcedureQueryBroker`2.InternalFind
****************************************************************************************
I have tested the same transfer several times and consistently get the warnings/errors but the exact timing of the deadlock exception varies. I am going to revert to 1.5 and try the same transfer today. I will let you know what the result is. |
|
|
|
|
steve
 Senior Member Posts:1330
 |
| 2010-03-10 03:19 PM |
|
On the connection counts, how many simultaneous dicom connections do you have to the server when sending? I'm pretty sure we don't have a connection leak in the system, but we do use a fair amount of connections to the DB in our processes.
The deadlocks are not necessarily related to connection leaks, rather its a locking issue in the database with multiple connections attempting to lock the same database resource. Basically, the error is a transient error, and the ImageServer should recover.
I'm curious, what's your overall database size, and whats the performance like on your database server? We're not seeing these that often on our current production server, although they are still occasionally happening. In fact, the 2.0 release had a few changes to indexes related to the WorkQueue in order to try and prevent this from happening. Do you also have the WorkQueue screen open in the web gui when this is happening? Have you made any adjustments to the number of processing threads in the WorkQueue?
Thanks,
Steve |
|
|
|
|
bmoloney
 Advanced Member Posts:47
 |
| 2010-03-10 04:17 PM |
|
Hi Steve, I send all the data from a single instance of CCW by selecting all the studies at once and clicking send. The Image Server does seem to recover from the deadlock exceptions and all the data from the send does seem to show up (but I haven't had a chance to check this exhaustively). Before the send the database is quite small (31 studies, approx 20GB) since this is just our test server. We are running the test server in a virtual machine with two cores and 4GB or RAM assigned to it. I am not sure what the work queue looks like when the deadlock happens but I was observing it while the connection pool warnings were occurring and there was not a large backlog (typically 3-4 studies in the queue). I haven't made any adjustment to the number of processing threads in the WorkQueue. I have enabled a default RLE compress rule that is applied to all studies as they come in (but this goes fast enough that the work queue doesn't back up). We are running SQL Server 2008 and Windows Server 2008R2. |
|
|
|
|
steve
 Senior Member Posts:1330
 |
| 2010-03-11 04:33 PM |
|
Hi, Really with what you're doing, on normal hardware, you shouldn't be seeing these deadlocks. There really isn't that much load from what you're describing. The only thing may be that on a virtual machine, the disk IO is slowed down enough that database updates are happening too slowly, which is causing the problems. I'll have to monitor the deadlocks on our production servers here just to see if we're also seeing issues. Steve |
|
|
|
|
bmoloney
 Advanced Member Posts:47
 |
| 2010-03-12 12:47 PM |
|
I reverted to 1.5 and tried the same transfer and I don't see the warnings about the SQL connection pool or the exception about deadlock. Since I was able to reproduce the error on 2.0 by doing the same transfer (I tried three times and it happened each time) it seems to be correlated to some change in the software between the 1.5 and 2.0 releases. I am going to continue to test 1.5 by sending large transfers from multiple machines at once. I haven't tested the hard drive performance inside the virtual machine but I don't see any indication of it being particularly slow. |
|
|
|
|
bmoloney
 Advanced Member Posts:47
 |
| 2010-03-12 06:25 PM |
|
Doing two large transfers simultaneously from two different workstations has generated the deadlock exception in 1.5 (but still no connection pool warnings). We are going to do a full diff of the data on the server against the original data to make sure everything got transferred correctly. If this is indeed a transient error which is safe to ignore it would probably be good to log it as a warning instead of an error (or more importantly, not show an error in CCW since it will undoubtedly concern the users). I will post an update when we finish the diff. |
|
|
|
|
steve
 Senior Member Posts:1330
 |
| 2010-03-15 11:23 AM |
|
We did some rework in the 2.0 release with trying to keep database connections open on a thread, in order to improve performance. It appears that this caused there to be some extra connections open at any given time. You can change the connection pool size in the database connection string, in the app.config file. Just add this into your connection string in the ShredHostService.config file: Max Pool Size=200 Hmm, I'm surprised you are getting errors on the workstation? The earlier log file snippet you posted would not be relayed back to the workstation an error. This is related to internal processing of the WorkQueue, which would be after any connection from the Workstation has been closed. You must be having other errors if you're seeing an error on the workstation. If there is an error on the workstation, this would mean the workstation would have to resend the study that failed. Only the WorkQueue error would be considered a transient error. Steve |
|
|
|
|
bmoloney
 Advanced Member Posts:47
 |
| 2010-04-07 03:18 PM |
|
We have done some more testing and there are two types of deadlock exceptions that we see. One happens while processing a work queue item when it tries to call the stored procedure QueryWorkQueue (this is the exception that I posted above). When this happens no error appears in the CCW send queue. The other happens while trying to insert into the work queue when calling the stored procedure InsertWorkQueue. This exception also generates a warning and sends an error message back to CCW causing all of the entries in the send queue (including studies which had already successfully completed transferring/processing as well as studies which have not started transferring) to display the error message "Error encountered while sending file (AETITLE: Failure: Processing failure)". The errors in the logs for the second type of exception are: ******************************************************************************************** Exception thrown Unexpected exception when calling stored procedure: InsertWorkQueue System.Data.SqlClient.SqlException: Transaction (Process ID 99) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj) at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj) at System.Data.SqlClient.SqlDataReader.ConsumeMetaData() at System.Data.SqlClient.SqlDataReader.get_MetaData() at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString) at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async) at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result) at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method) at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method) at System.Data.SqlClient.SqlCommand.ExecuteReader() at ClearCanvas.ImageServer.Enterprise.SqlServer2005.ProcedureQueryBroker`2.InternalFind(TInput criteria, Int32 maxResults, ProcedureQueryCallback`1 callback) Exception thrown Unexpected error when executing command: Update/Insert a WorkQueue Entry ClearCanvas.Enterprise.Core.PersistenceException: Unexpected problem with stored procedure: InsertWorkQueue: Transaction (Process ID 99) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. ---> System.Data.SqlClient.SqlException: Transaction (Process ID 99) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj) at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj) at System.Data.SqlClient.SqlDataReader.ConsumeMetaData() at System.Data.SqlClient.SqlDataReader.get_MetaData() at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString) at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async) at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result) at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method) at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method) at System.Data.SqlClient.SqlCommand.ExecuteReader() at ClearCanvas.ImageServer.Enterprise.SqlServer2005.ProcedureQueryBroker`2.InternalFind(TInput criteria, Int32 maxResults, ProcedureQueryCallback`1 callback) --- End of inner exception stack trace --- at ClearCanvas.ImageServer.Enterprise.SqlServer2005.ProcedureQueryBroker`2.InternalFin ******************************************************************************************** And the warning in the logs is: ******************************************************************************************** Failure importing sop: Failure processing message: PersistenceException: Unexpected problem with stored procedure: InsertWorkQueue: Transaction (Process ID 75) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.. Sending failure status. ******************************************************************************************** We ran some benchmarks and the hard drive performance is not degraded from running inside the virtual machine. These more recent tests were on our production server where the VM has 4 cores and 8GB of RAM. We are still trying to verify the sent data against the original data. |
|
|
|
|
bmoloney
 Advanced Member Posts:47
 |
| 2010-04-08 08:17 PM |
|
I checked the study that the insert exception occurred on (as well as the study before and after in the send queue) and they all have been transferred and processed correctly. This would make it seem like this second type of deadlock exception is also a transient error which is safe to ignore (can you verify this?). This means that the image server sending back an error to the workstation is a bug. The workstation displaying the error message for all studies instead of just the study the error occurred on is also a bug (although inconsequential in this case if the first bug is fixed). Finally, it would be nice if these exceptions were logged as warnings instead of errors. |
|
|
|
|
bmoloney
 Advanced Member Posts:47
 |
| 2010-04-08 09:36 PM |
|
Ok I have seen another deadlock exception on InsertWorkQueue and this time the data was not all transferred successfully (one image was missing). So to revise my earlier statement, sending the error message back to the workstation in this case is not a bug. The fact that the workstation displays the error message for all transfers instead of just the one that failed is a bug however. Ideally the deadlock on InsertWorkQueue would also be handled gracefully but I don't know how feasible that is. |
|
|
|
|
steve
 Senior Member Posts:1330
 |
| 2010-04-14 02:05 PM |
|
The deadlock on InsertWorkQueue is basically "fatal" for receiving an image. We've somewhat wrapped up the receiving of a file off the network, storing it to disk, and inserting a record about it into the database (via InsertWorkQueue) into a transaction. This means if any part of the process fails, we roll back the changes, and send a failure status back to the SCU sending us the DICOM object. This way at least the sending side knows that the object failed sending, and it isn't "lost". The workstation software could be a bit better in attempting to do retrieves when this sort of transient error occurs, but in practice we haven't seen this happening that often. As for the general issue of the deadlocks themselves, you've hit upon one of the weaknesses of the current ImageServer design. Basically, almost all activities conducted by the ImageServer is passing through this single table. The table ends up being updated very frequently. We've spent a bunch of time already looking at different areas of how these queries, updates, inserts, and deletes are done against this table, and have done quite a bit in an attempt to remove deadlocks, but they do occasionally still occur. There were some further changes in the 2.0 release to attempt to address this, although the changes mostly revolved around deadlocks related to auto-routing. At our production site, we've basically addressed this by having a very powerful database server. Ie, Fusion IO solid state drives, multiple cores, lots of RAM, etc. etc. In this case, we do have 10 additional servers running the ShredHostService doing processing and have managed to process over 2 million studies on the system, but we still are having occasional issues with deadlocks. We'll continue to look at this with each release and try and make it work a bit better. Steve |
|
|
|
|
bmoloney
 Advanced Member Posts:47
 |
| 2010-04-14 06:39 PM |
|
Thanks for the information Steve. We are setting up ImageServer as the PACS for our relatively small research group. The issue has only been coming up since we have lots of old data that we are trying to push to the server in big chunks. If we push about 10 studies at a time (even one set right after another) then we do not see the deadlocks. Obviously this is not ideal but it is not a show stopper either. I am still a little unclear as to why the Workstation is currently displaying errors for every study in the send queue instead of just the study that had the error. Is this a limitation of DICOM? Would the the workstation have to query the server to try to figure out which study caused the failure? Thanks again for the help. Overall we are really happy with the software. |
|
|
|
|
steve
 Senior Member Posts:1330
 |
| 2010-04-16 06:00 PM |
|
As for the Workstation displaying errors for every study, this should not be happening. You should only be seeing errors for the associations/studies that have problems. The workstation should be able to track which specific images are failing, and link them to the correct studies displayed in the send/receive activity monitor. Steve |
|
|
|
|
bmoloney
 Advanced Member Posts:47
 |
| 2010-04-20 01:48 PM |
|
Could you please open a ticket for the workstation displaying the error for all studies in the send queue instead of just the study that failed? We are consistently seeing this behaviour for every InsertWorkQueue deadlock. Also we have noticed that the InsertWorkQueue deadlock always happens on the last image transferred (so there is just one image missing from the study). Not sure if this information is helpful. |
|
|
|
|
bmoloney
 Advanced Member Posts:47
 |
| 2010-05-01 09:49 PM |
|
We have done quite a bit more testing but cannot reduce the frequency of the deadlocks. We tried to run the software directly on a physical machine instead of a virtual machine, reducing the number of work queue threads, and even putting the SQL database on a pair of Intel SSDs in RAID 0 (which shows over 10,000 IOPS in IOMeter's database test). We are running the database and ImageServer software on the same server, but the hardware should be more than sufficient for our relatively light load. For now we are going to try to monitor the deadlocks so we see how often they occur in day-to-day use and so we can make sure any missing data gets resent. |
|
|
|
|
steve
 Senior Member Posts:1330
 |
| 2010-05-05 10:27 AM |
|
Stewart is looking into the issue with the send queue. With all the forum post lately, he lost track of this one, being in the ImageServer forums. He seemed to think the issue was with selecting multiple studies, then hitting Send once. This forces all the studies to be sent over one association, and may be why they were all failing. But, I'll let him confirm. I'm still not quite sure what the deal is with your deadlocks, but the clue about it being the last image in the study might be a clue. I know you've described it before, but could you explain a bit again about how you're using the Workstation to send when the problems occur? IE, how many studies you select and press send for at once, etc.?
|
|
|
|
|
stewart
 Senior Member Posts:1387
 |
| 2010-05-05 03:14 PM |
|
Hi, sorry I missed your workstation issue in the post. As Steve mentioned, I missed it because it started out as an ImageServer discussion and evolved to include the workstation.
I have verified the behaviour; any errors that occur on a particular association are shown next to every study that is being sent on that association. The Send Queue gives the impression that the studies are somehow being handled separately when in fact they're not; it's only split up by study for the purposes of display, really, but underneath they're all attached to the same "Send Job".
I've created this ticket, but I can't say when we'll actually be able to implement a fix for it.
Best regards,
Stewart |
|
|
|
|
jbluks
 Basic Member Posts:16
 |
| 2010-05-06 11:07 AM |
|
I'm not sure if this is related to the deadlock issue or not, but could possibly be related to the issue earlier in the thread about reaching the maximum pool size.
In testing, we discovered an issue where the ImageServer connections would always increase, hit the maximum pool size and then start acting wonky. It turned out that the ImageServer wasn't correctly configured for IIS 7.
So, if you're running IIS 7, this is definitely an issue you want to fix. If you're not running IIS 7, then this shouldn't affect you. Basically, in web.config, you need to copy the HttpContextDataManagementModule from the "httpModules" section, to the "modules" section under "system.webServer".
The ticket is here.
Jonathan |
|
|
|
|
| You are not authorized to post a reply. |
|
Active Forums 4.1
|
|
|
|
|
 | |  |
|