Deadlock issues when 20-30 users copying collections & pages at the same time

Bug #1492919 reported by Aaron Wells
52
This bug affects 8 people
Affects Status Importance Assigned to Milestone
Mahara
Fix Released
High
Unassigned
1.10
Fix Released
High
Unassigned
1.9
Fix Released
High
Unassigned
15.04
Fix Released
High
Unassigned
15.10
Fix Released
High
Unassigned

Bug Description

Multiple users on the forum have reported deadlock issues when there are many users (20-30) copying pages & collections at the same time. See https://mahara.org/interaction/forum/topic.php?id=7309&offset=0&limit=10

Note that 20 to 30 users is around the number you'd hit if you're running a whole class of users through a training session at the same time.

The three reported cases of this are all in 15.04, so it may be a problem introduced in that version.

The problem has been reported in both Postgres & MySQL.

Aaron Wells (u-aaronw)
Changed in mahara:
status: New → Triaged
importance: Undecided → Medium
description: updated
Revision history for this message
Jim Judges (j-judges) wrote :

Just to add we have experienced this problem on our 15.04.01 when groups of students are copying pages and collections but also errors have also occurred under other circumstances e.g. during login or clicking on the Portfolio button. This suggests it may be a wider issue.

Any help to solve this problem appreciated - thanks.

Revision history for this message
Brett Lucas (brettaki) wrote :

We are very anxious to see a fix for this bug as our institutional upgrade has been delayed until it is resolved.

Revision history for this message
Samuel Witzig (wsam) wrote :

As I wrote in https://mahara.org/interaction/forum/topic.php?id=7309&offset=10, I suspect that Mahara 1.10.6 is also affected: When several students try to copy a view, and do this at the same time, they also get a deadlock-message (we use the most recent version of Mahara 1.10). We were able to reproduce this with 3 users (one was able to copy the view, the other 2 users got the error message.

Revision history for this message
Russell Boyatt (russell-boyatt) wrote :

To summarise what I've posted in the forum messages, we've looked at this issue with multiple versions of Mahara and different database platforms.

Using a JMeter test plan that:

1) GET main index page
2) POST a login,
3) GET list of portfolios pages, i.e. /view/index.php
4) GET choose template page, i.e. /view/choosetemplate.php
5) POST to /view/choosetemplate.php the action equivalent to select a page to copy (with all the appropriate form parameters)

Running this against Mahara 1.9, 1.10, or 15.04 and MySQL 5.5.44 - deadlock bug occurs when multiple users copy pages at the same time. It's relatively easy to trigger this, and I've included an example of the deadlock below.

With PostgreSQL it's a different story. Running with either PostgreSQL 9.3 does not produce the deadlock errors, even when scaling up the test to large numbers of simulated users copying pages. Tested with Mahara 15.04 and the current master branch from git. We believe this is also true on PostgreSQL 8.4 (but have done far less testing with that version).

Example MySQL deadlock message (from 'SHOW ENGINE INNODB STATUS'):
***************************************************************************
150916 20:46:31
*** (1) TRANSACTION:
TRANSACTION 1621D, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 9 lock struct(s), heap size 1024, 5 row lock(s), undo log entries 2
MySQL thread id 222, OS thread handle 0x66480450, query id 9755 localhost mahara update
INSERT INTO "view_rows_columns" ("view", "row", "columns") VALUES (17, 1, 3)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 2207 n bits 80 index "viewrowscolu_vie_ix" of table "maharamaster"."view_rows_columns" trx id 1621D lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) TRANSACTION:
TRANSACTION 1620F, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
19 lock struct(s), heap size 1024, 12 row lock(s), undo log entries 7
MySQL thread id 219, OS thread handle 0x662ff450, query id 9771 localhost mahara update
INSERT INTO "view_rows_columns" ("view", "row", "columns") VALUES ('15', 1, '3')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 2207 n bits 80 index "viewrowscolu_vie_ix" of table "maharamaster"."view_rows_columns" trx id 1620F lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

Record lock, heap no 13 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 8; hex 800000000000000f; asc ;;
 1: len 8; hex 8000000000000019; asc ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 2207 n bits 80 index "viewrowscolu_vie_ix" of table "maharamaster"."view_rows_columns" trx id 1620F lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** WE ROLL BACK TRANSACTION (1)
***************************************************************************

Revision history for this message
Robert Lyon (robertl-9) wrote :

Here is some good information on how to avoid deadlocks in mysql: http://dev.mysql.com/doc/refman/5.0/en/innodb-deadlocks.html

One of the more interesting points is
* Always be prepared to re-issue a transaction if it fails due to deadlock. Deadlocks are not dangerous. Just try again

And at http://stackoverflow.com/questions/2596005/working-around-mysql-error-deadlock-found-when-trying-to-get-lock-try-restarti it mentions
 "Often you can just put a try/catch around the query execution logic and look for a deadlock when errors occur. If you catch one, the normal thing to do is just attempt to execute the failed query again."

So we could try that for a start

Revision history for this message
Russell Boyatt (russell-boyatt) wrote :

Agree that deadlocks are not necessarily a bad thing and retrying is something to work considering. I've now had chance to look at what's going on and I *think* I understand what's going on now...

MySQL defaults to a transaction isolation level of 'REPEATABLE READ' (the transaction snapshot is stable from the first read) compared to the default in Oracle and PostgreSQL of 'READ COMMITTED'. As a REPEATABLE READ level causes any index lock to be held for the entire transaction then two page/collection copy operations working on the various view tables are going to end up in deadlock.

Some possible approaches:

1) Altering the transaction isolation for page copy operations

Force the transaction isolation level to 'read committed' for page copy operations. This should prevent the deadlock that is appearing in 15.04 (and 1.10, 1.9, 1.8 and likely back further) when run on MySQL. A crude patch to a branch of 15.04 is below. If I test using a JMeter test plan that simulates 30 users initiating page copies - without the patch I can trigger deadlocks, with the patch I've not managed to hit a deadlock.

https://github.com/rboyatt/mahara/tree/bug1492919-approach1

2) Setting transaction isolation application wide

As above but setting the transaction isolation level for MySQL to be the same as when running on PostgreSQL (i.e. read committed'). I'm afraid I don't know Mahara well enough to understand the full implications of this change. Others will be much better placed to advise here!

3) More complex approach is addressing the issue that any of the transactions in the application might still fail due to deadlock. I couldn't find anywhere in the code that attempts transaction retry but perhaps I've missed it? So perhaps longer-term looking at transaction retries and how some of the longer operations in transactions lock the various elements of the DB.

Aaron Wells (u-aaronw)
Changed in mahara:
importance: Medium → High
no longer affects: mahara/1.0
Revision history for this message
Mahara Bot (dev-mahara) wrote : A patch has been submitted for review

Patch for "master" branch: https://reviews.mahara.org/5335

Revision history for this message
Aaron Wells (u-aaronw) wrote :

Hi Russell,

Thanks for the research, and the patch! I've pushed it into our gerrit code review system here: https://reviews.mahara.org/#/c/5335

What you've described makes a lot of sense. We haven't experienced this issue with the large Mahara sites we host here at Catalyst IT, and those are all on Postgres. Doing a little checking, I see that we are using the default transaction level of READ COMMITTED on our hosted site databases.

So, I think the best bet is probably to change the commit level to READ COMMITTED at the application level. I've amended your commit so that it sets that up along with the other basic database parameters when establishing the initial connection. Since this is what we're already doing on our many production Mahara sites, it's unlikely to cause problems (although race conditions are always a possibility).

In answer to your item #3, we do not have any code in place to retry when a transaction fails. We have encountered so few cases of this so far, that we've just relied on "F5" in the web browser, as the retry mechanism.

Anyway, please give the latest version of the patch a try and see if that helps: https://reviews.mahara.org/#/c/5335/

Cheers,
Aaron

Revision history for this message
Russell Boyatt (russell-boyatt) wrote :

Hi Aaron,

Thanks very much for taking a look. Agree with your modified patch - it seems like the best way to go. I did give it a quick test with the JMeter test plan just to check and it looked ok.

For others reading this....these changes don't completely remove the possiblity of a deadlock (just makes it far more unlikely).

Russell

Revision history for this message
Aaron Wells (u-aaronw) wrote :

Thanks for the feedback, Russell! Let me know if you notice any problems. If not, we'll go ahead and include this patch in the next minor release.

Revision history for this message
Mahara Bot (dev-mahara) wrote : A change has been merged

Reviewed: https://reviews.mahara.org/5335
Committed: https://git.nzoss.org.nz/mahara/mahara/commit/0eea42ce3b129e0df80814f14b48f8a6da47479e
Submitter: Robert Lyon (<email address hidden>)
Branch: master

commit 0eea42ce3b129e0df80814f14b48f8a6da47479e
Author: Russell Boyatt <email address hidden>
Date: Sun Sep 20 15:25:19 2015 +0100

Set transaction mode to "READ COMMITTED" to avoid deadlocks. Bug 1492919

behatnotneeded: Testing requires multiple simultaneous users

Change-Id: Icfac83b0ac9ed6ab0c304658550dd0260bd5d7eb

Revision history for this message
Mahara Bot (dev-mahara) wrote : A patch has been submitted for review

Patch for "15.04_STABLE" branch: https://reviews.mahara.org/5360

Revision history for this message
Mahara Bot (dev-mahara) wrote :

Patch for "1.10_STABLE" branch: https://reviews.mahara.org/5361

Revision history for this message
Mahara Bot (dev-mahara) wrote : A change has been merged

Reviewed: https://reviews.mahara.org/5360
Committed: https://git.nzoss.org.nz/mahara/mahara/commit/2572307a392991cd8b19ddf2c590653ce99c7a02
Submitter: Robert Lyon (<email address hidden>)
Branch: 15.04_STABLE

commit 2572307a392991cd8b19ddf2c590653ce99c7a02
Author: Russell Boyatt <email address hidden>
Date: Sun Sep 20 15:25:19 2015 +0100

Set transaction mode to "READ COMMITTED" to avoid deadlocks. Bug 1492919

behatnotneeded: Testing requires multiple simultaneous users

Change-Id: Icfac83b0ac9ed6ab0c304658550dd0260bd5d7eb
(cherry picked from commit 0eea42ce3b129e0df80814f14b48f8a6da47479e)

Revision history for this message
Mahara Bot (dev-mahara) wrote :

Reviewed: https://reviews.mahara.org/5361
Committed: https://git.nzoss.org.nz/mahara/mahara/commit/35d7bfa5c68fdf9aea4cdd05321b69f818ff9dc4
Submitter: Robert Lyon (<email address hidden>)
Branch: 1.10_STABLE

commit 35d7bfa5c68fdf9aea4cdd05321b69f818ff9dc4
Author: Russell Boyatt <email address hidden>
Date: Sun Sep 20 15:25:19 2015 +0100

Set transaction mode to "READ COMMITTED" to avoid deadlocks. Bug 1492919

behatnotneeded: Testing requires multiple simultaneous users

Change-Id: Icfac83b0ac9ed6ab0c304658550dd0260bd5d7eb
(cherry picked from commit 0eea42ce3b129e0df80814f14b48f8a6da47479e)

Revision history for this message
Mahara Bot (dev-mahara) wrote : A patch has been submitted for review

Patch for "1.9_STABLE" branch: https://reviews.mahara.org/5366

Revision history for this message
Mahara Bot (dev-mahara) wrote : A change has been merged

Reviewed: https://reviews.mahara.org/5366
Committed: https://git.nzoss.org.nz/mahara/mahara/commit/bd084edce5737fc1103a3f3339bca7c374fbf937
Submitter: Aaron Wells (<email address hidden>)
Branch: 1.9_STABLE

commit bd084edce5737fc1103a3f3339bca7c374fbf937
Author: Russell Boyatt <email address hidden>
Date: Sun Sep 20 15:25:19 2015 +0100

Set transaction mode to "READ COMMITTED" to avoid deadlocks. Bug 1492919

behatnotneeded: Testing requires multiple simultaneous users

Change-Id: Icfac83b0ac9ed6ab0c304658550dd0260bd5d7eb
(cherry picked from commit 0eea42ce3b129e0df80814f14b48f8a6da47479e)
(cherry picked from commit 35d7bfa5c68fdf9aea4cdd05321b69f818ff9dc4)

Robert Lyon (robertl-9)
Changed in mahara:
status: Fix Committed → Fix Released
Revision history for this message
Aaron Wells (u-aaronw) wrote :

Since this bug has still been reported even after the deployment of the patch 5366, I've opened a subsequent bug: https://bugs.launchpad.net/mahara/+bug/1514608

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.