InnoDB: Failing assertion: (buf_pool->flush_list).count > 0 [ in buf_flush_remove - buf0flu.c ] {Was: Crash in insert benchmark}

Bug #934377 reported by Vadim Tkachenko
This bug affects 5 people
Affects / Status / Importance / Assigned to / Milestone:
- Percona Server (moved to https://jira.percona.com/projects/PS): Fix Released, High, Laurynas Biveinis
- 5.1: Invalid, Undecided, Unassigned
- 5.5: Fix Released, High, Laurynas Biveinis

Bug Description

During an insert-intensive benchmark I got a crash:
Version: '5.5.20-rel24.1-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona Server with XtraDB (GPL), Release rel24.1, Revision 217
120217 9:52:09 InnoDB: Assertion failure in thread 140660403742464 in file buf0flu.c line 542
InnoDB: Failing assertion: (buf_pool->flush_list).count > 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.

I do not have a reliable procedure to reproduce this, but I will update the ticket when I have more information.


Stewart Smith (stewart)
Changed in percona-server:
status: New → Incomplete
importance: Undecided → Medium
Revision history for this message
Laurent Bigonville (bigon) wrote :

Hi,

I've been hit by a similar issue with Percona 5.5.21. I get the following backtrace (unfortunately without debugging symbols installed).

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7c1805]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x6912a4]
/lib/libpthread.so.0(+0xeff0)[0x7fb11d00dff0]
/lib/libc.so.6(gsignal+0x35)[0x7fb11c2091b5]
/lib/libc.so.6(abort+0x180)[0x7fb11c20bfc0]
/usr/sbin/mysqld[0x8bb7f4]
/usr/sbin/mysqld[0x8bb837]
/usr/sbin/mysqld[0x8b603b]
/usr/sbin/mysqld[0x8e908b]
/usr/sbin/mysqld[0x86b5a0]
/lib/libpthread.so.0(+0x68ca)[0x7fb11d0058ca]
/lib/libc.so.6(clone+0x6d)[0x7fb11c2a692d]

Revision history for this message
Chris Clifton (juice-qr) wrote :

We appear to have experienced this bug today as well, with Percona 5.5.27-28.1-log.

####

121108 12:35:25 InnoDB: Assertion failure in thread 140155688658688 in file buf0flu.c line 542
InnoDB: Failing assertion: (buf_pool->flush_list).count > 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
17:35:25 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=30
max_threads=1000
thread_count=6
connection_count=6
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2221025 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7ab385]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x687584]
/lib64/libpthread.so.0(+0xf4a0)[0x7f79c1dd54a0]
/lib64/libc.so.6(gsignal+0x35)[0x7f79c0fa0885]
/lib64/libc.so.6(abort+0x175)[0x7f79c0fa2065]
/usr/sbin/mysqld[0x88cb34]
/usr/sbin/mysqld[0x88cb77]
/usr/sbin/mysqld[0x889af1]
/usr/sbin/mysqld[0x8bac63]
/usr/sbin/mysqld[0x83b3a0]
/lib64/libpthread.so.0(+0x77f1)[0x7f79c1dcd7f1]
/lib64/libc.so.6(clone+0x6d)[0x7f79c1053ccd]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
121108 12:35:26 mysqld_safe Number of processes running now: 0
121108 12:35:26 mysqld_safe mysqld restarted
...
####

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Chris -

The issue you are experiencing could be bug 1042640, which is fixed in 5.5.28-29.1. Please upgrade and check if you can reproduce the issue then. Thanks!

Revision history for this message
Chris Clifton (juice-qr) wrote : Re: [Bug 934377] Re: Crash in insert benchmark

Thanks for the follow-up; we'll try that.

- Chris Clifton


Revision history for this message
Roel Van de Paar (roel11) wrote :

Seen in Percona-Server-5.5.28-rel29.1-359.Linux.x86_64

121204 10:02:19 InnoDB: Assertion failure in thread 320534848 in file buf0flu.c line 542
InnoDB: Failing assertion: (buf_pool->flush_list).count > 0

+thread 24
[Switching to thread 24 (LWP 18474)]
#0 0x00000038e260bd72 in pthread_kill () from /lib64/libpthread.so.0
(gdb) bt
+bt
#0 0x00000038e260bd72 in pthread_kill () from /lib64/libpthread.so.0
#1 0x0000000000695e23 in handle_fatal_signal (sig=6) at /home/jenkins/workspace/percona-server-5.5-nigthly-binaries/label_exp/centos5-64/Percona-Server-5.5.28-rel29.1/sql/signal_handler.cc:249
#2 <signal handler called>
#3 0x00000038e1630285 in raise () from /lib64/libc.so.6
#4 0x00000038e1631d30 in abort () from /lib64/libc.so.6
#5 0x000000000087bf6c in buf_flush_remove (bpage=0x6282940) at /home/jenkins/workspace/percona-server-5.5-nigthly-binaries/label_exp/centos5-64/Percona-Server-5.5.28-rel29.1/storage/innobase/buf/buf0flu.c:542
#6 0x000000000087bfad in buf_flush_write_complete (bpage=0x4814) at /home/jenkins/workspace/percona-server-5.5-nigthly-binaries/label_exp/centos5-64/Percona-Server-5.5.28-rel29.1/storage/innobase/buf/buf0flu.c:655
#7 0x00000000008794a4 in buf_page_io_complete (bpage=0x6282940) at /home/jenkins/workspace/percona-server-5.5-nigthly-binaries/label_exp/centos5-64/Percona-Server-5.5.28-rel29.1/storage/innobase/buf/buf0buf.c:4158
#8 0x00000000008b1f88 in fil_aio_wait (segment=6) at /home/jenkins/workspace/percona-server-5.5-nigthly-binaries/label_exp/centos5-64/Percona-Server-5.5.28-rel29.1/storage/innobase/fil/fil0fil.c:5587
#9 0x00000000008286e0 in io_handler_thread (arg=<optimized out>) at /home/jenkins/workspace/percona-server-5.5-nigthly-binaries/label_exp/centos5-64/Percona-Server-5.5.28-rel29.1/storage/innobase/srv/srv0start.c:485
#10 0x00000038e260677d in start_thread () from /lib64/libpthread.so.0
#11 0x00000038e16d3c1d in clone () from /lib64/libc.so.6
#12 0x0000000000000000 in ?? ()

summary: - Crash in insert benchmark
+ InnoDB: Failing assertion: (buf_pool->flush_list).count > 0 [ in
+ buf_flush_remove - buf0flu.c ] {Was: Crash in insert benchmark}
Changed in percona-server:
importance: Medium → High
status: Incomplete → Confirmed
Revision history for this message
Roel Van de Paar (roel11) wrote :

Note to self: /data/ssd/qa/105/vardir1_151/master-data | script:

$ cat cmd151
ps -ef | grep 'cmdrun_151' | grep -v grep | awk '{print $2}' | xargs sudo kill -9
rm -Rf /data/ssd/qa/105/cmdrun_151
mkdir /data/ssd/qa/105/cmdrun_151
cd /data/ssd/qa/105/randgen
bash -c "set -o pipefail; perl runall.pl --queries=100000000 --seed=8157 --duration=300 --querytimeout=60 --short_column_names --sqltrace --reporter=Shutdown,Backtrace,QueryTimeout,ErrorLog,ErrorLogAlarm --mysqld=--log-output=none --mysqld=--sql_mode=ONLY_FULL_GROUP_BY --grammar=conf/percona_qa/percona_qa.yy --gendata=conf/percona_qa/percona_qa.zz --basedir=/data/ssd/qa/105/Percona-Server-5.5.28-rel29.1-359.Linux.x86_64 --valgrind --reporter=ValgrindErrors --validator=MarkErrorLog --threads=25 --views --notnull --validator=Transformer --mysqld=--innodb_file_per_table=1 --mysqld=--innodb_file_format=barracuda --mtr-build-thread=728 --mask=42382 --vardir1=/data/ssd/qa/105/cmdrun_151 > /data/ssd/qa/105/cmdrun151.log 2>&1"

Stewart Smith (stewart)
tags: added: crash
Revision history for this message
Vadim Tkachenko (vadim-tk) wrote :

So,
do we have a repeatable procedure for this bug?

Revision history for this message
Vadim Tkachenko (vadim-tk) wrote :

Set to Incomplete, as it is unclear whether we have a repeatable case.

Revision history for this message
Roel Van de Paar (roel11) wrote :

Agreed with Vadim to mark this as not Incomplete, based on the fact that we have full backtrace info and a core dump available.

Revision history for this message
Roel Van de Paar (roel11) wrote :

Further testing has shown that the RQG run does not reproduce the issue reliably; it is sporadic. The next step would be for a developer to review the full backtrace info and core dump (ref above).

Revision history for this message
Roel Van de Paar (roel11) wrote :

Related to bug 1086680?

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Possible but not certain. Would be interesting to see if reproducible with bug 1086680 fix branch.

Revision history for this message
Roel Van de Paar (roel11) wrote :

Laurynas, this was reproducible with the QA-16274-5.5 tree (i.e. the one that supersedes the bug 1086680 fix branch) on an optimized build (./build/build-binary.sh without additional options). Revid Percona-Server-5.5.28-rel29.3-416.Linux.x86_64

Laurynas Biveinis (laurynas-biveinis)
tags: added: xtradb
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Developed a theory through code analysis: it is a race condition in accessing the page I/O fix state, which results in two threads attempting to remove the same block from the flush list.

One thread is performing an LRU or flush list flush. Let's say it stops in buf_flush_page() for a certain page, between the ut_ad(buf_flush_ready_for_flush(bpage, flush_type)) and buf_page_set_io_fix(bpage, BUF_IO_WRITE) calls. This thread holds the buffer pool and block mutexes, but not the LRU or flush list mutexes.

Another thread is performing a DROP or TRUNCATE for a tablespace and arrives at buf_flush_or_remove_page() for the same page. It holds the LRU and flush list mutexes, but not the buffer pool or block mutexes. It checks that the block state is BUF_IO_NONE and proceeds to remove the block from the flush list.

Meanwhile the first thread proceeds with flushing the same block, so the flush I/O completion routine is then unable to remove the flushed block from the flush list.

The fix would be to stop performing dirty I/O fix state reads in buf_flush_or_remove_pages(). This is a recent regression, introduced while merging the upstream DROP TABLE performance improvements (around 5.5.27).

I will review the rest of the stack traces in this and other bugs and make a tentative fix for testing.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

5.1 is immune. The #5 to #12 stack frames in bug 1110102 match the theory.

Revision history for this message
Roel Van de Paar (roel11) wrote :

Adding two further occurrences for stack analysis, this time at line 544 in buf0flu.c:

130203 7:26:59 InnoDB: Assertion failure in thread 140033223284480 in file buf0flu.c line 544
InnoDB: Failing assertion: (buf_pool->flush_list).count > 0

Attaching two error logs and two full stack trace sets.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Related: bug 1122462.

tags: added: i29552
Revision history for this message
fimbulvetr (fimbulvetr) wrote :

I've started to get this issue recently as well. I've been running Percona for a long time, across dozens of instances, and have never seen this issue before. I can't say for sure it is the only thing that has changed, but recently some portion of our application has started using temporary tables heavily.

Perhaps 2-5 temporary tables per connection, with each connection held for 5-30 seconds and 10+ of these connections per second.

We experienced this crash this morning, and another one 25 days ago.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Fimbulvetr -

What version are you running? Please open a new bug report with this and other information. Thanks!

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-5
