adaptive flushing does a dirty scan of the flush list

Bug #1085015 reported by Alexey Kopytov
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
Percona Server moved to https://jira.percona.com/projects/PS | Triaged | High | Unassigned |
5.5 | Triaged | High | Unassigned |

Bug Description

The adaptive flushing code scans the flush list without acquiring the flush list mutex, which is unsafe and may result in crashes. The following backtrace is likely a result of this:

Thread 1 (Thread 0x4fabf940 (LWP 11815)):
#0 0x00002b6af8db3d02 in pthread_kill () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00000000006839ae in handle_fatal_signal (sig=11) at /usr/src/debug/Percona-Server-5.5.27-rel28.1/Percona-Server-5.5.27-rel28.1/sql/signal_handler.cc:249
        curr_time = <value optimized out>
        thrs = <value optimized out>
        mins = <value optimized out>
        hrs_buf = "22"
        hrs = <value optimized out>
        mins_buf = "56"
        secs_buf = "06"
        tmins = <value optimized out>
        secs = <value optimized out>
        thd = 0x0
#2 <signal handler called>
No symbol table info available.
#3 srv_master_thread (arg=<value optimized out>) at /usr/src/debug/Percona-Server-5.5.27-rel28.1/Percona-Server-5.5.27-rel28.1/storage/innobase/srv/srv0srv.c:3429
        oldest_modification = <value optimized out>
        buf_pool = 0x0
        n_blocks = <value optimized out>
        lsn = 9603942490514
        bpage = 0x98
        level = <value optimized out>
        bpl = <value optimized out>
        j = <value optimized out>
        cur_time = 1351637765661
        buf_stat = {n_page_gets = 404558629170, n_pages_read = 51791593, n_pages_written = 571199613, n_pages_created = 10282915, n_ra_pages_read_rnd = 0, n_ra_pages_read = 451058,
          n_ra_pages_evicted = 6098044, n_pages_made_young = 165713072, n_pages_not_made_young = 0}
        slot = 0x8bc1794a192
        old_activity_count = 9693488870
        n_bytes_merged = <value optimized out>
        n_pages_flushed = 0
        n_pages_flushed_prev = 86
        n_bytes_archived = <value optimized out>
        n_tables_to_drop = <value optimized out>
        n_ios = <value optimized out>
        n_ios_old = <value optimized out>
        n_ios_very_old = 18446744072282976242
        n_pend_ios = <value optimized out>
        next_itr_time = 1351637767604
        prev_adaptive_flushing_method = 1
        inner_loop = 0
        i = <value optimized out>
        prev_flush_info = {{count = 0, space = 0, offset = 0, oldest_modification = 0} <repeats 64 times>}
        lsn_old = 9603942232306
        oldest_lsn = 9603252154283
        last_print_time = 1347991762
#4 0x00002b6af8dae73d in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#5 0x00002b6af9b624bd in clone () from /lib64/libc.so.6
No symbol table info available.
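
For illustration, the locking discipline the description asks for can be sketched in plain C. This is a minimal, self-contained sketch and not the InnoDB code: the types page_t and flush_list_t and the functions flush_list_init, flush_list_add and flush_list_count_older_than are hypothetical stand-ins, and a pthread mutex stands in for the flush list mutex. The only point it makes is that the scan holds the mutex for the whole traversal, so no other thread can relink or remove a node while it is being read.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a dirty page on a flush list (not an InnoDB type). */
typedef struct page {
    uint64_t     oldest_modification; /* LSN of the oldest unflushed change */
    struct page *next;
} page_t;

/* Hypothetical flush list: a singly linked list guarded by one mutex. */
typedef struct {
    pthread_mutex_t mutex;
    page_t         *first;
} flush_list_t;

void flush_list_init(flush_list_t *fl)
{
    assert(fl != NULL);
    pthread_mutex_init(&fl->mutex, NULL);
    fl->first = NULL;
}

/* Writers modify the list only while holding the mutex. */
void flush_list_add(flush_list_t *fl, page_t *p, uint64_t lsn)
{
    pthread_mutex_lock(&fl->mutex);
    p->oldest_modification = lsn;
    p->next = fl->first;
    fl->first = p;
    pthread_mutex_unlock(&fl->mutex);
}

/*
 * Count pages whose oldest modification is older than lsn_limit.
 * The mutex is held for the entire scan; reading p->next without it
 * would race with concurrent insertions and removals.
 */
size_t flush_list_count_older_than(flush_list_t *fl, uint64_t lsn_limit)
{
    size_t n = 0;

    pthread_mutex_lock(&fl->mutex);
    for (page_t *p = fl->first; p != NULL; p = p->next) {
        if (p->oldest_modification < lsn_limit) {
            n++;
        }
    }
    pthread_mutex_unlock(&fl->mutex);

    return n;
}
```

Holding the mutex across the whole loop is the simplest correct choice in this sketch. A real server may prefer to bound how long the lock is held, but that trade-off is outside what this report describes.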

Tags: i27369
Alexey Kopytov (akopytov) wrote :
Ovais Tariq (ovais-tariq) wrote :

This looks like a serious bug to me: I have had a customer hit it, and the only workaround is to disable adaptive_flushing. That means anyone hitting this bug cannot use the new, more intelligent checkpointing algorithm and is deprived of the progress made in checkpointing.

Roel Van de Paar (roel11) wrote :