Akiban Persistit

persisit not recoverable after hard crash

Bug #1065677 reported by Christopher Ernenwein on 2012-10-11

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Akiban Persistit	Fix Released	Critical	Peter Beaman	Akiban Persistit 3.2.0

Bug Description

Doing a long delete in OLB_latest dataset, my Mac crashed, taking my VM with it. The Vm was running akiban. It was via a local psql client.

INFO 15:14:05,258 Starting services.
Exception in thread "main" java.lang.AssertionError
at com.persistit.JournalManager$PageNode.setPrevious(JournalManager.java:1918)
at com.persistit.RecoveryManager.scanLoadPageMap(RecoveryManager.java:1150)
at com.persistit.RecoveryManager.scanOneRecord(RecoveryManager.java:937)
at com.persistit.RecoveryManager.findAndValidateKeystone(RecoveryManager.java:784)
at com.persistit.RecoveryManager.buildRecoveryPlan(RecoveryManager.java:1354)
at com.persistit.Persistit.initializeRecovery(Persistit.java:639)
at com.persistit.Persistit.initialize(Persistit.java:588)
at com.persistit.Persistit.initialize(Persistit.java:555)
at com.akiban.server.service.tree.TreeServiceImpl.start(TreeServiceImpl.java:165)
at com.akiban.server.service.servicemanager.GuicedServiceManager$2.onStart(GuicedServiceManager.java:309)
at com.akiban.server.service.servicemanager.GuicedServiceManager$2.onStart(GuicedServiceManager.java:303)
at com.akiban.server.service.servicemanager.Guicer.startServiceIfApplicable(Guicer.java:317)
at com.akiban.server.service.servicemanager.Guicer.startService(Guicer.java:295)
at com.akiban.server.service.servicemanager.Guicer.get(Guicer.java:78)
at com.akiban.server.service.servicemanager.GuicedServiceManager.startServices(GuicedServiceManager.java:76)
at com.akiban.server.AkServer.main(AkServer.java:177)
INFO 15:14:05,555 Stopping services normally.
INFO 15:14:05,556 Services stopped.

Related branches

lp:~pbeaman/akiban-persistit/fix-1065677-retrograde-timestamp-on-recovery

Merged into lp:akiban-persistit at revision 379

Nathan Williams: Approve on 2012-10-13

Revision history for this message

Peter Beaman (pbeaman) wrote on 2012-10-12:

Upon review the journal files were found to be in a strange state. The following output shows that journal files 200 and 201 were both terminated by crashes (or perhaps kill -9 by the oomkiller), but the bug states that the system crashed only once.

   200,000,000,000 660,539,749 JH ( 140) version 2 blockSize 1,000,000,000 baseAddress 194,879,573,072 journalCreated 2012/08/18 07:34:05.981 fileCreated 2012/10/11 14:10:06.213
   200,112,005,645 0 ~~ ( 0)
   201,000,000,000 660,622,333 JH ( 140) version 2 blockSize 1,000,000,000 baseAddress 194,879,573,072 journalCreated 2012/08/18 07:34:05.981 fileCreated 2012/10/11 14:28:04.051
   201,298,677,931 0 ~~ ( 0)
   202,000,000,000 660,622,333 JH ( 140) version 2 blockSize 1,000,000,000 baseAddress 194,879,573,072 journalCreated 2012/08/18 07:34:05.981 fileCreated 2012/10/11 14:38:51.726
   202,999,991,119 660,622,333 JE ( 40) baseAddress 194,879,573,072 currentAddress 202,999,991,119 journalCreated 2012/08/18 07:34:05.981
   203,000,000,000 660,981,681 JH ( 140) version 2 blockSize 1,000,000,000 baseAddress 194,879,573,072 journalCreated 2012/08/18 07:34:05.981 fileCreated 2012/10/11 15:03:27.817
   203,085,819,359 0 ~~ ( 0)

The specific issue is that the following page has been dirtied and written with out-of-order timestamps as follows:

201,242,155,271 660,668,657 PA ( 6500) page 1: 1,408,127 type Data right 1,421,664
202,254,908,667 660,659,062 PA ( 6500) page 1: 1,408,127 type Data right 1,421,664

Note that the reference in file 202 has smaller timestamp than the one in file 201. This is reflected in the page map at the start of file 203 and it is while loading that page map that the AssertionError occurs.

The bug mechanism appears to be a failure to properly advance the system timestamp after recovery on file 201. When the system restarted at 14:38:51.726, the recovered timestamp was less than 660,668, 657. The reason for this, and the cause of the bug, is that the system timestamp is updated during recovery only by certain kinds of journal records, and the PA record denoting a new page version is not one of them.

Note that journal file 201 contains only header records and PA (page) records; there are no TX, CP or other records that advance the timestamp.

Marking this as CRITICAL because the database is rendered unusable (albeit in a way that could have been corrected if truly necessary).

Upon review the journal files were found to be in a strange state.  The following output shows that journal files 200 and 201 were both terminated by crashes (or perhaps kill -9 by the oomkiller), but the bug states that the system crashed only once.

200,000,000,000     660,539,749 JH (     140)  version   2 blockSize  1,000,000,000 baseAddress    194,879,573,072 journalCreated  2012/08/18 07:34:05.981 fileCreated  2012/10/11 14:10:06.213
   200,112,005,645               0 ~~ (       0) 
   201,000,000,000     660,622,333 JH (     140)  version   2 blockSize  1,000,000,000 baseAddress    194,879,573,072 journalCreated  2012/08/18 07:34:05.981 fileCreated  2012/10/11 14:28:04.051
   201,298,677,931               0 ~~ (       0) 
   202,000,000,000     660,622,333 JH (     140)  version   2 blockSize  1,000,000,000 baseAddress    194,879,573,072 journalCreated  2012/08/18 07:34:05.981 fileCreated  2012/10/11 14:38:51.726
   202,999,991,119     660,622,333 JE (      40)  baseAddress    194,879,573,072 currentAddress    202,999,991,119 journalCreated  2012/08/18 07:34:05.981
   203,000,000,000     660,981,681 JH (     140)  version   2 blockSize  1,000,000,000 baseAddress    194,879,573,072 journalCreated  2012/08/18 07:34:05.981 fileCreated  2012/10/11 15:03:27.817
   203,085,819,359               0 ~~ (       0)

The specific issue is that the following page has been dirtied and written with out-of-order timestamps as follows:

201,242,155,271     660,668,657 PA (    6500)  page     1:   1,408,127 type       Data right    1,421,664
   202,254,908,667     660,659,062 PA (    6500)  page     1:   1,408,127 type       Data right    1,421,664

Note that the reference in file 202 has smaller timestamp than the one in file 201.  This is reflected in the page map at the start of file 203 and it is while loading that page map that the AssertionError occurs.

The bug mechanism appears to be a failure to properly advance the system timestamp after recovery on file 201.  When the system restarted at 14:38:51.726, the recovered timestamp was less than 660,668, 657.  The reason for this, and the cause of the bug, is that the system timestamp is updated during recovery only by certain kinds of journal records, and the PA record denoting a new page version is not one of them.

Note that journal file 201 contains only header records and PA (page) records; there are no TX, CP or other records that advance the timestamp.

Marking this as CRITICAL because the database is rendered unusable (albeit in a way that could have been corrected if truly necessary).

Changed in akiban-persistit:
status:	New → Confirmed
status:	Confirmed → In Progress
importance:	Undecided → Critical
assignee:	nobody → Peter Beaman (pbeaman)
milestone:	none → 3.2.0

Peter Beaman (pbeaman) on 2012-10-15

Changed in akiban-persistit:
status:	In Progress → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.