Stress test failure after RebalanceException

Bug #1074372 reported by Peter Beaman
Affects: Akiban Persistit
Status: Fix Released
Importance: High
Assigned to: Peter Beaman

Bug Description

In extremely rare cases, Persistit throws a RebalanceException when attempting to remove a key. For this to happen, both the key and the contents of the two adjacent pages involved must be in a particular configuration (see com.persistit.BufferTest2#testRebalanceException). Because the preconditions are difficult to construct even in a test, a RebalanceException had never been seen in application testing, and the consequences were thought to be benign, we deferred work to eliminate it.

However, a recent stress test run shows that the consequences are not necessarily benign:

Starting Stress10Suite for page size 4,096
deleted /mnt/persistit_tests/persistit_201211012250.log
deleted /mnt/persistit_tests/persistit_journal.000000000000
deleted /mnt/persistit_tests/persistit
deleted /mnt/persistit_tests/persistit_journal.000000000001
Stress10Suite at 60 seconds: live= 12 ended= 0 stopped = 0, failed= 0 totalwork= 14,226,812 intervalwork= 14,226,812 workrate= 234,535

Stress10[Thread-626] at 76 seconds: FAILED [Thread-626]: com.persistit.exception.RebalanceException
 at com.persistit.Buffer.join(Buffer.java:2563)
 at com.persistit.Exchange.raw_removeKeyRangeInternal(Exchange.java:3418)
 at com.persistit.Exchange.removeKeyRangeInternal(Exchange.java:3130)
 at com.persistit.Exchange.removeInternal(Exchange.java:3048)
 at com.persistit.Exchange.remove(Exchange.java:2978)
 at com.persistit.stress.unit.Stress10.executeTest(Stress10.java:170)
 at com.persistit.stress.AbstractStressTest.run(AbstractStressTest.java:91)
 at java.lang.Thread.run(Thread.java:662)

Stress10[Thread-633] at 76 seconds: FAILED [Thread-633]: com.persistit.exception.InvalidPageAddressException: Page -1 out of bounds [0-148391]
 at com.persistit.VolumeStorageV2.readPage(VolumeStorageV2.java:446)
 at com.persistit.Buffer.load(Buffer.java:461)
 at com.persistit.BufferPool.get(BufferPool.java:825)
 at com.persistit.LongRecordHelper.fetchLongRecord(LongRecordHelper.java:96)
 at com.persistit.Exchange.fetchFixupForLongRecords(Exchange.java:2887)
 at com.persistit.Exchange.fetchFromValueInternal(Exchange.java:2822)
 at com.persistit.Exchange.fetchFromBufferInternal(Exchange.java:2790)
 at com.persistit.Exchange.traverse(Exchange.java:2190)
 at com.persistit.Exchange.traverse(Exchange.java:1987)
 at com.persistit.Exchange.traverse(Exchange.java:1924)
 at com.persistit.Exchange.next(Exchange.java:2369)
 at com.persistit.stress.unit.Stress10.executeTest(Stress10.java:139)
 at com.persistit.stress.AbstractStressTest.run(AbstractStressTest.java:91)
 at java.lang.Thread.run(Thread.java:662)

Stress10[Thread-625] at 78 seconds: FAILED [Thread-625]: com.persistit.exception.InvalidPageAddressException: Page -1 out of bounds [0-150186]
 at com.persistit.VolumeStorageV2.readPage(VolumeStorageV2.java:446)
 at com.persistit.Buffer.load(Buffer.java:461)
 at com.persistit.BufferPool.get(BufferPool.java:825)
 at com.persistit.LongRecordHelper.fetchLongRecord(LongRecordHelper.java:96)
 at com.persistit.Exchange.fetchFixupForLongRecords(Exchange.java:2887)
 at com.persistit.Exchange.fetchFromValueInternal(Exchange.java:2822)
 at com.persistit.Exchange.fetchFromBufferInternal(Exchange.java:2790)
 at com.persistit.Exchange.traverse(Exchange.java:2190)
 at com.persistit.Exchange.traverse(Exchange.java:1987)
 at com.persistit.Exchange.traverse(Exchange.java:1924)
 at com.persistit.Exchange.next(Exchange.java:2369)
 at com.persistit.stress.unit.Stress10.executeTest(Stress10.java:139)
 at com.persistit.stress.AbstractStressTest.run(AbstractStressTest.java:91)
 at java.lang.Thread.run(Thread.java:662)

... many similar stack traces...

Stress10[Thread-630] at 83 seconds: FAILED [Thread-630]: com.persistit.exception.InvalidPageAddressException: Page -1 out of bounds [0-154327]
 at com.persistit.VolumeStorageV2.readPage(VolumeStorageV2.java:446)
 at com.persistit.Buffer.load(Buffer.java:461)
 at com.persistit.BufferPool.get(BufferPool.java:825)
 at com.persistit.LongRecordHelper.fetchLongRecord(LongRecordHelper.java:96)
 at com.persistit.Exchange.fetchFixupForLongRecords(Exchange.java:2887)
 at com.persistit.Exchange.fetchFromValueInternal(Exchange.java:2822)
 at com.persistit.Exchange.fetchFromBufferInternal(Exchange.java:2790)
 at com.persistit.Exchange.traverse(Exchange.java:2190)
 at com.persistit.Exchange.traverse(Exchange.java:1987)
 at com.persistit.Exchange.traverse(Exchange.java:1924)
 at com.persistit.Exchange.next(Exchange.java:2369)
 at com.persistit.stress.unit.Stress10.executeTest(Stress10.java:139)
 at com.persistit.stress.AbstractStressTest.run(AbstractStressTest.java:91)
 at java.lang.Thread.run(Thread.java:662)

[Thread-632] ERROR BTree structure error LONG_RECORD chain is invalid at page 48850 - invalid page type: Page 48,850 in volume persistit(/mnt/persistit_tests/persistit) at index 53,620 timestamp=1,308,373 status=vr1 type=Data
Exchange(Volume=/mnt/persistit_tests/persistit,Tree=shared,,Key=<{"stress10",20663,2}>)
0: Buffer=<Page 152,456 in volume persistit(/mnt/persistit_tests/persistit) at index 10,926 timestamp=1,308,373 status=v type=Data>, keyGeneration=3039009, bufferGeneration=58, foundAt=<48:depth=14:end>>
1: Buffer=<Page 61,989 in volume persistit(/mnt/persistit_tests/persistit) at index 61,989 timestamp=1,263,481 status=vd type=Index1>, keyGeneration=3039002, bufferGeneration=323, foundAt=<396:depth=16:ebc=13:db=184:tail=1584>>
2: Buffer=<Page 799 in volume persistit(/mnt/persistit_tests/persistit) at index 799 timestamp=1,266,902 status=vd type=Index2>, keyGeneration=3039002, bufferGeneration=550, foundAt=<464:depth=12:ebc=12:db=162:tail=3392>>
3: Buffer=<Page 119,960 in volume persistit(/mnt/persistit_tests/persistit) at index 85 timestamp=909,567 status=vd type=Index3>, keyGeneration=3039002, bufferGeneration=95, foundAt=<36:fixup:depth=11:ebc=0:db=128:tail=4060>>

Stress10[Thread-632] at 85 seconds: FAILED [Thread-632]: com.persistit.exception.CorruptVolumeException: LONG_RECORD chain is invalid at page 48850 - invalid page type: Page 48,850 in volume persistit(/mnt/persistit_tests/persistit) at index 53,620 timestamp=1,308,373 status=vr1 type=Data
 at com.persistit.LongRecordHelper.corrupt(LongRecordHelper.java:238)
 at com.persistit.LongRecordHelper.fetchLongRecord(LongRecordHelper.java:98)
 at com.persistit.Exchange.fetchFixupForLongRecords(Exchange.java:2887)
 at com.persistit.Exchange.fetchFromValueInternal(Exchange.java:2822)
 at com.persistit.Exchange.fetchFromBufferInternal(Exchange.java:2790)
 at com.persistit.Exchange.traverse(Exchange.java:2190)
 at com.persistit.Exchange.traverse(Exchange.java:1987)
 at com.persistit.Exchange.traverse(Exchange.java:1924)
 at com.persistit.Exchange.next(Exchange.java:2369)
 at com.persistit.stress.unit.Stress10.executeTest(Stress10.java:139)
 at com.persistit.stress.AbstractStressTest.run(AbstractStressTest.java:91)
 at java.lang.Thread.run(Thread.java:662)

[Thread-634] ERROR BTree structure error LONG_RECORD chain is invalid at page 48850 - invalid page type: Page 48,850 in volume persistit(/mnt/persistit_tests/persistit) at index 53,620 timestamp=1,314,078 status=vr1 type=Data
Exchange(Volume=/mnt/persistit_tests/persistit,Tree=shared,,Key=<{"stress10",20663,2}>)
0: Buffer=<Page 152,456 in volume persistit(/mnt/persistit_tests/persistit) at index 10,926 timestamp=1,314,078 status=v type=Data>, keyGeneration=3203946, bufferGeneration=58, foundAt=<48:depth=14:end>>
1: Buffer=<Page 61,989 in volume persistit(/mnt/persistit_tests/persistit) at index 61,989 timestamp=1,263,481 status=vd type=Index1>, keyGeneration=3203939, bufferGeneration=323, foundAt=<396:depth=16:ebc=13:db=184:tail=1584>>
2: Buffer=<Page 799 in volume persistit(/mnt/persistit_tests/persistit) at index 799 timestamp=1,266,902 status=vd type=Index2>, keyGeneration=3203939, bufferGeneration=550, foundAt=<464:depth=12:ebc=12:db=162:tail=3392>>
3: Buffer=<Page 119,960 in volume persistit(/mnt/persistit_tests/persistit) at index 85 timestamp=909,567 status=vd type=Index3>, keyGeneration=3203939, bufferGeneration=95, foundAt=<36:fixup:depth=11:ebc=0:db=128:tail=4060>>

Stress10[Thread-634] at 85 seconds: FAILED [Thread-634]: com.persistit.exception.CorruptVolumeException: LONG_RECORD chain is invalid at page 48850 - invalid page type: Page 48,850 in volume persistit(/mnt/persistit_tests/persistit) at index 53,620 timestamp=1,314,078 status=vr1 type=Data
 at com.persistit.LongRecordHelper.corrupt(LongRecordHelper.java:238)
 at com.persistit.LongRecordHelper.fetchLongRecord(LongRecordHelper.java:98)
 at com.persistit.Exchange.fetchFixupForLongRecords(Exchange.java:2887)
 at com.persistit.Exchange.fetchFromValueInternal(Exchange.java:2822)
 at com.persistit.Exchange.fetchFromBufferInternal(Exchange.java:2790)
 at com.persistit.Exchange.traverse(Exchange.java:2190)
 at com.persistit.Exchange.traverse(Exchange.java:1987)
 at com.persistit.Exchange.traverse(Exchange.java:1924)
 at com.persistit.Exchange.next(Exchange.java:2369)
 at com.persistit.stress.unit.Stress10.executeTest(Stress10.java:139)
 at com.persistit.stress.AbstractStressTest.run(AbstractStressTest.java:91)
 at java.lang.Thread.run(Thread.java:662)

Marking this as HIGH since it is rare but serious.
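
When triaging a run like the one above, it can help to reduce the log to the first time each exception type was reported, so that the ordering (RebalanceException first, then the InvalidPageAddressException and CorruptVolumeException failures) is easy to see. The following is a minimal sketch, assuming the "FAILED" line format shown in the log above. The class name StressLogSummary and its method are hypothetical helpers and are not part of Persistit or its stress suite.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical triage helper; not part of Persistit.
public class StressLogSummary {

    // Matches failure header lines such as:
    //   Stress10[Thread-626] at 76 seconds: FAILED [Thread-626]: com.persistit.exception.RebalanceException
    // Group 1 is the elapsed time in seconds, group 2 the exception class name.
    private static final Pattern FAILED = Pattern.compile(
            "^\\S+\\[Thread-\\d+\\] at (\\d+) seconds: FAILED \\[Thread-\\d+\\]: ([\\w.$]+)");

    /**
     * Maps each exception class name to the earliest second at which it was
     * reported. Iteration order is the order of first appearance in the log.
     * Stack-frame lines and other non-matching lines are ignored.
     */
    public static Map<String, Integer> firstFailureSeconds(List<String> lines) {
        Map<String, Integer> first = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = FAILED.matcher(line);
            if (m.find()) {
                int second = Integer.parseInt(m.group(1));
                first.merge(m.group(2), second, Math::min);
            }
        }
        return first;
    }

    public static void main(String[] args) {
        // A few header lines copied from the log above; stack frames are skipped by the parser.
        List<String> log = Arrays.asList(
                "Stress10[Thread-626] at 76 seconds: FAILED [Thread-626]: com.persistit.exception.RebalanceException",
                " at com.persistit.Buffer.join(Buffer.java:2563)",
                "Stress10[Thread-633] at 76 seconds: FAILED [Thread-633]: com.persistit.exception.InvalidPageAddressException: Page -1 out of bounds [0-148391]",
                "Stress10[Thread-625] at 78 seconds: FAILED [Thread-625]: com.persistit.exception.InvalidPageAddressException: Page -1 out of bounds [0-150186]");

        for (Map.Entry<String, Integer> e : firstFailureSeconds(log).entrySet()) {
            System.out.println(e.getValue() + "s  " + e.getKey());
        }
    }
}
```

The sketch only summarizes what the log reports; it does not establish that the RebalanceException caused the later failures, although the ordering is consistent with that reading.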

Peter Beaman (pbeaman)
Changed in akiban-persistit:
assignee: nobody → Peter Beaman (pbeaman)
Peter Beaman (pbeaman)
Changed in akiban-persistit:
status: Confirmed → Fix Committed
Peter Beaman (pbeaman)
Changed in akiban-persistit:
status: Fix Committed → Fix Released