Format string overflow in Monitor.c:check_array

Bug #946758 reported by Steve McInerney
26
This bug affects 4 people
Affects Status Importance Assigned to Milestone
mdadm (Ubuntu)
Fix Released
Medium
Unassigned
Precise
Fix Released
High
Dimitri John Ledkov

Bug Description

SRU Justification

[Impact]

If mdadm --monitor is being used to monitor RAID (very common), then if a RAID reconstruction completes but with mismatches detected by the kernel, and the number of mismatches is more than 99, then mdadm crashes due to a buffer overflow. This will cause the loss of RAID monitoring, possibly without the administrator noticing. This could cause loss of data if a future RAID failure is not detected because monitoring has failed.

[Test Case]

0. Check that mdadm --monitor is running (it should be already on a md-based RAID system by default).
1. Arrange for RAID reconstruction to complete but with a large number of mismatches (difficult!).
2. Check if mdadm is still running. It should be, but this bug causes it to crash.

[Regression Potential]

The fix is taken from upstream and is trivial. The code change is solely in the monitoring code that runs when reconstruction is complete. If there is a regression, it is most likely to be in another similar C memory mismanagement bug that was already present in the monitoring code.

Original message:

possibly dupe of ​ #946344
on the off chance it's a new, created accordingly.

ProblemType: Crash
DistroRelease: Ubuntu 12.04
Package: mdadm 3.2.3-2ubuntu1
ProcVersionSignature: Ubuntu 3.2.0-17.27-generic-pae 3.2.6
Uname: Linux 3.2.0-17-generic-pae i686
NonfreeKernelModules: nvidia
ApportVersion: 1.94-0ubuntu1
Architecture: i386
Date: Sun Mar 4 01:58:16 2012
ExecutablePath: /sbin/mdadm
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Alpha i386 (20120201.2)
MDadmExamine.dev.sda:
 /dev/sda:
    MBR Magic : aa55
 Partition[0] : 54687744 sectors at 2048 (type fd)
 Partition[1] : 433587772 sectors at 54691838 (type 05)
MDadmExamine.dev.sda2:
 /dev/sda2:
    MBR Magic : aa55
 Partition[0] : 431634357 sectors at 1953415 (type fd)
 Partition[1] : 1951745 sectors at 1 (type 05)
MDadmExamine.dev.sdb:
 /dev/sdb:
    MBR Magic : aa55
 Partition[0] : 54687744 sectors at 2048 (type fd)
 Partition[1] : 433587772 sectors at 54691838 (type 05)
MDadmExamine.dev.sdb2:
 /dev/sdb2:
    MBR Magic : aa55
 Partition[0] : 431634357 sectors at 1953415 (type fd)
 Partition[1] : 1951745 sectors at 1 (type 05)
MDadmExamine.dev.sdc: Error: command ['/sbin/mdadm', '-E', '/dev/sdc'] failed with exit code 1: mdadm: cannot open /dev/sdc: No medium found
MDadmExamine.dev.sdd: Error: command ['/sbin/mdadm', '-E', '/dev/sdd'] failed with exit code 1: mdadm: cannot open /dev/sdd: No medium found
MDadmExamine.dev.sde: Error: command ['/sbin/mdadm', '-E', '/dev/sde'] failed with exit code 1: mdadm: cannot open /dev/sde: No medium found
MDadmExamine.dev.sdf: Error: command ['/sbin/mdadm', '-E', '/dev/sdf'] failed with exit code 1: mdadm: cannot open /dev/sdf: No medium found
MachineType: Dell Inc. Inspiron 530
ProcCmdline: /sbin/mdadm --monitor --pid-file /var/run/mdadm/monitor.pid --daemonise --scan --syslog
ProcEnviron:
 TERM=linux
 PATH=(custom, no user)
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-17-generic-pae root=UUID=4de18d92-4134-4795-943f-3cf94658f0d1 ro quiet splash vt.handoff=7
Signal: 6
SourcePackage: mdadm
StacktraceTop:
 raise () from /lib/i386-linux-gnu/libc.so.6
 abort () from /lib/i386-linux-gnu/libc.so.6
 ?? () from /lib/i386-linux-gnu/libc.so.6
 __fortify_fail () from /lib/i386-linux-gnu/libc.so.6
 __chk_fail () from /lib/i386-linux-gnu/libc.so.6
Title: mdadm crashed with SIGABRT in raise()
UpgradeStatus: Upgraded to precise on 2012-02-09 (24 days ago)
UserGroups:

dmi.bios.date: 03/20/2008
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.0.13
dmi.board.name: 0FM586
dmi.board.vendor: Dell Inc.
dmi.board.version: ���
dmi.chassis.type: 3
dmi.chassis.vendor: Dell Inc.
dmi.chassis.version: OEM
dmi.modalias: dmi:bvnDellInc.:bvr1.0.13:bd03/20/2008:svnDellInc.:pnInspiron530:pvr:rvnDellInc.:rn0FM586:rvr:cvnDellInc.:ct3:cvrOEM:
dmi.product.name: Inspiron 530
dmi.sys.vendor: Dell Inc.
etc.blkid.tab: Error: [Errno 2] No such file or directory: '/etc/blkid.tab'

Revision history for this message
Steve McInerney (spm) wrote :
Revision history for this message
Apport retracing service (apport) wrote :

StacktraceTop:
 __libc_message (do_abort=2, fmt=0xb76aedde "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:201
 __GI___fortify_fail (msg=0xb76aed5f "buffer overflow detected") at fortify_fail.c:32
 __GI___chk_fail () at chk_fail.c:29
 _IO_str_chk_overflow (fp=0xbfa3b200, c=49) at vsprintf_chk.c:35
 _IO_default_xsputn (f=0xbfa3b200, data=0xbfa3b143, n=1) at genops.c:485

Revision history for this message
Apport retracing service (apport) wrote : Stacktrace.txt
Revision history for this message
Apport retracing service (apport) wrote : ThreadStacktrace.txt
Changed in mdadm (Ubuntu):
importance: Undecided → Medium
summary: - mdadm crashed with SIGABRT in raise()
+ mdadm crashed with SIGABRT in __libc_message()
tags: removed: need-i386-retrace
Revision history for this message
Launchpad Janitor (janitor) wrote : Re: mdadm crashed with SIGABRT in __libc_message()

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mdadm (Ubuntu):
status: New → Confirmed
visibility: private → public
Robie Basak (racb)
summary: - mdadm crashed with SIGABRT in __libc_message()
+ Format string overflow in Monitor.c:check_array
Revision history for this message
Robie Basak (racb) wrote :

From the stack trace the culprit is:

                        char cnt[40];
                        sprintf(cnt, " mismatches found: %d (on raid level %d)",
                                sra->mismatch_cnt, array.level);
                        alert("RebuildFinished", dev, cnt, ainfo);

If mismatch_cnt > 99, then the buffer will overflow. In the crash report, it looks like the submitter had 1536 in mismatch_cnt.

It looks like this has already been fixed in Quantal, which now has:

                       char cnt[80];
                        snprintf(cnt, sizeof(cnt),
                                 " mismatches found: %d (on raid level %d)",
                                sra->mismatch_cnt, array.level);
                        alert("RebuildFinished", dev, cnt, ainfo);

Lucid, Natty and Oneiric use a shorter format string, so I don't think this bug exists there. So Precise is the only version affected.

Marking this as Fix Released as it is fixed in Quantal, and nominating Precise.

Changed in mdadm (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Robie Basak (racb) wrote :

Marking as Importance: High. The chances of this happening is fairly low, but if it does then mdadm will crash and administrators will lose RAID monitoring, quite possibly without noticing.

Changed in mdadm (Ubuntu Precise):
status: New → Triaged
importance: Undecided → High
Robie Basak (racb)
description: updated
Revision history for this message
Robie Basak (racb) wrote :

Debdiff attached, which backports the upstream fix. Note that the return value of snprintf isn't being checked, which ideally it should be to code this defensively. But that's what upstream have done, and with 32-bit integers an 80-byte buffer will always be big enough in this case, so I think it is acceptable for Precise.

I have test built this, but have not done any further testing as I don't have suitable hardware available. This is one of those cases where the fix is trivial yet testing is very awkward.

Revision history for this message
Tim Frost (timfrost) wrote :

80 bytes may not be enough on a server running in 64-bit mode with a large disk/array, given that the format string is 41 bytes lonmg - including 2 '%d' variables . How many digits could there be in the longest possible number of mis-matches on a system that has a raid partition of maximum supported size?

Revision history for this message
Robie Basak (racb) wrote :

I understand that it's not perfect, but this is the fix that is upstream and in Quantal. If we wanted to fix the problem differently from upstream, we'd need to also do it in Quantal and carry a delta as a minimum, and ideally forward the patch to upstream as well.

The format string is 39 bytes long, less two '%d's so 35 (excluding the null terminator). It does not appear to be i8n-ised. How did you get 41?

"%d" expects a plain int, which is still 32 bits on amd64 on Precise. And mismatch_cnt is a plain int, so 32 bits too. Or is there an option somewhere that is making ints bigger? And the second %d is just the raid level, which I don't expect to be bigger than two decimal digits anyway.

Even if the %d did expand a 64-bit number, then it'd expand to at most 20 bytes (including the sign). Assuming that the raid level is at most two bytes, the string would be a maximum of 58 bytes including the null terminator, which is less than the 80 now allocated.

Even if both %ds expanded full 64-bit integers (just in case the user happens to be using a RAID level of -9223372036854775808), that'd be 35+40+1=76 which is still under 80.

I'm just following the path of least resistance by doing what upstream has done. We can go beyond that if necessary, by patching Quantal and ideally upstream as well. I don't it is necessary here, but I'd appreciate input from a sponsor on this. If you care, then a new upstream bug and patch would be appropriate.

Revision history for this message
Sebastien Bacher (seb128) wrote :

Dmitrij plans to do a mdadm SRU soon and says that this fix will be included in the upload, unsubscribing sponsors and assigning to him

Changed in mdadm (Ubuntu Precise):
assignee: nobody → Dmitrijs Ledkovs (dmitrij.ledkov)
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Hello Steve, or anyone else affected,

Accepted mdadm into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/mdadm/3.2.5-1ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in mdadm (Ubuntu Precise):
status: Triaged → Fix Committed
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regresssions.

Changed in mdadm (Ubuntu Precise):
status: Fix Committed → Fix Released
To post a comment you must log in.