karmic kernel configuration gotchas...

Bug #446480 reported by Daniel J Blueman
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned
Declined for Karmic by Leann Ogasawara

Bug Description

Since Karmic is now in beta and has converged on a near-final kernel and configuration, I (as a kernel-level developer) conducted a full kernel configuration audit on x86-64 and found some gotchas. This is in line with keeping Karmic the best Ubuntu release ever.

Since most of these options have been in production upstream for some time and are stable, we have a short window to act. Options with an accompanying comment are considered more important.

-> crucial config options which should be enabled (or a justification given for why they are disabled)
X86_MCE - allows processor to report corrupt state; no overhead
X86_PAT - allows Xorg to mark framebuffer pages write-combining. MTRRs are frequently incorrect, giving a heavy graphics penalty; also needed for high-performance PCI adapters
DMAR, INTR_REMAP - important on newer (large?) Intel platforms
FRAMEBUFFER_CONSOLE_DETECT_PRIMARY - prevents PCI bus scan ordering from picking the 'wrong' graphics head when multiple adapters are present; no impact otherwise
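As a rough aid for reproducing this audit, the Python sketch below (an illustration, not part of the original report) parses a kernel .config and lists which of the options above are neither built in (=y) nor modular (=m). The REQUIRED list and the function names are assumptions chosen for the example; feed it the text of a file such as /boot/config-2.6.31-12-generic.

```python
import re

# Options the report says should be enabled (=y or =m) on x86-64.
REQUIRED = ["X86_MCE", "X86_PAT", "DMAR", "INTR_REMAP",
            "FRAMEBUFFER_CONSOLE_DETECT_PRIMARY"]

_SET = re.compile(r"^CONFIG_(\w+)=(.*)$")
_UNSET = re.compile(r"^# CONFIG_(\w+) is not set$")


def parse_config(text):
    """Map option names (without the CONFIG_ prefix) to values; unset ones map to 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def missing_required(text, required=REQUIRED):
    """Return the required options that are unset or absent from the config text."""
    opts = parse_config(text)
    return [name for name in required if opts.get(name, "n") not in ("y", "m")]
```

Options that are absent from the file entirely are treated the same as "# CONFIG_... is not set", which matches how Kconfig treats them.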

-> development-environment config options, which should be disabled for production:
DEBUG_MEMORY_INIT - expensive
OPTIMIZE_INLINING - previous benchmarks show asm bitops being uninlined, causing performance degradation; disable until proven good?
UNUSED_SYMBOLS - prevent symbol table bloat
SCHED_DEBUG - adds a small run-time penalty, measurable at high context-switch rate
EARLY_PRINTK - only useful for developing on new hardware; may add a little setup time at boot
STRIP_ASM_SYMS - remove unneeded internal symbols
KPROBES - largely replaced by tracepoints
K8_NUMA - deprecated by ACPI NUMA detection
CPU_FREQ_STAT - needed by powertop?
ANDROID - only useful on android hardware
INOTIFY - deprecated by another kernel implementation that exports the same interface to userspace
LEGACY_PTYS - only useful for quite old userspace
SND_SUPPORT_OLD_API - only needed for really old alsa-lib versions
IEEE1394, IEEE1394_DV1394 - only used by old lib1394 userspace
AMD_IOMMU_STATS
PM_TEST_SUSPEND
PCI_LEGACY
PCMCIA_IOCTL
NF_CONNTRACK_PROC_COMPAT
NET_ACT_SIMP
IRDA_DEBUG
CFG80211_REG_DEBUG
MAC80211_DEBUGFS
MTD_MTDRAM
MTD_DOC2000
MTD_DOC2001
MTD_DOC2001PLUS
MTD_ONENAND_SIM
MTD_ONENAND_VERIFY_WRITE
PARPORT_PC_PCMCIA
PNP_DEBUG_MESSAGES
AIC7XXX_DEBUG_ENABLE
AIC79XX_DEBUG_ENABLE
SCSI_MVSAS_DEBUG
SCSI_LPFC_DEBUG_FS
SCSI_DEBUG
MD_FAULTY
FUSION_LOGGING
I2O_CONFIG_OLD_IOCTL
8139TOO_PIO
MAC80211_HWSIM
ATH9K_DEBUG
LIBIPW_DEBUG
B43LEGACY_DEBUG
ISDN_I4L
VIDEO_ALLOW_V4L1
SND_DUMMY
SOUND_PRIME
USB_GADGET_DUMMY_HCD
USB_ZERO
INFINIBAND_AMSO1100_DEBUG
INFINIBAND_IPOIB_DEBUG
THINKPAD_ACPI_DEBUGFACILITIES
JFS_STATISTICS
OCFS2_FS_STATS
QFMT_V1
AUTOFS_FS
JFFS2_COMPRESSION_OPTIONS
SECURITY_SELINUX
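A similar hypothetical check can flag development options that are still enabled in a given config. The DISABLE list below is a hand-picked subset of the options above, and should_disable() is an illustrative name, not an existing tool.

```python
import re

# A subset of the development/debug options listed above (extend as needed).
DISABLE = ["DEBUG_MEMORY_INIT", "OPTIMIZE_INLINING", "UNUSED_SYMBOLS",
           "SCHED_DEBUG", "KPROBES", "SCSI_DEBUG", "MAC80211_HWSIM", "SND_DUMMY"]

# Only built-in (=y) and modular (=m) settings count as "enabled".
_ENABLED = re.compile(r"^CONFIG_(\w+)=([ym])$")


def enabled_options(text):
    """Return {name: 'y' or 'm'} for every option that is built in or modular."""
    result = {}
    for line in text.splitlines():
        m = _ENABLED.match(line.strip())
        if m:
            result[m.group(1)] = m.group(2)
    return result


def should_disable(text, candidates=DISABLE):
    """List (name, value) pairs, in candidate order, for options still enabled."""
    enabled = enabled_options(text)
    return [(name, enabled[name]) for name in candidates if name in enabled]
```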

-> config options beneficial for power-saving:
PCIEASPM - potential compatibility issues - perhaps leave for future release
CPU_FREQ_DEFAULT_GOV_ONDEMAND - presently on performance governor, for bootup speed, right?
SND_AC97_POWER_SAVE_DEFAULT=1, SND_HDA_POWER_SAVE_DEFAULT=1 - were these evaluated before?
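The power-saving settings above can also be inspected on a running system. This sketch assumes the usual sysfs locations for the cpufreq governor (devices/system/cpu/cpuN/cpufreq/scaling_governor) and for the snd_hda_intel power_save module parameter; the sysfs_root argument is there only so the functions can be pointed at a test directory.

```python
import os


def read_governors(sysfs_root="/sys"):
    """Map each CPU directory (cpu0, cpu1, ...) to its current cpufreq governor."""
    cpu_dir = os.path.join(sysfs_root, "devices", "system", "cpu")
    governors = {}
    for entry in sorted(os.listdir(cpu_dir)):
        # Skip non-CPU entries such as 'cpufreq' or 'cpuidle'.
        if not (entry.startswith("cpu") and entry[3:].isdigit()):
            continue
        path = os.path.join(cpu_dir, entry, "cpufreq", "scaling_governor")
        if os.path.exists(path):
            with open(path) as f:
                governors[entry] = f.read().strip()
    return governors


def hda_power_save(sysfs_root="/sys"):
    """Return the snd_hda_intel power_save timeout in seconds, or None if not present."""
    path = os.path.join(sysfs_root, "module", "snd_hda_intel", "parameters", "power_save")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return int(f.read().strip())
```

A power_save value of 0 means codec power saving is off; the config defaults discussed above only set the initial value of this parameter.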

-> config options which should be enabled for full expected functionality:
CORE_DUMP_DEFAULT_ELF_HEADERS - we have new enough GDB for this
AIC7XXX_CMDS_PER_DEVICE=32 - help text says default is 32, so why 8?
SCSI_SYM53C8XX_DEFAULT_TAGS=32 - help text says default is 32, so why 8?
NFS_FSCACHE, AFS_FSCACHE - very useful over low-performance links
ASYNC_TX_DMA - used on newer platforms
HEADERS_CHECK - ensure incorrect definitions don't leak into userspace headers
BLK_DEV_BSG - important feature for some newer (eg SAS) SCSI controllers
NETFILTER_XT_TARGET_TCPOPTSTRIP
NET_DROP_MONITOR
CAN_CALC_BITTIMING
MTD_ONENAND_OTP
PARPORT_PC_SUPERIO
PARIDE_EPATC8
PATA_HPT3X3_DMA
DM_LOG_USERSPACE
ENC28J60
TULIP_MWI
TULIP_MMIO
TULIP_NAPI
FORCEDETH_NAPI
R6040
DL2K
IWLWIFI_SPECTRUM_MEASUREMENT
B43LEGACY_DMA_MODE
PC300TOO
SBNI_MULTILINE
DEFXX_MMIO
ROADRUNNER_LARGE_RINGS
NETPOLL_TRAP
MOUSE_PS2_TOUCHKIT
INPUT_APANEL
ISI
LP_CONSOLE
HP_WATCHDOG
SSB_PCMCIAHOST
REGULATOR_FIXED_VOLTAGE
USB_GSPCA_SN9C20X_EVDEV
LEDS_CLEVO_MAIL
ACCESSIBILITY
A11Y_BRAILLE_CONSOLE
INFINIBAND_NES
OTUS
STLC45XX
VT6655 (?)
GFS2_FS_LOCKING_DLM
JFFS2_SUMMARY
JFFS2_FS_XATTR
IMA
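For the two numeric options near the top of this list, a hypothetical check can compare configured values against the help-text defaults the report cites (32 for both AIC7XXX_CMDS_PER_DEVICE and SCSI_SYM53C8XX_DEFAULT_TAGS). The EXPECTED table and the function name are assumptions for illustration.

```python
import re

# Help-text defaults cited in the report.
EXPECTED = {"AIC7XXX_CMDS_PER_DEVICE": "32", "SCSI_SYM53C8XX_DEFAULT_TAGS": "32"}

_SET = re.compile(r"^CONFIG_(\w+)=(.*)$")


def value_mismatches(text, expected=EXPECTED):
    """Return {name: (actual, expected)} for options whose value differs.

    Options missing from the config are reported with actual=None."""
    actual = {}
    for line in text.splitlines():
        m = _SET.match(line.strip())
        if m:
            # String options are quoted in .config; numeric ones are not.
            actual[m.group(1)] = m.group(2).strip('"')
    return {name: (actual.get(name), want)
            for name, want in expected.items() if actual.get(name) != want}
```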

Daniel J Blueman (danielblueman) wrote :

correction:
DEBUG_MEMORY_INIT _isn't_ expensive, but it only produces output at debug log level, so it is of limited use to end users

By far, the most pressing items here are:
X86_PAT
X86_MCE

 - both of these are enabled in the upstream defaults and in Fedora 12 Alpha and Rawhide, for good reason

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Daniel J Blueman (danielblueman) wrote :

X86_MCE (Machine Check Exception) use cases/justification:

- David's laptop is running slowly with no indication why
 -> before moving his workflow over to Microsoft Windows, he finds a tip on a forum and, spending considerable time, rebuilds the kernel with CONFIG_X86_MCE
 -> after booting into his built kernel, he sees MCE reports in 'dmesg' saying that the processor is raising thermal trip (PROCHOT#) events and is being throttled
 -> he cleans out a load of fluff from the heatsink air-flow path and performance is restored

- Jane is frustrated that processes on her desktop occasionally get hit by SIGSEGV, SIGILL and SIGBUS, so she files a Launchpad report, but nothing looks suspicious and engineers at Canonical cannot reproduce the problem
 -> she switches to Windows and experiences the same issues
 -> after checking the Windows Event Log, she learns that the processor is raising Machine Check Exceptions for timed-out memory reads. She replaces the memory in her desktop, which resolves the problem, and stays with Windows
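Once a kernel is built with X86_MCE, problems like David's and Jane's would show up in the kernel log. The sketch below filters log text (for example the output of dmesg) for machine-check and thermal-throttling messages; the regular expressions are assumptions, since the exact message wording differs between kernel versions.

```python
import re

# Hypothetical patterns: adjust them to the messages your kernel version prints.
PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"machine check",
    r"temperature above threshold",
    r"throttled",
)]


def suspicious_lines(log_text, patterns=PATTERNS):
    """Return kernel-log lines matching any machine-check or thermal pattern."""
    return [line for line in log_text.splitlines()
            if any(p.search(line) for p in patterns)]
```

On a live system the input could come from subprocess.run(["dmesg"], capture_output=True, text=True).stdout.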

Daniel J Blueman (danielblueman) wrote :

X86_PAT (Page Attribute Table support) use cases/justification:

- Tania uses her workstation for graphic design and wants optimal performance
 -> she deploys a discrete graphics card, but is disappointed by the performance

- James has a high-end GPGPU that he uses for mathematical modelling
 -> he gets full performance on Fedora, but has to rebuild the Karmic kernel with X86_PAT to get the same performance, losing a lot of time investigating

Benchmarks:

- using Karmic 9.10 beta as of 2009-10-09 with the accelerated radeon driver on an ATI Radeon HD 3470 (R600):
$ uname -r
2.6.31-12-generic
$ grep X86_PAT /boot/config-2.6.31-12-generic
# CONFIG_X86_PAT is not set
$ dmesg
[ 7.005248] mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
[ 7.005282] mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
[ 7.198527] mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
[ 7.198562] mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
[ 7.198587] mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
$ cat /proc/mtrr
reg00: base=0x0d0000000 ( 3328MB), size= 256MB, count=1: uncachable
reg01: base=0x0e0000000 ( 3584MB), size= 512MB, count=1: uncachable
reg02: base=0x000000000 ( 0MB), size= 4096MB, count=1: write-back
reg03: base=0x100000000 ( 4096MB), size= 512MB, count=1: write-back
reg04: base=0x120000000 ( 4608MB), size= 256MB, count=1: write-back
$ x11perf -shmputxy500
x11perf - X11 performance program, version 1.2
The X.Org Foundation server version 10603000 on :0.0
from veyron
Fri Oct 9 11:09:46 2009

Sync time adjustment is 0.0530 msecs.

   2400 reps @ 2.6098 msec ( 383.0/sec): ShmPutImage XY 500x500 square
   2400 reps @ 2.6925 msec ( 371.0/sec): ShmPutImage XY 500x500 square
   2400 reps @ 2.6896 msec ( 372.0/sec): ShmPutImage XY 500x500 square
   2400 reps @ 2.6905 msec ( 372.0/sec): ShmPutImage XY 500x500 square
   2400 reps @ 2.6905 msec ( 372.0/sec): ShmPutImage XY 500x500 square
  12000 trep @ 2.6746 msec ( 374.0/sec): ShmPutImage XY 500x500 square

-> results: the BIOS has not specified an MTRR covering the prefetchable PCI BAR for the graphics card. Xorg tries to modify the MTRRs (see dmesg) but fails. 372 reps/s.
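The MTRR table above can be checked against the region Xorg tried to mark write-combining (base 0xd0000000, size 0x10000000, taken from the dmesg lines). This is a minimal sketch that assumes the /proc/mtrr line format shown above, with sizes printed in KB or MB. It only lists overlapping entries; per Intel's documented precedence rules, where an uncachable range overlaps a write-back one, the overlap is treated as uncachable.

```python
import re

_MTRR = re.compile(
    r"reg(\d+):\s*base=0x([0-9a-fA-F]+)\s*\(\s*\d+MB\),\s*"
    r"size=\s*(\d+)([KM])B,\s*count=(\d+):\s*(.+)")


def parse_mtrr(text):
    """Parse /proc/mtrr lines into (reg, base, size_in_bytes, type) tuples."""
    entries = []
    for line in text.splitlines():
        m = _MTRR.search(line)
        if m:
            unit = 1 << 10 if m.group(4) == "K" else 1 << 20
            entries.append((int(m.group(1)), int(m.group(2), 16),
                            int(m.group(3)) * unit, m.group(6).strip()))
    return entries


def covering(entries, base, size):
    """Return the MTRR entries overlapping the physical range [base, base + size)."""
    end = base + size
    return [e for e in entries if e[1] < end and base < e[1] + e[2]]
```

Run against the table above with covering(parse_mtrr(text), 0xd0000000, 0x10000000), it reports reg00 (uncachable) and reg02 (write-back), and no write-combining entry.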
-> built kernel using same upstream sources with X86_PAT enabled

$ uname -r
2.6.31.3-295c
$ grep X86_PAT /boot/config-2.6.31.3-295c
CONFIG_X86_PAT=y
$ x11perf -shmputxy500
x11perf - X11 performance program, version 1.2
The X.Org Foundation server version 10603000 on :0.0
from veyron
Fri Oct 9 11:42:19 2009

Sync time adjustment is 0.0390 msecs.

  16000 reps @ 0.3667 msec ( 2730.0/sec): ShmPutImage XY 500x500 square
  16000 reps @ 0.3629 msec ( 2760.0/sec): ShmPutImage XY 500x500 square
  16000 reps @ 0.3622 msec ( 2760.0/sec): ShmPutImage XY 500x500 square
  16000 reps @ 0.3623 msec ( 2760.0/sec): ShmPutImage XY 500x500 square
  16000 reps @ 0.3622 msec ( 2760.0/sec): ShmPutImage XY 500x500 square
  80000 trep @ 0.3633 msec ( 2750.0/sec): ShmPutImage XY 500x500 square

-> results: PAT allowed write-combining to coalesce data-wri...
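To compare runs like the two above without reading numbers by hand, a small parser can pull the summary ("trep") line out of x11perf output. The regular expression is an assumption based only on the output format shown in this report.

```python
import re

# Matches result lines such as
# "  12000 trep @ 2.6746 msec ( 374.0/sec): ShmPutImage XY 500x500 square"
_RESULT = re.compile(
    r"^\s*(\d+)\s+(reps|trep)\s+@\s+([\d.]+)\s+msec\s+\(\s*([\d.]+)/sec\):\s*(.+)$")


def summary_rate(output):
    """Return (rate_per_sec, test_name) from the 'trep' summary line, or None."""
    for line in output.splitlines():
        m = _RESULT.match(line)
        if m and m.group(2) == "trep":
            return float(m.group(4)), m.group(5).strip()
    return None
```

Applied to the two summary lines above, the rates are 374.0/sec and 2750.0/sec, a ratio of roughly 7.4x for ShmPutImage.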


Daniel J Blueman (danielblueman) wrote :

I forgot to add: while it's possible to disable PAT support by booting with 'nopat', it's not possible to enable it when it is compiled out. If pathological cases are found, booting with 'nopat' is an inexpensive workaround (compared with teaching users how to reconfigure and rebuild kernels).
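The difference between "compiled out" and "disabled at boot" can be checked with a sketch like the one below, which takes the text of the kernel config and of /proc/cmdline. It only looks at the build option and the nopat flag; whether the CPU supports PAT, or whether the kernel turned it off for some other reason, is not checked. The function name is hypothetical.

```python
def pat_status(config_text, cmdline):
    """Classify PAT from the kernel config text and the boot command line.

    'compiled out' - CONFIG_X86_PAT is not set; no boot option can enable it
    'disabled'     - built in, but the kernel was booted with 'nopat'
    'enabled'      - built in and not disabled on the command line
    """
    built_in = any(line.strip() == "CONFIG_X86_PAT=y"
                   for line in config_text.splitlines())
    if not built_in:
        return "compiled out"
    if "nopat" in cmdline.split():
        return "disabled"
    return "enabled"
```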

Daniel J Blueman (danielblueman) wrote :

On a different platform (Thinkpad T400 w/ Radeon HD 3470) with the radeon stack, we see:

2.6.31-12-generic w/o CONFIG_X86_PAT

$ x11perf -movetree
 800000 reps @ 0.0094 msec (107000.0/sec): Move window via parent (4 kids)
$ x11perf -scroll500
   5000 reps @ 1.2153 msec ( 823.0/sec): Scroll 500x500 pixels

2.6.31.3-295c w/ CONFIG_X86_PAT

$ x11perf -movetree
1200000 reps @ 0.0043 msec (235000.0/sec): Move window via parent (4 kids)
$ x11perf -scroll500
  50000 reps @ 0.1209 msec ( 8270.0/sec): Scroll 500x500 pixels

-> that's a 2.2x and 10x speed increase respectively. Power is saved, as PCIe bus and processor utilisation are lower; scrolling in applications is smoother
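The speed-ups quoted above follow directly from the x11perf rates; this snippet reproduces the arithmetic with the numbers copied from the two runs.

```python
# Operations per second copied from the x11perf runs above.
RESULTS = {
    "movetree": {"without_pat": 107000.0, "with_pat": 235000.0},
    "scroll500": {"without_pat": 823.0, "with_pat": 8270.0},
}


def speedups(results):
    """Return the with-PAT / without-PAT rate ratio for each test."""
    return {test: r["with_pat"] / r["without_pat"] for test, r in results.items()}


for test, ratio in speedups(RESULTS).items():
    print(f"{test}: {ratio:.1f}x")
```

It prints "movetree: 2.2x" and "scroll500: 10.0x".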

Roland Dreier (roland.dreier) wrote :

I do agree that X86_PAT and X86_MCE should be set.

However on the more minor options (as upstream InfiniBand maintainer), I think INFINIBAND_AMSO1100_DEBUG and INFINIBAND_IPOIB_DEBUG should continue to be set. These options enable runtime-controllable debug output, and it is very useful to be able to tell end users to set a module option to enable debug reporting when trying to debug something. By default they don't produce any output and don't have any performance impact beyond increasing the module size by a trivial amount.
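As an illustration of the runtime control described above, the sketch below writes a module parameter through sysfs, the equivalent of echo 1 > /sys/module/ib_ipoib/parameters/debug_level. It assumes ib_ipoib exposes a writable debug_level parameter when built with INFINIBAND_IPOIB_DEBUG (modinfo ib_ipoib lists the parameters a given build actually has), and on a real system it needs root.

```python
import os


def set_module_param(module, param, value, sysfs_root="/sys"):
    """Write value to <sysfs_root>/module/<module>/parameters/<param>; return the path.

    On real sysfs the attribute must already exist and be writable;
    sysfs_root can be pointed elsewhere for testing."""
    path = os.path.join(sysfs_root, "module", module, "parameters", param)
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return path
```

For example, set_module_param("ib_ipoib", "debug_level", 1) turns the debug output on, and writing 0 turns it off again, without reloading the module.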

Roland Dreier (roland.dreier) wrote :

Oh and yes, INFINIBAND_NES should definitely be 'm' -- it is needed to use Intel NetEffect 10-gigabit ethernet adapters. (Even just as straight NICs)

nick (swcodfather) wrote :

Can you confirm that the nopat kernel command-line option was enabled for the 9.10 release?

tags: added: kconfig
Leann Ogasawara (leannogasawara) wrote :

Hi Daniel,

Thanks for the config review. The two config options you'd specifically pointed out in comment #1 are currently enabled in the actively developed Maverick kernel:

ogasawara@emiko:~/ubuntu-maverick/debian.master/config$ grep -rn "X86_MCE=" *
config.common.ubuntu:4686:CONFIG_X86_MCE=y

ogasawara@emiko:~/ubuntu-maverick/debian.master/config$ grep -rn "X86_PAT" *
config.common.ubuntu:4698:CONFIG_X86_PAT=y
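For anyone repeating this check without a shell, the sketch below is a rough Python equivalent of grep -rn run inside debian.master/config. The function name is hypothetical, and it does a plain substring match rather than grep's regular-expression match.

```python
import os


def grep_configs(config_dir, needle):
    """Yield (relative_path, line_number, line) for each line containing needle,
    roughly like running `grep -rn needle *` inside config_dir."""
    for dirpath, dirnames, filenames in os.walk(config_dir):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if needle in line:
                        yield os.path.relpath(path, config_dir), lineno, line.rstrip("\n")
```

For example, list(grep_configs("debian.master/config", "X86_PAT")) returns one tuple per matching line, mirroring the file:line:text output above.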

Seeing as this was a config evaluation targeted for Karmic, and we're now actively focusing on Maverick, it seems this review might actually be a bit dated now. Additionally, we typically require a justification for each config change, much like you did for justifying the changes for X86_MCE and X86_PAT in comments #2 and #3. If any changes are still critically important to you, could you re-evaluate for Maverick, provide a proper justification, and file them as separate bugs? I suspect this original request for config changes will not qualify for SRU for Karmic and Lucid. Note that we prefer to make config changes early in the dev cycle (i.e. right now) to get as much testing as possible and avoid potential for regressions later on. I'm also going to remove this from the Maverick config blueprint for the time being. Thanks.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Daniel J Blueman (danielblueman) wrote :

Hi Leann - thanks for the input. I have conducted a review of Maverick's 2.6.35 kernel (based on 2.6.34) configuration options at:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/589439

Raising these as separate LP bugs would be impractical for both sides, so a single report is better than no bugs at all ;-) .

Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired