RPi2: connection not reliable

Bug #1489412 reported by Iñigo M
This bug affects 2 people
Affects   Status    Importance   Assigned to   Milestone
Snappy    Expired   Undecided    Unassigned

Bug Description

I'm trying to execute a piece of software that works fine in Raspbian and even in BBB Snappy.

Some info:
- This software sends data through several interfaces: Ethernet, WiFi dongles, serial ports.
- This software requires GPIOs and the SPI and I2C buses.
- It has threads with different priorities (the priorities are OK).

I have run it over UDP (Ethernet) and the serial port ttyAMA0, but it is not as responsive as on the Raspbian or BBB Snappy images.
Could this be related to delays in different resources (SPI and I2C buses, GPIO, ...)?

Iñigo M (inigo-muguruza89) wrote :
Oliver Grawert (ogra) wrote :

while debugging on IRC it showed that adding "smsc95xx.turbo_mode=N" to /boot/uboot/cmdline.txt (or respectively to /system-boot/cmdline.txt when editing the SD card on a PC) improves the situation but does not completely fix the issue ... we should consider adding this option to the 15.04.3 image by default.
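
For anyone who wants to try this on an existing image, here is a minimal sketch of a helper that appends the option to a one-line cmdline.txt only if it is not already there. The function name add_kernel_param is made up for illustration; the file path and the option are the ones mentioned above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Snappy tooling): append a kernel
# parameter to a one-line cmdline file unless it is already present.
# Arguments: <cmdline file> <parameter>
add_kernel_param() {
    file=$1
    param=$2
    # The kernel command line lives on the first (and only) line
    line=$(head -n 1 "$file")
    # Pad with spaces so only a whole, space-separated parameter matches
    case " $line " in
        *" $param "*) return 0 ;;
    esac
    if [ -n "$line" ]; then
        new="$line $param"
    else
        new=$param
    fi
    # Write to a temporary file first, then replace the original
    printf '%s\n' "$new" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Run as root it would be called as: add_kernel_param /boot/uboot/cmdline.txt smsc95xx.turbo_mode=N, followed by a reboot.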

Ricardo Salveti (rsalveti) wrote :

turbo_mode was indeed something that always caused us issues, so it should be safe to just add it again for this image.

Can we check the kernel config differences between both kernels used?

Victor Mayoral (vmayoral) wrote :

Thanks ogra and rsalveti for looking into this.

Our config is available here https://gist.github.com/vmayoral/79bdde0e79dc836fcbbb. It's pretty much the same as the one that ppisati set up but with CONFIG_STRICT_DEVMEM not set (we needed this to run the autopilots since we access /dev/mem directly).

Victor Mayoral (vmayoral) wrote :

Below are the Snappy config and the Debian configs (the Debian ones allow us to execute this binary perfectly):

- https://gist.github.com/vmayoral/79bdde0e79dc836fcbbb (Snappy config, communications in the binary are really slow)
- https://gist.github.com/vmayoral/caef13f92b813b165321 (Debian config as provided by Raspberrypi.org, ok)
- https://gist.github.com/vmayoral/461f977245eb4c7231f6 (Debian with PREEMPT_RT, ok)

Paolo Pisati (p-pisati) wrote :

There are a lot of variables here: can we reduce the communication channels? Like, can we test the communication on serial only? Then move to ethernet only, etc.

Moreover, which kernel versions does Debian use? 3.18.x?
We could create an rpi2 kernel with the debian config and see if that improves the situation - if it does, it's a config difference between the two, if it doesn't, we continue digging.

Another simple test that you can try: have you tried booting a snappy rpi2 image with that debian kernel? Does it perform as expected?

Victor Mayoral (vmayoral) wrote :

As suggested by ppisati I've tried isolating this issue:

- Keeping the tests to serial
- Kernel version: Debian uses 3.18.x while Snappy 3.19.x.
- Boot snappy with Debian kernel.

Replaced the kernel on the Snappy image with Debian's:

>wget erlerobotics.com/files/boot-debian.tar.gz
>tar -zxf boot-debian.tar.gz
>sudo cp ~/boot-debian/kernel7.rt.img /boot/uboot/a/vmlinuz
>reboot

U-Boot 2015.07-dirty (Aug 06 2015 - 16:55:15 +0200)

DRAM: 944 MiB
WARNING: Caches not enabled
RPI 2 Model B
MMC: bcm2835_sdhci: 0
reading uboot.env
In: serial
Out: lcd
Err: lcd
Net: Net Initialization Skipped
No ethernet found.
Hit any key to stop autoboot: 0
switch to partitions #0, OK
mmc0 is current device ...


Paolo Pisati (p-pisati) wrote :

sudo cp ~/boot-debian/kernel7.rt.img /boot/uboot/a/vmlinuz
...
reading a/vmlinuz
8546784 bytes read in 5379 ms (1.5 MiB/s)
reading a/initrd.img
5842717 bytes read in 3675 ms (1.5 MiB/s)
Bad Linux ARM zImage magic!

the kernel7.img is a file modified using mkknlimg to boot straight into it (bcm bootloader -> kernel instead of bcm -> uboot -> kernel) and thus a trailer was appended, so you have two options:

1) either you purge the trailer appended by mkknlimg (https://github.com/raspberrypi/tools/blob/master/mkimage/mkknlimg) - tedious IMO

or

2) you change how snappy boots: modify config.txt and replace "kernel=uboot.bin" with "kernel=kernel7.img", that should be enough (if not, modify the rest of the environment until you get to userspace)
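
The "Bad Linux ARM zImage magic!" message is U-Boot rejecting the file header. Here is a rough sketch of how a kernel file could be checked before copying it over vmlinuz. It assumes the image should carry the standard ARM zImage magic 0x016f2818 at byte offset 0x24 (little-endian); is_arm_zimage is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file carries the ARM zImage magic
# (0x016f2818, stored little-endian at byte offset 0x24 = 36).
is_arm_zimage() {
    # Dump 4 bytes starting at offset 36 as hex, then strip spaces and newlines
    magic=$(od -A n -t x1 -j 36 -N 4 "$1" 2>/dev/null | tr -d ' \n')
    [ "$magic" = "18286f01" ]
}
```

For example, is_arm_zimage ~/boot-debian/kernel7.rt.img && echo "looks like a zImage" would only print the message when the header matches.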

Question: you mentioned you turned on the RT option on that kernel, that means you compiled it, and if that's true, why don't you simply copy the vmlinuz from the compilation tree? That's another option.

Victor Mayoral (vmayoral) wrote :

Hi everyone,

I've been doing quite a bit of testing on this matter. Following Paolo's suggestion 2), I didn't manage to get it working. I enabled the serial console and waited for a new network interface, but nothing showed up; it seems that the board hangs at:
"Uncompressing Linux... done, booting the kernel."

Regarding your question, yes we are using this kernel (https://github.com/erlerobot/rpi2-kernel/blob/rpi-3.18.9-rt5/README.MD) with the PREEMPT_RT patches applied. Just compiled the kernel and copied the zImage to the boot partition and after a few changes on the kernel configuration i got to userspace. Here's the config that i'm using https://gist.github.com/vmayoral/881d3d63f1f2a08620ef.

At this point I could start comparing the two file systems (Debian and Snappy). When launching the binary described above (which corresponds with the APM autopilot that accesses several peripherals and also /dev/mem):
- Both images contain the same kernel: 3.18.9-rt5-v7+ #4 SMP PREEMPT RT
- Binaries have been launched in both file systems with superuser privileges
- Debian shows a responsive behavior, the autopilot works as expected
- Snappy is surprisingly slow

Tests in both file systems have been made under the same conditions. It doesn't seem to be related to communications. At this point we can rule out the kernel as the cause of this issue.

Victor Mayoral (vmayoral) wrote :

If anyone has any suggestion on how we can study this situation, please let me know.

Paolo Pisati (p-pisati) wrote :

Actually PREEMPT_RT should make your kernel slower (more preemption points, threaded irq vs fast irq, etc.) so i'm suspecting it's something else in the config or a regression in the kernel itself, more than the PREEMPT_RT patch.

Before i investigate producing an -rt kernel for the rpi2, i need one last test: can you turn off the PREEMPT_RT option in the debian kernel and test it?

Paolo Pisati (p-pisati) wrote :

Hold on, reading comment #5 it seems both the PREEMPT_RT and !PREEMPT_RT debian kernels are good, ok, let me do some config comparison then.

Paolo Pisati (p-pisati) wrote :

Attached there's a config diff between the debian rt config(*) and snappy - unfortunately the non-rt debian config wasn't complete so i had to use the rt file albeit a comparison between snappy and the debian non-rt would be more accurate.

Here's an excerpt of the config diff - lines with a '-' in front come from the Debian config, while lines prepended with a '+' come from Snappy.

In general in snappy we turn on more security options (AUDIT, STACKPROTECTOR, APPARMOR) and we default to lower power usage (CPUFREQ_POWERSAVE, IDLE, NOHZ).

In particular, the CPUFREQ governor difference is highly suspicious, could you rerun the test on snappy after switching to the performance governor?

To see the available governors:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_available_governors

To change the running governor:

echo $governor | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
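
If it helps for repeated test runs, the governor check and switch can be wrapped in two small functions. This is only a sketch: show_governors and set_governor are hypothetical names, and the CPU_ROOT variable is an assumption added so the functions can be pointed at a scratch directory for a dry run (it defaults to the sysfs path used above).

```shell
#!/bin/sh
# CPU_ROOT is an assumption for dry runs; by default it is the real sysfs tree.
CPU_ROOT=${CPU_ROOT:-/sys/devices/system/cpu}

# Print the current governor of every CPU that exposes cpufreq
show_governors() {
    for f in "$CPU_ROOT"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
}

# Write the governor given as $1 to every CPU, one file at a time
# (a shell redirect can only target a single file)
set_governor() {
    for f in "$CPU_ROOT"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue
        printf '%s\n' "$1" > "$f"
    done
}
```

As root on the board: set_governor performance, then show_governors to confirm the change before rerunning the test.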

---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---

+AUDITSYSCALL=y (pulled in by SECURITY_SELINUX)

-CONFIG_CC_STACKPROTECTOR_NONE=y
+CONFIG_CC_STACKPROTECTOR_REGULAR=y
+CONFIG_CC_STACKPROTECTOR=y

-CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
+CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE=y
+CONFIG_CPU_FREQ_GOV_COMMON=y
+CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
+CONFIG_CPU_FREQ_GOV_ONDEMAND=y

+CONFIG_CPU_IDLE_GOV_LADDER=y
+CONFIG_CPU_IDLE_GOV_MENU=y
+CONFIG_CPU_IDLE=y

+CONFIG_NO_HZ_COMMON=y
+CONFIG_NO_HZ_IDLE=y
+CONFIG_NO_HZ=y

+CONFIG_SECURITY_APPARMOR_HASH_DEFAULT=y
+CONFIG_SECURITY_APPARMOR_HASH=y
+CONFIG_SECURITY_APPARMOR_UNCONFINED_INIT=y
+CONFIG_SECURITY_APPARMOR=y
+CONFIG_SECURITYFS=y
+CONFIG_SECURITY_NETWORK=y
+CONFIG_SECURITY_PATH=y
+CONFIG_SECURITY_SELINUX_AVC_STATS=y
+CONFIG_SECURITY_SELINUX_DEVELOP=y
+CONFIG_SECURITY_SELINUX=y
+CONFIG_SECURITY_SMACK=y
+CONFIG_SECURITY=y
+CONFIG_SECURITY_YAMA=y

---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---
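
For reference, a diff in the same '-'/'+' style as the excerpt above could be produced roughly like this. It is a sketch, not the exact procedure used for the attachment: config_diff is a hypothetical helper, and it only looks at lines starting with CONFIG_, so "# CONFIG_X is not set" comments are ignored.

```shell
#!/bin/sh
# Hypothetical helper: print options that differ between two kernel configs.
# Lines only in the first file get a '-' prefix, lines only in the second a '+'.
config_diff() {
    a=$(mktemp)
    b=$(mktemp)
    # comm needs sorted input; fix the locale so sort and comm agree on order
    grep '^CONFIG_' "$1" | LC_ALL=C sort > "$a"
    grep '^CONFIG_' "$2" | LC_ALL=C sort > "$b"
    LC_ALL=C comm -23 "$a" "$b" | sed 's/^/-/'
    LC_ALL=C comm -13 "$a" "$b" | sed 's/^/+/'
    rm -f "$a" "$b"
}
```

With the Debian config as the first argument and the Snappy config as the second, the prefixes match the convention used in the excerpt.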

Paolo Pisati (p-pisati) wrote :
Paolo Pisati (p-pisati) wrote :

Ok so, after a discussion in irc, CPU_FREQ_DEFAULT_GOV_POWERSAVE=y was a leftover from the original rpi2 defconfig, while in the 4.2 kernel we changed it to CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y and it seems the situation there has improved: if you can confirm this, i'll push a config change to the 3.19 tree (just in case someone wants to use it) and close this bug.
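
To confirm which default a given kernel was built with, the config file can be inspected directly. A minimal sketch, assuming you have the kernel config as a plain text file (for instance one of the gists linked earlier); default_governor is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the default cpufreq governor selected in a
# kernel config file, in lower case (e.g. "powersave" or "performance").
default_governor() {
    sed -n 's/^CONFIG_CPU_FREQ_DEFAULT_GOV_\([A-Z_]*\)=y$/\1/p' "$1" \
        | tr '[:upper:]' '[:lower:]'
}
```

Calling default_governor on the config file prints nothing if no default governor option is set there.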

Michael Vogt (mvo) wrote :

Setting to incomplete as this is waiting for input from the reporter (after Paolo's last comment)

Changed in snappy:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for Snappy because there has been no activity for 60 days.]

Changed in snappy:
status: Incomplete → Expired