repowerd crashes on xenial/arm64

Bug #1613602 reported by Jean-Baptiste Lallement
This bug affects 1 person
Affects                  Status        Importance  Assigned to  Milestone
Canonical System Image   Fix Released  Critical    Unassigned
repowerd                 Invalid       Undecided   Unassigned

Bug Description

xenial/arm64/frieza channel ubuntu-touch/staging/ubuntu

repowerd crashes on boot with the crash file attached

According to vicamo, it crashes in glibc timezone functions. It could have the same root cause as bug 1613605.

Jean-Baptiste Lallement (jibel) wrote :
Changed in canonical-devices-system-image:
importance: Undecided → Critical
milestone: none → xenial
description: updated
Alexandros Frantzis (afrantzis) wrote :

From the attached backtrace (and also the backtrace of the related bug 1613605), this doesn't seem to be a problem caused by repowerd (or USC/Mir). It only shows up in these two programs because they try to log with timestamps, which indirectly involves timezone calculations. Those calculations happen inside libc; neither repowerd nor USC performs TZ calculations explicitly.

repowerd in particular doesn't even handle timestamps itself; it just calls the vsyslog() function, which adds timestamps internally.

You-Sheng Yang (vicamo) wrote :

unity-system-compositor has the same problem.

You-Sheng Yang (vicamo) wrote :

At first I guessed the root cause might be https://lists.debian.org/debian-glibc/2007/04/msg00007.html . Basically, vsnprintf is not an async-signal-safe function, and vsnprintf is called indirectly in a signal handler for SIGINT & SIGTERM. The full list of async-signal-safe functions is at http://www.shrubbery.net/solaris9ab/SUNWdev/MTP/p37.html .
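For illustration, here is a minimal sketch of a handler that stays within async-signal-safe calls. It is not repowerd's actual code; the names on_terminate, install_handlers and stop_was_requested are hypothetical. The handler only stores a flag and calls write(2), and any formatted logging would be done later from the main loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler, polled by the main loop (hypothetical name). */
static volatile sig_atomic_t stop_requested = 0;

/* Handler restricted to async-signal-safe operations:
 * a store to a sig_atomic_t flag and a call to write(2). */
void on_terminate(int signo)
{
    static const char msg[] = "termination requested\n";
    (void)signo;
    stop_requested = 1;
    /* write() is async-signal-safe; printf, vsnprintf and syslog are not. */
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);
}

/* Install the handler for SIGINT and SIGTERM. Returns 0 on success. */
int install_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_terminate;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGTERM, &sa, NULL);
}

/* Called from normal (non-signal) context, where full logging is safe. */
int stop_was_requested(void)
{
    return stop_requested != 0;
}
```

A main loop would call install_handlers() once, then check stop_was_requested() on each iteration and do its timestamped logging there rather than inside the handler.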

Then I set REPOWERD_LOG=null for repowerd. The crash in vsscanf is gone, but a segmentation fault now appears in strlen(). As before, something seems to be wrong in glibc locale handling. The error is in the glibc source file stdlib/strtod_l.c [1], where the variable 'decimal' was assigned an invalid pointer value, 0xa94153f3540003c1.

[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/strtod_l.c;h=3d66eac70682c430d211109cbe51cc9a622d4ad9;hb=fdfc9260b61d3d72541f18104d24c7bcb0ce5ca2#l568

description: updated
Changed in canonical-devices-system-image:
status: New → Confirmed
You-Sheng Yang (vicamo) wrote :

Merge https://code.launchpad.net/~vicamo/avila-private/+git/device_malatamobile_bq_aquaris_m10_FHD/+ref/xenial-snappy/force-adb locally to get adb access. The device may still reboot after the first boot, but you should have adb access then.

You-Sheng Yang (vicamo) wrote :

So _NL_CURRENT_LOCALE expands to `__libc_tsd_get(__locale_t, LOCALE)`, which expands to __libc_tsd_LOCALE. That __libc_tsd_LOCALE variable is only set by the __libc_tsd_set macro, which is only called by uselocale(). repowerd does call uselocale() during startup, twice: once with a valid pointer value, and once with 0xFFFFFFFFFFFFFFFF. However, neither call changes the value of __libc_tsd_LOCALE at locale/uselocale.c line 37. So even if we try to set the TZ env var to another value (glibc uses its default "UTC0" when TZ is unset), repowerd still segfaults.
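As a point of comparison, the following sketch shows how uselocale() is documented to behave on a healthy glibc system: passing (locale_t)0 only queries the thread's locale, and passing a real locale object installs it. On 64-bit glibc the value 0xFFFFFFFFFFFFFFFF appears to correspond to LC_GLOBAL_LOCALE, which is defined as (locale_t)-1L. The helper names current_thread_locale and roundtrip_thread_locale are hypothetical and are not taken from repowerd.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Query the calling thread's locale without changing it. */
locale_t current_thread_locale(void)
{
    return uselocale((locale_t)0);
}

/* Switch the thread to a fresh "C" locale, then restore the previous one.
 * Returns 1 if uselocale() behaved as documented, 0 otherwise. */
int roundtrip_thread_locale(void)
{
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return 0;

    /* Installs c_loc for this thread and returns the old setting. */
    locale_t previous = uselocale(c_loc);

    /* A query-only call should now report c_loc. */
    int ok = (uselocale((locale_t)0) == c_loc);

    /* Restore the old setting before freeing the locale object. */
    uselocale(previous);
    ok = ok && (uselocale((locale_t)0) == previous);

    freelocale(c_loc);
    return ok;
}
```

In a thread that has never called uselocale(), current_thread_locale() is expected to return LC_GLOBAL_LOCALE. If the thread-local slot behind it has been overwritten by another library, the query can return an arbitrary pointer instead, which would be consistent with the invalid 'decimal' pointer seen in strtod_l.c.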

You-Sheng Yang (vicamo) wrote :

If we try to work around it with:

  1. avoiding the segfault due to vsyslog() by exporting env var REPOWERD_LOG=null,
  2. avoiding the segfault in std::stod() by returning the constant 680 instead,
  3. setprop ubuntu.booster.dl /system/lib64/libperfservice.so,

repowerd still dies within _IO_vfscanf_internal, the same place as the crash in item 1.

You-Sheng Yang (vicamo) wrote :

After removing system/lib64/power.default.so, lights.default.so, and system/lib64/libperfservice.so, repowerd no longer crashes.

You-Sheng Yang (vicamo) wrote :

To clarify: that result was with the three hacks from #7 applied; then repowerd no longer crashes. When the #7 hacks are reverted, the crashes come back.

You-Sheng Yang (vicamo) wrote :

https://code.launchpad.net/~morphis/tangxi-midori/+git/platform_frameworks_native/+merge/296205
I found that it's related to the Android-side libEGL. When stepping through _IO_vfscanf_internal, the variable curnumeric is sometimes assigned a value that falls within libEGL memory pages.

You-Sheng Yang (vicamo) wrote :

With that pthread_key_t fix cherry-picked to avila, we now have the login dots running. The merge proposal is at https://code.launchpad.net/~avila-private-team/avila-private/+git/platform_frameworks_native/+merge/303403 .
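The patch itself is not shown in this thread. As a rough sketch of the general technique its name suggests, and assuming the fix moves per-thread state out of a fixed TLS slot into a key allocated through pthread_key_create(), per-thread storage can be handled as below. All names here (state_key, set_thread_state, get_thread_state) are hypothetical and are not taken from libEGL.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Key allocated at run time by the threading library, so it cannot
 * collide with thread-local slots that another libc reserves for
 * itself (hypothetical names, for illustration only). */
static pthread_key_t state_key;
static pthread_once_t state_once = PTHREAD_ONCE_INIT;
static int state_key_ok = 0;

/* Runs exactly once per process, no matter how many threads race. */
static void make_key(void)
{
    state_key_ok = (pthread_key_create(&state_key, NULL) == 0);
}

/* Store a per-thread pointer. Returns 0 on success. */
int set_thread_state(void *value)
{
    pthread_once(&state_once, make_key);
    if (!state_key_ok)
        return -1;
    return pthread_setspecific(state_key, value);
}

/* Fetch the calling thread's pointer, or NULL if none was set. */
void *get_thread_state(void)
{
    pthread_once(&state_once, make_key);
    return state_key_ok ? pthread_getspecific(state_key) : NULL;
}
```

The design point is that the key is handed out by the same library that owns the thread-local storage, so two libraries in one process should not end up writing to the same slot.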

Changed in canonical-devices-system-image:
assignee: nobody → Vicamo Yang (vicamo)
assignee: Vicamo Yang (vicamo) → nobody
status: Confirmed → In Progress
You-Sheng Yang (vicamo)
Changed in canonical-devices-system-image:
status: In Progress → Fix Committed
Jason Yen (jasonyen)
Changed in repowerd:
status: New → Invalid
Changed in canonical-devices-system-image:
status: Fix Committed → Fix Released