apport hook for Evolution

Bug #391623 reported by C de-Avillez
This bug affects 2 people
Affects             Status   Importance  Assigned to   Milestone
evolution (Ubuntu)  Triaged  Wishlist    C de-Avillez

Bug Description

Binary package hint: evolution

I have written an apport hook for Evolution; the main idea was to sanitise the stacktraces (by taking out all private data). Since I was at it, I went ahead and collected a bit of non-private data from gconf: eplugins, Junk, and prompts entries.

The hook tries to find instances of:

- email addresses
- IP addresses, or fully-qualified host names
- some variables that are known (to developers and triagers) to contain private data, like 'key', 'uri', 'profname', 'username', 'password', etc.

Full sanitising depends on a hook being driven from the back-office retrace (see bug 387933), and will probably need a slightly different hook (for example, we should not try to collect gconf data).

I am attaching the following files:

* source_evolution.py: the apport hook itself
* evo-bugs-with-stacktrace.csv: a comma-separated-values file with the Evolution bugs that contained a stacktrace attachment
* apport-download.py: a first-try on a small apport utility to download (rebuild) an apport report from a bug
* evo-tester.py: a hack to download the apport reports from the CSV file above.

Revision history for this message
C de-Avillez (hggdh2) wrote :
Revision history for this message
Sebastien Bacher (seb128) wrote :

Thank you for your work. The change is not trivial; do you think you could add some extra details on the regular expressions and the formats used, maybe with some basic examples, and a test case which has some values and their known updated ones, to make sure the code does what is expected?

Changed in evolution (Ubuntu):
assignee: nobody → Ubuntu Desktop Bugs (desktop-bugs)
importance: Undecided → Wishlist
Revision history for this message
Sebastien Bacher (seb128) wrote :

unsubscribing the sponsors for now so it doesn't stay on the review list while it's being discussed

Revision history for this message
C de-Avillez (hggdh2) wrote :

The Evolution hook works on the backtraces (BTs), looking for known variables that hold (potentially) private data; for any other variable, we scan for instances of IP or email addresses, and fully-qualified server names (FQSN). All matches are replaced by the string '##MASKED##'.

Of course, this will only be fully effective when bug 387933 is resolved for the backoffice.

Meanwhile, the hook seems to be working correctly for the list of Evolution bugs Brian provided me with (BTW, thank you!). The hook currently:

1. Collects Evolution GConf data (the Plugins, Junk Setup, and Prompts subkeys of /apps/evolution); these are added in a [Miscellaneous] string;
2. for each of {Stacktrace, ThreadStacktrace}: scans the lines, and replaces any string value for the following Evolution variables by the string "##MASKED##":
    r'''(key|url_string|url|filename|filesave|uri|profname|user|source|username|password|server|domain|domain_name) # variables in trace
    ([\s]*[=].+?["]) # intermediate text (class, address, etc)
    (.*?) # what we really want: the string data
    (["][, ]*)''' # the delimiter
3. then we search & replace still-existing instances of email addresses, fully-qualified server names, and IP addresses (in this order), in any other variables.
4. (Currently) writes a *diff* for the changes made (creates two *new* entries in reports[]). This was done because we were not sure how invasive the changes would be, and considered it better to just write a diff, at least for now. *Input needed*
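
As a rough illustration of step 2, the RE above can be compiled in verbose mode and applied to one backtrace line. This is only a sketch: the names MASK, VARIABLE_RE and mask_variables, and the sample line, are made up for this example and are not taken from source_evolution.py.

```python
import re

MASK = "##MASKED##"

# The pattern from step 2; re.VERBOSE makes Python ignore the layout
# whitespace and the trailing "#" comments inside the pattern.
VARIABLE_RE = re.compile(
    r'''(key|url_string|url|filename|filesave|uri|profname|user|source|username|password|server|domain|domain_name) # variables in trace
    ([\s]*[=].+?["])  # intermediate text (class, address, etc)
    (.*?)             # what we really want: the string data
    (["][, ]*)        # the delimiter''',
    re.VERBOSE,
)

def mask_variables(line):
    """Replace only the quoted string value (group 3); keep everything else."""
    return VARIABLE_RE.sub(
        lambda m: m.group(1) + m.group(2) + MASK + m.group(4), line)

# Hypothetical backtrace line, for illustration only.
sample = '#3 camel_url_new (url_string=0x8a2f1c8 "imap://joe@mail.example.com/INBOX", ex=0x0)'
print(mask_variables(sample))
```

A line that has no quoted string after a known variable is left untouched, since the pattern requires an opening and a closing double quote.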

For both FQSN and email addresses we use the following RE for domain names:
    '(aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|tel|travel|[a-z]{2})'
This RE will match on any of the listed words, or on any two letters.
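
A quick way to see what this RE accepts is to test a few labels against it on their own. The helper name is_domain_suffix is hypothetical, and the sketch assumes the RE is used without any flags (so it is case-sensitive).

```python
import re

# Domain-suffix alternation quoted above, copied verbatim.
DOMAIN_NAMES = ('(aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|'
                'museum|name|net|org|pro|tel|travel|[a-z]{2})')

def is_domain_suffix(label):
    """True if the whole label is accepted by DOMAIN_NAMES."""
    return re.fullmatch(DOMAIN_NAMES, label) is not None

for label in ("com", "museum", "uk", "zz", "example"):
    print(label, is_domain_suffix(label))
```

Note that "zz" is accepted even though it is not a real top-level domain: the last alternative takes any two lowercase letters.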

For IP addresses we use the following RE:
    '([^\d])(\d{1,3}[.]\d{1,3}[.]\d{1,3}[.](\d{1,3}))([^\d])'
This RE will match on *any* dotted sequence of four groups of one to three digits, enclosed in non-digits (for example, "[1.2.3.4]"). It will also match on invalid IP addresses, since no limits are set on the range; for example, it will match on "a912.513.401.12/".
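
A minimal sketch of how this RE could be used for the replacement, assuming the enclosing non-digits (groups 1 and 4) are put back around the mask. IP_RE, MASK and mask_ips are names invented for this example.

```python
import re

MASK = "##MASKED##"

# Group 1: leading non-digit, group 2: the dotted quad,
# group 3: its last octet, group 4: trailing non-digit.
IP_RE = re.compile(r'([^\d])(\d{1,3}[.]\d{1,3}[.]\d{1,3}[.](\d{1,3}))([^\d])')

def mask_ips(text):
    """Mask the dotted quad but keep the delimiters on both sides."""
    return IP_RE.sub(lambda m: m.group(1) + MASK + m.group(4), text)

print(mask_ips('host=[1.2.3.4]'))     # host=[##MASKED##]
print(mask_ips('a912.513.401.12/'))   # a##MASKED##/
print(mask_ips('1.2.3.4'))            # 1.2.3.4 (no enclosing non-digits)
```

The last call shows one consequence of requiring a non-digit on each side: an address that sits at the very start or end of the scanned text is not matched.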

For email addresses we use the following RE:
    '[\w\.\-]+@[\w\.\-]+[.]' + DOMAIN_NAMES
This RE will match on words (plus '.' and '-'), followed by an at symbol ('@'), more words, a dot, and a DOMAIN_NAME. This is clearly not fully correct (it would allow, for example, for an email starting with '.'), but it is enough.
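
The email RE can be tried out the same way. Again this is just a sketch with made-up names (EMAIL_RE, mask_emails) and made-up sample strings.

```python
import re

MASK = "##MASKED##"

# Domain-suffix alternation from earlier in this comment, copied verbatim.
DOMAIN_NAMES = ('(aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|'
                'museum|name|net|org|pro|tel|travel|[a-z]{2})')

EMAIL_RE = re.compile(r'[\w\.\-]+@[\w\.\-]+[.]' + DOMAIN_NAMES)

def mask_emails(text):
    """Replace every match of the email RE by the mask."""
    return EMAIL_RE.sub(MASK, text)

print(mask_emails('from "joe.user@mail.example.com" to'))  # from "##MASKED##" to
print(mask_emails('.x@y.org'))                              # ##MASKED##
print(mask_emails('no address here'))                       # no address here
```

The second call is the "email starting with '.'" case mentioned above: it is not a valid address, but it is masked anyway, which errs on the safe side.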

For FQSN we use the following RE:
    '([^\w])([\w.-]*[.]' + DOMAIN_NAMES + ')([^\w\-]|[\n])'
This RE is very similar to the email RE; the differences are that (1) it is pre/post-fixed with non-words, and (2) it has a dot instead of an at symbol.
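
A sketch for the FQSN RE, assuming (as for IP addresses) that the non-word characters on either side are kept and only the name itself is masked. FQSN_RE and mask_fqsn are hypothetical names.

```python
import re

MASK = "##MASKED##"

# Domain-suffix alternation from earlier in this comment, copied verbatim.
DOMAIN_NAMES = ('(aero|arpa|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|'
                'museum|name|net|org|pro|tel|travel|[a-z]{2})')

# Group 1: leading non-word char, group 2: the server name
# (group 3 is the suffix inside it), group 4: trailing delimiter.
FQSN_RE = re.compile(r'([^\w])([\w.-]*[.]' + DOMAIN_NAMES + r')([^\w\-]|[\n])')

def mask_fqsn(text):
    """Mask the server name but keep the characters around it."""
    return FQSN_RE.sub(lambda m: m.group(1) + MASK + m.group(4), text)

print(mask_fqsn('server "mail.example.com" up'))  # server "##MASKED##" up
print(mask_fqsn(' mail.py '))                     #  ##MASKED##
```

The second call shows a side effect of the two-letter alternative in DOMAIN_NAMES: anything that looks like "word.xx" is treated as a server name, so some harmless strings will be masked too.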

5. Finally, we currently calculate a diff of the changes to Stacktrace and ThreadStacktrace, and add it in the report as [Stacktrace.diff] and [ThreadStacktrace.diff].
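
One possible way to produce such a diff is Python's difflib. The sketch below uses a plain dict as a stand-in for the apport report, and stacktrace_diff is a hypothetical helper; the actual hook may do this differently.

```python
import difflib

def stacktrace_diff(original, sanitised, name="Stacktrace"):
    """Return a unified diff between the original and sanitised trace text."""
    return "".join(difflib.unified_diff(
        original.splitlines(keepends=True),
        sanitised.splitlines(keepends=True),
        fromfile=name,
        tofile=name + ".sanitised",
    ))

# Plain dict standing in for the report object (an assumption for this sketch).
report = {"Stacktrace": '#0 f (user=0x1 "joe")\n#1 main ()\n'}
sanitised = '#0 f (user=0x1 "##MASKED##")\n#1 main ()\n'

report["Stacktrace.diff"] = stacktrace_diff(report["Stacktrace"], sanitised)
print(report["Stacktrace.diff"])
```

If nothing was masked, the diff is an empty string, so the extra entry could simply be skipped in that case.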

6. and exit.

Additional comments:

(a) although the idea is to provide a sanitised stacktrace in order to allow for the bug to be classified Public, I was reluctant to delete the original stacktraces: not only may I be missing something, but there also *might* be a case where the value that was sanitised would be needed for a full understanding...


Changed in evolution (Ubuntu):
status: New → Triaged
Revision history for this message
C de-Avillez (hggdh2) wrote :

Well, I am back on this. I lost all original bits when I had to reinstall Ubuntu (and a mis-understanding/error on debian-installer formatted the filesystem).

So I will backtrack a bit, and get this done.

C de-Avillez (hggdh2)
Changed in evolution (Ubuntu):
assignee: Ubuntu Desktop Bugs (desktop-bugs) → C de-Avillez (hggdh2)
Revision history for this message
Nigel Babu (nigelbabu) wrote :

Looking at the last few comments, marking this as 'patch-needswork'. Please feel free to change that to a patch tag once you have a new patch.

This patch was reviewed as part of operation cleansweep. Please see https://wiki.ubuntu.com/OperationCleansweep for details on how to help us review all patches in Ubuntu!

tags: added: patch-needswork
Revision history for this message
C de-Avillez (hggdh2) wrote :

Indeed. After quite a long lull, ddecator and kermiac proposed themselves to work on this, and we are moving the target a bit (still being discussed): we want to provide a generic process to clean up backtraces, not something just for Evo.

Revision history for this message
Kip Warner (kip) wrote :

Great idea C de-Avillez.

Revision history for this message
Jörg Frings-Fürst (jff-de) wrote :

Bug from 2009. Version no longer supported.
Changing status to Invalid.

Changed in evolution (Ubuntu):
status: Triaged → Invalid
Revision history for this message
Brian Murray (brian-murray) wrote :

@Jörg Frings-Fürst - What about this particular bug report leads you to believe that it is specific to a certain version of evolution? As far as I know backtraces would still benefit from sanitization and this is not being done for evolution crashes.

Changed in evolution (Ubuntu):
status: Invalid → Triaged
Revision history for this message
Jörg Frings-Fürst (jff-de) wrote :

@Brian Murray

I think a bug that has not been forwarded to GNOME since 2009 can really be closed.

CU Jörg

Revision history for this message
C de-Avillez (hggdh2) wrote :

Actually, this is an item for Ubuntu, not upstream -- this deals with Apport, and the way we do (or did) things. So it should still be triaged. And... I will check how things are going, and decide on what to do.

Revision history for this message
Jörg Frings-Fürst (jff-de) wrote :

Thanks for your feedback.

Isn't it better to take it in a blueprint?
