Use XPath to parse LP-Pages

Bug #93499 reported by Markus Korn
Affects: python-launchpad-bugs
Status: Fix Released
Importance: Wishlist
Assigned to: Markus Korn
Milestone: (none listed)

Bug Description

So far we have been using regular expressions to parse the LP pages. Most of these regexes are complicated and hard to maintain; XPath is more intuitive.

The attached patch against bughelper.main r118 provides an XPath implementation. The HTML code of the LP pages is parsed by libxml2.htmlParseDoc.
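
For readers who have not seen the two approaches side by side, here is a minimal sketch of the idea. It is not the attached patch: it uses the limited XPath subset in Python's standard-library `xml.etree.ElementTree` rather than `libxml2.htmlParseDoc`, so it only works on well-formed markup. The page snippet, the `buglisting` id, the `bugnumber` class and the second table row are made up for illustration and are not real Launchpad markup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a bug-listing page (not real Launchpad HTML).
PAGE = """
<html>
  <body>
    <table id="buglisting">
      <tr><td class="bugnumber">93499</td><td><a href="/bugs/93499">Use XPath to parse LP-Pages</a></td></tr>
      <tr><td class="bugnumber">12345</td><td><a href="/bugs/12345">Hypothetical second report</a></td></tr>
    </table>
  </body>
</html>
"""


def bugs_via_regex(html):
    """Regex version: depends on the exact tag order, quoting and whitespace."""
    pattern = re.compile(
        r'<td class="bugnumber">(\d+)</td><td><a href="([^"]+)">([^<]+)</a>'
    )
    return [(int(n), url, title) for n, url, title in pattern.findall(html)]


def bugs_via_xpath(html):
    """XPath version: addresses the document structure instead of the raw text."""
    root = ET.fromstring(html)
    bugs = []
    # Select every row of the (hypothetical) bug-listing table.
    for row in root.findall(".//table[@id='buglisting']/tr"):
        number = row.find("td[@class='bugnumber']")
        link = row.find("td/a")
        bugs.append((int(number.text), link.get("href"), link.text))
    return bugs


if __name__ == "__main__":
    for bug in bugs_via_xpath(PAGE):
        print(bug)
```

The XPath version keeps working if attributes are reordered or whitespace changes between tags, which is the maintainability argument made above. Real LP pages are not guaranteed to be well-formed XML, which is presumably why the patch uses an HTML parser.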

In some cases I was unable to replace the regex with an equivalent (simple) XPath construction.
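
One way such cases might be handled, as a sketch only: let XPath select the node and apply a small regex to its text. The report does not say which expressions were affected, so the markup, the `meta` class and the `parse_meta` helper below are assumptions for illustration. The standard-library `xml.etree.ElementTree` again stands in for libxml2.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet; the element names and classes are assumptions.
SNIPPET = """
<div id="bug-header">
  <h1>Use XPath to parse LP-Pages</h1>
  <p class="meta">Bug #93499 reported by Markus Korn</p>
</div>
"""

# The regex only has to understand one short text node, not the whole page.
META_RE = re.compile(r"Bug #(?P<number>\d+) reported by (?P<reporter>.+)")


def parse_meta(markup):
    """Locate the node with XPath, then split its text with a regex."""
    root = ET.fromstring(markup)
    node = root.find(".//p[@class='meta']")
    assert node is not None, "no <p class='meta'> element found"
    match = META_RE.match(node.text.strip())
    assert match, "unexpected meta text: %r" % node.text
    return int(match.group("number")), match.group("reporter")


if __name__ == "__main__":
    print(parse_meta(SNIPPET))
```

The assert messages name what was being looked for, so a change in page layout fails with a clear message rather than silently returning nothing.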

This code needs to be tested and reviewed.
Also, someone who is more familiar with XPath should review the statements and constructions I have chosen.

Markus

Markus Korn (thekorn) wrote :
Changed in bughelper:
assignee: nobody → thekorn
importance: Undecided → Wishlist
status: Unconfirmed → In Progress

Daniel Holbach (dholbach) wrote :

I pushed an updated and slightly modified patch to https://code.launchpad.net/~bugsquad/bughelper/xpath; let's continue our work together there.

Markus Korn (thekorn) wrote :

I set "Fix Status" to "Abandoned Attempt" because further development should happen in the XPath version in python-launchpad-bugs, and I added the python-launchpad-bugs/xpath branch.

Markus

Markus Korn (thekorn) wrote :

Merged into main:

------------------------------------------------------------
revno: 6
committer: Markus Korn <email address hidden>
branch nick: main
timestamp: Thu 2007-04-26 12:42:35 +0200
message:
  use XPath to parse launchpad's HTML-pages instead of regular expressions
    ------------------------------------------------------------
    revno: 4.1.7
    merged: <email address hidden>
    committer: Markus Korn <email address hidden>
    branch nick: xpath
    timestamp: Thu 2007-04-26 12:39:24 +0200
    message:
      fixed issue in BugAttachment; added some more usefull error-messages
    ------------------------------------------------------------
    revno: 4.1.6
    merged: <email address hidden>
    committer: Markus Korn <email address hidden>
    branch nick: xpath
    timestamp: Thu 2007-04-26 10:11:36 +0200
    message:
      adding assert statement to check URL in BugAttachment
    ------------------------------------------------------------
    revno: 4.1.5
    merged: <email address hidden>
    committer: Markus Korn <email address hidden>
    branch nick: xpath
    timestamp: Thu 2007-04-26 09:55:52 +0200
    message:
      fix error in Bug.info, thanks to Daniel Holbach; adding information to assert statements
    ------------------------------------------------------------
    revno: 4.1.4
    merged: <email address hidden>
    committer: Markus Korn <email address hidden>
    branch nick: xpath
    timestamp: Tue 2007-04-24 12:28:49 +0200
    message:
      merged fix for bug 109213
    ------------------------------------------------------------
    revno: 4.1.3
    merged: <email address hidden>
    committer: Markus Korn <email address hidden>
    branch nick: xpath
    timestamp: Tue 2007-04-24 12:14:32 +0200
    message:
      add reporter and proptags property to the Bug object; choosen 'proptags' as name to avoid conflicts with existing tags attribute (has to be renamed later!)
    ------------------------------------------------------------
    revno: 4.1.2
    merged: <email address hidden>
    committer: Markus Korn <email address hidden>
    branch nick: xpath
    timestamp: Tue 2007-04-24 12:10:27 +0200
    message:
      rename hdoc into xmldoc; adding xmldoc attribute to bug-object; adding bugreport attribute to bug-object to get the pure text of a bug-report
    ------------------------------------------------------------
    revno: 4.1.1
    merged: <email address hidden>
    committer: Markus Korn <email address hidden>
    branch nick: xpath
    timestamp: Mon 2007-04-23 11:44:09 +0200
    message:
      use XPath in BugPage and Bug
------------------------------------------------------------

Changed in python-launchpad-bugs:
status: In Progress → Fix Committed
Markus Korn (thekorn)
Changed in python-launchpad-bugs:
status: Fix Committed → Fix Released