Use gettext-po parser for imports

Bug #105855 reported by Carlos Perelló Marín
Affects            Status   Importance  Assigned to  Milestone
Launchpad itself   Invalid  High        Unassigned

Bug Description

We have already found and fixed several errors in our current Python-based .po parser that blocked some files from being imported into Launchpad. That alone is a reason to use the standard gettext parser, so that the same thing does not happen again with other corner cases that we do not handle well.

This week we got another strong argument for using the standard gettext parser: ours is damned slow. We received a 7.2MB .pot file to import, and our parser takes more than 2 hours to parse it at 100% CPU, while gettext's parser is able to parse it, validate it and give us statistics for it in a couple of seconds.
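For anyone who wants to reproduce the comparison, here is a minimal sketch of how the gettext side could be run from Python. It assumes the GNU gettext `msgfmt` tool is installed and on PATH; the helper names `msgfmt_command` and `check_po` are made up for this illustration and are not part of Launchpad or of any gettext binding.

```python
import os
import subprocess


def msgfmt_command(po_path: str) -> list[str]:
    """Build a msgfmt command line that parses, checks and reports statistics.

    The compiled catalog is discarded by writing it to the null device.
    """
    return ["msgfmt", "--check", "--statistics", "-o", os.devnull, po_path]


def check_po(po_path: str) -> tuple[int, str]:
    """Run msgfmt on po_path and return (exit code, combined output).

    Assumes msgfmt is available on PATH; stdout and stderr are merged so
    that both diagnostics and statistics end up in the returned text.
    """
    proc = subprocess.run(
        msgfmt_command(po_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.returncode, proc.stdout
```

Wrapping a call to `check_po` with `time.perf_counter()` would give a rough wall-clock figure to set against our own parser.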

Carlos Perelló Marín (carlos) wrote :

Here is the file that is causing us performance problems

Changed in rosetta:
importance: Undecided → Critical
status: Unconfirmed → Confirmed
description: updated
James Henstridge (jamesh) wrote :

Which features are missing that you require?

Carlos Perelló Marín (carlos) wrote :

Hmm, checking what we have right now, it seems to have most of the required functionality. At least, I didn't find anything that we cannot work around, except for a limitation in gettextpo itself: we don't have a way to parse a .po file from memory; it requires that the file exists in the filesystem.

Taking another look at the code, I see that it could read from standard input if you set filename to '-', so maybe we could add something to the PoFile.__init__ method to accept a buffer and pass it to libgettext through standard input.

What do you think? Would that be possible?
If that's done, I think we could try to switch to this other parser.

James Henstridge (jamesh) wrote :

Parsing from stdin does not sound like a good idea. Are you planning on doing dup2() tricks to open a new file descriptor over the top of it?

If you need a named file, just use tempfile.NamedTemporaryFile. Something like:

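     import tempfile

     fp = tempfile.NamedTemporaryFile()
     fp.write(b'data')  # the file is opened in binary mode by default
     fp.flush()         # make sure the data is on disk before using fp.name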

Now you have fp.name as the filename, and the file will be removed when the NamedTemporaryFile instance goes away.
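To make that concrete, here is a rough sketch of how an in-memory buffer could be handed to a parser that only accepts filenames. The `parse_file` argument is a hypothetical stand-in for whatever filename-based entry point ends up being used; nothing here is taken from the real gettextpo binding, and it assumes a POSIX system where the temporary file can be reopened by name while it is still open.

```python
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def parse_po_buffer(data: bytes, parse_file: Callable[[str], T]) -> T:
    """Write an in-memory .po buffer to a named temp file and parse it by path.

    parse_file is a hypothetical callable that takes a filename and
    returns the parsed result; substitute the real parser entry point.
    """
    with tempfile.NamedTemporaryFile(suffix=".po") as fp:
        fp.write(data)
        fp.flush()  # ensure the bytes are visible to a reader opening fp.name
        # The file still exists here; it is removed when the with block exits.
        return parse_file(fp.name)
```

The parser has to finish reading before the function returns, because the file is deleted as soon as the `with` block ends.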

Carlos Perelló Marín (carlos) wrote :

Reduced to High. We will try to do this as part of our urgent tasks before Gutsy is opened for translations.

Changed in rosetta:
importance: Critical → High
Carlos Perelló Marín (carlos) wrote :

We are not doing this in the near future. We implemented an easy fix that improves the speed a lot: from more than 2 hours to 12 minutes!

Changed in rosetta:
status: Confirmed → Rejected