Use gettext-po parser for imports

Bug #105855 reported by Carlos Perelló Marín on 2007-04-12

Affects           Status   Importance  Assigned to  Milestone
Launchpad itself  Invalid  High        Unassigned

Bug Description

We have already found and fixed several errors in our current Python-based .po parser that prevented some files from being imported into Launchpad. That alone is a reason to use the standard gettext parser: it would keep the same thing from happening again with other corner cases that we don't handle well.

This week we got another strong argument for using the standard gettext parser: ours is damned slow. We received a 7.2 MB .pot file to import, and our parser takes more than 2 hours to parse it at 100% CPU, while gettext's parser can parse it, validate it, and give us statistics for it in a couple of seconds.

Carlos Perelló Marín (carlos) wrote on 2007-04-12:

big .pot file (7.2 MiB, application/vnd.ms-powerpoint)

Here is the file that is causing us performance problems

Changed in rosetta:
importance:	Undecided → Critical
status:	Unconfirmed → Confirmed

Carlos Perelló Marín (carlos) on 2007-04-12

description:	updated

James Henstridge (jamesh) wrote on 2007-04-18:

Which features are missing that you require?

Carlos Perelló Marín (carlos) wrote on 2007-04-18:

Hmm, checking what we have right now, it seems to have most of the required functionality. At least I didn't find anything that we cannot work around, except for a limitation in gettextpo itself: there is no way to parse a .po file from memory; the file has to exist in the filesystem.

Taking another look at the code, I see that it could read from standard input if you set filename to '-', so maybe we could add something to the PoFile.__init__ method that accepts a buffer and feeds it to libgettext through standard input.

What do you think? Would that be possible?
If that's done, I think we could try to switch to this other parser.

James Henstridge (jamesh) wrote on 2007-04-19:

Parsing from stdin does not sound like a good idea. Are you planning on doing dup2() tricks to open a new file descriptor over the top of it?

If you need a named file, just use tempfile.NamedTemporaryFile. Something like:

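     import tempfile

     fp = tempfile.NamedTemporaryFile()
     fp.write(b'data')  # the .po contents, as bytes (the file is opened in binary mode)
     fp.flush()         # make sure the data is on disk before it is reopened via fp.name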

Now you have fp.name as the filename, and the file will be removed when the NamedTemporaryFile instance goes away.
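
The suggestion above could be wrapped in a small helper, sketched here under the assumption that the parser only accepts a filename. Both `parse_from_buffer` and `count_msgids` are hypothetical names used for illustration; `count_msgids` is only a toy stand-in for a filename-only parser and is not the gettextpo API.

```python
import tempfile


def parse_from_buffer(data, parse_file):
    """Write in-memory .po data to a named temp file and parse it by name.

    parse_file is any callable taking a filesystem path (a hypothetical
    stand-in for a parser that cannot read from memory). The temporary
    file is deleted when the with-block exits.
    """
    with tempfile.NamedTemporaryFile(suffix='.po') as fp:
        fp.write(data)   # data must be bytes: the file is opened in binary mode
        fp.flush()       # ensure the contents are on disk before reopening by name
        return parse_file(fp.name)


def count_msgids(path):
    """Toy stand-in parser: count the msgid lines in the file at path."""
    with open(path, 'rb') as f:
        return sum(1 for line in f if line.startswith(b'msgid '))


po_data = b'msgid ""\nmsgstr ""\n\nmsgid "Hello"\nmsgstr "Hola"\n'
print(parse_from_buffer(po_data, count_msgids))  # prints 2
```

Note that reopening the file by name while it is still open works on Unix-like systems; on Windows it may fail, so this sketch assumes a Unix host.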

Carlos Perelló Marín (carlos) wrote on 2007-04-19:

Reduced to High. We will try to do this as part of our urgent tasks before Gutsy is opened for translations.

Changed in rosetta:
importance:	Critical → High

Carlos Perelló Marín (carlos) wrote on 2007-05-22:

We are not going to do this in the near future. We implemented an easy fix that improves the speed a lot: from more than 2 hours down to 12 minutes!

Changed in rosetta:
status:	Confirmed → Rejected
