Systematic Timeout while trying to mark a bug as duplicate

Bug #1450251 reported by teo1978
This bug affects 5 people
Affects: Launchpad itself
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Steps to reproduce (100% systematic)

Go to this bug:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-331/+bug/1268257

Try to mark it as duplicate of 1431753 (which it is)

Result: timeout error.

Quinta Helmer (qhelmer)
Changed in launchpad:
status: New → Confirmed
KennoVO (kenno-xs4all) wrote :

Yup, confirmed here too.

teo1978 (teo8976) wrote :

I also tried to edit the duplicates of lp:1268257 to make them duplicates of lp:1431753, and after I successfully managed to do it with a handful of them, it started to systematically time out.

It seems that the issue happens not only when the duplicate has a lot of duplicates, but also when the "target" has a lot (and "a lot" is around a dozen).

The whole thing is PATHETIC.
Also, timeout errors often randomly happen when just commenting on a bug. How can Launchpad be so dementially inefficient?

William Grant (wgrant) wrote :

Launchpad isn't designed for bugs to have more than 1000 duplicates. It mostly works fine, but when you mark a bug with 1000 duplicates as a duplicate of another bug, it has to also switch all 1000 duplicates over to the new master bug. That's not a case that was designed for or tested at that scale, and not a case that has even come close to happening before, so it does not perform well in the uniquely pathological case of bug #1268257.

You can work around the timeout by moving all of the duplicates to the new master separately beforehand, e.g. using launchpadlib (https://help.launchpad.net/API/launchpadlib).
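
The workaround above could be scripted roughly as follows. This is a minimal sketch, not a tested tool: it assumes launchpadlib is installed, that the logged-in account is allowed to edit the bugs, and that a bug entry exposes a `duplicates` collection, a writable `duplicate_of` attribute and `lp_save()`. The function names, the application name "dup-mover" and the pause length are hypothetical choices made for illustration.

```python
import time


def move_duplicates(old_master, new_master, pause_seconds=2.0, sleep=time.sleep):
    """Re-point each duplicate of old_master at new_master, one bug at a time.

    Returns (moved_ids, failures), where failures is a list of
    (bug_id, exception) pairs for duplicates that could not be saved.
    """
    moved, failed = [], []
    # Snapshot the collection first, since it changes as duplicates are moved.
    for dup in list(old_master.duplicates):
        try:
            dup.duplicate_of = new_master
            dup.lp_save()
            moved.append(dup.id)
        except Exception as exc:  # for example, a server-side timeout
            failed.append((dup.id, exc))
        # Pause between requests instead of retrying in a tight loop.
        sleep(pause_seconds)
    return moved, failed


def run_against_launchpad():
    """Hypothetical entry point; needs launchpadlib and a Launchpad login."""
    from launchpadlib.launchpad import Launchpad

    lp = Launchpad.login_with("dup-mover", "production", version="devel")
    moved, failed = move_duplicates(lp.bugs[1268257], lp.bugs[1431753])
    print("moved %d duplicates, %d failed" % (len(moved), len(failed)))
    for bug_id, exc in failed:
        print("bug %s: %s" % (bug_id, exc))
```

Calling `run_against_launchpad()` would attempt the move for the two bugs discussed in this report. Failures are collected rather than retried immediately, so they can be re-run later once the server has had time to recover.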

If you have other timeout errors, you need to provide the OOPS ID that the error gave you. Saying that a timeout happens randomly doesn't give us any way to investigate the specific issue.

teo1978 (teo8976) wrote :

> Launchpad isn't designed for bugs to have more than 1000 duplicates.

Well that's STUPID to begin with. For a bug to have several thousand duplicates is normal. Also, if it is not capable of handling that, it shouldn't allow that. If it allows it, it must handle it.

> when you mark a bug with 1000 duplicates as a duplicate of another bug, it has to also switch all 1000 duplicates over to the new master bug.

That's not the only case.
I have "moved" a few bugs that were dupes of A to dupes of B, because I couldn't mark A as dupe of B. Now, any attempt to mark any further bug X as dupe of B times out, even though X doesn't have any dupes. This cannot be justified by anything other than wrong design. This should be an O(1) operation. If it isn't, something is wrongly designed.

> so it does not perform well in the uniquely pathological case of bug #1268257.

"Uniquely pathological", Do you realise how idiotic that sounds? That is not something that was done on purpose for the sake of testing, that's something that naturally happened, because people use software and stumble into bugs, and they report them, and sometimes they are duplicates. And sometimes bugs are huge and hit millions of people. And with Ubuntu, that happens a lot.

There's nothing pathological about it, except for the magnitude of the bug. But if Launchpad has not been designed to handle bugs of such huge impact, then it's not a proper bug tracker for Ubuntu.

teo1978 (teo8976) wrote :

> You can work around the timeout by moving all of the duplicates to the new master separately beforehand

I was doing that (manually), and after I moved a few hundred, it started to systematically time out when moving EVERY SINGLE ONE.

teo1978 (teo8976) wrote :

> If you have other timeout errors, you need to provide the OOPS ID that the error gave you.

If you want me to do that, you must show the OOPS ID on screen every time a timeout error occurs.
These timeouts are occurring in AJAX operations, and I haven't seen the phrase "OOPS ID" onscreen ever.

William Grant (wgrant) wrote : Re: [Bug 1450251] Re: Systematic Timeout while trying to mark a bug as duplicate

On 17/06/15 09:17, teo1978 wrote:
>> Launchpad isn't designed for bugs to have more than 1000 duplicates.
>
> Well that's STUPID to begin with. For a bug to have several thousand
> duplicates is normal. Also, if it is not capable of handling that, it
> shouldn't allow that. If it allows it, it must handle it.

Software isn't flawless, and there are anomalous edge cases for almost
all software that aren't handled well. In this case, a cosmetic
operation is unable to be performed on roughly 0.00005% of bug reports,
and there is a workaround, so it's not a very high priority to fix
directly. If we were to fix it, the fix would be to prevent marking bugs
with more than 100 duplicates as a duplicate of another bug -- not a
significant improvement over the "sorry, you can't do that because it
was too slow" message that you get now.

>> when you mark a bug with 1000 duplicates as a duplicate of another
>> bug, it has to also switch all 1000 duplicates over to the new master
>> bug.
>
> That's not the only case.
> I have "moved" a few bugs that were dupes of A to dupes of B, because I couldn't mark A as dupe of B. Now, any attempt to mark any further bug X as dupe of B times out, even though X doesn't have any dupes. This cannot be justified by anything other than wrong design. This should be an O(1) operation. If it isn't, something is wrongly designed.

Someone was repeatedly trying to mark A as a duplicate of B many
thousands of times an hour, causing database locks to be held on A, B,
and all of A's duplicates, preventing other duplicate operations from
completing on those bugs.

Now that they've stopped doing silly things like that (the message
suggests retrying in a couple of minutes, not immediately retrying
several times a second for several hours!), marking dupes of A as dupes
of B instead works fine through both the web UI and the API. Once most
of them have been moved across, A can be marked as a dupe of B and the
workaround will be complete. If it doesn't work, say so in a pleasant
manner and we can work out why it's broken again and how to fix it.

>> so it does not perform well in the uniquely pathological case of bug
>> #1268257.
>
> "Uniquely pathological", Do you realise how idiotic that sounds?

It's not idiotic at all. A pathological case is an unlikely set of
circumstances that can cause bad behaviour, deliberate or otherwise.

In this case, the largest master bug in history is being marked as a
duplicate of another large master bug -- only necessary because someone
filed a new bug and decided that *it* should be the new master, rather
than using the existing bug, a very uncommon occurrence for an
established bug with hundreds of duplicates. This particular 0.00005% of
the dataset is by far the biggest piece of work that this code has ever
seen, and the code does not handle it well, so it is clearly a uniquely
pathological case.

> That is
> not something that was done on purpose for the sake of testing, that's
> something that naturally happened, because people use software and
> stumble into bugs, and they report them, and sometimes they are
> duplicates. And sometimes bugs are huge and hit millions of people. And
> with Ubuntu, that ha...


teo1978 (teo8976) wrote :

> Software isn't flawless,

And here's a flaw of this software, which is why I reported the bug.
I was under the impression that you were trying to tell me that this wasn't a flaw at all, that it was because this software is "not designed for" doing a given task and that you were claiming that this was a sensible design decision, which it isn't.
But perhaps I misinterpreted you, as I see you haven't closed the bug. Perhaps you were just explaining where the root of the bug is, which is a design flaw, in which case we agree.

> and there are anomalous edge cases for almost all software that aren't handled well.

Here you are considering "anomalous edge case" a case that should be considered an "obvious normal case", perhaps not frequent, but which should definitely be handled well.

> In this case, a cosmetic operation

Pardon me, "cosmetic"??

> is unable to be performed on roughly 0.00005% of bug reports,

Do you have any data that backs up that statistical estimation?

> and there is a workaround,

Yeah, a ridiculously painful workaround that takes tens of man hours: moving all duplicates of the source bug one by one.

> so it's not a very high priority to fix directly.

Oh well, priorities are subjective, I'll give you that.

> Someone was repeatedly trying to mark A as a duplicate of B many
> thousands of times an hour, causing database locks to be held on A, B,
> and all of A's duplicates, preventing other duplicate operations from
> completing on those bugs.
> Now that they've stopped doing silly things like that

No, that was me yesterday (having launched an infinite while loop from a terminal; I did it in the hope that the spike in errors would attract some attention to the bug).
But that was YESTERDAY, and I stopped. TODAY, I started (again) to manually move dozens of dupes of A to dupes of B (NOT to mark A as dupe of B), which is nothing silly, it is the "workaround" you suggest yourself, and it worked fine for the first few dozen bugs. Then, it started to systematically time out.

Indeed, after a few minutes, it did start working again.

So, if SUCCESSFULLY moving a bunch of dupes from one bug to another triggers some "database lock", then that's another wrong design choice.
And also, if the reason the operation fails is that it has been locked for security or whatever reason, then "timeout error" is a dementially wrong error message.

> If it doesn't work, say so in a pleasant manner and we can work out why it's broken again and how to fix it.

1) Whether I speak in a pleasant or unpleasant manner should not be of any relevance in taking action to fix something that is broken
2) We are talking about another issue: the workaround shouldn't be needed at all. Handling of marking a dup that has its own dups is handled in a ridiculously inefficient matter and it must be optimized.

> In this case, the largest master bug in history is being marked as a duplicate of another large master bug

And the fact that it doesn't work demonstrates that the estimation of the size of bugs that the system should be capable of handling was pathetically wrong.
That's as if a video going viral made facebook crash and they said "This is ...


teo1978 (teo8976) wrote :

> Bugs happen, in bugtrackers or in Ubuntu. Is
> that Ubuntu bug also unacceptable and worth unpleasant rants in bug reports?

Yes, of course it is, and it got quite a few.

teo1978 (teo8976) wrote :

Well there are a few typos in my comment but I can't edit that (yet another pathetic flaw in Launchpad), e.g. "matter" instead of "manner", but I guess it can be understood anyway.

teo1978 (teo8976) wrote :

By the way, the "rant" in this bug report (well, the most part) was not about the bug itself but about the fact that you seemed to not consider it a bug.
