Tracker continually reindexes same files, databases grow arbitrarily large

Bug #157523 reported by Pausanias
This bug report is a duplicate of: Bug #155244: tracker does not stop.
Affects: tracker (Ubuntu)
Status: In Progress
Importance: Undecided
Assigned to: Jamie McCracken

Bug Description

On a fully updated gutsy, I am experiencing some fairly serious problems with tracker. When indexing my home directory for the first time, it behaves just fine. It indexes for a while, and then stops (tracker-status returns Idle). Then at some point after a reboot and/or suspend-resume cycle, it begins to index the same old files, and never reaches Idle again. At some point it gets stuck continually reindexing my Desktop folder, and the database grows arbitrarily large. When I wipe the .cache/tracker directory and start from scratch, the same behavior happens.

Steps to reproduce
- Fully updated gutsy, ReiserFS home directory
- Kill trackerd
- Wipe .cache/tracker
- Use included tracker.cfg in .config/tracker
- Start trackerd. Wait until tracker-status returns Idle.
- Reboot system. Use normally, suspend/resume occasionally.
- At some point after a reboot, tracker will start indexing and never stop

Expected behavior:
- Tracker should not reindex the same files continuously, with the database growing arbitrarily large

I am also appending the output of trackerd -v 3

Pausanias (pausanias) wrote :

Gzipped tracker log showing the continual re-indexing.

description: updated
Pausanias (pausanias) wrote :
Jamie McCracken (jamiemcc-blueyonder) wrote :

any idea what triggered it and can you reproduce it?

the sqlite metadata db prevents the same URI from being saved twice but perhaps it got corrupted (although the log does not show the desktop folder being saved to disk - more like it's getting stuck in an infinite loop)

is your desktop folder a normal directory or a symlink?

Pausanias (pausanias) wrote :

It is 100% reproducible, i.e. it happens every single time I start indexing from scratch. I notice it after ~1 day or so. The other thing I'm sure of is that file-meta.db keeps on growing through the infinite loop. I've let it go on and on to see how long it keeps on doing it, and at one point the file-meta.db reached 1.3GB for a 15GB partition (from a size of 30MB when it originally was done indexing).

The growth rate seems approximately 60kB/second.
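As a rough cross-check of those figures, the time needed to grow from 30MB to 1.3GB at 60kB/second can be estimated. This is only a sketch: it assumes decimal units and a constant growth rate, neither of which is established above, and the function name hours_to_grow is made up for the example.

```python
# Rough estimate only: assumes decimal units (1 GB = 1000 MB = 1,000,000 kB)
# and a constant growth rate, which the observations do not guarantee.
START_KB = 30 * 1000          # about 30 MB when the first index finished
END_KB = 1.3 * 1000 * 1000    # about 1.3 GB when the loop was stopped
RATE_KB_PER_S = 60            # observed growth rate in kB per second


def hours_to_grow(start_kb, end_kb, rate_kb_per_s):
    """Return the hours needed to grow from start_kb to end_kb."""
    return (end_kb - start_kb) / rate_kb_per_s / 3600
```

Under those assumptions, hours_to_grow(START_KB, END_KB, RATE_KB_PER_S) comes to a little under six hours of continuous looping.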

What I'm still not sure about is what triggers it. What I do know is that once it gets stuck into that infinite loop, it keeps on repeating the infinite loop regardless of what I do (reboot, etc). The only thing that will stop it is an rm -rf ~/.cache/tracker.

So your hypothesis that it's a corrupt db that's causing it seems plausible.

It seems rather hard to catch it at the exact moment when it triggers but I have the following suspicions:

1) Could it have anything to do with ReiserFS? (I remember Beagle used to have some special mount requirements for reiserfs)
2) Could it be getting corrupted across a suspend/resume cycle? At one point I did suspend and resume just to see what would happen, but it stayed Idle.

What I'll do now is do an rm -rf, then suspend/resume, and then reboot and see what happens. If I catch it triggering I'll be sure to report here.

Hope I can find the problem, because I've come to rely on tracker and it's quite an excellent piece of software.

Pausanias (pausanias) wrote :

To answer your other question, no, Desktop is not a symlink. It does, however, have a symlink in it that points to a subdirectory in the home folder (i.e. ~/Desktop/a.pdf points to ~/talks/a.pdf)

It also contains some files owned by root.

Pausanias (pausanias) wrote :

After a little experimentation, I have some more information to report. I am fairly certain that this is what's going on.

1) It seems that whenever trackerd is started by gnome-session, the infinite looping happens.
2) When I kill that trackerd and start my own, it nicely resettles into an Idle state.
3) When the gnome-session trackerd is allowed to go on for too long, even the user-started trackerd starts to loop forever (this is how I was able to get the original log).

I have been unable to generate a log file of the gnome-session trackerd. I've tried editing the command used to start it up (in System->Preferences->Sessions), but it refuses to log to a file even if I type "trackerd -v 3 >> /home/pausanias/tracker.log". Oddly enough, "ps ax | grep trackerd" DOES show the redirect to tracker.log, but tracker.log doesn't grow. Is there a standard logfile anywhere for trackerd?
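One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the session manager executes the command directly instead of through a shell, ">>" and the file name are handed to trackerd as ordinary arguments, which would match what ps shows. A minimal wrapper sketch that performs the redirection itself is below. The function name run_logged is illustrative and not part of tracker or gnome-session.

```python
import subprocess


def run_logged(command, log_path):
    """Run command, appending its stdout and stderr to log_path.

    Returns the exit status of the command.
    """
    # Open in append mode, mirroring the intent of ">>" in a shell.
    with open(log_path, "ab") as log:
        return subprocess.call(command, stdout=log, stderr=subprocess.STDOUT)


# Hypothetical usage, e.g. from a small script named in the session entry:
# run_logged(["trackerd", "-v", "3"], "/home/pausanias/tracker.log")
```

The session entry would then point at the wrapper script rather than at trackerd directly.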

For now, my workaround will be to disable the gnome-session trackerd and start it manually myself, until you guys figure out what's going on. I'll be happy to help however I can.

Jamie McCracken (jamiemcc-blueyonder) wrote :

edit trackerd in gnome-session to include -v 3 so that it logs verbosely (default is silent and only errors are logged)

It should never get in an infinite loop because it works as follows:

1) directory is indexed and mtime logged
2) changed/new files in directory are then indexed
3) directory mtime is checked again, and if the new mtime is different it goes back to (2). It does this to prevent race conditions: the mtime on a folder will change if any file in it changes, so we need to rescan if a change occurred while indexing it.
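The three steps above can be sketched roughly as follows. This is an illustration of the described logic, not tracker's actual code: index_directory, the index_file callback, seen_mtimes and max_passes are all hypothetical names, and the pass limit is an addition that keeps the sketch from looping forever.

```python
import os


def index_directory(path, index_file, seen_mtimes, max_passes=10):
    """Index changed or new files in path, rescanning if path changes meanwhile.

    index_file is a caller-supplied callback; seen_mtimes maps file paths to
    the mtime recorded when they were last indexed. Returns the pass count.
    """
    passes = 0
    dir_mtime = os.stat(path).st_mtime           # step 1: log directory mtime
    while passes < max_passes:
        passes += 1
        for entry in os.scandir(path):           # step 2: changed/new files
            if not entry.is_file(follow_symlinks=False):
                continue
            mtime = entry.stat(follow_symlinks=False).st_mtime
            if seen_mtimes.get(entry.path) != mtime:
                index_file(entry.path)
                seen_mtimes[entry.path] = mtime
        new_mtime = os.stat(path).st_mtime       # step 3: check mtime again
        if new_mtime == dir_mtime:
            break                                # nothing changed while indexing
        dir_mtime = new_mtime                    # changed: go back to step 2
    return passes
```

In this sketch a second call on an unchanged directory indexes nothing, which is the behaviour the description implies.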

the sqlite db has a unique index on URI so it should never store the same URI twice (unless the index is corrupted) - you will see an sqlite error in the log file if an attempt is made to store a URI twice.
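To illustrate the point about the unique index, here is a small sketch using Python's sqlite3 module. The table, column and index names (files, uri, files_uri_idx) and the store_uri helper are made up for the example and are not tracker's real schema.

```python
import sqlite3


def store_uri(conn, uri):
    """Try to store uri; return True if stored, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO files (uri) VALUES (?)", (uri,))
        return True
    except sqlite3.IntegrityError:
        # SQLite rejects the duplicate because of the unique index.
        return False


# Illustrative schema: one table with a unique index on the URI column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, uri TEXT NOT NULL)")
conn.execute("CREATE UNIQUE INDEX files_uri_idx ON files (uri)")
```

With an intact index, a second store_uri call for the same URI fails with a constraint error instead of adding a row, so repeated indexing alone should not make the table grow.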

I am investigating the source to try and find the problem

Jamie McCracken (jamiemcc-blueyonder) wrote :

Will force integrity check on all sqlite dbs at startup and force reindex if they are corrupted
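A startup check of this kind could look roughly like the sketch below, which uses SQLite's PRAGMA integrity_check through Python's sqlite3 module. It is an assumption about the general approach, not the change that went into tracker, and db_is_intact is a hypothetical name.

```python
import sqlite3


def db_is_intact(db_path):
    """Return True if PRAGMA integrity_check reports "ok" for db_path."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # Raised, for example, when the file is not a database at all.
        return False
    return rows == [("ok",)]
```

A caller could remove the database file and start a full reindex whenever this returns False.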

Changed in tracker:
assignee: nobody → jamiemcc-blueyonder
status: New → In Progress
Pausanias (pausanias) wrote :

Here is a more extensive test with a log.

First, I disabled trackerd in gnome-session. I wiped the cache, and started trackerd from the command line. I waited until it got an idle status. I logged in and out many times, repeating this procedure. It would always return to Idle status after a minute or so.

Next, I enable tracker in gnome-session with -v 3. I log out and log back in. It immediately starts indexing and keeps on going. After about 20 minutes I kill it.

I managed to get a log of this gnome-session started tracker doing the infinite indexing thing.

The file is attached. It is 535853 lines long. The transfer to the infinite repetition occurs at line 14841. In this particular run it is no longer the Desktop folder, but a set of 3 other folders that are being continually logged. File names have been changed to xxxxxxxxxxxxxx for privacy.

Conclusions:
1) It is confirmed that running trackerd from the command line does not yield this behavior.
2) It is always folders that are triggering this bug, but not always the same folder.

Pausanias (pausanias) wrote :

Hmm, that didn't seem to go through. I'll try again:

Pausanias (pausanias) wrote :

OK, well it seems to be limiting my logfile upload size. Let me know an email where I can send it.

Regarding checking DB integrity at startup: good idea, but it may not fix this bug, since the database IS valid when the gnome-session trackerd starts.

Jamie McCracken (jamiemcc-blueyonder) wrote :

Ok I think it's because gnome-session is force killing trackerd instead of waiting for it to exit gracefully

see man page for gnome-session

you can set --suicide-delay to 0 to prevent it killing trackerd prematurely, which is probably why it's corrupting (especially on reiser) - you will need to make the change in your x login scripts (default is 10 secs btw but that should be long enough to exit)
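The difference between waiting for a graceful exit and force killing can be sketched as follows. This is a generic illustration using signals, not gnome-session's implementation. stop_process and its delay parameter are hypothetical, with delay=None standing in for "wait indefinitely".

```python
import subprocess


def stop_process(proc, delay=10.0):
    """Ask proc to exit, then force-kill it if it has not exited after delay.

    delay=None waits indefinitely and never force-kills.
    Returns True if the process exited on its own, False if it was killed.
    """
    proc.terminate()                 # SIGTERM: the process may clean up
    try:
        proc.wait(timeout=delay)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()                  # SIGKILL: no chance to flush or close files
        proc.wait()
        return False
```

A process killed on the second path gets no opportunity to close its database files, which is the scenario suspected here.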

can you test with that and get back to me?

There is nothing else in gnome-session that could possibly be causing the corruption AFAICT

Pausanias (pausanias) wrote :

I think you may be right about gnome-session doing something weird to trackerd when it quits. But I'm not sure if killing trackerd prematurely is what's doing it.

For the past few days I've been running trackerd manually, and it behaves fine, no problems whatsoever. This is even though it's being killed by the system shutdown process (I don't bother to shut it down manually).

Then I decided to retry the gnome-session trackerd. Here's what I did:

1) Kill manually started trackerd with pkill trackerd.
2) Reactivate trackerd in gnome-session.
3) Log out and log back in.
4) Trackerd works fine, returns idle.
5) Log out and log back in
6) tracker-status first says it can't access the tracker daemon. Then it says this:
[Invalid UTF-8] Tracker daemon's status is \x81\xc3\x8b\x1c.
7) trackerd runs haywire, reindexing everything ad infinitum.

So, obviously, something really screwy happens to trackerd when gnome-session kills it, because when I kill it via shutdown or via pkill trackerd nothing bad ever happens.

Pausanias (pausanias) wrote :

Marking as duplicate of the 155244 bug as they are clearly the same.
