torque-server init script fails during installation and removal

Bug #223649 reported by Morten Kjeldgaard
Affects            Status         Importance   Assigned to          Milestone
torque (Ubuntu)    Fix Released   Undecided    Morten Kjeldgaard
  Hardy            Won't Fix      Undecided    Morten Kjeldgaard
  Intrepid         Invalid        Undecided    Morten Kjeldgaard
  Jaunty           Fix Released   Undecided    Morten Kjeldgaard

Bug Description

When installing the torque-server package, the installation fails when the init script is being executed. The installer complains about missing directories.

When removing the package, apt-get also fails, because the init script fails. This can only be fixed by editing the torque-server init script, so it always returns 0.

Morten Kjeldgaard (mok0) wrote :

The attached patch fixes this problem, plus a couple of other packaging issues that were addressed after the initial upload to the NEW queue.

motu-sru, please allow an SRU for hardy for the torque package.

Morten Kjeldgaard (mok0) wrote :

Patch incorporating modifications following a discussion with Luca Falavigna on IRC.

Luca Falavigna (dktrkranz) wrote :

ACK from motu-sru.
Please do not ship the linda override in the upload; it is just a cosmetic change.
Also, provide a good TEST CASE to speed up verification process. Thanks!

Changed in torque:
status: New → Confirmed
Martin Pitt (pitti) wrote :

Why did you remove the pid file check? There is no rationale for this in the changelog, nor a separate bug report about this. PID files are a good thing to not break processes in chroots, or processes started as different users, etc.

Changed in torque:
status: New → Incomplete
Morten Kjeldgaard (mok0) wrote : Re: [Bug 223649] Re: torque-server init script fails during installation and removal

> Why did you remove the pid file check? There is no rationale for this in
> the changelog, nor a separate bug report about this. PID files are a
> good thing to not break processes in chroots, or processes started as
> different users, etc.

Yes, that's how I thought the --pidfile option worked too. However,
the purpose of this option is to specify a PID file created by the
server process itself, and because pbs_server does not do that, it is
not relevant in this case.

Cheers,
Morten

Martin Pitt (pitti) wrote :

Accepted into -proposed, please test and give feedback here

Changed in torque:
status: Incomplete → Fix Committed
Morten Kjeldgaard (mok0) wrote :

jd, please test the updated package in hardy-proposed and report back here!

Cheers,
Morten

Morten Kjeldgaard (mok0)
Changed in torque:
status: Confirmed → Fix Committed
Ted Cabeen (ted-cabeen) wrote :

Unfortunately, the --pidfile options are still present in this release, and they cause the initscript to break.

Also, /var/lib/torque/mom_priv/jobs, /var/lib/torque/spool and /var/lib/torque/undelivered need to be added to the directories shipped by torque-mom (/var/lib/torque/spool and /var/lib/torque/undelivered need mode 1777).
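
A minimal sketch of those node-side steps as a shell function, assuming the /var/lib/torque layout used throughout this report. The `prepare_mom_dirs` name and its optional root-prefix argument are hypothetical (the prefix only exists so the steps can be tried in a scratch directory), and the empty mom_priv/config file reflects Ted's follow-up that pbs_mom needs one to be present. This is a manual workaround, not the packaged fix.

```sh
#!/bin/sh
# Hypothetical helper: prepare the directories torque-mom needs on a compute node.
# $1 is an optional root prefix (default "/"), used only to try this out safely.
prepare_mom_dirs() {
    base="${1%/}/var/lib/torque"

    # Private pbs_mom state.
    mkdir -p "$base/mom_priv/jobs" || return 1

    # spool and undelivered must be world-writable with the sticky bit (mode 1777).
    for d in spool undelivered; do
        mkdir -p "$base/$d" || return 1
        chmod 1777 "$base/$d" || return 1
    done

    # pbs_mom reportedly will not run without this file; an empty one is enough.
    [ -e "$base/mom_priv/config" ] || : > "$base/mom_priv/config"
}
```

Run as root (e.g. `prepare_mom_dirs /`) before installing or reconfiguring torque-mom.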

Ted Cabeen (ted-cabeen) wrote :

Looking in more detail, they are gone from the server init script, but not from the mom and scheduler ones.

Also, the mom won't run properly unless it has a config file in /var/lib/torque/mom_priv/config. It can be empty, but it has to be present.

Finally, the postinst script should create the database, if it doesn't exist yet. Here's a sample script to do it:
/usr/sbin/pbs_server -t create && sleep 2 && /usr/bin/qterm
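
The same one-liner could be wrapped so that a postinst runs it only when no database exists yet, which also keeps upgrades from touching an existing configuration. Assumptions, not taken from the package: the database file is server_priv/serverdb under /var/lib/torque, and the PBS_SERVER, QTERM and TORQUE_HOME variables are hypothetical overrides.

```sh
#!/bin/sh
# Sketch of a postinst step: create the pbs_server database only if it is absent.
# Assumption: the database is $TORQUE_HOME/server_priv/serverdb.
# PBS_SERVER, QTERM and TORQUE_HOME are hypothetical overrides (e.g. for testing).
PBS_SERVER=${PBS_SERVER:-/usr/sbin/pbs_server}
QTERM=${QTERM:-/usr/bin/qterm}
TORQUE_HOME=${TORQUE_HOME:-/var/lib/torque}

create_server_db() {
    if [ -e "$TORQUE_HOME/server_priv/serverdb" ]; then
        return 0    # never overwrite an existing server configuration
    fi
    # Same sequence as the one-liner: create, give the server time to settle, shut it down.
    # $PBS_SERVER and $QTERM are left unquoted on purpose so they may carry arguments.
    $PBS_SERVER -t create && sleep 2 && $QTERM
}
```

In a postinst this would normally only run on configure, e.g. `[ "$1" = configure ] && create_server_db`.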

Morten Kjeldgaard (mok0) wrote :

Martin, please release the version in hardy-proposed into hardy-updates.

jd (jeff-dyke) wrote :

It's taken me a while to get back to this, but I'm having similar issues with the package from -proposed. I've acted on each of Ted's comments: removing the --pidfile arguments, creating the database, running `touch /var/lib/torque/mom_priv/config` and changing permissions to 1777 on both /var/lib/torque/spool and /var/lib/torque/undelivered.

First I simply tried to install the -proposed version and it failed, so I made the changes above and tried reinstalling, and it still fails. It does seem, though, that I can start each of the services (mom, scheduler and server) and see them running in ps -ef. And since I'm brand new to this world, I wanted to install torque-gui, which seems to install, but I can't find where to launch it from (which is a side question).

In conclusion, I'd say the -proposed package does not fix the bug at hand, even though I *may* have a working torque server.

If there is anything else I can provide, please ask; I'd really like to use this.

All crash files are attached.

jd (jeff-dyke) wrote :

After changing /var/lib/torque/server_name from torqueserver to the name of my machine, I am able to run qmgr and configure the server, as well as successfully configure nodes and have them display as 'free' via pbsnodes -a.

So it looks like I do have a working server and clients.
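
A sketch of that server_name fix, assuming the machine's own node name (as printed by `uname -n`) is the name the nodes should use; substitute the server's DNS name if it differs. The `set_server_name` helper and its root-prefix argument are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: replace the packaged "torqueserver" placeholder in
# /var/lib/torque/server_name with this machine's name (or an explicit one).
# $1: optional root prefix (default "/"); $2: optional server name.
set_server_name() {
    file="${1%/}/var/lib/torque/server_name"
    name=${2:-$(uname -n)}
    [ -n "$name" ] || return 1
    printf '%s\n' "$name" > "$file"
}
```

As root, `set_server_name /` and then retry qmgr and `pbsnodes -a` (restarting the torque services first may be needed).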

Morten Kjeldgaard (mok0) wrote :

Thanks for the very useful comments. I will fix these issues and make another upload shortly.

Changed in torque:
assignee: nobody → mok0
status: Fix Committed → In Progress
assignee: nobody → mok0
status: Fix Committed → In Progress
Giovanni Novelli (giovanni-novelli) wrote :

I looked here for a solution to the problem related to this bug, which in short is simply removing torque (especially torque-mom, torque-scheduler and torque-server). The working solution is the one stated in the first post, "This can only be fixed by editing the torque-server init script, so it always returns 0.", which translates into putting a line with "exit 0" near the top of /etc/init.d/torque-mom, /etc/init.d/torque-scheduler and /etc/init.d/torque-server.

Darren Faulke (darren-alidaf) wrote :

I can't get any of this to work, and I would like to just remove it by hand, as it interferes with just about everything I try to install now. apt-get remove doesn't work, so can anyone offer some hints on how to get rid of it? Ta

Ted Cabeen (ted-cabeen) wrote :

alidaf: To remove the packages, add the line "exit 0" near the top (line 3 or so is fine) of each of the torque scripts in /etc/init.d, as Giovanni describes above. Then you can remove/purge the packages with dpkg.
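
That workaround can be scripted along these lines. This is only a sketch, assuming GNU sed (for its one-line `1a text` form); `neuter_init_script` is a hypothetical name. It makes the script exit immediately, so it is only meant for getting a broken installation removed.

```sh
#!/bin/sh
# Hypothetical helper: insert "exit 0" as line 2 of an init script (right after
# the #! line), so every action "succeeds" and dpkg can remove the package.
# This disables the script entirely; use it only to get rid of a broken install.
neuter_init_script() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "no such file: $script" >&2
        return 1
    fi
    sed -i '1a exit 0' "$script"    # GNU sed: append a line after line 1
}

# Usage (as root):
#   for s in torque-mom torque-scheduler torque-server; do
#       neuter_init_script "/etc/init.d/$s"
#   done
#   dpkg --purge torque-mom torque-scheduler torque-server
```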

Sebastian Kapfer (caci) wrote :

Ping? This package is a catastrophe.

Ted Cabeen (ted-cabeen) wrote :

It is pretty bad, but once you have it installed, it does work properly. What problems are you having?

Christian Hudon (chrish) wrote :

I still get this error with the version in hardy-proposed:

Setting up torque-server (2.1.8+dfsg-0ubuntu1.1) ...
 * Starting Torque batch queue server: PBS_Server: No such file or directory (2) in chk_file_sec, Security violation with "/var/lib/torque/spool/"
PBS_Server: No such file or directory (2) in chk_file_sec, Security violation with "/var/lib/torque/pbs_environment"
PBS_Server: PBS_Server, pbsd_init failed
invoke-rc.d: initscript torque-server, action "start" failed.
dpkg: error processing torque-server (--configure):
 subprocess post-installation script returned error exit status 3
Errors were encountered while processing:
 torque-server

A fix would really be appreciated. This makes the torque packages unusable.

Ted Cabeen (ted-cabeen) wrote :

If you want to fix it yourself, just create the following directories before you install the package:
/var/lib/torque
/var/lib/torque/server_priv
/var/lib/torque/server_priv/jobs
/var/lib/torque/server_priv/queues
/var/lib/torque/server_priv/accounting

On clients, you need the following:
/var/lib/torque/mom_priv
/var/lib/torque/mom_priv/jobs
/var/lib/torque/spool (with permissions 1777)
/var/lib/torque/undelivered (with permissions 1777)

Also, you'll need a /var/lib/torque/server_name file on your servers and clients listing the torque server DNS name.

That's everything I have listed in my puppet config. Ideally all of these should be put in the package.
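
The server-side part of that list as a shell sketch, using the directory names given above. The function name and root-prefix argument are hypothetical, and permissions are simply whatever mkdir and the current umask produce.

```sh
#!/bin/sh
# Hypothetical helper: create the server-side TORQUE directories before
# installing torque-server. $1 is an optional root prefix (default "/").
prepare_server_dirs() {
    priv="${1%/}/var/lib/torque/server_priv"
    for d in jobs queues accounting; do
        mkdir -p "$priv/$d" || return 1    # -p: no error if it already exists
    done
}
```

Because of `mkdir -p`, running it again on an already prepared server is harmless.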

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package torque - 2.1.8+dfsg-0ubuntu3

---------------
torque (2.1.8+dfsg-0ubuntu3) intrepid; urgency=low

  * Build for intrepid.

torque (2.1.8+dfsg-0ubuntu1.2) hardy-proposed; urgency=low

  * Fix remaining issues with init scripts (LP: #223649)
  * Add missing empty directories to package torque-client

 -- Morten Kjeldgaard <email address hidden> Wed, 27 Aug 2008 21:48:57 +0200

Changed in torque:
status: In Progress → Fix Released
Martin Pitt (pitti) wrote :

Accepted into -proposed, please test and give feedback here. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in torque:
status: In Progress → Fix Committed
Christian Hudon (chrish) wrote :

The latest torque packages in hardy-proposed work for me.

Thanks!

Christian Hudon (chrish) wrote :

There's one small remaining problem with the torque-server init script: it's not idempotent. I found this out when running a "dpkg --configure torque-server" (to finish configuring the package) while the server was running.

I wasn't sure what was best, so I filed a separate bug report about this: https://bugs.launchpad.net/ubuntu/+source/torque/+bug/270574

Christian Hudon (chrish) wrote :

There's one issue with the packages in hardy-proposed. The following directories:

/var/lib/torque/mom_priv
/var/lib/torque/mom_priv/jobs

should be in the torque-mom package, not in the torque-client one. As it is, torque-mom is broken unless torque-client is also installed on the same machine (which is not always the case).

Christian Hudon (chrish) wrote :

Another issue: the /var/lib/torque/spool and /var/lib/torque/undelivered directories are not created with mode 1777 (see comment 21), which makes submitted jobs fail when torque tries to run them on the compute node.

Christian Hudon (chrish) wrote :

Yet another reason why the hardy-proposed package is not ready for prime time: the directory /var/lib/torque/server_priv/acl_svr is missing from the torque-server package. Without this directory, the operators and managers settings of the torque-server are not preserved across executions.

Christian Hudon (chrish) wrote :

Three other directories are missing (all in /var/lib/torque/server_priv):

acl_hosts
acl_users
acl_groups

Luca Falavigna (dktrkranz) wrote :

Marking verification-failed for now, this will probably require a new fix for Intrepid too.

Martin Pitt (pitti) wrote :

Reopening for all releases, since this still isn't fixed. I removed the SRU from hardy-proposed, since it failed verification.

Changed in torque:
status: Fix Committed → Confirmed
status: Fix Released → Confirmed
status: Fix Released → Confirmed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package torque - 2.3.6+dfsg-0ubuntu1

---------------
torque (2.3.6+dfsg-0ubuntu1) jaunty; urgency=low

  * New upstream release (LP: #235385).
  * Torque-server init script fails during installation and
    removal (LP: #223649) fixed, "set -e" removed from init script.
  * Package torque-scheduler 2.1.8+dfsg-0ubuntu1.1 failed to install
    (LP: #244440 LP: #270574) fixed, "set -e" removed from init script.
  * /etc/init.d/torque-mom not idempotent, and stop doesn't work
    (LP: #256998) Fixed, "set -e" removed from init script.
  * Torque-scheduler prints errors during package configuration
    (LP: #270653). Reason: missing dir sched_config. Fixed,
    package installs FIFO scheduler config file in
    /var/lib/torque/sched_priv/.
  * Package torque-mom 2.1.8+dfsg-0ubuntu1 failed to
    install/upgrade (LP: #276575 LP: #291674). Reason: missing directories.
    Fixed, /var/lib/torque/server_priv/jobs/,
    /var/lib/torque/server_priv/queues/,
    /var/lib/torque/server_priv/accounting/ and
    /var/lib/torque/mom_priv/jobs/ are now installed in their respective
    packages.
  * Package torque-gui missing most of the files it needs to run!
    (LP: #281360). Reason: missing *.tk etc. files from src/gui
    Fixed: /usr/lib/xpbs now shipped in package
  * changed patch system to quilt.

 -- Morten Kjeldgaard <email address hidden> Mon, 16 Feb 2009 17:32:28 +0100
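
Why removing "set -e" matters here: with `set -e` the shell exits at the first failing command, so a single failed step (such as pbs_server refusing to start, as in the log above) makes the whole init script exit non-zero; invoke-rc.d then reports the failure and dpkg aborts the configure or remove. A toy comparison, not code from the package:

```sh
#!/bin/sh
# Toy comparison of the same commands with and without "set -e".
# "false" stands in for a start-up step that fails (e.g. a missing directory).
run_with_set_e() {
    sh -c 'set -e; false; echo "reached the end"; exit 0'
}

run_without_set_e() {
    sh -c 'false; echo "reached the end"; exit 0'
}

run_with_set_e;    echo "with set -e:    exit status $?"
run_without_set_e; echo "without set -e: exit status $?"
```

The trade-off is that, without `set -e`, the script has to check explicitly the commands whose failure really should be reported.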

Changed in torque:
status: Confirmed → Fix Released
UweBrauer (oub) wrote :

Hello

Since I have a lot of problems with Kubuntu releases > 8.04, I want to stick with 8.04 for the moment. Where can I find a torque-server deb that works?

I tried to read all the posts, and it seems there has been a release in hardy-proposed, but I can't find it in updates.

thanks

Uwe Brauer

Alex Valavanis (valavanisalex) wrote :

Intrepid Ibex reached end-of-life on 30 April 2010 so I am closing the report. The bug has been fixed in newer releases of Ubuntu.

Changed in torque (Ubuntu Intrepid):
status: Confirmed → Invalid
Rolf Leggewie (r0lf) wrote :

Hardy has seen the end of its life and is no longer receiving any updates. Marking the Hardy task for this ticket as "Won't Fix".

Changed in torque (Ubuntu Hardy):
status: Confirmed → Won't Fix