Spin-coupled DOS and PDOS do not work in parallel

Bug #1718162 reported by Alberto Garcia
Affects: Siesta (status tracked in Trunk)

  Series   Status         Importance   Assigned to
  4.1      Fix Released   High         Nick Papior
  Trunk    Fix Released   High         Nick Papior

Bug Description

(This is a re-opening of bug #1645749, which was only partially fixed --- in particular, the parallel operation issue was not solved. Thanks to Roberto Robles and Ramón Cuadrado.)

The calculation of DOS and PDOS only works in serial. In parallel they are simply not calculated, as enforced in subroutine init_projected_DOS of projected_DOS.F. The relevant routines (pdos2g, pdos2k, pdos3g, pdos3k) do seem to take a parallel run into account, but in their current form they give wrong results. For example, pdos3g gives correct results for the first two components, but wrong results for the third and fourth (tested with the Pt2 example with the magnetization oriented along x or y). It would be nice to have everything working in parallel: for example, a finer k-point grid is sometimes needed for a smooth DOS, and the calculations can take quite some time.

This affects all versions, and the new full-SOC branch.
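
For context, a (P)DOS calculation of this kind is requested with input blocks along the following lines. This is only an illustrative sketch (energy window, broadening, number of points and grid values are made up); the block names follow the Siesta manual:

  %block ProjectedDensityOfStates
    -20.00  10.00  0.200  500  eV   # Emin  Emax  broadening  Npoints  units
  %endblock ProjectedDensityOfStates

  %block PDOS.kgrid_Monkhorst_Pack  # finer k-grid used only for the (P)DOS
    12  0  0   0.0
     0 12  0   0.0
     0  0 12   0.0
  %endblock PDOS.kgrid_Monkhorst_Pack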

Revision history for this message
Nick Papior (nickpapior) wrote :

I am wondering whether this is due to line 177 in pdos3g?

LocalToGlobalOrb is called with iband = 1:nuo*2; however, that is the band index, not the orbital index.
I.e., it should probably be something like:

LocalToGlobalOrb((iband+1) / 2, ...,...,ibandg)
ibandg = (ibandg-1)*2 + mod(iband, 2)

Just a guess?
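
To make the bookkeeping concrete, here is a minimal, self-contained Fortran sketch of the spinor index mapping this guess points at, written so that the two spinor components of each orbital land on consecutive global entries. The local_to_global stub is hypothetical and merely stands in for Siesta's LocalToGlobalOrb:

  ! Sketch: map a local spin-box index (1..2*nuo) to a global one.
  program spinor_index_map
    implicit none
    integer, parameter :: nuo = 4      ! local orbitals on this node (assumed)
    integer :: iband, iorb, ispin, iorbg, ibandg

    do iband = 1, 2*nuo                ! spin-box index: 2 entries per orbital
      iorb   = (iband + 1) / 2         ! local ORBITAL index
      ispin  = mod(iband - 1, 2)       ! 0/1: spinor component
      iorbg  = local_to_global(iorb)   ! map the orbital, not the band
      ibandg = (iorbg - 1)*2 + ispin + 1
      print '(4(a,i4))', ' iband=', iband, ' iorb=', iorb, &
            ' iorbg=', iorbg, ' ibandg=', ibandg
    end do

  contains

    integer function local_to_global(il)
      integer, intent(in) :: il
      ! Hypothetical identity map; in Siesta, LocalToGlobalOrb resolves
      ! the actual block-cyclic orbital distribution across nodes.
      local_to_global = il
    end function local_to_global

  end program spinor_index_map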

Revision history for this message
Nick Papior (nickpapior) wrote :

Here I attach a patch that _could_ fix this (only for pdos3g). (It should be - and not +.)

Revision history for this message
Ramon Cuadrado (ramon-cuadrado) wrote : Re: [Bug 1718162] Spin-coupled DOS and PDOS does not work in parallel

Hi Nick,

   It seems that it works for gamma point (diag3g.F) but still does not work for diag3k.F

Cheers,
Ramón

Revision history for this message
Ramon Cuadrado (ramon-cuadrado) wrote :

Sorry, I meant pdos3g.F and pdos3k.F.

R.

Revision history for this message
Nick Papior (nickpapior) wrote :

Well, that is good. I will submit a patch with all the non-collinear and spin-orbit routines fixed.

My patch was only for the gamma case anyway ;)

Revision history for this message
Ramon Cuadrado (ramon-cuadrado) wrote :

Thanks very much, Nick!

Ramón

Revision history for this message
Nick Papior (nickpapior) wrote :

I have just pushed a fix for 4.1 and trunk.

Since you are the experts, could you please check:

- non-collinear (Gamma)
- non-collinear (k)
- spin-orbit (Gamma)
- spin-orbit (k)

If they all show the correct results for parallel runs, we will mark it as solved (a sketch of the four input cases follows below).
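
For concreteness, the four cases map onto 4.1-style input along these lines (illustrative sketch; the "Spin" values follow the 4.1 manual, and the Gamma-only cases are obtained simply by omitting the k-grid block):

  Spin  non-collinear             # cases 1-2; use "Spin spin-orbit" for 3-4

  %block kgrid_Monkhorst_Pack     # present only in the k-point cases
    4 0 0   0.0
    0 4 0   0.0
    0 0 4   0.0
  %endblock kgrid_Monkhorst_Pack

Running the same input in serial and under MPI and comparing the resulting .PDOS files should then show whether the fix holds.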

Changed in siesta:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Nick Papior (nickpapior)
Revision history for this message
Roberto Robles (roberto-robles) wrote :

I have tried to check, but the current version stops with a segmentation fault before reaching the PDOS routines. I tested with SOC_Pt2_zz and SOC_FePt_zz, in serial and in parallel. This is weird, because I guess the full set of tests is checked before a new version is pushed...

Revision history for this message
Nick Papior (nickpapior) wrote :

All tests were checked before the release, so since 4.1-b3 only some of the tests have been run.

However, I can run SOC_Pt_xx with 2 cores without problems.
Perhaps you could compile with debug options to figure out the problem; at this moment I am not able to run with more than 2 cores.

Revision history for this message
Nick Papior (nickpapior) wrote :

Oh, I have no PDOS flags... :(
If you give me a test, I can run it with 2 cores, but more cores will have to wait until next week from my side.

Revision history for this message
Roberto Robles (roberto-robles) wrote :

It doesn't work for me even in serial and without any PDOS flags. I suspect some ifort issue...

Revision history for this message
Nick Papior (nickpapior) wrote :

Hmm.. Still, doing the run with debug flags might be good. Could you do that?

Revision history for this message
Roberto Robles (roberto-robles) wrote :

I was on it. I have added -g (any other relevant debug options?), but it doesn't give me much information. I'm attaching the output.

I have seen that lowering the optimization to -O1 corrects the problem, so ifort is doing something fishy in some subroutine. I will try to track it down next week.

Revision history for this message
Roberto Robles (roberto-robles) wrote :

I have compiled with -traceback and I get the following error:

-------------
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image     PC                Routine            Line  Source
siesta    0000000000630EA2  m_state_analysis_  198   state_analysis.F

Stack trace terminated abnormally.
-------------

If state_analysis.F is compiled with -O1 the program finishes without error.

Revision history for this message
Nick Papior (nickpapior) wrote :

Which source? I don't have 198 lines of code in 4.1/Src/state_analysis, nor in trunk/Src/state_analysis.

Revision history for this message
Roberto Robles (roberto-robles) wrote :

siesta-4.1--767

Revision history for this message
Roberto Robles (roberto-robles) wrote :

Sorry, that included some comments I added. It is the line where the subroutine ends (not the module). It should be line 193.

Revision history for this message
Nick Papior (nickpapior) wrote :

OK, so basically ifort is complaining about some re-ordering it has done.

Do you also have -g in the compilation flags?

Secondly, if this is the only error message ifort produces, I am in no position to check what causes this (my guess is yet another compiler bug at high optimization levels with Intel, the same as e.g. atom.F). I can compile at a very high optimization level with GNU Fortran with no problems, and I also made a pedantic syntax check using gfortran (-fsyntax-only -pedantic) with no errors.

For the record, which Intel version are you using?

Lastly, have you checked the results, i.e. does the commit fix the PDOS calculations?

Revision history for this message
Roberto Robles (roberto-robles) wrote :

-g is also included as a compilation option.

I am using ifort version 13.1.0.146. I don't have enough knowledge to do this kind of debugging. I guess the poor man's patch would be to add state_analysis to intel.make, as for atom.F.

Regarding the fix: the gamma version is fixed for non-collinear and SOC. The k-point version is NOT fixed.

Revision history for this message
Nick Papior (nickpapior) wrote :

I have just gone through the code, and indeed I found some inconsistencies in pdos[23]k.

Could you try the attached patch and rerun (all NC and SO cases, for Gamma and k-points)?

I have also bypassed a possible stack fault when compiling with Intel compilers.

Revision history for this message
Nick Papior (nickpapior) wrote :

Try this one instead (small change).

Revision history for this message
Roberto Robles (roberto-robles) wrote :

It seems to work. I have tried a non-collinear and an SOC calculation with k-points, and they give the same result in serial and in parallel.

If you push it, an extra change has to be made in projected_DOS.F, removing lines 51 to 63, to allow the calculation of PDOS in parallel.

As discussed before, we should also add

# compile state_analysis.F with the debug (low-optimization) flags to
# work around the ifort optimizer problem
state_analysis.o: state_analysis.F
        $(FC) -c $(FFLAGS_DEBUG) $(INCFLAGS) $(FPPFLAGS) $(FPPFLAGS_fixed_F) $<

to intel.make

Thanks!

Revision history for this message
Nick Papior (nickpapior) wrote :

Thanks, it has been committed now, with a note of the fix.

Revision history for this message
xiaoxiong (wxx2018) wrote :

Hello,
I compiled Siesta 4.1-b3 with ifort 13 after applying the patches, but when I run the band structure of graphene with SOC, I still get the severe (174) error once the SCF is done and the non-SCF band calculation starts. Attached is my makefile. Can gfortran avoid this issue? Could anyone give me some suggestions? Thanks in advance.

Revision history for this message
Nick Papior (nickpapior) wrote :

Could you please try the latest 4.1 commit (download here: https://bazaar.launchpad.net/~siesta-maint/siesta/rel-4.1/tarball/969).

Also, check out Obj/intel.make and copy the relevant Intel-specific options to your arch.make. You seem to have the state_analysis rule, but not the one for atom.
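
For reference, the per-file overrides being discussed would look like this in an Intel arch.make (the state_analysis rule is the one quoted earlier in this thread; the atom.F rule is assumed to follow the same pattern, as in Obj/intel.make):

  # build these two files with the debug (low-optimization) flags to
  # work around ifort optimizer bugs
  atom.o: atom.F
          $(FC) -c $(FFLAGS_DEBUG) $(INCFLAGS) $(FPPFLAGS) $(FPPFLAGS_fixed_F) $<

  state_analysis.o: state_analysis.F
          $(FC) -c $(FFLAGS_DEBUG) $(INCFLAGS) $(FPPFLAGS) $(FPPFLAGS_fixed_F) $<

(In a real makefile the recipe lines must start with a tab.)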

Revision history for this message
xiaoxiong (wxx2018) wrote :

Hi Nick Papior,
Thanks for your help.
I tried the latest version you provided. When I run graphene with SOC, or the SOC_Pt2_zz test, I get the same error: '[completed_work] Error 7'.
