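This page is no longer being updated.

For the latest information, please see the TSUBAME3.0 computing service web page.

A guide to migrating data from TSUBAME2.5 to TSUBAME3.0 is also available.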

Can I speed up my program by using TSUBAME?

Yes, you can. For example:

0. Check the baseline performance
  > gcc -c wclock.c
  > gfortran -o sample2  sample2.f wclock.o
  > ./sample2
    TIME=    4.4802789688110352        1586700028.4671783     
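
The source does not show wclock.c. As a minimal sketch, assuming it is a small C wall-clock helper that the Fortran program calls before and after the section being timed, it could look like the following. The names `wclock_seconds` and `wclock_`, and the by-reference argument, are assumptions for illustration and follow the usual gfortran convention for calling C.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical wall-clock helper (the real wclock.c is not shown).
 * Returns the current time in seconds with microsecond resolution. */
double wclock_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
}

/* Fortran-callable wrapper. gfortran typically appends an underscore to
 * external names and passes arguments by reference (an assumption here),
 * so "call wclock(t)" in Fortran would resolve to this symbol. */
void wclock_(double *t)
{
    *t = wclock_seconds();
}
```

The elapsed time is then the difference between two readings taken around the loop of interest.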

1. Add optimization options
  > gfortran -O3  -o sample2  sample2.f wclock.o
  > ./sample2
    TIME=    2.4241511304862797        1586700028.4671783     

  For compile options, please refer to チューニング講習会 [Tuning Workshop], p. 11 onward,
  or to the individual manuals provided with TSUBAME.
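
To compare runs, it helps to express each timing as a speedup over the baseline. The helper below is only an illustrative sketch; the function name `speedup` is hypothetical.

```c
/* Speedup of an optimized run relative to a baseline run
 * (both times in seconds). Values above 1 mean the optimized
 * run is faster. */
double speedup(double t_base, double t_opt)
{
    return t_base / t_opt;
}
```

With the timings shown above, `speedup(4.4802789688, 2.4241511305)` gives about 1.85, so `-O3` nearly halves the run time for this sample.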

2. Use a commercial compiler
  You can use the Intel compiler or the PGI compiler.
  (a) Intel compiler
    > ifort -fast  -o sample2  sample2.f wclock.o
    > ./sample2
      TIME=    2.47723984718323        1586700028.46718     
  (b) PGI compiler
    > pgf95 -fastsse  -o sample2  sample2.f wclock.o
    > ./sample2
      TIME=     2.340852022171021         1586700028.467178     
    Warning: ieee_inexact is signaling
    FORTRAN STOP
  ※ Simply switching to the fast-optimization options often gives a reasonable speedup.
     (Be sure to check that the results do not change.)
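
One way to check that the result has not changed is to compare it against the baseline value with a relative tolerance, since aggressive optimization can reorder floating-point operations and alter the last digits. The sketch below assumes that the second number on each TIME= line is the program's computed result (the source does not say so explicitly). The function names and the tolerance are illustrative choices.

```c
/* Absolute value without pulling in the math library. */
static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/* Returns 1 if val agrees with ref to within the relative tolerance
 * rel_tol, and 0 otherwise. When ref is (almost) zero the tolerance is
 * applied as an absolute bound instead. */
int results_match(double ref, double val, double rel_tol)
{
    double scale = abs_d(ref);
    if (scale < 1.0e-300) {
        return abs_d(val) <= rel_tol;
    }
    return abs_d(val - ref) <= rel_tol * scale;
}
```

For instance, `results_match(1586700028.4671783, 1586700028.467178, 1.0e-12)` returns 1: the gfortran and PGI outputs above agree to the printed precision.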

3. Automatic parallelization
  Use the automatic parallelization feature of a commercial compiler.
  (チューニング講習会 [Tuning Workshop] / Parallelization, p. 84)
  > pgf95 -Mconcur -fastsse -Minfo=all  -o sample2  sample2.f wclock.o
  > ./sample2
    TIME=     2.470033884048462         1586700028.467178 
  Warning: ieee_inexact is signaling
  FORTRAN STOP
  > export OMP_NUM_THREADS=2   (set the number of threads)
  > ./sample2
    TIME=     1.312394857406616         1586700028.467178
  Warning: ieee_inexact is signaling
  FORTRAN STOP
  > ifort -parallel -fast  -o sample2  sample2.f wclock.o
  > ./sample2
    TIME=    1.59019112586975        1586700028.46718
  ※ Depending on the program, automatic parallelization may actually make it slower.
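
Whether automatic parallelization helps depends largely on whether the loop iterations are independent. The sketch below, written in C rather than the Fortran of the sample programs, contrasts the two cases. The function names are hypothetical, and which loops a given compiler actually parallelizes is best confirmed from its diagnostics (the pgf95 command above uses -Minfo=all for this).

```c
#include <stddef.h>

/* Independent iterations: y[i] depends only on x[i] and y[i], so a
 * compiler can divide the loop among threads. The restrict qualifiers
 * promise that x and y do not overlap, which helps the compiler prove
 * that the iterations are independent. */
void scale_add(size_t n, double a, const double *restrict x,
               double *restrict y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Loop-carried dependence: iteration i reads the value written by
 * iteration i - 1, so the loop cannot simply be split among threads. */
void running_sum(size_t n, double *x)
{
    for (size_t i = 1; i < n; i++) {
        x[i] = x[i] + x[i - 1];
    }
}
```

Even for a loop like `scale_add`, the cost of starting and synchronizing threads can outweigh the gain when the loop is short, which is one reason a parallelized build may run slower.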

4. Modify the program
  Rewrite the loops in the source, for example by unrolling them.
  (チューニング講習会 [Tuning Workshop] / Performance Tuning, p. 36)
  > pgf95 -fastsse  -o sample2  sample2a.f wclock.o
  > ./sample2
    TIME=     1.697903871536255         1586700028.467178     
  Warning: ieee_inexact is signaling
  FORTRAN STOP
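
The source does not show what was changed in sample2a.f. As an illustration of loop unrolling in general, here is a minimal C sketch of a summation loop unrolled by four. The function names are hypothetical, and the actual modification in sample2a.f may differ.

```c
#include <stddef.h>

/* Straightforward summation loop. */
double sum_simple(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* The same loop unrolled by four with independent partial sums.
 * Because the additions are reordered, the floating-point result can
 * differ from sum_simple in the last digits. */
double sum_unrolled4(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 3 < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++) {
        s0 += a[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```

The four partial sums remove the dependence of each addition on the previous one, which gives the processor more independent work per iteration. The reordering is also why the result should be re-checked after such a change.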

5. Use the GPU
  Add directive lines to the program.
  (チューニング講習会 [Tuning Workshop] / Performance Tuning, p. 68)
  > pgf95 -fastsse -Minfo=all -ta=nvidia  -o sample2  sample2k.f wclock.o
  > ./sample2
    TIME=    0.3683249950408936         1586700028.467178     
  Warning: ieee_inexact is signaling
  FORTRAN STOP
  Please also refer to: GPUコンピューティング研究会 [GPU Computing Study Group]
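
The directives used in sample2k.f are not shown in the source. As a hedged illustration of directive-based GPU offloading, the C sketch below uses an OpenACC directive. This is an assumption: the directive set actually used in sample2k.f may be different, and the function name `saxpy_acc` is hypothetical.

```c
/* y = a * x + y, offloaded with an OpenACC directive. The copyin and
 * copy clauses describe which array sections are transferred to and
 * from the device. A compiler without OpenACC enabled ignores the
 * pragma and runs the loop on the CPU, producing the same result. */
void saxpy_acc(int n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body is unchanged. Only the directive line is added, which is why this approach needs few source modifications compared with rewriting the code for the GPU by hand.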

6. Parallelize the program
  You can parallelize the program with MPI or with directives.
  (チューニング講習会 [Tuning Workshop] / Parallelization, p. 71 onward)
  > mpif90 -o sample2  sample2m.f wclock.o
  > mpirun -np 1 ./sample2
    TIME=    2.66910314559937        1586700028.46718     
  > mpirun -np 2 ./sample2
    TIME=    1.28998279571533        1586700028.46718     
  > mpirun -np 4 ./sample2
    TIME=   0.838958024978638        1586700028.46718     
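
The source does not show sample2m.f. A common pattern in MPI programs is to give each process a contiguous block of the loop range and combine the partial results at the end. The C sketch below shows only the decomposition and the local work, without calling MPI, so it compiles without an MPI library. The function names are hypothetical. In a real MPI program, `rank` and `size` would come from `MPI_Comm_rank` and `MPI_Comm_size`, and the partial sums would be combined with `MPI_Reduce`.

```c
/* Split the index range [0, n) into `size` nearly equal blocks and
 * return the half-open range [*lo, *hi) owned by `rank`. The first
 * n % size ranks receive one extra element. */
void block_range(long n, int rank, int size, long *lo, long *hi)
{
    long base = n / size;
    long extra = n % size;
    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}

/* Partial sum over a[lo..hi): the work one process would do locally. */
double partial_sum(const double *a, long lo, long hi)
{
    double s = 0.0;
    for (long i = lo; i < hi; i++) {
        s += a[i];
    }
    return s;
}
```

In the timings above, going from 1 to 4 processes reduces the run time from about 2.67 s to about 0.84 s, a speedup of roughly 3.2.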