--------------------------------------
Running PBS prologue script
--------------------------------------
User and Job Data:
--------------------------------------
Job ID:    4159.shelob1
Username:  fchen14
Group:     Admins
Date:      02-Jun-2014 14:21
Node:      shelob009 (17319)
--------------------------------------
PBS has allocated the following nodes:

shelob009

A total of 16 processors on 1 nodes allocated
---------------------------------------------
Check nodes and clean them of stray processes
---------------------------------------------
Checking node shelob009 14:21:07 
-> User hpctrn58 running job 4156.shelob1:state=C:ncpus=16
-> User hpctrn11 running job 4158.shelob1:state=C:ncpus=16
-> User fchen14 running job 4159.shelob1:state=R:ncpus=16 (This job)
Done clearing all the allocated nodes
------------------------------------------------------
Concluding PBS prologue script - 02-Jun-2014 14:21:07
------------------------------------------------------
+ cd /home/fchen14/loniworkshop2014/matmul/openacc/solution
+ make all
pgcc -acc -mp -Minfo=all -ta=nvidia,time mm_acc_v0.c -o mmaccv0c.out
main:
     47, Generating copyin(a[:][:])
         Generating copyin(b[:][:])
         Generating copyout(c[:][:])
     48, Generating NVIDIA code
     50, Loop is parallelizable
     52, Loop is parallelizable
         Accelerator kernel generated
         50, #pragma acc loop gang /* blockIdx.y */
         52, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
     56, Loop is parallelizable
pgcc -acc -mp -Minfo=all -ta=nvidia,time mm_acc_v1.c -o mmaccv1c.out
main:
     80, Parallel region activated
     82, Parallel loop activated with static block schedule
     89, Barrier
     91, Parallel region terminated
matmul_acc:
    141, Generating copyin(a[:nra][:nca])
         Generating copyin(b[:nca][:ncb])
         Generating copyout(c[:nra][:ncb])
         Generating NVIDIA code
    142, Loop is parallelizable
    143, Loop is parallelizable
         Accelerator kernel generated
        142, #pragma acc loop gang /* blockIdx.y */
        143, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
    145, Loop is parallelizable
pgf90 -acc -mp -Minfo=all -ta=nvidia,time mm_acc_v0.f90 -o mmaccv0f.out
matrix_mul:
     17, Generating create(a(:,:))
         Generating create(b(:,:))
         Generating create(c(:,:))
     20, Memory zero idiom, array assignment replaced by call to pgf90_mzero8
     36, Accelerator kernel generated
         37, !$acc loop gang ! blockidx%x
         41, !$acc loop vector(256) ! threadidx%x
             Sum reduction generated for sum
     36, Generating NVIDIA code
     38, Loop is parallelizable
     41, Loop is parallelizable
+ export PGI_ACC_TIME=1
+ PGI_ACC_TIME=1
+ export OMP_NUM_THREADS=16
+ OMP_NUM_THREADS=16
+ ./mmaccv0c.out

Accelerator Kernel Timing data
/home/fchen14/loniworkshop2014/matmul/openacc/solution/mm_acc_v0.c
  main  NVIDIA  devicenum=0
    time(us): 480,757
    47: data region reached 1 time
        47: data copyin reached 4 times
             device time(us): total=11,137 max=2,786 min=2,782 avg=2,784
        63: data copyout reached 3 times
             device time(us): total=5,064 max=2,527 min=16 avg=1,688
    48: compute region reached 1 time
        52: kernel launched 1 time
            grid: [16x2048]  block: [128]
             device time(us): total=464,556 max=464,556 min=464,556 avg=464,556
            elapsed time(us): total=464,572 max=464,572 min=464,572 avg=464,572
 total acc time: 0.820984 sec
 Gflops: 20.925946 
 total serial time: 86.130438 sec
acc and serial matches!
+ ./mmaccv1c.out 2048 2048 2048 1 0.001

Accelerator Kernel Timing data
/home/fchen14/loniworkshop2014/matmul/openacc/solution/mm_acc_v1.c
  matmul_acc  NVIDIA  devicenum=0
    time(us): 419,159
    141: data region reached 1 time
        30: data copyin reached 4 times
             device time(us): total=11,129 max=2,786 min=2,779 avg=2,782
        30: kernel launched 3 times
            grid: [16]  block: [128]
             device time(us): total=808 max=790 min=9 avg=269
            elapsed time(us): total=1,261 max=819 min=220 avg=420
        151: data copyout reached 3 times
             device time(us): total=5,050 max=2,510 min=34 avg=1,683
    141: compute region reached 1 time
        143: kernel launched 1 time
            grid: [16x2048]  block: [128]
             device time(us): total=402,172 max=402,172 min=402,172 avg=402,172
            elapsed time(us): total=402,201 max=402,201 min=402,201 avg=402,201
 total acc time: 0.749485 sec
 Gflops: 22.922239 
 total num of procs: 16
 total omp time with 16 threads: 13.446723 sec
 total omp time with 8 threads: 16.838642 sec
 total omp time with 4 threads: 31.214495 sec
 total serial time: 113.305555 sec
acc and serial matches!
+ ./mmaccv0f.out

Accelerator Kernel Timing data
/home/fchen14/loniworkshop2014/matmul/openacc/solution/mm_acc_v0.f90
  matrix_mul  NVIDIA  devicenum=0
    time(us): 467,355
    17: data region reached 1 time
    36: compute region reached 1 time
        36: kernel launched 1 time
            grid: [2048]  block: [256]
             device time(us): total=467,355 max=467,355 min=467,355 avg=467,355
            elapsed time(us): total=467,375 max=467,375 min=467,375 avg=467,375
Init Time:  0.137 Calc Time:  0.468 GFlops:  36.709
------------------------------------------------------
Running PBS epilogue script    - 02-Jun-2014 14:25:39
------------------------------------------------------
Checking node shelob009 (MS)
-> Killing process of fchen14: -bash
Checking node shelob009 ; modules change (nvidia_uvm 28216 <) ; ok
------------------------------------------------------
Concluding PBS epilogue script - 02-Jun-2014 14:25:41
------------------------------------------------------
Exit Status:     0
Job ID:          4159.shelob1
Username:        fchen14
Group:           Admins
Job Name:        mm_acc
Session Id:      17318
Resource Limits: ncpus=1,neednodes=1:ppn=16,nodes=1:ppn=16,walltime=01:00:00
Resources Used:  cput=00:17:03,mem=315240kb,vmem=105543464kb,walltime=00:04:34
Queue Used:      workq
Account String:  hpc_train_2014
Node:            shelob009
Process id:      18661
------------------------------------------------------
