CUDA for Crystallography

Postby johnewarren » 08 Jan 2009, 14:06

Has anyone tried to use CUDA to boost their crystallographic programs yet? It looks like it could change the face of crystallography in all aspects.
johnewarren
Synchrotron
 
Posts: 1676
Joined: 13 May 2006, 14:25
Location: UK

Postby smoggach » 08 Jan 2009, 15:05

John, what's CUDA?

Mogg
smoggach
Trained Monkey
 
Posts: 8
Joined: 05 Jun 2006, 10:18
Location: Edinburgh

Postby johnewarren » 08 Jan 2009, 15:43

Basically, from the NVIDIA site:
CUDA is a general purpose parallel computing architecture that leverages the parallel compute engine in NVIDIA graphics processing units (GPUs) to solve many complex computational problems in a fraction of the time required on a CPU. It includes the CUDA Instruction Set Architecture (ISA) and the parallel compute engine in the GPU. To program to the CUDA™ architecture, developers can, today, use C, one of the most widely used high-level programming languages, which can then be run at great performance on a CUDA™ enabled processor. Other languages will be supported in the future, including FORTRAN and C++.


ATI have a similar system. I think it is going to change crystallography and computing forever.

Postby pascalp » 08 Jan 2009, 15:53

I need to find something I read a while ago about GPU programming. Basically, you need hundreds of threads to feed a GPU; it's not an easy task at all.
The main problem is the absence of double-precision floating-point operations, well, according to the community :)

I don't know whether crystallography programs use LAPACK functions, but as far as I can remember there is no CUDA implementation of it. Work is in progress, according to this thread: http://forums.nvidia.com/index.php?s=&s ... t&p=474719
pascalp
Rotating Anode With Optics
 
Posts: 286
Joined: 17 Dec 2007, 16:01
Location: Oxford, UK

Postby johnewarren » 08 Jan 2009, 15:59

I didn't say it was going to be easy, but look at things such as:

  • Standard C language for parallel application development on the GPU
  • Standard numerical libraries for FFT (Fast Fourier Transform) and BLAS (Basic Linear Algebra Subroutines)
  • Dedicated CUDA driver for computing with fast data transfer path between GPU and CPU

It must be knocking at the door.

I mean, this isn't where I want to see it implemented, but it's a start: http://www.nvidia.com/object/axygen_success.html
It is also worth looking at this post: http://forums.nvidia.com/index.php?showtopic=52144

Postby johnewarren » 22 Apr 2009, 16:46

OK, so I've spent the day applying CUDA to shelxl, or more specifically to the mp version. I took the easy route of wrapping shelxl, following the CUDA examples from NVIDIA. Now, I must say in advance: stop laughing! Why? Well, I've managed to make it even slower than it was when I started out. OK, that is on the 6rxn test data. I was feeling downhearted until I tried rebuilding the mp version of shelxl myself and found that it too was slower. So I think it comes down to an optimisation issue, perhaps some magic compiler setting?

Either way, I get this lovely message when compiling the mp version:
Code:
shelxh_omp.f:15100.72:

  52          IF(T.GT.AQtmp)GOTO 50                                     
                                                                       1
shelxh_omp.f:15104.72:

  50        end do                                                     
                                                                       2
Warning: Deleted feature: GOTO at (1) jumps to END of construct at (2)
shelxh_omp.f:15101.72:

              IF(N.GT.LX)GOTO 50                                       
                                                                       1
shelxh_omp.f:15104.72:

  50        end do                                                     
                                                                       2
Warning: Deleted feature: GOTO at (1) jumps to END of construct at (2)

I'm using gfortran with -fopenmp to compile all this; here is the command line:
Code:
gfortran -fopenmp -O3 -ffast-math -funroll-all-loops -march=native -mtune=native -ftree-vectorize -fvect-cost-model shelxh_omp.f shelxlv_omp.f -o shelxlifc3


As you can see, the timings are very different!
Existing shelxl_ifc:
Code:
real   0m1.147s
user   0m2.594s
sys   0m0.109s

Mine:
Code:
real   0m3.354s
user   0m11.618s
sys   0m0.073s


Has anyone out there had experience compiling the mp version of shelxl? After reading some information on the web, I think it may be down to some strange compatibility issue between the compiler used to build MPI and the compiler used to build the program. I'm going to get myself ifort and see if that makes any difference.

Postby johnewarren » 22 Apr 2009, 17:04

This is interesting: the mp code with CUDA but without -fopenmp:
Code:
real   0m8.552s
user   0m8.501s
sys   0m0.050s

Same again, but with the standard BLAS:
Code:
real   0m8.314s
user   0m8.263s
sys   0m0.040s

With nothing but the optimisation flags:
Code:
real   0m8.330s
user   0m8.300s
sys   0m0.027s


Hmmmmmmm, out-of-the-box shelxl:
Code:
real   0m3.011s
user   0m2.999s
sys   0m0.012s
and shelxh:
Code:
real   0m3.163s
user   0m2.963s
sys   0m0.033s

Postby pascalp » 22 Apr 2009, 17:23

Well, the Intel compiler is known to be faster.

Have you tried the -fprofile-generate/-fprofile-use flags? It's a two-stage compilation, and it can give you binaries about 10% faster.
You could also add these: -fstrict-aliasing -pipe -fomit-frame-pointer

And if it is using LAPACK/BLAS, use the Intel one; it's much faster.

Postby johnewarren » 22 Apr 2009, 17:29

I also just found that shelxl's etime_.c doesn't work with #include <sys/time.h> but does with <time.h>. Strange!

Postby johnewarren » 22 Apr 2009, 17:40

Cheers, Pascal, I'll give them a try and post the results. But the lesson so far seems to be that I can't beat the original? Perhaps it is down to a 64-bit versus 32-bit thing? It shouldn't be, but perhaps?

Postby pascalp » 22 Apr 2009, 17:47

You should be able to beat it, because you can use all the optimisations your processor offers, at least if you have a recent one and the compiler can use them.

You can send me ready-to-compile/test code and I'll give it a go.
Which libraries are your binary and theirs using? Use ldd to list them.

gcc 4.4 is out, so I'll get it soon and try with it too. On paper, there are amazing improvements.

Postby johnewarren » 22 Apr 2009, 18:01

These flags helped:
Code:
-fstrict-aliasing -pipe -fomit-frame-pointer

If I use:
Code:
g77 shelxl.f shelxlv.f etime_.c fdate_.c -O3 -ffast-math -fstrict-aliasing -pipe -fomit-frame-pointer -o bench/shelxl10

I get:
Code:
real   0m4.352s
user   0m4.325s
sys   0m0.026s

Whilst:
Code:
gfortran shelxl.f shelxlv.f etime_.c fdate_.c -O3 -ffast-math -fstrict-aliasing -pipe -fomit-frame-pointer -o bench/shelxl11

gives:
Code:
real   0m8.469s
user   0m8.426s
sys   0m0.038s


Unfortunately, it is George's code, which you can get via his FTP site once you're registered, so I can't distribute it; but you can get it yourself, as you're an academic and not using it for profit.

No special libs are required to build it.

Postby johnewarren » 28 Apr 2009, 10:48

Hmmmmmmm, there is a massive difference between gfortran and ifort! MASSIVE!

Postby johnewarren » 28 Apr 2009, 11:07

Just compiled the normal version of shelxl against CUDA:
Code:
time shelxl 6rxn
........................
real   0m3.220s
user   0m3.018s
sys   0m0.035s


With CUDA:
Code:
time ./shelxlcuda 6rxn
.......................
real   0m1.646s
user   0m1.533s
sys   0m0.106s

This is just the normal version, not shelxh or the mp version; I wonder whether those would make any difference.

Now that I know the ifort compiler is the way forward, I'm going to try CUDA for tonto as well! Should be fun!

Postby pascalp » 28 Apr 2009, 11:33

1.5 s against 3 s: on a small system like this you already see some benefit. Did you profile both of them?

To enable profiling, you need to compile with the -pg flag. I am not sure it works with the Intel compiler.
Then execute the code normally:
shelxl 6rxn
This creates a gmon.out file; then run gprof on it:
gprof shelxl gmon.out

I still haven't asked for the shelx code. I am not sure I can as an individual rather than as part of the university. And the form has to be sent by mail...

But if you get a nice result with tonto, I'll take a look. The main problem is that I don't know whether my graphics card, an NVIDIA 6600GT, can be used with CUDA.