[General] Any linux gurus?

Stephan Henning shenning at gmail.com
Thu Dec 12 17:04:55 CST 2013


-WD

The GPUs are sent data in chunks that they then process and return. The time
it takes a GPU to process a chunk can vary, so I assume the bottlenecks we
were seeing happened when several of the GPU cores finished at about the same
time and requested new chunks that weren't already in RAM, so the drive array
would take a heavy hit.

Beyond that, I can't really give you a numerical value for the amount of
data they are dumping onto the PCIe bus.


-David

Yeah, I'm not sure an FPGA exists that's large enough for this; it would be
interesting, though.

While the process isn't entirely sequential, data previously processed is
reused in the processing of other data, so that has kept us away from
trying a cluster approach.

Depending on the problem, anywhere from minutes per iteration to weeks per
iteration. The weeks-long problems are sitting at about 3TB, I believe. We've
only run benchmark problems on the SSDs up till now, so we haven't had the
experience of seeing how they react once they start really getting full.

Sadly, 2TB of RAM would not be enough. I looked into this HP box (
http://www8.hp.com/us/en/products/proliant-servers/product-detail.html?oid=4231377#!tab=features)
that would take 4TB, but the costs were insane and it can't support enough
GPUs to actually do anything with the RAM...




On Thu, Dec 12, 2013 at 4:51 PM, David <ainut at knology.net> wrote:

>  Stephan,
>
> The system architecture will determine how quickly your huge tasks get
> finished.  (See SGI versus PC in previous email.)
>
> NUMA refers to the design of the computer and, for our interests, determines
> how fast data moves around the buses.
> It is one of the reasons that, back in the '90s, SGI would whup Sun's butt
> in execution times for data-intensive tasks, even though the Suns were
> more expensive (Sun just had more advertising money).  Anyway, a lot of
> that technology finally made its way from the workstation/minicomputer
> world into the PC world.
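>
> (A quick way to check from the shell; just a sketch, and it assumes the
> numactl package is installed.  On a dual-socket Xeon board you'll typically
> see two NUMA nodes reported.)
>
>     lscpu | grep -i numa       # NUMA node count and which CPUs belong to each node
>     numactl --hardware         # per-node memory sizes and node-to-node distances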
>
> GPUs are fantastic for this class of tasks.  FPGAs would be much faster
> still, but I doubt you could even get one big enough.
>
> While we're on the subject of optimizations :)
> Is the algorithm linear, or can it be broken up such that it could run
> several times faster on a cluster?  It might be a candidate, depending on
> the algorithm.  A PS3 can sometimes rival GPUs as well, but it is severely
> limited in onboard memory.  A cluster is ideal if, for example, your task
> is image analysis and processor 1 can work on the upper left corner of the
> image, while processor 2 works on the upper right, and so on, with each
> computer in the cluster having its own GPU.  But if, during runtime, each
> successive calculation depends entirely on the one preceding it, then a
> cluster would just slow you down.
>
> Yeah, you don't want the GPUs doing mundane work like file I/O, just data
> crunching, because that's what they're best at.  That, though, is entirely
> compiler-dependent, unless your language allows you to specify what runs
> where.
>
> May I ask what your runtimes are now?
>
> If you installed 2 terabytes of memory (mortgaging the state of Alabama to
> do so), you could get unbelievable runtimes. :)
>
>
> David M.
>
>
>
> Stephan Henning wrote:
>
>  I'm honestly not sure, David; I'm a bit confused as to whether NUMA is
> supposed to be something implemented at a die level or at a system level.
> If it's at a die level, this box is running 2x Xeon E5-2650s. I've never
> gotten involved in the architecture side of things, so this is a bit
> foreign to me.
>
> The bulk of the computation for a run is done on the GPUs; if it all ran
> on the CPU, a single run would take months. The GPUs have cut the runtime
> way down, but only part of the process is GPU-accelerated, and this
> file-creation phase is one of the parts that is not and is still being
> processed by the CPU.
>
>
> On Thu, Dec 12, 2013 at 3:40 PM, David <ainut at knology.net> wrote:
>
>>  Sometimes, the use of DMA with the newest SATA III controllers actually
>> slows it down.  Only a live test will show which is faster.
>> Good point about the random access versus linear write, though.  My
>> suspicion from his overview is that it is linear.
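>>
>> (A minimal live-test sketch; the device and mount point below are only
>> placeholders for whatever the array actually is.)
>>
>>     # raw read timings, cached and buffered
>>     sudo hdparm -tT /dev/sda
>>     # rough sequential write rate, bypassing the page cache
>>     dd if=/dev/zero of=/mnt/array/testfile bs=1M count=4096 oflag=direct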
>>
>> Linux is pretty good at managing disk optimization, but if tweaking is
>> necessary, it can get very problematic because as you shift the dynamics,
>> the OS file system handler shifts its management algorithm.  I believe
>> that came from the Cray world.
>>
>> Stephan, does your computer use the NUMA architecture?  There is a newer,
>> slightly faster design but I can't remember what it is.  The reason I bring
>> that up is that in a former life I had to deal with data the size you
>> mentioned.  As a test, I ran some benchmarks several times with the best PC
>> available at the time against a smallish SGI pizza box.  The program used
>> was one I wrote and had in production for quite a while.  The PC would
>> massage the data and finish in about 12 1/2 hours.  The SGI box did it in
>> 2 hours 45 minutes.  I ran that test several times using data that changed somewhat
>> each month and timing results were consistent.  Just something to think
>> about if run-times are killing you at the moment.
>>
>>
>>
>> Arthur wrote:
>>
>> Here's something else to think about.  Is the program writing out the
>> data in sequential chunks, or is it writing to random parts of the file?
>>
>>  With buffered writes, the best speedup you can get with a RAID 0 array
>> is if one disk is writing something while the other disk is seeking to the
>> place where the next thing is going to be written.  If you're dealing with
>> a bunch of random writes, then ponying up for a few SSDs or refactoring the
>> code might be worth it.
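>>
>>  (If you want to measure the difference directly, fio can generate both
>> patterns; a rough sketch, with the target directory as a placeholder.)
>>
>>     # sequential 1M writes vs. random 4k writes, both bypassing the page cache
>>     fio --name=seqwrite  --rw=write     --bs=1M --size=4G --direct=1 --directory=/mnt/array
>>     fio --name=randwrite --rw=randwrite --bs=4k --size=4G --direct=1 --directory=/mnt/array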
>>
>>  That's assuming your RAID controller isn't the bottleneck.  Some
>> motherboard-based RAID controllers use the CPU to do the work and can
>> cause everything to slow down.  (Side note: does anyone know if DMA works
>> with those kinds of controllers?)
>>
>>
>> On Thu, Dec 12, 2013 at 11:26 AM, Stephan Henning <shenning at gmail.com> wrote:
>>
>>>   The program will write out a file of variable size; it's based on the
>>> problem being run. Currently, it writes out approximately 1.5TB for the
>>> benchmark problem, most of that contained in a single file, much too large
>>> for a ramdisk. Unfortunately, the problems have grown so large that they
>>> can't be run in RAM anymore. This is a GPU-accelerated program, so this
>>> file gets modified very heavily during the course of a run.
>>>
>>>  Current testing is being done on a RAID 0 of 5x Crucial 960GB SSDs. This
>>> has proven to be significantly faster than the old array, but I am trying
>>> to determine exactly how hard the disks are being hammered so I can try to
>>> optimize the hardware configuration.
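>>>
>>>  (For reference, blktrace records every request that hits the block
>>> device, which gives request sizes as well as rates.  A rough sketch, with
>>> the md device name as a placeholder.)
>>>
>>>     sudo blktrace -d /dev/md0 -w 600 -o ssdtrace      # capture 10 minutes of block-layer traffic
>>>     blkparse -i ssdtrace -d ssdtrace.bin > /dev/null  # merge the per-CPU trace files
>>>     btt -i ssdtrace.bin                               # summary of request sizes, latencies, throughput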
>>>
>>>  The program is compiled from source, but I'm not involved in that
>>> process; I'd much rather piggyback something on and monitor the process
>>> than go in and have something added to the source.
>>>
>>>  I'll add parted and gparted to my list of things to read up on, thanks.
>>>
>>>
>>> On Thu, Dec 12, 2013 at 12:29 AM, David <ainut at knology.net> wrote:
>>>
>>>>  Excellent approach.
>>>>
>>>>
>>>>
>>>> Arthur wrote:
>>>>
>>>> How big are the files that you're dealing with?
>>>> If they're small, you can just make a ramdisk and try running everything
>>>> in there.
>>>> It's not a final solution, but between that and strace you should be
>>>> able to see if that's really the issue or not.
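>>>>
>>>> (Making one is basically a one-liner with tmpfs; the size and paths below
>>>> are just placeholders.)
>>>>
>>>>     sudo mkdir -p /mnt/ramdisk
>>>>     sudo mount -t tmpfs -o size=32g tmpfs /mnt/ramdisk
>>>>     cp input.dat /mnt/ramdisk/ && (cd /mnt/ramdisk && /path/to/myapp)
>>>>     sudo umount /mnt/ramdisk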
>>>>
>>>>  Are you compiling from source?  If you are, then there are a bunch of
>>>> debugging tools you can use, as well as things like timing individual
>>>> commands and seeing how many times each line of code is run.
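>>>>
>>>> (For a C/C++ build with gcc, that's roughly gprof for per-function timing
>>>> and gcov for per-line execution counts; myapp.c here is a placeholder.)
>>>>
>>>>     gcc -pg -fprofile-arcs -ftest-coverage -o myapp myapp.c
>>>>     ./myapp
>>>>     gprof myapp gmon.out | head -40    # time spent per function
>>>>     gcov myapp.c                       # writes myapp.c.gcov with per-line execution counts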
>>>>
>>>>
>>>>> On Wed, Dec 11, 2013 at 10:48 PM, Stephan Henning <shenning at gmail.com> wrote:
>>>>
>>>>> This is a RedHat6 Enterprise install.
>>>>>
>>>>>  I don't think htop has the data I need, but I'll check. I'm not
>>>>> familiar with ntop, and I didn't consider using strace for this; I'll
>>>>> check that as well.
>>>>>
>>>>>  The goal is to record read/write rates and block sizes. I'm pretty
>>>>> sure I am bottlenecking against the drive array; I'm hoping I can get
>>>>> some definitive answers from this.
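>>>>>
>>>>>  (For reference, the extended iostat output has both; a sketch, with
>>>>> column names as in the sysstat that ships with RHEL 6.)
>>>>>
>>>>>     iostat -xm 5          # extended stats in MB, every 5 seconds
>>>>>     # rMB/s, wMB/s    -> read/write throughput
>>>>>     # avgrq-sz        -> average request size, in 512-byte sectors
>>>>>     # avgqu-sz, %util -> queue depth and how saturated the device is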
>>>>>
>>>>>
>>>>> On Wed, Dec 11, 2013 at 6:01 PM, David <ainut at knology.net> wrote:
>>>>>
>>>>>>  ntop might do the trick, but it's not available in Fedora.
>>>>>>
>>>>>>
>>>>>>
>>>>>> David wrote:
>>>>>>
>>>>>> Can 'htop' show open files?
>>>>>>
>>>>>> For intensive live net data, look at Wireshark for Linux.
>>>>>>
>>>>>>
>>>>>> David wrote:
>>>>>>
>>>>>> If that's what you're looking for, there are several (free) programs
>>>>>> you could run from the command line in a separate window/screen while
>>>>>> your program is running that give you everything you're asking about.
>>>>>> Sort of an equivalent to Winblows "System Explorer."  What flavor of
>>>>>> Linux are you using?
>>>>>>
>>>>>> David M.
>>>>>>
>>>>>>
>>>>>> Devin Boyer wrote:
>>>>>>
>>>>>> Try something like "strace -T myapp" or "strace -T -c myapp"; they'll
>>>>>> show the system calls being made and the amount of time spent in each.
>>>>>> It's slightly different information from iostat, but it may be useful in
>>>>>> figuring out what and where your program is performing I/O access.
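>>>>>>
>>>>>> (To narrow it to file I/O and keep a log to parse later, something like
>>>>>> the following should work; ./myapp is a placeholder.)
>>>>>>
>>>>>>     strace -f -T -e trace=read,write,open,close,lseek -o io_trace.log ./myapp
>>>>>>     strace -c -f ./myapp      # per-syscall counts and total time, printed at exit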
>>>>>>
>>>>>>
>>>>>> On Wed, Dec 11, 2013 at 3:37 PM, Stephan Henning <shenning at gmail.com> wrote:
>>>>>>
>>>>>>>   No, iostat will normally just dump to the terminal window, but
>>>>>>> I'd like to pipe its output to a file so I can parse it later.
>>>>>>>
>>>>>>>  My end goal here is to be able to generate a log of iostat output
>>>>>>> while I run this program. I'm trying to determine exactly how hard this
>>>>>>> program is hitting my hard drive and at what points during its run it
>>>>>>> accesses the drive the most frequently.
>>>>>>>
>>>>>>>  I've done something similar in bash before, but it is rather
>>>>>>> clunky.
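>>>>>>>
>>>>>>>  (Parsing the captured log afterwards should be a short awk job; a
>>>>>>> sketch, assuming iostat -xm output in iostat.log, an array that shows up
>>>>>>> as md0, and the column layout of that sysstat version.)
>>>>>>>
>>>>>>>     awk '/^md0/ { print $6, $7 }' iostat.log     # rMB/s and wMB/s for each sample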
>>>>>>>
>>>>>>>  I'll take a look at exec and see if I can use it.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Dec 11, 2013 at 4:46 PM, David <ainut at knology.net> wrote:
>>>>>>>
>>>>>>>>  Do you need to do anything with the results or just need them
>>>>>>>> displayed?
>>>>>>>> If you need to manipulate the results, consider using Perl; or, if
>>>>>>>> you're in C or C++, pipe the output to a file in your 'exec' call and
>>>>>>>> then just read that file into your program.
>>>>>>>> Ain't UNIX great?
>>>>>>>>
>>>>>>>>
>>>>>>>> David M.
>>>>>>>>
>>>>>>>>
>>>>>>>> Stephan Henning wrote:
>>>>>>>>
>>>>>>>>  I'd like to take some metrics with iostat while I have a specific
>>>>>>>> program running. Is there a way to wrap iostat around another program
>>>>>>>> (it is called from the command line) so that iostat ends when the
>>>>>>>> program finishes running?
>>>>>>>>
>>>>>>>>  I know I can do it with a bash script, but I'm hoping for a more
>>>>>>>> elegant solution.
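>>>>>>>>
>>>>>>>>  (The usual shell idiom is to background iostat and kill it when the
>>>>>>>> program exits; a rough sketch, with myprog as a placeholder.)
>>>>>>>>
>>>>>>>>     iostat -xm 5 > iostat.log &
>>>>>>>>     IOSTAT_PID=$!
>>>>>>>>     ./myprog                  # the program being measured
>>>>>>>>     kill "$IOSTAT_PID"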
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>  --
>>>> Sincerely,
>>>> Arthur Moore
>>>> (256) 277-1001
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>>  --
>> Sincerely,
>> Arthur Moore
>> (256) 277-1001
>>
>>
>>
>
>
>
>