[General] Any linux gurus?

Stephan Henning shenning at gmail.com
Fri Dec 13 10:15:22 CST 2013


-David

Hmm, sounds interesting. The problem is already distributed a little: what
is being done is a form of Monte Carlo, so the same run gets repeated many
times with slight parameter adjustments. Each of these runs can be
distributed out to the compute nodes very easily; currently this is being
done with Condor.
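
To make that pattern concrete, here is a minimal Python sketch, not the
actual setup. Everything named in it is an assumption for illustration:
run_case() is a hypothetical stand-in for one solver run, the parameter
names are made up, and a local thread pool stands in for the compute nodes
(on the real system each case would be a separate Condor job).

```python
import random
from concurrent.futures import ThreadPoolExecutor


def perturb(baseline, rel_jitter, rng):
    """Return a copy of baseline with each value scaled by a small random factor."""
    return {
        name: value * (1.0 + rng.uniform(-rel_jitter, rel_jitter))
        for name, value in baseline.items()
    }


def run_case(params):
    """Hypothetical stand-in for one solver run; returns a single number."""
    return sum(value * value for value in params.values())


def sweep(baseline, n_runs, rel_jitter=0.05, seed=0, workers=4):
    """Build n_runs lightly perturbed cases and run them independently."""
    rng = random.Random(seed)  # fixed seed so the sweep is repeatable
    cases = [perturb(baseline, rel_jitter, rng) for _ in range(n_runs)]
    # The cases share no state, so they can be handed out in any order.
    # The local pool here is only a stand-in for the cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_case, cases))  # map() keeps input order
    return cases, results


if __name__ == "__main__":
    # Parameter names below are invented for the example.
    baseline = {"frequency_ghz": 2.4, "length_m": 0.5}
    cases, results = sweep(baseline, n_runs=8)
    for params, value in zip(cases, results):
        print(params, value)
```

Because no case depends on another, the only coordination needed is
collecting the results at the end, which is why this part spreads across
nodes so easily.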


-James

It's a MoM CEM tool called vlox.



On Fri, Dec 13, 2013 at 5:43 AM, James Fluhler <j.fluhler at gmail.com> wrote:

> I'm just curious, what simulation program are you running? I've used a
> number in the past that also utilize the GPUs for processing.
>
> James F.
>
> On Dec 12, 2013, at 11:28 PM, David <ainut at knology.net> wrote:
>
> IIRC, the good thing about this cluster is the automagic load leveling.
> Your existing binary may not run at max optimization but if the task can be
> spread among processors, Beowulf does a nice job of it.  If each computer
> has its own GPU(s), then all the better.
>
> You can test it right there without changing anything on the system's
> disks.  Just create and run all the cluster members off a CD.
>
> Then to test, pick the fastest one of them (maybe even your existing Xeon
> box), run your benchmark, record execution time, then boot all the other
> machines in the cluster and run it again.  There are only about two dozen
> steps to set it up.  One professor even put most of those, along with
> automatic cluster setup(!) as a downloadable you can boot off of.  That
> leaves half a dozen steps to tweak the cluster together, then you're good
> to go.  I have one of those CDs around here somewhere and I can get
> details if you're interested.  Something to play with.  I did it with only
> 4 PCs around the house with some code, and even though the code was never
> designed for a cluster (just multiprocessing), I got about a 40% decrease
> in execution time.  The code was almost completely linear execution so I'm
> surprised it got any improvement, but it did.
>
> David
>
>
> Stephan Henning wrote:
>
> -WD
>
>  I believe it's either ext3 or ext4, I'd have to ssh in and check when I
> get back on Monday.
>
>  -David
>
>  I'll check into the Beowulf and see what that would entail. I'll try and
> talk with the developer and see what their thoughts are on the feasibility
> of running it on a cluster. They may have already gone down this path and
> rejected it, but I'll check anyway.
>
>
> On Thu, Dec 12, 2013 at 6:16 PM, David <ainut at knology.net> wrote:
>
>> Sounds like a perfect candidate for a Beowulf cluster to me.  There are
>> possibly some gotchas but you'll have the same problems with just a single
>> computer.
>>
>> Velly intewesting.
>>
>> Stephan Henning wrote:
>>
>>>  -WD
>>>
>>> The GPUs are sent data in chunks that they then process and return. The
>>> time it takes a GPU to process a chunk can vary, so I assume the
>>> bottlenecks we were seeing occurred when several of the GPU cores
>>> finished at about the same time and requested a new chunk, and the chunk
>>> they needed wasn't already in RAM, so the drive array would take a heavy
>>> hit.
>>>
>>> Beyond that, I can't really give you a numerical value as to the amount
>>> of data they are dumping into the pcie bus.
>>>
>>>
>>> -David
>>>
>>> Ya, not sure an FPGA exists large enough for this, it would be
>>> interesting though.
>>>
>>> While the process isn't entirely sequential, data previously processed
>>> is reused in the processing of other data, so that has kept us away from
>>> trying a cluster approach.
>>>
>>> Depending on the problem, anywhere from minutes per iteration to weeks
>>> per iteration. The weeks-long problems are sitting at about 3TB I believe.
>>> We've only run benchmark problems on the SSDs up till now, so we haven't
>>> had the experience of seeing how they react once they start really getting
>>> full.
>>>
>>>  Sadly, 2TB of RAM would not be enough. I looked into this HP box (
>>> http://www8.hp.com/us/en/products/proliant-servers/product-detail.html?oid=4231377#!tab=features)
>>> that would take 4TB, but the costs were insane and it can't support enough
>>> GPUs to actually do anything with the RAM...
>>>
>>>
>>>
>>>  <<<snip>>>
>>
>>
>> _______________________________________________
>> General mailing list
>> General at lists.makerslocal.org
>> http://lists.makerslocal.org/mailman/listinfo/general
>>
>