[ML-General] Cluster Computing

Thu Jan 22 18:43:30 CST 2015

@mc
Both. If I start to scale this to a large number of nodes, I can foresee many headaches if I can't easily push modifications and updates. On the job distribution side, it would be great to maintain compatibility with HTCondor; I'm just unsure how well it will operate if it has to hand jobs off to a head node that then distributes them further.

@ Brian
Our current cluster is made up of discrete machines, only about 20 nodes. Many of the nodes are actual user workstations that are brought in when inactive. There is no uniform provisioning method, and every box has a slightly different hardware configuration. Thankfully, we do a pretty good job of keeping all required software aligned to the same version.

The VM idea is interesting. I hadn't considered that. I will need to think on that and how I might be able to implement it. 

@david
Yup, I'm fully aware this level of distributed computing is only good for specific cases. I understand your position, thanks. 

-stephan

---———---•---———---•---———---
Sent from a mobile device, please excuse the spelling and brevity. 

> On Jan 22, 2015, at 5:54 PM, Brian Oborn <linuxpunk at gmail.com> wrote:
> 
> I would be tempted to just copy what the in-house cluster uses for provisioning. That will save you a lot of time and make it easier to integrate with the larger cluster if you choose to do so. Although it can be tempting to get hardware in your hands, I've done a lot of work with building all of the fiddly Linux bits (DHCP+TFTP+root on NFS+NFS home) in several VMs before moving to real hardware. You can set up a private VM-only network between your head node and the slave nodes and work from there.
> 
>> On Thu, Jan 22, 2015 at 5:31 PM, Michael Carroll <carroll.michael at gmail.com> wrote:
>> So is your concern with provisioning and setup or with actual job distribution?
>> 
>> ~mc mobile
>> 
>>> On Jan 22, 2015, at 17:15, Stephan Henning <shenning at gmail.com> wrote:
>>> 
>>> This is a side project for the office. Sadly, most of this type of work can't be farmed out to external clusters; otherwise we would do that. We do currently use AWS for some of this type of work, but only for internal R&D.
>>> 
>>> This all started when the Intel Edison was released. Some of us were talking about it one day and realized that it might have just enough processing power and RAM to handle some of our smaller problems. We've talked about it some more, and the discussion has evolved to the point where I've been handed some hours and a small amount of funding to try to implement a 'cluster-in-a-box'.
>>> 
>>> The main idea is to rack a whole bunch of Mini-ITX boards on edge in a 4U chassis (yes, they will fit). Assuming a 2" board-to-board clearance across the width of the chassis and 1" spacing back-to-front down the depth of the box, I think I could fit 27 boards into a 36" deep chassis, with enough room for the power supplies and interconnects.
>>> 
>>> Using embedded motherboards with 8-core Atom C2750 CPUs and 16 GB of RAM per board, that should give me a pretty substantial cluster to play with. Obviously I am starting small, probably with two or three boards running Q2900 4-core CPUs, until I can get the software side worked out.
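
The board count and aggregate capacity described above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the chassis depth, spacings, and per-board specs come from the post, the 6.7" board side is the standard Mini-ITX dimension, and the usable interior width and the depth reserved for power supplies are assumptions chosen for illustration. It shows one layout that yields 27 boards, not necessarily how the original estimate was derived.

```python
import math

# Figures taken from the post
CHASSIS_DEPTH_IN = 36.0   # chassis depth
BOARD_PITCH_IN = 2.0      # board-to-board clearance across the width
ROW_GAP_IN = 1.0          # back-to-front spacing between rows
CORES_PER_BOARD = 8       # Atom C2750
RAM_GB_PER_BOARD = 16

# Known form-factor dimension: Mini-ITX boards are 170 mm x 170 mm
BOARD_SIDE_IN = 6.7

# ASSUMPTIONS (not stated in the post)
USABLE_WIDTH_IN = 17.0    # hypothetical interior width of a 19" rack chassis
RESERVED_DEPTH_IN = 12.0  # hypothetical depth kept free for PSUs and interconnects


def boards_across(width_in, pitch_in):
    """Boards on edge at a fixed pitch; board thickness is ignored."""
    return math.floor(width_in / pitch_in) + 1


def rows_deep(depth_in, side_in, gap_in):
    """N rows need N * side + (N - 1) * gap inches of depth."""
    return math.floor((depth_in + gap_in) / (side_in + gap_in))


across = boards_across(USABLE_WIDTH_IN, BOARD_PITCH_IN)
rows = rows_deep(CHASSIS_DEPTH_IN - RESERVED_DEPTH_IN, BOARD_SIDE_IN, ROW_GAP_IN)
boards = across * rows

print(f"{across} across x {rows} deep = {boards} boards")
print(f"{boards * CORES_PER_BOARD} cores, {boards * RAM_GB_PER_BOARD} GB RAM")
```

Under these assumptions the layout is 9 boards across by 3 rows deep, which works out to 216 cores and 432 GB of RAM for a full chassis. Changing the assumed width or reserved depth changes the count, so the constants are worth measuring against a real chassis before ordering parts.
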
>>> 
>>> The software-infrastructure side is the part I'm having a hard time with. While there are options out there for how to do this, they are all relatively involved, and there isn't an obvious 'best' choice to me right now. Currently our in-house HPC cluster uses HTCondor for its backbone, so I would like to maintain some sort of connection to it. Otherwise, I'm seeing options in the Beowulf and Rocks areas that could be useful; I'm just not sure where to start, in all honesty.
>>> 
>>> At the end of the day, this needs to be relatively easy for us to manage (time spent working on the cluster is time spent not billing the customer) while being easy enough to add nodes to, assuming this is a success and I get the OK to expand it to a full 42U rack's worth.
>>> 
>>> 
>>> Our current cluster is almost always fully utilized. Currently we've got about a 2-month backlog of jobs on it.
>>> 
>>> 
>>>> On Thu, Jan 22, 2015 at 4:55 PM, Brian Oborn <linuxpunk at gmail.com> wrote:
>>>> If you can keep your utilization high, then your own hardware can be much more cost-effective. However, if you end up paying depreciation and maintenance on a cluster that's doing nothing most of the time, you'd be better off in the cloud.
>>>> 
>>>>> On Thu, Jan 22, 2015 at 4:50 PM, Michael Carroll <carroll.michael at gmail.com> wrote:
>>>>> Depending on what you are going to do, it seems like it would make more sense to use AWS or Digital Ocean these days, rather than standing up your own hardware. Maintaining your own hardware sucks.
>>>>> 
>>>>> That being said, if you are doing something that requires InfiniBand, then hardware is your only choice :)
>>>>> 
>>>>> ~mc
>>>>> 
>>>>>> On Thu, Jan 22, 2015 at 4:43 PM, Joshua Pritt <ramgarden at gmail.com> wrote:
>>>>>> Many years ago, back when Beowulf was just getting popular, my friends and I installed a Beowulf cluster, just for fun, on a closet full of donated 75 MHz Pentium machines. We never figured out anything to do with it, though...
>>>>>> 
>>>>>>> On Thu, Jan 22, 2015 at 5:31 PM, Brian Oborn <linuxpunk at gmail.com> wrote:
>>>>>>> In my previous job I set up several production Beowulf clusters, mainly for particle physics simulations, and this has been an area of intense interest for me. I would be excited to help you out, and I think I could provide some good assistance.
>>>>>>> 
>>>>>>> Brian Oborn (aka bobbytables)
>>>>>>> 
>>>>>>> 
>>>>>>>> On Thu, Jan 22, 2015 at 4:25 PM, Stephan Henning <shenning at gmail.com> wrote:
>>>>>>>> Does anyone on the mailing list have any experience with setting up a cluster computation system? If so and you are willing to humor my questions, I'd greatly appreciate a few minutes of your time. 
>>>>>>>> 
>>>>>>>> -stephan
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> General mailing list
>>>>>>>> General at lists.makerslocal.org
>>>>>>>> http://lists.makerslocal.org/mailman/listinfo/general