Compute Cluster Software
Now that the hardware side of my compute cluster project is complete, I've turned my attention to the software each cluster node will need to run. One such piece of software is a social network web crawler. The crawler accesses a remote MySQL database to retrieve a batch of RSS feed links that need processing. This turned out to be an interesting sub-project with some extra educational value.
Because I'm relatively new to Python programming, I've run into a few minor difficulties along the way. One involved calling a MySQL stored procedure from within Python. For reasons that initially escaped me, my stored procedure calls were not committing data. To complicate matters, the MySQLdb Python wrapper wasn't raising any exceptions.
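The cause turned out to be transactions. MySQLdb follows the Python DB-API, which requires autocommit to be off when a connection is opened, so on a transactional table (InnoDB, for example) the procedure's changes sat in an uncommitted transaction and were discarded when the connection closed. Nothing actually failed, which is why no exception appeared. After searching the web I found a blog post that suggested setting MySQL's autocommit flag to true to force the commit; calling db.commit() after the call works as well. Problem solved.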
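import MySQLdb

def savePost(userID, postID, text, date):
    try:
        db = MySQLdb.connect(host="", user="", passwd="", db="")
        try:
            # DB-API connections start with autocommit off; without this
            # (or an explicit db.commit()) the procedure's writes are lost.
            db.autocommit(True)
            cur = db.cursor()
            # Let the driver quote and escape the arguments.
            cur.execute("CALL sp_UserPost(%s, %s, %s, %s)",
                        (userID, postID, text, date))
            cur.close()
        finally:
            db.close()
    except MySQLdb.Error as e:
        print("Mysql Error %d: %s" % (e.args[0], e.args[1]))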
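MySQLdb needs a live server to demonstrate, so the sketch below uses the standard library's sqlite3 module as a stand-in. Like MySQLdb, it opens a transaction implicitly before an INSERT and does not commit when close() is called, which reproduces the "silent" lost write described above. The helper count_rows_after and the posts table are hypothetical, chosen only for illustration.

```python
import os
import sqlite3
import tempfile


def count_rows_after(commit):
    """Insert one row, close the connection, then count rows from a fresh connection.

    Hypothetical helper: sqlite3 stands in for MySQLdb here.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "posts.db")

        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE posts (id TEXT, body TEXT)")
        setup.commit()
        setup.close()

        db = sqlite3.connect(path)
        cur = db.cursor()
        # Parameterized insert: the driver handles the embedded quote.
        cur.execute("INSERT INTO posts VALUES (?, ?)", ("p1", "it's a post"))
        if commit:
            db.commit()
        db.close()  # close() does not commit; pending changes are discarded

        check = sqlite3.connect(path)
        count = check.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
        check.close()
        return count


print(count_rows_after(commit=False))  # 0
print(count_rows_after(commit=True))   # 1
```

No error is raised in either case; the only difference is whether commit() ran before the connection closed.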
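Another issue involved properly handling Unicode data. The function below converts every non-ASCII character into an XML numeric character reference and replaces quote characters with character references as well: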
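def encodeForDB(text):
    # Non-ASCII characters become numeric references, e.g. "é" -> "&#233;".
    text = text.encode('ascii', 'xmlcharrefreplace').decode('ascii')
    # Escape single quotes and backticks the same way.
    return text.replace("'", "&#39;").replace("`", "&#96;")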
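To show what ends up in the database and how to get the original text back, here is a small Python 3 sketch. The names encode_for_db and decode_from_db are hypothetical; the decoding side assumes the stored text is only ever turned back into characters with html.unescape from the standard library.

```python
import html


def encode_for_db(text):
    """Hypothetical Python 3 port of encodeForDB: produces ASCII-only text."""
    # Non-ASCII characters become XML numeric character references.
    ascii_text = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
    # Quote characters are replaced with references too.
    return ascii_text.replace("'", "&#39;").replace("`", "&#96;")


def decode_from_db(stored):
    """Turn the stored character references back into the original characters."""
    return html.unescape(stored)


original = "café `raw` 'quoted' \u2603"
stored = encode_for_db(original)
print(stored)                              # caf&#233; &#96;raw&#96; &#39;quoted&#39; &#9731;
print(decode_from_db(stored) == original)  # True
```

One caveat of this scheme: a literal "&" in the input is not escaped, so text that already contains something like "&amp;" will not round-trip exactly.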
The crawler project gave me an opportunity to work with technologies I'm familiar with from other programming languages, but this time in Python: MySQL, low-level sockets, UUIDs, Unicode, file handling, XML, and string parsing. Not bad for just under a week of part-time effort! I'm currently focusing on a Python REST service using JSON over HTTP. More about that next month.
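The post doesn't show the crawler's feed-processing code. As a minimal sketch, assuming standard RSS 2.0 feeds (a channel element containing item elements, each with a title and link), the XML and UUID pieces might fit together like this. parse_feed, SAMPLE_FEED, and the choice of one random UUID per item are all hypothetical.

```python
import uuid
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real crawler would fetch this from a feed URL.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return one dict per RSS <item>, each tagged with a fresh UUID."""
    root = ET.fromstring(xml_text)
    items = []
    # RSS 2.0 nests <item> elements inside <channel>; iter() searches the whole tree.
    for item in root.iter("item"):
        items.append({
            "id": str(uuid.uuid4()),  # hypothetical: a random ID per item
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


for entry in parse_feed(SAMPLE_FEED):
    print(entry["title"], entry["link"])
```

Running it prints "First post http://example.com/1" followed by "Second post http://example.com/2".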
Compute Cluster is up and running
Back in April I began working on a prototype compute cluster to power a specialized computer chess project. I happen to have eight Athlon XP-M 2600+ CPUs running at 1.8 GHz, which are still usable by modern-day standards.
One of my goals was to eliminate anything with moving parts. That meant dropping the 40GB hard drive and CD-ROM as boot devices; I opted for 1GB flash modules instead. At this point the only moving parts are two internal fans. Another goal was to build a custom enclosure for the four-node cluster, thereby reducing its space requirements. I may still pursue this goal, but leaving the machines in their original cases is by far the fastest and least expensive option, even though each machine occupies very little of the space inside its case.
Each machine's flash module contains an installation of Damn Small Linux (DSL). Weighing in at only 50MB, DSL leaves plenty of room for software add-ons. One such add-on is Python 2.5, my lightweight programming weapon of choice.
I've put the chess project on hold in favor of a social networking experiment I'm running at http://www.CocoSci.com. The four stacked machines in the photo above form the initial compute cluster, while the machine on the lower right of the photo performs higher-level functions. I'm also using three VMs at SliceHost.com and another at Godaddy.com. That makes five machines in play today (the higher-level machine plus four VMs), growing to nine once the custom software for the four compute nodes is operational.
Carlos Justiniano: technologist, veteran software developer, world record holder, entrepreneur and author.