Small-time Supercomputing: An idiot’s attempt at running AMICA on a two-lab-computers-cluster using MPI

A timelapse of statistically highly correlated mental events:

*Those ICA decompositions do take their time…*  (After the second time running Infomax ICA in BVA)

*Well, this seems quite a bit faster! And it even gives me an estimate of when it finishes!* (Running AMICA on the same system 4 months later)

*What if I installed EEGLab on the other system and ran AMICA on a second dataset there as well..? Profit!!* (8 days before pre-processing had to be finished)

*No internet or remote management on the second system and data management across two drives is a mess. There has to be another way…*  (2 hours ago)


That’s about where that story ended, on a perfect cliffhanger.

I am currently exploring what EEGLab can do, hoping to introduce it, optimize it and ease its use in my lab in Munich. In the lab we have two quite powerful computers running Windows 7. One displays our visual stimuli, records participants’ responses, and sends these responses to the other, which records the EEG data. Afterwards, only one of these systems is used for preprocessing the data, which is obviously a waste of computational resources.

Having entertained the last mental event in the timelapse above for a couple more weeks now, I felt it would be both educational and useful to learn how to set up systems to cooperate on a single computational task, not least because my upcoming course involving analysis of MRI data will require enough computation time that the current endeavor will hopefully pay for itself eventually.

For our preprocessing of EEG data, one of the most computationally intensive tasks is running an Independent Component Analysis (ICA) algorithm on the data, a very promising (dare I say, broadly accepted?) method for separating statistically independent signals from the mixture that is EEG. In our case, the AMICA algorithm developed at the SCCN Lab by Jason Palmer and colleagues is the weapon of choice. In particular, for applying ICA to our concatenated datasets of single subjects in multiple stages of movement and body postures, AMICA has proved to be the best solution to date.

The AMICA plugin is written for EEGLab, yet its executable is a standalone application that can run in parallel on multiple systems on a network using MPI (the Message Passing Interface), in this case through the MPICH2 implementation. If configured correctly, running the runamica1x.m script in Matlab would harness the power of all machines on the network that are connected via SSH and support MPI. In this case that would take me from 8 threads to 16, a substantial increase, with room for further expansion.
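To make the "one job, two machines" idea a bit more concrete, here is a minimal Python sketch of what launching a program over both lab computers could look like. It assumes the host-file format and the `-f` / `-n` options of MPICH's Hydra launcher; older MPICH2 builds for Windows ship a different process manager, so the options may differ there. The host names, the executable name and the parameter file are made up for illustration, and whether one should start one process per core or one per machine depends on the program and is not something I have settled here.

```python
from typing import Dict, List


def hydra_hostfile(slots_per_host: Dict[str, int]) -> str:
    """Return host-file text with one 'host:slots' line per machine."""
    return "".join(f"{host}:{slots}\n" for host, slots in slots_per_host.items())


def mpiexec_command(
    slots_per_host: Dict[str, int],
    hostfile_path: str,
    program_args: List[str],
) -> List[str]:
    """Build an mpiexec call that uses every slot listed in the host file."""
    total_slots = sum(slots_per_host.values())
    return ["mpiexec", "-f", hostfile_path, "-n", str(total_slots), *program_args]


# Hypothetical lab setup: two machines with 8 slots each, 16 in total.
lab = {"lab-pc-1": 8, "lab-pc-2": 8}

print(hydra_hostfile(lab), end="")
# "./amica_executable" and "input.param" are placeholders, not the real names.
print(" ".join(mpiexec_command(lab, "hosts.txt", ["./amica_executable", "input.param"])))
```

The sketch only builds the host file text and the command line; it does not start anything, so it can be tried safely before the machines are even connected.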

Is MPI the way to go? Say I end up writing my own parallel functions at some point: is MPI the optimal solution then? If I understand this web page correctly, there may already be smarter solutions available, and from my limited experience, low-level programming might be something best left to non-neuropsychologists.

 

So: I want to make maximal use of the available lab PCs for my computational needs. Consequence: I need to set up MPI on the lab computers, as the AMICA executable was written to use this particular message passing interface. Also: I therefore need to know about SSH.

In order to use MPI across machines, one needs to set up an SSH connection between the systems. SSH is a secure communication protocol that prevents others from snooping in. The required software is natively present in Linux (and Unix?) environments, but very recently Microsoft started integrating it into their OS as well. Assuming that anything made by the authors of the OS would ultimately be preferable to other solutions, I chose that option over the well-known PuTTY tool.

Setting this up on Windows 10 (my personal systems) was a hassle, but ultimately only because I did not read closely enough and created a folder named ‘authorized_keys’ in which I put the public key of the other machine. Instead, one should place an extensionless file with exactly that name, containing the contents of the public SSH key generated in the ~user/.ssh folder of the client machine. With that, all ends well: problem solved.
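For anyone about to make the same folder-versus-file mistake, here is a small Python sketch of the intended end state. It assumes the usual OpenSSH layout, in which `authorized_keys` is a plain text file inside the `.ssh` folder of the account being connected to, with one public key per line; Windows may use a different location for administrator accounts, so treat the path as an assumption. The function name and the key string are placeholders, and the demo runs in a temporary folder so it does not touch a real `.ssh` directory.

```python
import tempfile
from pathlib import Path


def install_public_key(public_key_line: str, ssh_dir: Path) -> Path:
    """Add a client's public key to an extensionless authorized_keys FILE."""
    ssh_dir.mkdir(parents=True, exist_ok=True)
    auth_file = ssh_dir / "authorized_keys"

    # The mistake described above: authorized_keys created as a folder.
    if auth_file.is_dir():
        raise IsADirectoryError(f"{auth_file} is a folder, but it must be a file")

    key = public_key_line.strip()
    lines = auth_file.read_text().splitlines() if auth_file.exists() else []
    if key not in lines:  # avoid adding the same key twice
        lines.append(key)
    auth_file.write_text("\n".join(lines) + "\n")
    return auth_file


# Demo in a throwaway folder; the key below is a placeholder, not a real key.
with tempfile.TemporaryDirectory() as tmp:
    placeholder_key = "ssh-ed25519 AAAA-placeholder-key user@client-pc"
    path = install_public_key(placeholder_key, Path(tmp) / ".ssh")
    print(path.name, "is a file:", path.is_file())
```

In practice the key line would be the contents of the `.pub` file generated on the client machine, copied over by whatever means are available.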


 

 

  • To be continued.