High Performance Computing (HPC) resources are nowadays used to push the frontier of human knowledge beyond its current limits in an ever-growing number of scientific fields. However, several constraints may pose insurmountable limits on how far scientific computing applications can be scaled. These constraints include, but are not limited to, the nature of the computational model, the allocated computational resources, development technologies that may not meet expectations, and other technical constraints.
The IBM Shared University Research (SUR) program is a worldwide equipment award program designed to promote research in areas of mutual value and interest to IBM and universities, and to explore new open models of research collaboration with academic institutions. Under the umbrella of this program, one of its awards supports the “CEEMEA Blue Gene Research Collaboration and Community Building” project, which aims at building capacity in developing efficient large-scale scientific computing applications and in overcoming the challenges and constraints such applications face. Within this project, researchers from West University of Timisoara, Romania, Ain Shams University, Egypt, and the IBM Center for Advanced Studies in Cairo teamed up to cluster large multispectral remote sensing images using two HPC platforms, namely a computational cluster and a Blue Gene/P supercomputer.
Fuzzy clustering is one of the most frequently used methods for identifying homogeneous regions in remote sensing images. This project aimed at parallelizing different variants of the Fuzzy c-Means (FCM) algorithm that incorporate spatial information, such as Spatial FCM (SFCM) and Gaussian Kernel-based FCM with spatial bias correction (GKFCM). The high-level requirements that guided the design of the parallel implementations were: (i) find an appropriate partitioning of large images to ensure a balanced load across processors; (ii) use collective computations as much as possible; (iii) reduce the cost of communication between processors. The parallel implementations were evaluated on several test cases, including multispectral images and images with a large number of pixels. The experiments were conducted both on a computational cluster with up to 128 processors and on a Blue Gene/P supercomputer with up to 1024 processors.
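The three requirements can be illustrated with a minimal pure-Python sketch of one plain FCM iteration, without the spatial terms of SFCM or the kernel distance of GKFCM. It assumes the image is flattened into a list of pixel vectors (one value per spectral band), splits it into contiguous blocks whose sizes differ by at most one (one block per simulated process), lets each block compute its memberships and partial centroid sums locally, and then combines the partial sums with a single sum-reduction. The helper names (`balanced_partition`, `local_fcm_step`, `allreduce_sum`, `parallel_fcm_iteration`) are hypothetical, the processes are simulated sequentially, and `allreduce_sum` merely stands in for a collective such as MPI_Allreduce; this is not the project's code.

```python
import random

# Hypothetical helpers illustrating a data-parallel FCM iteration. Processes are simulated
# sequentially; allreduce_sum stands in for a collective such as MPI_Allreduce.


def balanced_partition(n_pixels, n_procs):
    """Split indices 0..n_pixels-1 into n_procs contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n_pixels, n_procs)
    blocks, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def sq_dist(x, v):
    """Squared Euclidean distance between a pixel and a centroid over all spectral bands."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def local_fcm_step(pixels, centroids, m):
    """Compute memberships for one block and return its partial centroid sums."""
    c, bands = len(centroids), len(centroids[0])
    num = [[0.0] * bands for _ in range(c)]  # sum_i u_ik^m * x_i per cluster
    den = [0.0] * c                          # sum_i u_ik^m per cluster
    for x in pixels:
        # Guard against a zero distance when a pixel coincides with a centroid.
        d = [max(sq_dist(x, v), 1e-12) for v in centroids]
        for k in range(c):
            # Standard FCM membership written with squared distances:
            # u_k = 1 / sum_j (d_k / d_j)^(1 / (m - 1)).
            u = 1.0 / sum((d[k] / d[j]) ** (1.0 / (m - 1.0)) for j in range(c))
            w = u ** m
            den[k] += w
            for band in range(bands):
                num[k][band] += w * x[band]
    return num, den


def allreduce_sum(parts):
    """Element-wise sum of every block's partial sums (stand-in for an MPI sum-reduction)."""
    nums = [p[0] for p in parts]
    dens = [p[1] for p in parts]
    c, bands = len(dens[0]), len(nums[0][0])
    num = [[sum(n[k][band] for n in nums) for band in range(bands)] for k in range(c)]
    den = [sum(d[k] for d in dens) for k in range(c)]
    return num, den


def parallel_fcm_iteration(image, centroids, n_procs, m=2.0):
    """One FCM iteration over a flattened image split across n_procs simulated processes."""
    parts = [local_fcm_step(image[lo:hi], centroids, m)
             for lo, hi in balanced_partition(len(image), n_procs)]
    num, den = allreduce_sum(parts)
    # New centroid v_k = sum_i u_ik^m x_i / sum_i u_ik^m, computed from the reduced sums.
    return [[num[k][band] / den[k] for band in range(len(num[k]))] for k in range(len(den))]


if __name__ == "__main__":
    random.seed(0)
    # Synthetic 1000-pixel image with 3 spectral bands (not project data).
    image = [[random.random() for _ in range(3)] for _ in range(1000)]
    centroids = [[0.2] * 3, [0.5] * 3, [0.8] * 3]
    for _ in range(10):
        centroids = parallel_fcm_iteration(image, centroids, n_procs=4)
    print(centroids)
```

Because only the c×b numerators and c denominators are reduced, the communication volume per iteration in this sketch does not depend on the number of pixels, and the result matches a single-process run up to floating-point rounding. The spatial variants would additionally need neighbouring rows (a halo) from adjacent blocks, which the sketch omits.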
Overall, good scalability was obtained with respect to both the number of clusters and the number of spectral bands. The speedup comparisons revealed that the parallel implementation on Blue Gene/P performs better on large images with a small number of spectral bands than on smaller images with a large number of spectral bands. This can be explained by the fact that spatial partitioning brings more benefits for images containing a large number of pixels. Moreover, for multispectral images all operations involving distance and centroid computations are vector-based, so these operations require particular care when optimizing. Further optimization opportunities remain for the Blue Gene/P, and the research work is continuing with partial support from the European Commission FP7 project HP-SEE.
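A rough per-iteration cost model makes the pixel-count argument concrete. It assumes a plain FCM iteration followed by one tree-based reduction of the centroid partial sums, and it ignores the halo exchange that the spatial variants need at partition boundaries. The symbols N, b, c, p (pixels, spectral bands, clusters, processes) and the machine constants α (time per arithmetic operation), β (per-step latency) and γ (time per reduced value) are notation introduced for this sketch, not quantities reported by the project.

```latex
T_{\text{comp}} \approx \alpha \, \frac{N \, c \, b}{p},
\qquad
T_{\text{comm}} \approx \bigl(\beta + \gamma \, c \, (b+1)\bigr) \log_2 p,
\qquad
\frac{T_{\text{comp}}}{T_{\text{comm}}} \approx \frac{\alpha \, N \, c \, b}{p \, \bigl(\beta + \gamma \, c \, (b+1)\bigr) \log_2 p}.
```

Once γ c (b+1) dominates the latency term β, the ratio tends to α N / (γ p log₂ p): adding bands then increases computation and reduced data volume alike, whereas adding pixels increases only computation. Under this simplified model, large images amortize the collective communication better than smaller images with many bands.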
West University of Timisoara, Romania
Ain Shams University, Egypt
IBM Center for Advanced Studies in Cairo, IBM Egypt Branch