Authors: Hayot-Sasson V, Glatard T
Neuroimaging open-data initiatives have led to increased availability of large scientific datasets. While these datasets are shifting the processing bottleneck from compute-intensive to data-intensive, current standardized analysis tools have yet to adopt strategies that mitigate the costs associated with large data transfers. A major challenge in adapting neuroimaging applications for data-intensive processing is that they must be entirely rewritten. To facilitate data management for standardized neuroimaging tools, we developed Sea, a library that intercepts and redirects application read and write calls to minimize data transfer time. In this paper, we investigate the performance of Sea on three preprocessing pipelines applied to three different neuroimaging datasets on two high-performance computing clusters. Our results demonstrate that Sea provides large speedups (up to 32×) when the shared file system's performance is degraded. When the shared file system is not overburdened by other users, performance is unaffected by Sea, suggesting that Sea's overhead is minimal even in cases where its benefits are limited.
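The core idea behind intercepting and redirecting read and write calls can be sketched as a path-rewriting rule: calls that target the shared file system are transparently rerouted to fast node-local storage. The sketch below is a minimal illustration of that general technique, not Sea's implementation; the mount-point prefixes and the `redirect` function are hypothetical assumptions for the example.

```python
import os

# Hypothetical mount points (assumptions for illustration, not Sea's configuration):
SHARED_PREFIX = "/lustre/project"  # slow shared file system
LOCAL_PREFIX = "/dev/shm/sea"      # fast node-local storage (e.g., tmpfs)

def redirect(path: str) -> str:
    """Rewrite a shared-file-system path to its node-local equivalent.

    An interception layer applies a mapping like this to each file
    access so the application reads and writes fast local storage
    instead of the shared file system, without being rewritten.
    """
    norm = os.path.normpath(path)
    if norm == SHARED_PREFIX or norm.startswith(SHARED_PREFIX + os.sep):
        return LOCAL_PREFIX + norm[len(SHARED_PREFIX):]
    return norm  # paths outside the shared mount are left untouched

# Example: a pipeline opening an input image on the shared file system
# would instead access the node-local copy.
print(redirect("/lustre/project/sub-01/anat.nii.gz"))
```

In practice such interception is typically done at the system-call or libc level (e.g., via library preloading), so applications need no source changes; data are then staged between local and shared storage outside the application's critical path.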
Keywords: Data management; High-performance computing; Neuroimaging
PubMed: https://pubmed.ncbi.nlm.nih.gov/41432812/
DOI: 10.1007/s12021-025-09760-3