A Mixed-Precision GMRES Acceleration Algorithm for Large Sparse Matrices in Fluid Dynamics Simulation

ZHENG Senwei; KOU Jiaqing; ZHANG Weiwei

doi:10.21656/1000-0887.450167

Volume 46 Issue 1

Jan. 2025


Citation: ZHENG Senwei, KOU Jiaqing, ZHANG Weiwei. A Mixed-Precision GMRES Acceleration Algorithm for Large Sparse Matrices in Fluid Dynamics Simulation[J]. Applied Mathematics and Mechanics, 2025, 46(1): 40-54. doi: 10.21656/1000-0887.450167


ZHENG Senwei^{1}, KOU Jiaqing^{1,2,3}, ZHANG Weiwei^{1,2,3}

1. School of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, P.R.China;
2. International Joint Institute of Artificial Intelligence on Fluid Mechanics, Northwestern Polytechnical University, Xi’an 710072, P.R.China;
3. National Key Laboratory of Aircraft Configuration Design, Xi’an 710072, P.R.China

Received Date: 2024-06-05
Revision Received Date: 2024-07-10

Abstract

Owing to their low power consumption and high efficiency, GPUs, TPUs and NPUs with single- and half-precision computing units have become the main computing platforms for artificial intelligence. However, they cannot be applied directly to the solution of differential equations that require high floating-point accuracy, nor can they directly replace double-precision units. To combine the advantages of single and double precision, a mixed-precision solution scheme balancing efficiency and accuracy was proposed for large sparse systems of linear equations, and a sparse GMRES-IR (GMRES-based iterative refinement) algorithm for large sparse matrices was developed. First, the characteristics of matrix data distributions in fluid dynamics simulation problems were analyzed. Then, with double precision used for pre-processing and single precision for the detailed iterations, single-precision arithmetic was applied to the most time-consuming part of the algorithm to enhance computational efficiency. Solutions of 33 linear equation systems from open-source datasets validated the accuracy and efficiency of the proposed method. The results show that, on a single CPU core and under the same accuracy requirements, the proposed mixed-precision algorithm achieves a speedup of up to 2.5 times, and the effect is more pronounced for large-scale matrices.
Key words:
- mixed-precision
- computational fluid dynamics
- linear equations
- GMRES
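
The abstract gives only an outline of the scheme, so the following is a minimal sketch of the general idea of GMRES-based iterative refinement, not the authors' implementation. It assumes that the residual and the solution update are kept in double precision while the inner GMRES cycle (the time-consuming part) runs in single precision. The function names `gmres_fp32` and `gmres_ir`, the tolerances, the residual scaling before the cast to single precision, and the small dense test matrix are all illustrative assumptions; a practical code would store the matrix in a sparse format such as CSR.

```python
import numpy as np


def gmres_fp32(A32, r32, restart=30, inner_tol=1e-5):
    """Approximately solve A32 @ d = r32 with one GMRES cycle in float32.

    Hypothetical helper: zero initial guess, modified Gram-Schmidt Arnoldi.
    """
    n = r32.shape[0]
    m = min(restart, n)
    beta = np.linalg.norm(r32)
    if beta == 0.0:
        return np.zeros(n, dtype=np.float32)
    V = np.zeros((m + 1, n), dtype=np.float32)  # Krylov basis, one vector per row
    H = np.zeros((m + 1, m), dtype=np.float32)  # upper Hessenberg matrix
    V[0] = r32 / beta
    y = np.zeros(0, dtype=np.float32)
    for k in range(m):
        w = A32 @ V[k]                          # float32 matrix-vector product
        for j in range(k + 1):                  # modified Gram-Schmidt
            H[j, k] = np.dot(V[j], w)
            w = w - H[j, k] * V[j]
        H[k + 1, k] = np.linalg.norm(w)
        # Small least-squares problem: minimize ||beta * e1 - H y||
        e1 = np.zeros(k + 2, dtype=np.float32)
        e1[0] = beta
        Hk = H[: k + 2, : k + 1]
        y = np.linalg.lstsq(Hk, e1, rcond=None)[0]
        res = np.linalg.norm(e1 - Hk @ y)
        if res <= inner_tol * beta or H[k + 1, k] == 0.0:
            break
        V[k + 1] = w / H[k + 1, k]
    # Combine the basis vectors that were actually built
    return (V[: y.shape[0]].T @ y).astype(np.float32)


def gmres_ir(A, b, tol=1e-12, max_outer=50, restart=30):
    """Iterative refinement: float64 residuals, float32 GMRES corrections."""
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    A32 = A64.astype(np.float32)                # low-precision copy, made once
    x = np.zeros_like(b64)
    b_norm = np.linalg.norm(b64)
    if b_norm == 0.0:
        return x, 0, 0.0
    for outer in range(max_outer + 1):
        r = b64 - A64 @ x                       # residual in float64
        rel_res = np.linalg.norm(r) / b_norm
        if rel_res <= tol or outer == max_outer:
            return x, outer, rel_res
        scale = np.abs(r).max()                 # keep the float32 right-hand side O(1)
        d = gmres_fp32(A32, (r / scale).astype(np.float32), restart)
        x = x + scale * d.astype(np.float64)    # update accumulated in float64


# Illustrative test problem (an assumption, not from the paper):
# a diagonally dominant nonsymmetric tridiagonal matrix.
n = 200
A = 4.0 * np.eye(n) - 1.25 * np.eye(n, k=-1) - 0.75 * np.eye(n, k=1)
b = A @ np.ones(n)
x, n_outer, rel_res = gmres_ir(A, b)
print(f"outer iterations: {n_outer}, relative residual: {rel_res:.2e}")
```

In this sketch the outer loop alone decides convergence, so the inner single-precision solve only needs to reduce the residual by a modest factor per cycle. Whether such a scheme pays off in practice depends on the matrix and hardware, which is what the paper's 33 test systems are meant to assess.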


