A Mixed-Precision GMRES Acceleration Algorithm for Large Sparse Matrices in Fluid Dynamics Simulation

ZHENG Senwei, KOU Jiaqing, ZHANG Weiwei

Citation: ZHENG Senwei, KOU Jiaqing, ZHANG Weiwei. A Mixed-Precision GMRES Acceleration Algorithm for Large Sparse Matrices in Fluid Dynamics Simulation[J]. Applied Mathematics and Mechanics, 2025, 46(1): 40-54. doi: 10.21656/1000-0887.450167


doi: 10.21656/1000-0887.450167
    Author information:

    ZHENG Senwei (born 2001), male, master's student (E-mail: senweiz@mail.nwpu.edu.cn); KOU Jiaqing (born 1993), male, professor, doctoral supervisor (E-mail: jqkou@nwpu.edu.cn); ZHANG Weiwei (born 1979), male, professor, doctoral supervisor (corresponding author. E-mail: aeroelastic@nwpu.edu.cn).

  • CLC number: O35


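  • Abstract: Owing to their low energy consumption and high efficiency, GPUs, TPUs, and NPUs built mainly around single- and half-precision compute units have become the dominant hardware for artificial intelligence workloads. They cannot, however, be applied directly to solving differential equations, which demand high floating-point precision, and therefore cannot directly replace double-precision computing power. By combining the respective strengths of single and double precision, this paper proposes a mixed-precision scheme for large sparse linear systems that balances efficiency and accuracy, and develops a GMRES-based iterative refinement algorithm for large sparse matrices (sparse GMRES-IR). The distribution of matrix data in fluid dynamics simulation problems is analyzed first. Preconditioning is then carried out in double precision and the refinement iterations in single precision, so that single-precision arithmetic covers the most time-consuming part of the algorithm and its efficiency advantage is realized. The accuracy and efficiency of the proposed method are verified by solving 33 linear systems from an open-source dataset. The results show that, on a single CPU core and under the same accuracy requirement, the proposed single/double mixed-precision algorithm achieves a speedup of up to 2.5 times, with more pronounced gains for large matrices.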
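
The mixed-precision idea summarized in the abstract can be illustrated with a minimal sketch of GMRES-based iterative refinement. This is not the authors' sparse GMRES-IR: it assumes a small dense matrix, omits the double-precision preconditioning step entirely, and uses a plain unrestarted GMRES. The function names `gmres_fp32` and `gmres_ir`, the tolerances, and the residual scaling are hypothetical choices made only for illustration. What it shows is the division of labor: the residual and the solution update are kept in float64, while the correction equation, which is the expensive part, is solved in float32.

```python
import numpy as np

def gmres_fp32(A32, b32, m=30, tol=1e-4):
    """Unrestarted GMRES from a zero initial guess, carried out in float32.
    Illustrative only: the small least-squares problem is re-solved with
    lstsq at every step instead of being updated with Givens rotations."""
    n = b32.shape[0]
    beta = np.linalg.norm(b32)
    if beta == 0.0:
        return np.zeros(n, dtype=np.float32)
    m = min(m, n)
    V = np.zeros((n, m + 1), dtype=np.float32)  # Arnoldi basis vectors
    H = np.zeros((m + 1, m), dtype=np.float32)  # upper Hessenberg matrix
    V[:, 0] = b32 / beta
    for j in range(m):
        w = A32 @ V[:, j]
        for i in range(j + 1):                  # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        k = j + 1
        e1 = np.zeros(k + 1, dtype=np.float32)
        e1[0] = beta
        # minimize ||beta * e1 - H y|| over the current Krylov subspace
        y, *_ = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)
        res = np.linalg.norm(e1 - H[:k + 1, :k] @ y)
        if res <= tol * beta or H[j + 1, j] == 0.0:
            break
        V[:, j + 1] = w / H[j + 1, j]
    return (V[:, :k] @ y).astype(np.float32)

def gmres_ir(A, b, tol=1e-12, max_outer=20):
    """Iterative refinement: float64 residual and update, float32 inner solve.
    Assumption: no preconditioner, unlike the scheme described in the abstract."""
    A32 = A.astype(np.float32)                  # low-precision copy for the inner solver
    x = np.zeros(b.shape[0], dtype=np.float64)
    bnorm = np.linalg.norm(b)
    for _ in range(max_outer):
        r = b - A @ x                           # residual in double precision
        if np.linalg.norm(r) <= tol * bnorm:
            break
        scale = np.linalg.norm(r, np.inf)       # keep the float32 right-hand side well scaled
        d = gmres_fp32(A32, (r / scale).astype(np.float32))
        x = x + scale * d.astype(np.float64)    # update in double precision
    relres = np.linalg.norm(b - A @ x) / bnorm if bnorm > 0 else 0.0
    return x, relres
```

Calling `gmres_ir(A, b)` with a float64 matrix `A` and right-hand side `b` returns the refined solution and its final relative residual measured in float64. The inner tolerance is deliberately loose, because each outer step only needs to reduce the error by a modest factor; the double-precision residual computation is what lets the outer loop reach an accuracy that a float32-only solve cannot.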
Publication history
  • Received: 2024-06-05
  • Revised: 2024-07-10
