
An Efficient Private GPT Never Autoregressively Decodes

  • Zhengyi Li
  • Yue Guan
  • Kang Yang*
  • Yu Feng
  • Ning Liu
  • Yu Yu*
  • Jingwen Leng*
  • Minyi Guo
  • *Corresponding author for this work
  • Shanghai Jiao Tong University
  • Shanghai Qi Zhi Institute
  • State Key Laboratory of Cryptology

Research output: Contribution to journal › Conference article › peer-review

Abstract

The wide deployment of the generative pre-trained transformer (GPT) has raised privacy concerns for both clients and servers. While cryptographic primitives can be employed for secure GPT inference to protect the privacy of both parties, they introduce considerable performance overhead. To accelerate secure inference, this study proposes a public decoding and secure verification approach that utilizes public GPT models, motivated by the observation that securely decoding one token and securely decoding multiple tokens incur similar latency. The client uses the public model to generate a set of tokens, which the private model then securely verifies for acceptance. The efficiency of our approach depends on the acceptance ratio of the tokens proposed by the public model, which we improve in two ways: (1) a private sampling protocol optimized for cryptographic primitives and (2) model alignment using knowledge distillation. Our approach improves the efficiency of secure decoding while maintaining the same level of privacy and generation quality as standard secure decoding. Experiments demonstrate a 2.1× to 6.0× speedup over standard decoding across three pairs of public-private models and different network conditions.
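The propose-then-verify loop described in the abstract can be illustrated with a minimal plaintext sketch. This is not the authors' protocol: it contains no cryptography, it uses greedy exact-match acceptance rather than the paper's private sampling protocol, and the names `propose`, `verify`, `generate`, `toy_private`, and `toy_public` are hypothetical. It only shows why a high acceptance ratio reduces the number of expensive verification rounds.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a "model" is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def propose(public_model: NextToken, prefix: List[int], k: int) -> List[int]:
    """Draft k tokens autoregressively with the public model (cheap, in the clear)."""
    draft: List[int] = []
    for _ in range(k):
        draft.append(public_model(prefix + draft))
    return draft


def verify(private_model: NextToken, prefix: List[int], draft: List[int]) -> List[int]:
    """Accept the longest draft prefix the private model agrees with, plus one token.

    In a secure deployment this step would be one batched pass under cryptographic
    protocols; here it is simulated in plaintext, position by position.
    """
    accepted: List[int] = []
    for tok in draft:
        target = private_model(prefix + accepted)
        if tok != target:
            accepted.append(target)  # the private model's token replaces the mismatch
            return accepted
        accepted.append(tok)
    # Every draft token matched, so the private model contributes one extra token.
    accepted.append(private_model(prefix + accepted))
    return accepted


def generate(public_model: NextToken, private_model: NextToken,
             prompt: List[int], max_new_tokens: int, k: int = 4) -> Tuple[List[int], int]:
    """Return the generated sequence and the number of verification rounds used."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new_tokens:
        draft = propose(public_model, out, k)
        out.extend(verify(private_model, out, draft))
        rounds += 1
    return out[: len(prompt) + max_new_tokens], rounds


def baseline(private_model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Standard decoding: one private-model call (one round) per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(private_model(out))
    return out


# Toy stand-ins for the two models (purely illustrative).
def toy_private(seq: Sequence[int]) -> int:
    return (seq[-1] + 1) % 10


def toy_public(seq: Sequence[int]) -> int:
    # Agrees with the private model except after token 4, where it guesses wrong.
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


tokens, rounds = generate(toy_public, toy_private, prompt=[3], max_new_tokens=10, k=4)
```

With these toy models the sequence produced by `generate` is identical to what `baseline` produces, but it needs fewer verification rounds than generated tokens. Under the abstract's observation that verifying several tokens costs about as much as securely decoding one, the round count is the quantity that drives the speedup.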

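Original language: English
Pages (from-to): 34410-34428
Number of pages: 19
Journal: Proceedings of Machine Learning Research
Volume: 267
Publication status: Published - 2025
Externally published: Yes
Event: 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: 13 Jul 2025 to 19 Jul 2025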
