When Gradient Descent Meets Derivative-Free Optimization: A Match Made in Black-Box Scenario

  • Chengcheng Han
  • Liqing Cui
  • Renyu Zhu
  • Jianing Wang
  • Nuo Chen
  • Qiushi Sun
  • Xiang Li
  • Ming Gao*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Large pre-trained language models (PLMs) have garnered significant attention for their versatility and potential to solve a wide spectrum of natural language processing (NLP) tasks. However, the cost of running these PLMs may be prohibitive. Furthermore, some PLMs, such as GPT-3, are not open-sourced due to commercial considerations and potential risks of misuse, so their parameters and gradients are unavailable. To address this issue, black-box tuning has been proposed, which uses derivative-free optimization (DFO) instead of gradient descent to train task-specific continuous prompts. However, these gradient-free methods still exhibit a significant performance gap compared to gradient-based methods. In this paper, we introduce gradient descent into the black-box tuning scenario through knowledge distillation. Furthermore, we propose GDFO, a novel method that integrates gradient descent and derivative-free optimization to optimize task-specific continuous prompts in a harmonized manner. Experimental results show that GDFO achieves significant performance gains over previous state-of-the-art methods.
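The abstract describes GDFO only at a high level, so the following is a minimal, hypothetical sketch of the general idea rather than the authors' implementation: a derivative-free optimizer queries the black-box model for scalar losses, the observed (prompt, loss) pairs are distilled into a small differentiable surrogate, and gradient descent on that surrogate proposes prompt updates that the black box then verifies. The black-box stand-in, the surrogate architecture, and all names and hyperparameters (black_box_loss, surrogate, step sizes) are assumptions made for illustration; the paper's actual distillation setup may differ.

```python
# Illustrative sketch only (NOT the GDFO implementation): combines a simple
# derivative-free optimizer with gradient descent on a distilled surrogate
# of the black-box loss. All components here are assumptions.
import torch

DIM = 16                      # dimensionality of the continuous prompt
torch.manual_seed(0)

# Stand-in for the black-box PLM API: returns only a scalar loss, no gradients.
_target = torch.randn(DIM)    # hidden optimum, unknown to the optimizer
def black_box_loss(prompt: torch.Tensor) -> float:
    with torch.no_grad():
        return ((prompt - _target) ** 2).sum().item()

# Differentiable surrogate of the black-box loss, trained by distilling
# (prompt, loss) pairs observed through the API.
surrogate = torch.nn.Sequential(
    torch.nn.Linear(DIM, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
)
surr_opt = torch.optim.Adam(surrogate.parameters(), lr=1e-2)

prompt = torch.zeros(DIM)
best_loss = black_box_loss(prompt)
history = [(prompt.clone(), best_loss)]

for step in range(300):
    # (1) Derivative-free step: Gaussian perturbation, kept if it improves.
    cand = prompt + 0.1 * torch.randn(DIM)
    cand_loss = black_box_loss(cand)
    history.append((cand.clone(), cand_loss))
    if cand_loss < best_loss:
        prompt, best_loss = cand, cand_loss

    # (2) Distill recent observations into the surrogate (gradient descent).
    xs = torch.stack([p for p, _ in history[-64:]])
    ys = torch.tensor([l for _, l in history[-64:]]).unsqueeze(1)
    surr_opt.zero_grad()
    torch.nn.functional.mse_loss(surrogate(xs), ys).backward()
    surr_opt.step()

    # (3) Gradient step on the prompt through the surrogate, accepted only
    #     if the true black box confirms the improvement.
    p = prompt.clone().requires_grad_(True)
    surrogate(p.unsqueeze(0)).sum().backward()
    cand = (prompt - 0.05 * p.grad).detach()
    cand_loss = black_box_loss(cand)
    history.append((cand.clone(), cand_loss))
    if cand_loss < best_loss:
        prompt, best_loss = cand, cand_loss

print(f"final black-box loss: {best_loss:.4f}")
```

In this sketch the two optimizers cooperate: the DFO step supplies exploration and fresh training data for the surrogate, while the surrogate's gradient step supplies directed updates, with every candidate validated against the true black-box loss before acceptance.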

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics, ACL 2023
Publisher: Association for Computational Linguistics (ACL)
Pages: 868-880
Number of pages: 13
ISBN (Electronic): 9781959429623
DOIs
State: Published - 2023
Event: Findings of the Association for Computational Linguistics, ACL 2023 - Toronto, Canada
Duration: 9 Jul 2023 - 14 Jul 2023

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: Findings of the Association for Computational Linguistics, ACL 2023
Country/Territory: Canada
City: Toronto
Period: 9/07/23 - 14/07/23
