TY - JOUR
T1 - 大小模型端云协同进化技术进展 (advances in edge-cloud collaborative evolution techniques for large and small models)
AU - Wang, Yongwei
AU - Shen, Tao
AU - Zhang, Shengyu
AU - Wu, Fan
AU - Zhao, Zhou
AU - Cai, Haibin
AU - Lyu, Chengfei
AU - Ma, Lizhuang
AU - Yang, Chenglei
AU - Wu, Fei
N1 - Publisher Copyright:
© 2024 Editorial and Publishing Board of JIG. All rights reserved.
PY - 2024/6
Y1 - 2024/6
N2 - Generative foundation models are facilitating significant transformations in the field of artificial intelligence. They demonstrate general artificial intelligence in diverse research fields, including natural language processing, multimodal content understanding, and image and multimodal content synthesis. Generative foundation models often consist of billions or even hundreds of billions of parameters. Thus, they are typically deployed on the cloud side to provide powerful and general intelligent services. However, this type of service faces crucial challenges in practice, such as high latency induced by communication between the cloud and local devices, and insufficient personalization because servers often cannot access local data owing to privacy concerns. By contrast, low-complexity lightweight models are located at the edge side to capture personalized and dynamic scenario data, but they may suffer from poor generalization. Large and lightweight (or large-small) model collaboration aims to integrate the general intelligence of large foundation models and the personalized intelligence of small lightweight models. This integration empowers downstream vertical domain-specific applications through the interaction and collaboration of both types of intelligent models. Large-small model collaboration has recently attracted increasing attention, has become a focus of research and development in academia and industry, and has been predicted to be an important technology trend. We therefore investigate this area thoroughly, highlighting recent progress and offering potential inspiration for related research. In this study, we first provide an overview of representative large language models (LLMs) and large multimodal models. We focus on their mainstream Transformer-based model architectures, including encoder-only, decoder-only, and encoder-decoder models. 
Corresponding pre-training techniques, such as next sentence prediction, sequence-to-sequence modeling, and contrastive learning, as well as parameter-efficient fine-tuning methods, such as low-rank adaptation and prompt tuning, are also explored. We then review the development history and the latest advancements of model compression techniques, including model pruning, model quantization, and knowledge distillation in the era of foundation models. Based on differences in model collaboration purposes and mechanisms, we propose a new classification method and taxonomy for large-small model collaboration, namely, collaborative training, collaborative inference, and collaborative planning. Specifically, we summarize recent and representative methods, consisting of bidirectional knowledge distillation between large models on the cloud side and small models deployed on the edge side, modular designs of intelligent models that split functional modules between the cloud and edge, and generative agents that collaborate to complete complex tasks in an autonomous and intelligent manner. In collaborative training, a main challenge is dealing with heterogeneity in data distributions and model architectures between the cloud and client sides. Data privacy may also be a concern during collaborative training, particularly in privacy-sensitive cases. Despite much progress in collaborative inference, automatically decomposing and completing a complicated task in a collective manner remains challenging. Furthermore, the communication cost between computing facilities can be another concern. Collaborative planning is a new paradigm that has gained attention with the increasing study and promising progress of LLM-centric agents (LLM agents). This paradigm often involves multiple LLM agents that compete or cooperate to complete a challenging task. 
It often leverages emerging capabilities of LLMs, such as in-context learning and chain-of-thought reasoning, to automatically divide a complicated task into several subtasks. By completing and assembling the different subtasks, the global task can be accomplished in a collaborative manner. This scheme finds diverse applications, such as developing games and simulating societies. However, it may suffer from drawbacks inherent in LLMs, including hallucination and adversarial vulnerabilities. Thus, more robust and reliable collaborative planning schemes remain to be investigated. In summary, this work surveys large-small model collaboration techniques from the perspectives of generative foundation models, model compression, and heterogeneous model collaboration via LLM agents. This work also compares the advantages and disadvantages of international and domestic technology developments in this research realm. We conclude that, although the gaps between domestic and advanced international studies are narrowing in this area, particularly for newly emerging LLM agents, original and major breakthroughs may still be lacking. Certain notable advantages of domestic progress are closely related to industrial applications owing to rich industrial data resources; thus, the development of domain-specific LLMs is comparatively advanced. In addition, this study envisions the applications of large-small model collaboration and discusses key challenges and promising directions on this topic. 1) The design of efficient model architectures includes developing new architectures that achieve low inference complexity while retaining the efficient long-sequence modeling ability of Transformers, and further improving the scalability of mixture-of-experts-based architectures. 2) Current model compression methods are mainly designed for vision models. 
Thus, developing techniques specifically for LLMs and large multimodal models is important to preserve their emergent abilities during compression. 3) Existing personalization methods mainly focus on discriminative models, and due attention needs to be paid to efficient personalization of generative foundation models. 4) Generative intelligence often suffers from fraudulent content (e.g., generated fake imagery, deepfake videos, and fake news) and various types of attacks (e.g., adversarial attacks, jailbreaking attacks, and Byzantine attacks). Thus, security and trustworthiness issues arise in practical applications. This study therefore advocates deeper investigation of these emerging security threats and the development of effective defenses to counter these crucial issues during large-small model collaboration, thereby empowering vertical domains more safely.
KW - edge-cloud collaboration
KW - generative AI
KW - generative agents
KW - generative foundation models
KW - large-small model collaboration
KW - model compression
UR - https://www.scopus.com/pages/publications/85196896640
U2 - 10.11834/jig.240011
DO - 10.11834/jig.240011
M3 - Article
AN - SCOPUS:85196896640
SN - 1006-8961
VL - 29
SP - 1510
EP - 1534
JO - Journal of Image and Graphics
JF - Journal of Image and Graphics
IS - 6
ER -