FRANTAR E, ASHKBOOS S, HOEFLER T, et al. GPTQ: accurate post-training quantization for generative pre-trained transformers [J]. arXiv preprint, arXiv: 2210.17323, 2022.
LIN J, TANG J M, TANG H T, et al. AWQ: activation-aware weight quantization for on-device LLM compression and acceleration [J]. GetMobile Mob Comput Commun, 2023, 28: 12-17.
DEANGELO G, BATABYAL A A, KUMAR S. An analysis of economic cost minimization and biological invasion damage control using the AWQ criterion [J]. The Annals of Regional Science, 2007, 41(3): 639-655.
RAJPUT S, SHARMA T. Benchmarking emerging deep learning quantization methods for energy efficiency [C]// 2024 IEEE 21st International Conference on Software Architecture Companion (ICSA-C). Piscataway: IEEE Press, 2024: 238-242.
SAFAVI-NAINI S A A, ALI S, SHAHAB O, et al. Vision-language and large language model performance in gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and quantized models [J]. arXiv preprint, arXiv: 2409.00084, 2024.