Large language models (such as ChatGPT) have billions of trainable parameters. LoRA (Low-Rank Adaptation) fine-tunes large language models by freezing the original weights and introducing a small set of trainable parameters, enabling efficient adaptation to specific tasks.1
Reducing the number of trainable parameters lowers the computation and memory costs of fine-tuning. Other model adaptation techniques achieve similar savings at the expense of added inference latency, making model responses slower. LoRA instead applies low-rank updates to the weights of selected layers without modifying the full weight matrices, and because the learned update can be merged back into the frozen weights after training, it adds no inference latency.
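A minimal PyTorch sketch of the idea: a frozen linear layer is augmented with a trainable low-rank update B @ A, scaled by alpha / r as in the LoRA paper. The LoRALinear wrapper, the rank r, and the alpha value are illustrative choices, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update W + (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the original weights
        # A is (r, in_features), B is (out_features, r), so B @ A matches base.weight.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the low-rank update; only A and B receive gradients.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

    @torch.no_grad()
    def merge(self) -> None:
        # Fold the learned update into the frozen weight, so inference
        # uses a single matmul and incurs no extra latency.
        self.base.weight += self.scale * (self.B @ self.A)
```

Wrapping a layer with `LoRALinear(nn.Linear(4096, 4096))` leaves the 4096 x 4096 weight frozen; only the small factors A and B are trained.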
Multiple LoRA adapters can be trained for different purposes and swapped into the same base model to change its specialization, as sketched below. Beyond natural language processing, LoRA's approach to efficient fine-tuning holds potential for applications in fields like computer vision and graph neural networks. Because a low-rank update is less expressive than full fine-tuning, the technique may have difficulty capturing highly specific tasks and patterns.
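Since a trained adapter is nothing more than the pair of small factors (A, B), changing specialization amounts to loading a different pair into the same frozen base. A sketch building on the LoRALinear wrapper above; the checkpoint file names are hypothetical.

```python
def load_adapter(layer: LoRALinear, path: str) -> None:
    """Swap in a specialization by replacing only the low-rank factors."""
    state = torch.load(path)  # e.g. {"A": tensor (r, in), "B": tensor (out, r)}
    with torch.no_grad():
        layer.A.copy_(state["A"])
        layer.B.copy_(state["B"])

layer = LoRALinear(nn.Linear(4096, 4096))
load_adapter(layer, "summarization_adapter.pt")  # hypothetical checkpoint
# ... run summarization ...
load_adapter(layer, "translation_adapter.pt")    # same frozen base, new specialization
```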
By contrast, the major downside of full fine-tuning is that each adapted model contains as many parameters as the original, so every specialization requires storing a complete copy of the model.
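To make the size difference concrete, a back-of-the-envelope count for a single 4096 x 4096 projection matrix at rank r = 8 (dimensions chosen only for illustration):

```python
d, r = 4096, 8
full_ft = d * d          # 16,777,216 trainable weights if the matrix is fine-tuned directly
lora = r * d + d * r     # 65,536 trainable weights for the factors A (r x d) and B (d x r)
print(f"full: {full_ft:,}  lora: {lora:,}  reduction: {full_ft // lora}x")
# full: 16,777,216  lora: 65,536  reduction: 256x
```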
Related Methods
Pre-LoRA (before 2021): adapter modules, which insert trainable layers and add inference latency, and prompt-based fine-tuning such as prefix tuning, which is difficult to optimize.
Post-LoRA: IA3 (2022), which rescales intermediate activations with learned vectors, and OFT/BOFT (2023), which adapt weights through learned orthogonal transformations.
1 Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-rank adaptation of large language models. arXiv. https://arxiv.org/abs/2106.09685