An Intelligence Augmentation (IA)-Driven Hybrid Framework for Improved Elastic Weight Consolidation and Dynamic Knowledge Distillation
Abstract
To address the high computational complexity of traditional elastic weight consolidation (EWC), the weak generalization ability of AdaDistill, and poor adaptability to non-IID data in continual learning, an improved hybrid framework driven by Intelligence Augmentation (IA) is proposed. The framework optimizes the parameter-protection strategy of EWC through a meta-learning adaptive mechanism, introduces reinforcement-learning-based dynamic regulation to improve the knowledge-transfer efficiency of AdaDistill, and designs an adaptive weight-fusion module to achieve collaborative optimization between the two. Specifically, in the EWC module, an online sparse Fisher information estimation method is proposed, reducing the computational complexity from O(K×N²) to O(L×N). In the AdaDistill module, a multi-teacher collaborative distillation and dynamic temperature-control mechanism is built to improve cross-task generalization. The total loss weights are adjusted through dual feedback from task similarity and learning progress, enhancing adaptability to non-IID data. Experiments on the Split CIFAR-100, Permuted MNIST, and GLUE datasets show that, compared with traditional EWC and AdaDistill, the framework improves average accuracy by 7.2%~9.5%, reduces the forgetting rate by 42.3%~51.6%, shortens training time by 65.8%~73.4%, and lowers memory usage by 78.2%~85.1%. The framework can provide an efficient solution for continual-learning scenarios such as autonomous driving and medical diagnosis.
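As a rough illustration of the three mechanisms named in the abstract (a sparse diagonal EWC penalty, multi-teacher distillation with a shared dynamic temperature, and adaptive fusion of the two losses), a minimal PyTorch-style sketch is given below. The function and variable names (sparse_ewc_penalty, task_similarity, progress, etc.) are placeholders, and the top-k sparsification and linear fusion rule are assumptions for illustration, not the paper's exact formulations.

import torch
import torch.nn.functional as F

def sparse_ewc_penalty(model, old_params, fisher_diag, importance_mask):
    # Diagonal EWC penalty restricted to a sparse set of "important" weights.
    # fisher_diag and importance_mask are assumed to come from an online running
    # average of squared gradients (a stand-in for the paper's online sparse
    # Fisher estimator); both are dicts keyed by parameter name.
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in fisher_diag:
            diff = (p - old_params[name]) ** 2
            penalty = penalty + (fisher_diag[name] * importance_mask[name] * diff).sum()
    return penalty

def multi_teacher_distill_loss(student_logits, teacher_logits_list, teacher_weights, temperature):
    # Weighted KL divergence between the student and several teachers, all
    # softened by a shared (dynamically controlled) temperature.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    loss = 0.0
    for w, t_logits in zip(teacher_weights, teacher_logits_list):
        p_teacher = F.softmax(t_logits / temperature, dim=-1)
        loss = loss + w * F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    return loss * (temperature ** 2)

def hybrid_loss(ce_loss, ewc_pen, distill_loss, task_similarity, progress):
    # Adaptive fusion: higher task similarity up-weights distillation (transfer),
    # higher learning progress up-weights the EWC penalty (protection).
    # The linear weighting below is an illustrative assumption only.
    lam_ewc = progress          # assumed to lie in [0, 1]
    lam_kd = task_similarity    # assumed to lie in [0, 1]
    return ce_loss + lam_ewc * ewc_pen + lam_kd * distill_loss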
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
References
McCloskey M, Cohen N J. Catastrophic interference in connectionist networks: The sequential learning problem[M]//Psychology of learning and motivation. Academic Press, 1989: 109-165.
French R M. Catastrophic forgetting in connectionist networks[J]. Trends in cognitive sciences, 1999, 3(4): 128-135.
Li Z, Hoiem D. Learning without forgetting[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 40(10): 2335-2347.
Kirkpatrick J, Pascanu R, Rabinowitz N, et al. Overcoming catastrophic forgetting in neural networks[J]. Proceedings of the National Academy of Sciences, 2017, 114(13): 3521-3526.
Zenke F, Poole B, Ganguli S. Continual learning through synaptic intelligence[C]//International conference on machine learning. PMLR, 2017: 3987-3995.
Han S, Liu X, Mao H, et al. AdaDistill: Adaptive knowledge distillation for incremental learning[J]. IEEE transactions on neural networks and learning systems, 2020, 32(5): 2125-2136.
Wang Y, Yao Q, Kwok J T, et al. Generalizing from a few examples: A survey on few-shot learning[J]. ACM computing surveys, 2020, 53(3): 1-34.
Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//International conference on machine learning. PMLR, 2017: 1126-1135.
Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017.
Chen T, Kornblith S, Norouzi M, et al. A simple framework for contrastive learning of visual representations[C]//International conference on machine learning. PMLR, 2020: 1597-1607.
Li X, Zhou J, Chen F, et al. Sparse elastic weight consolidation for lifelong learning[C]//Proceedings of the 27th ACM international conference on multimedia. 2019: 2232-2240.
Zhang J, Mishra S, Brynjolfsson E, et al. Online elastic weight consolidation for lifelong learning[J]. arXiv preprint arXiv:1902.10486, 2019.
Wang Z, Liu Z, Liu J, et al. Multi-teacher knowledge distillation for continual learning[C]//2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021: 1-8.
Li Y, Zhang Y, Liu J, et al. Reinforced knowledge distillation for continuous learning[C]//Proceedings of the 29th ACM International Conference on Multimedia. 2021: 2008-2016.
Rusu A A, Rabinowitz N C, Desjardins G, et al. Progressive neural networks[J]. arXiv preprint arXiv:1606.04671, 2016.
Chen X, Liu Z, Zhao J, et al. Graph-based knowledge distillation for continuous learning[C]//2022 IEEE International Conference on Data Mining (ICDM). IEEE, 2022: 161-170.
Rebuffi S A, Kolesnikov A, Sperl G, et al. iCaRL: Incremental classifier and representation learning[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 2001-2010.
Lopez-Paz D, Ranzato M A. Gradient episodic memory for continual learning[C]//Advances in neural information processing systems. 2017, 30: 6467-6476.
Finn C, Rajeswaran A, Kakade S M, et al. Online meta-learning[C]//International conference on machine learning. PMLR, 2019: 1920-1930.