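⚠️ All summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not rely on them in serious academic settings; they are only suitable for an initial screening before reading a paper!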
Updated 2025-09-19
DiffGAN: A Test Generation Approach for Differential Testing of Deep Neural Networks for Image Analysis
Authors: Zohreh Aghababaeyan, Manel Abdellatif, Lionel Briand, Ramesh S
Deep Neural Networks (DNNs) are increasingly deployed across applications. However, ensuring their reliability remains a challenge, and in many situations, alternative models with similar functionality and accuracy are available. Traditional accuracy-based evaluations often fail to capture behavioral differences between models, especially with limited test datasets, making it difficult to select or combine models effectively. Differential testing addresses this by generating test inputs that expose discrepancies in DNN model behavior. However, existing approaches face significant limitations: many rely on model internals or are constrained by available seed inputs. To address these challenges, we propose DiffGAN, a black-box test image generation approach for differential testing of DNN models. DiffGAN leverages a Generative Adversarial Network (GAN) and the Non-dominated Sorting Genetic Algorithm II to generate diverse and valid triggering inputs that reveal behavioral discrepancies between models. DiffGAN employs two custom fitness functions, focusing on diversity and divergence, to guide the exploration of the GAN input space and identify discrepancies between models’ outputs. By strategically searching this space, DiffGAN generates inputs with specific features that trigger differences in model behavior. DiffGAN is black-box, making it applicable in more situations. We evaluate DiffGAN on eight DNN model pairs trained on widely used image datasets. Our results show DiffGAN significantly outperforms a SOTA baseline, generating four times more triggering inputs, with greater diversity and validity, within the same budget. Additionally, the generated inputs improve the accuracy of a machine learning-based model selection mechanism, which selects the best-performing model based on input characteristics and can serve as a smart output voting mechanism when using alternative models.
Paper and project links
PDF Accepted into IEEE Transactions on Software Engineering
Summary
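This paper addresses the reliability challenges of deploying deep neural networks (DNNs). Traditional accuracy-based evaluation often fails to capture behavioral differences between models, especially with limited test datasets, which makes it hard to select or combine models effectively. To address this, the paper proposes DiffGAN, a black-box test image generation approach for differential testing of DNN models. DiffGAN combines a Generative Adversarial Network (GAN) with the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to generate triggering inputs that reveal behavioral discrepancies between models. Two custom fitness functions, one focusing on diversity and one on divergence, guide the exploration of the GAN input space and identify discrepancies between the models' outputs. In an evaluation on eight DNN model pairs trained on widely used image datasets, DiffGAN significantly outperforms a state-of-the-art baseline, generating four times more triggering inputs, with greater diversity and validity, within the same budget. The generated inputs also improve the accuracy of a machine learning-based model selection mechanism, which selects the best-performing model based on input characteristics and can serve as a smart output voting mechanism when alternative models are used.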
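To make the idea of a "triggering input" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the summary does not give the actual divergence fitness function, so the total variation distance used here is a stand-in, and the function names `predicted_class`, `is_triggering`, and `output_divergence` are hypothetical. The sketch only looks at the two models' output probability vectors, which matches the black-box setting described above.

```python
from typing import Sequence


def predicted_class(probs: Sequence[float]) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(probs)), key=lambda i: probs[i])


def is_triggering(probs_a: Sequence[float], probs_b: Sequence[float]) -> bool:
    """An input is treated as triggering when the two models disagree on the label."""
    return predicted_class(probs_a) != predicted_class(probs_b)


def output_divergence(probs_a: Sequence[float], probs_b: Sequence[float]) -> float:
    """Total variation distance between the two output distributions.

    ASSUMPTION: this metric is chosen for illustration only; the paper's
    divergence fitness function may be defined differently.
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(probs_a, probs_b))


if __name__ == "__main__":
    # Hypothetical softmax outputs of two models for the same generated image.
    model_a_out = [0.6, 0.3, 0.1]
    model_b_out = [0.2, 0.7, 0.1]
    print(is_triggering(model_a_out, model_b_out))
    print(round(output_divergence(model_a_out, model_b_out), 3))
```

A search procedure could maximize a score of this kind over the GAN's latent vectors, so that inputs on which the models nearly disagree are pushed toward actual disagreement.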
Key Takeaways
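- Traditional accuracy-based evaluation struggles to capture behavioral differences between DNN models, so new testing approaches are needed.
- DiffGAN is a GAN-based, black-box test image generation approach for differential testing of DNN models.
- DiffGAN uses the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to generate diverse and valid triggering inputs.
- DiffGAN uses two custom fitness functions, focusing on diversity and divergence, to guide the search and identify behavioral differences between models.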
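The two-objective search mentioned in the takeaways rests on Pareto dominance, which can be sketched as follows. This is a simplified illustration, not DiffGAN's code: it computes only the first non-dominated front for candidates scored on two objectives to be maximized, whereas full NSGA-II also ranks the remaining fronts, applies crowding-distance sorting, and evolves the population through selection, crossover, and mutation. The names `dominates` and `pareto_front` and the example scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Tuple[float, float]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


if __name__ == "__main__":
    # Hypothetical (diversity, divergence) scores for five candidate latent vectors.
    candidates = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4), (0.9, 0.05)]
    print(pareto_front(candidates))  # prints [0, 1, 2]
```

Keeping the whole front, rather than collapsing both objectives into one weighted score, lets the search retain candidates that are strong on diversity alongside candidates that are strong on divergence.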

