Metamorphic Testing and Debugging of Tax Preparation Software

This paper presents a data-driven framework to improve the trustworthiness of US tax preparation software. Bugs in such software can have legal consequences for its users, so ensuring the compliance and trustworthiness of tax preparation software is of paramount importance. Two barriers make debugging aids for tax preparation systems hard to build: explicit specifications are unavailable, and test oracles are difficult to obtain. We posit that, because US tax law adheres to the legal doctrine of precedent, the expected outcome of tax preparation software for an individual taxpayer must be judged in comparison with individuals deemed similar. Specifications are therefore naturally available as properties requiring that the software produce similar outputs for similar inputs. Inspired by the metamorphic testing paradigm, we call these relations metamorphic relations. In collaboration with legal and tax experts, we formulated metamorphic relations for a set of challenging properties drawn from US Internal Revenue Service (IRS) publications and forms, including Publication 596 (Earned Income Tax Credit), Schedule 8812 (Qualifying Children/Other Dependents), and Form 8863 (Education Credits). We use an open-source tax preparation system as our case study and develop a randomized test-case generation strategy, guided by metamorphic relations, to systematically validate its correctness. We further support this test-case generation by visually explaining the software's behavior on suspicious instances with easy-to-interpret decision-tree models. Our tool uncovered several accountability bugs of varying severity, ranging from non-robust behavior in corner cases (unreliable behavior when tax returns are close to zero) to missing eligibility conditions in updated versions of the software.
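
The core loop of metamorphic testing can be sketched as follows: generate a random source input, derive a follow-up input that differs in one controlled way, and check that the two outputs satisfy the relation. This is a minimal illustration only, not the paper's implementation. The `Taxpayer` record, the credit functions, the injected bug, and the relation "adding a child, all else equal, never lowers the credit" are all hypothetical and do not encode actual IRS rules.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Taxpayer:
    # Hypothetical, simplified input record (not any real software's schema).
    earned_income: float
    num_children: int


def toy_credit(t: Taxpayer) -> float:
    # Hypothetical stand-in for the system under test; NOT actual IRS rules.
    rate = 0.3 if t.num_children > 0 else 0.1
    cap = 1000.0 * (1 + min(t.num_children, 3))
    return min(rate * t.earned_income, cap)


def buggy_credit(t: Taxpayer) -> float:
    # Same toy rules with an injected bug: the cap collapses at 4+ children.
    rate = 0.3 if t.num_children > 0 else 0.1
    cap = 1000.0 * (1 + t.num_children) if t.num_children <= 3 else 1000.0
    return min(rate * t.earned_income, cap)


def mr_more_children_not_less_credit(sut, t: Taxpayer) -> bool:
    # Assumed metamorphic relation: the follow-up input adds one child and
    # keeps everything else equal; its credit must not be lower.
    follow_up = replace(t, num_children=t.num_children + 1)
    return sut(follow_up) >= sut(t)


def random_taxpayer(rng: random.Random) -> Taxpayer:
    # Randomized source-input generation over a small, illustrative domain.
    return Taxpayer(
        earned_income=rng.uniform(0, 60000),
        num_children=rng.randint(0, 4),
    )


def find_violations(sut, relation, n: int = 1000, seed: int = 0) -> list:
    # Collect every source input whose (source, follow-up) pair breaks the relation.
    rng = random.Random(seed)
    violations = []
    for _ in range(n):
        t = random_taxpayer(rng)
        if not relation(sut, t):
            violations.append(t)
    return violations


if __name__ == "__main__":
    print(len(find_violations(toy_credit, mr_more_children_not_less_credit)))
    print(len(find_violations(buggy_credit, mr_more_children_not_less_credit)))
```

No oracle for the "correct" credit is needed. The relation alone flags the buggy variant, and every flagged input is a three-child return whose uncapped credit exceeds the collapsed four-child cap. Such violating inputs are the kind of suspicious instances that an interpretable model, such as a decision tree fit on violating versus non-violating inputs, could then summarize.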

翻译：本文提出了一种数据驱动框架，旨在提升美国报税软件系统的可信度。鉴于此类软件中漏洞对用户可能产生的法律后果，确保报税软件的合规性与可信度至关重要。开发报税系统调试工具面临的主要障碍在于缺乏明确的规范说明，且难以获取测试预言（oracles）。我们指出，由于美国税法遵循判例法原则，针对单个纳税人报税软件结果的规范必须通过与相似个体的比较来进行评估。因此，这些规范天然地体现为软件对相似输入应产生相似输出的属性要求。受蜕变测试（metamorphic testing）范式启发，我们将这些关系命名为蜕变关系（metamorphic relations）。在与法律及税务专家的协作下，我们根据美国国税局（IRS）的多份出版物（包括第596号出版物《劳动所得税抵免》、附表8812《合格子女/其他被抚养人》及表格8863《教育抵免》）中的多项复杂属性，明确了对应的蜕变关系。我们以一款开源报税软件为案例研究对象，开发了一种随机测试用例生成策略，通过蜕变关系引导系统性地验证报税软件的正确性。我们进一步利用易于理解的决策树模型对可疑实例的软件行为进行可视化解释，以辅助测试用例生成。该工具发现了多个严重程度不一的责任相关漏洞，涵盖从边界情况下的非鲁棒行为（纳税申报金额趋近于零时的不稳定表现）到软件更新版本中遗漏资格条件的缺陷。