DoRA: Domain-Based Self-Supervised Learning Framework for Low-Resource Real Estate Appraisal

The marketplace system connecting demand and supply has been explored as a way to support unbiased decision-making in property valuation. Real estate appraisal is a costly property valuation task for financial institutions because it requires domain experts to estimate values based on their knowledge and their judgment of the market. Existing automated valuation models reduce the subjectivity of domain experts, but they need a large number of transactions for effective evaluation. They are therefore limited not only by the effort of labeling transactions but also by poor generalizability to newly developing and rural areas. When learning representations from unlabeled real estate data, existing self-supervised learning (SSL) methods for tabular data neglect various important features and fail to incorporate domain knowledge. In this paper, we propose DoRA, a Domain-based self-supervised learning framework for low-resource Real estate Appraisal. DoRA is pre-trained with intra-sample geographic prediction as the pretext task, built on the metadata of each property, to equip the real estate representations with prior domain knowledge. In addition, inter-sample contrastive learning is employed so that the representations generalize robustly to downstream tasks with limited transactions. Benchmark results on real-world transactions of three property types show that, in few-shot scenarios, DoRA significantly outperforms the SSL baselines for tabular data, the graph-based methods, and the supervised approaches by at least 7.6% in MAPE, 11.59% in MAE, and 3.34% in HR10%. We expect DoRA to be useful to other financial practitioners with similar marketplace applications who need general models for newly built properties with limited records. The source code is available at https://github.com/wwweiwei/DoRA.
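The three evaluation metrics quoted above can be made concrete with a small sketch. It assumes the conventional definitions: MAPE is the mean absolute percentage error, MAE is the mean absolute error in price units, and HR10% is the percentage of predictions that fall within 10% of the true price. The abstract does not spell these out, so treat them as assumptions. The function name `regression_metrics` and its `tol` parameter are hypothetical and are not taken from the DoRA repository.

```python
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float], tol: float = 0.10
) -> Dict[str, float]:
    """Hypothetical helper computing MAPE, MAE and a hit ratio HR@tol.

    Assumed (conventional) definitions, not confirmed by the source:
      MAPE  = 100 * mean(|y - y_hat| / y)
      MAE   = mean(|y - y_hat|)
      HR    = 100 * fraction of samples with |y - y_hat| / y <= tol
    True prices must be positive because MAPE and HR divide by them.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    if any(t <= 0 for t in y_true):
        raise ValueError("true prices must be positive")

    n = len(y_true)
    # Absolute error of each prediction, in price units.
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    # Relative error of each prediction with respect to the true price.
    rel_err = [e / t for e, t in zip(abs_err, y_true)]

    mae = sum(abs_err) / n
    mape = 100.0 * sum(rel_err) / n
    # Hit ratio: share (in percent) of predictions within +/- tol of the truth.
    hits = sum(1 for r in rel_err if r <= tol)
    hr = 100.0 * hits / n

    return {"MAPE": mape, "MAE": mae, f"HR{round(tol * 100)}%": hr}
```

For example, `regression_metrics([100.0, 200.0, 400.0], [105.0, 170.0, 400.0])` yields an MAE of about 11.67, a MAPE of about 6.67, and an HR10% of about 66.67, because two of the three predictions land within 10% of the true price. Lower MAPE and MAE are better, whereas a higher HR10% is better.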
