Marketplace systems that connect demand and supply have been explored as a way to support unbiased decision-making in property valuation. Real estate appraisal is one of the high-cost property valuation tasks for financial institutions, since it requires domain experts to estimate prices based on their knowledge and their judgment of the market. Existing automated valuation models, which reduce the subjectivity of domain experts, require a large number of transactions for effective evaluation; this requirement is limiting both because of the effort needed to label transactions and because of poor generalizability to newly developing and rural areas. For learning representations from unlabeled real estate sets, existing self-supervised learning (SSL) methods for tabular data neglect various important features and fail to incorporate domain knowledge. In this paper, we propose DoRA, a Domain-based self-supervised learning framework for low-resource Real estate Appraisal. DoRA is pre-trained with an intra-sample geographic prediction pretext task based on the metadata of the real estate, which equips the real estate representations with prior domain knowledge. Furthermore, inter-sample contrastive learning is employed to make the representations robust to the limited transactions available in downstream tasks. Our benchmark results on three property types of real-world transactions show that, in few-shot scenarios, DoRA significantly outperforms the SSL baselines for tabular data, the graph-based methods, and the supervised approaches by at least 7.6% for MAPE, 11.59% for MAE, and 3.34% for HR10%. We expect DoRA to be useful to other financial practitioners with similar marketplace applications who need general models for properties that are newly built and have limited records. The source code is available at https://github.com/wwweiwei/DoRA.
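The abstract reports results in MAPE, MAE, and HR10% without defining them. As a minimal sketch, assuming the conventional definitions (mean absolute percentage error, mean absolute error, and the share of predictions whose relative error is within 10% of the true price), the metrics could be computed as follows. The function names and the sample prices are illustrative assumptions and are not taken from the DoRA repository.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the same unit as the prices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (true prices must be nonzero)."""
    total = sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def hit_rate(y_true: Sequence[float], y_pred: Sequence[float],
             tolerance: float = 0.10) -> float:
    """Percentage of predictions whose relative error is within `tolerance`.

    With tolerance=0.10 this is the assumed definition of HR10%.
    """
    hits = sum(1 for t, p in zip(y_true, y_pred)
               if abs(t - p) / abs(t) <= tolerance)
    return 100.0 * hits / len(y_true)


# Hypothetical prices, for illustration only.
actual = [100.0, 200.0, 400.0, 500.0]
predicted = [105.0, 150.0, 420.0, 500.0]

print(f"MAE:   {mae(actual, predicted):.2f}")
print(f"MAPE:  {mape(actual, predicted):.2f}%")
print(f"HR10%: {hit_rate(actual, predicted):.2f}%")
```

Lower is better for MAE and MAPE, while higher is better for HR10%, so an improvement on all three means smaller errors and a larger share of appraisals close to the true price.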