We present an exploration of cultural norms surrounding online disclosure of information about one's interpersonal relationships (such as information about family members, colleagues, friends, or lovers) on Twitter. The literature identifies the cultural dimension of individualism versus collectivism as a major determinant of offline communication differences in terms of emotion, topic, and content disclosed. We study whether such differences also occur online, in the context of Twitter, by comparing tweets posted in an individualistic society (the U.S.) with tweets posted in a collectivist society (India). We collected more than 2 million tweets containing interpersonal relationship keywords that were posted in the U.S. and India over a 3-month period. A card-sort study was used to develop a culturally sensitive, saturated taxonomy of keywords that represent interpersonal relationships (e.g., ma, mom, mother). We then developed a high-accuracy interpersonal disclosure detector based on dependency parsing (F1-score: 86%) to identify when these keywords refer to a personal relationship of the poster (e.g., "my mom" as opposed to "a mom"). This allowed us to identify the 400K+ tweets in our data set that actually disclose information about the poster's interpersonal relationships. We used a mixed-methods approach to analyze these tweets (e.g., comparing the amount of joy expressed about one's family) and found differences in emotion, topic, and content disclosed between tweets from the U.S. and tweets from India. Our analysis also reveals that a combination of qualitative and quantitative methods is needed to uncover these differences; using just one or the other can be misleading. This study extends the prior literature on Multi-Party Privacy and provides guidance for researchers and designers of culturally sensitive systems.
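The distinction between "my mom" and "a mom" can be illustrated with a minimal sketch. It is not the detector described above: it assumes the tweet has already been dependency-parsed, and it checks only whether a relationship keyword is governed by a first-person possessive. The `Token` class, the tiny keyword set, the inclusion of "our", and the hand-written parses are all hypothetical; the `poss` label follows a convention used by several English dependency parsers.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the keyword taxonomy (only the abstract's examples).
RELATIONSHIP_KEYWORDS = {"ma", "mom", "mother"}
# Assumption: "our" is treated like "my"; the abstract only mentions "my".
FIRST_PERSON_POSSESSIVES = {"my", "our"}


@dataclass
class Token:
    text: str
    head: int  # index of this token's syntactic head within the sentence
    dep: str   # dependency label linking the token to its head


def discloses_relationship(tokens):
    """Return True if a relationship keyword has a first-person possessive modifier."""
    for tok in tokens:
        if tok.dep != "poss":
            continue
        if tok.text.lower() not in FIRST_PERSON_POSSESSIVES:
            continue
        # The possessive's head is the noun it modifies.
        if tokens[tok.head].text.lower() in RELATIONSHIP_KEYWORDS:
            return True
    return False


# Hand-written parses standing in for real parser output.
my_mom = [Token("my", 1, "poss"), Token("mom", 2, "nsubj"), Token("called", 2, "ROOT")]
a_mom = [Token("a", 1, "det"), Token("mom", 2, "nsubj"), Token("called", 2, "ROOT")]

print(discloses_relationship(my_mom))
print(discloses_relationship(a_mom))
```

Working on the parse rather than on adjacent words means the check still applies when modifiers intervene (e.g., "my dear mom"), since the possessive remains attached to the noun in the dependency tree.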