Thoughtfully designing services and rigorously testing software to support personal information management (PIM) requires understanding the relevant collections, but relatively little is known about what people keep in their file collections, especially personal ones. Complementing recent work on the structure of 348 file collections, we examine those collections' contents, how much of that content is duplicated, and how collections used for personal matters differ from those used for study and work. Although all collections contain many images, some intuitively common file types are surprisingly scarce. Personal collections contain more audio than others, knowledge workers' collections contain more text documents but far fewer folders, and IT collections exhibit unusual traits. Collection duplication is correlated with collections' structural traits but, surprisingly, not with collection age. We discuss our findings in light of prior work and outline implications for various kinds of information research.