New software and updates are downloaded by end users every day. Each downloaded program comes with an End User License Agreement (EULA), but it is rarely read. A EULA includes information intended to avoid legal repercussions. Accepting a EULA without reading it, however, poses a host of potential problems, such as spyware or unwanted effects on the target system. End users do not read EULAs because the documents are long and extremely difficult to understand. Text summarization is one relevant solution to this kind of problem. What is required is a solution that can summarize a EULA and classify it as "Benign" or "Malicious", and we propose such a solution. We extract the EULA text of different software products and classify the text using eight different supervised classifiers. We then use ensemble learning to classify the EULA as benign or malicious using five different text summarization methods. An accuracy of $95.8$\% shows the effectiveness of the presented approach.
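As a rough illustration of the summarize-then-vote idea described above, the following minimal Python sketch summarizes a EULA in several ways, labels each summary, and takes a majority vote. It is not the implementation evaluated in this work: the three toy summarizers, the keyword-based `toy_classifier` (a stand-in for a trained supervised classifier), the `SUSPICIOUS_TERMS` list, and the tie-breaking rule are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical phrases used only by the stand-in classifier below.
SUSPICIOUS_TERMS = ("browsing history", "third parties", "advertisements")

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def lead_summary(text, k=2):
    # Baseline summarizer: keep the first k sentences.
    return " ".join(split_sentences(text)[:k])

def tail_summary(text, k=2):
    # Baseline summarizer: keep the last k sentences.
    return " ".join(split_sentences(text)[-k:])

def frequency_summary(text, k=2):
    # Score each sentence by the mean document frequency of its words,
    # then keep the k best sentences in their original order.
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    best = sorted(range(len(sentences)),
                  key=lambda i: score(sentences[i]), reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(best))

def toy_classifier(summary):
    # Stand-in for a trained supervised classifier (assumption):
    # flags a summary that mentions any hypothetical suspicious phrase.
    lowered = summary.lower()
    hit = any(term in lowered for term in SUSPICIOUS_TERMS)
    return "Malicious" if hit else "Benign"

def majority_vote(labels):
    # Ties go to "Malicious" (assumption: err on the side of caution).
    counts = Counter(labels)
    return "Malicious" if counts["Malicious"] >= counts["Benign"] else "Benign"

def classify_eula(text, summarizers, classifier):
    # One label per summarization method, combined by majority vote.
    labels = [classifier(summarize(text)) for summarize in summarizers]
    return majority_vote(labels)

SUMMARIZERS = [lead_summary, tail_summary, frequency_summary]

if __name__ == "__main__":
    sample = ("The software may collect your browsing history. "
              "It may display pop-up advertisements from third parties. "
              "Collected data may be shared with third parties.")
    print(classify_eula(sample, SUMMARIZERS, toy_classifier))
```

In a realistic setting, `toy_classifier` would be replaced by a model trained on labeled EULA text, and the list of summarizers would hold the actual summarization methods under comparison.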