Keywords: Searchable encryption, Semantic-based keyword search

I. INTRODUCTION

In cloud computing, an increasing number of personal and enterprise users outsource their data to cloud storage to enjoy the benefits of pay-on-demand services and high computation performance. To preserve privacy, users opt to encrypt data before outsourcing.
Thus, traditional keyword search cannot be executed directly on the encrypted data, which limits data utilization. Semantic-based keyword search is not only convenient for users but also expresses users' intentions more precisely. Specifically, in some circumstances users might not be familiar with the encrypted documents stored in cloud storage, or might want only semantically related results; therefore, the search keywords are often semantically related to the document's keywords rather than being an exact or fuzzy match. For example, the predefined keyword of a document is "cloud-based storage", and the keyword that a user searches is "distributed storage". These two terms are neither an exact nor a fuzzy match, but they are semantically related. Hence, semantic-based keyword search is of practical importance and has attracted much attention. However, the existing approaches rely on a predefined global dictionary whose quality greatly influences the accuracy of the search result.
Moreover, when the dataset is outsourced to the cloud, update operations, which include inserting new documents and modifying or deleting existing documents, are frequent. Because the predefined dictionary is constructed from all documents in the dataset, updating a single document can force the reconstruction of the dictionary and even of all document indexes, which is inefficient. The semantic similarity between the keywords in a query and the keywords of documents is also important because it determines the accuracy of search results. However, in the aforementioned approaches, the semantic information used to measure this similarity is mined from knowledge bases (KBs), such as corpora and thesauri, that contain noisy data, which makes the computed similarity inaccurate. Compared with other KBs, an ontology has good support for logic reasoning and can structurally express the semantic information of concepts.
Several ontology-based approaches have been proposed to assess the similarity between concepts by mining ontology information from different aspects. Because each document usually contains more than one keyword, the index of a document is associated with multiple keyword vectors. A Locality-Sensitive Hashing (LSH) function hashes similar items to the same bucket with high probability. Hence, we construct the document index by using LSH to map multiple keyword vectors into a single vector.
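As an illustration of how LSH can fold several keyword vectors into one index vector, the following is a minimal sketch. It assumes a p-stable (Gaussian projection) hash family, which the text does not specify; the class `PStableLSH`, the function `build_index_vector`, the bucket count, and the rule of keeping the larger value on a collision are all hypothetical choices made for the example. The value stored per keyword is left as a parameter.

```python
import numpy as np


class PStableLSH:
    """One Gaussian-projection LSH function: h(v) = floor((a.v + b) / w) mod m.

    Assumption: the hash family is illustrative, not taken from the paper.
    """

    def __init__(self, dim, num_buckets, width, rng):
        self.a = rng.normal(size=dim)       # random projection direction
        self.b = rng.uniform(0.0, width)    # random offset in [0, width)
        self.width = width
        self.num_buckets = num_buckets

    def __call__(self, vec):
        slot = np.floor((self.a @ vec + self.b) / self.width)
        return int(slot) % self.num_buckets  # bucket in [0, num_buckets)


def build_index_vector(keyword_vectors, weights, hash_fns, num_buckets):
    """Map several keyword vectors into one index vector of fixed length."""
    index = np.zeros(num_buckets)
    for vec, weight in zip(keyword_vectors, weights):
        for h in hash_fns:
            pos = h(vec)
            # Assumption: on a collision, keep the larger stored value.
            index[pos] = max(index[pos], weight)
    return index


rng = np.random.default_rng(0)
hash_fns = [PStableLSH(dim=4, num_buckets=16, width=4.0, rng=rng) for _ in range(3)]
kw_cloud = np.array([0.9, 0.1, 0.0, 0.3])   # hypothetical keyword vectors
kw_search = np.array([0.0, 0.8, 0.2, 0.1])
index = build_index_vector([kw_cloud, kw_search], [3, 1], hash_fns, 16)
```

The index length depends only on the number of buckets, so adding keywords to a document changes the stored values but not the dimensionality of the index.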
The frequency of each keyword in the document is also considered and is inserted into the index vector as the value of the corresponding element. A semantic-based keyword search scheme returns results according to the semantic relatedness between documents and a query.

II. HISTORICAL BACKGROUND

Cloud data owners prefer to outsource documents in encrypted form to preserve privacy. Therefore, it is essential to develop efficient and reliable ciphertext search techniques. One challenge is that the relationships between documents are normally concealed by encryption, which leads to a significant degradation of search accuracy.
In addition, the volume of data in data centers has grown dramatically. This makes it even more challenging to design ciphertext search schemes that provide efficient and reliable online information retrieval over large volumes of encrypted data. A traditional way to reduce information leakage is data encryption. However, encryption makes server-side data utilization, such as searching over encrypted data, a very challenging task. In recent years, researchers have proposed many ciphertext search schemes that incorporate cryptographic techniques. These schemes are provably secure, but they require a large number of operations and have high time complexity. Therefore, they are not suitable for big data scenarios, where the data volume is very large and applications require online data processing [1].
Enabling keyword search directly over encrypted data is a desirable technique for the effective utilization of encrypted data outsourced to the cloud. Existing solutions provide either multi-keyword exact search, which does not tolerate keyword spelling errors, or single-keyword fuzzy search, which tolerates typos to a certain extent. Current fuzzy search schemes rely on building an expanded index that covers possible keyword misspellings, which leads to a significantly larger index file and higher search complexity. In cloud computing, scalable and elastic storage and computation resources are provisioned as measured services through the Internet. Outsourcing data services to the cloud allows organizations to enjoy not only monetary savings but also simplified local IT management, since cloud infrastructures are physically hosted and maintained by the cloud providers. To minimize the risk of data leakage to the cloud service providers, data owners opt to encrypt their sensitive data, e.g., health records and financial transactions, before outsourcing to the cloud, while retaining the decryption keys for themselves and other authorized users.
This in turn renders data utilization a challenging problem [2].

Due to the appealing features of cloud computing, a large amount of data has been stored in the cloud. Although cloud-based services offer many advantages, the privacy and security of sensitive data are a big concern. To mitigate these concerns, it is desirable to outsource sensitive data in encrypted form. Encrypted storage protects the data against illegal access, but it complicates some basic yet important functionality, such as search over the data.
To achieve search over encrypted data without compromising privacy, many searchable encryption schemes have been proposed in the literature. However, almost all of them handle exact query matching but not similarity matching, a crucial requirement for real-world applications. Although some sophisticated cryptographic techniques based on secure multi-party computation are available for similarity tests, they are computationally intensive and do not scale to large data sources [3].

As the data produced by individuals and enterprises that need to be stored and utilized are rapidly increasing, data owners are motivated to outsource their local complex data management systems to the cloud for its great flexibility and economic savings. However, sensitive cloud data may have to be encrypted before outsourcing, which makes the traditional data utilization service based on plaintext keyword search obsolete; how to enable privacy-assured utilization mechanisms for outsourced cloud data is thus of paramount importance. Considering the large number of on-demand data users and the huge amount of outsourced data files in the cloud, the problem is particularly challenging, as it is extremely difficult to also meet the practical requirements of performance and system usability. Aside from eliminating local storage management, storing data in the cloud serves no purpose unless the data can be easily searched and utilized. Thus, exploring privacy-assured and effective search services over encrypted cloud data is of paramount importance [4].
Data owners are motivated to outsource their complex data management systems from local sites to the commercial public cloud for great flexibility and economic savings. However, to protect data privacy, sensitive data have to be encrypted before outsourcing, which makes traditional data utilization based on plaintext keyword search obsolete. Thus, enabling an encrypted cloud data search service is of paramount importance. Considering the large number of data users and documents in the cloud, it is necessary to allow multiple keywords in the search request and to return documents in the order of their relevance to these keywords. Related works on searchable encryption focus on single-keyword search or Boolean keyword search, and rarely sort the search results.
To meet the need for effective data retrieval, the large number of documents requires the cloud server to perform result relevance ranking instead of returning undifferentiated results. Such a ranked search system enables data users to find the most relevant information quickly, rather than burdensomely sorting through every match in the content collection. Ranked search can also eliminate unnecessary network traffic by sending back only the most relevant data, which is highly desirable in the pay-as-you-use cloud paradigm [5].

Cloud computing infrastructure is a promising new technology that greatly accelerates the development of large-scale data storage, processing, and distribution. However, security and privacy become major concerns when data owners outsource their private data onto public cloud servers that are not within their trusted management domains. To avoid information leakage, sensitive data have to be encrypted before being uploaded onto the cloud servers, which makes it a big challenge to support efficient keyword-based queries and to rank the matching results on the encrypted data. Most current works consider only single-keyword queries without appropriate ranking schemes.
In the current multi-keyword ranked search approach, the keyword dictionary is static and cannot be extended easily when the number of keywords increases. Cloud computing infrastructure provides a flexible and economical strategy for data management and resource sharing. It can reduce hardware and software costs as well as system maintenance overheads. It can also offer a convenient communication channel for sharing resources between data owners and data consumers. With the popularity of cloud services, such as Amazon Web Services [6].

SYSTEM ARCHITECTURE

Fig. 1. The model of keyword search

The document index and the query are both represented with the vector space model (VSM). A document index is a vector generated from the keywords of the document, and a secure index is the encrypted index. Similarly, a query is a vector generated from the keywords of a search, and a trapdoor is an encrypted query.
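To make the vector space representation concrete, the following minimal sketch builds a plaintext document index and a plaintext query over the same dimensions and scores them with an inner product. The fixed `vocabulary`, the helper names `to_vector` and `score`, and the use of raw counts as vector values are assumptions made only for illustration; encryption is left out entirely.

```python
from collections import Counter


def to_vector(keywords, vocabulary):
    """Represent a keyword list as a count vector over a fixed vocabulary."""
    counts = Counter(keywords)
    return [counts[term] for term in vocabulary]


def score(index_vec, query_vec):
    """Inner product between a document index and a query vector."""
    return sum(i * q for i, q in zip(index_vec, query_vec))


# Hypothetical vocabulary and keywords, chosen only for the example.
vocabulary = ["cloud", "storage", "encryption", "search"]
doc_index = to_vector(["cloud", "storage", "cloud"], vocabulary)
query = to_vector(["cloud", "search"], vocabulary)
relevance = score(doc_index, query)
```

Here the document and the query share only the term "cloud", which occurs twice in the document, so the relevance is 2. A secure index and a trapdoor are the encrypted counterparts of `doc_index` and `query`.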
In general, the document itself can be encrypted with a traditional encryption scheme such as AES. The scheme consists of the following algorithms.

Keygen - It is executed by the data owner or a trusted authority (TA). Taking a security parameter d as input, it outputs the system symmetric key sk.

BuildIndex - It is executed by the data owner. Based on the symmetric key sk and the document D, the algorithm generates the secure index.

Trapdoor - It is executed by the data owner. Given the keyword set Q that the user wants to search, the algorithm generates the corresponding trapdoor.
Search - It is executed by the cloud server. Based on the trapdoor and each secure index stored on the server, the cloud server calculates the correlation coefficient between the query Q and each document D and returns the ranked correlation coefficients to the user.

The data owner publishes the encrypted documents and secure indexes to the cloud server. To reduce the computation burden, the data owner is allowed to outsource trapdoor generation to the TA by giving it the private key. In this case, when a user wants to search over encrypted documents, he or she submits the keywords to the TA, which generates the corresponding trapdoor and returns it to the user. Then, the user sends the trapdoor to the cloud server. Finally, the cloud server executes the search algorithm with the trapdoor on all secure indexes and returns the relevant documents to the user. The TA is usually an internal server.
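The four algorithms and the message flow described above can be sketched as follows. This is only an illustration under a stated assumption: it uses the matrix-based secure kNN construction (random vector splitting plus two invertible matrices), which the text does not spell out, and it omits dimension extension and added noise. All function names and the toy vectors are hypothetical.

```python
import numpy as np


def keygen(dim, rng):
    """Return sk = (split bits, M1, M2) with M1 and M2 invertible."""
    split = rng.integers(0, 2, size=dim)
    while True:
        m1 = rng.normal(size=(dim, dim))
        m2 = rng.normal(size=(dim, dim))
        if abs(np.linalg.det(m1)) > 1e-6 and abs(np.linalg.det(m2)) > 1e-6:
            return split, m1, m2


def _split(vec, randomize, rng):
    """Split vec into two shares; randomized positions sum back to vec."""
    r = rng.normal(size=vec.shape)
    first = np.where(randomize, r, vec)
    second = np.where(randomize, vec - r, vec)
    return first, second


def build_index(sk, index_vec, rng):
    """Data owner: encrypt a plaintext index vector into a secure index."""
    split, m1, m2 = sk
    a, b = _split(index_vec, split == 1, rng)
    return m1.T @ a, m2.T @ b


def trapdoor(sk, query_vec, rng):
    """Data owner or TA: encrypt a plaintext query vector into a trapdoor."""
    split, m1, m2 = sk
    a, b = _split(query_vec, split == 0, rng)
    return np.linalg.inv(m1) @ a, np.linalg.inv(m2) @ b


def search(secure_indexes, td):
    """Cloud server: score every secure index and return a ranked list."""
    scores = {
        doc_id: float(ia @ td[0] + ib @ td[1])
        for doc_id, (ia, ib) in secure_indexes.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


rng = np.random.default_rng(7)
sk = keygen(4, rng)
docs = {  # hypothetical plaintext index vectors
    "d1": np.array([2.0, 1.0, 0.0, 0.0]),
    "d2": np.array([0.0, 0.0, 3.0, 1.0]),
}
secure = {doc_id: build_index(sk, vec, rng) for doc_id, vec in docs.items()}
ranked = search(secure, trapdoor(sk, np.array([1.0, 0.0, 0.0, 1.0]), rng))
```

In this construction the server recovers the inner product of the plaintext index and query (up to floating-point error) without seeing either vector, so `ranked` lists d1 (score 2) ahead of d2 (score 1).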
If there is no such trusted server in the system, the trapdoor can be generated by the data owner. Also, some existing schemes assume that authorization between the data owner and users has been appropriately done.

Locality-Sensitive Hashing function - A Locality-Sensitive Hashing (LSH) function hashes input items so that similar items map to the same bucket with high probability. It can reduce the dimensionality of high-dimensional data.

Secure k-Nearest Neighbor encryption - The k-nearest neighbor query is one of the fundamental query operations in some applications. The goal of secure kNN (SkNN) is to securely identify the k points in the encrypted database that are nearest to a given encrypted query, without allowing the data server to learn the content of the database or the query.

Compound Concept Semantic Similarity Calculation Method (CCSS) - Depending upon their semantic constituents, compound concepts can be divided into two types: endocentric structure and exocentric structure.
A compound has an endocentric structure when one or more of its constituents can play the central role and serve as a definable subject heading.

Semantic Similarity Calculation - To measure the semantic similarity of compound concepts, we propose a novel approach that considers the features of the concept constituents and several other factors influencing similarity.

Semantic Compound Keyword Based Search Scheme (SCKS) - The topic set of a field is used to construct the semantic vector for each keyword. More specifically, each element of the keyword vector corresponds to a field topic, and its value is the similarity between the topic and the keyword, which is obtained using CCSS. Because the topics are almost invariable, the dimensionality of the keyword vector does not change when keywords or documents are added or deleted, which helps to support data updates.

Security Enhanced SCKS - The adversary is allowed to submit queries adaptively, i.e., to submit the next query after receiving the outcomes of previous queries. Thus, the adversary can decide the next query depending upon the previous outcomes.

CONCLUSION

To accurately extract the semantic information of keywords, we first propose an ontology-based compound concept semantic similarity calculation method (CCSS), which greatly improves the accuracy of similarity measurement between compound concepts by comprehensively considering the compound features and a variety of information sources in the ontology. Results show that the scheme has low computation overhead and that its search accuracy outperforms the existing schemes.
REFERENCES

[1] C. Chen, X. Zhu, P. Shen, J. Hu, S. Guo, Z. Tari, and A. Y. Zomaya, "An efficient privacy-preserving ranked keyword search method," IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 4, pp. 951-963, 2016.
[2] B. Wang, S. Yu, W. Lou, and Y. T. Hou, "Privacy-preserving multi-keyword fuzzy search over encrypted data in the cloud," in IEEE International Conference on Computer Communications, 2014, pp. 2112-2120.
[3] M. Kuzu, M. S. Islam, and M. Kantarcioglu, "Efficient similarity search over encrypted data," in IEEE 28th International Conference on Data Engineering (ICDE), 2012, pp. 1156-1167.
[4] C. Wang, K. Ren, S. Yu, and K. M. R. Urs, "Achieving usable and privacy-assured similarity search over outsourced cloud data," in 2012 Proceedings of IEEE INFOCOM, 2012, pp. 451-459.
[5] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, "Privacy-preserving multi-keyword ranked search over encrypted cloud data," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 222-233, 2014.
[6] R. Li, Z. Xu, W. Kang, K. C. Yow, and C. Z. Xu, "Efficient multi-keyword ranked query over encrypted data in cloud computing," Future Generation Computer Systems, vol. 30, no. 1, pp. 179-190, 2014.

IJETT ISSN 2350 0808 April 2015 Volume 2 Issue 1
Semantic-based Compound Keyword Search over Encrypted Cloud Data