S&T | Privacy and Security concerns in Big Data
Privacy and security are important issues in big data. The big data security model is often not recommended for complex applications, and as a result it is disabled by default. In its absence, however, data can easily be compromised. This section therefore focuses on privacy and security issues.
Privacy – Information privacy is the privilege of having some control over how personal information is collected and used. It is the capacity of an individual or group to stop information about themselves from becoming known to people other than those to whom they give the information. One serious user privacy issue is the identification of personal information during transmission over the Internet.
Security – Security is the practice of defending information and information assets, through the use of technology, processes, and training, from unauthorized access, disclosure, disruption, modification, inspection, recording, and destruction.
Privacy vs. security – Data privacy is focused on the use and governance of individual data, for example putting policies in place to ensure that consumers’ personal information is collected, shared, and utilized in appropriate ways. Security concentrates more on protecting data from malicious attacks and the misuse of stolen data for profit. While security is fundamental for protecting data, it is not sufficient for addressing privacy. Table 1 lists additional differences between privacy and security.
Privacy requirements in big data
Big data analytics attract many organizations, yet a large share of them decide not to use these services because standard security and privacy protection tools are lacking. This section analyzes possible strategies for upgrading big data platforms with privacy protection capabilities. It covers the foundations and development strategies of a framework that supports:
1. The specification of privacy policies managing access to data stored in target big data platforms,
2. The generation of efficient enforcement monitors for these policies, and
3. The integration of the generated monitors into the target analytics platforms.
Enforcement techniques proposed for traditional DBMSs appear inadequate for the big data context because of the strict performance requirements of handling large data volumes, the heterogeneity of the data, and the speed at which data must be analyzed.
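As an illustration only, the three framework steps can be sketched in Python. The policy format, the role and field names, and the functions `make_monitor` and `run_query` are hypothetical and are not taken from any particular platform: a policy is specified as data, a monitor is generated from it, and the monitor is wrapped around a toy query.

```python
from typing import Callable, Dict, Iterable, List, Set

Record = Dict[str, object]
Monitor = Callable[[str, Record], Record]

# Step 1 (assumed format): the policy lists the fields each role may read.
POLICY: Dict[str, Set[str]] = {
    "analyst": {"age_group", "region", "purchase_total"},
    "auditor": {"user_id", "age_group", "region", "purchase_total"},
}


def make_monitor(policy: Dict[str, Set[str]]) -> Monitor:
    """Step 2: generate an enforcement monitor from a policy."""
    def monitor(role: str, record: Record) -> Record:
        allowed = policy.get(role, set())  # unknown roles see nothing
        return {k: v for k, v in record.items() if k in allowed}
    return monitor


def run_query(rows: Iterable[Record], role: str, monitor: Monitor) -> List[Record]:
    """Step 3: integrate the monitor into a (toy) analytics query."""
    return [monitor(role, row) for row in rows]


rows = [{"user_id": "u1", "age_group": "30-39", "region": "EU",
         "purchase_total": 120.0}]
monitor = make_monitor(POLICY)
analyst_view = run_query(rows, "analyst", monitor)
```

Here the analyst's view keeps the aggregate-friendly fields and drops `user_id`. A monitor for a real platform would also have to meet the performance requirements noted above, which this sketch does not attempt.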
Businesses and government agencies are generating and continuously collecting large amounts of data. The current focus on large volumes of data will undoubtedly create opportunities to understand the processing of such data across many domains. But the potential of big data comes at a price: users’ privacy is frequently at risk. Ensuring conformance to privacy terms and regulations is constrained in current big data analytics and mining practices. Developers should be able to verify that their applications conform to privacy agreements and that sensitive information is kept private regardless of changes in the applications and/or privacy regulations. Addressing these challenges calls for new contributions in the areas of formal methods and testing procedures. New paradigms for privacy conformance testing apply to the four areas of the ETL (Extract, Transform, and Load) process, as shown in the figure below [3, 4].
Figure: Big data architecture and testing areas. New paradigms for privacy conformance testing in the four areas of the ETL (Extract, Transform, and Load) process are shown above.
1. Pre-Hadoop process validation: This step represents the data loading process. At this step, the privacy specifications characterize the sensitive pieces of data that can uniquely identify a user or an entity. Privacy terms can likewise indicate which pieces of data can be stored and for how long. Schema restrictions can be applied at this step as well.
2. Map-reduce process validation: This process transforms big data assets so that they can respond effectively to a query. Privacy terms can specify the minimum number of returned records required to cover individual values, in addition to constraints on data sharing between various processes.
3. ETL process validation: As in step (2), the warehousing logic should be checked at this step for compliance with privacy terms. Some data values may be aggregated anonymously or excluded from the warehouse if they indicate a high probability of identifying individuals.
4. Reports testing: Reports are another form of query, conceivably with higher visibility and a wider audience. Privacy terms that characterize ‘purpose’ are fundamental to checking that sensitive data is not reported except for specified uses.
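The kinds of checks described in these steps can be sketched as small test helpers. This is a minimal illustration under stated assumptions: the field names, the threshold `MIN_GROUP_SIZE`, the purpose list, and all three function names are hypothetical, not part of any cited framework.

```python
from collections import Counter
from typing import Dict, List, Set

Record = Dict[str, str]

SENSITIVE_FIELDS: Set[str] = {"ssn", "email"}  # assumed privacy specification
MIN_GROUP_SIZE = 3                             # assumed minimum records per value
ALLOWED_PURPOSES: Set[str] = {"billing"}       # purposes that may see sensitive data


def check_schema(record: Record) -> bool:
    """Load-time check: no sensitive field may enter the store."""
    return SENSITIVE_FIELDS.isdisjoint(record)


def small_groups(rows: List[Record], key: str) -> List[str]:
    """Query/warehouse check: values shared by fewer than MIN_GROUP_SIZE rows."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n < MIN_GROUP_SIZE)


def report_allowed(fields: Set[str], purpose: str) -> bool:
    """Report check: sensitive fields appear only for an allowed purpose."""
    return fields.isdisjoint(SENSITIVE_FIELDS) or purpose in ALLOWED_PURPOSES


rows = [{"zip": "10001"}, {"zip": "10001"}, {"zip": "10001"}, {"zip": "94105"}]
risky_values = small_groups(rows, "zip")
```

In this toy data set the value `94105` is covered by a single record, so a conformance test would flag it for aggregation or exclusion before it reaches the warehouse or a report.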
Big data privacy in data generation phase
Data generation can be classified into active data generation and passive data generation. Active data generation means that the data owner gives the data to a third party, while passive data generation refers to circumstances in which the data are produced by the data owner’s online actions (e.g., browsing) and the data owner may not know that the data are being gathered by a third party. The risk of privacy violation during data generation can be minimized either by restricting access or by falsifying data.
1. Access restriction: If the data owner thinks that the data may reveal sensitive information that is not supposed to be shared, the owner refuses to provide such data. If the data owner is providing the data passively, a few measures can be taken to ensure privacy, such as anti-tracking extensions, advertisement or script blockers, and encryption tools.
2. Falsifying data: In some circumstances, it is unrealistic to prevent access to sensitive data. In that case, the data can be distorted using certain tools before they are obtained by a third party. If the data are distorted, the true information cannot be easily revealed. The data owner can use the following techniques to falsify the data:
A sockpuppet is used to hide an individual’s online identity by deception. When multiple sockpuppets are used, the data belonging to one specific individual will be regarded as belonging to different people. In that way, the data collector will not have enough knowledge to relate the different sockpuppets to one individual.
Certain security tools, such as Mask Me, can be used to mask an individual’s identity. This is especially useful when the data owner needs to give credit card details during online shopping.
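The idea behind presenting a different identity to each data collector can be sketched as follows. This is only an illustration of the principle, not how any named tool works: the function `alias_for`, the secret value, and the `alias.example` domain are all hypothetical.

```python
import hashlib
import hmac


def alias_for(secret: bytes, site: str) -> str:
    """Derive a stable, site-specific alias address from a private secret."""
    digest = hmac.new(secret, site.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user-{digest[:12]}@alias.example"  # placeholder domain


secret = b"owner-only-secret"  # illustrative; a real secret should be random and private
shop_alias = alias_for(secret, "shop.example")
news_alias = alias_for(secret, "news.example")
```

Each site sees a different alias, and without the owner's secret a collector has no straightforward way to tell that the two aliases belong to the same person. The owner can still regenerate the alias for a given site at any time.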
Big data privacy in data storage phase
Storing high-volume data is not a major challenge thanks to advances in data storage technologies, for example, the boom in cloud computing. If the big data storage system is compromised, however, the result can be exceptionally destructive because individuals’ personal information can be disclosed. In a distributed environment, an application may need several datasets from various data centers and therefore faces the challenge of privacy protection.
The conventional security mechanisms for protecting data can be divided into four categories: file-level data security schemes, database-level data security schemes, media-level security schemes, and application-level encryption schemes. Given the 3Vs (volume, velocity, and variety) of big data analytics, the storage infrastructure ought to be scalable. It should be possible to configure it dynamically to accommodate various applications. One promising technology for addressing these requirements is storage virtualization, enabled by the emerging cloud computing paradigm. Storage virtualization is a process in which numerous network storage devices are combined into what appears to be a single storage device. SecCloud is one of the models for data security in the cloud; it jointly considers data storage security and computation auditing security. Even so, discussion of data privacy when data are stored in the cloud remains limited.
Approaches to privacy-preserving storage on the cloud
When data are stored in the cloud, data security has three main dimensions: confidentiality, integrity, and availability. The first two are directly related to data privacy, i.e., a breach of data confidentiality or integrity has a direct effect on users’ privacy. Availability of information refers to ensuring that authorized parties are able to access the information when needed. A basic requirement for a big data storage system is to protect the privacy of individuals. There are some existing mechanisms that fulfill this requirement. For example, a sender can encrypt data using public key encryption (PKE) in such a manner that only the valid recipient can decrypt the data. The approaches to safeguarding the privacy of the user when data are stored in the cloud are as follows:
Attribute-based encryption – Access control is based on attributes associated with a user; a user whose attributes (identity being the simplest case) satisfy the access policy can decrypt and access the resources.
Homomorphic encryption – Can be deployed in IBE or ABE scheme settings; updating the ciphertext receiver is possible.
Storage path encryption – It secures the storage of big data on clouds.
Usage of hybrid clouds – A hybrid cloud is a cloud computing environment that uses a blend of on-premises, private cloud and third-party, public cloud services, with orchestration between the two platforms.
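To make the public key encryption and homomorphic encryption ideas concrete, here is a toy sketch using textbook RSA with tiny primes. It is an assumption-laden illustration only: the numbers are far too small to be secure, unpadded RSA should never be used in practice, and the sketch is not a model of IBE, ABE, or any specific cloud scheme. It shows that only the holder of the private exponent can decrypt, and that unpadded RSA happens to be multiplicatively homomorphic.

```python
# Toy textbook RSA with tiny primes: for illustration only, NOT secure.
p, q = 61, 53
n = p * q                  # public modulus (3233)
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt with the recipient's public key (e, n)."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Decrypt with the recipient's private key (d, n)."""
    return pow(c, d, n)


c1, c2 = encrypt(7), encrypt(6)
# Multiplying ciphertexts multiplies the hidden plaintexts (mod n),
# so a server could compute on the data without seeing 7 or 6.
product_plain = decrypt((c1 * c2) % n)
```

Decrypting the product of the two ciphertexts yields 42, the product of the plaintexts, even though the party that multiplied the ciphertexts never saw either value. Production systems would rely on a vetted cryptographic library and a scheme designed for homomorphic use.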
Integrity verification of big data storage
When cloud computing is used for big data storage, the data owner loses control over the data. The outsourced data are at risk because the cloud server may not be completely trusted, so the data owner needs to be firmly convinced that the cloud is storing the data properly according to the service level contract. One way to ensure privacy for the cloud user is to give the system a mechanism that lets the data owner verify that the data stored in the cloud are intact [13, 14]. Data integrity verification is therefore of critical importance. In traditional systems, the integrity of stored data can be verified in a number of ways, e.g., Reed-Solomon codes, checksums, trapdoor hash functions, message authentication codes (MACs), and digital signatures. Different integrity verification schemes are compared in [13, 15]. The straightforward approach to verifying the integrity of data stored in the cloud is to retrieve all of the data from the cloud. Other schemes aim to verify the integrity of the data without having to retrieve the data from the cloud [14, 15]. In an integrity verification scheme, the cloud server can provide substantial evidence of the integrity of the data only when all the data are intact. It is highly recommended that integrity verification be conducted regularly to provide the highest level of data protection.
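A MAC-based integrity check with random spot checks can be sketched as follows. This is a simplification under stated assumptions: the function names, key, and block layout are hypothetical, and the sampled blocks are still read back from the server, whereas the schemes cited in [14, 15] are designed to avoid retrieving the data.

```python
import hashlib
import hmac
import random
from typing import Dict, List


def tag(key: bytes, index: int, block: bytes) -> bytes:
    """MAC over the block and its position, so blocks cannot be swapped."""
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


def make_tags(key: bytes, blocks: List[bytes]) -> Dict[int, bytes]:
    """Computed by the data owner before outsourcing; tags are kept locally."""
    return {i: tag(key, i, b) for i, b in enumerate(blocks)}


def audit(key: bytes, tags: Dict[int, bytes], server: List[bytes],
          samples: int, rng: random.Random) -> bool:
    """Check a random sample of blocks instead of downloading everything."""
    for i in rng.sample(range(len(server)), samples):
        if not hmac.compare_digest(tags[i], tag(key, i, server[i])):
            return False
    return True


key = b"owner-mac-key"  # illustrative key only
blocks = [f"block-{i}".encode() for i in range(100)]
tags = make_tags(key, blocks)
server_copy = list(blocks)  # stands in for the cloud store
```

Sampling only detects corruption with some probability, which depends on how many blocks are damaged and how many are sampled. Auditing regularly, as recommended above, raises the chance that a damaged block is eventually caught.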
Big data privacy preserving in data processing
The big data processing paradigm categorizes systems into batch, stream, graph, and machine learning processing [16, 17]. Privacy protection in the data processing part can be divided into two phases. In the first phase, the goal is to safeguard information from unsolicited disclosure, since the collected data might contain sensitive information about the data owner. In the second phase, the aim is to extract meaningful information from the data without violating privacy.
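The two phases can be illustrated with a minimal sketch. The field names, the choice of dropping direct identifiers, and the generalization of age into decade bands are assumptions made for the example, not techniques prescribed by the text.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]

DIRECT_IDENTIFIERS = {"name", "email"}  # hypothetical field names


def protect(record: Record) -> Record:
    """Phase 1: drop direct identifiers and generalize age to a decade band."""
    safe = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    decade = (int(safe.pop("age")) // 10) * 10
    safe["age_band"] = f"{decade}-{decade + 9}"
    return safe


def average_spend_by_band(rows: List[Record]) -> Dict[str, float]:
    """Phase 2: extract an aggregate from the protected records only."""
    spends: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        spends[row["age_band"]].append(float(row["spend"]))
    return {band: sum(values) / len(values) for band, values in spends.items()}


raw = [
    {"name": "A", "email": "a@example.com", "age": 34, "spend": 10.0},
    {"name": "B", "email": "b@example.com", "age": 37, "spend": 30.0},
    {"name": "C", "email": "c@example.com", "age": 52, "spend": 50.0},
]
protected = [protect(r) for r in raw]
result = average_spend_by_band(protected)
```

The analysis in the second phase only ever touches the protected records, so the useful aggregate (average spend per age band) is obtained without the names, e-mail addresses, or exact ages. Note that the 50-59 band here contains a single person, which is the kind of small group a privacy check would still need to suppress.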