REVIEW: BIG DATA V’S MODELS, CHALLENGES, HADOOP ECOSYSTEM, ISSUES, USES, BENEFITS AND APPLICATIONS

Abstract: Big Data, encompassing vast volumes of archival and operational information, is an invaluable asset for any organization, holding the potential to ground business decisions in factual evidence rather than mere perception. The most comprehensive way of characterizing big data to date is the 56 V's model presented in this paper. In this work we define big data using the V approach, beginning with the 3 V's and moving through the later 5 V's, 7 V's, 10 V's, and 14 V's definitions. Undoubtedly, forthcoming contests in technological and corporate efficiency will converge on large-scale data exploration. This study also covers Hadoop, the open-source distributed computing platform developed by the Apache Foundation, which comprises the HDFS distributed file system and the MapReduce programming model. Leading big data technologies include Hadoop, MapReduce, YARN, Hive, Flume, Apache Spark, and NoSQL, and the handling of massive data can benefit significantly from them. This document also summarizes these tools' capabilities, advantages, and disadvantages. The concluding section examines the use of large-scale data in sectors such as banking, finance, education, healthcare, and agriculture.


INTRODUCTION
Big Data is currently among the most talked-about concepts in computing; it appears frequently in the media, and businesses work to take advantage of the growing amount of information at their disposal. Big data denotes expansive and intricate data collections that pose challenges when scrutinized with conventional data processing methodologies. The world of massive information is a critical topic in the IT business. Taming the big data giant demands new techniques, as traditional security and confidentiality protections fall short in the face of complex distributed computing across varied data kinds. Each new type of data brings its own set of challenges and mysteries. Countless scholarly endeavors have sought to untangle the complexity of big data, beginning with Doug Laney's seminal manuscript two decades ago. At the heart of the problem lies managing massive data volumes while ensuring their safe passage across the vast expanse of the internet so that they arrive at their destination intact. The "3Vs," "5Vs," "7Vs," "14Vs," and "56Vs" are commonly used abbreviations for the features of big data, described as follows.
The first, 3V interpretation of big data refers to the three fundamental elements depicted in Figure 2. Volume: the data is so large that it cannot be stored or processed on a single computer. Velocity: the data is generated and collected at a high rate, in real time. Variety: the data gathered is structured, semi-structured, and unstructured.
Another approach defines big data with five V's. According to a widely accepted definition, big data is distinguished by five features: (1) large volumes of data produced at a fast rate; (2) a wide variety of data that cannot be stored in conventional relational databases; (3) data processed at very high velocity; (4) cost-effective value mining that requires experienced data mining solutions; and (5) data veracity that might affect analysis accuracy, as shown in (Zhang Yaoxue, 2017). With the continuing growth of data, scientists then proposed a "7V" definition of big data.
IDC developed the four V's model in 2011 as big data technology advanced. Further developments have allowed scientists to arrive at a 10 V's model of Big Data. The fundamental research on big data now revolves around a set of 14 V's, aiming to govern and leverage the vast amounts of available data effectively. Table 1 below shows the 14 V's and an explanation of each V. (Gayatri Kapil)

METHODOLOGY
This academic research uses an investigation framework with two methods. The first highlights critical concepts relevant to the fifty-six defining V attributes of Big Data. The second involves a thorough study of scholarly publications using a widely accepted systematic literature review methodology. This endeavour aims to identify the best-known tools, compare them, and establish the concrete benefits that the enormous world of big data provides.

56-V's attributes:
The 3 V's (Volume, Velocity, and Variety) are the first features of big data to receive close attention from researchers, who have since extended the definition with additional V's.
Table 2 below describes the 56 V characteristics of big data. (Hussein, 2020)

Challenges: Every opportunity brings its fair share of challenges. Big Data offers a multitude of enticing prospects, but it also presents numerous hurdles related to the gathering, storage, sharing, searchability, analysis, and visualization of such vast datasets. Unless we can surmount these obstacles, Big Data remains an untapped resource, akin to unmined gold. The current limitation lies in the lack of tools to explore the ever-increasing magnitude of information effectively. An enduring problem in computer architecture is the imbalance between CPU performance and input/output performance, and this disparity hampers the discovery of insights from big data. (C.L. Philip Chen, 2014) Following Moore's Law, CPU performance doubles approximately every 18 months, and the same holds for disk drive capacity. However, over the past decade the rotational speed of disks has seen minimal growth. As a result, random input/output (I/O) speeds have improved only slightly, while sequential I/O speeds have grown gradually alongside increasing data density. Additionally, while the amount of information grows exponentially, information processing techniques develop at a far slower pace. Modern methods and tools, especially those for real-time analysis, often fall short of optimal solutions in many significant Big Data applications. Only now are we acquiring instruments suitable for mining these gold ores. (C.L. Philip Chen, 2014)

Challenges encountered in BDA include data inconsistency, incompleteness, timeliness, data security, and scalability. Structuring data in a suitable manner is a prerequisite for practical data analysis, and each subprocess of a data-driven application brings its own distinct challenges. The following subsections provide a concise overview of the difficulties encountered in each phase.
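To put a number on that growth rate, doubling every 18 months compounds to roughly a hundredfold improvement per decade, which disk rotational speeds have come nowhere near matching. A quick calculation:

```python
# Cumulative speed-up implied by one doubling every 18 months over ten years
months = 10 * 12
doubling_period = 18
speedup = 2 ** (months / doubling_period)
print(round(speedup, 1))  # roughly a 100x improvement per decade
```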

a) Data Capture
Big data can be sourced from various outlets, such as transactions, metadata, social media, sensors, and experiments. Collecting and consolidating data from diverse origins presents challenges of scalability and sheer volume. Organizations that can not only amass larger and higher-quality datasets but also leverage them efficiently at scale may gain a competitive edge. Data preprocessing, automated metadata creation, and data transfer are additional concerns linked to this subject.

c) Data Search
Given the requirement for data to be timely, dependable, and comprehensive to support informed decision-making, effective search becomes crucial. Query optimization is vital for addressing a wide range of complex analytic SQL queries. Demand for higher-level query languages (such as HiveQL, Pig Latin, and SCOPE) remains strong, even for the latest frameworks based on MapReduce and its derivatives. As data movement between nodes on parallel platforms incurs high costs, optimizing queries and refining physical designs remain essential infrastructure elements. (Chaiken R, 2008; Olston C, 2008; C. S, 2012; Thusoo A, 2010)
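To make the appeal of declarative, SQL-like querying concrete, here is a minimal sketch using SQLite from the Python standard library in place of a HiveQL or Pig engine; the `events` table and its columns are illustrative assumptions, not part of any of the cited systems:

```python
import sqlite3

# In-memory table standing in for a large distributed dataset
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, action TEXT, ms INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("alice", "click", 120), ("bob", "click", 340),
    ("alice", "view", 80), ("alice", "click", 200),
])

# The query states *what* is wanted; the engine's optimizer decides *how*
# to execute it -- the same division of labor HiveQL offers over MapReduce.
rows = conn.execute(
    "SELECT user, COUNT(*) FROM events WHERE action = 'click' "
    "GROUP BY user ORDER BY user"
).fetchall()
print(rows)
```

On a Hadoop cluster, an equivalent HiveQL statement would be compiled into map and reduce tasks rather than executed by a single-node engine.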

d) Data Sharing
In the current landscape, data sharing holds equal importance to data generation. While data is now being produced in ways that allow integration and sharing across different parts of an organization, professionals still generate and utilize information tailored to their own business requirements. Challenges arise in curating this data and safeguarding privacy while managing it effectively. (D, 2012; E, 2012; Hampton SE, 2013; Zhang X, 2014)

e) Data Analysis
In the present era, the ability to perform timely and cost-effective analytics on large-scale datasets is crucial to the success of numerous business, scientific, technical, and governmental endeavors. One solution to these challenges lies in system scalability, as exemplified by cloud computing. (Agrawal D, 2011; Begoli E, 2012)

f) Data Visualization
Visual interfaces also facilitate the ongoing discovery of patterns in data streams over the long term. In the emerging field of visual analytics, the aim is to present large datasets in visually appealing ways, enabling users to identify significant relationships. Creating multiple visualizations across diverse datasets is imperative for effective visual analytics. (Fisher D, 2012; J, 2014; Light RP, 2014; Shen Z, 2012)
1. History of Hadoop
Hadoop was invented by Doug Cutting, the creator of Apache Lucene, a widely used text search library. Hadoop's origins can be traced back to Apache Nutch, an open-source web search engine directly connected to the Lucene project.
The term 'Hadoop' is not an acronym but a made-up name. Doug Cutting, the project's initiator, explains its genesis: "It was the name my child gave to a plush, yellow elephant. To meet my naming criteria, it had to be succinct, relatively easy to spell and pronounce, devoid of intrinsic meaning, and unused elsewhere."

Big Data: Hadoop Ecosystem
A Hadoop cluster can encompass a multitude of nodes, making it intricate and arduous to operate manually. Consequently, diverse components facilitate the setup, upkeep, and administration of the complete Hadoop system.
For some, Hadoop serves as a data governance system that amalgamates extensive volumes of structured and unstructured data, permeating nearly all strata of an enterprise's established data infrastructure and strategically positioned to assume a central role within a data center. Alternatively, it is perceived as a highly parallel execution framework that democratizes the power of supercomputing and is poised to drive the execution of enterprise applications. Others regard Hadoop as an open-source community that builds tools and software to tackle the challenges Big Data poses. Owing to this extensive and adaptable range of capabilities, Hadoop is widely recognized as a fundamental framework. (Lublinsky, Smith, & Yakubovich, 2013)

Core Components of the Hadoop Ecosystem: Hadoop's various components are configured using its configuration API. These components collaborate effectively to form a robust Hadoop ecosystem that can be applied to a wide range of real-world problems. The core components are shown in Figure 5.

b) MapReduce
Much like HDFS, MapReduce is based on a master-slave model and works as a parallel processing framework. It comprises a single master (Job Tracker) daemon paired with slave (Task Tracker) daemons, one running on each worker node in the cluster. MapReduce processes data in parallel using several algorithms: the job is first mapped and then reduced. To facilitate parallel processing, the Job Tracker separates the dataset into several units known as Map tasks and, by default, distributes each to three Data Nodes (Task Trackers).
In case of disruption, the Job Tracker is responsible for monitoring and rescheduling tasks. If a task fails to show progress within a designated timeframe, or a Data Node experiences a complete failure, all incomplete tasks are restarted on an alternative server. Additionally, if a task runs unusually slowly, the Job Tracker restarts it on a different server to ensure the work completes on schedule (speculative execution).
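The map, shuffle, and reduce phases described above can be sketched in plain Python as a toy word count; this illustrates the programming model only and is not the Hadoop API:

```python
from collections import defaultdict
from itertools import chain

def map_phase(split):
    # Map task: emit an intermediate (word, 1) pair for each word
    return [(word, 1) for word in split.split()]

def shuffle(pairs):
    # Shuffle/sort: group intermediate values by key across all map outputs
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce task: aggregate the grouped values for each key
    return {word: sum(counts) for word, counts in groups.items()}

# Two input splits, as the Job Tracker would hand to separate Task Trackers
splits = ["big data needs big tools", "data tools scale"]
pairs = chain.from_iterable(map_phase(s) for s in splits)
result = reduce_phase(shuffle(pairs))
print(result)
```

In a real cluster, the map tasks run in parallel on the Task Trackers holding the relevant data blocks, and the shuffle moves intermediate pairs over the network between nodes.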

c) Zookeeper
Zookeeper serves as the distributed coordination service for Hadoop. Engineered to operate across a cluster of machines, it functions as a highly dependable service for administering Hadoop operations, upon which numerous Hadoop components rely. (Lublinsky et al., 2013)

d) Hive
Facebook pioneered a Hadoop interface akin to SQL. Hive allows SQL users to create MapReduce jobs without requiring familiarity with MapReduce, using familiar SQL commands and a relational table structure. Hive treats all data elements as if they were tables, enabling the creation of table definitions over data files. Furthermore, it organizes the metadata of unstructured data into tables and converts queries into MapReduce jobs.

b) Smart Traffic Management
An essential element of intelligent urban centers is the efficient management of traffic flow, which aims to enhance the city's transportation systems, reduce commute times for residents, and optimize overall traffic patterns. As the population expands, challenges such as traffic congestion, environmental impact, and economic concerns arise. Consequently, smart cities implement advanced traffic signals and intelligent traffic management systems to address heavy traffic and congestion effectively. The ideal approach involves collecting data from all traffic signals across the city and leveraging this information to develop intelligent algorithms for making informed decisions, thereby providing optimal services for intelligent traffic management. (Aguilera G, 2013; Sepúlveda, 2021)

c) Healthcare
Enhanced well-being and profitable socioeconomic growth rely on advancements in healthcare. The healthcare industry generates substantial data that can empower physicians and healthcare practitioners to make more informed choices. Furthermore, leveraging massive data in healthcare can facilitate real-time disease assessment, ultimately benefiting public health. To establish the correlation between data and environmental risks in public healthcare, comprehensive tracking, monitoring, storage, and analysis of mobile entities and their exposure to potentially harmful environmental factors are imperative. (Eiman Al Nuaimi, 2015)

d) Network Optimization
Utilizing BDA methodologies, it is possible to architect a mobile network that provides efficient and reliable services. Content-focused examination, traffic evaluation, and network signaling are vital for ensuring optimal service delivery and superior performance. Network providers can establish a system for collecting, storing, and evaluating user or core-network data to enhance signaling effectiveness, forecast traffic fluctuations, prevent congestion, intelligently optimize the network, automate network configuration, and foster intelligent communication. (Khatib, Barco, Muñoz, De La Bandera, & Serrano, 2016)

e) Educational Development
The field of education presents a plethora of data sources that can be leveraged for BDA; these information reservoirs aid in forecasting student performance and achievement. Additionally, BDA in the educational domain is pivotal in overseeing curriculum content, constructing customized recommendation systems, and facilitating intelligent learning through text summarization and computational linguistics. Data from massive open online courses (MOOCs) is also used to identify subject areas that students find challenging and to support them, thereby improving teaching and learning. (Dobre, 2014) Big Data is beneficial for customizing educational procedures and raising academic performance. The ability to gather a wealth of data about students, present and past, allows educators to adjust their methods and choose the best tools. The generated data may in turn influence the development of pedagogical design and teaching methodology. (Nweke, 2019; Julio Ruiz-Palmero, 2020)

f) Banking Sector
The utilization of client data inevitably raises privacy concerns. BDA may expose private information by revealing hidden links between seemingly unrelated pieces of data. According to research, 62% of bankers handle big data cautiously due to privacy concerns. In addition, spreading consumer data across departments to produce deeper insights, or outsourcing data analysis tasks, also increases security threats.
For instance, a recent security lapse at a prominent UK bank exposed a database containing thousands of customer files. Even though the bank immediately began an inquiry, sensitive documents were compromised, including records of consumers' wages, savings, mortgages, and insurance policies. (Muhammad Ali Raza, 2023)

g) Finance
Big data carries noticeable ramifications for the financial sector. The essential requirement in finance revolves around the manipulation of amassed data, and the insights extracted from unprocessed data primarily inform the decision-making process. Given the significance of massive financial data, structured business intelligence is perceived as an advancement. The enterprise leverages organized data to propel decision-making forward, as unrefined and unanalyzed data holds no value to the organization.
Accountants employ diverse methodologies to derive meaningful insights from the data at hand. Harnessing data analysis for pertinent decision-making, efficiency gains, and innovation brings numerous advantages, and all business operations can be effectively supported by integrating data with robust analytics.
The convergence of corporate data and burgeoning big data has occurred in finance, enabling the seamless integration of ERP systems with unstructured data repositories. (Kuchipudi Sravanthi, 2015)

Conclusion:
A forthcoming scientific revolution is on the horizon as we enter the realm of big data, which represents the next frontier for innovation, competition, and efficiency. We eagerly anticipate the coming technological upheaval. This manuscript presents a comprehensive definition of big data using the "V" approach, aiming to enhance scholars' understanding of the concept. Moreover, this article delves into various challenges and issues associated with big data; to fully harness its potential, it is imperative to foster fundamental research into these technical obstacles. The Hadoop framework and its diverse components are also examined. HDFS, designed for commodity hardware, stands as a pivotal element within big data.
Furthermore, this study elucidates the various domains where big data finds application. Big data ushers in many novel opportunities and exerts far-reaching impacts. Organizations now possess many alternatives that enable the formulation of innovative propositions, and businesses can leverage big data strategies to attain novel and enhanced outcomes.

Figure 1: 5Vs of Big Data
Figure 2: 3Vs of Big Data


Big Data necessitates ample storage capacity and innovative data administration approaches on extensive distributed systems, as traditional database systems struggle to cope with the obstacles it poses. MapReduce facilitates automatic parallelization and scalable data distribution across multiple machines, and Apache Hadoop, an open-source implementation, stands out as the preferred choice. (Gantz J, 2012; Schadt E, 2010; V, 2013) "Children excel at generating such appellations; 'Googol' is another term conceived by a child." (White, 2012) Hadoop evolved from Nutch, an open-source crawler-driven search engine on a distributed system. Google released the MapReduce and GFS papers in 2003-2004, and MapReduce was then implemented in the Nutch framework. Doug Cutting and Mike Cafarella established Hadoop: when Doug Cutting joined Yahoo, a new initiative based on the same concepts as Nutch was formed, which became known as Hadoop, while Nutch remained a distinct sub-project. Several versions followed, and additional sub-projects began integrating with Hadoop, building the Hadoop ecosystem. (Achari, 2015)

Figure 5: Core components of Hadoop

a) HDFS
Hadoop employs a distributed file system, dispersing extensive files across numerous local-storage-equipped Data Nodes within the cluster. During processing, the Name Node partitions the original file into blocks, typically 64 MB in size, and replicates each block to various Data Nodes according to a predefined protocol. The Name Node consistently updates the mapping of blocks to Data Nodes.
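The block-splitting and replication scheme just described can be sketched as follows. This is a toy model of the Name Node's bookkeeping, not the HDFS API; the 64 MB block size and threefold replication come from the text, while the node names and round-robin placement are invented for illustration:

```python
BLOCK_SIZE_MB = 64   # default HDFS block size cited in the text
REPLICATION = 3      # default replication factor

def split_into_blocks(file_size_mb, block_size=BLOCK_SIZE_MB):
    # Partition a file into block descriptors of at most block_size MB each
    blocks, offset = [], 0
    while offset < file_size_mb:
        size = min(block_size, file_size_mb - offset)
        blocks.append({"id": len(blocks), "size": size})
        offset += size
    return blocks

def place_replicas(blocks, data_nodes, replication=REPLICATION):
    # Assign each block to `replication` distinct Data Nodes, round-robin
    return {
        block["id"]: [data_nodes[(block["id"] + r) % len(data_nodes)]
                      for r in range(replication)]
        for block in blocks
    }

blocks = split_into_blocks(200)                  # a 200 MB file
placement = place_replicas(blocks, ["dn1", "dn2", "dn3", "dn4"])
print([b["size"] for b in blocks], placement[0])
```

Real HDFS placement is rack-aware rather than round-robin, but the bookkeeping (block descriptors mapped to lists of Data Nodes) follows this shape.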

Hadoop's ecosystem is progressively expanding to offer advanced capabilities and additional components. The features, strengths, and weaknesses of the core components of Hadoop and the newly introduced elements are described in Table 3. (Lublinsky et al., 2013)

Big Data: Issues, Benefits and Uses:
Big data encompasses various issues and considerations that require careful attention and resolution. However, it is equally important to acknowledge the substantial uses, benefits, issues, and widespread applications that big data offers across diverse domains and industries, which are presented in Table 4. In the agricultural field, big data denotes the use of all current technologies and data analysis as a building block for data-driven decision-making. Massive data has been utilized to advance several facets of agriculture, including remote sensing, farmers' decision-making, insurance and financing for farmers, understanding climate change, studies of fields, animals, crops, and soil, and food availability. (H.B. U.Haq, 2020)

Table 2: Big Data's 56 Characteristics
21 Vanity: Data that is vain, content with the impact it has on other people. (Gewirtz, 2016; GoodStratTweet, 2015)
23 Visible: Relevant information must exist and be made apparent to the targeted person at the appropriate time. (Laney, 2012)
24 Visual: We currently live in a world where people view, exchange, and share photos and videos online, whether of themselves, their products, or the weather. (Laney, 2012)
39 Verdict: People impacted by the model's choice. (Sivarajah, 2017)
40 Vet: Putting the assumptions to the test with facts. (Sivarajah, 2017)
41 Vane: Unclear decision-making process. (Hussein, 2020; Sivarajah, 2017)
42 Vanilla: If used carefully, simple techniques can be beneficial. (Sivarajah, 2017)
43 Victual: Big Data is data science's fuel. (Sivarajah, 2017)
Data has the potential to be so greedy that it might influence, control, and even consume itself. (Gewirtz, 2016; GoodStratTweet, 2015)
The characteristics of Big Data pose challenges when it comes to visualizing the data. Visual interfaces are suitable for purposes such as (1) exploring data at different scales in conjunction with statistical analysis, as stated by Fisher et al., and (2) preserving context by representing a smaller subset of a larger dataset, displaying correlated variables, and more.