Big Data analytics on cloud: comparative assessment

Today businesses face rapid change, and even small changes can create significant problems for an organization. Cloud computing frees companies from having to maintain their own infrastructure. An organization can gain many benefits by adopting cloud computing, for example reduced IT costs, scalability, business continuity, more efficient collaboration and much more. Responsiveness and flexibility are now the two key qualities any business needs to stay competitive and become a market leader, and moving the business to the cloud helps achieve both.

The Three Big Giants

Whenever we talk about the cloud, three big names come into the picture: Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform. Almost 82% of cloud business is handled on these three clouds. Their fierce rivalry has driven down cloud computing prices and produced excellent, competing feature sets. Other cloud providers such as IBM, Red Hat, Rackspace, SAP and VMware are also gaining popularity, but for this blog we will restrict our discussion to Amazon, Google and Microsoft.

The rise of Big Data and Analytics in the cloud

The rapid growth of data warehouses, web pages, audio and video streams, tweets and blogs is generating an enormous amount of complex and persistent digital data. Managing this data and gaining insights from it is both a challenge and a key to competitive advantage, and each cloud provider is trying to make its platform the most feature-rich. The analysis and management of big data will become the next front line of innovation, competition and productivity for the clouds. Choosing the right cloud platform from a big data and analytics perspective requires reviewing each provider's service availability, database model, scalability, flexibility and security offerings.

Big data and analytics is the hottest topic in the cloud environment. Customers can consider the cloud for big data analytics when their data and/or applications are already in the cloud, or when they want to add analytics capabilities to an existing cloud architecture. The cloud can also be the better choice when an application doesn't fit the on-premises big data setup, when high-volume external data sources need significant preprocessing, when results are needed with a short turnaround, or for a short-term data science project that requires an exploratory data mart (aka sandbox).

How the Google, AWS and Azure clouds stack up in Big Data Analytics

The surge in data volumes is creating a lot of challenges for the cloud, and to achieve their business goals organizations need to understand how these big players can fulfill their requirements. Google Cloud Platform is emerging as one of the future leaders in big data analytics. The main reason is services such as Google BigQuery, Google Cloud Datalab, Google Cloud Dataproc and Google Cloud Dataflow, which are used to process, analyze and get insights from massive amounts of data in a very short time. AWS provides real-time analytics on streaming data through Kinesis Streams, which can process thousands of data streams on a per-second basis, while Azure provides Stream Analytics, Data Lake Analytics and Data Lake Store for real-time data analytics and storage.

There are also infrastructure differences. In AWS, machines are accessible individually, while in Azure machines are grouped into a “cloud service” and respond to the same domain name on different ports. AWS EBS storage is sufficiently fast for big data, but Azure standard storage struggles with big data workloads (a Premium storage account is required). Azure is very strong in the machine learning space, offering everything from pre-trained models through custom R models running over big data, and is the only provider to offer the capability for organizations to track.
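Real-time stream services such as Kinesis Streams and Azure Stream Analytics are commonly used to aggregate events over short time windows. As a rough, provider-neutral illustration (plain Python, not the Kinesis or Stream Analytics API), the sketch below counts events per source in 1-second tumbling windows. The event format, the `per_second_counts` function and the source names are hypothetical.

```python
import math
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# Hypothetical event shape: (timestamp in seconds, source id).
Event = Tuple[float, str]


def per_second_counts(events: Iterable[Event]) -> Dict[int, Dict[str, int]]:
    """Count events per source in 1-second tumbling windows.

    A tumbling window assigns each event to exactly one non-overlapping
    window; here the window key is the whole second the timestamp falls in.
    """
    windows: DefaultDict[int, DefaultDict[str, int]] = defaultdict(
        lambda: defaultdict(int)
    )
    for ts, source in events:
        window_start = math.floor(ts)  # floor to the enclosing second
        windows[window_start][source] += 1
    # Return plain dicts, ordered by window start, for readable output.
    return {w: dict(counts) for w, counts in sorted(windows.items())}


if __name__ == "__main__":
    sample = [
        (0.10, "sensor-a"),
        (0.45, "sensor-b"),
        (0.90, "sensor-a"),
        (1.20, "sensor-a"),
        (1.75, "sensor-c"),
    ]
    print(per_second_counts(sample))
    # {0: {'sensor-a': 2, 'sensor-b': 1}, 1: {'sensor-a': 1, 'sensor-c': 1}}
```

In a managed service, ingestion, scaling and fault tolerance are handled by the platform; this sketch only shows the windowed aggregation logic on an in-memory list.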

To process data pipelines, Google launched a new service, Cloud Dataflow. AWS and Azure both use a declarative model that distributes the processing work to other services such as Hadoop, whereas Cloud Dataflow is a fully programmable framework available for Java and Python. Apache Beam, the open-source programming model behind Cloud Dataflow, lets the same pipeline run on different runners, including Cloud Dataflow and Apache Spark.
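To make the contrast with a declarative model more concrete, here is a minimal plain-Python sketch of the programmable style, where the pipeline is ordinary code built by chaining transforms. It is not the Apache Beam SDK: `flat_map`, `keep_if` and `run_pipeline` are hypothetical helpers that only loosely mirror Beam's `FlatMap` and `Filter` transforms, and nothing here runs in parallel.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List

# A transform takes a stream of elements and returns a new stream.
Transform = Callable[[Iterable], Iterator]


def flat_map(fn: Callable[[object], Iterable]) -> Transform:
    """Apply fn to each element and flatten the results (loosely like FlatMap)."""
    def run(items: Iterable) -> Iterator:
        for item in items:
            yield from fn(item)
    return run


def keep_if(pred: Callable[[object], bool]) -> Transform:
    """Keep only elements for which pred is true (loosely like Filter)."""
    def run(items: Iterable) -> Iterator:
        return (item for item in items if pred(item))
    return run


def run_pipeline(source: Iterable, steps: List[Transform]) -> Iterator:
    """Chain the transforms in order; each step consumes the previous output."""
    stream: Iterator = iter(source)
    for step in steps:
        stream = step(stream)
    return stream


if __name__ == "__main__":
    lines = ["Big data on the cloud", "the cloud scales big data"]
    words = run_pipeline(
        lines,
        [
            flat_map(lambda line: line.lower().split()),
            keep_if(lambda word: len(word) > 3),
        ],
    )
    # Final aggregation step: count occurrences of each remaining word.
    print(Counter(words).most_common())
```

In Beam itself, the equivalent steps would be written with the SDK's transforms and handed to a runner such as Cloud Dataflow or Spark, which takes care of distributing the work.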

Price Comparison: The table below shows the monthly cost with standard pricing and with preemptible (short-term) discounts when running for 5 hours daily.

Configuration         Microsoft Azure   Amazon Web Services   Google Cloud Platform
Data                  50 TB             50 TB                 50 TB
Instance type         D3v2              m3.xlarge             n1-standard-4
Head nodes            2                 1                     1
Worker nodes          20                20                    20
RAM per node (GB)     14                15                    15
Disk per node (GB)    200               80                    80
Standard pricing      $3,261.33         $2,613.27             $2,154.40
Short-term discount   n/a               $1,936.17             $1,934.43

The table above is based on information available on the Google Cloud Platform website.

The table below shows the different services offered by AWS, Azure and Google in big data and analytics.

For real-time data analysis and processing, Google has a slight upper hand over AWS and Azure because of services such as Google BigQuery and Google Cloud Datalab. Google Cloud Dataproc is the quickest of the three, taking only 90 seconds to start and scale a cluster. Another way to gauge the capabilities of Google Cloud Platform is that Google's most popular products, such as its search engine, YouTube and Gmail, are already managed on Google Cloud Platform. AWS offers a full range of products and services, making it one of the strongest contenders for the crown, though it has some minor gaps, such as pre-trained machine learning models. Azure, with its large set of products and services, also includes Data Lake, a serverless analytics product that puts Azure on an equal footing.

As the price comparison above shows, Google Cloud Platform is the most cost-effective platform: at standard pricing it costs up to 17% less than AWS EMR and 34% less than Azure HDInsight. From the overall discussion, we can say that from a big data and analytics perspective, Google Cloud Platform could be the most suitable cloud, as it offers the most feature-rich services at a lower price and is evolving very rapidly.
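The savings percentages quoted above follow directly from the standard-pricing row of the price table. The short Python sketch below recomputes them from the table's monthly totals; the dictionary names, provider labels and the `percent_cheaper` helper are simply choices made here for illustration.

```python
# Monthly totals copied from the price comparison table above.
# None marks a discount the table lists as not available.
STANDARD = {"Azure": 3261.33, "AWS": 2613.27, "Google Cloud": 2154.40}
DISCOUNTED = {"Azure": None, "AWS": 1936.17, "Google Cloud": 1934.43}


def percent_cheaper(cheaper: float, other: float) -> float:
    """How much lower `cheaper` is than `other`, as a percentage of `other`."""
    return (other - cheaper) / other * 100


if __name__ == "__main__":
    google = STANDARD["Google Cloud"]
    for provider in ("AWS", "Azure"):
        pct = percent_cheaper(google, STANDARD[provider])
        print(f"Google Cloud vs {provider} (standard pricing): {pct:.1f}% cheaper")

    # With the short-term discount, compare only providers that offer one.
    gap = DISCOUNTED["AWS"] - DISCOUNTED["Google Cloud"]
    print(f"Discounted monthly gap, AWS minus Google Cloud: ${gap:.2f}")
```

Run as a script, it reports Google Cloud as about 17.6% cheaper than AWS and about 33.9% cheaper than Azure at standard pricing, in line with the rounded 17% and 34% figures. It also shows that with the short-term discount, the AWS and Google Cloud totals differ by only $1.74 a month.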

Harness big data solutions from YASH to drive better business decisions.

Akash Jain Sr Technology Professional – Innovation Group – Big Data | IoT | Analytics | Cloud

Posted by Akash Jain
September 1, 2017
