What you need to know about big data, Hadoop, and Twitter data sentiment analysis
Twitter is one of the largest social media sites on the internet and is known to receive millions of tweets a day, on topics ranging from industry and social issues to government policy and economics. Hadoop is one of the best tools available for Twitter data analytics, since it works with distributed big data, time-stamped data, text data, and even streaming data. This article discusses how to use the Flume and Hive tools to analyze Twitter posts. The function of Flume is to extract real-time data from Twitter into HDFS, while Hive, which provides an SQL-like query language, can be used for analysis and extraction. In this article, we discuss Twitter data sentiment analysis using Flume and Hive.
Intro to Apache Hadoop for Microblogging – Twitter
Microblogging is a common and very popular communication tool among internet users. Authors on Twitter post about their daily lives, share opinions, and discuss issues. Analysis of this post data can support decision making in many sectors, including elections, business, product reviews, and government. Apache Hadoop is well suited to Twitter data analysis because it is built for large distributed datasets: it is an open-source software framework for distributed storage and large-scale distributed processing of datasets across clusters.
Apache Flume – Collect and analyze streaming data in Hadoop
Apache Flume is a reliable, distributed, and available service for efficiently collecting and moving streaming data into the Hadoop Distributed File System. After installing VMware and setting up Hadoop on a single node, you can dump Twitter data into HDFS. The next step is installing Flume. For this, you need to log in to Twitter, go to the developer apps page, and create a new application. Once you agree to the terms and conditions, you will get the new application's settings page. There you set up the consumer key, consumer secret, access token, and access token secret. Create the access token and note down all four credentials. After that, go to the Flume home page and download Apache Flume.
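The steps above come together in the Flume agent configuration. The sketch below is a minimal example, assuming the Twitter source class bundled with Apache Flume; the agent name, HDFS path, and placeholder credentials are illustrative and should be replaced with your own values.

```properties
# Minimal Flume agent sketch: Twitter source -> memory channel -> HDFS sink.
# Agent name "TwitterAgent" and the HDFS path are example values.
TwitterAgent.sources  = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks    = HDFS

# Twitter source (class name as shipped with Apache Flume); paste in the
# four credentials generated on the Twitter apps page.
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <your consumer key>
TwitterAgent.sources.Twitter.consumerSecret = <your consumer secret>
TwitterAgent.sources.Twitter.accessToken = <your access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <your access token secret>

# HDFS sink: land raw tweets as plain text files under one directory
# so Hive can later read them as an external table.
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

# Simple in-memory channel between source and sink.
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
```

With this file saved (for example as conf/twitter.conf), the agent is started with Flume's standard launcher: `bin/flume-ng agent -n TwitterAgent -c conf -f conf/twitter.conf`.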
Hive to summarize big data in a Hadoop project
Hive is a data warehouse infrastructure tool that processes structured data in Hadoop. It resides on top of Hadoop to summarize big data, which in turn makes querying and analyzing easier. Apache Hive, through its HiveQL language, works with the distributed file system for data analysis. It provides an SQL-like interface for processing data stored in HDFS, and because the interface is SQL-like, it has become a popular choice of technology for working with Hadoop. To set up Hive for this project, you first need to build a JSON SerDe so that Hive can parse the raw JSON tweets.
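The SerDe setup and a dictionary-based sentiment query can be sketched in HiveQL as follows. This is a minimal sketch, assuming the tweets were landed by Flume under /user/flume/tweets/, that the SerDe class name matches the JSON SerDe jar you built (the class shown is from the openx Hive-JSON-Serde project; adjust it to your build), and that a hypothetical `dictionary` table of word/polarity pairs has been loaded separately.

```sql
-- Register the JSON SerDe jar you built (path is an example).
ADD JAR /path/to/json-serde.jar;

-- External table over the raw JSON tweets written by Flume.
CREATE EXTERNAL TABLE IF NOT EXISTS raw_tweets (
  id BIGINT,
  created_at STRING,
  text STRING,
  `user` STRUCT<screen_name:STRING, followers_count:INT>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/user/flume/tweets/';

-- Split each tweet into lowercase words, one row per word.
CREATE TABLE tweet_words AS
SELECT id, word
FROM raw_tweets
LATERAL VIEW explode(split(lower(text), '\\s+')) w AS word;

-- Dictionary-based score: positive words count +1, negative words -1.
-- Assumes a "dictionary" table with columns word, polarity.
SELECT t.id,
       SUM(CASE d.polarity WHEN 'positive' THEN 1
                           WHEN 'negative' THEN -1
                           ELSE 0 END) AS sentiment_score
FROM tweet_words t
JOIN dictionary d ON t.word = d.word
GROUP BY t.id;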