MapReduce: Detecting Cycles in a Network Graph
I recently received an email from a reader of my blog on Map/Reduce algorithm design asking how to detect whether a graph is acyclic using Map/Reduce. I think this is an interesting problem and...
Reading Hive Tables from MapReduce
This article is by Stephen Mouring Jr, appearing courtesy of Scott Leberknight. So just as sometimes you need to write data to Hive with a custom MapReduce job, sometimes you need to...
Sorting Text Files with MapReduce
In my last post I wrote about sorting files in Linux. Decently large files (in the tens of GBs) can be sorted fairly quickly using that approach. But what if your files are already in HDFS, or ar...
Big Data Beyond MapReduce: Google's Big Data Papers
Mainstream Big Data is all about MapReduce, but when looking at real-time data, limitations of that approach are starting to show. In this post, I’ll review Google’s most important Big Data...
Beginner Tips For Elastic MapReduce
By this point everyone is well acquainted with the power of Hadoop’s MapReduce. But what you’re also probably well acquainted...
Targeting Big Data: Spring XD 1.0 Milestone 1 Released
The Spring XD team is pleased to announce that the first milestone of Spring XD is now... Spring XD makes it easy to solve common big data problems such as data ingestion and export,...
Glue and Big Data: Getting Started, Part 1
Where to start? Glue is split into three parts: glue-rest, the workflow engine that executes your jobs; gluecron, the cron/data-driven daemon that launches workflows based on cron or...
Writing a Hadoop MapReduce Task in Java
Although the Hadoop framework itself is written in Java, MapReduce jobs can be written in many different languages. In this post I show how to create a MapReduce job in Java based on a...
Run Your Hadoop MapReduce Job on Amazon EMR
I posted a while ago about how to set up an EMR cluster using the CLI. In this post I will show you how to set up the cluster using the Java SDK for AWS. In my opinion the best way to show how to do...
Hadoop Alternatives: When Your Data Isn't as Big as You Thought
This post from Chris Stucchio's blog takes a critical look at the use of Hadoop and Big Data as buzzwords by asking an interesting question: What if your data isn't as big as you think it is? He offers...
Simplifying Secondary Sorting in MapReduce with htuple
I’ve recently found myself immersed in writing a number of MapReduce jobs that all require secondary sort. Whilst I was nursing my cramping hands after writing what felt like the 100th custom Writable...
The Best of the Week: Big Data Zone
Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Oct. 11 to Oct. 17). Here they are, in order of popularity:
How to Speed Up MongoDB MapReduce by 20x
Analytics is becoming an increasingly important topic with MongoDB since it is in use for more and more large critical projects. People are tired of using different software to do analytics (Hadoop...
The Best of the Week (Oct. 25): NoSQL Zone
Make sure you didn't miss anything with this list of the Best of the Week in the NoSQL Zone (Oct. 25 to Oct. 31). Here they are, in order of popularity: 1. How to Use MongoDB as a Pure In-memory DB...
4 Methods for Structured Big Data Computation
Data only has value when it takes part in computation that creates value, and big data is no exception. The computational capability on structured big data determines the...
Hadoop, MapReduce and Hive: How to Use Non-Java Languages, Such as R
This recent tutorial from Tom Hanlon at Hortonworks demonstrates how to use non-Java languages - R, in particular - to work with Hadoop data through MapReduce and Hive.
The Best of the Week (Nov. 22): Big Data Zone
Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Nov. 22 to Nov. 28). Here they are, in order of popularity:
The Top 10 Articles of 2013: Big Data Zone
Rather than the best of the week this week, let's take a look at the most popular articles Big Data had to offer in 2013. Note that not all of these articles were actually published in 2013;...
One Programming Language to Save Developers
Hadoop is an outstanding parallel computing system whose default parallel computing mode is MapReduce. However, such parallel computing is not specially designed for parallel data computing. Plus, it...
MapReduce On Hive Tables Using HCatalog
In my last post, Introduction To Hive's Partitioning, I described how we can load CSV data into a partitioned Hive table. Today we shall see how we can use HCatalog to run MapReduce on a Hive table and store...
MapReduce Algorithms: Understanding Data Joins, Part II
It’s been a while since I last posted, and like the last time I took a big break, I was taking some classes on Coursera. In this post, we resume our series on implementing the algorithms...
The Best of the Week (Feb. 14): Big Data Zone
Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Feb. 14 to Feb. 20). Here they are, in order of popularity:
Infinispan 5.2: Map/Reduce Parallel Execution
As of the Infinispan 5.2 release, we have implemented fully distributed execution of both the map and reduce phases of MapReduceTask. For the map phase, MapReduceTask hashes task input keys, groups them by...
MapReduce on Avro Data Files
In this post we are going to write a MapReduce program that consumes Avro input data and also produces data in Avro format. We will write a program to calculate the average of student marks. Data Preparation...
Can MapReduce Solve Planning Problems?
Some solvers for planning or optimization problems tend to scale out poorly: as the problem gains more variables and more constraints, they use a lot more RAM and CPU power. They can hit...