Channel: Javalobby - MapReduce
Browsing latest articles
Browse All 44 View Live

MapReduce: Detecting Cycles in a Network Graph

I recently received an email from a reader of my blog on Map/Reduce algorithm design asking how to detect whether a graph is acyclic using Map/Reduce. I think this is an interesting problem and...

View Article



Reading Hive Tables from MapReduce

This article is by Stephen Mouring Jr, appearing courtesy of Scott Leberknight. So just as sometimes you need to write data to Hive with a custom MapReduce job, sometimes you need to...

View Article

Sorting Text Files with MapReduce

In my last post I wrote about sorting files in Linux. Decently large files (in the tens of GBs) can be sorted fairly quickly using that approach. But what if your files are already in HDFS, or ar...

View Article

Big Data Beyond MapReduce: Google's Big Data Papers

Mainstream Big Data is all about MapReduce, but when looking at real-time data, limitations of that approach are starting to show. In this post, I’ll review Google’s most important Big Data...

View Article

Beginner Tips For Elastic MapReduce

Curator's Note: The content of this article was written by... By this point everyone is well acquainted with the power of Hadoop’s MapReduce. But what you’re also probably well acquainted...

View Article


Targeting Big Data: Spring XD 1.0 Milestone 1 Released

The Spring XD team is pleased to announce that the first milestone of Spring XD is now... Spring XD makes it easy to solve common big data problems such as data ingestion and export,...

View Article

Glue and Big Data: Getting Started, Part 1

Where to start? Glue is split into three parts: glue-rest, the workflow engine that executes your jobs; gluecron, the cron/data-driven daemon that launches workflows based on cron or...

View Article

Writing a Hadoop MapReduce Task in Java

Although the Hadoop framework itself is written in Java, MapReduce jobs can be written in many different languages. In this post I show how to create a MapReduce job in Java based on a...

View Article


Run Your Hadoop MapReduce Job on Amazon EMR

I posted a while ago about how to set up an EMR cluster using the CLI. In this post I will show you how to set up the cluster using the Java SDK for AWS. In my opinion the best way to show how to do...

View Article


Hadoop Alternatives: When Your Data Isn't as Big as You Thought

This post from Chris Stucchio's blog takes a critical look at the use of Hadoop and Big Data as buzzwords by asking an interesting question: What if your data isn't as big as you think it is? He offers...

View Article

Simplifying Secondary Sorting in MapReduce with htuple

I’ve recently found myself immersed in writing a number of MapReduce jobs that all require secondary sort. Whilst I was nursing my cramping hands after writing what felt like the 100th custom Writable...

View Article

The Best of the Week: Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Oct. 11 to Oct. 17). Here they are, in order of popularity:

View Article

How to Speed Up MongoDB MapReduce by 20x

Analytics is becoming an increasingly important topic with MongoDB since it is in use for more and more large critical projects. People are tired of using different software to do analytics (Hadoop...

View Article


The Best of the Week (Oct. 25): NoSQL Zone

Make sure you didn't miss anything with this list of the Best of the Week in the NoSQL Zone (Oct. 25 to Oct. 31). Here they are, in order of popularity: 1. How to Use MongoDB as a Pure In-memory DB...

View Article

4 Methods for Structured Big Data Computation

Data only has value when it is used in computation, and big data is no exception. The computational capability on structured big data determines the...

View Article


Hadoop, MapReduce and Hive: How to Use Non-Java Languages, Such as R

This recent tutorial from Tom Hanlon at Hortonworks demonstrates how to use non-Java languages - R, in particular - to work with Hadoop data through MapReduce and Hive.

View Article

The Best of the Week (Nov. 22): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Nov. 22 to Nov. 28). Here they are, in order of popularity:

View Article


The Top 10 Articles of 2013: Big Data Zone

Rather than the best of the week this week, let's take a look at the most popular articles Big Data had to offer in 2013. Note that not all of these articles were actually published in 2013;...

View Article

One Programming Language to Save Developers

Hadoop is an outstanding parallel computing system whose default parallel computing mode is MapReduce. However, such parallel computing is not specially designed for parallel data computing. Plus, it...

View Article

MapReduce On Hive Tables Using HCatalog

In my last post, Introduction To Hive's Partitioning, I described how we can load CSV data into a partitioned Hive table. Today we shall see how we can use HCatalog to run MapReduce on a Hive table and store...

View Article

MapReduce Algorithms: Understanding Data Joins, Part II

It’s been a while since I last posted, and like the last time I took a big break, I was taking some classes on Coursera. In this post, we resume our series on implementing the algorithms...

View Article


The Best of the Week (Feb. 14): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Feb. 14 to Feb. 20). Here they are, in order of popularity:

View Article


Infinispan 5.2: Map/Reduce Parallel Execution

As of the Infinispan 5.2 release, we have implemented fully distributed execution of both the map and reduce phases of MapReduceTask. For the map phase, MapReduceTask hashes task input keys, groups them by...

View Article

MapReduce on Avro Data Files

In this post we are going to write a MapReduce program to consume Avro input data and also produce data in Avro format. We will write a program to calculate the average of student marks. Data Preparation...

View Article

Can MapReduce Solve Planning Problems?

To solve a planning or optimization problem, some solvers tend to scale out poorly: as the problem gains more variables and more constraints, they use a lot more RAM and CPU power. They can hit...

View Article
