I’ve started my journey with Hadoop, and the first thing I wanted to try was Streaming, so I could write the mapper and reducer as PHP programs.
The first thing I did was set up an alias:
alias stream='/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.18.3-streaming.jar'
The next thing was to create a scripts dir in my $HADOOP_HOME (/usr/local/hadoop) dir.

wc_mapper.php
#!/usr/bin/php
<?php
error_reporting(0);

$in = fopen("php://stdin", "r");
$results = array();

while ( $line = fgets($in, 4096) ) {
    $words = preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word)
        $results[$word] += 1;
}
fclose($in);

foreach ($results as $key => $value)
    print "$key\t$value\n";
?>

wc_reducer.php
#!/usr/bin/php
<?php
error_reporting(0);

$in = fopen("php://stdin", "r");
$results = array();

while ( $line = fgets($in, 4096) ) {
    list($key, $value) = preg_split("/\t/", trim($line), 2);
    $results[$key] += $value;
}
fclose($in);

ksort($results);

foreach ($results as $key => $value)
    print "$key\t$value\n";
?>

To execute:
stream -input conf -output output4 -mapper /usr/local/hadoop/scripts/wc_mapper.php -reducer /usr/local/hadoop/scripts/wc_reducer.php

I’ll come back later and document this more fully. I just wanted to get the initial steps recorded.
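One nice property of Streaming is that there is no PHP-specific API at all: Hadoop just pipes lines through stdin/stdout, with a sort by key between the map and reduce phases. That means the whole data flow can be sanity-checked locally with an ordinary Unix pipeline. A minimal sketch of that flow, using awk one-liners as stand-ins for the scripts (the awk commands are illustrative only, not part of the setup above):

```shell
# The Streaming contract: the mapper reads raw lines on stdin and emits
# "key<TAB>value" pairs; Hadoop sorts by key between the phases (here,
# plain `sort`); the reducer aggregates the sorted stream.
echo "the quick brown fox the fox" \
  | tr ' ' '\n' \
  | awk '{ print $0 "\t" 1 }' \
  | sort \
  | awk -F'\t' '{ c[$1] += $2 } END { for (k in c) print k "\t" c[k] }' \
  | sort
# Prints (tab-separated): brown 1, fox 2, quick 1, the 2
```

The same shape should work with the PHP scripts themselves, e.g. `cat somefile | ./wc_mapper.php | sort | ./wc_reducer.php`, which approximates what a cluster run produces for a single reducer.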