- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class JoinExample
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool
Simple example of joining two data sets.
The example shows a vertex with multiple inputs that represent the two
data sets that need to be joined.
The join can be performed as a broadcast (or replicate-fragment) join, in
which the small side of the join is broadcast in full to every fragment of the
larger side. Each fragment of the larger side can then perform the join
independently, using the complete data of the smaller side. This strategy
demonstrates the broadcast edge property in Tez.
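The broadcast strategy can be illustrated with a minimal in-memory sketch in plain Java. It does not use the Tez API: the class name BroadcastJoinSketch and its methods are hypothetical, and the join is assumed to be a simple key-membership join in which a key from the larger side is emitted when it also occurs in the smaller side.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of JoinExample or the Tez API.
class BroadcastJoinSketch {

    // Joins a single fragment of the larger side against the full smaller side.
    // Each call is independent of every other fragment.
    static List<String> joinFragment(List<String> fragment, Set<String> smallSide) {
        List<String> matches = new ArrayList<>();
        for (String key : fragment) {
            if (smallSide.contains(key)) {
                matches.add(key);
            }
        }
        return matches;
    }

    // Makes the complete smaller side available to every fragment of the
    // larger side and concatenates the per-fragment results.
    static List<String> broadcastJoin(List<List<String>> largeFragments,
                                      List<String> smallSide) {
        // Assumption: the smaller side fits in memory as a hash set.
        Set<String> broadcast = new HashSet<>(smallSide);
        List<String> result = new ArrayList<>();
        for (List<String> fragment : largeFragments) {
            result.addAll(joinFragment(fragment, broadcast));
        }
        return result;
    }
}
```

Because joinFragment reads only its own fragment and the shared smaller side, the fragments could be processed in parallel without exchanging data with each other, which is the property the broadcast join relies on.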
Alternatively, the join can be performed as a regular repartition join, in
which both sides are partitioned according to the same scheme into the same
number of fragments. Keys that land in the same fragment are then joined with
each other. This is the default join strategy.
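The repartition strategy can be sketched in the same way. This is again a plain Java illustration under stated assumptions rather than the Tez implementation: RepartitionJoinSketch and its methods are hypothetical names, hash partitioning on the key is assumed as the shared scheme, and the join is a key-membership join.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of JoinExample or the Tez API.
class RepartitionJoinSketch {

    // The shared partitioning scheme: equal keys always map to the same fragment.
    static int partitionOf(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Splits one side of the join into numPartitions fragments.
    static List<List<String>> partition(List<String> keys, int numPartitions) {
        List<List<String>> fragments = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            fragments.add(new ArrayList<>());
        }
        for (String key : keys) {
            fragments.get(partitionOf(key, numPartitions)).add(key);
        }
        return fragments;
    }

    // Partitions both sides identically, then joins fragment i of the left
    // side only with fragment i of the right side.
    static List<String> repartitionJoin(List<String> left, List<String> right,
                                        int numPartitions) {
        List<List<String>> leftFragments = partition(left, numPartitions);
        List<List<String>> rightFragments = partition(right, numPartitions);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            Set<String> rightKeys = new HashSet<>(rightFragments.get(i));
            for (String key : leftFragments.get(i)) {
                if (rightKeys.contains(key)) {
                    result.add(key);
                }
            }
        }
        return result;
    }
}
```

Unlike the broadcast sketch, neither side has to fit in memory as a whole; only one pair of matching fragments is held at a time, at the cost of partitioning both inputs first.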