public final class SplitInputJob extends Object
| Modifier and Type | Class and Description |
|---|---|
| static class | SplitInputJob.SplitInputComparator<br>Randomly permute key value pairs |
| static class | SplitInputJob.SplitInputMapper<br>Mapper which downsamples the input by downsamplingFactor |
| static class | SplitInputJob.SplitInputReducer<br>Reducer which uses MultipleOutputs to randomly allocate key value pairs between test and training outputs |
| Modifier and Type | Method and Description |
|---|---|
| static void | run(org.apache.hadoop.conf.Configuration initialConf, org.apache.hadoop.fs.Path inputPath, org.apache.hadoop.fs.Path outputPath, int keepPct, float randomSelectionPercent)<br>Run job to downsample, randomly permute and split data into test and training sets. |
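The three stages named in the summary (downsample, randomly permute, split into test and training sets) can be illustrated with a small in-memory sketch. This is not the MapReduce implementation in SplitInputJob: the class `SplitSketch`, its nested `Split` holder, the `split` method, and the `seed` parameter are all hypothetical names introduced for illustration. It assumes `keepPct` and `randomSelectionPercent` are percentages in the range 0 to 100, and that each record is kept or assigned independently at random.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical in-memory illustration; not Mahout's Hadoop-based implementation.
class SplitSketch {

    /** Holds the two output partitions. */
    static final class Split<T> {
        final List<T> test = new ArrayList<>();
        final List<T> training = new ArrayList<>();
    }

    /**
     * Downsamples, shuffles, and splits the input.
     * keepPct and randomSelectionPercent are assumed to be in [0, 100];
     * seed is an illustrative addition that makes runs repeatable.
     */
    static <T> Split<T> split(List<T> input, int keepPct,
                              float randomSelectionPercent, long seed) {
        Random rng = new Random(seed);

        // Stage 1: downsample. Each record survives with probability keepPct / 100.
        List<T> kept = new ArrayList<>();
        for (T record : input) {
            if (rng.nextInt(100) < keepPct) {
                kept.add(record);
            }
        }

        // Stage 2: randomly permute the surviving records.
        Collections.shuffle(kept, rng);

        // Stage 3: send roughly randomSelectionPercent of records to the test set;
        // the remainder goes to the training set.
        Split<T> result = new Split<>();
        for (T record : kept) {
            if (rng.nextFloat() * 100f < randomSelectionPercent) {
                result.test.add(record);
            } else {
                result.training.add(record);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            input.add(i);
        }
        // Keep about half of the records, then put about 20% of those in the test set.
        Split<Integer> s = split(input, 50, 20f, 42L);
        System.out.println("test=" + s.test.size() + " training=" + s.training.size());
    }
}
```

Because selection is per record and random, the resulting set sizes only approximate the requested percentages. Whether the real job behaves the same way for small inputs is not stated in this page.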
public static void run(org.apache.hadoop.conf.Configuration initialConf,
org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputPath,
int keepPct,
float randomSelectionPercent)
throws IOException,
ClassNotFoundException,
InterruptedException
Parameters:
initialConf -
inputPath - path to input data SequenceFile
outputPath - path for output data SequenceFiles
keepPct - percentage of key value pairs in input to keep. The rest are discarded
randomSelectionPercent - percentage of key value pairs to allocate to test set. Remainder are allocated to training set
Throws:
IOException
ClassNotFoundException
InterruptedException

Copyright © 2008–2013 The Apache Software Foundation. All rights reserved.