Package weka.knowledgeflow.steps
Class TrainTestSplitMaker
java.lang.Object
weka.knowledgeflow.steps.BaseStep
weka.knowledgeflow.steps.TrainTestSplitMaker
- All Implemented Interfaces:
Serializable, BaseStepExtender, Step
@KFStep(name="TrainTestSplitMaker",
category="Evaluation",
toolTipText="A step that randomly splits incoming data into a training and test set",
iconPath="weka/gui/knowledgeflow/icons/TrainTestSplitMaker.gif")
public class TrainTestSplitMaker
extends BaseStep
A step that creates a random train/test split from an incoming data set.
- Author:
- Mark Hall (mhall{[at]}pentaho{[dot]}com)
Constructor Summary
- TrainTestSplitMaker()
Method Summary
Modifier and Type, Method - Description
- List<String> getIncomingConnectionTypes() - Get a list of incoming connection types that this step can accept.
- List<String> getOutgoingConnectionTypes() - Get a list of outgoing connection types that this step can produce.
- boolean getPreserveOrder() - Get whether to preserve the order of the instances or not
- String getSeed() - Get the random seed to use
- String getTrainPercent() - Get the training percentage
- Instances outputStructureForConnectionType(String connectionName) - If possible, get the output structure for the named connection type as a header-only set of instances.
- void processIncoming(Data data) - Process an incoming data payload (if the step accepts incoming connections)
- void setPreserveOrder(boolean preserve) - Set whether to preserve the order of the instances or not
- void setSeed(String seed) - Set the random seed to use
- void setTrainPercent(String percent) - Set the training percentage
- void stepInit() - Initialize the step
Methods inherited from class weka.knowledgeflow.steps.BaseStep
environmentSubstitute, getCustomEditorForStep, getDefaultSettings, getInteractiveViewers, getInteractiveViewersImpls, getName, getStepManager, globalInfo, isResourceIntensive, isStopRequested, outputStructureForConnectionType, setName, setStepIsResourceIntensive, setStepManager, setStepMustRunSingleThreaded, start, stepMustRunSingleThreaded, stop
-
Constructor Details
-
TrainTestSplitMaker
public TrainTestSplitMaker()
-
-
Method Details
-
setTrainPercent
@OptionMetadata(displayName="Training percentage", description="The percentage of data to go into the training set", displayOrder=1)
public void setTrainPercent(String percent)
Set the training percentage
- Parameters:
percent - the training percentage
-
getTrainPercent
public String getTrainPercent()
Get the training percentage
- Returns:
- the training percentage
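Note that setTrainPercent takes a String rather than a number. One plausible reason, and this is an assumption rather than something this page states, is that the option may hold a variable reference that is resolved (for example through the inherited environmentSubstitute method) before it is parsed. The sketch below illustrates that idea in plain Java without the Weka API. PercentOptionSketch, substitute, parsePercent, the ${NAME} syntax and the variable name TRAIN_PCT are all hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Weka: resolves ${NAME} references in an
// option string and then parses the result as a percentage.
class PercentOptionSketch {
  // Matches ${NAME}; this variable syntax is an assumption for illustration.
  private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

  // Replace every ${NAME} with its value from vars; unknown names are an error.
  static String substitute(String raw, Map<String, String> vars) {
    Matcher m = VAR.matcher(raw);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String value = vars.get(m.group(1));
      if (value == null) {
        throw new IllegalArgumentException("Unknown variable: " + m.group(1));
      }
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }

  // Resolve variables, parse as a double, and check the range 0..100.
  static double parsePercent(String raw, Map<String, String> vars) {
    double percent = Double.parseDouble(substitute(raw, vars).trim());
    if (percent < 0 || percent > 100) {
      throw new IllegalArgumentException("Percentage out of range: " + percent);
    }
    return percent;
  }

  public static void main(String[] args) {
    Map<String, String> vars = Collections.singletonMap("TRAIN_PCT", "66");
    System.out.println(parsePercent("70", vars));
    System.out.println(parsePercent("${TRAIN_PCT}", vars));
  }
}
```

Keeping the raw string in the option and parsing it late means a saved flow can be reused with different values without being edited.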
-
setSeed
@OptionMetadata(displayName="Random seed", description="The random seed to use when shuffling the data", displayOrder=2)
public void setSeed(String seed)
Set the random seed to use
- Parameters:
seed - the random seed to use
-
getSeed
public String getSeed()
Get the random seed to use
- Returns:
- the random seed to use
-
setPreserveOrder
@OptionMetadata(displayName="Preserve instance order", description="Preserve the order of the instances rather than randomly shuffling", displayOrder=3)
public void setPreserveOrder(boolean preserve)
Set whether to preserve the order of the instances or not
- Parameters:
preserve - true to preserve the order rather than randomly shuffling first
-
getPreserveOrder
public boolean getPreserveOrder()
Get whether to preserve the order of the instances or not
- Returns:
- true to preserve the order rather than randomly shuffling first
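How the training percentage, random seed and preserve-order options interact can be pictured with a small stand-alone sketch. It does not use the Weka API: TrainTestSplitSketch, Split and split are hypothetical names, a plain List stands in for the incoming data set, and the rounding rule for the training-set size is an assumption rather than the step's documented behaviour.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of a percentage-based train/test split.
class TrainTestSplitSketch {

  // Simple holder for the two partitions.
  static final class Split<T> {
    final List<T> train;
    final List<T> test;

    Split(List<T> train, List<T> test) {
      this.train = train;
      this.test = test;
    }
  }

  static <T> Split<T> split(List<T> data, double trainPercent, long seed,
      boolean preserveOrder) {
    if (trainPercent < 0 || trainPercent > 100) {
      throw new IllegalArgumentException("trainPercent must be in [0, 100]");
    }
    // Work on a copy so the caller's list is left untouched.
    List<T> copy = new ArrayList<>(data);
    if (!preserveOrder) {
      // Seeded shuffle: the same seed yields the same split.
      Collections.shuffle(copy, new Random(seed));
    }
    // Assumed rounding rule: nearest whole number of training instances.
    int trainSize = (int) Math.round(copy.size() * trainPercent / 100.0);
    List<T> train = new ArrayList<>(copy.subList(0, trainSize));
    List<T> test = new ArrayList<>(copy.subList(trainSize, copy.size()));
    return new Split<>(train, test);
  }

  public static void main(String[] args) {
    List<Integer> data = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      data.add(i);
    }
    // With preserveOrder=true the first 70% of rows form the training set.
    Split<Integer> ordered = split(data, 70, 1L, true);
    System.out.println(ordered.train + " / " + ordered.test);
    // With preserveOrder=false the rows are shuffled first, using the seed.
    Split<Integer> shuffled = split(data, 70, 1L, false);
    System.out.println(shuffled.train + " / " + shuffled.test);
  }
}
```

Preserving order is the natural choice when the row order carries meaning, such as time-ordered data, since shuffling would mix later rows into the training set.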
-
stepInit
public void stepInit() throws WekaException
Initialize the step
- Throws:
WekaException - if a problem occurs
-
processIncoming
public void processIncoming(Data data) throws WekaException
Process an incoming data payload (if the step accepts incoming connections)
- Specified by:
processIncoming in interface BaseStepExtender
- Specified by:
processIncoming in interface Step
- Overrides:
processIncoming in class BaseStep
- Parameters:
data - the data to process
- Throws:
WekaException - if a problem occurs
-
getIncomingConnectionTypes
public List<String> getIncomingConnectionTypes()
Get a list of incoming connection types that this step can accept. Ideally (and if appropriate), this should take into account the state of the step and any existing incoming connections. For example, a step might be able to accept one (and only one) incoming batch data connection.
- Returns:
- a list of incoming connections that this step can accept given its current state
-
getOutgoingConnectionTypes
public List<String> getOutgoingConnectionTypes()
Get a list of outgoing connection types that this step can produce. Ideally (and if appropriate), this should take into account the state of the step and the incoming connections. For example, depending on what incoming connection is present, a step might be able to produce a trainingSet output, a testSet output or neither, but not both.
- Returns:
- a list of outgoing connections that this step can produce
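The state-dependent behaviour described for the two connection-type methods can be sketched in plain Java. This is an illustration under stated assumptions, not the step's actual implementation: ConnectionTypesSketch, incomingTypes and outgoingTypes are hypothetical names, the connection names are placeholder strings, and the rules (accept a single incoming batch connection; offer training and test outputs only once an input is connected) are assumed for the sake of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of state-dependent connection lists.
class ConnectionTypesSketch {
  // Placeholder connection names, assumed for this sketch.
  static final String DATASET = "dataSet";
  static final String TRAINING_SET = "trainingSet";
  static final String TEST_SET = "testSet";

  // Accept one (and only one) incoming batch connection.
  static List<String> incomingTypes(int numIncomingConnections) {
    if (numIncomingConnections > 0) {
      return new ArrayList<>();
    }
    return Arrays.asList(DATASET, TRAINING_SET, TEST_SET);
  }

  // Outputs are only available once there is something to split.
  static List<String> outgoingTypes(int numIncomingConnections) {
    if (numIncomingConnections == 0) {
      return new ArrayList<>();
    }
    return Arrays.asList(TRAINING_SET, TEST_SET);
  }

  public static void main(String[] args) {
    System.out.println("unconnected: in=" + incomingTypes(0)
        + " out=" + outgoingTypes(0));
    System.out.println("connected:   in=" + incomingTypes(1)
        + " out=" + outgoingTypes(1));
  }
}
```

Returning an empty list, rather than null, lets a caller iterate over the result without a separate check.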
-
outputStructureForConnectionType
public Instances outputStructureForConnectionType(String connectionName) throws WekaException
If possible, get the output structure for the named connection type as a header-only set of instances. Can return null if the specified connection type is not representable as Instances or cannot be determined at present.
- Specified by:
outputStructureForConnectionType in interface Step
- Overrides:
outputStructureForConnectionType in class BaseStep
- Parameters:
connectionName - the name of the connection type to get the output structure for
- Returns:
- the output structure as a header-only Instances object
- Throws:
WekaException - if a problem occurs
-