Package weka.knowledgeflow.steps
Class Join
java.lang.Object
weka.knowledgeflow.steps.BaseStep
weka.knowledgeflow.steps.Join
- All Implemented Interfaces:
Serializable, BaseStepExtender, Step
@KFStep(name="Join",
category="Flow",
toolTipText="Performs an inner join on two incoming datasets/instance streams (IMPORTANT: assumes that both datasets are sorted in ascending order of the key fields). If data is not sorted then use a Sorter step to sort both into ascending order of the key fields. Does not handle the case where keys are not unique in one or both inputs.",
iconPath="weka/gui/knowledgeflow/icons/Join.gif")
public class Join
extends BaseStep
Step that performs an inner join on one or more key fields from two incoming
batch or streaming datasets.
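The sorted-input requirement is easiest to see with a small merge-style join. The following is a minimal sketch in plain Java, not the Weka implementation: the class `SortedInnerJoinSketch`, its `Row` type and the single integer key are all hypothetical, chosen only to illustrate why both inputs must be sorted in ascending key order and why duplicate keys are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative merge-style inner join; not taken from the Weka source. */
final class SortedInnerJoinSketch {

  /** Hypothetical row type: one integer key plus a payload value. */
  static final class Row {
    final int key;
    final String value;

    Row(int key, String value) {
      this.key = key;
      this.value = value;
    }
  }

  /**
   * Joins two lists that are each sorted ascending by key and contain no
   * duplicate keys. Emits "key:leftValue,rightValue" for every shared key.
   */
  static List<String> join(List<Row> left, List<Row> right) {
    List<String> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < left.size() && j < right.size()) {
      Row l = left.get(i);
      Row r = right.get(j);
      if (l.key < r.key) {
        i++; // left key cannot match anything later on the right side
      } else if (l.key > r.key) {
        j++; // right key cannot match anything later on the left side
      } else {
        out.add(l.key + ":" + l.value + "," + r.value);
        i++; // keys are assumed unique, so both cursors advance
        j++;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Row> left = Arrays.asList(new Row(1, "a"), new Row(2, "b"), new Row(4, "d"));
    List<Row> right = Arrays.asList(new Row(2, "x"), new Row(3, "y"), new Row(4, "z"));
    System.out.println(join(left, right)); // prints [2:b,x, 4:d,z]
  }
}
```

Because each cursor only moves forward, an unsorted input would cause matches to be skipped silently, which is why the tool tip recommends a Sorter step upstream of each input.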
- Version:
- $Revision: $
- Author:
- Mark Hall (mhall{[at]}pentaho{[dot]}com)
-
Field Summary
Modifier and Type | Field | Description
static final String | KEY_SPEC_SEPARATOR | Separator used to separate first and second input key specifications
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
List<String> | getConnectedInputNames() | Get the names of the connected steps as a list
String | getCustomEditorForStep() | Return the fully qualified name of a custom editor component (JComponent) to use for editing the properties of the step.
Instances | getFirstInputStructure() | Get the Instances structure being produced by the first input
List<String> | getIncomingConnectionTypes() | Get a list of incoming connection types that this step can accept.
String | getKeySpec() | Get the key specification (in internal format - k11,k12,...,k1nKEY_SPEC_SEPARATORk21,k22,...,k2n)
List<String> | getOutgoingConnectionTypes() | Get a list of outgoing connection types that this step can produce.
Instances | getSecondInputStructure() | Get the Instances structure being produced by the second input
void | processIncoming(Data data) | Process some incoming data
void | setKeySpec(String ks) | Set the key specification (in internal format - k11,k12,...,k1nKEY_SPEC_SEPARATORk21,k22,...,k2n)
void | stepInit() | Initialize the step
Methods inherited from class weka.knowledgeflow.steps.BaseStep
environmentSubstitute, getDefaultSettings, getInteractiveViewers, getInteractiveViewersImpls, getName, getStepManager, globalInfo, isResourceIntensive, isStopRequested, outputStructureForConnectionType, outputStructureForConnectionType, setName, setStepIsResourceIntensive, setStepManager, setStepMustRunSingleThreaded, start, stepMustRunSingleThreaded, stop
-
Field Details
-
KEY_SPEC_SEPARATOR
Separator used to separate first and second input key specifications
-
-
Constructor Details
-
Join
public Join()
-
-
Method Details
-
setKeySpec
Set the key specification (in internal format - k11,k12,...,k1nKEY_SPEC_SEPARATORk21,k22,...,k2n)
- Parameters:
ks - the keys specification
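As an illustration of the internal format, a key specification can be assembled from two equal-length lists of key field names. This is a hedged sketch: the helper class `KeySpecBuilderSketch` is hypothetical, and the separator value `"<SEP>"` is a placeholder, because the actual value of `KEY_SPEC_SEPARATOR` is not shown on this page. Real code would reference `Join.KEY_SPEC_SEPARATOR` instead.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds a key spec in the documented internal format. */
final class KeySpecBuilderSketch {

  // Placeholder only; real code would use Join.KEY_SPEC_SEPARATOR.
  static final String KEY_SPEC_SEPARATOR = "<SEP>";

  /** Builds k11,...,k1n + separator + k21,...,k2n from two lists of n keys each. */
  static String buildKeySpec(List<String> firstKeys, List<String> secondKeys) {
    if (firstKeys.size() != secondKeys.size()) {
      throw new IllegalArgumentException("Both inputs need the same number of key fields");
    }
    return String.join(",", firstKeys) + KEY_SPEC_SEPARATOR + String.join(",", secondKeys);
  }

  public static void main(String[] args) {
    String spec = buildKeySpec(
        Arrays.asList("id", "date"),
        Arrays.asList("cust_id", "order_date"));
    System.out.println(spec); // prints id,date<SEP>cust_id,order_date
  }
}
```

With Weka on the classpath, the resulting string would presumably be passed to `setKeySpec(String ks)` on a `Join` instance. The field names used here are invented for the example.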
-
getKeySpec
Get the key specification (in internal format - k11,k12,...,k1nKEY_SPEC_SEPARATORk21,k22,...,k2n)
- Returns:
- the keys specification
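Going the other way, the internal format can be split back into the two per-input key lists. The sketch below is an assumption-laden illustration: `KeySpecParserSketch` is a hypothetical class, and `"<SEP>"` again stands in for the real `Join.KEY_SPEC_SEPARATOR` value, which this page does not show.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical parser for the documented internal key-spec format. */
final class KeySpecParserSketch {

  // Placeholder only; real code would use Join.KEY_SPEC_SEPARATOR.
  static final String KEY_SPEC_SEPARATOR = "<SEP>";

  /** Returns two lists: the key fields for the first input, then for the second. */
  static List<List<String>> parse(String keySpec) {
    // Quote the separator so it is matched literally rather than as a regex.
    String[] halves = keySpec.split(Pattern.quote(KEY_SPEC_SEPARATOR), -1);
    if (halves.length != 2) {
      throw new IllegalArgumentException("Expected exactly one separator in: " + keySpec);
    }
    List<String> first = Arrays.asList(halves[0].split(","));
    List<String> second = Arrays.asList(halves[1].split(","));
    return Arrays.asList(first, second);
  }

  public static void main(String[] args) {
    List<List<String>> keys = parse("id,date<SEP>cust_id,order_date");
    System.out.println(keys.get(0)); // prints [id, date]
    System.out.println(keys.get(1)); // prints [cust_id, order_date]
  }
}
```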
-
getConnectedInputNames
Get the names of the connected steps as a list
- Returns:
- the names of the connected steps as a list
-
getFirstInputStructure
Get the Instances structure being produced by the first input
- Returns:
- the Instances structure from the first input
- Throws:
WekaException - if a problem occurs
-
getSecondInputStructure
Get the Instances structure being produced by the second input
- Returns:
- the Instances structure from the second input
- Throws:
WekaException - if a problem occurs
-
stepInit
Initialize the step
- Throws:
WekaException - if a problem occurs
-
processIncoming
Process some incoming data
- Specified by:
processIncoming in interface BaseStepExtender
- Specified by:
processIncoming in interface Step
- Overrides:
processIncoming in class BaseStep
- Parameters:
data - the data to process
- Throws:
WekaException - if a problem occurs
-
getIncomingConnectionTypes
Get a list of incoming connection types that this step can accept. Ideally (and if appropriate), this should take into account the state of the step and any existing incoming connections. E.g. a step might be able to accept one (and only one) incoming batch data connection.
- Returns:
- a list of incoming connections that this step can accept given its current state
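The state-dependent behaviour described above can be sketched without any Weka classes. The example below is a generic illustration, not the logic used by Join: the class name, the `maxIncoming` parameter and the connection-type strings are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Generic illustration of a state-dependent incoming-connection list. */
final class IncomingConnectionTypesSketch {

  // Hypothetical connection-type names used only for this example.
  static final String CON_DATASET = "dataSet";
  static final String CON_INSTANCE = "instance";

  /** Offers connection types only while the step still has a free input slot. */
  static List<String> acceptableTypes(int existingIncoming, int maxIncoming) {
    if (existingIncoming >= maxIncoming) {
      return Collections.emptyList(); // no further connections can be accepted
    }
    List<String> types = new ArrayList<>();
    types.add(CON_DATASET);
    types.add(CON_INSTANCE);
    return types;
  }

  public static void main(String[] args) {
    System.out.println(acceptableTypes(0, 1)); // prints [dataSet, instance]
    System.out.println(acceptableTypes(1, 1)); // prints []
  }
}
```

A two-input step such as a join would plausibly use a limit of two rather than one, but the actual rules applied by Join are not stated on this page.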
-
getOutgoingConnectionTypes
Get a list of outgoing connection types that this step can produce. Ideally (and if appropriate), this should take into account the state of the step and the incoming connections. E.g. depending on what incoming connection is present, a step might be able to produce a trainingSet output, a testSet output or neither, but not both.
- Returns:
- a list of outgoing connections that this step can produce
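The trainingSet/testSet example in the description can be written out as a small stand-alone sketch. It is illustrative only and does not reflect what Join itself returns: the class name and the connection-type strings are assumptions.

```java
import java.util.Collections;
import java.util.List;

/** Generic illustration of outgoing types that depend on the incoming connection. */
final class OutgoingConnectionTypesSketch {

  // Hypothetical connection-type names used only for this example.
  static final String CON_TRAININGSET = "trainingSet";
  static final String CON_TESTSET = "testSet";

  /** Mirrors the incoming type: trainingSet in, trainingSet out; testSet in, testSet out. */
  static List<String> producibleTypes(String incomingType) {
    if (CON_TRAININGSET.equals(incomingType)) {
      return Collections.singletonList(CON_TRAININGSET);
    }
    if (CON_TESTSET.equals(incomingType)) {
      return Collections.singletonList(CON_TESTSET);
    }
    return Collections.emptyList(); // no suitable incoming connection, so no output
  }

  public static void main(String[] args) {
    System.out.println(producibleTypes("trainingSet")); // prints [trainingSet]
    System.out.println(producibleTypes(null));          // prints []
  }
}
```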
-
getCustomEditorForStep
Return the fully qualified name of a custom editor component (JComponent) to use for editing the properties of the step. This method can return null, in which case the system will dynamically generate an editor using the GenericObjectEditor.
- Specified by:
getCustomEditorForStep in interface Step
- Overrides:
getCustomEditorForStep in class BaseStep
- Returns:
- the fully qualified name of a step editor component
-