Package weka.knowledgeflow.steps
Class Sorter
java.lang.Object
weka.knowledgeflow.steps.BaseStep
weka.knowledgeflow.steps.Sorter
- All Implemented Interfaces:
Serializable, BaseStepExtender, Step
@KFStep(name="Sorter",
category="Tools",
toolTipText="Sort instances in ascending or descending order according to the values of user-specified attributes. Instances can be sorted according to multiple attributes (defined in order). Handles datasets larger than can be fit into main memory via instance connections and specifying the in-memory buffer size. Implements a merge-sort by writing the sorted in-memory buffer to a file when full and then interleaving instances from the disk-based file(s) when the incoming stream has finished.",
iconPath="weka/gui/knowledgeflow/icons/Sorter.gif")
public class Sorter
extends BaseStep
Step for sorting instances according to one or more attributes.
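The tool tip above describes a disk-backed merge sort: fill an in-memory buffer, write it out sorted when it is full, then interleave the files once the stream ends. The following is a minimal sketch of that general technique, not the Sorter implementation itself. It sorts plain text lines rather than Weka instances, and every name in it (ExternalSortSketch, RunCursor, spill, merge) is hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Illustrative sketch only: sorts text lines, not Weka instances.
// All names here are hypothetical and not part of the Weka API.
class ExternalSortSketch {

  // An open run file together with the line currently at its head.
  private static final class RunCursor {
    final BufferedReader reader;
    String head;

    RunCursor(File run) throws IOException {
      reader = Files.newBufferedReader(run.toPath(), StandardCharsets.UTF_8);
      head = reader.readLine();
    }
  }

  // Sorts the stream and sends the result to 'out'; returns the number of
  // run files written. Values must not contain line breaks, because each
  // run file stores one value per line.
  static int sort(Iterator<String> in, int bufferSize, File tempDir, Consumer<String> out) {
    if (bufferSize < 1) {
      throw new IllegalArgumentException("bufferSize must be at least 1");
    }
    List<File> runs = new ArrayList<>();
    List<String> buffer = new ArrayList<>();
    try {
      while (in.hasNext()) {
        buffer.add(in.next());
        if (buffer.size() == bufferSize) { // buffer full: spill a sorted run
          runs.add(spill(buffer, tempDir));
          buffer.clear();
        }
      }
      if (runs.isEmpty()) { // everything fitted in memory
        Collections.sort(buffer);
        buffer.forEach(out);
        return 0;
      }
      if (!buffer.isEmpty()) {
        runs.add(spill(buffer, tempDir));
      }
      merge(runs, out);
      return runs.size();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Sorts the buffer and writes it to a new temp file (a "run").
  // A null tempDir means the platform default temp directory.
  private static File spill(List<String> buffer, File tempDir) throws IOException {
    Collections.sort(buffer);
    File run = File.createTempFile("sorter-run-", ".tmp", tempDir);
    run.deleteOnExit();
    try (BufferedWriter w = Files.newBufferedWriter(run.toPath(), StandardCharsets.UTF_8)) {
      for (String s : buffer) {
        w.write(s);
        w.newLine();
      }
    }
    return run;
  }

  // K-way merge: repeatedly emit the smallest head among all runs.
  private static void merge(List<File> runs, Consumer<String> out) throws IOException {
    PriorityQueue<RunCursor> heap = new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
    List<RunCursor> opened = new ArrayList<>();
    try {
      for (File run : runs) {
        RunCursor c = new RunCursor(run);
        opened.add(c);
        if (c.head != null) heap.add(c);
      }
      while (!heap.isEmpty()) {
        RunCursor c = heap.poll();
        out.accept(c.head);
        c.head = c.reader.readLine();
        if (c.head != null) heap.add(c);
      }
    } finally {
      for (RunCursor c : opened) c.reader.close();
      for (File run : runs) run.delete();
    }
  }

  public static void main(String[] args) {
    List<String> input = Arrays.asList("pear", "apple", "fig", "kiwi", "date");
    List<String> sorted = new ArrayList<>();
    int runs = sort(input.iterator(), 2, null, sorted::add);
    System.out.println(runs + " runs: " + sorted);
  }
}
```

In this sketch the bufferSize and tempDir arguments play roughly the roles that the buffer size and temp directory options play for the step. The sketch only touches the disk when the input outgrows the buffer.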
- Version:
- $Revision: $
- Author:
- Mark Hall (mhall{[at]}pentaho{[dot]}com)
Nested Class Summary
Modifier and Type, Class, Description
static class
  Implements a sorting rule based on a single attribute
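Since each rule covers a single attribute and the step sorts by several attributes in the order they are defined, one way to picture the combination is as a chain of comparators. The sketch below is an assumption about the general idea, not the nested class's actual code. Rows are modelled as double[] arrays, and the Rule class and chain method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; Rule and chain are hypothetical, not Weka API.
class MultiAttributeSortSketch {

  // Stand-in for a single-attribute sort rule: which column, which direction.
  static final class Rule {
    final int attributeIndex;
    final boolean descending;

    Rule(int attributeIndex, boolean descending) {
      this.attributeIndex = attributeIndex;
      this.descending = descending;
    }

    Comparator<double[]> comparator() {
      Comparator<double[]> c = Comparator.comparingDouble(row -> row[attributeIndex]);
      return descending ? c.reversed() : c;
    }
  }

  // Combines the rules in definition order: later rules only break ties
  // left by earlier ones.
  static Comparator<double[]> chain(List<Rule> rules) {
    Comparator<double[]> result = (a, b) -> 0;
    for (Rule rule : rules) {
      result = result.thenComparing(rule.comparator());
    }
    return result;
  }

  public static void main(String[] args) {
    List<double[]> rows = new ArrayList<>(Arrays.asList(
        new double[] {1, 2}, new double[] {0, 5},
        new double[] {1, 9}, new double[] {0, 3}));
    // Column 0 ascending, then column 1 descending.
    rows.sort(chain(Arrays.asList(new Rule(0, false), new Rule(1, true))));
    for (double[] row : rows) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```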
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
getBufferSize()
  Get the size of the in-memory buffer
getCustomEditorForStep()
  Return the fully qualified name of a custom editor component (JComponent) to use for editing the properties of the step.
getIncomingConnectionTypes()
  Get a list of incoming connection types that this step can accept.
getOutgoingConnectionTypes()
  Get a list of outgoing connection types that this step can produce.
getSortDetails()
  Get the sort rules to use
getTempDirectory()
  Get the directory to use for temporary files during incremental operation
void processIncoming(Data data)
  Process an incoming data payload (if the step accepts incoming connections)
void setBufferSize(String buffSize)
  Set the size of the in-memory buffer
void setSortDetails(String sortDetails)
  Set the sort rules to use
void setTempDirectory(File tempDir)
  Set the directory to use for temporary files during incremental operation
void stepInit()
  Initialize the step.
Methods inherited from class weka.knowledgeflow.steps.BaseStep
environmentSubstitute, getDefaultSettings, getInteractiveViewers, getInteractiveViewersImpls, getName, getStepManager, globalInfo, isResourceIntensive, isStopRequested, outputStructureForConnectionType, outputStructureForConnectionType, setName, setStepIsResourceIntensive, setStepManager, setStepMustRunSingleThreaded, start, stepMustRunSingleThreaded, stop
-
Constructor Details
-
Sorter
public Sorter()
-
-
Method Details
-
getBufferSize
Get the size of the in-memory buffer
- Returns:
- the size of the in-memory buffer
-
setBufferSize
@OptionMetadata(displayName="Size of in-mem streaming buffer", description="Number of instances to sort in memory before writing to a temp file (instance connections only)", displayOrder=1)
public void setBufferSize(String buffSize)
Set the size of the in-memory buffer
- Parameters:
buffSize
- the size of the in-memory buffer
-
setTempDirectory
@FilePropertyMetadata(fileChooserDialogType=0, directoriesOnly=true)
@OptionMetadata(displayName="Directory for temp files", description="Where to store temporary files when spilling to disk", displayOrder=2)
public void setTempDirectory(File tempDir)
Set the directory to use for temporary files during incremental operation
- Parameters:
tempDir
- the temp dir to use
-
getTempDirectory
Get the directory to use for temporary files during incremental operation
- Returns:
- the temp dir to use
-
setSortDetails
Set the sort rules to use
- Parameters:
sortDetails
- the sort rules in internal string representation
-
getSortDetails
Get the sort rules to use
- Returns:
- the sort rules in internal string representation
-
stepInit
Initialize the step.
- Throws:
WekaException
- if a problem occurs during initialization
-
getIncomingConnectionTypes
Get a list of incoming connection types that this step can accept. Ideally (and if appropriate), this should take into account the state of the step and any existing incoming connections. E.g. a step might be able to accept one (and only one) incoming batch data connection.
- Returns:
- a list of incoming connections that this step can accept given its current state
-
getOutgoingConnectionTypes
Get a list of outgoing connection types that this step can produce. Ideally (and if appropriate), this should take into account the state of the step and the incoming connections. E.g. depending on what incoming connection is present, a step might be able to produce a trainingSet output, a testSet output or neither, but not both.
- Returns:
- a list of outgoing connections that this step can produce
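The state-dependent behaviour described for getIncomingConnectionTypes and getOutgoingConnectionTypes can be sketched as two small functions. This is a hedged illustration rather than the Sorter source. It assumes a step that accepts at most one incoming connection and emits the same kind of payload it receives. The class name, method names and connection-type strings are hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only; names and connection-type strings are assumptions.
class ConnectionTypeSketch {

  static final String INSTANCE = "instance";
  static final String DATASET = "dataSet";
  static final String TRAINING_SET = "trainingSet";
  static final String TEST_SET = "testSet";

  // With no incoming connection yet, any supported type is acceptable;
  // once one exists, nothing further is accepted.
  static List<String> acceptableIncoming(int existingIncomingConnections) {
    if (existingIncomingConnections > 0) {
      return Collections.emptyList();
    }
    return Arrays.asList(INSTANCE, DATASET, TRAINING_SET, TEST_SET);
  }

  // The outgoing type mirrors the incoming one; with no input (null)
  // there is nothing to produce.
  static List<String> producibleOutgoing(String incomingType) {
    if (incomingType == null) {
      return Collections.emptyList();
    }
    return Collections.singletonList(incomingType);
  }

  public static void main(String[] args) {
    System.out.println(acceptableIncoming(0));
    System.out.println(acceptableIncoming(1));
    System.out.println(producibleOutgoing(TRAINING_SET));
  }
}
```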
-
processIncoming
Process an incoming data payload (if the step accepts incoming connections)
- Specified by:
processIncoming in interface BaseStepExtender
- Specified by:
processIncoming in interface Step
- Overrides:
processIncoming in class BaseStep
- Parameters:
data
- the data to process
- Throws:
WekaException
- if a problem occurs
-
getCustomEditorForStep
Return the fully qualified name of a custom editor component (JComponent) to use for editing the properties of the step. This method can return null, in which case the system will dynamically generate an editor using the GenericObjectEditor.
- Specified by:
getCustomEditorForStep in interface Step
- Overrides:
getCustomEditorForStep in class BaseStep
- Returns:
- the fully qualified name of a step editor component
-