Package weka.knowledgeflow.steps
Class SubstringLabeler
java.lang.Object
weka.knowledgeflow.steps.BaseStep
weka.knowledgeflow.steps.SubstringLabeler
- All Implemented Interfaces:
Serializable, BaseStepExtender, Step
@KFStep(name="SubstringLabeler",
category="Tools",
toolTipText="Label instances according to substring matches in String attributes The user can specify the attributes to match against and associated label to create by defining \'match\' rules. A new attribute is appended to the data to contain the label. Rules are applied in order when processing instances, and the label associated with the first matching rule is applied. Non-matching instances can either receive a missing value for the label attribute or be \'consumed\' (i.e. they are not output).",
iconPath="weka/gui/knowledgeflow/icons/DefaultFilter.gif")
public class SubstringLabeler
extends BaseStep
Step that appends a label to incoming instances according to substring
matches in String attributes. Multiple match "rules" can be
specified; they are applied in the order in which they are defined. Each rule
can be applied to one or more user-specified input String attributes.
Attributes can be specified using either a range list (e.g. 1,2-10,last) or
a comma-separated list of attribute names (where "/first" and "/last" are
special strings indicating the first and last attribute respectively).
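As an illustration of the range-list form mentioned above, the following standalone sketch expands a list such as 1,2-10,last into attribute indices. It is not Weka code: the class RangeListSketch and its methods are hypothetical, and it assumes that range lists are 1-based and that "first" and "last" are the only keywords. Name-based lists (with "/first" and "/last") are not handled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper, not part of Weka: expands a 1-based range list. */
final class RangeListSketch {

  /** Converts one token ("first", "last" or a 1-based number) to a 0-based index. */
  private static int toIndex(String token, int numAttributes) {
    String t = token.trim();
    if (t.equalsIgnoreCase("first")) {
      return 0;
    }
    if (t.equalsIgnoreCase("last")) {
      return numAttributes - 1;
    }
    int oneBased = Integer.parseInt(t);
    if (oneBased < 1 || oneBased > numAttributes) {
      throw new IllegalArgumentException("Index out of range: " + t);
    }
    return oneBased - 1;
  }

  /** Expands e.g. "1,2-10,last" into sorted, de-duplicated 0-based indices. */
  static List<Integer> parse(String rangeList, int numAttributes) {
    TreeSet<Integer> indices = new TreeSet<>();
    for (String part : rangeList.split(",")) {
      String[] bounds = part.split("-");
      if (bounds.length > 2) {
        throw new IllegalArgumentException("Malformed range: " + part);
      }
      // A single index is treated as a range whose ends coincide.
      int lo = toIndex(bounds[0], numAttributes);
      int hi = toIndex(bounds[bounds.length - 1], numAttributes);
      if (lo > hi) {
        throw new IllegalArgumentException("Descending range: " + part);
      }
      for (int i = lo; i <= hi; i++) {
        indices.add(i);
      }
    }
    return new ArrayList<>(indices);
  }
}
```

Calling RangeListSketch.parse("1,3-5,last", 8) selects the first attribute, the third to fifth attributes and the eighth attribute, returned as 0-based indices.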
- Version:
- $Revision: $
- Author:
- Mark Hall (mhall{[at]}pentaho{[dot]}com)
Constructor Summary
- SubstringLabeler()
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| boolean | getConsumeNonMatching() | Get whether instances that do not match any of the rules should be "consumed" rather than output with a missing value set for the new attribute. |
| String | getCustomEditorForStep() | Return the fully qualified name of a custom editor component (JComponent) to use for editing the properties of the step. |
| List<String> | getIncomingConnectionTypes() | Get a list of incoming connection types that this step can accept. |
| String | getMatchAttributeName() | Get the name of the new attribute that is created to indicate the match. |
| String | getMatchDetails() | Get the internally encoded list of match rules. |
| boolean | getNominalBinary() | Get whether the new attribute created should be a nominal binary attribute rather than a numeric binary attribute. |
| List<String> | getOutgoingConnectionTypes() | Get a list of outgoing connection types that this step can produce. |
| Instances | outputStructureForConnectionType(String connectionName) | If possible, get the output structure for the named connection type as a header-only set of instances. |
| void | processIncoming(Data data) | Process an incoming data payload (if the step accepts incoming connections). |
| void | setConsumeNonMatching(boolean consume) | Set whether instances that do not match any of the rules should be "consumed" rather than output with a missing value set for the new attribute. |
| void | setMatchAttributeName(String name) | Set the name of the new attribute that is created to indicate the match. |
| void | setMatchDetails(String details) | Set internally encoded list of match rules. |
| void | setNominalBinary(boolean nom) | Set whether the new attribute created should be a nominal binary attribute rather than a numeric binary attribute. |
| void | stepInit() | Initialize the step. |

Methods inherited from class weka.knowledgeflow.steps.BaseStep
environmentSubstitute, getDefaultSettings, getInteractiveViewers, getInteractiveViewersImpls, getName, getStepManager, globalInfo, isResourceIntensive, isStopRequested, outputStructureForConnectionType, setName, setStepIsResourceIntensive, setStepManager, setStepMustRunSingleThreaded, start, stepMustRunSingleThreaded, stop
-
Constructor Details
-
SubstringLabeler
public SubstringLabeler()
-
-
Method Details
-
setMatchDetails
Set internally encoded list of match rules.
- Parameters:
details - the list of match rules
-
getMatchDetails
Get the internally encoded list of match rules.
- Returns:
- the match rules
-
setNominalBinary
@OptionMetadata(displayName="Make a nominal binary attribute", description="Whether to encode the new attribute as nominal when it is binary (as opposed to numeric)", displayOrder=1)
public void setNominalBinary(boolean nom)
Set whether the new attribute created should be a nominal binary attribute rather than a numeric binary attribute.
- Parameters:
nom - true if the attribute should be a nominal binary one
-
getNominalBinary
public boolean getNominalBinary()
Get whether the new attribute created should be a nominal binary attribute rather than a numeric binary attribute.
- Returns:
- true if the attribute should be a nominal binary one
-
setConsumeNonMatching
@OptionMetadata(displayName="Consume non matching instances", description="Instances that do not match any rules will be consumed, rather than being output with a missing value for the new attribute", displayOrder=2)
public void setConsumeNonMatching(boolean consume)
Set whether instances that do not match any of the rules should be "consumed" rather than output with a missing value set for the new attribute.
- Parameters:
consume - true if non-matching instances should be consumed by the component
-
getConsumeNonMatching
public boolean getConsumeNonMatching()
Get whether instances that do not match any of the rules should be "consumed" rather than output with a missing value set for the new attribute.
- Returns:
- true if non-matching instances should be consumed by the component
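The rule ordering and the "consume" option can be pictured with a small standalone sketch. It does not use the Weka API: the class FirstMatchSketch is hypothetical, rules are modelled as an ordered map from substring to label, matching is assumed to be a plain case-sensitive substring test, and "?" stands in for a missing value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration, not Weka code: ordered substring rules, first match wins. */
final class FirstMatchSketch {

  /** Returns the label of the first rule whose substring occurs in value, or null if none match. */
  static String labelFor(String value, Map<String, String> orderedRules) {
    // Iteration order matters, so callers should pass an ordered map such as LinkedHashMap.
    for (Map.Entry<String, String> rule : orderedRules.entrySet()) {
      if (value.contains(rule.getKey())) {
        return rule.getValue();
      }
    }
    return null; // stands in for a missing label value
  }

  /** Labels each value; non-matching values are dropped when consumeNonMatching is true. */
  static List<String> process(List<String> values,
                              Map<String, String> orderedRules,
                              boolean consumeNonMatching) {
    List<String> out = new ArrayList<>();
    for (String value : values) {
      String label = labelFor(value, orderedRules);
      if (label == null && consumeNonMatching) {
        continue; // "consumed": the value is not output at all
      }
      out.add(value + " -> " + (label == null ? "?" : label));
    }
    return out;
  }
}
```

With rules "error" (label "bad") followed by "err" (label "suspect"), the value "disk error" receives "bad" because the earlier rule matches first, even though the later rule would also match. A value that matches neither rule is either emitted with "?" or dropped, depending on the flag.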
-
setMatchAttributeName
@OptionMetadata(displayName="Name of the new attribute", description="Name to give the new attribute", displayOrder=0)
public void setMatchAttributeName(String name)
Set the name of the new attribute that is created to indicate the match.
- Parameters:
name - the name of the new attribute
-
getMatchAttributeName
Get the name of the new attribute that is created to indicate the match.
- Returns:
- the name of the new attribute
-
stepInit
Initialize the step.
- Throws:
WekaException - if a problem occurs
-
getIncomingConnectionTypes
Get a list of incoming connection types that this step can accept. Ideally (and if appropriate), this should take into account the state of the step and any existing incoming connections. For example, a step might be able to accept one (and only one) incoming batch data connection.
- Returns:
- a list of incoming connections that this step can accept given its current state
-
getOutgoingConnectionTypes
Get a list of outgoing connection types that this step can produce. Ideally (and if appropriate), this should take into account the state of the step and the incoming connections. For example, depending on what incoming connection is present, a step might be able to produce a trainingSet output, a testSet output or neither, but not both.
- Returns:
- a list of outgoing connections that this step can produce
-
processIncoming
Process an incoming data payload (if the step accepts incoming connections).
- Specified by:
processIncoming in interface BaseStepExtender
- Specified by:
processIncoming in interface Step
- Overrides:
processIncoming in class BaseStep
- Parameters:
data - the data to process
- Throws:
WekaException - if a problem occurs
-
outputStructureForConnectionType
If possible, get the output structure for the named connection type as a header-only set of instances. Can return null if the specified connection type is not representable as Instances or cannot be determined at present.
- Specified by:
outputStructureForConnectionType in interface Step
- Overrides:
outputStructureForConnectionType in class BaseStep
- Parameters:
connectionName - the name of the connection type to get the output structure for
- Returns:
- the output structure as a header-only Instances object
- Throws:
WekaException - if a problem occurs
-
getCustomEditorForStep
Return the fully qualified name of a custom editor component (JComponent) to use for editing the properties of the step. This method can return null, in which case the system will dynamically generate an editor using the GenericObjectEditor.
- Specified by:
getCustomEditorForStep in interface Step
- Overrides:
getCustomEditorForStep in class BaseStep
- Returns:
- the fully qualified name of a step editor component
-