Package weka.knowledgeflow
Interface StepManager
- All Known Implementing Classes:
StepManagerImpl
public interface StepManager
Client public interface for the StepManager. Step implementations should only
use this interface
- Author:
- Mark Hall (mhall{[at]}pentaho{[dot]}com)
-
Field Summary
Modifier and TypeFieldDescriptionstatic final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
static final String
-
Method Summary
Modifier and TypeMethodDescriptionenvironmentSubstitute
(String source) Substitute all known environment variables in the given stringfindStepInFlow
(String stepNameToFind) Finds a named step in the current flow.void
finished()
Step implementations processing batch data should call this to indicate that they have finished all processing.Get the executing environment.getIncomingConnectedStepsOfConnectionType
(String connectionName) Get a list of steps that are the source of incoming connections of the given typegetIncomingConnectedStepWithName
(String stepName) Get the named step that is connected with an incoming connection.Get a Map of all incoming connections.getIncomingStructureForConnectionType
(String connectionName) Attempt to retrieve the structure (as a header-only set of instances) for the named incoming connection type.getIncomingStructureForConnectionType
(String connectionName, Environment env) Attempt to retrieve the structure (as a header-only set of instances) for the named incoming connection type.getIncomingStructureFromStep
(StepManager sourceStep, String connectionName) Attempt to get the incoming structure (as a header-only set of instances) from the given managed step for the given connection type.Returns a reference to the step being managed if it has one or more outgoing CON_INFO connections.getInfoStep
(Class stepClass) Returns a reference to the step being managed if it has one or more outgoing CON_INFO connections and the managed step is of the supplied classgetLog()
Get the logGet the currently set logging levelGet the actual step managed by this step managergetName()
Get the name of the step managed by this StepManagergetOutgoingConnectedStepsOfConnectionType
(String connectionName) Get a list of downstream steps connected to this step with the given connection type.getOutgoingConnectedStepWithName
(String stepName) Get a named step connected to this step with an outgoing connectionGet a Map of all outgoing connections.Get the knowledge flow settingsboolean
Returns true if the step managed by this step manager has been marked as one that must run single-threaded.void
Step implementations processing batch data should call this as soon as they have finished processing after a stop has been requested.boolean
Returns true if, at this time, the step managed by this step manager is currently busy with processingboolean
Return true if the current step is finished.boolean
Return true if a stop has been requested by the runtime environmentboolean
isStreamFinished
(Data data) Returns true if this data object marks the end of an incremental stream.void
log
(String message, LoggingLevel level) Write a message to the log at the given logging levelvoid
Log a message at the "basic" levelvoid
Log a message at the "debug" levelvoid
logDetailed
(String message) Log a message at the "detailed" levelvoid
Log an error message.void
Log a message at the "low" levelvoid
logWarning
(String message) Log a warning message.int
Get the number of steps that are connected with incoming connectionsint
numIncomingConnectionsOfType
(String connectionName) Get the number of steps that are connected with the given incoming connection typeint
Get the number of steps that are connected with outgoing connectionsint
numOutgoingConnectionsOfType
(String connectionName) Get the number of steps that are connected with the given outgoing connection typevoid
outputData
(String outgoingConnectionName, String stepName, Data data) Output a single Data object to the named step with the supplied outgoing connection typevoid
outputData
(String outgoingConnectionName, Data data) Output data to all steps connected with the supplied outgoing connection type.void
outputData
(Data... data) Output one or more Data objects to all relevant steps.void
Step implementations processing batch data should call this to indicate that they have started some processing.void
setStepIsResourceIntensive
(boolean isResourceIntensive) Mark the step managed by this step manager as resource intensivevoid
setStepMustRunSingleThreaded
(boolean mustRunSingleThreaded) Marked the step managed by this step manager as one that must run single-threaded.void
statusMessage
(String message) Write a status messageboolean
Returns true if the step managed by this step manager has been marked as being resource (cpu/memory) intensive.void
throughputFinished
(Data... data) Signal that throughput measurement has finished.void
End a throughput measurement.void
Start a throughput measurement.
-
Field Details
-
CON_DATASET
- See Also:
-
CON_INSTANCE
- See Also:
-
CON_TRAININGSET
- See Also:
-
CON_TESTSET
- See Also:
-
CON_BATCH_CLASSIFIER
- See Also:
-
CON_INCREMENTAL_CLASSIFIER
- See Also:
-
CON_INCREMENTAL_CLUSTERER
- See Also:
-
CON_BATCH_CLUSTERER
- See Also:
-
CON_BATCH_ASSOCIATOR
- See Also:
-
CON_VISUALIZABLE_ERROR
- See Also:
-
CON_THRESHOLD_DATA
- See Also:
-
CON_TEXT
- See Also:
-
CON_IMAGE
- See Also:
-
CON_GRAPH
- See Also:
-
CON_CHART
- See Also:
-
CON_INFO
- See Also:
-
CON_ENVIRONMENT
- See Also:
-
CON_JOB_SUCCESS
- See Also:
-
CON_JOB_FAILURE
- See Also:
-
CON_AUX_DATA_SET_NUM
- See Also:
-
CON_AUX_DATA_MAX_SET_NUM
- See Also:
-
CON_AUX_DATA_TEST_INSTANCE
- See Also:
-
CON_AUX_DATA_TESTSET
- See Also:
-
CON_AUX_DATA_TRAININGSET
- See Also:
-
CON_AUX_DATA_INSTANCE
- See Also:
-
CON_AUX_DATA_TEXT_TITLE
- See Also:
-
CON_AUX_DATA_LABEL
- See Also:
-
CON_AUX_DATA_CLASS_ATTRIBUTE
- See Also:
-
CON_AUX_DATA_GRAPH_TITLE
- See Also:
-
CON_AUX_DATA_GRAPH_TYPE
- See Also:
-
CON_AUX_DATA_CHART_MAX
- See Also:
-
CON_AUX_DATA_CHART_MIN
- See Also:
-
CON_AUX_DATA_CHART_DATA_POINT
- See Also:
-
CON_AUX_DATA_CHART_LEGEND
- See Also:
-
CON_AUX_DATA_ENVIRONMENT_VARIABLES
- See Also:
-
CON_AUX_DATA_ENVIRONMENT_PROPERTIES
- See Also:
-
CON_AUX_DATA_ENVIRONMENT_RESULTS
- See Also:
-
CON_AUX_DATA_BATCH_ASSOCIATION_RULES
- See Also:
-
CON_AUX_DATA_INCREMENTAL_STREAM_END
- See Also:
-
CON_AUX_DATA_IS_INCREMENTAL
- See Also:
-
CON_AUX_DATA_PRIMARY_PAYLOAD_NOT_THREAD_SAFE
- See Also:
-
-
Method Details
-
getName
String getName()Get the name of the step managed by this StepManager- Returns:
- the name of the managed step
-
getManagedStep
Step getManagedStep()Get the actual step managed by this step manager- Returns:
- the Step managed by this step manager
-
getExecutionEnvironment
ExecutionEnvironment getExecutionEnvironment()Get the executing environment. This contains information such as whether the flow is running in headless environment, what environment variables are available and methods to execute units of work in parallel.- Returns:
- the execution environment
-
getSettings
Settings getSettings()Get the knowledge flow settings- Returns:
- the knowledge flow settings
-
numIncomingConnections
int numIncomingConnections()Get the number of steps that are connected with incoming connections- Returns:
- the number of incoming connections
-
numOutgoingConnections
int numOutgoingConnections()Get the number of steps that are connected with outgoing connections- Returns:
- the number of outgoing connections
-
numIncomingConnectionsOfType
Get the number of steps that are connected with the given incoming connection type- Parameters:
connectionName
- the type of the incoming connection- Returns:
- the number of steps connected with the specified incoming connection type
-
numOutgoingConnectionsOfType
Get the number of steps that are connected with the given outgoing connection type- Parameters:
connectionName
- the type of the outgoing connection- Returns:
- the number of steps connected with the specified outgoing connection type
-
getIncomingConnectedStepsOfConnectionType
Get a list of steps that are the source of incoming connections of the given type- Parameters:
connectionName
- the name of the incoming connection to get a list of steps for- Returns:
- a list of steps that are the source of incoming connections of the given type
-
getIncomingConnectedStepWithName
Get the named step that is connected with an incoming connection.- Parameters:
stepName
- the name of the step to get- Returns:
- the step connected with an incoming connection or null if the named step is not connected
-
getOutgoingConnectedStepWithName
Get a named step connected to this step with an outgoing connection- Parameters:
stepName
- the name of the step to look for- Returns:
- the connected step
-
getOutgoingConnectedStepsOfConnectionType
Get a list of downstream steps connected to this step with the given connection type.- Parameters:
connectionName
- the name of the outgoing connection- Returns:
- a list of downstream steps connected to this one with the named connection type
-
getIncomingConnections
Map<String,List<StepManager>> getIncomingConnections()Get a Map of all incoming connections. Map is keyed by connection type; values are lists of steps- Returns:
- a Map of incoming connections
-
getOutgoingConnections
Map<String,List<StepManager>> getOutgoingConnections()Get a Map of all outgoing connections. Map is keyed by connection type; values are lists of steps- Returns:
- a Map of outgoing connections
-
outputData
Output data to all steps connected with the supplied outgoing connection type. Populates the source and connection name in the supplied Data object for the client- Parameters:
outgoingConnectionName
- the type of the outgoing connection to send data todata
- a single Data object to send- Throws:
WekaException
- if a problem occurs
-
outputData
Output one or more Data objects to all relevant steps. Populates the source in each Data object for the client, HOWEVER, the client must have populated the connection type in each Data object to be output so that the StepManager knows which connected steps to send the data to. Also notifies any registeredStepOutputListeners
. Note that the downstream step(s)' processIncoming() method is called in a separate thread for batch connections. Furthermore, if multiple Data objects are supplied via the varargs argument, and a target step will receive more than one of the Data objects, then they will be passed on to the step in question sequentially within the same thread of execution.- Parameters:
data
- one or more Data objects to be sent- Throws:
WekaException
- if a problem occurs
-
outputData
Output a single Data object to the named step with the supplied outgoing connection type- Parameters:
outgoingConnectionName
- the name of the outgoing connectionstepName
- the name of the step to send the data todata
- the data to send- Throws:
WekaException
- if a problem occurs
-
getIncomingStructureForConnectionType
Attempt to retrieve the structure (as a header-only set of instances) for the named incoming connection type. Assumes that there is only one step connected with the supplied incoming connection type.- Parameters:
connectionName
- the type of the incoming connection to get the structure for- Returns:
- the structure of the data for the specified incoming connection, or null if the structure can't be determined (or represented as an Instances object)
- Throws:
WekaException
- if a problem occurs
-
getIncomingStructureForConnectionType
Instances getIncomingStructureForConnectionType(String connectionName, Environment env) throws WekaException Attempt to retrieve the structure (as a header-only set of instances) for the named incoming connection type. Assumes that there is only one step connected with the supplied incoming connection type.- Parameters:
connectionName
- the type of the incoming connection to get the structure forenv
- the Environment to use- Returns:
- the structure of the data for the specified incoming connection, or null if the structure can't be determined (or represented as an Instances object)
- Throws:
WekaException
- if a problem occurs
-
getIncomingStructureFromStep
Instances getIncomingStructureFromStep(StepManager sourceStep, String connectionName) throws WekaException Attempt to get the incoming structure (as a header-only set of instances) from the given managed step for the given connection type.- Parameters:
sourceStep
- the step manager managing the source stepconnectionName
- the name of the connection to attempt to get the structure for- Returns:
- the structure as a header-only set of instances, or null if the source step can't determine this at present or if it can't be represented as a set of instances.
- Throws:
WekaException
- if a problem occurs
-
isStepBusy
boolean isStepBusy()Returns true if, at this time, the step managed by this step manager is currently busy with processing- Returns:
- true if the step managed by this step manager is busy
-
isStopRequested
boolean isStopRequested()Return true if a stop has been requested by the runtime environment- Returns:
- true if a stop has been requested
-
isStepFinished
boolean isStepFinished()Return true if the current step is finished.- Returns:
- true if the current step is finished
-
processing
void processing()Step implementations processing batch data should call this to indicate that they have started some processing. Calling this should set the busy flag to true. -
finished
void finished()Step implementations processing batch data should call this to indicate that they have finished all processing. Calling this should set the busy flag to false. -
interrupted
void interrupted()Step implementations processing batch data should call this as soon as they have finished processing after a stop has been requested. Calling this should set the busy flag to false. -
isStreamFinished
Returns true if this data object marks the end of an incremental stream. Note - does not check that the data object is actually an incremental one of some sort! Just checks to see if the CON_AUX_DATA_INCREMENTAL_STREAM_END flag is set to true or not;- Parameters:
data
- the data element to check- Returns:
- true if the data element is flagged as end of stream
-
throughputUpdateStart
void throughputUpdateStart()Start a throughput measurement. Should only be used by steps that are processing instance streams. Call just before performing a unit of work for an incoming instance. -
throughputUpdateEnd
void throughputUpdateEnd()End a throughput measurement. Should only be used by steps that are processing instance streams. Call just after finishing a unit of work for an incoming instance -
throughputFinished
Signal that throughput measurement has finished. Should only be used by steps that are emitting incremental data. Call as the completion of an data stream.- Parameters:
data
- one or more Data events (with appropriate connection type set) to pass on to downstream connected steps. These are used to carry any final data and to inform the downstream step(s) that the stream has ended- Throws:
WekaException
- if a problem occurs
-
logLow
Log a message at the "low" level- Parameters:
message
- the message to log
-
logBasic
Log a message at the "basic" level- Parameters:
message
- the message to log
-
logDetailed
Log a message at the "detailed" level- Parameters:
message
- the message to log
-
logDebug
Log a message at the "debug" level- Parameters:
message
- the message to log
-
logWarning
Log a warning message. Always makes it into the log regardless of what logging level the user has specified.- Parameters:
message
- the message to log
-
logError
Log an error message. Always makes it into the log regardless of what logging level the user has specified. Causes all flow execution to halt. Prints an exception to the log if supplied.- Parameters:
message
- the message to logcause
- the optional Throwable to log
-
log
Write a message to the log at the given logging level- Parameters:
message
- the message to writelevel
- the level for the message
-
statusMessage
Write a status message- Parameters:
message
- the message
-
getLog
Logger getLog()Get the log- Returns:
- the log object
-
getLoggingLevel
LoggingLevel getLoggingLevel()Get the currently set logging level- Returns:
- the currently set logging level
-
environmentSubstitute
Substitute all known environment variables in the given string- Parameters:
source
- the source string- Returns:
- the source string with all known variables resolved
-
getInfoStep
Returns a reference to the step being managed if it has one or more outgoing CON_INFO connections and the managed step is of the supplied class- Parameters:
stepClass
- the expected class of the step- Returns:
- the step being managed if outgoing CON_INFO connections are present and the step is of the supplied class
- Throws:
WekaException
- if there are no outgoing CON_INFO connections or the managed step is the wrong type
-
getInfoStep
Returns a reference to the step being managed if it has one or more outgoing CON_INFO connections.- Returns:
- the step being managed if outgoing CON_INFO connections are present
- Throws:
WekaException
- if there are no outgoing CON_INFO connections
-
findStepInFlow
Finds a named step in the current flow. Returns null if the named step is not present in the flow- Parameters:
stepNameToFind
- the name of the step to find- Returns:
- the StepManager of the named step, or null if the step does not exist in the current flow.
-
stepIsResourceIntensive
boolean stepIsResourceIntensive()Returns true if the step managed by this step manager has been marked as being resource (cpu/memory) intensive.- Returns:
- true if the managed step is resource intensive
-
setStepIsResourceIntensive
void setStepIsResourceIntensive(boolean isResourceIntensive) Mark the step managed by this step manager as resource intensive- Parameters:
isResourceIntensive
- true if the step managed by this step manager is resource intensive
-
setStepMustRunSingleThreaded
void setStepMustRunSingleThreaded(boolean mustRunSingleThreaded) Marked the step managed by this step manager as one that must run single-threaded. I.e. in an executor service with one worker thread, thus effectively preventing more than one copy of the step from executing at any one point in time- Parameters:
mustRunSingleThreaded
- true if the managed step must run single-threaded
-
getStepMustRunSingleThreaded
boolean getStepMustRunSingleThreaded()Returns true if the step managed by this step manager has been marked as one that must run single-threaded. I.e. in an executor service with one worker thread, thus effectively preventing more than one copy of the step from executing at any one point in time- Parameters:
mustRunSingleThreaded
- true if the managed step must run single-threaded
-