Package weka.associations
Class FPGrowth
java.lang.Object
weka.associations.AbstractAssociator
weka.associations.FPGrowth
- All Implemented Interfaces:
Serializable
,Cloneable
,AssociationRulesProducer
,Associator
,CapabilitiesHandler
,CapabilitiesIgnorer
,CommandlineRunnable
,OptionHandler
,RevisionHandler
,TechnicalInformationHandler
public class FPGrowth
extends AbstractAssociator
implements AssociationRulesProducer, OptionHandler, TechnicalInformationHandler
Class implementing the FP-growth algorithm for
finding large item sets without candidate generation. Iteratively reduces the
minimum support until it finds the required number of rules with the given
minimum metric. For more information see:
J. Han, J.Pei, Y. Yin: Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM-SIGMID International Conference on Management of Data, 1-12, 2000. BibTeX:
J. Han, J.Pei, Y. Yin: Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM-SIGMID International Conference on Management of Data, 1-12, 2000. BibTeX:
@inproceedings{Han2000, author = {J. Han and J.Pei and Y. Yin}, booktitle = {Proceedings of the 2000 ACM-SIGMID International Conference on Management of Data}, pages = {1-12}, title = {Mining frequent patterns without candidate generation}, year = {2000} }Valid options are:
-P <attribute index of positive value> Set the index of the attribute value to consider as 'positive' for binary attributes in normal dense instances. Index 2 is always used for sparse instances. (default = 2)
-I <max items> The maximum number of items to include in large items sets (and rules). (default = -1, i.e. no limit.)
-N <require number of rules> The required number of rules. (default = 10)
-T <0=confidence | 1=lift | 2=leverage | 3=Conviction> The metric by which to rank rules. (default = confidence)
-C <minimum metric score of a rule> The minimum metric score of a rule. (default = 0.9)
-U <upper bound for minimum support> Upper bound for minimum support. (default = 1.0)
-M <lower bound for minimum support> The lower bound for the minimum support. (default = 0.1)
-D <delta for minimum support> The delta by which the minimum support is decreased in each iteration. (default = 0.05)
-S Find all rules that meet the lower bound on minimum support and the minimum metric constraint. Turning this mode on will disable the iterative support reduction procedure to find the specified number of rules.
-transactions <comma separated list of attribute names> Only consider transactions that contain these items (default = no restriction)
-rules <comma separated list of attribute names> Only print rules that contain these items. (default = no restriction)
-use-or Use OR instead of AND for must contain list(s). Use in conjunction with -transactions and/or -rules
- Version:
- $Revision: 15519 $
- Author:
- Mark Hall (mhall{[at]}pentaho{[dot]}com)
- See Also:
-
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionvoid
buildAssociations
(Instances data) Method that generates all large item sets with a minimum support, and from these all association rules with a minimum metric (i.e.boolean
Returns true if this AssociationRulesProducer can actually produce rules.Returns the tip text for this propertyTip text for this property suitable for displaying in the GUI.static List<AssociationRule>
generateRulesBruteForce
(weka.associations.FPGrowth.FrequentItemSets largeItemSets, DefaultAssociationRule.METRIC_TYPE metricToUse, double metricThreshold, int upperBoundMinSuppAsInstances, int lowerBoundMinSuppAsInstances, int totalTransactions) Generate all association rules, from the supplied frequet item sets, that meet a given minimum metric threshold.Gets the list of mined association rules.Returns default capabilities of the classifier.double
getDelta()
Get the value of delta.boolean
Get whether all rules meeting the lower bound on min support and the minimum metric threshold are to be found.double
Get the value of lowerBoundMinSupport.int
Gets the maximum number of items to be included in large item sets.Get the metric type to use.double
Get the value of minConfidence.int
Get the number of rules to find.String[]
Gets the current settings of the classifier.int
Get the index of the attribute value to consider as positive for binary attributes in normal dense instances.Returns the revision string.String[]
Gets a list of the names of the metrics output for each rule.Get the comma separated list of items that rules must contain in order to be output.Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.Gets the comma separated list of items that transactions must contain in order to be considered for large item sets and rules.double
Get the value of upperBoundMinSupport.boolean
Gets whether OR is to be used rather than AND when considering must contain lists.Returns a string describing this associatorgraph
(weka.associations.FPGrowth.FPTreeRoot tree) Assemble a dot graph representation of the FP-tree.Returns an enumeration describing the available options.Returns the tip text for this propertystatic void
Main method.Tip text for this property suitable for displaying in the GUI.Tip text for this property suitable for displaying in the GUI.Returns the tip text for this propertyTip text for this property suitable for displaying in the GUI.Tip text for this property suitable for displaying in the GUI.static List<AssociationRule>
pruneRules
(List<AssociationRule> rulesToPrune, ArrayList<Item> itemsToConsider, boolean useOr) void
Reset all options to their default values.Returns the tip text for this propertyvoid
setDelta
(double v) Set the value of delta.void
setFindAllRulesForSupportLevel
(boolean s) If true then turn off the iterative support reduction method of finding x rules that meet the minimum support and metric thresholds and just return all the rules that meet the lower bound on minimum support and the minimum metric.void
setLowerBoundMinSupport
(double v) Set the value of lowerBoundMinSupport.void
setMaxNumberOfItems
(int max) Set the maximum number of items to include in large items sets.void
Set the metric type to use.void
setMinMetric
(double v) Set the value of minConfidence.void
setNumRulesToFind
(int numR) Set the desired number of rules to find.void
setOffDiskReportingFrequency
(int freq) Set how often to report some progress when the data is being read incrementally off of the disk rather than loaded into memory.void
setOptions
(String[] options) Parses a given list of options.void
setPositiveIndex
(int index) Set the index of the attribute value to consider as positive for binary attributes in normal dense instances.void
setRulesMustContain
(String list) Set the comma separated list of items that rules must contain in order to be output.void
Set the comma separated list of items that transactions must contain in order to be considered for large item sets and rules.void
setUpperBoundMinSupport
(double v) Set the value of upperBoundMinSupport.void
setUseORForMustContainList
(boolean b) Set whether to use OR rather than AND when considering must contain lists.toString()
Output the association rules.Returns the tip text for this propertyReturns the tip text for this propertyReturns the tip text for this propertyMethods inherited from class weka.associations.AbstractAssociator
doNotCheckCapabilitiesTipText, forName, getDoNotCheckCapabilities, makeCopies, makeCopy, postExecution, preExecution, run, runAssociator, setDoNotCheckCapabilities
-
Constructor Details
-
FPGrowth
public FPGrowth()Construct a new FPGrowth object.
-
-
Method Details
-
generateRulesBruteForce
public static List<AssociationRule> generateRulesBruteForce(weka.associations.FPGrowth.FrequentItemSets largeItemSets, DefaultAssociationRule.METRIC_TYPE metricToUse, double metricThreshold, int upperBoundMinSuppAsInstances, int lowerBoundMinSuppAsInstances, int totalTransactions) Generate all association rules, from the supplied frequet item sets, that meet a given minimum metric threshold. Uses a brute force approach.- Parameters:
largeItemSets
- the set of frequent item setsmetricToUse
- the metric to usemetricThreshold
- the threshold value that a rule must meetupperBoundMinSuppAsInstances
- the upper bound on the support in order to accept the rulelowerBoundMinSuppAsInstances
- the lower bound on the support in order to accept the ruletotalTransactions
- the total number of transactions in the data- Returns:
- a list of association rules
-
pruneRules
public static List<AssociationRule> pruneRules(List<AssociationRule> rulesToPrune, ArrayList<Item> itemsToConsider, boolean useOr) -
getCapabilities
Returns default capabilities of the classifier.- Specified by:
getCapabilities
in interfaceAssociator
- Specified by:
getCapabilities
in interfaceCapabilitiesHandler
- Overrides:
getCapabilities
in classAbstractAssociator
- Returns:
- the capabilities of this classifier
- See Also:
-
globalInfo
Returns a string describing this associator- Returns:
- a description of the evaluator suitable for displaying in the explorer/experimenter gui
-
getTechnicalInformation
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.- Specified by:
getTechnicalInformation
in interfaceTechnicalInformationHandler
- Returns:
- the technical information about this class
-
resetOptions
public void resetOptions()Reset all options to their default values. -
positiveIndexTipText
Tip text for this property suitable for displaying in the GUI.- Returns:
- the tip text for this property.
-
setPositiveIndex
public void setPositiveIndex(int index) Set the index of the attribute value to consider as positive for binary attributes in normal dense instances. Index 1 is always used for sparse instances.- Parameters:
index
- the index to use for positive values in binary attributes.
-
getPositiveIndex
public int getPositiveIndex()Get the index of the attribute value to consider as positive for binary attributes in normal dense instances. Index 1 is always used for sparse instances.- Returns:
- the index to use for positive values in binary attributes.
-
setNumRulesToFind
public void setNumRulesToFind(int numR) Set the desired number of rules to find.- Parameters:
numR
- the number of rules to find.
-
getNumRulesToFind
public int getNumRulesToFind()Get the number of rules to find.- Returns:
- the number of rules to find.
-
numRulesToFindTipText
Tip text for this property suitable for displaying in the GUI.- Returns:
- the tip text for this property.
-
setMetricType
Set the metric type to use.- Parameters:
d
- the metric type
-
setMaxNumberOfItems
public void setMaxNumberOfItems(int max) Set the maximum number of items to include in large items sets.- Parameters:
max
- the maxim number of items to include in large item sets.
-
getMaxNumberOfItems
public int getMaxNumberOfItems()Gets the maximum number of items to be included in large item sets.- Returns:
- the maximum number of items to be included in large items sets.
-
maxNumberOfItemsTipText
Tip text for this property suitable for displaying in the GUI.- Returns:
- the tip text for this property.
-
getMetricType
Get the metric type to use.- Returns:
- the metric type to use.
-
metricTypeTipText
Tip text for this property suitable for displaying in the GUI.- Returns:
- the tip text for this property.
-
minMetricTipText
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getMinMetric
public double getMinMetric()Get the value of minConfidence.- Returns:
- Value of minConfidence.
-
setMinMetric
public void setMinMetric(double v) Set the value of minConfidence.- Parameters:
v
- Value to assign to minConfidence.
-
transactionsMustContainTipText
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setTransactionsMustContain
Set the comma separated list of items that transactions must contain in order to be considered for large item sets and rules.- Parameters:
list
- a comma separated list of items (empty string indicates no restriction on the transactions).
-
getTransactionsMustContain
Gets the comma separated list of items that transactions must contain in order to be considered for large item sets and rules.- Returns:
- return the comma separated list of items that transactions must contain.
-
rulesMustContainTipText
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setRulesMustContain
Set the comma separated list of items that rules must contain in order to be output.- Parameters:
list
- a comma separated list of items (empty string indicates no restriction on the rules).
-
getRulesMustContain
Get the comma separated list of items that rules must contain in order to be output.- Returns:
- the comma separated list of items that rules must contain in order to be output.
-
useORForMustContainListTipText
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setUseORForMustContainList
public void setUseORForMustContainList(boolean b) Set whether to use OR rather than AND when considering must contain lists.- Parameters:
b
- true if OR should be used instead of AND when considering transaction and rules must contain lists.
-
getUseORForMustContainList
public boolean getUseORForMustContainList()Gets whether OR is to be used rather than AND when considering must contain lists.- Returns:
- true if OR is used instead of AND.
-
deltaTipText
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying, in the explorer/experimenter gui
-
getDelta
public double getDelta()Get the value of delta.- Returns:
- Value of delta.
-
setDelta
public void setDelta(double v) Set the value of delta.- Parameters:
v
- Value to assign to delta.
-
lowerBoundMinSupportTipText
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getLowerBoundMinSupport
public double getLowerBoundMinSupport()Get the value of lowerBoundMinSupport.- Returns:
- Value of lowerBoundMinSupport.
-
setLowerBoundMinSupport
public void setLowerBoundMinSupport(double v) Set the value of lowerBoundMinSupport.- Parameters:
v
- Value to assign to lowerBoundMinSupport.
-
upperBoundMinSupportTipText
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getUpperBoundMinSupport
public double getUpperBoundMinSupport()Get the value of upperBoundMinSupport.- Returns:
- Value of upperBoundMinSupport.
-
setUpperBoundMinSupport
public void setUpperBoundMinSupport(double v) Set the value of upperBoundMinSupport.- Parameters:
v
- Value to assign to upperBoundMinSupport.
-
findAllRulesForSupportLevelTipText
Tip text for this property suitable for displaying in the GUI.- Returns:
- the tip text for this property.
-
setFindAllRulesForSupportLevel
public void setFindAllRulesForSupportLevel(boolean s) If true then turn off the iterative support reduction method of finding x rules that meet the minimum support and metric thresholds and just return all the rules that meet the lower bound on minimum support and the minimum metric.- Parameters:
s
- true if all rules meeting the lower bound on the support and minimum metric thresholds are to be found.
-
getFindAllRulesForSupportLevel
public boolean getFindAllRulesForSupportLevel()Get whether all rules meeting the lower bound on min support and the minimum metric threshold are to be found.- Returns:
- true if all rules meeting the lower bound on min support and the min metric threshold are to be found.
-
setOffDiskReportingFrequency
public void setOffDiskReportingFrequency(int freq) Set how often to report some progress when the data is being read incrementally off of the disk rather than loaded into memory.- Parameters:
freq
- the frequency to print progress.
-
getAssociationRules
Gets the list of mined association rules.- Specified by:
getAssociationRules
in interfaceAssociationRulesProducer
- Returns:
- the list of association rules discovered during mining. Returns null if mining hasn't been performed yet.
-
getRuleMetricNames
Gets a list of the names of the metrics output for each rule. This list should be the same (in terms of the names and order thereof) as that produced by AssociationRule.getMetricNamesForRule().- Specified by:
getRuleMetricNames
in interfaceAssociationRulesProducer
- Returns:
- an array of the names of the metrics available for each rule learned by this producer.
-
canProduceRules
public boolean canProduceRules()Returns true if this AssociationRulesProducer can actually produce rules. Most implementing classes will always return true from this method (obviously :-)). However, an implementing class that actually acts as a wrapper around things that may or may not implement AssociationRulesProducer will want to return false if the thing they wrap can't produce rules.- Specified by:
canProduceRules
in interfaceAssociationRulesProducer
- Returns:
- true if this producer can produce rules in its current configuration
-
listOptions
Returns an enumeration describing the available options.- Specified by:
listOptions
in interfaceOptionHandler
- Overrides:
listOptions
in classAbstractAssociator
- Returns:
- an enumeration of all the available options.
-
setOptions
Parses a given list of options. Valid options are:-P <attribute index of positive value> Set the index of the attribute value to consider as 'positive' for binary attributes in normal dense instances. Index 2 is always used for sparse instances. (default = 2)
-I <max items> The maximum number of items to include in large items sets (and rules). (default = -1, i.e. no limit.)
-N <require number of rules> The required number of rules. (default = 10)
-T <0=confidence | 1=lift | 2=leverage | 3=Conviction> The metric by which to rank rules. (default = confidence)
-C <minimum metric score of a rule> The minimum metric score of a rule. (default = 0.9)
-U <upper bound for minimum support> Upper bound for minimum support. (default = 1.0)
-M <lower bound for minimum support> The lower bound for the minimum support. (default = 0.1)
-D <delta for minimum support> The delta by which the minimum support is decreased in each iteration. (default = 0.05)
-S Find all rules that meet the lower bound on minimum support and the minimum metric constraint. Turning this mode on will disable the iterative support reduction procedure to find the specified number of rules.
-transactions <comma separated list of attribute names> Only consider transactions that contain these items (default = no restriction)
-rules <comma separated list of attribute names> Only print rules that contain these items. (default = no restriction)
-use-or Use OR instead of AND for must contain list(s). Use in conjunction with -transactions and/or -rules
- Specified by:
setOptions
in interfaceOptionHandler
- Overrides:
setOptions
in classAbstractAssociator
- Parameters:
options
- the list of options as an array of strings- Throws:
Exception
- if an option is not supported
-
getOptions
Gets the current settings of the classifier.- Specified by:
getOptions
in interfaceOptionHandler
- Overrides:
getOptions
in classAbstractAssociator
- Returns:
- an array of strings suitable for passing to setOptions
-
buildAssociations
Method that generates all large item sets with a minimum support, and from these all association rules with a minimum metric (i.e. confidence, lift etc.).- Specified by:
buildAssociations
in interfaceAssociator
- Parameters:
data
- the instances to be used for generating the associations- Throws:
Exception
- if rules can't be built successfully
-
toString
Output the association rules. -
graph
Assemble a dot graph representation of the FP-tree.- Parameters:
tree
- the root of the FP-tree- Returns:
- a graph representation as a String in dot format.
-
getRevision
Returns the revision string.- Specified by:
getRevision
in interfaceRevisionHandler
- Overrides:
getRevision
in classAbstractAssociator
- Returns:
- the revision
-
main
Main method.- Parameters:
args
- the commandline options
-