Package weka.associations
Class Apriori
java.lang.Object
weka.associations.AbstractAssociator
weka.associations.Apriori
- All Implemented Interfaces:
Serializable, Cloneable, AssociationRulesProducer, Associator, CARuleMiner, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler, TechnicalInformationHandler
public class Apriori
extends AbstractAssociator
implements OptionHandler, AssociationRulesProducer, CARuleMiner, TechnicalInformationHandler
Class implementing an Apriori-type algorithm.
Iteratively reduces the minimum support until it finds the required number of
rules with the given minimum confidence.
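The support-lowering loop described above can be sketched in plain Java. This is an illustration only: the class SupportSchedule, the method mineWithDecreasingSupport and the miner callback are hypothetical names that are not part of Weka, and the exact starting value and stopping rule used by Apriori may differ from this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.DoubleFunction;

final class SupportSchedule {

    /**
     * Calls the (hypothetical) miner with a decreasing minimum support until it
     * returns at least numRules rules or the lower bound would be crossed.
     */
    static <R> List<R> mineWithDecreasingSupport(DoubleFunction<List<R>> miner,
            int numRules, double upper, double lower, double delta) {
        if (delta <= 0) {
            throw new IllegalArgumentException("delta must be positive");
        }
        double minSupport = upper;
        List<R> rules = Collections.emptyList();
        while (true) {
            rules = miner.apply(minSupport);
            if (rules.size() >= numRules) {
                break; // enough rules found at this support level
            }
            double next = minSupport - delta;
            if (next < lower - 1e-9) {
                break; // lowering further would go below the lower bound
            }
            minSupport = next;
        }
        return rules;
    }

    public static void main(String[] args) {
        // Stand-in miner: pretends that three rules appear once support drops below 0.82.
        DoubleFunction<List<String>> miner = s -> s < 0.82
                ? Arrays.asList("rule1", "rule2", "rule3")
                : Collections.<String>emptyList();
        // Upper bound 1.0, lower bound 0.1 and delta 0.05 mirror the documented defaults.
        List<String> rules = mineWithDecreasingSupport(miner, 3, 1.0, 0.1, 0.05);
        System.out.println(rules);
    }
}
```

The small tolerance in the lower-bound check only guards against floating-point drift from repeated subtraction.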
The algorithm has an option to mine class association rules. It is adapted as explained in the second reference.
For more information see:
R. Agrawal, R. Srikant: Fast Algorithms for Mining Association Rules in Large Databases. In: 20th International Conference on Very Large Data Bases, 478-499, 1994.
Bing Liu, Wynne Hsu, Yiming Ma: Integrating Classification and Association Rule Mining. In: Fourth International Conference on Knowledge Discovery and Data Mining, 80-86, 1998. BibTeX:
@inproceedings{Agrawal1994,
  author = {R. Agrawal and R. Srikant},
  booktitle = {20th International Conference on Very Large Data Bases},
  pages = {478-499},
  publisher = {Morgan Kaufmann, Los Altos, CA},
  title = {Fast Algorithms for Mining Association Rules in Large Databases},
  year = {1994}
}
@inproceedings{Liu1998,
  author = {Bing Liu and Wynne Hsu and Yiming Ma},
  booktitle = {Fourth International Conference on Knowledge Discovery and Data Mining},
  pages = {80-86},
  publisher = {AAAI Press},
  title = {Integrating Classification and Association Rule Mining},
  year = {1998}
}
Valid options are:
-N <required number of rules output> The required number of rules. (default = 10)
-T <0=confidence | 1=lift | 2=leverage | 3=Conviction> The metric type by which to rank rules. (default = confidence)
-C <minimum metric score of a rule> The minimum confidence of a rule. (default = 0.9)
-D <delta for minimum support> The delta by which the minimum support is decreased in each iteration. (default = 0.05)
-U <upper bound for minimum support> Upper bound for minimum support. (default = 1.0)
-M <lower bound for minimum support> The lower bound for the minimum support. (default = 0.1)
-S <significance level> If used, rules are tested for significance at the given level. Slower. (default = no significance testing)
-I If set the itemsets found are also output. (default = no)
-R Remove columns that contain all missing values (default = no)
-V Report progress iteratively. (default = no)
-A If set class association rules are mined. (default = no)
-Z Treat zero (i.e. first value of nominal attributes) as missing
-B <toString delimiters> If used, two characters to use as rule delimiters in the result of toString: the first to delimit fields, the second to delimit items within fields. (default = traditional toString result)
-c <the class index> The class index. (default = last)
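The four metric types selectable with -T can be illustrated with their usual textbook definitions. The sketch below is an assumption-laden illustration, not Weka's code: the class RuleMetrics and its methods are hypothetical, and Weka's own implementation may differ in details such as smoothing or how a zero denominator is handled. Here n is the number of instances, nA the count matching the premise, nB the count matching the consequence, and nAB the count matching both.

```java
final class RuleMetrics {

    /** confidence = P(B | A) */
    static double confidence(int nA, int nAB) {
        return (double) nAB / nA;
    }

    /** lift = confidence / P(B) */
    static double lift(int n, int nA, int nB, int nAB) {
        return confidence(nA, nAB) / ((double) nB / n);
    }

    /** leverage = P(A and B) - P(A) * P(B) */
    static double leverage(int n, int nA, int nB, int nAB) {
        return (double) nAB / n - ((double) nA / n) * ((double) nB / n);
    }

    /** conviction = P(A) * P(not B) / P(A and not B); infinite when the rule has no exceptions. */
    static double conviction(int n, int nA, int nB, int nAB) {
        double pA = (double) nA / n;
        double pNotB = (double) (n - nB) / n;
        double pANotB = (double) (nA - nAB) / n;
        return pA * pNotB / pANotB;
    }

    public static void main(String[] args) {
        // Toy counts: 100 instances, premise in 40, consequence in 50, both in 30.
        System.out.println("confidence = " + confidence(40, 30));
        System.out.println("lift       = " + lift(100, 40, 50, 30));
        System.out.println("leverage   = " + leverage(100, 40, 50, 30));
        System.out.println("conviction = " + conviction(100, 40, 50, 30));
    }
}
```

With these toy counts the values are approximately 0.75, 1.5, 0.1 and 2.0. Note that a threshold passed with -C only makes sense on the scale of the chosen metric; 0.9 is a natural bound for confidence but not for lift.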
- Version:
- $Revision: 15519 $
- Author:
- Eibe Frank (eibe@cs.waikato.ac.nz), Mark Hall (mhall@cs.waikato.ac.nz), Stefan Mutter (mutter@cs.waikato.ac.nz)
-
Field Summary
-
Constructor Summary
Constructor, Description
Apriori(): Constructor that sets default values for the minimum confidence and the maximum number of rules.
-
Method Summary
Modifier and Type, Method, Description
void buildAssociations(Instances instances): Method that generates all large itemsets with a minimum support, and from these all association rules with a minimum confidence.
boolean canProduceRules(): Returns true if this AssociationRulesProducer can actually produce rules.
carTipText(): Returns the tip text for this property.
classIndexTipText(): Returns the tip text for this property.
deltaTipText(): Returns the tip text for this property.
getAllTheRules(): Returns all the rules.
getAssociationRules(): Gets the list of mined association rules.
getCapabilities(): Returns default capabilities of the associator.
boolean getCar(): Gets whether class association rules are mined.
int getClassIndex(): Gets the class index.
double getDelta(): Get the value of delta.
getInstancesNoClass(): Gets the instances without the class attribute.
getInstancesOnlyClass(): Gets only the class attribute of the instances.
double getLowerBoundMinSupport(): Get the value of lowerBoundMinSupport.
getMetricType(): Get the metric type.
double getMinMetric(): Get the value of minConfidence.
int getNumRules(): Get the value of numRules.
String[] getOptions(): Gets the current settings of the Apriori object.
boolean getOutputItemSets(): Gets whether itemsets are output as well.
boolean getRemoveAllMissingCols(): Returns whether columns containing all missing values are to be removed.
getRevision(): Returns the revision string.
String[] getRuleMetricNames(): Gets a list of the names of the metrics output for each rule.
double getSignificanceLevel(): Get the value of significanceLevel.
getTechnicalInformation(): Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
boolean getTreatZeroAsMissing(): Gets whether zeros (i.e. the first value of a nominal attribute) are to be treated in the same way as missing values.
double getUpperBoundMinSupport(): Get the value of upperBoundMinSupport.
boolean getVerbose(): Gets whether algorithm is run in verbose mode.
globalInfo(): Returns a string describing this associator.
listOptions(): Returns an enumeration describing the available options.
lowerBoundMinSupportTipText(): Returns the tip text for this property.
static void main(String[] args): Main method.
metricString(): Returns the metric string for the chosen metric type.
metricTypeTipText(): Returns the tip text for this property.
mineCARs(Instances data): Method that mines all class association rules with minimum support and with a minimum confidence.
minMetricTipText(): Returns the tip text for this property.
numRulesTipText(): Returns the tip text for this property.
outputItemSetsTipText(): Returns the tip text for this property.
removeAllMissingColsTipText(): Returns the tip text for this property.
void resetOptions(): Resets the options to the default values.
void setCar(boolean flag): Sets class association rule mining.
void setClassIndex(int index): Sets the class index.
void setDelta(double v): Set the value of delta.
void setLowerBoundMinSupport(double v): Set the value of lowerBoundMinSupport.
void setMetricType(SelectedTag d): Set the metric type for ranking rules.
void setMinMetric(double v): Set the value of minConfidence.
void setNumRules(int v): Set the value of numRules.
void setOptions(String[] options): Parses a given list of options.
void setOutputItemSets(boolean flag): Sets whether itemsets are output as well.
void setRemoveAllMissingCols(boolean r): Remove columns containing all missing values.
void setSignificanceLevel(double v): Set the value of significanceLevel.
void setTreatZeroAsMissing(boolean z): Sets whether zeros (i.e. the first value of a nominal attribute) should be treated as missing values.
void setUpperBoundMinSupport(double v): Set the value of upperBoundMinSupport.
void setVerbose(boolean flag): Sets verbose mode.
significanceLevelTipText(): Returns the tip text for this property.
toString(): Outputs the size of all the generated sets of itemsets and the rules.
treatZeroAsMissingTipText(): Returns the tip text for this property.
upperBoundMinSupportTipText(): Returns the tip text for this property.
verboseTipText(): Returns the tip text for this property.
Methods inherited from class weka.associations.AbstractAssociator
doNotCheckCapabilitiesTipText, forName, getDoNotCheckCapabilities, makeCopies, makeCopy, postExecution, preExecution, run, runAssociator, setDoNotCheckCapabilities
-
Field Details
-
TAGS_SELECTION
Metric types.
-
-
Constructor Details
-
Apriori
public Apriori()
Constructor that sets default values for the minimum confidence and the maximum number of rules.
-
-
Method Details
-
globalInfo
Returns a string describing this associator.
- Returns:
- a description of the associator suitable for displaying in the explorer/experimenter gui
-
getTechnicalInformation
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
- Specified by:
getTechnicalInformation
in interface TechnicalInformationHandler
- Returns:
- the technical information about this class
-
resetOptions
public void resetOptions()
Resets the options to the default values.
-
getCapabilities
Returns default capabilities of the associator.
- Specified by:
getCapabilities
in interface Associator
- Specified by:
getCapabilities
in interface CapabilitiesHandler
- Overrides:
getCapabilities
in class AbstractAssociator
- Returns:
- the capabilities of this associator
-
buildAssociations
Method that generates all large itemsets with a minimum support, and from these all association rules with a minimum confidence.
- Specified by:
buildAssociations
in interface Associator
- Parameters:
instances
- the instances to be used for generating the associations
- Throws:
Exception
- if rules can't be built successfully
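The two phases named in the description of buildAssociations (large itemsets first, rules second) can be sketched without Weka. Everything below is a hypothetical, simplified illustration: MiniApriori is not a Weka class, items are plain strings, support is an absolute count, candidates are not pruned before counting, and only rules with a single-item consequence are generated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class MiniApriori {

    /** Counts the transactions that contain every item of the candidate. */
    static int support(List<Set<String>> transactions, Set<String> candidate) {
        int count = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(candidate)) {
                count++;
            }
        }
        return count;
    }

    /** Level-wise search: frequent k-itemsets are joined into (k+1)-item candidates. */
    static Map<Set<String>, Integer> frequentItemsets(List<Set<String>> transactions, int minCount) {
        Map<Set<String>, Integer> frequent = new LinkedHashMap<>();
        Set<Set<String>> level = new LinkedHashSet<>();
        for (Set<String> t : transactions) {
            for (String item : t) {
                level.add(new TreeSet<>(Collections.singleton(item)));
            }
        }
        while (!level.isEmpty()) {
            List<Set<String>> kept = new ArrayList<>();
            for (Set<String> candidate : level) {
                int s = support(transactions, candidate);
                if (s >= minCount) {
                    frequent.put(candidate, s);
                    kept.add(candidate);
                }
            }
            Set<Set<String>> next = new LinkedHashSet<>();
            for (int i = 0; i < kept.size(); i++) {
                for (int j = i + 1; j < kept.size(); j++) {
                    Set<String> union = new TreeSet<>(kept.get(i));
                    union.addAll(kept.get(j));
                    if (union.size() == kept.get(i).size() + 1) {
                        next.add(union); // the two sets differ in exactly one item
                    }
                }
            }
            level = next;
        }
        return frequent;
    }

    /** Rules "premise => item" with a single-item consequence and confidence >= minConf. */
    static List<String> rules(Map<Set<String>, Integer> frequent, double minConf) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Set<String>, Integer> e : frequent.entrySet()) {
            if (e.getKey().size() < 2) {
                continue;
            }
            for (String item : e.getKey()) {
                Set<String> premise = new TreeSet<>(e.getKey());
                premise.remove(item);
                // Every subset of a frequent itemset is frequent, so the lookup succeeds.
                double conf = (double) e.getValue() / frequent.get(premise);
                if (conf >= minConf) {
                    out.add(premise + " => " + item);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Set<String>> data = new ArrayList<>();
        data.add(new TreeSet<>(Arrays.asList("bread", "milk")));
        data.add(new TreeSet<>(Arrays.asList("bread", "butter")));
        data.add(new TreeSet<>(Arrays.asList("bread", "milk", "butter")));
        data.add(new TreeSet<>(Arrays.asList("milk")));
        Map<Set<String>, Integer> frequent = frequentItemsets(data, 2);
        System.out.println(frequent);
        System.out.println(rules(frequent, 0.9));
    }
}
```

On this toy data five itemsets reach a count of 2, and the only rule with confidence of at least 0.9 is [butter] => bread.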
-
mineCARs
Method that mines all class association rules with minimum support and with a minimum confidence.
- Specified by:
mineCARs
in interface CARuleMiner
- Parameters:
data
- the instances for which class association rules should be mined
- Returns:
- a sorted array of FastVector (ordered by confidence) containing the rules and metric information
- Throws:
Exception
- if rules can't be built successfully
-
getInstancesNoClass
Gets the instances without the class attribute.
- Specified by:
getInstancesNoClass
in interface CARuleMiner
- Returns:
- the instances without the class attribute.
-
getInstancesOnlyClass
Gets only the class attribute of the instances.
- Specified by:
getInstancesOnlyClass
in interface CARuleMiner
- Returns:
- the class attribute of all instances.
-
listOptions
Returns an enumeration describing the available options.
- Specified by:
listOptions
in interface OptionHandler
- Overrides:
listOptions
in class AbstractAssociator
- Returns:
- an enumeration of all the available options.
-
setOptions
Parses a given list of options. Valid options are:
-N <required number of rules output> The required number of rules. (default = 10)
-T <0=confidence | 1=lift | 2=leverage | 3=Conviction> The metric type by which to rank rules. (default = confidence)
-C <minimum metric score of a rule> The minimum confidence of a rule. (default = 0.9)
-D <delta for minimum support> The delta by which the minimum support is decreased in each iteration. (default = 0.05)
-U <upper bound for minimum support> Upper bound for minimum support. (default = 1.0)
-M <lower bound for minimum support> The lower bound for the minimum support. (default = 0.1)
-S <significance level> If used, rules are tested for significance at the given level. Slower. (default = no significance testing)
-I If set the itemsets found are also output. (default = no)
-R Remove columns that contain all missing values (default = no)
-V Report progress iteratively. (default = no)
-A If set class association rules are mined. (default = no)
-Z Treat zero (i.e. first value of nominal attributes) as missing
-B <toString delimiters> If used, two characters to use as rule delimiters in the result of toString: the first to delimit fields, the second to delimit items within fields. (default = traditional toString result)
-c <the class index> The class index. (default = last)
- Specified by:
setOptions
in interface OptionHandler
- Overrides:
setOptions
in class AbstractAssociator
- Parameters:
options
- the list of options as an array of strings
- Throws:
Exception
- if an option is not supported
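Because setOptions takes the options as an array of strings, each flag and each value is a separate array element. The sketch below shows that shape for a simple command-line style string. OptionArrayDemo and toOptionArray are hypothetical helpers, not Weka API, and the whitespace split assumes that no value contains spaces or quotes (which an option such as -B might require).

```java
final class OptionArrayDemo {

    /** Splits a simple option string on whitespace; quoted values are not handled. */
    static String[] toOptionArray(String commandLine) {
        String trimmed = commandLine.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // 20 rules, ranked by lift (-T 1) with a minimum lift of 1.5 (-C 1.5).
        String[] options = toOptionArray("-N 20 -T 1 -C 1.5 -D 0.05 -U 1.0 -M 0.1");
        System.out.println(String.join("|", options));
    }
}
```

An array of this form is what the options parameter above expects, and getOptions returns the current settings in the same shape.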
-
getOptions
Gets the current settings of the Apriori object.
- Specified by:
getOptions
in interface OptionHandler
- Overrides:
getOptions
in class AbstractAssociator
- Returns:
- an array of strings suitable for passing to setOptions
-
toString
Outputs the size of all the generated sets of itemsets and the rules.
-
metricString
Returns the metric string for the chosen metric type.
- Specified by:
metricString
in interface CARuleMiner
- Returns:
- a string describing the used metric for the interestingness of a class association rule
-
removeAllMissingColsTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setRemoveAllMissingCols
public void setRemoveAllMissingCols(boolean r)
Remove columns containing all missing values.
- Parameters:
r
- true if cols are to be removed.
-
getRemoveAllMissingCols
public boolean getRemoveAllMissingCols()
Returns whether columns containing all missing values are to be removed.
- Returns:
- true if columns are to be removed.
-
upperBoundMinSupportTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getUpperBoundMinSupport
public double getUpperBoundMinSupport()
Get the value of upperBoundMinSupport.
- Returns:
- Value of upperBoundMinSupport.
-
setUpperBoundMinSupport
public void setUpperBoundMinSupport(double v)
Set the value of upperBoundMinSupport.
- Parameters:
v
- Value to assign to upperBoundMinSupport.
-
setClassIndex
public void setClassIndex(int index)
Sets the class index.
- Specified by:
setClassIndex
in interface CARuleMiner
- Parameters:
index
- the class index
-
getClassIndex
public int getClassIndex()
Gets the class index.
- Returns:
- the index of the class attribute
-
classIndexTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setCar
public void setCar(boolean flag)
Sets class association rule mining.
- Parameters:
flag
- true if class association rules are mined, false otherwise
-
getCar
public boolean getCar()
Gets whether class association rules are mined.
- Returns:
- true if class association rules are mined, false otherwise
-
carTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
lowerBoundMinSupportTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getLowerBoundMinSupport
public double getLowerBoundMinSupport()
Get the value of lowerBoundMinSupport.
- Returns:
- Value of lowerBoundMinSupport.
-
setLowerBoundMinSupport
public void setLowerBoundMinSupport(double v)
Set the value of lowerBoundMinSupport.
- Parameters:
v
- Value to assign to lowerBoundMinSupport.
-
getMetricType
Get the metric type.
- Returns:
- the type of metric to use for ranking rules
-
metricTypeTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setMetricType
Set the metric type for ranking rules.
- Parameters:
d
- the type of metric
-
minMetricTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getMinMetric
public double getMinMetric()
Get the value of minConfidence.
- Returns:
- Value of minConfidence.
-
setMinMetric
public void setMinMetric(double v)
Set the value of minConfidence.
- Parameters:
v
- Value to assign to minConfidence.
-
numRulesTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getNumRules
public int getNumRules()
Get the value of numRules.
- Returns:
- Value of numRules.
-
setNumRules
public void setNumRules(int v)
Set the value of numRules.
- Parameters:
v
- Value to assign to numRules.
-
deltaTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getDelta
public double getDelta()
Get the value of delta.
- Returns:
- Value of delta.
-
setDelta
public void setDelta(double v)
Set the value of delta.
- Parameters:
v
- Value to assign to delta.
-
significanceLevelTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getSignificanceLevel
public double getSignificanceLevel()
Get the value of significanceLevel.
- Returns:
- Value of significanceLevel.
-
setSignificanceLevel
public void setSignificanceLevel(double v)
Set the value of significanceLevel.
- Parameters:
v
- Value to assign to significanceLevel.
-
setOutputItemSets
public void setOutputItemSets(boolean flag)
Sets whether itemsets are output as well.
- Parameters:
flag
- true if itemsets are to be output as well
-
getOutputItemSets
public boolean getOutputItemSets()
Gets whether itemsets are output as well.
- Returns:
- true if itemsets are output as well
-
outputItemSetsTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setVerbose
public void setVerbose(boolean flag)
Sets verbose mode.
- Parameters:
flag
- true if algorithm should be run in verbose mode
-
getVerbose
public boolean getVerbose()
Gets whether algorithm is run in verbose mode.
- Returns:
- true if algorithm is run in verbose mode
-
verboseTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
treatZeroAsMissingTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setTreatZeroAsMissing
public void setTreatZeroAsMissing(boolean z)
Sets whether zeros (i.e. the first value of a nominal attribute) should be treated as missing values.
- Parameters:
z
- true if zeros should be treated as missing values.
-
getTreatZeroAsMissing
public boolean getTreatZeroAsMissing()
Gets whether zeros (i.e. the first value of a nominal attribute) are to be treated in the same way as missing values.
- Returns:
- true if zeros are to be treated like missing values.
-
getAllTheRules
Returns all the rules.
- Returns:
- all the rules
- See Also:
-
m_allTheRules
-
getAssociationRules
Description copied from interface: AssociationRulesProducer
Gets the list of mined association rules.
- Specified by:
getAssociationRules
in interface AssociationRulesProducer
- Returns:
- the list of association rules discovered during mining. Returns null if mining hasn't been performed yet.
-
getRuleMetricNames
Gets a list of the names of the metrics output for each rule. This list should be the same (in terms of the names and order thereof) as that produced by AssociationRule.getMetricNamesForRule().
- Specified by:
getRuleMetricNames
in interface AssociationRulesProducer
- Returns:
- an array of the names of the metrics available for each rule learned by this producer.
-
canProduceRules
public boolean canProduceRules()
Returns true if this AssociationRulesProducer can actually produce rules. Most implementing classes will always return true from this method (obviously :-)). However, an implementing class that actually acts as a wrapper around things that may or may not implement AssociationRulesProducer will want to return false if the thing they wrap can't produce rules.
- Specified by:
canProduceRules
in interface AssociationRulesProducer
- Returns:
- true if this producer can produce rules in its current configuration
-
getRevision
Returns the revision string.
- Specified by:
getRevision
in interface RevisionHandler
- Overrides:
getRevision
in class AbstractAssociator
- Returns:
- the revision
-
main
Main method.
- Parameters:
args
- the commandline options
-