Class SpreadSubsample

java.lang.Object
weka.filters.Filter
weka.filters.supervised.instance.SpreadSubsample
All Implemented Interfaces:
Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, Randomizable, RevisionHandler, WeightedAttributesHandler, SupervisedFilter

public class SpreadSubsample extends Filter implements SupervisedFilter, OptionHandler, Randomizable, WeightedAttributesHandler
Produces a random subsample of a dataset. The original dataset must fit entirely in memory. This filter allows you to specify the maximum "spread" between the rarest and most common class. For example, you may specify that there be at most a 2:1 difference in class frequencies. When used in batch mode, subsequent batches are NOT resampled.
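The "spread" is the ratio between the largest and the smallest non-empty class counts. The sketch below is not part of Weka. The class `SpreadCheck` and its methods are hypothetical names. It shows one way to measure a class distribution's spread, for example to check a filtered dataset against the limit you requested.

```java
/**
 * Hypothetical helper (not part of Weka): measures the "spread" of a class
 * distribution as the ratio between the most common and the rarest class,
 * ignoring classes that have no instances.
 */
final class SpreadCheck {

    /** Returns maxCount / minCount over non-empty classes, or 0 if all classes are empty. */
    static double spread(int[] classCounts) {
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (int c : classCounts) {
            if (c > 0) {
                min = Math.min(min, c);
                max = Math.max(max, c);
            }
        }
        return max == 0 ? 0.0 : (double) max / min;
    }

    /** True if the distribution respects a maximum spread (0 means no limit). */
    static boolean withinSpread(int[] classCounts, double maxSpread) {
        return maxSpread == 0 || spread(classCounts) <= maxSpread;
    }
}
```

For class counts of 900 and 100 the spread is 9.0. After subsampling to 200 and 100 it is 2.0, which satisfies a 2:1 limit.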

Valid options are:

 -S <num>
  Specify the random number seed (default 1)
 
 -M <num>
  The maximum class distribution spread.
  0 = no maximum spread, 1 = uniform distribution, 10 = allow at most
  a 10:1 ratio between the classes (default 0)
 
 -W
  Adjust weights so that total weight per class is maintained.
  Individual instance weighting is not preserved. (default no
  weights adjustment)
 
 -X <num>
  The maximum count for any class value (default 0 = unlimited).
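The following sketch shows how the -M and -X settings might translate into per-class target counts, based on the option descriptions above. Each class keeps at most (size of the rarest class × spread) instances. A spread of 0 disables that limit, and a positive -X caps every class. This is an assumption about the computation, not Weka's source, and `TargetCounts.compute` is a hypothetical name.

```java
/**
 * Hypothetical sketch (not Weka's source): derives how many instances to keep
 * per class from the -M (maximum spread) and -X (maximum count) options.
 */
final class TargetCounts {

    static int[] compute(int[] classCounts, double spread, double maxCount) {
        // Size of the rarest non-empty class; -1 if every class is empty.
        int min = -1;
        for (int c : classCounts) {
            if (c > 0 && (min < 0 || c < min)) {
                min = c;
            }
        }
        int[] target = new int[classCounts.length];
        for (int i = 0; i < classCounts.length; i++) {
            // -M 0 means "no maximum spread": keep every instance of the class.
            int keep = (spread == 0 || min < 0)
                ? classCounts[i]
                : (int) Math.min(classCounts[i], min * spread);
            // -X 0 means "unlimited"; otherwise cap every class at maxCount.
            if (maxCount > 0) {
                keep = Math.min(keep, (int) maxCount);
            }
            target[i] = keep;
        }
        return target;
    }
}
```

With class counts {900, 100, 40}, -M 2 gives {80, 80, 40}, -M 1 gives {40, 40, 40}, and -M 0 -X 50 gives {50, 50, 40}.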
 
Version:
$Revision: 14508 $
Author:
Stuart Inglis (stuart@reeltwo.com)
  • Constructor Details

    • SpreadSubsample

      public SpreadSubsample()
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this filter.
      Returns:
      a description of the filter suitable for displaying in the Explorer/Experimenter GUI
    • adjustWeightsTipText

      public String adjustWeightsTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • getAdjustWeights

      public boolean getAdjustWeights()
      Returns true if instance weights will be adjusted to maintain total weight per class.
      Returns:
      true if instance weights will be adjusted to maintain total weight per class.
    • setAdjustWeights

      public void setAdjustWeights(boolean newAdjustWeights)
      Sets whether the instance weights will be adjusted to maintain total weight per class.
      Parameters:
      newAdjustWeights - whether to adjust weights
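      According to the -W description, the adjusted weight can be read as the class's original total weight divided by the number of instances kept for that class, with every kept instance receiving the same weight. The helper below is a hypothetical illustration of that arithmetic, not Weka's implementation.

```java
/**
 * Hypothetical sketch of the -W option: every instance kept for a class gets the
 * same weight, chosen so that the class's total weight equals its total weight
 * before subsampling. Individual weights are therefore not preserved.
 */
final class WeightAdjust {

    /** Weight to assign to each of the keptCount retained instances of one class. */
    static double adjustedWeight(double[] originalWeights, int keptCount) {
        if (keptCount <= 0) {
            throw new IllegalArgumentException("keptCount must be positive");
        }
        double total = 0.0;
        for (double w : originalWeights) {
            total += w;
        }
        return total / keptCount;
    }
}
```

      For example, 900 instances of weight 1.0 subsampled to 100 would each receive weight 9.0, so the class still carries a total weight of 900.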
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class Filter
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -S <num>
        Specify the random number seed (default 1)
       
       -M <num>
        The maximum class distribution spread.
        0 = no maximum spread, 1 = uniform distribution, 10 = allow at most
        a 10:1 ratio between the classes (default 0)
       
       -W
        Adjust weights so that total weight per class is maintained.
        Individual instance weighting is not preserved. (default no
        weights adjustment)
       
       -X <num>
        The maximum count for any class value (default 0 = unlimited).
       
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class Filter
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
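      The sketch below is a Weka-independent illustration of how the four documented options map to settings and their defaults. `SpreadSubsampleSettings` and `parse` are hypothetical names. Weka's own `setOptions` is not reproduced here and may differ in details such as the exception type and error messages.

```java
/**
 * Hypothetical, Weka-independent sketch of how the documented options map onto
 * settings and defaults: -S <num> (seed, default 1), -M <num> (spread, default 0),
 * -W (adjust weights, default off), -X <num> (max count, default 0 = unlimited).
 */
final class SpreadSubsampleSettings {
    int seed = 1;
    double spread = 0.0;
    boolean adjustWeights = false;
    double maxCount = 0.0;

    static SpreadSubsampleSettings parse(String[] options) {
        SpreadSubsampleSettings s = new SpreadSubsampleSettings();
        for (int i = 0; i < options.length; i++) {
            switch (options[i]) {
                case "-S": s.seed = Integer.parseInt(value(options, ++i)); break;
                case "-M": s.spread = Double.parseDouble(value(options, ++i)); break;
                case "-W": s.adjustWeights = true; break;
                case "-X": s.maxCount = Double.parseDouble(value(options, ++i)); break;
                default:
                    // Mirrors the documented behaviour: unsupported options are an error.
                    throw new IllegalArgumentException("Unsupported option: " + options[i]);
            }
        }
        return s;
    }

    /** Returns the value following a flag, failing if the flag is the last element. */
    private static String value(String[] options, int i) {
        if (i >= options.length) {
            throw new IllegalArgumentException("Missing value for " + options[i - 1]);
        }
        return options[i];
    }
}
```

      For example, parsing {"-M", "2", "-X", "500", "-W"} yields a spread of 2.0, a max count of 500.0, weight adjustment enabled, and the default seed of 1.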
    • getOptions

      public String[] getOptions()
      Gets the current settings of the filter.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class Filter
      Returns:
      an array of strings suitable for passing to setOptions
    • distributionSpreadTipText

      public String distributionSpreadTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • setDistributionSpread

      public void setDistributionSpread(double spread)
      Sets the value for the distribution spread.
      Parameters:
      spread - the new distribution spread
    • getDistributionSpread

      public double getDistributionSpread()
      Gets the value for the distribution spread.
      Returns:
      the distribution spread
    • maxCountTipText

      public String maxCountTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • setMaxCount

      public void setMaxCount(double maxcount)
      Sets the value for the max count.
      Parameters:
      maxcount - the new max count
    • getMaxCount

      public double getMaxCount()
      Gets the value for the max count.
      Returns:
      the max count
    • randomSeedTipText

      public String randomSeedTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • getRandomSeed

      public int getRandomSeed()
      Gets the random number seed.
      Returns:
      the random number seed.
    • setRandomSeed

      public void setRandomSeed(int newSeed)
      Sets the random number seed.
      Parameters:
      newSeed - the new random number seed.
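      Because the subsample is random, the seed determines which instances are kept. The sketch below shows seeded sampling without replacement within one class. It assumes a shuffle-and-truncate approach using `java.util.Random`. Weka's actual sampling procedure may differ, and `SeededSample` is a hypothetical name. With the same seed and the same input order, the same indices are chosen.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch (not Weka's code): draws k distinct row indices from one
 * class's rows using a seeded java.util.Random, so the same seed always yields
 * the same subsample for the same input order.
 */
final class SeededSample {

    static List<Integer> sampleWithoutReplacement(List<Integer> rows, int k, int seed) {
        if (k < 0 || k > rows.size()) {
            throw new IllegalArgumentException("k must be between 0 and rows.size()");
        }
        // Shuffle a copy deterministically, then keep the first k rows.
        List<Integer> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        return new ArrayList<>(shuffled.subList(0, k));
    }
}
```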
    • setSeed

      @ProgrammaticProperty public void setSeed(int seed)
      Description copied from interface: Randomizable
      Set the seed for random number generation.
      Specified by:
      setSeed in interface Randomizable
      Parameters:
      seed - the seed
    • getSeed

      @ProgrammaticProperty public int getSeed()
      Description copied from interface: Randomizable
      Gets the seed for random number generation.
      Specified by:
      getSeed in interface Randomizable
      Returns:
      the seed for the random number generation
    • getCapabilities

      public Capabilities getCapabilities()
      Returns the Capabilities of this filter.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class Filter
      Returns:
      the capabilities of this object
    • setInputFormat

      public boolean setInputFormat(Instances instanceInfo) throws Exception
      Sets the format of the input instances.
      Overrides:
      setInputFormat in class Filter
      Parameters:
      instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
      Returns:
      true if the outputFormat may be collected immediately
      Throws:
      UnassignedClassException - if no class attribute has been set.
      UnsupportedClassTypeException - if the class attribute is not nominal.
      Exception - if the inputFormat can't be set successfully
    • input

      public boolean input(Instance instance)
      Input an instance for filtering. The filter requires all training instances to be read before producing output.
      Overrides:
      input in class Filter
      Parameters:
      instance - the input instance
      Returns:
      true if the filtered instance may now be collected with output().
      Throws:
      IllegalStateException - if no input structure has been defined
    • batchFinished

      public boolean batchFinished()
      Signify that this batch of input to the filter is finished. If the filter requires all instances prior to filtering, output() may now be called to retrieve the filtered instances.
      Overrides:
      batchFinished in class Filter
      Returns:
      true if there are instances pending output
      Throws:
      IllegalStateException - if no input structure has been defined
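      The class description notes that, in batch mode, only the first batch is resampled. The hypothetical `FirstBatchResampler` below sketches that input()/batchFinished()/output() lifecycle in plain Java. Instances in the first batch are buffered until batchFinished() and then resampled, while instances from later batches pass straight through. The resampling step is supplied as a function. None of this is Weka's code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch of the batch behaviour described for this filter: during
 * the first batch, input() only buffers and returns false; batchFinished()
 * resamples the buffer and queues the result for output(). In later batches,
 * input() passes each instance straight through (no resampling).
 */
final class FirstBatchResampler<T> {
    private final UnaryOperator<List<T>> resample;
    private final List<T> buffer = new ArrayList<>();
    private final Deque<T> outputQueue = new ArrayDeque<>();
    private boolean firstBatchDone = false;

    FirstBatchResampler(UnaryOperator<List<T>> resample) {
        this.resample = resample;
    }

    /** Returns true if the instance can be collected immediately via output(). */
    boolean input(T instance) {
        if (firstBatchDone) {
            outputQueue.add(instance);  // later batches: pass through unchanged
            return true;
        }
        buffer.add(instance);           // first batch: wait for batchFinished()
        return false;
    }

    /** Returns true if there are instances pending output. */
    boolean batchFinished() {
        if (!firstBatchDone) {
            outputQueue.addAll(resample.apply(new ArrayList<>(buffer)));
            buffer.clear();
            firstBatchDone = true;
        }
        return !outputQueue.isEmpty();
    }

    /** Next filtered instance, or null if none is pending. */
    T output() {
        return outputQueue.poll();
    }
}
```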
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Filter
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class.
      Parameters:
      argv - should contain arguments to the filter: use -h for help