Class FarthestFirst

All Implemented Interfaces:
Serializable, Cloneable, Clusterer, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

public class FarthestFirst extends RandomizableClusterer implements TechnicalInformationHandler
Cluster data using the FarthestFirst algorithm.

For more information see:

Hochbaum, Shmoys (1985). A best possible heuristic for the k-center problem. Mathematics of Operations Research. 10(2):180-184.

Sanjoy Dasgupta: Performance Guarantees for Hierarchical Clustering. In: 15th Annual Conference on Computational Learning Theory, 351-363, 2002.

Notes:
- Works as a fast, simple, approximate clusterer.
- Modelled after SimpleKMeans; might be a useful initializer for it.
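
As a rough illustration of the farthest-first traversal idea behind this clusterer, the following is a minimal, self-contained Java sketch. It is not the Weka implementation: the class FarthestFirstSketch, its method names, the plain double[] point representation and the squared Euclidean distance are all assumptions made for this example. The first centre is drawn with a seeded random generator (playing the role of -S), and each later centre is the point farthest from its nearest already-chosen centre, until the requested number of centres (playing the role of -N) is reached.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative farthest-first traversal; hypothetical, not the Weka code. */
class FarthestFirstSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Returns the indices of the chosen centres. The first is picked at
     * random from the seed; each further one is the point whose distance
     * to its nearest already-chosen centre is largest.
     */
    public static int[] selectCenters(double[][] points, int k, long seed) {
        int n = points.length;
        // Never ask for more centres than there are points (or fewer than 0).
        int count = Math.max(0, Math.min(k, n));
        int[] centers = new int[count];
        if (count == 0) {
            return centers;
        }
        double[] minDist = new double[n];
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);
        centers[0] = new Random(seed).nextInt(n);
        for (int c = 1; c < count; c++) {
            double[] last = points[centers[c - 1]];
            int farthest = 0;
            for (int i = 0; i < n; i++) {
                // Distance from point i to its nearest centre chosen so far.
                minDist[i] = Math.min(minDist[i], squaredDistance(points[i], last));
                if (minDist[i] > minDist[farthest]) {
                    farthest = i;
                }
            }
            centers[c] = farthest;
        }
        return centers;
    }

    /** Returns the position (within centers) of the centre nearest to point. */
    public static int assign(double[] point, double[][] points, int[] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (squaredDistance(point, points[centers[c]])
                    < squaredDistance(point, points[centers[best]])) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}, {20, 0}};
        int[] centers = selectCenters(points, 3, 1L);
        System.out.println("centre indices: " + Arrays.toString(centers));
        System.out.println("cluster of (10, 10.5): "
                + assign(new double[] {10, 10.5}, points, centers));
    }
}
```

Because every step after the first is deterministic, only the seed influences which point starts the traversal. The sketch makes a single pass over the data per centre, which is what makes this style of clustering fast compared with iterative refinement.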

BibTeX:

 @article{Hochbaum1985,
    author = {Hochbaum and Shmoys},
    journal = {Mathematics of Operations Research},
    number = {2},
    pages = {180-184},
    title = {A best possible heuristic for the k-center problem},
    volume = {10},
    year = {1985}
 }
 
 @inproceedings{Dasgupta2002,
    author = {Sanjoy Dasgupta},
    booktitle = {15th Annual Conference on Computational Learning Theory},
    pages = {351-363},
    publisher = {Springer},
    title = {Performance Guarantees for Hierarchical Clustering},
    year = {2002}
 }
 

Valid options are:

 -N <num>
  Number of clusters (default 2).
 -S <num>
  Random number seed (default 1).
Version:
$Revision: 15519 $
Author:
Bernhard Pfahringer (bernhard@cs.waikato.ac.nz)
  • Constructor Details

    • FarthestFirst

      public FarthestFirst()
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this clusterer.
      Returns:
      a description of the clusterer suitable for displaying in the Explorer/Experimenter GUI
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the clusterer.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Specified by:
      getCapabilities in interface Clusterer
      Overrides:
      getCapabilities in class AbstractClusterer
      Returns:
      the capabilities of this clusterer
    • buildClusterer

      public void buildClusterer(Instances data) throws Exception
      Generates a clusterer. Has to initialize all fields of the clusterer that are not being set via options.
      Specified by:
      buildClusterer in interface Clusterer
      Specified by:
      buildClusterer in class AbstractClusterer
      Parameters:
      data - set of instances serving as training data
      Throws:
      Exception - if the clusterer has not been generated successfully
    • clusterInstance

      public int clusterInstance(Instance instance) throws Exception
      Assigns a given instance to a cluster.
      Specified by:
      clusterInstance in interface Clusterer
      Overrides:
      clusterInstance in class AbstractClusterer
      Parameters:
      instance - the instance to be assigned to a cluster
      Returns:
      the index of the assigned cluster as an integer
      Throws:
      Exception - if the instance could not be clustered successfully
    • numberOfClusters

      public int numberOfClusters() throws Exception
      Returns the number of clusters.
      Specified by:
      numberOfClusters in interface Clusterer
      Specified by:
      numberOfClusters in class AbstractClusterer
      Returns:
      the number of clusters generated for a training dataset.
      Throws:
      Exception - if number of clusters could not be returned successfully
    • getClusterCentroids

      public Instances getClusterCentroids()
      Get the centroids found by FarthestFirst
      Returns:
      the centroids found by FarthestFirst
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class RandomizableClusterer
      Returns:
      an enumeration of all the available options.
    • numClustersTipText

      public String numClustersTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • setNumClusters

      public void setNumClusters(int n) throws Exception
      Sets the number of clusters to generate.
      Parameters:
      n - the number of clusters to generate
      Throws:
      Exception - if number of clusters is negative
    • getNumClusters

      public int getNumClusters()
      Gets the number of clusters to generate.
      Returns:
      the number of clusters to generate
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -N <num>
        Number of clusters (default 2).
       -S <num>
        Random number seed (default 1).
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class RandomizableClusterer
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of FarthestFirst
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class RandomizableClusterer
      Returns:
      an array of strings suitable for passing to setOptions()
    • toString

      public String toString()
      Returns a string describing this clusterer.
      Overrides:
      toString in class Object
      Returns:
      a description of the clusterer as a string
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class AbstractClusterer
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class.
      Parameters:
      argv - should contain the following arguments:
      -t <training file> [-N <number of clusters>]