org.apache.stanbol.commons.opennlp
Enum PosTagsCollectionEnum

java.lang.Object
  extended by java.lang.Enum<PosTagsCollectionEnum>
      extended by org.apache.stanbol.commons.opennlp.PosTagsCollectionEnum
All Implemented Interfaces:
java.io.Serializable, java.lang.Comparable<PosTagsCollectionEnum>

public enum PosTagsCollectionEnum
extends java.lang.Enum<PosTagsCollectionEnum>

Enumeration of pre-configured sets of POS (Part-of-Speech) tags for finding nouns, verbs, and the tokens to follow when building chunks, in different languages.

Author:
Rupert Westenthaler

Enum Constant Summary
DA_FOLLOW
          POS types that are followed to extend chunks for Danish based on the PAROLE Tagset as described by this paper
DA_NOUN
          POS types representing Nouns for Danish based on the PAROLE Tagset as described by this paper
DA_VERB
          POS types representing Verbs for Danish based on the PAROLE Tagset as described by this paper
DE_FOLLOW
          POS types one typically needs to follow to build TextAnalyzer.AnalysedText.Chunks over Nouns, for German based on the STTS Tag Set.
DE_NOUN
          Noun related POS types for German based on the STTS Tag Set
DE_VERB
          Verb related POS types for German based on the STTS Tag Set
EN_FOLLOW
          POS types one typically needs to follow to build TextAnalyzer.AnalysedText.Chunks over Nouns, for English based on the Penn Treebank tag set.
EN_NOUN
          Noun related POS types for English based on the Penn Treebank tag set.
EN_VERB
          Verb related POS types for English based on the Penn Treebank tag set
NL_FOLLOW
          POS types followed to build Chunks based on the WOTAN tagset for Dutch (as used with Mbt).
NL_NOUN
          POS types for Nouns based on the WOTAN tagset for Dutch (as used with Mbt).
NL_VERB
          POS types for Verbs based on the WOTAN tagset for Dutch (as used with Mbt).
PT_FOLLOW
          POS types followed to build Chunks based on the PALAVRAS tag set for Portuguese.
PT_NOUN
          POS types for Nouns based on the PALAVRAS tag set for Portuguese.
PT_VERB
          POS types for Verbs based on the PALAVRAS tag set for Portuguese.
SV_FOLLOW
          POS types followed to build Chunks based on the TODO
SV_NOUN
          POS types for Nouns for the Swedish language based on Lexical categories in MAMBA. NOTE: This includes all typical noun categories as defined by MAMBA, Unclassifiable part-of-speech and Numerical "RO"; EN is excluded.
SV_VERB
          POS types for Verbs of the Swedish language based on the Lexical categories in MAMBA
 
Method Summary
 java.lang.String getLanguage()
           
static java.util.Set<java.lang.String> getPosTagCollection(java.lang.String lang, PosTypeCollectionType type)
          Getter for the POS (Part-of-Speech) tag collection for the given language and type
 java.util.Set<java.lang.String> getTags()
          Getter for the set of POS tags
 PosTypeCollectionType getType()
           
static PosTagsCollectionEnum valueOf(java.lang.String name)
          Returns the enum constant of this type with the specified name.
static PosTagsCollectionEnum[] values()
          Returns an array containing the constants of this enum type, in the order they are declared.
 
Methods inherited from class java.lang.Enum
clone, compareTo, equals, finalize, getDeclaringClass, hashCode, name, ordinal, toString, valueOf
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 

Enum Constant Detail

EN_NOUN

public static final PosTagsCollectionEnum EN_NOUN
Noun related POS types for English based on the Penn Treebank tag set.

NOTE: the "``" tag is also added as a noun tag because, although it cannot be found in the official tag set, it is sometimes used to tag nouns.


EN_VERB

public static final PosTagsCollectionEnum EN_VERB
Verb related POS types for English based on the Penn Treebank tag set


EN_FOLLOW

public static final PosTagsCollectionEnum EN_FOLLOW
POS types one typically needs to follow to build TextAnalyzer.AnalysedText.Chunks over Nouns (e.g. "University_NN of_IN Otago_NNP" or "Geneva_NNP ,_, Ohio_NNP"). For English, based on the Penn Treebank tag set.
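The "follow" idea can be illustrated with a small, self-contained sketch. It is not the Stanbol implementation: the class NounChunkSketch is hypothetical, and the two tag sets are assumptions standing in for EN_NOUN.getTags() and EN_FOLLOW.getTags(), chosen only so that the examples quoted above work. A chunk starts at a noun, continues over noun and follow tokens, and is trimmed so that it ends with a noun.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch only: the tag sets are assumptions, not the actual contents of EN_NOUN / EN_FOLLOW. */
class NounChunkSketch {

    // Assumed stand-ins for EN_NOUN.getTags() and EN_FOLLOW.getTags()
    static final Set<String> NOUN_TAGS = new HashSet<>(Arrays.asList("NN", "NNS", "NNP", "NNPS"));
    static final Set<String> FOLLOW_TAGS = new HashSet<>(Arrays.asList("IN", ","));

    /**
     * Groups tokens into chunks. A chunk starts at a noun, may continue over
     * "follow" tokens, and is trimmed so that it always ends with a noun.
     */
    static List<String> chunk(String[] tokens, String[] tags) {
        List<String> chunks = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            if (!NOUN_TAGS.contains(tags[i])) {
                i++; // chunks may only start at a noun
                continue;
            }
            int lastNoun = i;
            int j = i + 1;
            // Extend over nouns and follow tokens, remembering the last noun seen
            while (j < tokens.length
                    && (NOUN_TAGS.contains(tags[j]) || FOLLOW_TAGS.contains(tags[j]))) {
                if (NOUN_TAGS.contains(tags[j])) {
                    lastNoun = j;
                }
                j++;
            }
            chunks.add(String.join(" ", Arrays.copyOfRange(tokens, i, lastNoun + 1)));
            i = lastNoun + 1;
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "University", "of", "Otago", "is", "in", "Dunedin"};
        String[] tags = {"DT", "NNP", "IN", "NNP", "VBZ", "IN", "NNP"};
        System.out.println(chunk(tokens, tags)); // [University of Otago, Dunedin]
    }
}
```

Trimming to the last noun keeps a trailing preposition or comma out of the chunk, so "in" before "Dunedin" is skipped rather than attached to the previous chunk.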


DE_NOUN

public static final PosTagsCollectionEnum DE_NOUN
Noun related POS types for German based on the STTS Tag Set


DE_VERB

public static final PosTagsCollectionEnum DE_VERB
Verb related POS types for German based on the STTS Tag Set


DE_FOLLOW

public static final PosTagsCollectionEnum DE_FOLLOW
POS types one typically needs to follow to build TextAnalyzer.AnalysedText.Chunks over Nouns (e.g. "University_NN of_IN Otago_NNP" or "Geneva_NNP ,_, Ohio_NNP"). For German, based on the STTS Tag Set.


DA_NOUN

public static final PosTagsCollectionEnum DA_NOUN
POS types representing Nouns for Danish based on the PAROLE Tagset as described by this paper

TODO: Someone who speaks Danish should check this List

NOTES:


DA_VERB

public static final PosTagsCollectionEnum DA_VERB
POS types representing Verbs for Danish based on the PAROLE Tagset as described by this paper

TODO: Someone who speaks Danish should check this List


DA_FOLLOW

public static final PosTagsCollectionEnum DA_FOLLOW
POS types that are followed to extend chunks for Danish based on the PAROLE Tagset as described by this paper

TODO: Someone who speaks Danish should check this List

NOTES:


PT_NOUN

public static final PosTagsCollectionEnum PT_NOUN
POS types for Nouns based on the PALAVRAS tag set for Portuguese.

TODO: Someone who speaks this language should check this List

NOTES: Currently this includes nouns, proper nouns and numbers. In addition, "vp" was added. "vp" is not part of the POS tag set documentation, but it occurs once in the training set, so the POS tagger sometimes tags words with it.


PT_VERB

public static final PosTagsCollectionEnum PT_VERB
POS types for Verbs based on the PALAVRAS tag set for Portuguese.

TODO: Someone who speaks this language should check this List


PT_FOLLOW

public static final PosTagsCollectionEnum PT_FOLLOW
POS types followed to build Chunks based on the PALAVRAS tag set for Portuguese.

TODO: Someone who speaks this language should check this List

NOTES: Currently this includes punctuation and prepositions.


NL_NOUN

public static final PosTagsCollectionEnum NL_NOUN
POS types for Nouns based on the WOTAN tagset for Dutch (as used with Mbt).

TODO: Someone who speaks this language should check this List

NOTES: This now includes Nouns, Numbers and "others".


NL_VERB

public static final PosTagsCollectionEnum NL_VERB
POS types for Verbs based on the WOTAN tagset for Dutch (as used with Mbt).

The tagger does not distinguish between the different forms of verbs. Therefore it is enough to include "V".


NL_FOLLOW

public static final PosTagsCollectionEnum NL_FOLLOW
POS types followed to build Chunks based on the WOTAN tagset for Dutch (as used with Mbt).

NOTES: This includes only prepositions and punctuation.


SV_NOUN

public static final PosTagsCollectionEnum SV_NOUN
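POS types for Nouns for the Swedish language based on Lexical categories in MAMBA.

NOTE: This includes all typical noun categories as defined by MAMBA, Unclassifiable part-of-speech and Numerical "RO"; EN is excluded.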


SV_VERB

public static final PosTagsCollectionEnum SV_VERB
POS types for Verbs of the Swedish language based on the Lexical categories in MAMBA


SV_FOLLOW

public static final PosTagsCollectionEnum SV_FOLLOW
POS types followed to build Chunks based on the TODO

NOTES: This includes prepositions, "Part of idiom", "Infinitive marker" as well as all kinds of punctuation.

Method Detail

values

public static PosTagsCollectionEnum[] values()
Returns an array containing the constants of this enum type, in the order they are declared. This method may be used to iterate over the constants as follows:
for (PosTagsCollectionEnum c : PosTagsCollectionEnum.values())
    System.out.println(c);

Returns:
an array containing the constants of this enum type, in the order they are declared

valueOf

public static PosTagsCollectionEnum valueOf(java.lang.String name)
Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)

Parameters:
name - the name of the enum constant to be returned.
Returns:
the enum constant with the specified name
Throws:
java.lang.IllegalArgumentException - if this enum type has no constant with the specified name
java.lang.NullPointerException - if the argument is null
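Because valueOf throws for unknown or null names, callers that read constant names from configuration may want a lenient lookup. The following is a minimal sketch using only java.lang.Enum.valueOf(Class, String) from the JDK. The class EnumLookupSketch, the helper parse and the enum SampleCollection are hypothetical; SampleCollection is a stand-in so the example compiles without Stanbol on the classpath, and with the real library one would presumably pass PosTagsCollectionEnum.class instead.

```java
import java.util.Optional;

class EnumLookupSketch {

    /** Hypothetical stand-in for PosTagsCollectionEnum (only three constant names copied). */
    enum SampleCollection { EN_NOUN, EN_VERB, EN_FOLLOW }

    /** Returns the constant with the given name, or empty if the name is null or unknown. */
    static <E extends Enum<E>> Optional<E> parse(Class<E> type, String name) {
        if (name == null) {
            return Optional.empty(); // valueOf would throw NullPointerException here
        }
        try {
            return Optional.of(Enum.valueOf(type, name));
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // no constant with exactly this name
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(SampleCollection.class, "EN_NOUN").isPresent()); // true
        System.out.println(parse(SampleCollection.class, "en_noun").isPresent()); // false
    }
}
```

The second call returns empty because the match is exact and case-sensitive, as stated above.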

getTags

public final java.util.Set<java.lang.String> getTags()
Getter for the set of POS tags

Returns:
the tags

getLanguage

public final java.lang.String getLanguage()
Returns:
the language

getType

public final PosTypeCollectionType getType()
Returns:
the type

getPosTagCollection

public static java.util.Set<java.lang.String> getPosTagCollection(java.lang.String lang,
                                                                  PosTypeCollectionType type)
Getter for the POS (Part-of-Speech) tag collection for the given language and type

Parameters:
lang - the language
type - the type
Returns:
the collection, or null if no configuration is available for the passed parameters.
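Since the method may return null, callers need a guard. The sketch below shows one way such a lookup and a null-safe wrapper could look; it is not the Stanbol source. Everything in it is a hypothetical stand-in: CollectionType replaces PosTypeCollectionType (whose constants are not listed on this page, so NOUN, VERB and FOLLOW are assumed), TagCollection replaces PosTagsCollectionEnum, the language codes "en" and "nl" are assumed, and the English tag sets are illustrative Penn Treebank tags. Only the single "V" tag for Dutch verbs is taken from the description above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class PosTagLookupSketch {

    /** Assumed stand-in for PosTypeCollectionType. */
    enum CollectionType { NOUN, VERB, FOLLOW }

    /** Hypothetical stand-in for PosTagsCollectionEnum with illustrative tag sets. */
    enum TagCollection {
        EN_NOUN("en", CollectionType.NOUN, "NN", "NNS", "NNP", "NNPS"),
        EN_VERB("en", CollectionType.VERB, "VB", "VBD", "VBG", "VBN", "VBP", "VBZ"),
        NL_VERB("nl", CollectionType.VERB, "V");

        private final String language;
        private final CollectionType type;
        private final Set<String> tags;

        TagCollection(String language, CollectionType type, String... tags) {
            this.language = language;
            this.type = type;
            this.tags = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(tags)));
        }

        /** Linear scan over the constants; returns null when nothing matches. */
        static Set<String> getPosTagCollection(String lang, CollectionType type) {
            for (TagCollection c : values()) {
                if (c.language.equals(lang) && c.type == type) {
                    return c.tags;
                }
            }
            return null;
        }
    }

    /** Null-safe wrapper: callers get an empty set instead of null. */
    static Set<String> tagsOrEmpty(String lang, CollectionType type) {
        Set<String> tags = TagCollection.getPosTagCollection(lang, type);
        return tags != null ? tags : Collections.<String>emptySet();
    }

    public static void main(String[] args) {
        System.out.println(tagsOrEmpty("nl", CollectionType.VERB)); // [V]
        System.out.println(tagsOrEmpty("fr", CollectionType.NOUN)); // []
    }
}
```

Returning an empty set lets calling code use contains() without a null check, at the cost of no longer distinguishing "no configuration" from "configured but empty".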


Copyright © 2010-2012 The Apache Software Foundation. All Rights Reserved.