org.apache.stanbol.commons.opennlp
Class PosTypeChunker

java.lang.Object
  extended by org.apache.stanbol.commons.opennlp.PosTypeChunker

Deprecated. Replaced by STANBOL-733 (Stanbol NLP processing module).

public class PosTypeChunker
extends Object

Simple version of a Chunker that uses POS tags to build chunks. It does not implement the Chunker interface because implementing any of its methods other than Chunker.chunkAsSpans(String[], String[]) is not feasible.

Defaults are based on the Penn Treebank tag set.
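
The general idea of building chunks from POS tags can be sketched as follows. This is a simplified, self-contained illustration and not the Stanbol implementation: the class name PosChunkSketch, the method chunk, the use of {start, end} index pairs instead of opennlp.tools.util.Span, and the rule that trailing follow-only tokens are trimmed are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: a simplified POS-driven chunker (hypothetical, not the Stanbol code). */
final class PosChunkSketch {

    /**
     * Returns chunks as {start, end} token index pairs (end exclusive).
     * A chunk starts at a token whose tag is in buildTags and extends over
     * following tokens whose tags are in buildTags or followTags.
     */
    static List<int[]> chunk(String[] tags, Set<String> buildTags, Set<String> followTags) {
        // Build tags are always followed as well.
        Set<String> followed = new HashSet<>(buildTags);
        if (followTags != null) {
            followed.addAll(followTags);
        }
        List<int[]> chunks = new ArrayList<>();
        int i = 0;
        while (i < tags.length) {
            if (!buildTags.contains(tags[i])) {
                i++; // this token does not trigger a chunk
                continue;
            }
            int start = i;
            int end = i + 1; // exclusive end, just after the last build tag seen
            int j = i + 1;
            while (j < tags.length && followed.contains(tags[j])) {
                if (buildTags.contains(tags[j])) {
                    end = j + 1; // only build tags move the end of the chunk
                }
                j++;
            }
            chunks.add(new int[] {start, end});
            i = j; // continue after the followed run
        }
        return chunks;
    }
}
```

With Penn Treebank style tags, one might pass noun tags such as NN and NNP as buildTags and a preposition tag such as IN as followTags, so that a noun followed by a preposition and another noun forms one chunk.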

Author:
Rupert Westenthaler

Constructor Summary
PosTypeChunker(Set<String> buildPosTypes, Set<String> followPosTypes, double minPosProb)
          Deprecated. Initialise a new PosTypeChunker for the passed POS tag collections.
 
Method Summary
 opennlp.tools.util.Span[] chunkAsSpans(String[] tokens, String[] tags)
          Deprecated. Build the chunks based on the passed tokens and POS tags.
 opennlp.tools.util.Span[] chunkAsSpans(String[] tokens, String[][] tags, double[][] props)
          Deprecated. Build the chunks based on the passed tokens and one or more detected POS tag alternatives for each token.
 Set<String> getChunkPosTypes()
          Deprecated. The set of POS types used to create Chunks.
 Set<String> getFollowedPosTypes()
          Deprecated. The set of POS types followed to extend Chunks.
static PosTypeChunker getInstance(String lang, double minPosTagProbaility)
          Deprecated. Creates an instance for the given language based on the configuration within the PosTagsCollectionEnum.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PosTypeChunker

public PosTypeChunker(Set<String> buildPosTypes,
                      Set<String> followPosTypes,
                      double minPosProb)
Deprecated. 
Initialise a new PosTypeChunker for the passed POS tag collections. This constructor can be used if no predefined configuration for a given language is available in the PosTagsCollectionEnum.

Note that buildPosTypes are added to the followed ones. Therefore the followPosTypes may or may not include some or all of the buildPosTypes.

Parameters:
buildPosTypes - the POS types that trigger a new Chunk (MUST NOT be null nor empty).
followPosTypes - additional POS types followed to extend Chunks (MAY BE null or empty).
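
The contract described above (build types are always part of the followed types, and followPosTypes may be null) can be sketched as follows. This is a minimal illustration, not the class's source: the helper name followedTypes and the choice of IllegalArgumentException for a null or empty buildPosTypes are assumptions; the documentation only states that the argument MUST NOT be null nor empty.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: how the followed set could be derived (hypothetical helper). */
final class FollowedTypesSketch {

    static Set<String> followedTypes(Set<String> buildPosTypes, Set<String> followPosTypes) {
        if (buildPosTypes == null || buildPosTypes.isEmpty()) {
            // Assumption: the exception type is not specified by the documentation.
            throw new IllegalArgumentException("buildPosTypes MUST NOT be null nor empty");
        }
        // Every build type is also a followed type.
        Set<String> followed = new HashSet<>(buildPosTypes);
        if (followPosTypes != null) {
            followed.addAll(followPosTypes); // overlap with build types is harmless
        }
        return Collections.unmodifiableSet(followed);
    }
}
```

Because the two sets are merged, a caller does not need to repeat the build types in followPosTypes, but doing so changes nothing.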
Method Detail

getInstance

public static PosTypeChunker getInstance(String lang,
                                         double minPosTagProbaility)
Deprecated. 
Creates an instance for the given language based on the configuration within the PosTagsCollectionEnum.

Parameters:
lang - The language
minPosTagProbaility - The minimum probability a POS tag must have to be processed. POS tags with a lower probability are ignored and assumed to match.
Returns:
the instance, or null if no configuration for the passed language is present in the PosTagsCollectionEnum.

getFollowedPosTypes

public final Set<String> getFollowedPosTypes()
Deprecated. 
The set of POS types followed to extend Chunks. This includes the getChunkPosTypes() values.

Returns:
the followTypes

getChunkPosTypes

public final Set<String> getChunkPosTypes()
Deprecated. 
The set of POS types used to create Chunks.

Returns:
the buildTypes

chunkAsSpans

public opennlp.tools.util.Span[] chunkAsSpans(String[] tokens,
                                              String[] tags)
Deprecated. 
Build the chunks based on the passed tokens and POS tags.

This method is equivalent to Chunker.chunkAsSpans(String[], String[]).

Parameters:
tokens - the tokens
tags - the POS tags for the tokens
Returns:
the chunks as spans over the parsed tokens

chunkAsSpans

public opennlp.tools.util.Span[] chunkAsSpans(String[] tokens,
                                              String[][] tags,
                                              double[][] props)
Deprecated. 
Build the chunks based on the passed tokens and one or more detected POS tag alternatives for each token.

Parameters:
tokens - the tokens
tags - the POS tags for the tokens (first dimension: tokens; second dimension: POS tag alternatives)
Returns:
the chunks as spans over the parsed tokens
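
How several POS tag alternatives per token might be combined with a minimum probability can be sketched as follows. This is an interpretation, not the Stanbol implementation: the class and method names are hypothetical, it assumes that props holds one probability per entry of tags with the same array layout (the props parameter is not documented here), and it reads "ignored and assumed to be matching" as "a token with no alternative at or above the threshold is treated as matching".

```java
import java.util.Set;

/** Illustrative only: matching tokens that carry several POS tag alternatives. */
final class PosAlternativesSketch {

    /** Decides whether one token matches, given its tag alternatives and their probabilities. */
    static boolean matches(String[] tags, double[] probs, Set<String> types, double minProb) {
        boolean anyConfident = false;
        for (int k = 0; k < tags.length; k++) {
            boolean confident = probs == null || probs[k] >= minProb;
            if (!confident) {
                continue; // alternatives below the threshold are ignored
            }
            anyConfident = true;
            if (types.contains(tags[k])) {
                return true;
            }
        }
        // Assumption: without any confident alternative the token is treated as matching.
        return !anyConfident;
    }

    /** Applies matches(...) to every token; the first array dimension indexes the tokens. */
    static boolean[] matchTokens(String[][] tags, double[][] props, Set<String> types, double minProb) {
        boolean[] result = new boolean[tags.length];
        for (int i = 0; i < tags.length; i++) {
            result[i] = matches(tags[i], props == null ? null : props[i], types, minProb);
        }
        return result;
    }
}
```

Under this reading, a low threshold makes the decision depend on more of the tagger's alternatives, while a high threshold causes more tokens to be accepted by default because none of their tags is considered reliable.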


Copyright © 2010-2013 The Apache Software Foundation. All Rights Reserved.