org.apache.oodt.cas.metadata.util
Class MimeTypeUtils

java.lang.Object
  extended by org.apache.oodt.cas.metadata.util.MimeTypeUtils

public final class MimeTypeUtils
extends Object

Author:
mattmann, bfoster

A facade class that insulates CAS Metadata from its underlying mime type library, Apache Tika. All mime handling code should live in this utility class, hidden from the CAS Metadata classes that rely on it.


Field Summary
static String MIME_FILE_RES_PATH
           
 
Constructor Summary
MimeTypeUtils()
           
MimeTypeUtils(InputStream mimeIs, boolean magic)
           
MimeTypeUtils(String filePath)
           
MimeTypeUtils(String filePath, boolean magic)
           
 
Method Summary
 String autoResolveContentType(String url, byte[] data)
          Same as autoResolveContentType(String, String, byte[]), but this method passes null as the initial type.
 String autoResolveContentType(String typeName, String url, byte[] data)
          A facade for trying all of the mime type resolution strategies available within Tika.
static String cleanMimeType(String origType)
          Cleans a MimeType name by extracting the actual MimeType from a string of the form <primary type>/<sub type> ; <optional params>.
 String getDescriptionForMimeType(String mimeType)
           
 String getMimeType(File f)
          Facade interface to Tika's underlying MimeTypes.getMimeType(File) method.
 String getMimeType(String name)
          A facade interface to Tika's underlying MimeTypes.forName(String) method.
 String getMimeType(URL url)
          Facade interface to Tika's underlying MimeTypes.getMimeType(String) method.
 String getMimeTypeByMagic(byte[] data)
          Utility method to act as a facade to MimeTypes.getMimeType(byte[]).
 String getSuperTypeForMimeType(String mimeType)
           
 boolean isMimeMagic()
           
static byte[] readMagicHeader(InputStream stream)
           
static byte[] readMagicHeader(InputStream stream, int headerByteSize)
           
 void setMimeMagic(boolean mimeMagic)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MIME_FILE_RES_PATH

public static final String MIME_FILE_RES_PATH
See Also:
Constant Field Values
Constructor Detail

MimeTypeUtils

public MimeTypeUtils()

MimeTypeUtils

public MimeTypeUtils(String filePath)
              throws FileNotFoundException
Throws:
FileNotFoundException

MimeTypeUtils

public MimeTypeUtils(String filePath,
                     boolean magic)
              throws FileNotFoundException
Throws:
FileNotFoundException

MimeTypeUtils

public MimeTypeUtils(InputStream mimeIs,
                     boolean magic)
Method Detail

cleanMimeType

public static String cleanMimeType(String origType)
Cleans a MimeType name by extracting the actual MimeType from a string of the form:
           <primary type>/<sub type> ; <optional params>
 

Parameters:
origType - The original mime type string to be cleaned.
Returns:
The primary type and subtype, joined as <primary type>/<sub type>, i.e., the actual mime type.
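
The cleaning step can be illustrated with a minimal, standalone sketch. It is not the library's implementation: the class name CleanMimeTypeSketch and the method clean are hypothetical, and the sketch assumes that everything from the first ';' onward is a parameter and that a null input yields null.

```java
// Hypothetical illustration only; not part of org.apache.oodt.cas.metadata.util.
final class CleanMimeTypeSketch {

    // Returns "<primary type>/<sub type>" with any "; params" suffix removed.
    // Assumption: a null input is passed through as null.
    static String clean(String origType) {
        if (origType == null) {
            return null;
        }
        int semi = origType.indexOf(';');
        String base = (semi >= 0) ? origType.substring(0, semi) : origType;
        return base.trim();
    }

    public static void main(String[] args) {
        // prints "text/html"
        System.out.println(clean("text/html; charset=UTF-8"));
    }
}
```

With the real class, the equivalent call would be the static MimeTypeUtils.cleanMimeType(String) documented above.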

autoResolveContentType

public String autoResolveContentType(String url,
                                     byte[] data)
Same as autoResolveContentType(String, String, byte[]), but this method passes null as the initial type.

Parameters:
url - The String URL to use to check glob patterns.
data - The byte data to potentially use in magic detection.
Returns:
The String MimeType.

autoResolveContentType

public String autoResolveContentType(String typeName,
                                     String url,
                                     byte[] data)
A facade for trying all of the mime type resolution strategies available within Tika. First, the mime type provided in typeName is cleaned with cleanMimeType(String). The cleaned name is then looked up in the underlying Tika MimeTypes registry. If the MimeType is found, it is used; otherwise, URL resolution is used to try to determine the mime type. If that fails, and if mime magic is enabled (see isMimeMagic() and setMimeMagic(boolean)), then mime magic resolution is used to try to obtain a better-than-default approximation of the MimeType.

Parameters:
typeName - The original mime type, as initially reported for the content; may be null.
url - The URL of the content, used to check glob patterns.
data - The byte data of the content, if any, potentially used in magic detection.
Returns:
The automatically resolved MimeType name.
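
The order of the fallback strategies described above can be sketched without Tika. Everything in this example is an assumption made for illustration: the class ResolutionOrderSketch, its tiny in-memory registry and extension table, the single "%PDF-" magic check, and the application/octet-stream default are hypothetical stand-ins for the Tika registry, glob patterns, and magic rules.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the resolution order; not the OODT implementation.
final class ResolutionOrderSketch {

    // Assumed default when no strategy succeeds.
    static final String DEFAULT_TYPE = "application/octet-stream";

    // Stand-in for the mime type registry.
    static final Set<String> KNOWN = Set.of("text/html", "application/pdf", "text/plain");

    // Stand-in for glob patterns such as "*.pdf".
    static final Map<String, String> BY_EXTENSION = Map.of(
            ".html", "text/html",
            ".pdf", "application/pdf",
            ".txt", "text/plain");

    static String resolve(String typeName, String url, byte[] data, boolean magicEnabled) {
        // 1. Clean the declared type and look it up by name.
        if (typeName != null) {
            int semi = typeName.indexOf(';');
            String cleaned = (semi >= 0 ? typeName.substring(0, semi) : typeName).trim();
            if (KNOWN.contains(cleaned)) {
                return cleaned;
            }
        }
        // 2. Fall back to URL-based (glob-like) resolution.
        if (url != null) {
            String lower = url.toLowerCase(Locale.ROOT);
            for (Map.Entry<String, String> e : BY_EXTENSION.entrySet()) {
                if (lower.endsWith(e.getKey())) {
                    return e.getValue();
                }
            }
        }
        // 3. Fall back to magic detection, only when it is enabled.
        if (magicEnabled && data != null && startsWith(data, "%PDF-")) {
            return "application/pdf";
        }
        return DEFAULT_TYPE;
    }

    private static boolean startsWith(byte[] data, String asciiPrefix) {
        byte[] prefix = asciiPrefix.getBytes(StandardCharsets.US_ASCII);
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The point of the sketch is the ordering: a declared type that is found in the registry wins, the URL is consulted only when that lookup fails, and the byte data is inspected last and only when magic detection is switched on.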

getMimeType

public String getMimeType(URL url)
Facade interface to Tika's underlying MimeTypes.getMimeType(String) method.

Parameters:
url - The URL of the document to sense the MimeType for.
Returns:
The name of an appropriate MimeType, identified from the string form of the given document URL.

getMimeType

public String getMimeType(String name)
A facade interface to Tika's underlying MimeTypes.forName(String) method.

Parameters:
name - The name of a valid MimeType in the Tika mime registry.
Returns:
The name of the MimeType, if it exists in the registry, or null otherwise.

getMimeType

public String getMimeType(File f)
Facade interface to Tika's underlying MimeTypes.getMimeType(File) method.

Parameters:
f - The File to sense the MimeType for.
Returns:
The MimeType of the given File, or null if it cannot be determined.

getMimeTypeByMagic

public String getMimeTypeByMagic(byte[] data)
Utility method to act as a facade to MimeTypes.getMimeType(byte[]).

Parameters:
data - The byte data to get the MimeType for.
Returns:
The String representation of the resolved MimeType, or null if a suitable MimeType is not found.
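
Magic detection in general means comparing the leading bytes of the content against known signatures. The following standalone sketch shows that idea with three well-known signatures (PDF, GIF, PNG). The class MagicLookupSketch and its signature table are hypothetical; Tika's actual magic rules are far richer and are not reproduced here.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of magic-byte matching; not the Tika rule set.
final class MagicLookupSketch {

    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("application/pdf", "%PDF-".getBytes(StandardCharsets.US_ASCII));
        SIGNATURES.put("image/gif", "GIF8".getBytes(StandardCharsets.US_ASCII));
        SIGNATURES.put("image/png", new byte[] {(byte) 0x89, 'P', 'N', 'G'});
    }

    // Returns the matching type name, or null when no signature matches,
    // mirroring the "null if not found" contract documented above.
    static String byMagic(byte[] data) {
        if (data == null) {
            return null;
        }
        for (Map.Entry<String, byte[]> e : SIGNATURES.entrySet()) {
            if (hasPrefix(data, e.getValue())) {
                return e.getKey();
            }
        }
        return null;
    }

    private static boolean hasPrefix(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Callers of the real getMimeTypeByMagic(byte[]) should likewise be prepared for a null result when the data matches no known signature.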

getDescriptionForMimeType

public String getDescriptionForMimeType(String mimeType)

getSuperTypeForMimeType

public String getSuperTypeForMimeType(String mimeType)

isMimeMagic

public boolean isMimeMagic()
Returns:
true if mime magic detection is enabled, false otherwise.

setMimeMagic

public void setMimeMagic(boolean mimeMagic)
Parameters:
mimeMagic - true to enable mime magic detection, false to disable it.

readMagicHeader

public static byte[] readMagicHeader(InputStream stream)
                              throws IOException
Throws:
IOException

readMagicHeader

public static byte[] readMagicHeader(InputStream stream,
                                     int headerByteSize)
                              throws IOException
Throws:
IOException
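
The behavior of readMagicHeader is not described on this page. As a rough idea of what reading a magic header typically involves, the sketch below reads at most headerByteSize bytes from the start of a stream. The class MagicHeaderSketch, the method readHeader, the argument checks, and the choice to return a shorter array when the stream ends early are all assumptions, not documented behavior of MimeTypeUtils.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical illustration of reading a fixed-size header; not the OODT implementation.
final class MagicHeaderSketch {

    // Reads up to headerByteSize bytes. A single read() call may return fewer
    // bytes than requested, so the loop continues until the buffer is full or
    // the stream ends.
    static byte[] readHeader(InputStream stream, int headerByteSize) throws IOException {
        if (stream == null) {
            throw new IllegalArgumentException("stream must not be null");
        }
        if (headerByteSize <= 0) {
            throw new IllegalArgumentException("headerByteSize must be positive");
        }
        byte[] buffer = new byte[headerByteSize];
        int total = 0;
        while (total < headerByteSize) {
            int n = stream.read(buffer, total, headerByteSize - total);
            if (n < 0) {
                break; // end of stream reached before the header was full
            }
            total += n;
        }
        // Assumption: trim the result when fewer bytes were available.
        return (total == headerByteSize) ? buffer : Arrays.copyOf(buffer, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] header = readHeader(new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5}), 3);
        // prints "[1, 2, 3]"
        System.out.println(Arrays.toString(header));
    }
}
```

A header obtained this way is the kind of input that getMimeTypeByMagic(byte[]) accepts.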


Copyright © 1999-2011 Apache Incubator. All Rights Reserved.