chemaxon.formats
Class MFileFormatUtil

java.lang.Object
  extended by chemaxon.formats.MFileFormatUtil

public class MFileFormatUtil
extends java.lang.Object

File format related utility functions.

Since:
Marvin 4.1, 12/15/2005
Version:
4.1.2, 10/03/2006
Author:
Peter Csizmadia, Szilard Dorant, Szilveszter Juhos

Field Summary
static int MULTISET
          The multi-molecule file really contains multiple atom sets of one molecule.
 
Constructor Summary
MFileFormatUtil()
           
 
Method Summary
static boolean canBe1LetterPeptide(java.lang.String s)
          Tests whether a string can be a one-letter-abbreviated peptide name.
A valid name contains only uppercase letters.
static boolean canBe3LetterPeptide(java.lang.String s)
          Tests whether a string can be a three-letter-abbreviated peptide name.
The first letter of each abbreviation must be uppercase and the other two lowercase.
static boolean canBeAbbrevgroup(java.lang.String line)
          Tests whether a string can be in abbrevgroup format.
static boolean canBeBase64(java.lang.String line)
          Deprecated. as of Marvin 4.1, canBeBase64(String) must be used instead
static boolean canBeChime(java.lang.String s)
          Tests whether a string can be Chime (MDL compressed mol).
static boolean canBeJTF(java.lang.String line)
          Determines if a String is valid as the first line of a JTF file.
static boolean canBePDBRecord(java.lang.String recName)
          Checks if the given parameter is a PDB record name listed in PDB_RECORD_TYPES.
static boolean canBeSMARTS(java.lang.String s)
          Tests whether a string can be SMARTS.
static boolean canBeSMILES(java.lang.String s)
          Tests whether a string can be SMILES.
static java.lang.String[] getEncodingFromOptions(java.lang.String fmtopts)
          Gets the encoding that was explicitly given as an import option.
static java.lang.String getFileExtensionLC(java.io.File f)
          Gets the file extension in lower case.
static java.lang.String getFileExtensionLC(java.lang.String fname)
          Gets the file extension in lower case.
static java.lang.String[] getJTFFields(java.lang.String line)
          Gets fields delimited with {space} {tab} {;} {:} or {,}.
static java.lang.String getKnownExtension(java.lang.String fname)
          Returns the file extension if it is a known extension.
static java.lang.String[] getMolfileExtensions()
          Gets the array of known molecule file extensions.
static java.lang.String[] getMolfileFormats()
          Gets the array of known molecule file formats.
static java.lang.String getMostLikelyMolFormat(java.lang.String fname)
          Gets the most likely molecule file format from the file name extension.
static java.lang.String getUnguessableFormat(java.lang.String fname)
          Gets the file format from the file name extension for formats that are not guessable from the file content.
static boolean isOutputCleanable(java.lang.String fmt)
          Tests whether the specified output format is cleanable.
static boolean isSubFormatOf(java.lang.String f, java.lang.String other)
          Tests whether a format is a sub-format of another format.
static boolean isURLOrFileName(java.lang.String s)
          Tests whether the specified string is a URL (absolute or relative) or a file name.
static int preprocessFormatAndOptions(java.lang.String[] fmtopts)
          Parses "MULTISET" like universal options.
static java.lang.String recognizeOneLineFormat(java.lang.String s)
          Recognize a one-line string as CxSMILES, CxSMARTS or AbbrevGroup.
static java.lang.String[] splitFileAndOptions(java.lang.String arg)
          Parses "file{options}" strings used in molecule file import.
static java.lang.String[] splitFormatAndOptions(java.lang.String opts)
          Parses "format:options" strings used in molecule file import and export.
static void testEncoding(java.lang.String enc)
          Tests whether the given charset name is supported by this JVM.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MULTISET

public static final int MULTISET
The multi-molecule file really contains multiple atom sets of one molecule.

See Also:
Constant Field Values
Constructor Detail

MFileFormatUtil

public MFileFormatUtil()
Method Detail

isURLOrFileName

public static boolean isURLOrFileName(java.lang.String s)
Tests whether the specified string is a URL (absolute or relative) or a file name.

Parameters:
s - the string
Returns:
true if it is a URL or file name, false otherwise

isSubFormatOf

public static boolean isSubFormatOf(java.lang.String f,
                                    java.lang.String other)
Tests whether a format is a sub-format of another format.

Parameters:
f - the format codename
other - the other format
Since:
Marvin 4.1, 04/07/2006

canBeBase64

public static boolean canBeBase64(java.lang.String line)
Deprecated. as of Marvin 4.1, canBeBase64(String) must be used instead

Tests whether a string can be base64 encoded data.

Parameters:
line - the input string
Returns:
true if it can be, false if it cannot be base64

canBeChime

public static boolean canBeChime(java.lang.String s)
Tests whether a string can be Chime (MDL compressed mol).

Parameters:
s - the input string
Returns:
true if it can be, false if it cannot be Chime

canBeSMARTS

public static boolean canBeSMARTS(java.lang.String s)
Tests whether a string can be SMARTS.

Parameters:
s - the input string
Returns:
true if it can be, false if it cannot be SMARTS.

canBeSMILES

public static boolean canBeSMILES(java.lang.String s)
Tests whether a string can be SMILES.

Parameters:
s - the input string
Returns:
true if it can be, false if it cannot be SMILES

canBe1LetterPeptide

public static boolean canBe1LetterPeptide(java.lang.String s)
Tests whether a string can be a one-letter-abbreviated peptide name.
A valid name contains only uppercase letters.

Parameters:
s - the input string
Returns:
true if it can be, false if it cannot be a one-letter-abbreviated peptide name
Since:
Marvin 4.1

canBe3LetterPeptide

public static boolean canBe3LetterPeptide(java.lang.String s)
Tests whether a string can be a three-letter-abbreviated peptide name.
The first letter of each abbreviation must be uppercase and the other two lowercase, e.g. ValAlaTyr.

Parameters:
s - the input string
Returns:
true if it can be, false if it cannot be a three-letter-abbreviated peptide name
Since:
Marvin 4.1
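
The case rules stated for the two peptide checks can be illustrated with a standalone sketch. This is not Marvin's implementation: the class and method names below are hypothetical, and the sketch checks only the letter-case pattern described above. The real methods may apply further rules, such as restricting the letters to valid amino acid codes.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of chemaxon.formats.
class PeptideNameSketch {
    // One or more uppercase letters, e.g. "VAY".
    private static final Pattern ONE_LETTER = Pattern.compile("[A-Z]+");
    // One or more groups of an uppercase letter followed by two lowercase letters.
    private static final Pattern THREE_LETTER = Pattern.compile("(?:[A-Z][a-z]{2})+");

    static boolean looksLikeOneLetterPeptide(String s) {
        return s != null && ONE_LETTER.matcher(s).matches();
    }

    static boolean looksLikeThreeLetterPeptide(String s) {
        return s != null && THREE_LETTER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeThreeLetterPeptide("ValAlaTyr")); // true
        System.out.println(looksLikeThreeLetterPeptide("VALALATYR")); // false
        System.out.println(looksLikeOneLetterPeptide("VAY"));         // true
    }
}
```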

recognizeOneLineFormat

public static java.lang.String recognizeOneLineFormat(java.lang.String s)
Recognize a one-line string as CxSMILES, CxSMARTS or AbbrevGroup.

Parameters:
s - the input string
Returns:
the most probable format or null
Since:
Marvin 4.1, 04/06/2006

canBeAbbrevgroup

public static boolean canBeAbbrevgroup(java.lang.String line)
Tests whether a string can be in abbrevgroup format.

Parameters:
line - the input string
Returns:
true if it can be, false if it cannot be abbrevgroup

canBeJTF

public static boolean canBeJTF(java.lang.String line)
Determines if a String is valid as the first line of a JTF file.

Returns:
true, if the line can be a JTF header
Since:
Marvin 2.9.9

canBePDBRecord

public static boolean canBePDBRecord(java.lang.String recName)
Checks if the given parameter is a PDB record name listed in PDB_RECORD_TYPES.

Parameters:
recName - a potential PDB record name
Returns:
true, if the given parameter is a valid PDB record name

getJTFFields

public static java.lang.String[] getJTFFields(java.lang.String line)
Gets fields delimited with {space} {tab} {;} {:} or {,}. Fields are enclosed in "" or ''; the two quote styles can be mixed within a line, but the opening and closing quotes of a single field must match. An example of a valid line: "23";'345.45';"asdf asdf";;'CCC1CC1'

Returns:
the contents of the fields.
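
The quoting rule can be illustrated with a standalone sketch that extracts only the quoted fields from a line. It is an assumption-laden approximation, not the Marvin code: the class and method names are hypothetical, and the sketch ignores unquoted or empty fields (such as the gap between the two consecutive semicolons in the example line), whose treatment by the real method is not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of chemaxon.formats.
class JtfFieldSketch {
    // A field is the text between a matching pair of double or single quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"|'([^']*)'");

    static String[] quotedFields(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = QUOTED.matcher(line);
        while (m.find()) {
            // Exactly one of the two groups is non-null for each match.
            fields.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "\"23\";'345.45';\"asdf asdf\";;'CCC1CC1'";
        System.out.println(Arrays.toString(quotedFields(line)));
        // prints [23, 345.45, asdf asdf, CCC1CC1]
    }
}
```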

splitFileAndOptions

public static java.lang.String[] splitFileAndOptions(java.lang.String arg)
Parses "file{options}" strings used in molecule file import.

Parameters:
arg - string containing the filename and the options (if any)
Returns:
a two-element array containing the filename and the options.
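
A minimal standalone sketch of parsing a "file{options}" string follows. The class and method names are hypothetical. Two details are assumptions, because the text above does not specify them: the split is made at the first opening brace, and a string without options yields null as the second element.

```java
import java.util.Arrays;

// Hypothetical helper, not part of chemaxon.formats.
class FileOptionsSketch {
    static String[] split(String arg) {
        int open = arg.indexOf('{');
        if (open >= 0 && arg.endsWith("}")) {
            String file = arg.substring(0, open);
            String options = arg.substring(open + 1, arg.length() - 1);
            return new String[] { file, options };
        }
        // Assumption: no "{options}" suffix means the options element is null.
        return new String[] { arg, null };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("mols.xyz{f1.4}"))); // [mols.xyz, f1.4]
        System.out.println(Arrays.toString(split("mols.xyz")));       // [mols.xyz, null]
    }
}
```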

splitFormatAndOptions

public static java.lang.String[] splitFormatAndOptions(java.lang.String opts)
Parses "format:options" strings used in molecule file import and export. Examples:
 splitFormatAndOptions("xyz:f1.4") returns {"xyz", "f1.4"}
 splitFormatAndOptions("f1.4") returns {null, "f1.4"}
 splitFormatAndOptions("xyz:") returns {"xyz", ""}
 splitFormatAndOptions("gzip:xyz:f1.4") returns {"gzip", "xyz:f1.4"}
 
The colon can be omitted in the case of Marvin's built-in input formats. Example:
 splitFormatAndOptions("xyz") returns { "xyz", ""}
 

Parameters:
opts - string containing the format and the options
Returns:
an array containing the format(s) and the options.
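
The five documented examples are consistent with splitting at the first ':' and, when there is none, checking whether the whole string is a known format name. The standalone sketch below reproduces that behavior under those assumptions. The class name, the method name, and the contents of the BUILT_IN set are hypothetical stand-ins and do not reflect Marvin's real format list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of chemaxon.formats.
class FormatOptionsSketch {
    // Hypothetical stand-in for the list of built-in input formats.
    private static final Set<String> BUILT_IN =
            new HashSet<>(Arrays.asList("xyz", "mol", "sdf", "smiles"));

    static String[] split(String opts) {
        int colon = opts.indexOf(':');
        if (colon >= 0) {
            // Everything after the first ':' is kept as options, nested formats included.
            return new String[] { opts.substring(0, colon), opts.substring(colon + 1) };
        }
        if (BUILT_IN.contains(opts)) {
            return new String[] { opts, "" };
        }
        return new String[] { null, opts };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("gzip:xyz:f1.4"))); // [gzip, xyz:f1.4]
        System.out.println(Arrays.toString(split("f1.4")));          // [null, f1.4]
        System.out.println(Arrays.toString(split("xyz")));           // [xyz, ]
    }
}
```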

preprocessFormatAndOptions

public static int preprocessFormatAndOptions(java.lang.String[] fmtopts)
Parses "MULTISET" like universal options. Example:
 String[] fmtopts = splitFormatAndOptions("gzip:xyz:MULTISET,f1.4");
 // fmtopts == {"gzip", "xyz:MULTISET,f1.4"}
 int result = preprocessFormatAndOptions(fmtopts);
 // fmtopts == {"gzip", "xyz:f1.4"}, result == MULTISET
 

Parameters:
fmtopts - two-element array containing the format and the options
Returns:
flags corresponding to the options
See Also:
splitFormatAndOptions(java.lang.String)
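
The in-place rewrite of the options element can be sketched in standalone form as follows. This is an illustration under assumptions, not the Marvin implementation: the class and method names are hypothetical, the flag value 1 is a placeholder for the real constant (see Constant Field Values), and the sketch assumes that options are comma-separated and follow the last ':' of the second array element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of chemaxon.formats.
class MultisetOptionSketch {
    // Placeholder value; the real MULTISET constant may differ.
    static final int MULTISET = 1;

    static int preprocess(String[] fmtopts) {
        String opts = fmtopts[1];
        if (opts == null) {
            return 0;
        }
        int colon = opts.lastIndexOf(':');
        String prefix = opts.substring(0, colon + 1); // nested format prefix such as "xyz:", or ""
        int flags = 0;
        List<String> kept = new ArrayList<>();
        for (String token : opts.substring(colon + 1).split(",")) {
            if (token.equals("MULTISET")) {
                flags |= MULTISET;   // consume the universal option
            } else {
                kept.add(token);     // leave format-specific options in place
            }
        }
        fmtopts[1] = prefix + String.join(",", kept);
        return flags;
    }

    public static void main(String[] args) {
        String[] fmtopts = { "gzip", "xyz:MULTISET,f1.4" };
        int result = preprocess(fmtopts);
        System.out.println(fmtopts[1] + " " + (result == MULTISET)); // xyz:f1.4 true
    }
}
```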

getEncodingFromOptions

public static java.lang.String[] getEncodingFromOptions(java.lang.String fmtopts)
Gets the encoding that was explicitly given as an import option. The format is enc{name}, where name is a charset name supported by Java.

Parameters:
fmtopts - the input format and options
Returns:
two element array, the first element is the encoding, the second contains the remaining import options.

testEncoding

public static void testEncoding(java.lang.String enc)
                         throws java.lang.IllegalArgumentException
Tests whether the given charset name is supported by this JVM.

Parameters:
enc - the name of the charset
Throws:
java.lang.IllegalArgumentException
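
A comparable check can be written with the standard java.nio.charset.Charset API. The sketch below is a standalone illustration with hypothetical class and method names; whether Marvin's method is implemented this way is an assumption.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of chemaxon.formats.
class EncodingCheckSketch {
    static void requireSupported(String enc) {
        // Charset.isSupported throws IllegalCharsetNameException (a subclass of
        // IllegalArgumentException) for syntactically illegal names, and
        // IllegalArgumentException for null.
        if (!Charset.isSupported(enc)) {
            throw new IllegalArgumentException("Unsupported charset: " + enc);
        }
    }

    public static void main(String[] args) {
        requireSupported("UTF-8"); // passes silently
        try {
            requireSupported("no-such-charset");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unsupported charset: no-such-charset
        }
    }
}
```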

getUnguessableFormat

public static java.lang.String getUnguessableFormat(java.lang.String fname)
Gets the file format from the file name extension for formats that are not guessable from the file content. Used to distinguish SMARTS and SMILES.

Parameters:
fname - the filename
Returns:
the file format or null if the file contents can be used to recognize the format

getFileExtensionLC

public static java.lang.String getFileExtensionLC(java.io.File f)
Gets the file extension in lower case.

Parameters:
f - the file
Returns:
the extension in lower case

getFileExtensionLC

public static java.lang.String getFileExtensionLC(java.lang.String fname)
Gets the file extension in lower case.

Parameters:
fname - the filename
Returns:
the extension in lower case
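
Lower-case extension extraction can be sketched in standalone form as shown below. The class and method names are hypothetical. Returning null when the name has no extension is an assumption, since the return value for that case is not stated above.

```java
import java.util.Locale;

// Hypothetical helper, not part of chemaxon.formats.
class ExtensionSketch {
    static String extensionLowerCase(String fname) {
        // Ignore dots that belong to a directory name rather than the file name.
        int sep = Math.max(fname.lastIndexOf('/'), fname.lastIndexOf('\\'));
        int dot = fname.lastIndexOf('.');
        if (dot <= sep) {
            return null; // assumption: no extension is reported as null
        }
        return fname.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(extensionLowerCase("Caffeine.MOL"));   // mol
        System.out.println(extensionLowerCase("archive/README")); // null
    }
}
```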

getMostLikelyMolFormat

public static java.lang.String getMostLikelyMolFormat(java.lang.String fname)
Gets the most likely molecule file format from the file name extension.

Parameters:
fname - the filename
Returns:
the file format or null if the format cannot be determined from the file name

getKnownExtension

public static java.lang.String getKnownExtension(java.lang.String fname)
Returns the file extension if it is a known extension. Known extensions are the following: mrv t gz mol mol2 rgf rxn csmol csrgf csrxn sdf cssdf rdf smi smiles sma smarts cml xml xyz txt html htm cgi gif jpg jpeg msbmp png ppm svg svgz

Parameters:
fname - the filename
Returns:
the extension

getMolfileExtensions

public static java.lang.String[] getMolfileExtensions()
Gets the array of known molecule file extensions.

Returns:
the array of known molecule file extensions

getMolfileFormats

public static java.lang.String[] getMolfileFormats()
Gets the array of known molecule file formats.

Returns:
the array of known molecule file formats

isOutputCleanable

public static boolean isOutputCleanable(java.lang.String fmt)
                                 throws java.lang.SecurityException
Tests whether the specified output format is cleanable. For a non-cleanable output format, cleaning is meaningless because coordinates are not stored.

Parameters:
fmt - the format string
Returns:
true if the specified output format is cleanable, false otherwise
Throws:
java.lang.SecurityException
Since:
Marvin 4.1, 02/13/2006