anl.aida.reader
Class ProMedReader

java.lang.Object
  extended by anl.aida.reader.AbstractAIDAComponentReader
      extended by anl.aida.reader.ProMedReader
All Implemented Interfaces:
AIDAComponentReader

public class ProMedReader
extends AbstractAIDAComponentReader

AIDAComponentReader implementation for reading archived ProMed files. "Archived" means that the messages are stored in a directory as individual files rather than in an email format. File names should follow the pattern ProMedMail_*.txt, where * is a number. The reader processes all the files in the directory.
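
The naming convention above can be illustrated with a small, self-contained sketch. It is not taken from ProMedReader itself: the class ProMedArchiveFiles and its methods are hypothetical, and it assumes that the * in ProMedMail_*.txt stands for one or more decimal digits.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of anl.aida.reader.
final class ProMedArchiveFiles {

    // Assumption: "*" in ProMedMail_*.txt means one or more decimal digits.
    private static final Pattern NAME = Pattern.compile("ProMedMail_\\d+\\.txt");

    // True if the bare file name follows the ProMedMail_<number>.txt pattern.
    static boolean isArchiveFile(String fileName) {
        return NAME.matcher(fileName).matches();
    }

    // Lists the matching regular files in dir, sorted by path (not numerically).
    static List<Path> list(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && isArchiveFile(p.getFileName().toString())) {
                    result.add(p);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Uses the first argument as the archive directory, else the current one.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : list(dir)) {
            System.out.println(p);
        }
    }
}
```

Whether the actual reader skips files that do not match the pattern is not stated here, so the filter in list is an illustrative choice only.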


Field Summary
private  java.util.Set<java.lang.String> badTitleWords
           
private  java.util.Date date
           
private  PreTextExtractor extractor
           
private  java.io.Reader in
           
static java.lang.String INDEX_FILE
           
private  ProMedPreProcessor proc
           
private  ProMedTxtReader reader
           
 
Fields inherited from class anl.aida.reader.AbstractAIDAComponentReader
indexIter, lineItems, location, processors, startDate
 
Fields inherited from interface anl.aida.reader.AIDAComponentReader
MESSAGE_DIGEST
 
Constructor Summary
ProMedReader()
           
 
Method Summary
 void close()
          Closes this AIDAComponentReader and any resources it may have opened.
protected  java.lang.String getDocumentURL()
          Gets the URL of the current document.
protected  java.lang.String getIndexFileKey()
          Gets the name of the parameter key for the index file.
protected  ReaderResult getNextResult()
          Gets the next ReaderResult.
 boolean hasNext()
          Gets whether or not this AIDAComponentReader has more documents to process.
private  void incrementFile()
           
 void initialize(org.apache.uima.resource.ConfigurableResource resource, java.util.Date cacheStartDate)
          Initializes this AIDAComponentReader, optionally using the resource.
private  boolean isUsefulContent(java.lang.String title)
           
private  void next()
           
protected  void postNext()
          Called at the completion of AbstractAIDAComponentReader.getNext(CAS).
 
Methods inherited from class anl.aida.reader.AbstractAIDAComponentReader
checkDate, getNext
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INDEX_FILE

public static final java.lang.String INDEX_FILE
See Also:
Constant Field Values

reader

private ProMedTxtReader reader

proc

private ProMedPreProcessor proc

in

private java.io.Reader in

extractor

private PreTextExtractor extractor

badTitleWords

private java.util.Set<java.lang.String> badTitleWords

date

private java.util.Date date

Constructor Detail

ProMedReader

public ProMedReader()

Method Detail

close

public void close()
           throws java.io.IOException
Description copied from interface: AIDAComponentReader
Closes this AIDAComponentReader and any resources it may have opened.

Specified by:
close in interface AIDAComponentReader
Overrides:
close in class AbstractAIDAComponentReader
Throws:
java.io.IOException - if there is an error closing the reader.

getNextResult

protected ReaderResult getNextResult()
                              throws java.io.IOException
Description copied from class: AbstractAIDAComponentReader
Gets the next ReaderResult.

Specified by:
getNextResult in class AbstractAIDAComponentReader
Returns:
the next ReaderResult.
Throws:
java.io.IOException - if there is an error getting the result

postNext

protected void postNext()
                 throws java.io.IOException
Description copied from class: AbstractAIDAComponentReader
Called at the completion of AbstractAIDAComponentReader.getNext(CAS).

Overrides:
postNext in class AbstractAIDAComponentReader
Throws:
java.io.IOException - if there is an error getting the result
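
The relationship between getNext, getNextResult and postNext reads like a template-method arrangement. The sketch below shows that pattern under stated assumptions: SketchReader is a hypothetical stand-in for AbstractAIDAComponentReader, its getNext takes no CAS argument, and a String replaces ReaderResult. The real base class may do more work between the two calls.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class PostNextSketch {

    // Hypothetical stand-in for AbstractAIDAComponentReader.
    abstract static class SketchReader {
        // Template method: fetch the result, then run the post-processing hook.
        final String getNext() throws IOException {
            String result = getNextResult();
            postNext();
            return result;
        }

        // Subclasses must supply the next result.
        protected abstract String getNextResult() throws IOException;

        // Optional hook, called once getNext has obtained its result.
        protected void postNext() throws IOException {
            // No-op by default in this sketch.
        }
    }

    // Runs one getNext call and records the order in which the hooks fire.
    static List<String> run() throws IOException {
        List<String> log = new ArrayList<>();
        SketchReader reader = new SketchReader() {
            @Override
            protected String getNextResult() {
                log.add("getNextResult");
                return "doc-1";
            }

            @Override
            protected void postNext() {
                log.add("postNext");
            }
        };
        log.add("returned " + reader.getNext());
        return log;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run());
    }
}
```

In this arrangement a subclass such as ProMedReader could use postNext to advance internal state after each document, although what the real override does is not documented here.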

getIndexFileKey

protected java.lang.String getIndexFileKey()
Description copied from class: AbstractAIDAComponentReader
Gets the name of the parameter key for the index file. The index file lists the links and other items to read.

Specified by:
getIndexFileKey in class AbstractAIDAComponentReader
Returns:
the name of the parameter key for the index file.

initialize

public void initialize(org.apache.uima.resource.ConfigurableResource resource,
                       java.util.Date cacheStartDate)
                throws org.apache.uima.resource.ResourceInitializationException
Description copied from interface: AIDAComponentReader
Initializes this AIDAComponentReader, optionally using the resource.

Specified by:
initialize in interface AIDAComponentReader
Overrides:
initialize in class AbstractAIDAComponentReader
Parameters:
resource - the resource to use for configuration
Throws:
org.apache.uima.resource.ResourceInitializationException - if there is an error initializing the reader

next

private void next()
           throws java.io.IOException
Throws:
java.io.IOException

isUsefulContent

private boolean isUsefulContent(java.lang.String title)

incrementFile

private void incrementFile()
                    throws java.io.IOException
Throws:
java.io.IOException

hasNext

public boolean hasNext()
                throws java.io.IOException,
                       org.apache.uima.collection.CollectionException
Description copied from interface: AIDAComponentReader
Gets whether or not this AIDAComponentReader has more documents to process.

Specified by:
hasNext in interface AIDAComponentReader
Overrides:
hasNext in class AbstractAIDAComponentReader
Returns:
true if there are more documents to process, otherwise false.
Throws:
java.io.IOException - if there is an error in determining if there are more docs to process.
org.apache.uima.collection.CollectionException
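
A caller would typically drive a reader with a hasNext loop and close it afterwards. The following is a minimal sketch of that loop, not the framework's own driver: DocReader is a hypothetical, reduced stand-in for AIDAComponentReader, getNext returns a String instead of filling a CAS, and hasNext is assumed to return true while documents remain.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ReaderLoopSketch {

    // Hypothetical, reduced stand-in for AIDAComponentReader.
    interface DocReader {
        boolean hasNext() throws IOException;
        String getNext() throws IOException;
        void close() throws IOException;
    }

    // In-memory reader used only to exercise the loop.
    static final class ListReader implements DocReader {
        private final Iterator<String> it;
        boolean closed;

        ListReader(List<String> docs) {
            this.it = docs.iterator();
        }

        public boolean hasNext() {
            return it.hasNext();
        }

        public String getNext() {
            return it.next();
        }

        public void close() {
            closed = true;
        }
    }

    // Reads every document, closing the reader even if an exception is thrown.
    static List<String> drain(DocReader reader) throws IOException {
        List<String> out = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                out.add(reader.getNext());
            }
        } finally {
            reader.close();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(drain(new ListReader(Arrays.asList("a", "b"))));
    }
}
```

The try/finally block mirrors the contract of close, which releases any resources the reader may have opened.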

getDocumentURL

protected java.lang.String getDocumentURL()
Description copied from class: AbstractAIDAComponentReader
Gets the URL of the current document.

Overrides:
getDocumentURL in class AbstractAIDAComponentReader
Returns:
the URL of the current document.