Package anl.aida.reader

Interface Summary
AIDAComponentReader Interface for classes that are components in a MIFSCompositeReader.
ContentReader Interface for classes that read / parse content from a URL and return a ReaderResult.
DocumentProcessor Interface for classes that can process lines of text, such as those extracted from an HTML page.
LineParser Interface for classes that can parse a line of text.
ProMedPreProcessor.Streamer  
ReaderConstants Constants used by the various readers.
ReaderResultProcessor Processes a ReaderResult in some implementor-specific way.
 

Class Summary
AbstractAIDAComponentReader Abstract implementation of AIDAComponentReader that processes a ReaderResult into a CAS by iterating over a "standard" index and retrieving the results.
AIDACompositeReader UIMA CollectionReader that delegates the actual reading of documents etc.
AllAfricaHTMLReader Reads content from AllAfrica web pages.
CachedContentReader ContentReader that reads cached content.
ChicagoTribuneHTMLReader Scrapes article content from archived Chicago Tribune pages.
ChicagoTribuneIndexMaker  
ChicagoTribuneIndexMaker.ArchiveMaker  
ChicagoTribuneReader AIDAComponentReader implementation for reading Chicago Tribune web articles.
ContentComponentReader Abstract class that builds an AIDAComponentReader around a ContentReader implementation.
CTArchiveReader Deprecated. Index file has changed so will not work
ExtractStrings Catch all class for testing some string extraction from web pages.
ExtractStrings.PreBean  
GuardianHTMLReader  
GuardianIndexMaker  
GuardianIndexMaker.ArchiveMaker  
GuardianReader  
IndependentHTMLReader Scrapes article content from The Independent.
IndependentIndexMaker  
IndependentIndexMaker.ArchiveMaker  
IndependentReader AIDAComponentReader implementation for reading CNNExpansion web entries.
LATimesHTMLReader  
LATimesIndexMaker  
LATimesIndexMaker.ArchiveMaker  
LATimesReader  
NTARCArchiveReader Deprecated. Index file has changed so will not work
NTARCHtmlReader Reads the HTML from an NTARC article page.
NTARCIndexMaker  
NTARCIndexMaker.ArchiveMaker  
NTARCReader AIDAComponentReader implementation for reading NTARC web entries.
NYTEntry An article entry as represented by the JSON returned from the NYT article search API. This class is automatically instantiated and filled using Gson, the JSON-to-Java converter.
NYTimesArchiveReader Quick and dirty class to read nytimes_archive_index.txt and read / parse each article in the index using the NYTimesHTMLReader.
NYTimesHTMLReader Scrapes article content from NYTimes pages.
NYTimesHTMLReader.NullProcessor  
NYTimesReader AIDAComponentReader implementation for reading NYTimes web entries.
NYTIndexMaker Quick and dirty code to create a list of articles from NYT search API results.
NYTSearchResult A search result as represented by the JSON returned from the NYT article search API. This class is automatically instantiated and filled using Gson, the JSON-to-Java converter.
ParsedAuthors Encapsulates author info as returned by a document reader.
ParsedDate Encapsulates date metadata retrieved from a corpus document.
PMTests  
PMXMLReader Date: Jan 20, 2009 3:43:32 PM
PMXMLReader.Author  
PMXMLReader.DateRep  
PreTextExtractor  
PreTextExtractor.DocProc  
PreTextExtractor.PreBean  
ProMedIndexMaker Makes a standard index file from PRobo
ProMedIndexMaker.ArchiveMaker  
ProMedPreProcessor Preprocesses ProMed mail alerts, dividing them up if more than one "document" is in the mail text.
ProMedReader AIDAComponentReader implementation for reading archived ProMed files.
ProMedReaderTest  
ProMedTxtReader Reads ProMed email text for date, title, content etc.
PubMedReader CollectionReader for PubMed abstracts in XML format.
ReaderResult The result of parsing a document with a reader.
ReaderResultBuilder  
ReaderTests  
ReaderUtilities Reader related utility methods.
ResultCacher Result processor that will cache the contents of the result in a file and create an index entry for that.
ScraperTests Tests of the web scrapers.
ScraperTests.RRProcessor  
StringExtractor Extracts lines of text and links from a URL and passes them to a DocumentProcessor.
TimesAPI  
 

Enum Summary
ReaderResult.Error
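
The summary describes StringExtractor as passing lines of text to a DocumentProcessor, and LineParser as parsing a single line. The sketch below illustrates that division of labour only. It is not the package's actual API: the nested LineParser and DocumentProcessor interfaces, their method names (parse, processLine), the CollectingProcessor class and the run helper are all hypothetical stand-ins, since this page does not list method signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReaderPipelineSketch {

    // Hypothetical stand-in for a line parser; the real interface's methods may differ.
    public interface LineParser {
        // Returns the value parsed from the line, or null if the line is not of interest.
        String parse(String line);
    }

    // Hypothetical stand-in for a document processor that receives one line at a time.
    public interface DocumentProcessor {
        void processLine(String line);
    }

    // Assumed example processor: delegates each line to a parser and keeps non-null results.
    public static final class CollectingProcessor implements DocumentProcessor {
        private final LineParser parser;
        private final List<String> kept = new ArrayList<>();

        public CollectingProcessor(LineParser parser) {
            this.parser = parser;
        }

        @Override
        public void processLine(String line) {
            String parsed = parser.parse(line);
            if (parsed != null) {
                kept.add(parsed);
            }
        }

        public List<String> results() {
            return kept;
        }
    }

    // Plays the role of an extractor: feeds already-extracted lines to the processor.
    public static List<String> run(List<String> lines, LineParser parser) {
        CollectingProcessor processor = new CollectingProcessor(parser);
        for (String line : lines) {
            processor.processLine(line);
        }
        return processor.results();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Title: Outbreak report", "", "Date: 2009-01-20", "Body text");
        // Keep only the value part of "Key: value" header lines.
        LineParser headerParser = line -> {
            int idx = line.indexOf(": ");
            return idx >= 0 ? line.substring(idx + 2) : null;
        };
        System.out.println(run(lines, headerParser));
    }
}
```

Keeping the per-line parsing behind its own interface means a site-specific reader would only need to supply a different parser, while the code that walks the extracted lines stays unchanged.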