Interface Summary | |
---|---|
AIDAComponentReader | Interface for classes that are components in a MIFSCompositeReader. |
ContentReader | Interface for classes that read / parse content from a url and return a ReaderResult. |
DocumentProcessor | Interface for classes that can process lines of text, such as those extracted from an html page. |
LineParser | Interface for classes that can parse a line of text. |
ProMedPreProcessor.Streamer | |
ReaderConstants | Constants used by the various readers. |
ReaderResultProcessor | Processes a ReaderResult in some implementor-specific way. |
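The interface descriptions above suggest a simple pipeline: a ContentReader turns a URL into a ReaderResult, and a ReaderResultProcessor then handles each result. This page does not list any method signatures, so the following is only a sketch under assumed ones: `read(String url)`, `process(ReaderResult)`, the `run` helper and the `ReaderResult` record are all hypothetical stand-ins that illustrate the division of labor, not the package's actual API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of how a content reader and a result processor might compose.
 * All method names and signatures here are assumptions, not the package API.
 */
public class ReaderPipelineSketch {

    /** Assumed stand-in for ReaderResult: the parsed title and text of one document. */
    public record ReaderResult(String url, String title, String text) {}

    /** Assumed shape of ContentReader: read / parse content from a URL. */
    public interface ContentReader {
        ReaderResult read(String url);
    }

    /** Assumed shape of ReaderResultProcessor: handle a result in an implementor-specific way. */
    public interface ReaderResultProcessor {
        void process(ReaderResult result);
    }

    /** Reads each URL in turn and hands every result to the processor. */
    public static void run(List<String> urls, ContentReader reader, ReaderResultProcessor processor) {
        for (String url : urls) {
            processor.process(reader.read(url));
        }
    }

    public static void main(String[] args) {
        // A fake reader that "parses" a URL without any network access.
        ContentReader fake = url -> new ReaderResult(url, "Title of " + url, "body");
        // The processor here simply collects results; a real one might cache or index them.
        List<ReaderResult> collected = new ArrayList<>();
        run(List.of("http://example.com/a", "http://example.com/b"), fake, collected::add);
        System.out.println(collected.size() + " results; first title: " + collected.get(0).title());
    }
}
```

Separating reading from processing in this way lets the same reader feed different processors (for example, one that caches results and one that builds a CAS) without either side knowing about the other.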
Class Summary | |
---|---|
AbstractAIDAComponentReader | Abstract implementation of AIDAComponentReader that processes a ReaderResult into a CAS by iterating over a "standard" index and retrieving the results. |
AIDACompositeReader | UIMA CollectionReader that delegates the actual reading of documents, etc. |
AllAfricaHTMLReader | Reads content from AllAfrica web pages. |
CachedContentReader | ContentReader that reads cached content. |
ChicagoTribuneHTMLReader | Scrapes article content from archived Chicago Tribune pages. |
ChicagoTribuneIndexMaker | |
ChicagoTribuneIndexMaker.ArchiveMaker | |
ChicagoTribuneReader | AIDAComponentReader implementation for reading Chicago Tribune web articles. |
ContentComponentReader | Abstract class that builds an AIDAComponentReader around a ContentReader implementation. |
CTArchiveReader | Deprecated. The index file has changed, so this class no longer works. |
ExtractStrings | Catch all class for testing some string extraction from web pages. |
ExtractStrings.PreBean | |
GuardianHTMLReader | |
GuardianIndexMaker | |
GuardianIndexMaker.ArchiveMaker | |
GuardianReader | |
IndependentHTMLReader | Scrapes article content from The Independent. |
IndependentIndexMaker | |
IndependentIndexMaker.ArchiveMaker | |
IndependentReader | AIDAComponentReader implementation for reading The Independent web entries. |
LATimesHTMLReader | |
LATimesIndexMaker | |
LATimesIndexMaker.ArchiveMaker | |
LATimesReader | |
NTARCArchiveReader | Deprecated. The index file has changed, so this class no longer works. |
NTARCHtmlReader | Reads the HTML from an NTARC article page. |
NTARCIndexMaker | |
NTARCIndexMaker.ArchiveMaker | |
NTARCReader | AIDAComponentReader implementation for reading NTARC web entries. |
NYTEntry | An article entry as represented by the JSON returned from the NYT article search API. This class is automatically instantiated and filled using Gson, the JSON-to-Java converter. |
NYTimesArchiveReader | Quick and dirty class to read nytimes_archive_index.txt and read / parse each article in the index using the NYTimesHTMLReader. |
NYTimesHTMLReader | Scrapes article content from NYTimes pages. |
NYTimesHTMLReader.NullProcessor | |
NYTimesReader | AIDAComponentReader implementation for reading NYTimes web entries. |
NYTIndexMaker | Quick and dirty code to create a list of articles from NYT search API results. |
NYTSearchResult | A search result as represented by the JSON returned from the NYT article search API. This class is automatically instantiated and filled using Gson, the JSON-to-Java converter. |
ParsedAuthors | Encapsulates author info as returned by a document reader. |
ParsedDate | Encapsulates date metadata retrieved from a corpus document. |
PMTests | |
PMXMLReader | Date: Jan 20, 2009 3:43:32 PM |
PMXMLReader.Author | |
PMXMLReader.DateRep | |
PreTextExtractor | |
PreTextExtractor.DocProc | |
PreTextExtractor.PreBean | |
ProMedIndexMaker | Makes a standard index file from PRobo |
ProMedIndexMaker.ArchiveMaker | |
ProMedPreProcessor | Preprocesses ProMed mail alerts, splitting them up if the mail text contains more than one "document". |
ProMedReader | AIDAComponentReader implementation for reading archived ProMed files. |
ProMedReaderTest | |
ProMedTxtReader | Reads ProMed email text for the date, title, content, etc. |
PubMedReader | CollectionReader for PubMed abstracts in XML format. |
ReaderResult | The result of parsing a document with a reader. |
ReaderResultBuilder | |
ReaderTests | |
ReaderUtilities | Reader related utility methods. |
ResultCacher | Result processor that caches the contents of the result in a file and creates an index entry for it. |
ScraperTests | Tests of the web scrapers. |
ScraperTests.RRProcessor | |
StringExtractor | Extracts lines of text and links from a URL and passes them to a DocumentProcessor. |
TimesAPI | |
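ResultCacher and CachedContentReader describe a cache round trip: one writes each result's contents to a file and records an index entry, the other reads cached content back. The following is a minimal sketch of that idea under assumed details: the `CacheSketch` class, the hash-based file naming, and the tab-separated `url<TAB>file` index format are all hypothetical and not taken from the package.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a cache-then-read-back round trip.
 * The file naming and the "url<TAB>cacheFile" index format are assumptions.
 */
public class CacheSketch {
    private final Path cacheDir;
    private final Path indexFile;

    public CacheSketch(Path cacheDir) {
        this.cacheDir = cacheDir;
        this.indexFile = cacheDir.resolve("index.txt");
    }

    /** Writes the document text to its own file and appends an index entry for it. */
    public void cache(String url, String text) throws IOException {
        Files.createDirectories(cacheDir);
        // Hash-based name keeps arbitrary URLs filesystem-safe; collisions are ignored in this sketch.
        Path file = cacheDir.resolve(Integer.toHexString(url.hashCode()) + ".txt");
        Files.writeString(file, text, StandardCharsets.UTF_8);
        String entry = url + "\t" + file.getFileName() + System.lineSeparator();
        Files.writeString(indexFile, entry, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Reads every cached document back, in index order, keyed by URL. */
    public Map<String, String> readAll() throws IOException {
        Map<String, String> docs = new LinkedHashMap<>();
        List<String> lines = Files.readAllLines(indexFile, StandardCharsets.UTF_8);
        for (String line : lines) {
            if (line.isBlank()) {
                continue; // tolerate stray empty lines in the index
            }
            String[] parts = line.split("\t", 2);
            docs.put(parts[0], Files.readString(cacheDir.resolve(parts[1]), StandardCharsets.UTF_8));
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        CacheSketch cache = new CacheSketch(Files.createTempDirectory("reader-cache"));
        cache.cache("http://example.com/story-1", "First article text");
        cache.cache("http://example.com/story-2", "Second article text");
        System.out.println(cache.readAll());
    }
}
```

A plain-text index keeps the cache easy to inspect and to iterate in order, but any real index format used by this package may differ.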
Enum Summary | |
---|---|
ReaderResult.Error | |