anl.aida.reader.local
Class DailyGateHTMLReader

java.lang.Object
  extended by anl.aida.reader.local.DailyGateHTMLReader
All Implemented Interfaces:
ContentReader

public class DailyGateHTMLReader
extends java.lang.Object
implements ContentReader


Nested Class Summary
private  class DailyGateHTMLReader.Processor
           
 
Field Summary
private  java.lang.StringBuilder content
           
private  StringExtractor extractor
           
protected  java.lang.String parse
           
private  DocumentProcessor processor
           
private  ReaderResult result
           
protected  java.lang.String source
           
protected  java.lang.String stop
           
 
Constructor Summary
DailyGateHTMLReader(java.lang.String source, java.lang.String stop, java.lang.String parse)
           
 
Method Summary
 ReaderResult read(java.lang.String url, java.lang.String title, java.util.Date date, java.lang.String author)
          Reads the text from the specified URL.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

source

protected java.lang.String source

parse

protected java.lang.String parse

stop

protected java.lang.String stop

extractor

private StringExtractor extractor

processor

private DocumentProcessor processor

content

private java.lang.StringBuilder content

result

private ReaderResult result

Constructor Detail

DailyGateHTMLReader

public DailyGateHTMLReader(java.lang.String source,
                           java.lang.String stop,
                           java.lang.String parse)

Method Detail

read

public ReaderResult read(java.lang.String url,
                         java.lang.String title,
                         java.util.Date date,
                         java.lang.String author)
                  throws java.io.IOException
Reads the text from the specified URL. The title and date are expected to be supplied by another source, e.g. an RSS feed entry or an archive "index."

Specified by:
read in interface ContentReader
Parameters:
url - the URL to read from
title - the article title
date - the date of the article
author - the article author (can be an empty string)
Returns:
the result of reading and parsing the text at the URL
Throws:
java.io.IOException - if there is an error while reading
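
Example (illustrative sketch only):
The calling pattern for read might look roughly like the following. The constructor arguments of DailyGateHTMLReader and the members of ReaderResult are not documented on this page, so the sketch does not use the real classes. Instead it declares hypothetical stand-ins (ReadUsageSketch, StubReader, a two-field ReaderResult, and the helper readOrNull) that only mirror the documented read signature. The URL and title values are placeholders.

```java
import java.io.IOException;
import java.util.Date;

public class ReadUsageSketch {

    /** Hypothetical stand-in for ReaderResult; the real class's members are not documented here. */
    static final class ReaderResult {
        final String title;
        final String text;

        ReaderResult(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    /** Local copy of the documented read(...) signature, so the sketch compiles on its own. */
    interface ContentReader {
        ReaderResult read(String url, String title, Date date, String author) throws IOException;
    }

    /** Hypothetical reader that returns canned text instead of fetching and parsing the URL. */
    static final class StubReader implements ContentReader {
        @Override
        public ReaderResult read(String url, String title, Date date, String author) throws IOException {
            if (url == null || url.isEmpty()) {
                throw new IOException("no URL given");
            }
            return new ReaderResult(title, "stub text for " + url);
        }
    }

    /**
     * Calls read with metadata obtained elsewhere (for example a feed entry).
     * A null author is replaced by an empty string, which the documentation allows.
     * Returns null if reading fails with an IOException.
     */
    static ReaderResult readOrNull(ContentReader reader, String url, String title, Date date, String author) {
        try {
            return reader.read(url, title, date, author == null ? "" : author);
        } catch (IOException e) {
            System.err.println("Could not read " + url + ": " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        ContentReader reader = new StubReader();
        ReaderResult result = readOrNull(
                reader, "http://example.com/article.html", "Example title", new Date(), "");
        System.out.println(result == null ? "read failed" : result.title + ": " + result.text);
    }
}
```

With the real class, the StubReader instance would presumably be replaced by a DailyGateHTMLReader constructed with its source, stop, and parse strings. Because read declares java.io.IOException, the caller either handles it as above or propagates it.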