anl.aida.reader.local
Class DailyGateHTMLReader
java.lang.Object
anl.aida.reader.local.DailyGateHTMLReader
- All Implemented Interfaces:
- ContentReader
public class DailyGateHTMLReader
- extends java.lang.Object
- implements ContentReader
Constructor Summary
DailyGateHTMLReader(java.lang.String source,
java.lang.String stop,
java.lang.String parse)
Method Summary
ReaderResult read(java.lang.String url,
java.lang.String title,
java.util.Date date,
java.lang.String author)
Reads the text from the specified URL.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
source
protected java.lang.String source
parse
protected java.lang.String parse
stop
protected java.lang.String stop
extractor
private StringExtractor extractor
processor
private DocumentProcessor processor
content
private java.lang.StringBuilder content
result
private ReaderResult result
DailyGateHTMLReader
public DailyGateHTMLReader(java.lang.String source,
java.lang.String stop,
java.lang.String parse)
read
public ReaderResult read(java.lang.String url,
java.lang.String title,
java.util.Date date,
java.lang.String author)
throws java.io.IOException
- Reads the text from the specified URL. The title and date are expected to
be provided from another source, e.g. an RSS feed entry or an archive
"index."
- Specified by:
read in interface ContentReader
- Parameters:
url - the url to read from
title - the article title
date - the date of the article
author - the author (can be empty string)
- Returns:
- the result of reading and parsing the text at the url
- Throws:
java.io.IOException - if there is an error while reading
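A minimal sketch of how a caller might invoke read with metadata obtained elsewhere (for example from an RSS entry) and handle the declared IOException. The ContentReader and ReaderResult types declared here are simplified stand-ins, because their real definitions are not shown on this page; only the read signature is taken from the documentation above. The ReadUsageSketch class, the readOrNull helper, the title and text fields, and the lambda-based fake reader are all hypothetical and exist only so the example runs without the library or network access.

```java
import java.io.IOException;
import java.util.Date;

// Stand-in for the library's ReaderResult; its real members are not documented here.
final class ReaderResult {
    final String title;
    final String text;

    ReaderResult(String title, String text) {
        this.title = title;
        this.text = text;
    }
}

// Stand-in interface mirroring only the read(...) signature documented above.
interface ContentReader {
    ReaderResult read(String url, String title, Date date, String author) throws IOException;
}

class ReadUsageSketch {
    // Hypothetical helper: calls read(...) with externally supplied metadata
    // and turns an IOException into a null result after logging it.
    static ReaderResult readOrNull(ContentReader reader, String url, String title, Date date) {
        try {
            // The author may be an empty string, per the parameter documentation.
            return reader.read(url, title, date, "");
        } catch (IOException e) {
            System.err.println("Could not read " + url + ": " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // Fake reader (assumption) standing in for a real DailyGateHTMLReader instance.
        ContentReader fake = (url, title, date, author) -> {
            if (url.isEmpty()) {
                throw new IOException("empty url");
            }
            return new ReaderResult(title, "body of " + url);
        };

        ReaderResult ok = readOrNull(fake, "http://example.com/a", "Example", new Date());
        System.out.println(ok.title);
        System.out.println(readOrNull(fake, "", "Broken", new Date()) == null);
    }
}
```

With the real library, the fake reader would presumably be replaced by a DailyGateHTMLReader constructed with its source, stop, and parse arguments, whose expected values this page does not describe.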