anl.aida.reader
Class LATimesHTMLReader
java.lang.Object
anl.aida.reader.LATimesHTMLReader
- All Implemented Interfaces:
- ContentReader
public class LATimesHTMLReader
- extends java.lang.Object
- implements ContentReader
Method Summary
ReaderResult read(java.lang.String url, java.lang.String title, java.util.Date date, java.lang.String author)
    Reads the text from the specified URL.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
extractor
private StringExtractor extractor
processor
private DocumentProcessor processor
content
private java.lang.StringBuilder content
result
private ReaderResult result
LATimesHTMLReader
public LATimesHTMLReader()
read
public ReaderResult read(java.lang.String url,
java.lang.String title,
java.util.Date date,
java.lang.String author)
throws java.io.IOException
- Reads the text from the specified URL. The title and date are expected to
be provided from another source, e.g. an RSS feed entry or an archive
"index."
- Specified by:
read
in interface ContentReader
- Parameters:
url - the URL to read from
title - the article title
date - the date of the article
author - the author (can be an empty string)
- Returns:
- the result of reading and parsing the text at the URL
- Throws:
java.io.IOException
- if there is an error while reading
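A minimal sketch of how a caller might invoke read with metadata taken from elsewhere (for example an RSS entry) and handle the declared IOException. The nested ReaderResult and ContentReader types below are hypothetical stand-ins that only mirror the signature documented above; the real anl.aida.reader classes, and in particular the members of ReaderResult, are not shown on this page. The readOrNull helper, the stub reader, and the example.com URL are likewise assumptions made for illustration.

```java
import java.io.IOException;
import java.util.Date;

public class ReadUsageSketch {

    /** Hypothetical stand-in for ReaderResult; the real class's members are not documented here. */
    static final class ReaderResult {
        final String text;
        ReaderResult(String text) { this.text = text; }
    }

    /** Stand-in interface mirroring the documented read(...) signature. */
    interface ContentReader {
        ReaderResult read(String url, String title, Date date, String author) throws IOException;
    }

    /** Calls the reader with externally supplied metadata; returns null if reading fails. */
    static ReaderResult readOrNull(ContentReader reader, String url, String title,
                                   Date date, String author) {
        try {
            // The documentation allows an empty author, so a missing one is passed as "".
            return reader.read(url, title, date, author == null ? "" : author);
        } catch (IOException e) {
            System.err.println("Could not read " + url + ": " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // Stub reader so the sketch runs without network access; a real caller
        // would pass an actual ContentReader implementation here instead.
        ContentReader stub = (url, title, date, author) -> {
            if (url.isEmpty()) {
                throw new IOException("empty url");
            }
            return new ReaderResult(title + " by [" + author + "]");
        };

        ReaderResult ok = readOrNull(stub, "http://example.com/article.html",
                                     "Sample title", new Date(0L), null);
        System.out.println(ok.text);

        // A failing read is reported on stderr and surfaces as null.
        System.out.println(readOrNull(stub, "", "t", new Date(0L), "") == null);
    }
}
```

Returning null on failure is just one option for the sketch; a caller that needs to distinguish failure causes would more likely let the IOException propagate.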