anl.aida.reader
Class AllAfricaHTMLReader
java.lang.Object
anl.aida.reader.AllAfricaHTMLReader
- All Implemented Interfaces:
- ContentReader
public class AllAfricaHTMLReader
- extends java.lang.Object
- implements ContentReader
Reads content from AllAfrica web pages. This reader assumes
that the pages it is given are the printable versions.
Method Summary
ReaderResult
read(java.lang.String url,
java.lang.String title,
java.util.Date date,
java.lang.String author)
Reads the content from the url.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
extractor
private StringExtractor extractor
processor
private AllAfricaHTMLReader.Processor processor
content
private java.lang.StringBuilder content
title
private java.lang.String title
AllAfricaHTMLReader
public AllAfricaHTMLReader()
read
public ReaderResult read(java.lang.String url,
java.lang.String title,
java.util.Date date,
java.lang.String author)
throws java.io.IOException
- Description copied from interface:
ContentReader
- Reads the content from the url. The title, date
and author are passed in and should be returned in
the ReaderResult.
- Specified by:
read
in interface ContentReader
- Parameters:
url
- the url to read
title
- the document title
date
- the document's date
author
- the author (can be empty string)
- Returns:
- the ReaderResult
- Throws:
java.io.IOException
- if there is an error reading the content
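The read contract described above can be illustrated with a minimal, self-contained sketch. It does not use the real anl.aida.reader classes: the nested ContentReader interface only mirrors the documented read signature, and ReaderResult (with its fields), StubReader, and the example URL are hypothetical stand-ins, since this page does not show how ReaderResult is built or accessed. The stub performs no network access; it only demonstrates that the metadata passed in is handed back in the result and that a failure surfaces as an IOException.

```java
import java.io.IOException;
import java.util.Date;

public class ContentReaderSketch {

    // Hypothetical stand-in for anl.aida.reader.ReaderResult. The real class's
    // constructor and accessors are not documented here, so these fields are assumptions.
    static final class ReaderResult {
        final String url;
        final String title;
        final Date date;
        final String author;
        final String content;

        ReaderResult(String url, String title, Date date, String author, String content) {
            this.url = url;
            this.title = title;
            this.date = date;
            this.author = author;
            this.content = content;
        }
    }

    // Mirrors the read signature documented for ContentReader.
    interface ContentReader {
        ReaderResult read(String url, String title, Date date, String author)
                throws IOException;
    }

    // Hypothetical reader that fetches nothing: it returns placeholder content
    // and passes the supplied metadata through unchanged.
    static final class StubReader implements ContentReader {
        @Override
        public ReaderResult read(String url, String title, Date date, String author)
                throws IOException {
            if (url == null || url.isEmpty()) {
                // Stands in for "an error reading the content".
                throw new IOException("no url given");
            }
            return new ReaderResult(url, title, date, author,
                    "placeholder content for " + url);
        }
    }

    public static void main(String[] args) {
        ContentReader reader = new StubReader();
        try {
            // The author may be an empty string, as the parameter notes allow.
            ReaderResult result = reader.read(
                    "http://example.org/story.html", "Sample title", new Date(0L), "");
            System.out.println(result.title + " / author empty: " + result.author.isEmpty());
        } catch (IOException e) {
            System.err.println("Could not read content: " + e.getMessage());
        }
    }
}
```

With the real classes, a caller would presumably construct AllAfricaHTMLReader with its no-argument constructor and call read in the same way, handling IOException at the call site.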