anl.aida.reader
Class AllAfricaHTMLReader

java.lang.Object
  extended by anl.aida.reader.AllAfricaHTMLReader
All Implemented Interfaces:
ContentReader

public class AllAfricaHTMLReader
extends java.lang.Object
implements ContentReader

Reads content from AllAfrica web pages. This reader assumes that the pages it is given are the printable versions.


Nested Class Summary
private  class AllAfricaHTMLReader.Processor
           
 
Field Summary
private  java.lang.StringBuilder content
           
private  StringExtractor extractor
           
private  AllAfricaHTMLReader.Processor processor
           
private  java.lang.String title
           
 
Constructor Summary
AllAfricaHTMLReader()
           
 
Method Summary
 ReaderResult read(java.lang.String url, java.lang.String title, java.util.Date date, java.lang.String author)
          Reads the content from the url.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

extractor

private StringExtractor extractor

processor

private AllAfricaHTMLReader.Processor processor

content

private java.lang.StringBuilder content

title

private java.lang.String title
Constructor Detail

AllAfricaHTMLReader

public AllAfricaHTMLReader()
Method Detail

read

public ReaderResult read(java.lang.String url,
                         java.lang.String title,
                         java.util.Date date,
                         java.lang.String author)
                  throws java.io.IOException
Description copied from interface: ContentReader
Reads the content from the url. The title, date, and author are passed in and should be returned in the ReaderResult.

Specified by:
read in interface ContentReader
Parameters:
url - the url to read
title - the document title
date - the document's date
author - the author (can be empty string)
Returns:
the ReaderResult
Throws:
java.io.IOException - if there is an error reading the content
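
Example (illustrative sketch):
The contract described above (metadata passed in, metadata returned in the result, IOException on failure) can be sketched as follows. This page does not show the members of ReaderResult or the constructor arguments it takes, so the nested ReaderResult and ContentReader types below are hypothetical stand-ins that only mirror the documented read signature, and StubReader is an invented placeholder that performs no network access. It is not the AllAfricaHTMLReader implementation.

```java
import java.io.IOException;
import java.util.Date;

class ReadUsageSketch {

    // Hypothetical stand-in for ReaderResult: the real class's fields are not
    // documented on this page, so these four are assumptions.
    static final class ReaderResult {
        final String title;
        final Date date;
        final String author;
        final String content;

        ReaderResult(String title, Date date, String author, String content) {
            this.title = title;
            this.date = date;
            this.author = author;
            this.content = content;
        }
    }

    // Stand-in interface mirroring the documented read signature.
    interface ContentReader {
        ReaderResult read(String url, String title, Date date, String author)
                throws IOException;
    }

    // Invented stub: echoes the metadata back and fabricates a content string
    // instead of fetching and parsing a page.
    static final class StubReader implements ContentReader {
        @Override
        public ReaderResult read(String url, String title, Date date, String author)
                throws IOException {
            if (url == null || url.isEmpty()) {
                throw new IOException("no url given");
            }
            return new ReaderResult(title, date, author, "content of " + url);
        }
    }

    public static void main(String[] args) {
        ContentReader reader = new StubReader();
        try {
            // The author may be an empty string, as the parameter notes allow.
            ReaderResult result = reader.read(
                    "http://example.org/printable.html",
                    "Example title",
                    new Date(0L),
                    "");
            System.out.println(result.title);
        } catch (IOException e) {
            // read declares IOException, so callers must handle or propagate it.
            System.err.println("Could not read content: " + e.getMessage());
        }
    }
}
```

With the real classes, a caller would presumably construct an AllAfricaHTMLReader through its no-argument constructor and use it through the ContentReader interface in the same way.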