to.etc.util
Class HtmlScanner

java.lang.Object
  extended by to.etc.util.HtmlScanner

public class HtmlScanner
extends java.lang.Object

This class helps scan HTML documents. It contains methods to scan for tags, decode tag attributes, and perform similar tasks.

Author:
Frits Jalvingh

Constructor Summary
HtmlScanner()
           
HtmlScanner(boolean unquote)
           
 
Method Summary
 void assignTo(HtmlScanner hs)
           
 boolean atEof()
           
 boolean atEof(int pos)
           
 HtmlScanner duplicate()
           
 boolean findMatchingEndTag(java.lang.String tag)
          Finds the first matching end tag at the CURRENT level.
 boolean findTag(java.lang.String name)
          This tries to find the specified tag, starting from the current position in the document.
 java.lang.String getCurrentTag()
           
 java.lang.String getDocument()
           
 int getPos()
           
 int getStartPos()
           
 int inc()
           
 void moveTo(int ix)
           
 java.lang.String nextTag()
          This scans for the next tag starting at the current position.
 void reset()
           
 void setDocument(java.lang.String s)
           
 void skipTag()
           
 boolean tagIsTagEnd()
          Returns true if the current position holds the tag-end character >.
 java.lang.String tagParseInit()
          This must be called with the current position on a tag.
 java.lang.String tagParseParamName()
          This tries to parse a parameter name from the current position.
 java.lang.String tagParseValue()
          Can be called after tagParseParamName() has returned a name.
 boolean tagToEnd()
          When called, this parses the tag until its end is reached.
static java.lang.String unquote(java.lang.String s)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlScanner

public HtmlScanner()

HtmlScanner

public HtmlScanner(boolean unquote)

Method Detail

unquote

public static java.lang.String unquote(java.lang.String s)

duplicate

public HtmlScanner duplicate()

assignTo

public void assignTo(HtmlScanner hs)

setDocument

public void setDocument(java.lang.String s)

getDocument

public java.lang.String getDocument()

moveTo

public void moveTo(int ix)

reset

public void reset()

getPos

public int getPos()

getStartPos

public int getStartPos()

inc

public int inc()

getCurrentTag

public java.lang.String getCurrentTag()

findTag

public boolean findTag(java.lang.String name)
This tries to find the specified tag, starting from the current position in the document.

Parameters:
name - the name of the tag to find.
Returns:
true if the tag was found.
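The search that findTag describes can be pictured with the following minimal sketch. It is a hypothetical, simplified model, not the HtmlScanner source: the class name FindTagSketch is invented here, and it assumes that matching is case-insensitive, that the name must be followed by whitespace, > or /, and that on success the position is left on the tag's < character. None of these details is confirmed by the documentation above.

```java
/**
 * Hypothetical, simplified model of a findTag()-style search; not the
 * HtmlScanner source. Assumptions: case-insensitive matching, and on success
 * the position is left on the '<' of the found tag.
 */
public class FindTagSketch {
    private final String doc;
    private int pos;

    public FindTagSketch(String doc) {
        this.doc = doc;
    }

    public int getPos() {
        return pos;
    }

    public boolean findTag(String name) {
        int i = pos;
        while ((i = doc.indexOf('<', i)) >= 0) {
            int end = i + 1 + name.length();
            // The name must match and be followed by a delimiter,
            // so searching for "a" does not stop at "<abbr>".
            if (doc.regionMatches(true, i + 1, name, 0, name.length())
                    && (end == doc.length() || isDelimiter(doc.charAt(end)))) {
                pos = i;
                return true;
            }
            i++;
        }
        return false;
    }

    private static boolean isDelimiter(char c) {
        return c == '>' || c == '/' || Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        FindTagSketch s = new FindTagSketch("<abbr>x</abbr> <A href=\"#\">y</A>");
        System.out.println(s.findTag("a") + " at " + s.getPos()); // true at 15
    }
}
```

Requiring a delimiter after the name is what keeps a short name such as "a" from matching longer tags that merely start with it.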

nextTag

public java.lang.String nextTag()
This scans for the next tag, starting at the current position. If a tag is found, this returns the tag's name without the angle brackets. The current position is left at the tag-start character <.

Returns:
the tag name, or null if nothing was found.
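As a rough illustration of the contract described for nextTag, here is a minimal sketch. It is a hypothetical model (the class NextTagSketch is invented for this example), and it assumes that the name runs until whitespace, > or end of input, so an end tag such as </b> would yield "/b". The real method may treat end tags, comments, or self-closing tags differently.

```java
/**
 * Hypothetical, simplified model of a nextTag()-style scan; not the
 * HtmlScanner source.
 */
public class NextTagSketch {
    private final String doc;
    private int pos;

    public NextTagSketch(String doc) {
        this.doc = doc;
    }

    public int getPos() {
        return pos;
    }

    /**
     * Returns the name of the next tag at or after the current position, or
     * null if there is none, and leaves the position on that tag's '<'.
     * Assumption: the name runs until whitespace, '>' or end of input.
     */
    public String nextTag() {
        int lt = doc.indexOf('<', pos);
        if (lt < 0) {
            return null;
        }
        pos = lt;
        int end = lt + 1;
        while (end < doc.length()
                && doc.charAt(end) != '>'
                && !Character.isWhitespace(doc.charAt(end))) {
            end++;
        }
        return doc.substring(lt + 1, end);
    }

    public static void main(String[] args) {
        NextTagSketch s = new NextTagSketch("text <b class=x>bold</b>");
        System.out.println(s.nextTag() + " at " + s.getPos()); // b at 5
    }
}
```

Because the position stays on the < character, a second call in this sketch returns the same tag again. With the real class, the position presumably has to be moved past the tag first, for example with inc(), moveTo() or skipTag().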

tagParseInit

public java.lang.String tagParseInit()
This must be called with the current position on a tag. It initializes parsing of the tag's attributes and moves the current position past the tag name, to where the first attribute would be if the tag had any.

Returns:
the name of the tag.
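The following minimal sketch shows one way tagParseInit's documented behavior could work: step past <, read the name, then skip whitespace so the position lands where the first attribute would start. It is a hypothetical model, not the library code. The class TagParseInitSketch is invented here, and throwing IllegalStateException when the position is not on a tag is this sketch's own choice; the documentation only says the method "must be called" on a tag.

```java
/**
 * Hypothetical, simplified model of tagParseInit(); not the HtmlScanner source.
 */
public class TagParseInitSketch {
    private final String doc;
    private int pos;

    public TagParseInitSketch(String doc, int pos) {
        this.doc = doc;
        this.pos = pos;
    }

    public int getPos() {
        return pos;
    }

    public String tagParseInit() {
        if (pos >= doc.length() || doc.charAt(pos) != '<') {
            // Sketch's choice: reject calls that are not positioned on a tag.
            throw new IllegalStateException("not positioned on a tag");
        }
        int start = ++pos; // step past '<'
        while (pos < doc.length()
                && doc.charAt(pos) != '>'
                && !Character.isWhitespace(doc.charAt(pos))) {
            pos++; // read the tag name
        }
        String name = doc.substring(start, pos);
        while (pos < doc.length() && Character.isWhitespace(doc.charAt(pos))) {
            pos++; // skip to the first attribute, or to '>' if there is none
        }
        return name;
    }

    public static void main(String[] args) {
        String html = "<img  src=\"a.png\">";
        TagParseInitSketch s = new TagParseInitSketch(html, 0);
        String name = s.tagParseInit();
        System.out.println(name + ", next char: " + html.charAt(s.getPos())); // img, next char: s
    }
}
```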

tagParseParamName

public java.lang.String tagParseParamName()
This tries to parse a parameter name from the current position. The name must be alphanumeric and must end in either a space or an equals sign. If the current position does not hold a parameter, this returns null.

Returns:
the parameter name, or null if there is no parameter here.
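The name rule above can be modeled with this minimal sketch. It is a hypothetical illustration (ParamNameSketch is an invented name), not the HtmlScanner implementation. It deviates from the documented rule in one labeled way: it also accepts > or / as a terminator, so that a valueless last attribute such as `disabled>` is recognized. It also assumes that after a name the position skips trailing whitespace and ends on = or on whatever follows.

```java
/**
 * Hypothetical, simplified model of tagParseParamName(); not the
 * HtmlScanner source.
 */
public class ParamNameSketch {
    private final String doc;
    private int pos;

    public ParamNameSketch(String doc, int pos) {
        this.doc = doc;
        this.pos = pos;
    }

    public int getPos() {
        return pos;
    }

    public String tagParseParamName() {
        int end = pos;
        while (end < doc.length() && Character.isLetterOrDigit(doc.charAt(end))) {
            end++; // names are alphanumeric
        }
        if (end == pos) {
            return null; // nothing alphanumeric here, e.g. the closing '>'
        }
        if (end < doc.length()) {
            char c = doc.charAt(end);
            // Documented terminators: space or '='. Assumption: also '>' and '/'.
            if (c != '=' && c != '>' && c != '/' && !Character.isWhitespace(c)) {
                return null; // e.g. "data-id" is not a plain alphanumeric name
            }
        }
        String name = doc.substring(pos, end);
        pos = end;
        while (pos < doc.length() && Character.isWhitespace(doc.charAt(pos))) {
            pos++; // assumption: leave the position on '=' or the next token
        }
        return name;
    }

    public static void main(String[] args) {
        String tag = "<a href = \"x\" data-id=3>";
        ParamNameSketch s = new ParamNameSketch(tag, 3);
        System.out.println(s.tagParseParamName() + " then '" + tag.charAt(s.getPos()) + "'"); // href then '='
    }
}
```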

tagParseValue

public java.lang.String tagParseValue()
Can be called after tagParseParamName() has returned a name. This returns the parameter value, or null if no value is present. The value includes its quotes, if any. An unquoted value is delimited by the first space or >.

Returns:
a value string, or null if no value is present.
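To make the value rules concrete, here is a minimal sketch: a quoted value keeps its quotes, and an unquoted value stops at the first whitespace or >. It is a hypothetical model (ParamValueSketch is an invented name), and it assumes the position starts on the = that follows the name, as a preceding name parse would presumably leave it. Taking the rest of the input for an unterminated quote is this sketch's choice.

```java
/**
 * Hypothetical, simplified model of tagParseValue(); not the HtmlScanner source.
 */
public class ParamValueSketch {
    private final String doc;
    private int pos;

    public ParamValueSketch(String doc, int pos) {
        this.doc = doc;
        this.pos = pos;
    }

    public int getPos() {
        return pos;
    }

    public String tagParseValue() {
        if (pos >= doc.length() || doc.charAt(pos) != '=') {
            return null; // no '=': a valueless attribute such as "disabled"
        }
        pos++;
        skipWhitespace();
        int start = pos;
        if (pos < doc.length() && (doc.charAt(pos) == '"' || doc.charAt(pos) == '\'')) {
            int close = doc.indexOf(doc.charAt(pos), pos + 1);
            // Keep both quotes in the value; an unterminated quote takes the rest.
            pos = close < 0 ? doc.length() : close + 1;
        } else {
            while (pos < doc.length()
                    && doc.charAt(pos) != '>'
                    && !Character.isWhitespace(doc.charAt(pos))) {
                pos++; // unquoted: stop at the first whitespace or '>'
            }
        }
        if (pos == start) {
            return null; // '=' followed directly by '>' or end of input
        }
        String value = doc.substring(start, pos);
        skipWhitespace(); // leave the position on the next attribute or '>'
        return value;
    }

    private void skipWhitespace() {
        while (pos < doc.length() && Character.isWhitespace(doc.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        ParamValueSketch quoted = new ParamValueSketch("=\"x y\" title=plain>", 0);
        System.out.println(quoted.tagParseValue()); // prints the value with its quotes: "x y"
        ParamValueSketch bare = new ParamValueSketch("=plain>", 0);
        System.out.println(bare.tagParseValue()); // plain
    }
}
```

Because the quotes are kept, a caller that wants the bare text still has to strip them, which is presumably what the static unquote(String) method is for.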

tagIsTagEnd

public boolean tagIsTagEnd()
Returns true if the current position holds the tag-end character >.

Returns:
true if the current position is on the > character.

atEof

public boolean atEof()

atEof

public boolean atEof(int pos)

tagToEnd

public boolean tagToEnd()
When called, this parses the tag until its end is reached.

Returns:
true if it succeeded.
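One plausible reading of tagToEnd is "advance to the > that closes the tag, without being fooled by a > inside a quoted attribute value". The minimal sketch below models that reading. It is hypothetical (TagToEndSketch is an invented name). It assumes the position ends just past the > (the real method may stop on it instead) and that false means the input ended first. As a simplification, any quote character opens a quoted section.

```java
/**
 * Hypothetical, simplified model of tagToEnd(); not the HtmlScanner source.
 */
public class TagToEndSketch {
    private final String doc;
    private int pos;

    public TagToEndSketch(String doc, int pos) {
        this.doc = doc;
        this.pos = pos;
    }

    public int getPos() {
        return pos;
    }

    public boolean tagToEnd() {
        char quote = 0; // the open quote character, or 0 when outside quotes
        for (int i = pos; i < doc.length(); i++) {
            char c = doc.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0; // closing quote
                }
            } else if (c == '"' || c == '\'') {
                quote = c; // simplification: any quote opens a quoted section
            } else if (c == '>') {
                pos = i + 1; // assumption: stop just past the '>'
                return true;
            }
        }
        pos = doc.length(); // input ended before the tag was closed
        return false;
    }

    public static void main(String[] args) {
        String html = "<a title=\"x>y\">link";
        TagToEndSketch s = new TagToEndSketch(html, 0);
        System.out.println(s.tagToEnd() + ", rest: " + html.substring(s.getPos())); // true, rest: link
    }
}
```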

findMatchingEndTag

public boolean findMatchingEndTag(java.lang.String tag)
Finds the first matching end tag at the CURRENT level. Nested tags are counted.

Parameters:
tag - The name of the tag, like "A".
Returns:
true if a match was found. If so, the current position is at the start of the end tag.
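The nesting count mentioned for findMatchingEndTag can be modeled with a depth counter. Each further opening tag of the same name increments it, and each end tag either decrements it or, at depth zero, is the match. This minimal sketch is hypothetical (MatchingEndTagSketch is an invented name). It assumes case-insensitive names and a start position just past the opening tag. It does not treat comments or self-closing tags such as <div/> specially.

```java
/**
 * Hypothetical, simplified model of findMatchingEndTag(); not the
 * HtmlScanner source.
 */
public class MatchingEndTagSketch {
    private final String doc;
    private int pos;

    public MatchingEndTagSketch(String doc, int pos) {
        this.doc = doc;
        this.pos = pos;
    }

    public int getPos() {
        return pos;
    }

    public boolean findMatchingEndTag(String tag) {
        int depth = 0; // same-name tags opened since the start position
        int i = pos;
        while ((i = doc.indexOf('<', i)) >= 0) {
            boolean isEnd = i + 1 < doc.length() && doc.charAt(i + 1) == '/';
            if (namedAt(isEnd ? i + 2 : i + 1, tag)) {
                if (!isEnd) {
                    depth++; // nested opening tag
                } else if (depth == 0) {
                    pos = i; // on the '<' of the matching end tag
                    return true;
                } else {
                    depth--; // closes a nested tag
                }
            }
            i++;
        }
        return false;
    }

    /** True if the tag name starts at 'at' and is followed by a delimiter. */
    private boolean namedAt(int at, String tag) {
        if (!doc.regionMatches(true, at, tag, 0, tag.length())) {
            return false;
        }
        int after = at + tag.length();
        if (after == doc.length()) {
            return true;
        }
        char c = doc.charAt(after);
        return c == '>' || c == '/' || Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        String html = "<div>a<div>b</div>c</div>d";
        MatchingEndTagSketch s = new MatchingEndTagSketch(html, 5); // just past the outer <div>
        System.out.println(s.findMatchingEndTag("DIV") + " at " + s.getPos()); // true at 19
    }
}
```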

skipTag

public void skipTag()