to.etc.domui.util
Class HtmlTextScanner

java.lang.Object
  extended by to.etc.util.TextScanner
      extended by to.etc.domui.util.HtmlTextScanner

public class HtmlTextScanner
extends TextScanner

Helper class to scan HTML and remove invalid constructs.

Author:
Frits Jalvingh. Created on Feb 22, 2010

Constructor Summary
HtmlTextScanner()
           
 
Method Summary
 java.util.Map<java.lang.String,to.etc.domui.util.HtmlTextScanner.TagInfo> getMap()
           
 void scan(java.lang.StringBuilder sb, java.lang.String html)
          Scan HTML and remove unsafe tags and attributes.
 void scanAndRemove(java.lang.StringBuilder sb, java.lang.String html, boolean includelf)
          Remove all HTML tags and collapse whitespace.
 void setMaxlen(int maxlen)
           
 
Methods inherited from class to.etc.util.TextScanner
accept, accept, append, append, append, clear, copy, copy, copy, currentChar, eof, getBuffer, getCopied, getInt, getLastInt, inc, index, LA, LA, length, nextChar, sb, scanDelimited, scanInt, scanLetters, scanWord, setIndex, setString, skip, skipWS
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlTextScanner

public HtmlTextScanner()

Method Detail

getMap

public java.util.Map<java.lang.String,to.etc.domui.util.HtmlTextScanner.TagInfo> getMap()

setMaxlen

public void setMaxlen(int maxlen)

scan

public void scan(java.lang.StringBuilder sb,
                 java.lang.String html)
Scan HTML and remove unsafe tags and attributes. The result is guaranteed to be safe and well-formed.

Parameters:
sb - the builder that receives the scanned output
html - the HTML text to scan
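
The idea of keeping only known tags and discarding attributes can be illustrated with a small standalone sketch. It does not call HtmlTextScanner and is not its implementation. The class name WhitelistScanSketch, the method sanitize, and the tag whitelist are all hypothetical, and a real sanitizer has to handle many more cases (void elements, entities, permitted attributes).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of whitelist-based tag filtering; not the DomUI code. */
final class WhitelistScanSketch {
    // Assumed whitelist, chosen only for this example.
    private static final Set<String> ALLOWED =
        new HashSet<>(Arrays.asList("b", "i", "em", "strong", "p"));

    // Matches an opening or closing tag; group 1 is "/" for closing tags, group 2 the name.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    /** Appends a filtered copy of html to sb: whitelisted tags only, no attributes. */
    static void sanitize(StringBuilder sb, String html) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            appendText(sb, html, last, m.start());
            last = m.end();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (!ALLOWED.contains(name)) {
                continue; // drop the unknown tag, keep the text around it
            }
            if (m.group(1).isEmpty()) {
                open.push(name);
                sb.append('<').append(name).append('>'); // attributes are discarded
            } else if (name.equals(open.peek())) {
                open.pop();
                sb.append("</").append(name).append('>');
            }
            // A closing tag that does not match the innermost open tag is dropped.
        }
        appendText(sb, html, last, html.length());
        // Close whatever is still open so the output stays balanced.
        while (!open.isEmpty()) {
            sb.append("</").append(open.pop()).append('>');
        }
    }

    // Copies plain text, escaping angle brackets that are not part of a matched tag.
    private static void appendText(StringBuilder sb, String s, int from, int to) {
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            if (c == '<') sb.append("&lt;");
            else if (c == '>') sb.append("&gt;");
            else sb.append(c);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        sanitize(sb, "<p onclick=\"x()\">Hi <script>alert(1)</script><b>there</p>");
        System.out.println(sb); // <p>Hi alert(1)<b>there</b></p>
    }
}
```

Going by the signature on this page, the library call takes the same shape: create an HtmlTextScanner, pass a StringBuilder and the HTML string to scan(sb, html), then read the result from the builder.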

scanAndRemove

public void scanAndRemove(java.lang.StringBuilder sb,
                          java.lang.String html,
                          boolean includelf)
Remove all HTML tags and collapse whitespace.

Parameters:
sb - the builder that receives the resulting text
html - the HTML text to strip
includelf -
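
Tag removal with whitespace collapsing can be illustrated with a small standalone sketch. It does not call HtmlTextScanner and makes no claim about how the library implements this or what includelf does. The class name TagStripSketch and the method stripTags are hypothetical, and the sketch assumes that '>' never appears inside an attribute value.

```java
/** Hypothetical sketch of tag stripping with whitespace collapsing; not the DomUI code. */
final class TagStripSketch {
    /** Appends the text content of html to sb, dropping tags and collapsing whitespace runs. */
    static void stripTags(StringBuilder sb, String html) {
        boolean inTag = false;        // true while between '<' and the next '>'
        boolean pendingSpace = false; // a whitespace run was seen since the last text char
        int start = sb.length();      // lets us suppress leading whitespace
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (inTag) {
                if (c == '>') {
                    inTag = false;
                }
            } else if (c == '<') {
                inTag = true;
            } else if (Character.isWhitespace(c)) {
                pendingSpace = true;
            } else {
                // Emit one space for any run of whitespace, but never at the very start.
                if (pendingSpace && sb.length() > start) {
                    sb.append(' ');
                }
                pendingSpace = false;
                sb.append(c);
            }
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        stripTags(sb, "<p>Hello,\n   <b>world</b> </p>");
        System.out.println(sb); // Hello, world
    }
}
```

Going by the signature on this page, the library call has the form scanAndRemove(sb, html, includelf) on an HtmlTextScanner instance, with the result read from the builder afterwards.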