to.etc.lexer
Class HtmlTextScanner
java.lang.Object
to.etc.util.TextScanner
to.etc.lexer.HtmlTextScanner
public class HtmlTextScanner
- extends TextScanner
Helper class to scan HTML and remove invalid constructs.
- Author:
- Frits Jalvingh
Created on Feb 22, 2010
Method Summary |
java.util.Map<java.lang.String,to.etc.lexer.HtmlTextScanner.TagInfo> |
getMap()
|
static java.lang.String |
htmlRemoveAll(java.lang.String html,
boolean lf)
|
static void |
htmlRemoveAll(java.lang.StringBuilder outsb,
java.lang.String text,
boolean lf)
|
static java.lang.String |
htmlRemoveUnsafe(java.lang.String html)
|
static void |
htmlRemoveUnsafe(java.lang.StringBuilder outsb,
java.lang.String text)
Scans the input and copies only "safe" HTML, that is, HTML containing only
simple constructs. |
void |
scan(java.lang.StringBuilder sb,
java.lang.String html)
Scan HTML and remove unsafe tags and attributes. |
void |
scanAndRemove(java.lang.StringBuilder sb,
java.lang.String html,
boolean includelf)
Remove all HTML tags and collapse whitespace. |
Methods inherited from class to.etc.util.TextScanner |
accept, accept, append, append, append, clear, copy, copy, copy, currentChar, eof, getBuffer, getCopied, getInt, getLastInt, inc, index, LA, LA, length, nextChar, sb, scanDelimited, scanInt, scanLetters, scanWord, setIndex, setString, skip, skipWS |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
HtmlTextScanner
public HtmlTextScanner()
getMap
public java.util.Map<java.lang.String,to.etc.lexer.HtmlTextScanner.TagInfo> getMap()
scan
public void scan(java.lang.StringBuilder sb,
java.lang.String html)
- Scan HTML and remove unsafe tags and attributes. The result is guaranteed to be safe and well-formed.
- Parameters:
sb
- html
-
scanAndRemove
public void scanAndRemove(java.lang.StringBuilder sb,
java.lang.String html,
boolean includelf)
- Remove all HTML tags and collapse whitespace.
- Parameters:
sb
- html
- includelf
-
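To illustrate what "remove all HTML tags and collapse whitespace" can mean in practice, here is a minimal, self-contained sketch. It is not the HtmlTextScanner implementation: the class name StripTagsSketch and the method stripTags are hypothetical, and the includelf flag is deliberately not modelled because this page does not describe its behaviour. The sketch also assumes that tags contain no '>' inside attribute values and that entities are left untouched.

```java
/** Hypothetical illustration only; not part of to.etc.lexer. */
final class StripTagsSketch {

    /**
     * Drops everything between '<' and '>' and collapses each run of
     * whitespace in the remaining text to a single space. Leading and
     * trailing whitespace is discarded.
     */
    static String stripTags(String html) {
        StringBuilder out = new StringBuilder();
        boolean inTag = false;        // true while between '<' and '>'
        boolean pendingSpace = false; // whitespace seen since the last text char
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (inTag) {
                if (c == '>') {
                    inTag = false;
                }
                continue;
            }
            if (c == '<') {
                inTag = true;
                continue;
            }
            if (Character.isWhitespace(c)) {
                pendingSpace = true;
                continue;
            }
            // Emit at most one space before the next text character.
            if (pendingSpace && out.length() > 0) {
                out.append(' ');
            }
            pendingSpace = false;
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello   <b>world</b></p>\n"));
    }
}
```

With the input shown in main, the sketch prints Hello world. Note that tags themselves do not introduce a space, so "a<br>b" becomes "ab"; a real scanner may well treat block-level tags differently.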
htmlRemoveUnsafe
public static void htmlRemoveUnsafe(java.lang.StringBuilder outsb,
java.lang.String text)
- Scans the input and copies only "safe" HTML, that is, HTML containing only
simple constructs. It also ensures that the resulting document is XML-safe (well-formed):
if the input is not well-formed, tags are added or removed until the result is valid.
- Parameters:
outsb
- text
-
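The description above (copy only simple constructs, then add or remove tags until the output is balanced) can be sketched with an allow-list and a stack of open tags. This is an assumption-laden illustration, not the library's algorithm: the class SafeHtmlSketch, the method keepSafe, and the ALLOWED set are all hypothetical, and the actual set of tags and attributes that HtmlTextScanner accepts is not documented on this page. The sketch drops all attributes, keeps the text content of disallowed tags, and does not handle entities or comments.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration only; not part of to.etc.lexer. */
final class SafeHtmlSketch {

    // Assumed allow-list; the real scanner's list is not given in the documentation.
    private static final Set<String> ALLOWED = Set.of("b", "i", "u", "p", "em", "strong");

    static String keepSafe(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // currently open allowed tags
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (c != '<') {
                out.append(c);
                i++;
                continue;
            }
            int end = html.indexOf('>', i);
            if (end < 0) {
                // A '<' that never closes is emitted as escaped text.
                out.append("&lt;");
                i++;
                continue;
            }
            String body = html.substring(i + 1, end).trim();
            i = end + 1;
            boolean closing = body.startsWith("/");
            String name = tagName(closing ? body.substring(1) : body);
            if (!ALLOWED.contains(name)) {
                continue; // drop the tag itself, keep any following text
            }
            if (!closing) {
                out.append('<').append(name).append('>'); // attributes are dropped
                open.push(name);
            } else if (open.contains(name)) {
                // Close every tag opened after the matching one, then the match.
                String top;
                do {
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(name));
            }
            // A closing tag with no matching open tag is silently dropped.
        }
        // Close whatever is still open at end of input.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    /** Returns the leading run of letters and digits, lower-cased. */
    private static String tagName(String s) {
        int n = 0;
        while (n < s.length() && Character.isLetterOrDigit(s.charAt(n))) {
            n++;
        }
        return s.substring(0, n).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(keepSafe("<p onclick=\"x()\">Hi <b>there<script>alert(1)</script></p>"));
    }
}
```

For the input in main, the sketch prints <p>Hi <b>therealert(1)</b></p>: the onclick attribute and the script tags are gone, and the unclosed b element is closed before p. The script body survives as plain text here, which is one place where a production scanner would likely be stricter.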
htmlRemoveUnsafe
public static java.lang.String htmlRemoveUnsafe(java.lang.String html)
htmlRemoveAll
public static void htmlRemoveAll(java.lang.StringBuilder outsb,
java.lang.String text,
boolean lf)
htmlRemoveAll
public static java.lang.String htmlRemoveAll(java.lang.String html,
boolean lf)