html-text 0.5.2-2 source package in Ubuntu
Changelog
html-text (0.5.2-2) unstable; urgency=medium

  * source-only upload needed for testing migration.

 -- Christian Marillat <email address hidden>  Fri, 11 Nov 2022 09:12:09 +0100
Upload details
- Uploaded by: Christian Marillat
- Uploaded to: Sid
- Original maintainer: Christian Marillat
- Architectures: all
- Section: misc
- Urgency: Medium Urgency
Publishing
Series | Published | Component | Section |
---|---|---|---|
Noble | release | universe | misc |
Mantic | release | universe | misc |
Lunar | release | universe | misc |
Downloads
File | Size | SHA-256 Checksum |
---|---|---|
html-text_0.5.2-2.dsc | 1.8 KiB | 910e8c38fde6d3d2f0df1bf1924ad8422d696e253227f97860c8d1095b4fa6db |
html-text_0.5.2.orig.tar.gz | 51.1 KiB | c75a1da10d649f55162446de57f98374059a998071110a343815841286a442f9 |
html-text_0.5.2-2.debian.tar.xz | 6.8 KiB | 8ba849c74f995dc7105f4b30fe0439df4ce4ddf3b1da61a1651c3e39c7f3258b |
Available diffs
- diff from 0.5.2-1 to 0.5.2-2 (320 bytes)
No changes file available.
Binary packages built by this source
- python3-html-text: extract text from HTML.
  How is html_text different from .xpath('//text()') from lxml or .get_text()
  from Beautiful Soup?

  * Text extracted with html_text does not contain inline styles,
    JavaScript, comments, or other text that is not normally visible to
    users;
  * html_text normalizes whitespace, but more intelligently than
    .xpath('normalize-space()'): it adds spaces around inline elements (which
    are often used as block elements in HTML markup) and tries to avoid
    adding extra spaces around punctuation;
  * html_text can add newlines (e.g. after headers or paragraphs), so that
    the output text looks more like how it is rendered in browsers.
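
The three behaviours listed above can be illustrated with a small standard-library sketch. This is not html_text's implementation, and it does not use lxml. The names `VisibleTextParser`, `visible_text`, `HIDDEN_TAGS` and `BLOCK_TAGS` are hypothetical, and the tag sets and punctuation rule are simplifying assumptions chosen for the example. With the package installed, the equivalent call would be `html_text.extract_text(html)`.

```python
import re
from html.parser import HTMLParser

# Assumed, simplified tag sets for illustration; html_text's real rules differ.
HIDDEN_TAGS = {"script", "style", "head", "noscript"}
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr", "br"}


class VisibleTextParser(HTMLParser):
    """Collects user-visible text, marking inline and block boundaries."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.hidden_depth = 0  # > 0 while inside a tag whose text is not shown

    def _boundary(self, tag):
        if self.hidden_depth:
            return
        # Block-level tags produce a line break, inline tags a single space.
        self.chunks.append("\n" if tag in BLOCK_TAGS else " ")

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1
        else:
            self._boundary(tag)

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS:
            self.hidden_depth = max(0, self.hidden_depth - 1)
        else:
            self._boundary(tag)

    def handle_data(self, data):
        if not self.hidden_depth:
            # Source whitespace (including newlines) collapses to one space.
            self.chunks.append(re.sub(r"\s+", " ", data))

    # HTMLParser.handle_comment does nothing by default, so comments are dropped.


def visible_text(html):
    """Return visible text with normalized whitespace and block newlines."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    lines = []
    for line in "".join(parser.chunks).split("\n"):
        line = re.sub(r" +", " ", line).strip()
        # Crude rule: drop the space that a tag boundary left before punctuation.
        line = re.sub(r" ([,.;:!?)])", r"\1", line)
        if line:
            lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        "<html><head><style>p {color: red}</style></head><body>"
        "<h1>Title</h1><!-- note --><p>Hello <b>world</b>!</p>"
        "<script>var x = 1;</script></body></html>"
    )
    print(visible_text(sample))
```

The sketch treats every non-block tag as an inline boundary and separates blocks with a single newline, which is coarser than what a real extractor needs. It is meant only to show why a plain `//text()` join falls short.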