html-text 0.5.2-2 source package in Ubuntu

Changelog

html-text (0.5.2-2) unstable; urgency=medium

  * source-only upload needed for testing migration.

 -- Christian Marillat <email address hidden>  Fri, 11 Nov 2022 09:12:09 +0100

Upload details

Uploaded by:
Christian Marillat
Uploaded to:
Sid
Original maintainer:
Christian Marillat
Architectures:
all
Section:
misc
Urgency:
Medium Urgency

Publishing

Series  Pocket   Published  Component  Section
Noble   release             universe   misc
Mantic  release             universe   misc
Lunar   release             universe   misc

Builds

Lunar: [FULLYBUILT] amd64

Downloads

File                             Size      SHA-256 Checksum
html-text_0.5.2-2.dsc            1.8 KiB   910e8c38fde6d3d2f0df1bf1924ad8422d696e253227f97860c8d1095b4fa6db
html-text_0.5.2.orig.tar.gz      51.1 KiB  c75a1da10d649f55162446de57f98374059a998071110a343815841286a442f9
html-text_0.5.2-2.debian.tar.xz  6.8 KiB   8ba849c74f995dc7105f4b30fe0439df4ce4ddf3b1da61a1651c3e39c7f3258b
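
A downloaded file can be checked against the SHA-256 values listed above. The following is a minimal sketch in Python, assuming the file has already been saved locally; the helper names sha256_of and matches are illustrative and not part of any packaging tool.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare the file's digest with a published checksum, ignoring case and padding."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches("html-text_0.5.2.orig.tar.gz", "c75a1da10d649f55162446de57f98374059a998071110a343815841286a442f9") should return True for an intact copy of the upstream tarball.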

Available diffs

No changes file available.

Binary packages built by this source

python3-html-text: extract text from HTML.

 How is html_text different from .xpath('//text()') in lxml or .get_text()
 in Beautiful Soup?

  * Text extracted with html_text does not contain inline styles,
    JavaScript, comments, or other text that is not normally visible to
    users;
  * html_text normalizes whitespace, but more intelligently than
    .xpath('normalize-space()'): it adds spaces around inline elements
    (which are often used as block elements in HTML markup) and tries to
    avoid adding extra spaces around punctuation;
  * html_text can add newlines (e.g. after headers or paragraphs), so that
    the output text looks more like what a browser renders.
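
 The three behaviours listed above can be illustrated with a small
 standard-library sketch. This is not the html_text implementation: the
 tag sets, the class name VisibleText and the function visible_text are
 assumptions made for the example, and the real package handles many more
 cases.

```python
import re
from html.parser import HTMLParser

# Assumption: small illustrative tag sets, not the lists html_text itself uses.
HIDDEN_TAGS = {"script", "style"}
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br", "tr"}


class VisibleText(HTMLParser):
    """Collect user-visible text; block boundaries become newlines."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.hidden_depth = 0

    def _boundary(self, tag):
        # Newline around block elements, a single space around inline ones.
        self.parts.append("\n" if tag in BLOCK_TAGS else " ")

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1
        else:
            self._boundary(tag)

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS:
            self.hidden_depth = max(0, self.hidden_depth - 1)
        else:
            self._boundary(tag)

    def handle_data(self, data):
        # Skip script/style bodies; fold source whitespace into single spaces.
        if not self.hidden_depth:
            self.parts.append(re.sub(r"\s+", " ", data))

    # Comments are dropped because handle_comment keeps its default no-op.


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    lines = (re.sub(r" +", " ", line).strip()
             for line in "".join(parser.parts).split("\n"))
    text = "\n".join(line for line in lines if line)
    # Remove the space that inline boundaries may leave before punctuation.
    return re.sub(r" +([,.;:!?])", r"\1", text)
```

 With the python3-html-text package installed, the corresponding call is
 html_text.extract_text(html); its output may differ in detail from this
 sketch, for instance in how many newlines follow a header.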