Registered by Dileep kumar D

The script reads a list of URLs from a CSV file, tests their accessibility, fetches sitemap links, and checks for broken links. Results are displayed dynamically using interactive widgets and saved periodically to ensure data integrity. Asynchronous programming ensures efficient handling of network requests without blocking the main thread, while logging provides traceability and error reporting.

Description of the Python Code
The script checks the URLs listed in a CSV file, identifies broken links, and fetches sitemap links. Network operations that may be slow run asynchronously, so they do not block the main thread. Interactive widgets accept user input, and the results are redisplayed as they change.

Functionalities
URL Checking:

test_link(url): Checks if a URL is reachable and returns a status indicator.
fetch_sitemap_links(url): Fetches all links from a given URL, assuming it is a sitemap.
check_for_broken_links(url): Checks all links on a given webpage to identify broken links.
check_for_sitemap_links(url): Similar to fetch_sitemap_links, but outputs the results directly.
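
As a rough illustration of what fetch_sitemap_links might do once the sitemap body has been downloaded, the sketch below collects every <loc> entry from a sitemap XML string. It uses the standard-library xml.etree.ElementTree instead of BeautifulSoup so that it runs without extra packages. The helper name extract_sitemap_links and the sample document are assumptions, not taken from the script.

```python
import xml.etree.ElementTree as ET


def extract_sitemap_links(xml_text: str) -> list[str]:
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter():
        # Namespaced tags look like "{namespace}loc"; compare the local name only.
        local_name = element.tag.rsplit("}", 1)[-1]
        text = (element.text or "").strip()
        if local_name == "loc" and text:
            links.append(text)
    return links


# Hypothetical sample data for illustration.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
""".strip()

if __name__ == "__main__":
    for link in extract_sitemap_links(SAMPLE_SITEMAP):
        print(link)
```

Matching on the local tag name keeps the helper working for both urlset files and sitemap index files, since both store their targets in <loc> elements.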

Results Display and Storage:

display_results(index): Displays the results of URL checks in a formatted table.
save_results_to_csv(): Saves the results to a CSV file.
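
One possible shape for the saving step is sketched below. It writes to a temporary file and then swaps it into place, so an interrupted run does not leave a half-written results file. The column names, the explicit results and path arguments, and the use of the csv module rather than pandas are all assumptions made for this illustration; the script's own save_results_to_csv() takes no arguments.

```python
import csv
import os
import tempfile

FIELDNAMES = ["url", "status"]  # hypothetical column names


def save_results_to_csv(results, path):
    """Write result rows to `path`, replacing any previous file in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
            writer.writeheader()
            writer.writerows(results)
        os.replace(tmp_path, path)
    except Exception:
        # Clean up the temporary file if writing or replacing failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A call such as save_results_to_csv([{"url": "https://example.com", "status": "ok"}], "results.csv") would then produce a two-line CSV file.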

User Interface:

create_widgets(initial_index, max_index): Creates interactive widgets for user input.
update_results(change, counter_input, sort_index_input, output_area, use_b_col, check_broken_links): Updates the results based on user input and displays them dynamically.

Initialization:

initialize(): Sets up the initial state, reads URLs from a CSV file, and starts the process of checking URLs.
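
The CSV-reading part of initialization could look roughly like the sketch below. The column name url and the helper name read_urls are hypothetical, and the standard-library csv module stands in for pandas to keep the example self-contained.

```python
import csv
import io


def read_urls(csv_file, column="url"):
    """Return the non-empty values of `column` from an open CSV file."""
    reader = csv.DictReader(csv_file)
    urls = []
    for row in reader:
        # Missing or empty cells are skipped rather than treated as URLs.
        value = (row.get(column) or "").strip()
        if value:
            urls.append(value)
    return urls


if __name__ == "__main__":
    # Hypothetical in-memory sample instead of a file on disk.
    sample = io.StringIO("url\nhttps://example.com\nhttps://example.org/sitemap.xml\n")
    print(read_urls(sample))
```

For a file on disk, the same function can be given the handle returned by open(path, newline="", encoding="utf-8").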

Checklist
Setup and Imports:

Ensure all required libraries are imported (pandas, BeautifulSoup, ipywidgets, asyncio, aiohttp, chardet, logging, os, time).
Configure logging to capture and store logs.
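
A minimal logging setup along these lines is sketched below, assuming the logs should go both to a file and to the console. The logger name url_checker, the function name, and the message format are illustrative choices, not details taken from the script.

```python
import logging


def configure_logging(log_path, level=logging.INFO):
    """Attach a file handler and a console handler to a named logger."""
    logger = logging.getLogger("url_checker")  # hypothetical logger name
    logger.setLevel(level)
    # Drop handlers from earlier calls so re-running a cell does not duplicate output.
    logger.handlers.clear()
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (
        logging.FileHandler(log_path, encoding="utf-8"),
        logging.StreamHandler(),
    ):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = configure_logging("url_checker.log")
    log.info("Checked %s", "https://example.com")
```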

Class Definition (URLChecker):

Initialize variables and configurations in the __init__ method.
Define asynchronous methods for URL checking (test_link, fetch_sitemap_links, check_for_broken_links, check_for_sitemap_links).
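
The skeleton below sketches how such a class might run many checks concurrently with asyncio. It is not the script's implementation: the request itself is passed in as a coroutine function (fetch_status, an assumption) so the example runs without network access, whereas the script is described as using aiohttp. The check_all method, the "ok"/"broken" status strings, and the concurrency limit are also illustrative.

```python
import asyncio


class URLChecker:
    """Hypothetical skeleton; the real class also handles widgets and storage."""

    def __init__(self, fetch_status, max_concurrency=5):
        # fetch_status: coroutine function mapping a URL to an HTTP status code.
        self.fetch_status = fetch_status
        self.max_concurrency = max_concurrency
        self.results = {}

    async def test_link(self, url):
        """Return "ok" for a status below 400, otherwise "broken"."""
        try:
            status = await self.fetch_status(url)
        except Exception:
            # Connection errors and timeouts count as broken links here.
            return "broken"
        return "ok" if status < 400 else "broken"

    async def check_all(self, urls):
        """Check every URL, running at most max_concurrency checks at once."""
        semaphore = asyncio.Semaphore(self.max_concurrency)

        async def check(url):
            async with semaphore:
                self.results[url] = await self.test_link(url)

        await asyncio.gather(*(check(url) for url in urls))
        return self.results


async def fake_fetch_status(url):
    """Stand-in for a real HTTP request, used only for this example."""
    await asyncio.sleep(0)
    if "missing" in url:
        return 404
    if "down" in url:
        raise ConnectionError(url)
    return 200


if __name__ == "__main__":
    checker = URLChecker(fake_fetch_status)
    urls = ["https://example.com/", "https://example.com/missing"]
    print(asyncio.run(checker.check_all(urls)))
```

Inside a Jupyter notebook an event loop is already running, so asyncio.run raises an error there; awaiting checker.check_all(urls) directly in a cell is the usual alternative.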

User Interaction:

Create widgets for user input (create_widgets).
Handle user input to update results (update_results).

Results Management:

Display results dynamically (display_results).
Save results periodically to a CSV file (save_results_to_csv).

Main Workflow:

Initialize the process (initialize).
Read and process URLs from a CSV file.
Continuously check URLs and update results.

Flowchart Priorities
Initialization:

Start the URLChecker process.
Read URLs from the specified CSV file.
Initialize interactive widgets.

User Input Handling:

Display initial set of URLs and their status.
Monitor widget inputs for changes (URL index, sorting options, etc.).
Update results based on user actions.

URL Checking:

Test each URL for accessibility.
Fetch sitemap links or check for broken links based on user preferences.

Results Display and Logging:

Display results dynamically as they are processed.
Log each step and any errors encountered.
Save results periodically to ensure data is not lost.
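
One way to decide when a periodic save is due is sketched below: a small helper that calls a save function only after a minimum interval has passed. The class name PeriodicSaver, the 30-second default, and the injectable clock are assumptions for illustration. The description does not say whether the script saves by elapsed time or by number of processed URLs.

```python
import time


class PeriodicSaver:
    """Call `save` at most once per `interval_seconds` (hypothetical helper)."""

    def __init__(self, save, interval_seconds=30.0, clock=time.monotonic):
        self.save = save
        self.interval_seconds = interval_seconds
        self.clock = clock  # injectable so the timing can be tested
        self._last_saved = clock()

    def maybe_save(self):
        """Run save() if the interval has elapsed; return True if it ran."""
        now = self.clock()
        if now - self._last_saved >= self.interval_seconds:
            self.save()
            self._last_saved = now
            return True
        return False
```

Calling maybe_save() after each processed URL keeps the checking loop simple while bounding how much work is lost if the run stops unexpectedly.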

Project information

Maintainer: Dileep kumar D
Driver: Dileep kumar D
Licence: Open Software Licence v 3.0, Creative Commons - No Rights Reserved

Series and milestones

trunk series is the current focus of development.

Downloads

sitemapExtractor does not have any download files registered with Launchpad.