The script reads a list of URLs from a CSV file, tests whether each one is reachable, fetches sitemap links, and checks for broken links. Results are displayed dynamically through interactive widgets and saved periodically so that progress is not lost if a run is interrupted. Asynchronous requests keep slow network calls from blocking the main thread, and logging provides traceability and error reporting.
Description of the Python Code
The script checks URLs, identifies broken links, and fetches sitemap links for a list of URLs read from a CSV file. It uses asynchronous programming so that slow network operations do not block the main thread. It also provides interactive widgets for user input and displays results dynamically.
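The non-blocking behaviour described above can be illustrated with a minimal sketch. It is not the script's own code: check_all, fake_check, and max_concurrent are hypothetical names, and fake_check only simulates a network request so that the example runs without network access.

```python
import asyncio


async def check_all(urls, check_one, max_concurrent=5):
    """Run check_one(url) for every URL, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(url):
        async with semaphore:
            try:
                return url, await check_one(url)
            except Exception as exc:
                # One failing URL should not abort the whole batch.
                return url, f"error: {exc}"

    pairs = await asyncio.gather(*(guarded(url) for url in urls))
    return dict(pairs)


async def fake_check(url):
    """Stand-in for a real network request (hypothetical, for illustration)."""
    await asyncio.sleep(0.01)  # simulates network latency
    if "bad" in url:
        raise ValueError("simulated failure")
    return "ok"


if __name__ == "__main__":
    demo_urls = ["https://example.com/a", "https://example.com/bad"]
    print(asyncio.run(check_all(demo_urls, fake_check)))
```

Inside a notebook, where an event loop is typically already running, the call would be awaited directly (await check_all(...)) rather than wrapped in asyncio.run. The semaphore is a design choice of this sketch; it caps the number of simultaneous requests so that a long URL list does not open hundreds of connections at once.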
Functionalities
URL Checking:
test_link(url): Checks if a URL is reachable and returns a status indicator.
fetch_sitemap_
check_for_
check_for_
Results Display and Storage:
display_
save_results_
User Interface:
create_
update_
Initialization:
initialize(): Sets up the initial state, reads URLs from a CSV file, and starts the process of checking URLs.
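As an illustration of what a reachability check returning a status indicator might look like, here is a minimal sketch. It is not the script's test_link: the name probe_link, the helper names, and the labels "ok", "broken", and "unreachable" are assumptions. It also swaps in the standard library's urllib, run in a worker thread, where the script is described as using aiohttp, so that the example has no third-party dependencies.

```python
import asyncio
import urllib.error
import urllib.request


def status_indicator(code):
    """Map an HTTP status code (or None) to a short label (labels are illustrative)."""
    if code is None:
        return "unreachable"
    if 200 <= code < 400:
        return "ok"
    return "broken"


def _head_status(url, timeout):
    """Blocking HEAD request; returns the status code, or None if unreachable."""
    try:
        # Building the Request is inside the try block because a malformed
        # URL raises ValueError at this point.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, ValueError, OSError):
        return None  # DNS failure, refused connection, malformed URL, timeout


async def probe_link(url, timeout=10.0):
    """Check a URL without blocking the event loop and return a status label."""
    code = await asyncio.to_thread(_head_status, url, timeout)
    return status_indicator(code)
```

Some servers reject HEAD requests, so a fuller version might retry with GET before reporting a link as broken. asyncio.to_thread requires Python 3.9 or later.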
Checklist
Setup and Imports:
Ensure all required libraries are imported (pandas, BeautifulSoup, ipywidgets, asyncio, aiohttp, chardet, logging, os, time).
Configure logging to capture and store logs.
Class Definition (URLChecker):
Initialize variables and configurations in the __init__ method.
Define asynchronous methods for URL checking (test_link, fetch_sitemap_
User Interaction:
Create widgets for user input (create_widgets).
Handle user input to update results (update_results).
Results Management:
Display results dynamically (display_results).
Save results periodically to a CSV file (save_results_
Main Workflow:
Initialize the process (initialize).
Read and process URLs from a CSV file.
Continuously check URLs and update results.
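Two checklist items, configuring logging and saving results periodically, can be sketched together. This is a minimal illustration under stated assumptions, not the script's implementation: ResultStore, save_every, the logger name, and the url/status column names are hypothetical, and the standard csv module is used in place of pandas to keep the example dependency-free.

```python
import csv
import logging
import os

# Log to the console; the format string is only an example.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("url_checker")


class ResultStore:
    """Collects (url, status) rows and rewrites a CSV file every save_every rows."""

    def __init__(self, path, save_every=10):
        self.path = path
        self.save_every = save_every
        self.rows = []

    def add(self, url, status):
        """Record one result and save when the row count hits a multiple of save_every."""
        self.rows.append({"url": url, "status": status})
        if len(self.rows) % self.save_every == 0:
            self.save()

    def save(self):
        """Write all rows collected so far to the CSV file."""
        # Write to a temporary file first, then swap it in, so an interrupted
        # save never leaves a half-written results file behind.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=["url", "status"])
            writer.writeheader()
            writer.writerows(self.rows)
        os.replace(tmp_path, self.path)
        logger.info("Saved %d results to %s", len(self.rows), self.path)
```

A caller would invoke add() after each checked URL and save() once more at the end of the run, so that the final rows are written even when the total is not a multiple of save_every.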
Flowchart Priorities
Initialization:
Start the URLChecker process.
Read URLs from the specified CSV file.
Initialize interactive widgets.
User Input Handling:
Display initial set of URLs and their status.
Monitor widget inputs for changes (URL index, sorting options, etc.).
Update results based on user actions.
URL Checking:
Test each URL for accessibility.
Fetch sitemap links or check for broken links based on user preferences.
Results Display and Logging:
Display results dynamically as they are processed.
Log each step and any errors encountered.
Save results periodically to ensure data is not lost.
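The user input handling step above, where a URL index and sorting options determine which results are shown, can be sketched as a pure function that a widget callback would call. The name select_results, its parameters, and the shape of the result rows are assumptions made for illustration; the widget wiring itself is left out.

```python
def select_results(results, start_index=0, page_size=10, sort_by=None, descending=False):
    """Return the slice of results that a display step would show.

    results is a list of dicts such as {"url": ..., "status": ...}.
    """
    rows = list(results)  # copy, so sorting does not reorder the caller's list
    if sort_by is not None:
        rows.sort(key=lambda row: str(row.get(sort_by, "")), reverse=descending)
    # Clamp the index so an out-of-range value yields an empty page, not an error.
    start = max(0, min(start_index, len(rows)))
    return rows[start:start + page_size]
```

With ipywidgets, a handler registered through a widget's observe method could read the current widget values, pass them to a function like this one, and redraw the output. Keeping the selection logic separate from the widgets makes it testable outside a notebook.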
Project information
- Maintainer: Dileep kumar D
- Driver: Dileep kumar D
- Licence: Open Software Licence v 3.0, Creative Commons - No Rights Reserved
Series and milestones
The trunk series is the current focus of development.