PHP Classes

Bookmarks Checker for Chrome and Firefox: Check browser bookmark files to identify dead URLs

Ratings: Not yet rated by the users
Unique User Downloads: Total: 119
Download Rankings: All time: 9,519; This week: 455 (up)
Version: bookmarks-checker 1.0.0
License: GNU General Publi...
PHP version: 5
Categories: HTTP, PHP 5, Files and Folders, Valid...
Description 


This package can check browser bookmark files to identify dead URLs.

It takes a bookmarks file exported from the Chrome or Firefox browsers and checks whether the URLs it contains are still accessible.

The package reports how many links are still accessible and how many it failed to access.

Innovation Award
PHP Programming Innovation award nominee
April 2019
Number 9
Many users take advantage of the bookmark system provided in most browsers to remember important pages that they want to access frequently or remember to access later when they have more time.

However, over time certain pages are removed or become inaccessible for some reason.

This package helps find dead URLs in the bookmarks files used by browsers like Chrome and Firefox, making it easier to clean out bookmarks of pages that are no longer available.

Manuel Lemos
Name: Martin Latter <contact>
Classes: 8 packages by
Country: United Kingdom
Age: ???
All time rank: 129760 in United Kingdom
Week rank: 180 (up), 5 in United Kingdom (up)
Innovation award
Nominee: 5x

Recommendations

The best PHP chrome bookmarks import to MySQL database class
Export bookmarks to HTML, parse the HTML, and import into MySQL

Example

#!/usr/bin/env php
<?php

/**
    * Bookmarks Checker
    *
    * Verify links in a Chrome or Firefox exported bookmarks file using cURL multi.
    *
    * Usage: php bookmarks_checker.php [file]
    *
    * @author Martin Latter
    * @copyright Martin Latter 15/01/2019
    * @version 0.08
    * @license GNU GPL version 3.0 (GPL v3); http://www.gnu.org/licenses/gpl.html
    * @link https://github.com/Tinram/Bookmarks-Checker.git
*/


require('classes/url_checker.class.php');

use Tinram\URLChecker2\URLChecker2;

define('DUB_EOL', PHP_EOL . PHP_EOL);
define('DEFAULT_FILE', 'bookmarks.html');
define('LOG_FILE', 'bookmarks_checker.log');
define('BATCH_SIZE', 200); # size of each cURL request batch


/* filename */
if ( ! isset($_SERVER['argv'][1]))
{
    if (file_exists(DEFAULT_FILE))
    {
        $sFilename = DEFAULT_FILE;
    }
    else
    {
        $sUsage =
            PHP_EOL . ' ' .
            str_replace('_', ' ', ucwords(basename(__FILE__, '.php'), '_')) .
            DUB_EOL .
            "\tusage: " . basename(__FILE__) . ' [filename]' .
            DUB_EOL;

        die($sUsage);
    }
}
else
{
    $sFilename = $_SERVER['argv'][1];
}

/* no such file */
if ( ! file_exists($sFilename))
{
    die(PHP_EOL . ' ' . $sFilename . ' does not exist in this directory!' . DUB_EOL);
}


$sHtml = file_get_contents($sFilename);

$rxPattern = '/<a\s[^>]*href=\"([^\"]*)\"[^>]*>(.*)<\/a>/siU'; /* avoid attributes: by chirp.com.au */

preg_match_all($rxPattern, $sHtml, $aMatches, PREG_SET_ORDER);


$aLinks = [];

foreach ($aMatches as $aLinkEntity)
{
    $aLinks[] = [ 'url' => $aLinkEntity[1], 'name' => $aLinkEntity[2] ];
}

if (empty($aLinks))
{
    die(' No links extracted from ' . $sFilename . DUB_EOL);
}

echo PHP_EOL . ' ' . count($aLinks) . ' links being checked ...' . DUB_EOL;

$oChecker = new URLChecker2($aLinks);

echo PHP_EOL . ' ' . $oChecker->getURLFails() . ' links failed';
echo PHP_EOL . ' ' . ($oChecker->getURLTotal() - $oChecker->getURLFails()) . ' links verified' . DUB_EOL;


Details

Bookmarks Checker

Identify dead links in Firefox and Chrome bookmarks.

Background

So many browser bookmarks: there are 1,900 URLs in mine, and in just one year, 120 of those URLs ceased to exist.

A simple PHP prototype script provided a slow way (~1 URL per second) of checking for dead links.

I switched to Python to leverage its threading capabilities and speed up the process. Then I finally got round to adding cURL multi to the PHP script.
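The thread-pool idea can be sketched roughly as follows. This is a minimal illustration, not the package's actual script: the names `check_url` and `check_urls`, the worker count, the timeout, and the User-Agent header are all assumptions, and the standard library's `ThreadPoolExecutor` stands in for whatever pool the real script uses.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen


def check_url(url, timeout=5):
    """Return True if the URL answers without an error (hypothetical helper)."""
    try:
        # Some servers reject requests without a browser-like User-Agent.
        request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except Exception:
        # DNS failure, timeout, HTTP error, malformed URL: all count as dead.
        return False


def check_urls(urls, checker=check_url, workers=20):
    """Check URLs concurrently and return the ones that failed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with urls.
        results = list(pool.map(checker, urls))
    return [url for url, ok in zip(urls, results) if not ok]
```

Because the work is network-bound, each thread spends most of its time waiting on a response, which is why a pool of workers can be so much faster than checking URLs one at a time.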

Example

$ php bookmarks_checker.php

1883 links being checked ...

error | https://www.nxytimes.com/ | 0 | 4.999007 | nxytimes
<...>

See generated logfile bookmarks_checker.log
URL parse time: 177.642 s

95 links failed
1788 links verified
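For a sense of scale, the figures in this run work out to roughly 10.6 URLs per second, against the ~1 URL per second of the prototype. The arithmetic, using only the numbers printed above (the variable names are illustrative):

```python
# Figures copied from the example run above.
links_checked = 1883
links_failed = 95
elapsed_seconds = 177.642

links_verified = links_checked - links_failed      # matches the "links verified" line
urls_per_second = links_checked / elapsed_seconds  # overall throughput
failure_rate = links_failed / links_checked        # share of dead links

print(f"{links_verified} verified, "
      f"{urls_per_second:.1f} URLs/s, "
      f"{failure_rate:.1%} failed")
```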

Scripts

  • Python 3
  • Python 2
  • PHP and cURL

Usage

Export browser bookmarks.

By default, the scripts attempt to load a file called bookmarks.html from the same directory. An alternative filename can be specified on the command line.

The scripts parse the file and try to access each URL, printing a list of the URLs that cannot be accessed (this list will occasionally include false positives).
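The parsing step can be sketched as below. The PHP example script extracts links with a regular expression; this illustration swaps in Python's standard `html.parser` instead, and the names `BookmarkParser` and `extract_links` are hypothetical rather than taken from the package's scripts.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect url/name pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> works.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            self.links.append({"url": self._href, "name": name})
            self._href = None


def extract_links(html):
    """Return a list of {'url': ..., 'name': ...} dicts found in html."""
    parser = BookmarkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Each entry has the same `url` and `name` keys that the PHP example builds, so the result could be handed straight to a checking routine.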

Python

    python3 bookmarks_checker.py

    python bookmarks_checker_py2.py

(or make the file executable and run directly e.g. ./bookmarks_checker.py)

Switches

-h or --help displays help text.

-f <file> allows an alternatively-named file to be loaded instead of the default bookmarks.html
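A command-line interface with these switches could be built along these lines. How the real scripts parse their arguments is not shown here, so the use of `argparse` and the function name `parse_args` are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the switches described above (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Check exported browser bookmarks for dead links."
    )
    # argparse adds -h / --help automatically.
    parser.add_argument(
        "-f",
        dest="file",
        metavar="<file>",
        default="bookmarks.html",
        help="bookmarks file to load instead of the default bookmarks.html",
    )
    return parser.parse_args(argv)
```

Calling `parse_args([])` falls back to `bookmarks.html`, while `parse_args(["-f", "export.html"])` selects the named file.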

PHP

    php bookmarks_checker.php [file]

Exporting Browser Bookmarks

Firefox

Bookmarks > Show All Bookmarks > Import and Backup > Export Bookmarks to HTML

Chrome

Access Chrome's Bookmark Manager with:

Ctrl + Shift + O

or

chrome://bookmarks/

then click Organize > Export bookmarks to HTML file ...

Other

Python Scripts

Setting DEBUG = True will show all URLs as access is attempted, and either the successful response or the failure error message.

Credits

Doug Hellmann, jfs, and philshem for threading pools in Python.

License

Scripts are released under the GPL v.3.


Files (9)

File                              Role     Description
classes (1 file)
bookmarks.html                    Doc.     Documentation
bookmarks_checker.php             Example  Example script
bookmarks_checker.py              Data     Auxiliary data
bookmarks_checker_prototype.php   Aux.     Auxiliary script
bookmarks_checker_py2.py          Data     Auxiliary data
LICENSE                           Lic.     License text
README.md                         Doc.     Documentation
speed_example.txt                 Doc.     Documentation

Files (9) / classes

File                    Role   Description
url_checker.class.php   Class  Class source
