Comments on Tech Talk: Detect broken links on a Web site using wget
Alexander Yap (http://www.blogger.com/profile/03293300286288413519), updated 2023-05-26T20:45:26.724+10:00

Alexander Yap, 2013-01-11T08:55:10.581+11:00:
If you own the site, you can simply run the above script over all the pages to find the pages with broken links.

Anonymous, 2013-01-11T08:38:54.465+11:00:
If you own the site, check the logs: you will have a referrer link. That's the page which linked to the broken link. If it's an external site, it becomes harder.

Anonymous, 2013-01-10T22:05:08.094+11:00:
...but it would be nice to know which page contains the broken link so it can be fixed.

Alexander Yap, 2013-01-09T08:52:55.910+11:00:
If the link is broken, then there is no link between the two pages.
There is no way to figure out what the intended link URL should've been.

Anonymous, 2013-01-09T08:17:09.314+11:00:
I think he means the page that links to the broken link.

Alexander Yap, 2013-01-06T00:27:00.354+11:00:
I believe so unless you specify the --no-parent option.

Unknown (https://www.blogger.com/profile/00626754936723265580), 2013-01-05T06:42:59.326+11:00:
Can you get the parent page?

Anonymous, 2012-09-14T06:31:45.963+10:00:
If you are stuck on a Windows box and have the option to install PowerShell, you can use Select-String to get results similar to the 'grep -B' command above.

Use a Windows version of wget with the same switches as above.

From the PowerShell command line (at the log file path):
>Get-Content .\wget-errors.txt | Select-String -context 2,0 -pattern " 404 Not Found" -allmatches

In this example the -context switch gets the 2 lines above the matching line (and zero lines below it).

If you just want the list of URLs, you can do additional filtering using regular expressions instead of a simple text pattern.
If you redirected the output from the previous command to a file called 404.txt, you could run:

>Get-Content .\404.txt | Select-String -pattern "http://\S+" -allmatches | select matches

TeemuT (http://viheriointia.blogspot.com/), 2011-08-07T04:45:06.953+10:00:
I've noticed that very often when I start to think of a hack I should code, after googling around for some time I realize my Linux box already has the tools necessary for the job installed :)
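
For readers on a machine with neither grep nor PowerShell, the two filtering steps from the 2012-09-14 comment (keep the lines just before each " 404 Not Found" match, then pull out the URLs) can be roughly sketched in Python. This is only an illustration, not the script from the post: the function name `urls_near_404`, the sample log text, and the example.com URLs are all made up here, and the sample only assumes the general shape of a wget log. The regular expression also accepts https, which the comment's pattern did not.

```python
import re
from collections import deque
from typing import Iterable, List

# Assumption: URLs in the log start with http:// or https:// and end at whitespace.
URL_RE = re.compile(r"https?://\S+")


def urls_near_404(lines: Iterable[str], before: int = 2) -> List[str]:
    """Collect URLs from each ' 404 Not Found' line and the `before` lines above it."""
    recent = deque(maxlen=before)  # rolling window of the preceding lines
    found: List[str] = []
    for line in lines:
        if " 404 Not Found" in line:
            # Search the context lines first, then the matching line itself.
            for candidate in (*recent, line):
                found.extend(URL_RE.findall(candidate))
        recent.append(line)
    return found


# Hypothetical log excerpt, for illustration only.
SAMPLE_LOG = """\
--2012-09-14 06:31:45--  http://example.com/ok.html
Reusing existing connection to example.com:80.
HTTP request sent, awaiting response... 200 OK
--2012-09-14 06:31:46--  http://example.com/missing.html
Reusing existing connection to example.com:80.
HTTP request sent, awaiting response... 404 Not Found
"""

if __name__ == "__main__":
    for url in urls_near_404(SAMPLE_LOG.splitlines()):
        print(url)
```

To use it on a real log, pass the lines of the file instead of the sample, for example `urls_near_404(open("wget-errors.txt", encoding="utf-8").read().splitlines())`. If your wget build puts the URL further than two lines above the status line, raise `before` accordingly.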