Here is a simple script that checks a site for broken links. First, have wget identify itself as a Mozilla-based browser and spider the site to a depth of one level:
wget --spider -r -l 1 --header='User-Agent: Mozilla/5.0' \
    -o wget_errors.txt http://the_site_i_want_to_validate
EXIT_CODE=$?  # save wget's exit status before another command overwrites it
Then check wget's exit status to see whether anything went wrong. Any value greater than zero means wget hit an error.
if [ "$EXIT_CODE" -gt 0 ]; then
    echo "ERROR: Found broken link(s)"
fi
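The check above treats every non-zero status the same way, but wget's manual documents distinct exit codes (for example, 4 for a network failure and 8 when the server returned an error response such as a 404). A minimal sketch of telling these apart could look like the following; the function name describe_wget_status is made up for illustration and the messages are my own wording:

```shell
# Hypothetical helper: turn a wget exit status into a short description.
# Only a few of the documented codes are singled out; the rest fall through.
describe_wget_status() {
    case "$1" in
        0) echo "OK" ;;
        4) echo "network failure" ;;
        8) echo "server error response (e.g. 404)" ;;
        *) echo "other wget error ($1)" ;;
    esac
}
```

Calling describe_wget_status "$EXIT_CODE" right after the wget run would then make it clearer whether the failure is likely a broken link or a connectivity problem.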
To find out the actual links in question, just grep for 404 in the wget error log.
BROKEN_LINKS=$(grep -B 2 '404' wget_errors.txt)
-B 2 outputs the two lines above each matching line, which in this case contain the broken link in question.
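If only the URLs are wanted rather than the surrounding log lines, the log can be filtered a little further. The sketch below assumes that each request in the wget log begins with a line of the form "--<timestamp>--  <URL>" and that the response status appears on a later line; that layout may differ between wget versions, so treat it as an illustration. The function name broken_urls is hypothetical.

```shell
# Hypothetical helper: print each URL whose response in a wget log was a 404.
# Assumes request lines look like: --2024-01-01 12:00:00--  http://host/path
broken_urls() {
    awk '
        /^--[0-9]/ { url = $NF }                  # remember the most recently requested URL
        / 404 / && !seen[url]++ { print url }     # report each 404 URL once
    ' "$1"
}
```

For example, broken_urls wget_errors.txt would print one broken URL per line.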