Friday, August 6, 2010

Detect broken links on a Web site using wget

The other day, I started thinking about writing a simple Web site validator that detects broken links on a Web site, similar to W3C Link Checker. After looking at various code samples in Java, Ruby, etc, I figured out that GNU wget 1.12 on my Linux machine could do the job just fine, with no programming required. It even detected broken resource links in CSS, not just broken <a> links.

Here is how to write a simple script to check the site. First, pretend to be a Mozilla-based browser and spider the site to the depth of one level:

wget --spider -r -l 1 --header='User-Agent: Mozilla/5.0' \
-o wget_errors.txt http://the_site_i_want_to_validate


Then, simply look at the return code to determine if there is any error. If the code is larger than zero, there is an error.

EXIT_CODE=$?
if [ $EXIT_CODE -gt 0 ]; then
echo "ERROR: Found broken link(s)"
exit 1
fi


To find out the actual links in question, just grep for 404 in the wget error log.

BROKEN_LINKS=`grep -B 2 '404' wget_errors.txt`


The -B 2 outputs the 2 lines above any matching line, which in this case contains the broken link in question.

Tuesday, August 3, 2010

Loading large reference database in Android


Reference data normally refers to (mostly) read-only data that is used to validate or resolve other pieces of data. For example, a list of postcodes and suburbs, that can be used to provide auto-suggestions in a UI or validate a user's address.

I recently developed an Android (version 2.1 / Eclair) application that used a large set of reference data, stored in a SQLite database. The fact that this was targeted for Android smartphones introduced some constraints in the loading of this data that would not be present for a desktop or Web application, primarily processing time and storage space.

The initial design was based on my prior experience working on Web applications. It involved packaging a largish (~10MB) text file into the apk archive. The application performed a one-time initialization step that loaded this file, parsed it and executed SQL statements to insert rows into the database. The application then accessed the reference data via a subclass of SQLiteOpenHelper. It soon became apparent that this would not work in the Android world, due to the following reasons.
  • Loading the database, even if its a once-off, took too long. People are not likely to want to wait for many more minutes, especially after waiting for the 10MB download.

  • The database storage took up 10+MB, and when combined with the input text file, totalled 20+MB. Android 2.1 required the application to be installed onto the phone's limited internal memory, where 20MB was a relatively massive chunk.

  • The input text file in the apk archive could not be removed after initialization, so it just uselessly consumed precious phone memory.
After a few tries, I arrived at the following final design that seemed to solve these problems.

Preload database

The database loading code had to be removed from the Android code and rewritten as a separate program. This program read the text input file and inserted the rows into a SQLite database, with the database file stored on my desktop. Note that the primary key column of the reference table should be named _id and the following must be added to the database for it to be usable by Android:

CREATE TABLE android_metadata (locale TEXT);
INSERT INTO android_metadata VALUES('en_US');


This preloaded database file was then uploaded to a Web site so that it could be downloaded by the Android application.

Download database

The Android application didn't need to be packaged with the text input file any more, hence trimming the apk archive from 10+MB to a few hundred KB. The application's starting Activity would check and download the database file if it didn't already exist on the SD-Card. According to the Android Dev Guide on Data Storage, the proper place to save the database file would be:

  • for Android 2.2 and above :- the directory returned by getExternalFilesDir()

  • for Android 2.1 :- the directory starting with getExternalStorageDirectory(), then appended with /Android/data/<package_name>/<file_type>, which resolved to something like /sdcard/Android/data/com.mycompany.myapp/db

I had to use the latter option as I was targeting 2.1.

It was important to ensure that the UI was not frozen during the few minutes it took to download the 10+MB database file. That meant that the downloading code must be executed in a separate thread to the main UI thread. The best way to achieve this was to create a subclass of AsyncTask and put the downloading code in its doInBackground() method. This class also displayed a ProgressDialog to keep the user informed, and acquired a WakeLock (SCREEN_DIM_WAKE_LOCK) to prevent the phone from going to sleep. Another important thing to remember was to prevent the screen orientation from changing when the phone was flipped over while the ProgressDialog was showing, otherwise the application would crash when the dialog was dismissed. This was achieved by temporarily setting the requestedOrientation property of the activity to "no-sensor". Here is the skeleton code of the AsyncTask:

// FileDownloader is my own delegate class that performs the
// actual downloading and is initialized with the source URL.
public class InitializeDatabaseTask extends
AsyncTask<FileDownloader, Integer, Object> {
private ProgressDialog progressDialog;
private File dbFile;
private PowerManager.WakeLock wakeLock;
private Activity activity;
private transient int originalRequestedOrientation;

public InitializeDatabaseTask(Activity activity, File dbFile) {
super();
this.dbFile = dbFile;
this.activity = activity;

wakeLock = ...; // Obtain a wakelock for SCREEN_DIM_WAKE_LOCK
progressDialog = ...; // Create a ProgressDialog instance with title, message ,etc
}

@Override
protected void onPreExecute() {
super.onPreExecute();
originalRequestedOrientation = activity.getRequestedOrientation();
activity.setRequestedOrientation(ActivityInfo.SCREEN_ORIENTATION_NOSENSOR);
wakeLock.acquire();
progressDialog.show();
}

@Override
protected Object doInBackground(FileDownloader... params) {
FileDownloader downloader = params[0];
try {
downloader.downloadTo(dbFile);
} catch (IOException e) {
throw new AndroidRuntimeException(e);
}

return null;
}

@Override
protected void onPostExecute(Object result) {
super.onPostExecute(result);
progressDialog.dismiss();
wakeLock.release();
activity.setRequestedOrientation(originalRequestedOrientation);
}
}

The main Activity used InitializeDatabaseTask like so:

File dbFile = ...; // File pointing to /sdcard/Android/data/com.mycompany.myapp/db
new InitializeDatabaseTask(this, dbFile).execute(new FileDownloader(DOWNLOAD_DB_URL));


Mount SD-Card in emulator

I developed the application mainly using the Android SDK emulator in Eclipse. The emulator did not mount any SD-Card by default. In order to test the downloading code, the emulator must be set up with a SD-Card image file, like so (on my Linux system):

$ android-sdk/tools/mksdcard 64M ~/.android/avd/Android.2.1.avd/sdcard.img

This created a 64MB image file (more than enough for the data) in a special Android avd directory, so that it would be automatically mounted by the emulator.

Implement custom SQLite helper

Finally, I had to write my own custom version of the SQLiteOpenHelper to access the database file from the SD-Card, because the standard one would only read from the default phone internal memory. Given that the reference data would be read-only, this custom class was a lot simpler than SQLiteOpenHelper and only needed to open a read-only database connection. The code:


public abstract class ExternalStorageReadOnlyOpenHelper{
private SQLiteDatabase database;
private File dbFile;
private SQLiteDatabase.CursorFactory factory;

public ExternalStorageReadOnlyOpenHelper(
String dbFileName, SQLiteDatabase.CursorFactory factory) {
this.factory = factory;

if (!Environment.getExternalStorageState().equals(Environment.MEDIA_MOUNTED)) {
throw new AndroidRuntimeException(
"External storage (SD-Card) not mounted");
}
File appDbDir = new File(
Environment.getExternalStorageDirectory(),
"Android/data/com.mycompany.myapp/db");
if (!appDbDir.exists()) {
appDbDir.mkdirs();
}
this.dbFile = new File(appDbDir, dbFileName);
}

public boolean databaseFileExists() {
return dbFile.exists();
}

private void open() {
if (dbFile.exists()) {
database = SQLiteDatabase.openDatabase(
dbFile.getAbsolutePath(),
factory,
SQLiteDatabase.OPEN_READONLY);
}
}

public synchronized void close() {
if (database != null ) {
database.close();
database = null;
}
}

public synchronized SQLiteDatabase getReadableDatabase() {
return getDatabase();
}

private SQLiteDatabase getDatabase() {
if (database==null) {
open();
}
return database;
}
}


A concrete subclass of ExternalStorageReadOnlyOpenHelper was created to query the reference data via the SQLiteDatabase object returned by the getReadableDatabase() method.

The databaseFileExists() method allowed the main Activity to check if the database file already existed to decide whether to initiate download.

Conclusion


This final design had greatly improved the application in the following ways :
  • The user only had to download a small apk file initially, greatly reducing the barrier to installation.

  • The separate download step for the database file presented an opportunity to tell the user what was happening and to use Wi-Fi if available.

  • Most of the data resided on the SD-Card, where space was much more abundant.

  • There was ever only one copy of the reference data on the phone, no redundant duplication.

  • The main application code could be updated without having to download the reference data.