Monday, July 2, 2012

Moving large data between Oracle database and CSV

I'm currently working on a project that involves importing and exporting large volumes of data between spreadsheet-friendly comma-separated-value (CSV) files and an Oracle database. Although import/export of data can be done using a GUI tool like SQLDeveloper, it is not practical for large volumes or when such jobs need to be scripted. This blog describes how this can be done using BASH scripts and the command-line tools sqlplus and sqlldr.

Scenario

Suppose I run an online shop and maintain a list of purchase transactions with fields such as item_id, customer_id, delivery_status, etc. On my Oracle database, I have PURCHASE_TRANSACTIONS, ITEMS and CUSTOMERS tables with the appropriate joins.

Exporting from Oracle DB to CSV

To export data from the Oracle database table (normally with additional joins and filtering criteria), create a SQL file with:

  • a query to write out the header row of the CSV
  • a query to get the data and concatenate them together in a CSV format

Something like this to export a list of items pending delivery:

set head off;
set feed off;
set trimspool on;
set linesize 32767;
set pagesize 32767;
set echo off;
set termout off;

spool &1;
select 'ITEM_ID,ITEM_NAME,PRICE,CUSTOMER_ID,CUSTOMER_NAME,CUSTOMER_ADDRESS' from dual;
select pt.item_id
|| ',"' || itm.name
|| '",' || itm.price
|| ',' || pt.customer_id
|| ',"' || c.name
|| '","' || c.address
|| '"'
from PURCHASE_TRANSACTIONS pt
join CUSTOMERS c on pt.customer_id = c.id
join ITEMS itm on pt.item_id = itm.id
where pt.delivery_status = 'PENDING'
order by pt.customer_id;

spool off;
exit;

Note the double-quotes around textual data. To execute this, save it to a file named export-pending.sql and write a simple BASH script like so:

sqlplus -S MY_DB_USER/MY_DB_PASSWORD@MY_DB_SERVER:1521/DB_SID @export-pending.sql pending-delivery-report.csv

Running this script will write out the file pending-delivery-report.csv with contents like:

ITEM_ID,ITEM_NAME,PRICE,CUSTOMER_ID,CUSTOMER_NAME,CUSTOMER_ADDRESS
432,"Super Mouse Pad",23.5,11,"John Smith","45 Some Street, Sydney, NSW 2000"
877,"USB Fan",14,11,"John Smith","45 Some Street, Sydney, NSW 2000"
344,"NeverDie UPS",125.3,67,"Alex Jones","101 Scary Road, Melbourne, VIC 3000"
...

This export functionality is typically used to generate spreadsheet friendly reports.

Importing from CSV to Oracle DB

Suppose I have a CSV file items-data.csv containing a new list of items that I want to ADD to the ITEMS table, in the following format:

ID,NAME,PRICE
1001,'Lightbulb',1.95
1002,'Car battery',98
1003,'Super cool camera',450.50
...

To load this into the database, create a control file load-items.ctl with the following:

load data
append
into table ITEMS
fields terminated by ',' optionally enclosed by '\''
(
    ID,
    NAME,
    PRICE
)

Note that in order to REPLACE the entire table's contents with the CSV data, simply change the append instruction (second line) to replace. Finally, create a simple BASH script with the following to run sqlldr:

sqlldr MY_DB_USER/MY_DB_PASSWORD@MY_DB_SERVER:1521/DB_SID control=load-items.ctl data=items-data.csv direct=true rows=1000 skip=1 errors=1 log=load-items.log bad=load-items.bad discard=load-items.discard

After executing this script, check the items-data.log for the progress of the data load. The load-items.bad and load-items.discard files contain the data with errors.

No comments:

Post a Comment