Exporting data from Microsoft Access to MySQL using a linked table, trigger, UDF

This article describes a simple, but effective integration solution for transferring data on user request from an MS Access database to a MySQL database hosted on a remote web server. The data transfer occurs over an encrypted connection through the Internet, which creates some performance challenges. On the receiving side, the data has to be post-processed, which is achieved using a MySQL trigger and a user-defined function to notify a background daemon process.

The solution is outlined below:

  1. First off, ensure ODBC connectivity between the MySQL Access database and the MySQL database on the remote server. Install MyEnTunnel on the Windows machine to tunnel port 3306 from the local machine to the remote server over an encrypted (ssh) link. Configure MyEnTunnel to keep the connection open at all times. Set up an ODBC data source on the Windows machine to connect to the appropriate database at localhost:3306 (now tunneled to the remote server), using appropriate credentials. Note: I had some trouble configuring MyEnTunnel as a Windows service on Vista (though it worked ok on Windows XP). Adding it as a normal autostart program was a viable workaround.
  2. Based on the connection from the previous step, you can already create a linked table in Access which is mapped to a pre-existing MySQL table. Inserting rows into that table will insert them into MySQL. However, if you try inserting >1000 rows through a DSL connection, you will notice that the whole setup is unbearably slow (several minutes). Same applies to deleting rows. Some suggest setting up pass-through queries to work around the problem. I have not seriously tried this. Instead, I chose to reduce the number of INSERTs necessary by using a single linked table which acts as a "mailbox" on the MySQL side for receiving a "message" (long string) from MS Access. This message is deciphered by a remote daemon process.
  3. A naive approach would be to have the mailbox table contain a single column of type "text" and accept a single row. This won't work, MS Access will truncate everything below 64 KB or even below. You have to split the message into chunks. I chose to have 32 KB chunks. The layout of the mailbox table is as follows:
    CREATE TABLE `mailbox` (
      `id` int(4) NOT NULL auto_increment,
      `msg` text,
      `_ts` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
      `last_row` int(1) default '0',
      PRIMARY KEY(`id`)
    )
    
    To transmit the chunked message, first a DELETE FROM mailbox is executed, then a series of rows, each containing up to 32 KB msg payload are inserted, and all but the last row have last_row=0. The chunking is done by a VBA macro, which basically executes a query in Access, concatenates the rows (and columns) into a string, finally divides up the string into appropriately sized chunks.
  4. The following steps will make the server do something in response to the received data. Generally, you can react to database changes using triggers (provided that your MySQL version is recent enough). What we'd like to do is to have a trigger wake up a daemon process, which can then read the mailbox table, stitch together the received message chunks, and process the message. For that, we'll to add three more pieces to the puzzle: a dummy table, a trigger, and a user-defined function (UDF).
  5. The dummy table looks as follows:
    CREATE TABLE `dbg_tmp_dummy` (
      `dummy` varchar(256) default NULL
    )
    
    It should contain a single row, value doesn't matter. It will become apparent next why the dummy table is required.
  6. In order to notify an external process, I chose to use the tried and true UNIX named pipe mechanism (signals could be used just as well, but you'd then have to run the daemon process under MySQL identity). MySQL is supposed to write a character to a FIFO file, which the daemon process is reading from. MySQL doesn't support writing to files out-of-the-box without overwriting the file (the built-in export function will complain if the target file already exists). To work around this, a little MySQL user-defined function has to be implemented in C, so that the following incantation is possible: SELECT writefile('/tmp/fifo.pipe', 'x'); with the effect of a single byte 'x' being appended to /tmp/fifo.pipe. Given that we are dealing with a named pipe, and that the daemon process might not be listening, the write has to be a non-blocking one. I will skip the implementation details here. Rest assured that the UDF is a simple one. If you are new to UDFs, read the MySQL manual and have a look at udf_example.c shipped with the MySQL source code. Essentially, you compile the C file into a .so library, place the library in a directory searched by MySQL, use CREATE FUNCTION to register the function in your database.
  7. Finally comes the trigger part. To define the trigger, connect to the target database using the command-line MySQL client and enter the following:
    delimiter |
    CREATE TRIGGER mailbox_trigger AFTER INSERT ON mailbox
    FOR EACH ROW BEGIN
    UPDATE mailbox SET dummy=writefile('/tmp/fifo.pipe', 'x') WHERE NEW.last_row=1;
    END;
    |
    
    The above code is quite self-explanatory. The important things to note is that you have to use the special delimiter, you can't just enter the trigger definition having the standard semicolon as the delimiter (you'd get a syntax error). It is now apparent why the dummy table is needed. It is impossible to execute a SELECT statement within a trigger which is not tied to an INSERT/UPDATE. Finally, note how the last_row attribute is used to only send the notification after the full message has been received.
  8. The daemon process might be a Perl script which reads from the pipe:
    sub wait_for_notification
    {
        if (! -p $pipe_path) # create pipe if necessary
        {
            unlink($pipe_path);
            my $prev_umask = umask(077); # rw for owner (mysql)
            system("mknod $pipe_path p", 1, 1) || die "mknod: $!";
            umask($prev_umask);
        }
        open(FL, "<$pipe_path") || die "open: $!";
        for (;;)
        {
            my $buf = '';
            my $ret = sysread(FL, $buf, 1);
        
            if (!defined($ret))
            {
                die "sysread $pipe_path: $!";
            }
            elsif (!$ret) # EOF: we read something
            {
                close(FL) || die "close $pipe_path: $!";
                return;
            }
        }
    }
    
    Beware that notifications will be lost if MySQL writes to the pipe while the daemon is not reading it. For my particular use case, this is not an issue, as the daemon can also periodically poll the database and the notification volume is very low. You could modify the solution to use shared memory with appropriate locking instead of pipes to make sure that no race conditions are possible.

No comments:

Post a Comment