Thursday, January 22, 2009

Massaging text with Vim

I'm teaching a database design class this semester. And while I could do most of it using a database I already know (and I have been doing the samples so far using SQLite. I'm sure just about all the students are using MS Access), I'm also taking the opportunity to learn PostgreSQL. (I'm using PostgreSQL because I have a project that will require a spatial database at some point. And the only real options are Oracle and PostGIS, an extension of PostgreSQL. Since funding is at a premium, PostGIS it is.) PostgreSQL is a bit of a step for me, because it is a server-based database, and requires administration. So stuff to learn.

The other thing I need to learn is how to get data into it. MS Access, SQLite and Derby all have mechanisms for pulling in your average CSV file. But the general way that databases are populated (and backups made) is via SQL files. But data does not come in SQL files, they come in delimited text. So the way to do this is take a text file, then build SQL around the data.

The procedure is straight forward. After the tables are created in the database (using CREATE TABLE statements) you just read in a text file (usually with a .sql extension just so you can recognize it when you see it again). And the text file is not hard either. It starts out as
INSERT INTO tablename(fieldname1, fieldname2) VALUES ('value1', 'value2');

Now, it is not too hard to convert the delimited text into the 'value1', 'value2' format using a bunch of search and replace operations (and accounting for things like escaped "'" and the like). The trick is getting the rest of the text around it. It could be as simple as a cut and paste. Multiple pastes. But some of these files had 100000s of lines. That is a lot of [CTRL]-[v]

Like a lot of people whose fortune is it to program, I've used a number of text editors to write code. Microsoft Visual Studio, Eclipse, JEdit are my current set. But one editor that I've always kept around is Vim. And it is for things like this I do so.

So, I have something I need to do for a few 100000 times. One way to do this would be to write a script. Not too hard, read a line of text, use it to build a new line of text with all the SQL in the proper place. Write a file. But it is a bit of overhead with having the interpreter running and managing the file handles, etc.

But another way to do this is just write a macro or script inside the editor. But this would not be fun in Visual Basic. Vim, like MS Word, has a macro recorder. Basically, it records a bunch of key strokes and allows you to replay them however many times you need to. So I start by doing one line. Then I put the INSERT INTO tablename(fieldname1, fieldname2) VALUES ( on its own line at the beginning of thie file (before the first line of data). "ayy puts this into a memory register 'a' within Vim (not the system clipboard. I tried this, but it is much slower.) Now the magic starts.

q1 (starts recording in register '1')
J (joins the following line with the current line)
[CTRL]-[a] (moves cursor to the end of the line and switches to insert text mode)
); (finishes off the line of SQL)
[ESC] (ends insert text mode)
j (moves down one line)
^ (goes to beginning of current line)
"aP" (inserts contents of register 'a')
q (stops recording the macro)

Now, to use this, I test this a couple of times to make sure it does what I want.


@1 (replay macro '1')

Looks good.

100@1 (macro '1' 100 times)

Or

500000@1 (macro '1' 500000 times)

and go to bed.
Post a Comment