Wednesday, February 04, 2009

Loading lat/long data into PostGIS

Base scenario. I have a datatable with longitude and latitude data that I want to look at using spatial tools. And I have shape files for the metropolitan area that I want to use them with. So, in other words I have the following.

  1. ESRI shapefiles
  2. Data table with latitude and longitude

The plan is to load the data table into PostGIS, then I can pull them into a GIS system like uDig or Quantum GIS or various ESRI products. Because PostGIS can do transforms, but ESRI shapefiles are what they are, the procedure is as follows:

  1. Identify SRID (spatial reference ID) for the ESRI shapefiles. This should include units (feet, meters, or degrees)
  2. Identify a corresponding SRID that used lat long degrees
  3. Load data table into PostGIS
  4. Add geometry (points) data to datatable, including transformation from degrees to feet.
  5. Optional: export PostGIS to ESRI Shapefile if needed.

1. Identify SRID (spatial reference ID) for the ESRI shapefiles. This should include units (feet, meters, or degrees)

Shapefiles usually come as a set of files inside a directory. When they come from an official source, one of these should be a *.prj file. This is an example of one

PROJCS["NAD_1983_StatePlane_Pennsylvania_South_FIPS_3702_Feet",
GEOGCS["GCS_North_American_1983",
DATUM["D_North_American_1983", SPHEROID["GRS_1980",6378137.0,298.257222101]],
PRIMEM["Greenwich",0.0],
UNIT["Degree",0.0174532925199433]], PROJECTION["Lambert_Conformal_Conic"], PARAMETER["False_Easting",1968500.0],
PARAMETER["False_Northing",0.0],
PARAMETER["Central_Meridian",-77.75],
PARAMETER["Standard_Parallel_1",39.93333333333333],
PARAMETER["Standard_Parallel_2",40.96666666666667],
PARAMETER["Latitude_Of_Origin",39.33333333333334],
UNIT["Foot_US",0.3048006096012192]]

The first line identifies the projection, using something that should approach a WKT ("Well Known Text") name of a projection. In PostGIS, when the PostgreSQL database was geographically enabled, a table "spatial_ref_sys" was created that has a list of all SRID with names. Searching on this table should reveal a projection that has a name very similar to this one. That will define the target SRID for the whole project. The quick way to do this is to write a query:

SELECT srid, auth_name, auth_srid, srtext, proj4text
FROM spatial_ref_sys
WHERE srtext LIKE '%Pennsylvania%'
ORDER BY srtext;

In this case, we find:

2272; "PROJCS["NAD83 / Pennsylvania South (ftUS)", GEOGCS["NAD83", DATUM["North_American_Datum_1983",SPHEROID["GRS 1980", 6378137,298.257222101, AUTHORITY["EPSG","7019"]],

Note that in this case the key parts were NAD83, Pennsylvania South, and ft.

2. Identify a corresponding SRID that used long lat degrees
Another search through the "spatial_ref_sys" table finds only projections that are associated with feet or meters. So a broader search should be used to get


SELECT srid, auth_name, auth_srid, srtext, proj4text
FROM spatial_ref_sys
WHERE srtext LIKE '%NAD83%'
ORDER BY srtext;

This eventually reveals one projection that

4269;"GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]], AUTHORITY["EPSG","6269"]], PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]], UNIT["degree",0.01745329251994328, AUTHORITY["EPSG","9122"]], AUTHORITY["EPSG","4269"]]"

The key here is that the 'UNIT' is "degree" as opposed to "Survey feet" or "Meters".

3. Load data table into PostGIS

First, by this point the PostgreSQL database was created using a template that has PostGIS enabled. I created an SQL file with all the Create Table commands ahead of time. They look like this:

CREATE TABLE table2004
(
case_no character(7) NOT NULL,
week integer,
"month" character(3),
eventdate date,
bg_lat numeric(12,6),
bg_long numeric(12,6),
serial2004 serial NOT NULL,
CONSTRAINT table2004_pk PRIMARY KEY (serial2004)
)
WITH (OIDS=TRUE);
ALTER TABLE table2004 OWNER TO userid;

Key point is that I need something an integer field that is a primary key, or at some points indexing becomes a problem. So I made a serial field to be an index (because my case_no field is a text field so was unsuitable). Another issue is the geographic indexing also likes an OID field, so I set OIDS = TRUE. Most current PostgreSQL documentation mentions that this is deprecated, but PostGIS databases would be an exception.

Next, a geographically (geometry) enabled field is added. PostGIS has a PL/pgsql function named "AddGeometryColumn" that does this. Note that earlier, I established that the target SRID is 2272. Next, because there will be spatial searches, I will create an index on the geometry column.

SELECT AddGeometryColumn('public', 'table2004', 'longlat', 2272, 'POINT', 2);

CREATE INDEX idx_table2004_longlat ON table2004
USING GIST (longlat);

Next, the data is loaded. I had exported the data table to a csv file, then used VIM to surround each line with an INSERT INTO statement so it looked like this:

INSERT INTO table2004 (week, month, eventdate, case_no, bg_lat, bg_long) VALUES( 1,'JUL','07/01/2004', 6666097,40.382649,-79.803297);

This loads the data into PostgreSQL. The next step is to actually create the geographic data.

4. Add geometry (points) data to datatable, including transformation from degrees to feet.

Now that there is a geometry column, and the lat/long information in the database as a source, the point needs to be added as a point. This is done through an UPDATE statement, using the PL/pgsql function "transform" "PointFromText". Note that the SRID that corresponded to a latlong representation was 4269. Also note that POINT is expressed in x, y. So longitude comes before latitude.

UPDATE table2004 SET longlat = transform(PointFromText('POINT(' || bg_long || ' ' || bg_lat || ')', 4269), 2272);

Now, this is done, and the table can be viewed using a GIS viewer that can connect to a spatial database.

Some of the data is messy, in particular, I was not able to get lat/long for all the addresses. So I created a View that removed these from the result (or I get a bunch of points at 0, 0

CREATE OR REPLACE VIEW table2004work AS
SELECT table2004.week,table2004.month,table2004.eventdate, table2004.case_no,table2004.bg_lat,table2004.bg_long, table2004.serial2004, table2004.longlat
FROM table2004
WHERE table2004.bg_lat > 1 AND table2004.bg_long < (-1);

ALTER TABLE table2004work OWNER TO userid;
COMMENT ON VIEW table2004work IS 'Remove 0,0 data from map';

So this provides a cleaner dataset, and it is also accessible from a GIS viewer.

5. Optional: export PostGIS to ESRI Shapefile if needed.

To export from the PostGIS database to an ESRI Shapefile (if it needed to be given to someone who only need to look at it), the PostgreSQL database should be running. Then pgsql2shp can be run. This should be done after creating a data directory for the shape file to exist. From the terminal, cd into the target directory and run

pgsql2shp -f table2004working.shp datatable table2004work

This creates a shapefile with .shp, .dbg and .shx files. Note that you may need to create a .prj file as well, but this can be done from within a GIS viewer.


Note: Much of this was found through the great documentation available at Refractions PostGIS site and the Boston GIS website, which has multiple tutorials on PostGIS and PostgreSQL.

[Edit]  In the 'srtext' field, the PROJCS identifies the SRID as belonging to a 'projection' which is a curved surface put on a flat plane.  When there is no projection, it is a pure GEOGCS (geographic coordinate system) and it is in degrees lat/long

No comments: