Retain All Column Headers in QGIS on Join

Join and Merge are two similar but different processes for combining tables. Merge is a simple stacking of records (also called an append) of two table with identical fields. The result is more records and the same number of fields.

Join is the combining of tables where each as a field containing identical records, and combining all additional fields from each table into a new table. The result is the same (or more records and more fields.

There is always the potential for a non-uniqueKey field and a non-foregnKey field to have the same name in both tables. QGIS’s default solution to the potential of duplicate field names in the resulting joined table is to sequentially rename each field after its table of origin. Just like the 2 GB shapefile size limit, the 10 character shapefile column name restriction is not something easily ‘outsmarted’ – so just be aware of that for today.

  • tableOne.joinField
  • tableOne.Name
  • tableOne.Address
  • tableOne.Coordinates
  • tableTwo.joinField
  • tableTwo.Name
  • tableTwo.Occupation

Becomes

  • tableJoin.joinField
  • tableJoin.tableOne1
  • tableJoin.tableOne2
  • tableJoin.tableOne3
  • tableJoin.joinField
  • tableJoin.tableTwo1
  • tableJoin.tableTwo2

But the desired result is:

  • tableJoin.joinField
  • tableJoin.name
  • tableJoin.address
  • tableJoin.coordinates
  • tableJoin.joinField2
  • tableJoin.name2
  • tableJoin.occupation

A more functional alternative to this problem would be to have the opportunity to specify new field names manually – or – to require the addition of “2” in position 10 (or position 9 depending on what model you’re using, but you get the gist here).

Manual solution:

In QGIS, perform a join by selecting Layer Properties>Joins and add a new join. The new shapefile can then have fields manually renamed in the attribute table using Table Manager. This is a nice write-up by Darren Cope.

OGR Rename solution:

After running your join, you can use an sql query to rename your headers – this example is also reprojecting the shapefile ( -t_srs EPSG:4326 ):

ogr2ogr -sql “select OGR_GEOMETRY as the_geom, tableOne1 as name, tableOne2 as address, tableOne3 as coordinates, tableTwo2 as name2, tableTwo2 as occupation, from ArapaMap_Parcels” -t_srs EPSG:4326 C:\output.shp C:\input.shp

QGIS 2.6 Processing Modeler

Processing Modeler automates geoprocessing, analysis, data management, and many other tools.

Within the processsing modeler you only have to add these steps once. After saving the model you can easely re-use it.

Enter the processing modeler through processing > graphical modeler

qgis_processing1

 

Which invokes this wizard

qgis_processing2

Activate the tab Input as shown above. Double click ‘Vector layer’, you get:

qgis_processing3

Fill in a descriptive name, like “original shapefile”. For an example, here is a layer of parcels for a single county.

Note: Throughout this article, the example will have real table and field names, however it is advised when building a model to use generic descriptors so that the model can easily be applied to other data in need of joining.

WMA-qgis join batch 1

Double click ‘Table field’:

qgis_processing4

Fill in a descriptive name that clarifies what layer is the target layer, like “Field to join with”. The parent layer is automatically set because at this point there is only one input added. In the example, the Parameter name is the same name as the key column name – Parcels.shp, field “Parcel_NO”.

Now repeat the process to identify the table to join. Double click ‘Table’ and give it a descriptive name, like, “Table to be joined”. Double click ‘Table field’, fill in a descriptive name to identify the table, like “joinfield”. Make sure the parent Layer is set correctly to the name given to the table. Continuing with the County Parcel example, the table and field have been given their original names:

Owner_Address.folio

WMA-qgis join batch 2

Following the identification of the input layers, the next step is to specify which actions to run. This is done through the Algorithms tab.

qgis_processing5

All the scripts an algorithms that are used in the QGIS-interface. There is a search field at the top that helps easily navigate to the right algorithm.

qgis_processing6

Double click on ‘join attributes table’ and follow the GUI to make sure all parameters are aligned.

qgis_processing7

Specifying only an output layer name will generate a temporary layer that is added to the workspace. This layer can be ‘saved as’ to store a hard copy. This method is best for running trials, so only the desired result is saved. If all troubleshooting has been accomplished, a full path can be put in the output destination as opposed to saving as a temporary file.

Press ‘ok’ to see the completed model.

qgis_processing8

If this is a process that will be run more than once, give the model a name and save it.

The final step is running it. Here’s a quick look at both attribute tables before the join. Notice that Owner_Address.csv has 131109 unique records and Parcels.shp has only 121044 unique records.

WMA-qgis join batch 7

To run the entire model as an individual job, double click the gear icon to “run”. Ensure all parameters are aligned for input shp and table field names, and hit run.

WMA-qgis join batch 3

Press ‘Run’ and see if everything goes fine. If not, change parameters in the processing modeler and try it again. In this example though, it is clear that all attributes and unique file header names has been retained for the input shp Parcels and the joined attribute table Owner_Address.

WMA-qgis join batch 4

A final point of interest is that the output table has only 121044 records. This means that an “inner join” was run. For the difference on left, right, inner, outer and full outer joins, check out this article. This is definitely something to be aware of when joining tables. Depending on the needs of the project, it might be of greater advantage to look to other options for joining tables – like a custom JavaScript or in some cases csvkit will also work (although the attribute table of the shp will need to be exported to csv for this to work).

The Ultimate Join and Retain Headers Test

The good news is that the caveat of randomly assigned header names has been handled with this GUI. The final test, of course, is what happens when there are two fields not identified as the primary or foreign key.

To run this test, a simple addition of a dummy column with the name “Parcel_NU” to Owner_Address with differently formatted values was added.

WMA-qgis join batch 5

The join was re-run, and lo-and-behold! All field names were retained!! And the join table’s duplicate field name “Parcel_NUM” was changed to “parcel_n_1” – YAAY!!!

WMA-qgis join batch 6

Unfortunately, though, only the headers merged, and none of the data was imported – notice that all fields from Owner_Address are <null>.

 

In fairness, I only tested this once. So I have no troubleshooting recommendations or solutions to offer as of yet.

 

 Batch Processing with Processing Modeler in QGIS

For a batch job, find the job tracking interface in the Processing Toolbox.

qgis_processing10

Depending on the complexity of the jobs and timeline of the task, jobs can be run individually (one after the other) and simultaneously (depending on your file size, RAM and CPU power – anywhere between 1-6 jobs at a time). Check out this run-down of batch processing, because there is way too much info for that topic to not have its own article.

qgis_processing11

 

Margaret Spyker

Trackbacks & Pings

Leave a Reply Text

Your email address will not be published. Required fields are marked *

Powered by WishList Member - Membership Software