dissertation – mattbk.com

Saying No

I recently got back comments on a version of my dissertation draft, so I will again have little time to do much. Things I’ve said ‘no’ to recently: setting up a general Grand Forks online forum and hosting a local nordic ski race (because we don’t have any in town). If anyone is interested in doing either of these things, go ahead, because I can’t invest the time in them right now.

Using R to Create a Winner-Takes-All Raster from Point Data

I am to the point in my dissertation where I need to summarize a fair amount of data. The full data structure will be in the final manuscript, but for now suffice it to say that I am storing data as latitude/longitude pairs in a flat-file database (and using Panorama Sheets to query and modify it as needed). Point data can be exported from this database as a CSV (actually tab-separated) file and converted to a shapefile in QGIS.

For this project, the globe is divided into 1×1 degree cells. Each cell in my study area contains at least one vector point in my locality shapefile (exported from te database, above). Many cells contain multiple points, each of which may represent a locality interpreted as representing a different paleoenvironment than the others. I am using R to summarize these data points into a raster for each time slice I wish to study.

The method below emplys a winner-takes-all method, which is to say that during conversion from point data to raster, the points within each cell are evaluated for the values in the field ‘env_id’ (environment ID number, since we can’t use text in a raster), and the most frequent value of ‘env_id’ “wins” the cell and becomes the value burned into the output raster.

### Converting a point shapefile (with environment in numeric
### field env_id) into a raster
### Winner-takes-all.

# Load libraries for raster, maptools and mode
library(raster);
library(maptools);
library(modeest);

# Read in the locality point shapefile
locs<-readShapePoints("shapefile.shp");

# Make the base raster.
basegrid<-raster(ncols=360, nrows=180);

# Create a subset for the age you are looking at
locs_subset<-subset(locs,locs$time_slice==10);

# Convert the locality shapefile to raster.  Use the most frequent
# value of numeric field 'env_id'.
r<-rasterize(locs_subset, basegrid, field='env_id', fun=function(x, ...){mfv(x)[1]});

# View the plot (centered on SA)
plot(r,xlim=c(-100,-30),ylim=c(-70,52));

# View the plot (all)
# plot(r);

# Write the raster to file
writeRaster(r,"10 Ma.gtiff","GTiff");

Output from R looks something like the top image, and can be manipulated as any other raster in QGIS etc. Scale bar shows the numeric values used to represent different paleoenvironments in the raster; it’s not actually a continuous scale but I didn’t have a need to change the default R output.

Additional notes:

In rasterize(), to use a function other than the ones enumerated in the docs, it needs to be defined within the call to rasterize() is used above and here.
One example of reading in shapefile data to R is here.
I learned about package modeest here.

Proof of Concept – Paleogeographic Maps and _Diplodon_

This figure has taken me a good deal of time to make. Not really in the actual production, but it’s been a long time gestating since conception.

The genus Diplodon, as determined by the specimens to which that name has been applied, has been around since the Middle to Late Triassic. In the dissertation dataset, this works out to the 220 Ma time slice, or the Carnian stage. This is a map of what the world may have looked like at about that time period*.

Why is this important? In general, it’s important because it shows the geographical relationship among these occurrences as it may have been when these organisms were alive. Many paleogeography or historical biogeography papers ignore what the past geographic relationships may have been and focus on mapping a paleolandscape or biogeographic distribution onto a modern map.

Consider the possibility that these occurrences are not the earliest record of this genus (you would be right). If you were looking for additional material with only these four occurrences on which to base your search, you would look geographically nearby. Looking at a modern map would limit you to southern and eastern North America, but as you can see from the figure here, the paleogeography could support a South American or even African population. (I’ll tell you later why this this probably won’t work out.)

For the dissertation, this map is important because it (and others like it) can help show how far this genus is about to spread, and how long this is going to take. You may remember that I’m more interested in names than evolutionary relationships, so I hope to answer the question: how much time and space does there need to be between occurrences before we throw up our hands and say “this genus can’t possibly have survived that long?” The map series will help define where (and where not) there was a chance for lineage continuity.

*The background map, an achievement in itself that I take no credit for, is a product of Ron Blakey and Colorado Plateau Geosystems Inc. The positions of the continents are supported by Chris Scotese’s plate tectonic reconstructions as part of the Earth System History GIS collection. The positions of the Diplodon occurrences were mathematically rotated to these positions using the PointTracker software, also from Scotese.

Getting Started with Spatial Modeling Environment

UPDATE: This attempt has been abandoned at the advice of one of the SME developers. R was suggested as an alternative.

Sometime this week I hope to get the Spatial Modeling Environment up and running on at least one of my computers (office Windows 7 PC, Macbook Pro OS X Snow Leopard, or Ubuntu 12.04(?) in VirtualBox), but I’m posting this as a shoutout to anyone who has attempted this before: the README is pretty technical, and I could use some help.

This is also a note to developers (even if they are scientists) who write “user-friendly,” “icon-based” software and then make you jump through command line hoops to install it. Stop it. What are you trying to accomplish? The more people who can install your software, the more people will use it, and the better it will become.

I don’t think I’m being unnecessarily harsh. Luckily, I really want to use this software and I’m fairly comfortable following detailed specifications and dealing with the command line, but there are others who aren’t. Hopefully I can follow the directions and install this software and use it for my dissertation; hopefully I can put together some sort of installation tutorial that is clearer than the README; and hopefully this will help someone in the future.

P.S. I’m working through Landscape Simulation Modeling this week as well, and I’m pretty pumped to try SME. How’s that for an endorsement?

Work in progress: genus duration of freshwater mussels in the Hyriidae

My dissertation is dependent on the assumption that people make mistakes in identification and naming of fossil and modern organisms. In particular, I am proposing that certain freshwater mussel genera in the family Hyriidae have supposed taxon ranges that are far longer than they should be, however in order to be taken at least somewhat seriously I need to show that this could be the case and select candidates for further investigation.

The first figure is a simple taxon range diagram for several genera that are agreeably within the Hyriidae. The mean indicators can be ignored. It is apparent that three genera in particular stand out as being long-lived. This could be for a number of reasons: the genera may actually have survived for such long periods of time, certain specimens may have been misidentified, or certain nomenclatural lumping may have occurred inappropriately.
The second figure includes more information. The width of each bean is proportional to the number of occurrences of each taxon through time. Note that the full range of each taxon is not displayed on the second figure because it was produced from age estimates from (in most cases) surrounding stage boundaries.

I will leave it to the reader to determine whether I am on the right track.

Plots were produced in R using the function below, which is being released under the CRAPL. Data is from my personal locality occurrence database, which will become available on the completion of my dissertation.

ranges<-function(locfile,genera=c("Alathyria","Velesunio"),type="box",columns=c("D_no_dissertation_id","genus_bogan","species_source","ref1","age_start_ma","age_end_ma")) {
 # Make sure beanplot is available.
 library("beanplot");
 # Read in the data from a CSV file.
 localities<-read.csv(locfile);
 # List the genera you want.
 # genera<-c("Alathyria","Velesunio");
 # Create a place to store the selections.
 selection<-list();
 start<-list();
 end<-list();
 select<-data.frame();

## For bean plots
 # Grab the whole selection
 select<-subset(localities,localities$genus_bogan %in% genera & localities$age_start_ma!="NA" & localities$age_end_ma!="NA",select=columns);
 # Add mean dates to a data frame.
 select$mean<-ave(select$age_start_ma, select$age_end_ma);
 # Get rid of the unneeded genus names in the subset.
 select$genus_bogan<-factor(select$genus_bogan);

if(type=="box") {
 ## For box plots
 # Loop through the genera
 for (i in 1:length(genera)) {
 # Grab the columns you want.
 selection[[i]]<-subset(localities,localities$genus_bogan==genera[i] & localities$age_start_ma!="NA" & localities$age_end_ma!="NA",select=columns);
 # Sort by column age_start_ma (not needed at the moment).
 # selection[[i]]<-sort(selection[[i]],by=~"age_start_ma")
 # Find the start and end dates.
 start[[i]]<-max(selection[[i]]["age_start_ma"]);
 end[[i]]<-min(selection[[i]]["age_end_ma"]);
 }
 # Make the start and end lists into a matrix...the long way around.
 df<-data.frame(start=unlist(start),end=unlist(end));
 # Transpose the matrix.
 toplot<-t(as.matrix(df));
 # Make a box plot. Don't need to worry about whiskers because there are only two values. The y-axis is reversed.
 boxplot(toplot,names=genera,ylim=rev(range(toplot)),ylab=c("Ma"));
 # Show the data being plotted.
 print(toplot);

 } else if(type=="bean") {

 # Make a bean plot. This is more complicated. The y-axis is reversed.
 beanplot(select$mean~select$genus_bogan, ylim=rev(range(select$mean)), cut=0, log="", names=levels(select$genus_bogan), what=c(0,1,0,0),bw=20,col = c("#CAB2D6", "#33A02C", "#B2DF8A"), border = "#CAB2D6",ylab=c("Ma"));
 }

}

GIS: Raster cell centroids as vector points with GDAL and R

Short intro: I’m using raster data for part of my dissertation, but I’m storing the paleoenvironment data in a database as the latitude and longitude of the center of each grid cell. This lets me query my environment localities, add new ones, and make specific datasets.

Sometimes I take existing maps of other people’s environmental interpretations and digitize them, leaving me with vector polygon data. I need to convert the polygons into the same kind of data I’ve been storing. This is surprisingly hard in both QGIS and GDAL alone, so I finally figured out how to do it in R today courtesy of Simbamangu over at StackExchange.

See the attached images for how things look at each stage.

Below are the general terminal commands, first in GDAL and then R. I will try to add more rationalization and explanation later. My operating system is Mac OS X but these should be modifiable.

GDAL in your terminal

gdal_rasterize -at -where 'freshwater = 1' -a freshwater -ts 360 180 -te -180 -90 180 90 -l "source_layer_name_in_shapefile" "shapefile.shp" "output_raster.tif"
Produces a geoTiff of your input polygons where the attribute ‘freshwater’ is equal to ‘1’. For more detail I invite you to read the GDAL documentation, it’s quite good.

R

To run through a single file:
# load necessary libraries library(maptools); library(raster); library(rgdal);


# Load the raster from a file

r <- raster("rastername.tif")

# Convert to spatial points
p <- as(r, "SpatialPointsDataFrame")

# Set name of column of raster values for later use
names(p)[1]<-"value"

# Get only the values we want (values above zero) in that column
psubset<-subset(p,p$value>0);

# Save as a shapefile (extension appended automatically) writeSpatialShape(psubset, "outputfile");

You can also loop through a directory:
# load necessary libraries library(maptools); library(raster); library(rgdal);


# Set working directory

setwd("/my/working/directory/");
# list all .tif files in working directory

for (file in dir(pattern="*.tif")) {

  r <- raster(file);
  p <- as(r, "SpatialPointsDataFrame");

  # Set name of column of raster values
  names(p)[1]<-"value";

  # Get only the values we want (values above zero)
  psubset<-subset(p,p$value>0);

# Make a new filename oldname<-substr(file, start = 1, stop = nchar(file) -4); newname<-paste(oldname,"pts",sep=" "); # Make a shapefile writeSpatialShape(psubset, newname) ; # Make a CSV file # Convert to data frame so you can drop extra columns psubsetdf<-as.data.frame(psubset) write.table(psubsetdf[,c(2,3)],file=paste(newname,".csv",sep=""),sep=",", row.names=FALSE); }

[notes] Converting MRSID to GeoTIFF using GDAL in Mac OS 10.6 (Snow Leopard)

Just found out how to convert MrSID files (from NZ geology) to other formats using GDAL.

Followed instructions here: http://www.photo-mark.com/notes/2008/nov/12/converting-mrsid-gis-files/
Needed most recent version of GDAL with frameworks (1.8 Complete): http://www.kyngchaos.com/software/frameworks
Had to change command to “/Library/Frameworks/GDAL.framework/Versions/Current/Programs/gdal_translate -of PNG /Users/Matt/Desktop/whangarei/inputmap.sid /Users/Matt/Desktop/whangarei/outputmap.png”
That makes an image file, but I think I’ll convert to geotiff for GIS:
- /Library/Frameworks/GDAL.framework/Versions/Current/Programs/gdal_translate /Users/Matt/Desktop/whangarei/inputmap.sid /Users/Matt/Desktop/whangarei/outputmap.tif
Map source: http://www.gns.cri.nz/Home/Our-Science/Energy-Resources/Geological-Mapping/Geological-Maps/1-250-000-QMAP/Map-downloads
Map coordinate system: New Zealand Map Grid (Geodetic Datum 1949)
gdal_translate man: http://www.gdal.org/gdal_translate.html

Some translation help for the Brazil 1:1,000,000 geology map from CPRM

This is not complete, but hopefully helps out other English-but-not-Portuguese speakers like myself.

Here are translations (as best as I can figure them out) of the headers in the attribute table for the Brazil 1:1,000,000 geology map.
SIGLA_UNID = unit code
COD_UNI_ES = ?
SIGLAS_ANT = acronym something
NOME_UNIDA = unit name
HIERARQUIA = unit type (formation, lithofacies, unit, etc.)
IDADE_MAX = maximum age (years)
ERRO_MAX = assuming the plus/minus on IDADE_MAX
EON_IDADE_M = eon (Proterozoic, Phanerozoic, etc.)
ERA_MAXIMA = maximum possible era
PERIODO_MA = maximum possible period
EPOCA_MAX = maximum possible epoch
SISTEMA_GE = maybe what defines the type of unit it is? things like paleontology, structure, isotope, etc.
METODO_GEO = ?
QLDE_INFER = something about direct versus inferred measurements
IDADE_MIN = minimum age (years)
ERRO_MIN = assuming the plus/minus on IDADE_MAX (years)
EON_IDADE_1 = eon (not sure how this differs from EON_IDADE_M)
ERA_MINIMA = minimum possible era
PERIODO_MI = minimum possible period
EPOCA_MIN = minimum possible epoch
SISTEMA_1 = see SISTEMA_GE
METODO_G_1 = see METODO_GEO
QLDE_INF_1 = see QLDE_INFER
AMBSEDIMEN = first-order environment, so “continental,” “marine,” “transitional,” etc. values (“ambiente” is environment)
SISTSEDIME = second-order environment, so “deep,” “siliciclastic platform,” “fluvial,” etc. values
TIPO_DEPOS = third-order environment, so “pelagic,” “delta plain,” etc. values
ASSOC_MAGM = maybe magma source type?
NIVEL_CRUS = crustal level?
TEXTURA_IG = igneous texture
FONTE_MAGM = magma source
MORFOLOGIA = igneous morphology (like “batholith”)
AMBIENTE_T = tectonic environment
METAMORFIS = metamorphic type
METODO_G_2 = see METODO_G
TEMP_PICO = peak temperature
ERRO_TEMP_ = plus/minus on TEMP_PICO
PRESSAO_PL = pressure something
ERROR_PRESS = plus/minus on PRESSAO_PL
TIPO_BARIC = how something was determined
TRAJETORIA = how something was determined
AMBIENTE_1 = ?
LITOTIPO1 = lithotype
LITOTIPO2 = secondary lithotype
CLASSE_ROC = igneous, sedimentary, or metamorphic
CLASSE_R_1 = see CLASSE_ROC, with less values filled in
BB_SUBCLAS = subclass of CLASSE_ROC
BB_SUBCL_1 = subclass of CLASSE_R_1, I think
OBJECTID = the object ID
IDADE_MA_1 = see IDADE_MAX (all NULL)
ERRO_MAX_1 = see ERRO_MAX (all NULL)
EON_ID_MAX = maximum possible eon (EON_IDADE_M?)
PERIOD_MAX = see PERIODO_MA?
MET_ID_MAX = looks like method of determining maximum age
MET_DAT_MA = looks like the details of MET_ID_MAX, like what type of dating was used
QLD_ID_MAX = (quality?) whether the maximum age was determined with direct or indirect methods
IDADE_MI_1 = see IDADE_MIN
ERRO_MIN_1 = see ERRO_MIN
EON_ID_MIN = maximum possible eon (EON_IDADE_M?)
MET_ID_MIN = looks like method of determining minimum age
MET_DAT_MI = looks like details of MET_ID_MIN, like what type of dating was used
QLD_ID_MIN = (quality?) whether the minimum age was determined with direct or indirect methods
AMBSED = see AMBSEDIMEN
SISTSED = see SISTSEDIME
TIPO_DEP = see TIPO_DEPOS
TEXT_IGNEA = see TEXTURA_IG
AMB_TECTO = see AMBIENTE_T
METAMORF = see METAMORFIS
TRAJET_PT = see TRAJETORIA
CLASSE_RX1 = see CLASSE_ROC
CLASSE_RX2 = see CLASSE_RX1
SUBCLA_RX1 = see BB_SUBCLAS (of CLASSE_RX1)
SUBCLA_RX2 = see BB_SUBCL_1 (of CLASSE_RX2)
Shape_Leng = length of polygon (circumference?)
Shape_Area = area of polygon

Wanted (and some found): Geologic maps (formation contacts) of South America countries, for use in GIS

Here’s a good ol’ plea for help from the scientific community. As my questions to “the scientific community” via Academia.edu have gone unnoticed, I’m posting this out here to see if anyone else searching for the same thing has had any luck. I’m building a GIS (geographic information system) model to determine the possible biogeographic districbution of a genus through time. What I need for this, since the fossils are from South America, is a good geologic map, either of most of the continent or of the countries of Brazil, Argentina, Ecuador, and Peru. Hunting around online hasn’t shown me anything that I want to shell out a bunch of money for, site unseen, and I’m honestly trying to avoid having to scan paper maps and register them (additionally, I haven’t been able to access our worldwide collection of paper geologic maps recently). 1:500,000 or 1:100,000 would be great, but I’d even take 1:1,000,000 at this point. I need something with formational contacts so I can plot possible distributions. So that’s the challenge of the day: have any geologists in South America discovered a good source for this type of material? Would you be willing to share or trade? Drop me an email at one of the addresses on the right sidebar if you can help.

Brazil

UPDATE 2011-08-15: I found (finally) an online version of the Geological Map of Brazil at 1:1,000,000 scale. Unfortunately it is only available in a viewer, broken up into small sections. Anyone know where I can download the full dataset, or at least that data for those sections? UPDATE 2011-08-15 1431: Thanks to Sidney Goveia over at Geosaber, I tracked down the Brazil data to Geobank, so here goes nothing! If you are using a Mac, make sure you extract the contents of the ZIP files using StuffIt Expander (if you have it) rather than Archive Utility, otherwise you don’t end up with a folder.

Peru

UPDATE 2011-08-16: You can download a formation-level geological map of Peru at the 1:1,000,000 scale from INGEMMET – the Instituto Geologico Minero Y Metalurgico. There are a few steps through the online viewer (note: this is the new viewer, so the following steps may not work the same), which may appear in Spanish at first but for some reason decided to reload partway through in English. If you want, you can try to figure out how to turn on the layer under the Map–>Geodatabase menu, but really when that menu comes up you want to click the Download/Descargas folder icon, then select the Geologia layer (SHP icon next to it). It also looks like you can download the data as a KML file for Google Earth. A structural layer (Dominios Estructurales) also looks available, and I checked out the radiometric date layer (Dataciones Radiometricas) as well, which could be useful. You might need to check the projection on the geologic map once you download it. I imported it to QGIS at first and it came up as WGS 84 (probably because of the existing project), but I dug around and figured out that it works as PSAD_1956_UTM_Zone_18S. Need a map key for formation symbols? You can download scans of the paper version of this map from here, one of which has the legend. If the correct map doesn’t come up at that link, click on “Ministerio de Energia y Minas. Instituto de Geologia y Mineria” so see the others. The symbology is different. An alternative large-scale (1:100,000) but low-quality set of maps is available here but will not easily go into GIS.

Wanted: Geologic maps (formation contacts) of South America countries, ideally for use in GIS

Here’s a good ol’ plea for help from the scientific community. As my questions to “the scientific community” via Academia.edu have gone unnoticed, I’m posting this out here to see if anyone else searching for the same thing has had any luck. [Since then, I’ve discovered ResearchGate, which I generally like better for questions like this. 2014-03-05]

I’m building a GIS (geographic information system) model to determine the possible biogeographic districbution of a genus through time. What I need for this, since the fossils are from South America, is a good geologic map, either of most of the continent or of the countries of Brazil, Argentina, Ecuador, and Peru. Hunting around online hasn’t shown me anything that I want to shell out a bunch of money for, site unseen, and I’m honestly trying to avoid having to scan paper maps and register them (additionally, I haven’t been able to access our worldwide collection of paper geologic maps recently).

1:500,000 or 1:100,000 would be great, but I’d even take 1:1,000,000 at this point. I need something with formational contacts so I can plot possible distributions.

So that’s the challenge of the day: have any geologists in South America discovered a good source for this type of material? Would you be willing to share or trade?