Monday, August 30, 2010

Why I am a researcher

I was asked yesterday about the value of a Ph.D. It is a fair question. When I was in graduate school, occasionally a first-year student would say something to the effect that he was in grad school because he wanted to make money. We all informed him that he assuredly had the skills to do very well with an MBA (or equivalent) from the excellent business school down the block at our university, and that he would make more money for considerably less effort than on the road he was starting on. I did not start my professional life as an academic, so this is a question that in a sense has an answer.

After I finished college I went to D.C. to attend a policy school for a professional master's degree and to work (because students in policy master's programs usually work while in school). Over the next four years I worked for a legislature, a non-profit, a government agency and a government contractor. I did this as a quantitative analyst and as a logistician. And it was fun. I worked on environmental studies and projects that were at the cutting edge of applied economics and modeling. There is work I did that I could see in the news over a decade later, as the projects came to fruition or the issues they addressed began to be considered by commercial enterprises. I had few projects that were routine. There was always the sense that you were developing something new, and while there may be known ways to do that, what you would find along the way was not always obvious in foresight.

But I also realized that the only reason I was working on these projects was because of my boss(es) along the way. And this is not what work would always be like. Looking ahead, I realized that there were really two ways to have a career doing new things like this. One was to work for twenty years and establish yourself at the top of your field (like my last boss). The other was to get a Ph.D.

And this was the value of the Ph.D. Not the knowledge gained along the way, but the opportunities to work on things that were new and hard. At one meeting with non-researchers I explained that I do not want to be working on a problem unless someone else has examined it and failed to find a resolution. And at this time, I can say I have this. Will this continue? As a non-tenure track academic I have a good place working on difficult applied problems, where there are both technical and social issues that need to be solved (as I've commented to one doctor, if the social issue is what is preventing the situation from being resolved, it is an issue that needs to be solved). Is there a place long term for this? One of the criticisms of academia is that it focuses on problems that are theoretically hard rather than hard in practice, even when the reason a problem is hard in practice is the lack of development in theory. In Operations Research, there is pride in working on hard problems, but the definition of hard is based on the technical difficulty of the math, while I tend to use as my benchmark that smart and talented people have tried and failed. Whether there is a place for that remains to be seen.

Monday, August 23, 2010

Why I ____

I remember an article and conversations some time ago asking the question: which is harder, Marriage or Motherhood? Replacing 'Motherhood' with 'Parenting', I think both of us have come to the conclusion that, given how easy Marriage has been, we are going to come down on the 'Parenting is harder' side of the argument. Even though we are not among those people who knew each other for many years, joining lives was not a big adjustment, partly because our lives were (and are) in the midst of ongoing changes, so it was more a merging of two moving highways than a big change. And we both have well exercised sharp elbows. At least neither of us ever picked up the habit of hiding sharp elbows until the worst possible time (or hiding them at all).

People who rhapsodize over marriage or parenting have tended to make both sound very unattractive. Both criticize those who have not yet joined them as selfish. (I sometimes wonder if my friends who criticize their unmarried peers as selfish but do not have kids hear the criticism that those who are married without kids are being selfish.) Both describe the life post-marriage/parenthood as a dying of self and life and the parts of life that are interesting. And the wonderful alternative presented is often something unappetizing. A life without flavor. We've been glad that this has not come to pass in marriage, but parenting looks a little more problematic. To use vocabulary one mother used, working with the things that come in marriage involved skills and attitudes that were already part of our operating systems. We're not sure that the skills for Parenting are. One person who was ready to provide wisdom asked us the question "Are you ready? Most people are not sure they know how to raise a child before their first." Our answer was "Of course not! We're pretty sure that after our first child we still won't know how to raise a child."

The other problematic side is the reality that some aspects of our lives will fall away or change markedly. We are fortunate that we have many examples of people who did not disappear off the face of the earth when they had children, and so we think there is hope.

But, as I tell my graduate students and the post-docs I work with, the reason I write such detailed messages is because I have no memory. And while I still have a nice record of photographs, I do not live life through a viewfinder (and I never will). So to remember, I have to write it down. And so I won't lose it, I am putting my faith in Google (and maybe a printout) as a place of record. So begins a series of notes on the various aspects of my life now. And in a few years, I may look back on it fondly (or not) as we bring a child into this world and all that is before us.

Friday, August 20, 2010

WinBUGS and JAGS differences

One of my graduate students and I have been working on input modeling in a setting where the amount of historical data can range from a lot to almost none (i.e. 0, 1, or 2 occurrences). So both of us have been learning Bayesian methods and developing means to work with our setting. Part of this involves working with the standard tools and building blocks for Bayesian methods, in particular Markov chain Monte Carlo (MCMC). To implement these methods, you can program your own, or use some standard modeling frameworks. Some of these frameworks are essentially programming libraries. But the primary ones are Bayesian inference Using Gibbs Sampling (BUGS) derivatives. There are two families: WinBUGS/OpenBUGS and JAGS.

WinBUGS is the direct successor to BUGS, with OpenBUGS being the open source next generation. BUGS is both a software package and a model specification language for MCMC models. One issue with WinBUGS/OpenBUGS is that it is written in Component Pascal. WinBUGS depends on the BlackBox Component Builder, which is only available on MS Windows (so does OpenBUGS, which is essentially meant to be the open source next generation of WinBUGS once it is done). JAGS is an open source implementation of the BUGS model specification and command language and is written in C++ on top of some open source libraries. JAGS exists specifically because of the inability to port WinBUGS. A review of the literature shows that WinBUGS, JAGS, and various other libraries all have areas of strength and weakness. Research groups working extensively with Bayesian methods use all of them at various times.

But this is our first project. So one thing we did was to examine three frameworks while we were learning Bayesian methods. I primarily used JAGS, one graduate student worked with WinBUGS, and another graduate student (the strongest programmer) used MCMCpack (an R package).

Our intent was to use either WinBUGS or JAGS when we actually implemented our methods on real data, so the question was which to use. Some obvious differences came up:

  1. When running from R, the R2WinBUGS interface opens the WinBUGS application to run the BUGS model. We anticipate having to run this on ~50 separate data sets a day as part of our methodologies. This type of interface is unstable (both theoretically, and practically, as my student's computer ends up hanging a lot).
  2. WinBUGS is a lot more particular about the BUGS model specification. Or, since the WinBUGS people are the ones who developed the BUGS model specification, it is probably better to say that JAGS is much more forgiving. Some gotchas that came up included:
  • When specifying distributions, parameters in WinBUGS could not be expressions, only single variables. I.e. something like (num1*num2) is not allowed as a parameter. So we have to perform all computations first, then specify the distribution separately. JAGS allows passing an expression as a parameter.
  • WinBUGS does not have an exponent operator (either '^' or '**'). JAGS does. Since one parameter that is important and common is the precision, 1/variance, an exponent operator is very useful when the data is expressed in terms of standard deviation. In WinBUGS we have to use expressions like (num1*num1) all over the place.
  • JAGS from R is seamless (R2jags, rjags). You call JAGS, and it returns without opening any IDEs. You can see the JAGS output in the terminal, but JAGS runs on the command line, then is done. No IDE left to close. (R2jags was designed to work the same way as R2WinBUGS, but I think it is much more elegant.)
  • The JAGS documentation includes some more differences, but we have not hit those yet.
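
The two model-specification gotchas above can be sketched with a made-up model. This is not from our project: normal observations y[1..N] with unknown mean mu and standard deviation sigma, where the variable names and the priors are assumptions chosen only for illustration. In JAGS, the precision can be passed to dnorm directly as an expression that uses the exponent operator:

```
# JAGS version: expression with '^' used directly as a distribution parameter
model {
  for (i in 1:N) {
    y[i] ~ dnorm(mu, 1 / sigma^2)   # second argument of dnorm is the precision, 1/variance
  }
  mu ~ dnorm(0, 0.001)              # illustrative vague prior (assumption)
  sigma ~ dunif(0, 100)             # illustrative prior on the standard deviation (assumption)
}
```

In WinBUGS, the same model needs the precision computed first as its own node, using multiplication in place of the exponent:

```
# WinBUGS version: compute the precision separately, then pass a single variable
model {
  tau <- 1 / (sigma * sigma)        # precision from the standard deviation
  for (i in 1:N) {
    y[i] ~ dnorm(mu, tau)           # parameters are plain variables, not expressions
  }
  mu ~ dnorm(0, 0.001)              # illustrative vague prior (assumption)
  sigma ~ dunif(0, 100)             # illustrative prior on the standard deviation (assumption)
}
```

The second form should also run unchanged in JAGS, so writing models the WinBUGS way is one option if a model has to move between the two.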
So, for this project, given that we have a certain amount of risk based on the fact that we are figuring out Bayesian methods and MCMC along the way, we're going to go with JAGS as our engine. Someday we'll get better at using WinBUGS/OpenBUGS and MCMCpack, but not today.

Thursday, August 05, 2010

Book review: The Visual Display of Quantitative Information by Edward Tufte

The Visual Display of Quantitative Information by Edward R. Tufte

My rating: 4 of 5 stars

The book goes through many examples of displaying information visually. And it does so through a historical context, reminding us that the issues that are faced and the many ways to (mis)-represent them have been around for centuries.

What I'm reminded of is that statistics and data analysis are not just about methods; they are a means of communication. And like all methods of communication, they can be made less clear whenever you have something other than clear communication as the goal.

Many of the techniques discussed in the creation of various plots and charts are artifacts of a time when graphics were drawn with pen and ink and were difficult to reproduce. But the book's focus is not on the techniques of making these visual displays, but on the principles of designing efficient displays.

I use a number of data analysis packages and languages: Excel, R, Python, etc. After reading this book I look at these packages and their options differently, wanting to evaluate the choices their designers made. It also makes me look at charts and graphs on internet sites and in newspapers and magazines differently. I imagine the author would consider that to be a success.
