This summer is going by quickly. Next month T starts Kindergarten, which is probably the point where others start to have a significant impact on his growth. (While his daycare and preschool staff are wonderful caregivers, we are clearly the ones driving T's attitudes toward how he approaches his world, as well as the skills he develops: academic, social, and physical.) Some touchpoints:
1. He has learned to take pride in competency and the effort it took him to get there. This shows in reading, playing piano, taekwondo, building things (LEGO and wood), helping with cooking and chores around the house, and most facets of life.
2. While we are reserving judgment on how smart he is (yes, we are pushing him into Kindergarten ahead of schedule, but that is as much a result of preschools advancing kids on their birthdays while the public school system advances them in cohorts as anything else; he has been in pre-K this past year since his birthday, and the standard public school timeline would have him repeating a year before he caught up with it), he does have an attention span and a pretty good level of perseverance (for a 4-year-old). And we think the fact that he starts his school years with the attitude that it is good to work hard at something difficult is a success on our end.
3. We have had guests with us this week. One of their comments was that he seems like the type of kid who will never complain about things being unfair because he did not get something. We are pretty sure that he has learned to value people over things (although, if the things in question are books, it is a close call), and I think that half of the attraction of the usual rewards like candy, toys, and trophies is that he picks up from others that he is supposed to like them. (We occasionally have candy lying out and don't realize there is an issue until other kids come into the house, because T won't go after it.)
4. He likes helping around the house. Actually, when he makes a mess, he tries to clean up before we find out about it. It is really funny when he needs supplies to do it. Last week he was asking me to help him get paper towels, "but don't come downstairs to look at the table."
5. He approaches the world engaged and looking around. We see that in his visits to museums, on walks in parks and on trails, and especially in taekwondo (my impression is that the biggest differentiator among the pre-schoolers is their ability to focus on the teachers and the class).
6. He is generally happy. The main testimonials are a couple of kids who are in both his pre-school and his taekwondo school and have stated that "T is always happy."
7. He plays with other kids. This was a bit of a concern up to when he was around 3 1/2 or so. One thing that helps is that there are a number of kids just a little older than him on our street (mostly girls). Now the preschool teachers say that our little turtle is out of his shell and they have to tell him to stop talking. One of the joys of taking him to taekwondo is that we get to watch him interact with the other kids; while we don't encourage it, he chats with all of them as they wait their turns for drills and such. He is one of the smallest kids in the school (I think he is now the second smallest; one of the owners refers to him as one of their peanuts). We don't ever see him being one of the really talkative kids in his class, and we expect that he will always take time to get used to new adults (neither he nor his sister were ever pass-around kids).
8. He is learning how to present what he knows. Since we actually talk about what is going on at the museum or science center, we have been having him show the other kids and parents (an advantage of having academics as parents: most of the time we actually know more about whatever we are looking at than the kids volunteering there). These past few months he has been getting markedly better at it.
9. He adores his little sister (and she does likewise). From day 1 he has enjoyed being the big brother and getting to play with and help care for his baby sister.
10. He still does not sleep on his own. Back when he was a baby with colic, the pediatrician warned us that he probably would not sleep on his own until he was four. Well, he is four; what's up? (jk) At this point he can sleep by himself and occasionally does, but he really likes to check every now and then that someone is around. And he likes the snuggle. And, well, since I can pretty much sleep through the occasional check-in, it does not bother me, so why not. I'm sure that there will be an end to this.
11. He is very careful. In some sense this is good. There are a lot of things we never had to worry about: he tested things before trying to eat them, we never really needed baby gates with him (because he figured out how to go backwards down the stairs before he got too fast for us to catch), and we can let him cut food with a (plastic) knife. We probably never did a complete job of babyproofing things. But this goes along with not being daring about trying new things until we go through them with him first.
12. He is not very creative. This somewhat goes with being careful, but he never had the phase of coloring and drawing with abandon. He went from scribbling straight to deliberately outlining shapes, skipping the randomish stick drawings that are stereotypical of pre-schoolers. I've started him on learning to draw using primitives (assembling basic shapes until they look like something), but the wild abandon that many kids keep through early elementary school never took with him.
13. He is no longer the island of stillness in the chaos of toddler daycare like when he was 3, but he still tends to focus on one thing for extended periods of time, even as all of his peers fly from task to task.
14. He is becoming more expressive. With people he knows, he is willing to express himself.
15. In Fate Accelerated terms, his approaches would be Good at Careful, Fair at Quick and Clever, Average at Forceful and Flashy, and Poor at Sneaky.
16. We pretty much skipped tantrums. He has occasional crying spells, but his tantrums seem like he is trying to imitate what he thinks a tantrum should look like.
17. He has little sense of possessiveness. His daycare teachers have commented that he does not fight over toys, because he does not have that sense of possessiveness and greed. He now has more of a sense that something is his, but it still is not that strong.
18. With him, what you see is what you get. No deceptiveness at all. Well, he is in the stage of saying what he thinks we want to hear (which would be considered lying if he were older), but even that is not so bad (exhibit: saying "I need a paper towel. But don't come downstairs, there is no mess").
Now we are getting ready for Kindergarten. We are going to a local Catholic school, because they are the only ones who would consider taking someone in early (with recommendations from his pre-K teachers and an assessment, of course). They are going to get a kid who reads Dr. Seuss before entering Kindergarten; judging by their reactions, they can handle a wide range in a Kindergarten class. We have found in his daycare/pre-school that T responds well in settings with small class sizes (that helped a lot in taekwondo too), and we are looking forward to his starting school.
Wednesday, July 29, 2015
Tuesday, July 28, 2015
Lessons in teaching: teaching exploratory data analysis with R
Last spring, I took over a course labeled information systems engineering, aimed at sophomores in engineering. Historically, this course focused on using the MS Access database. I was asked by the department to take it over after several years of commenting that our engineering seniors have inadequate computer programming skills, as evidenced by the amount of effort they spend on their senior projects doing tasks that would have been much simpler if they tried programming. Last year some of the faculty tried experiments where they had students code in an assignment (generally they asked for C). In every case this went very badly. So they asked me to take this course and change it so that it covered programming, and specifically to use R (I am effectively the primary data analysis faculty here). In keeping with the course title, I chose to focus the course on data analysis: one month on databases and how to think about data problems (giving them time to gradually learn R), and the rest on exploratory data analysis. I used as the primary text Data Manipulation with R by Phil Spector, and as supplements ggplot2 by Hadley Wickham and An Introduction to Data Cleaning with R by Edwin de Jonge and Mark van der Loo. I presented the CONVO framework for thinking about data problems, based on Thinking with Data by Max Shron.
As freshmen, they would have had CS0 (the Association for Computing Machinery designation for introduction to computer science for non-computer-science/electrical-engineering majors) material covered over a two-course sequence that also covers mathematics for engineering (primarily linear algebra). The language of instruction is primarily Matlab, but they also cover C and, depending on the instructor, Python (there is one module that is sometimes covered by Physics faculty, and they like to use Python). There is also a separate databases course taught by an adjunct faculty member who used to teach databases for information systems.
For tools I used SQLite (more on why this and not MS Access later), SQLite Manager, R, and RStudio. Before the end of the previous semester I sent everyone an email with links to videos introducing R and RStudio and encouraged them to work through a tutorial by typing it out (I explained that they would actually learn R over the semester; the typing exercise was to ensure they had seen everything once before we needed it in class).
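A first-session sketch of the kind of R warm-up those videos covered might look like this (the data here is made up for illustration):

```r
# Build a small data frame by hand -- the typing exercise in miniature
grades <- data.frame(
  student = c("A", "B", "C", "D"),
  score   = c(88, 92, 75, 81)
)

# The basic inspection verbs they would use all semester
str(grades)         # structure: column names, types, and a preview
summary(grades)     # quick summaries of each column
mean(grades$score)  # extract a column with $ and compute on it
```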
For assessments, there were weekly labs for computer knowledge, while exams mostly covered how to think through data problems. A semester project with two milestones (plus the completed project) was the main way to assess how well they developed programming competency. Each week we covered one new topic.
We had three datasets that I used as teaching and lab examples throughout the course:
- Titanic survivors
- National Survey of Family Growth
- American Community Survey (U.S. Census, Pittsburgh North PUMA)
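Of the three, the Titanic data is the gentlest start: a version of it ships with base R as a contingency table, so there is nothing to download or import before class:

```r
# Titanic ships with base R as a 4-way contingency table
data(Titanic)
dim(Titanic)  # Class x Sex x Age x Survived

# Flatten to a data frame of counts for analysis
titanic_df <- as.data.frame(Titanic)
head(titanic_df)

# Survival counts by passenger class
xtabs(Freq ~ Class + Survived, data = titanic_df)
```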
Some observations and notes
1. SQLite vs MS Access. I was surprised to find out that MS Access has a relatively low size limit on databases (2 GB). It could handle neither the National Survey of Family Growth (expected) nor even a single PUMA of the American Community Survey (this was a surprise). That meant we had to use SQLite for the entire course (my Mac students were happy, since this put them on an equal footing with the PC students). Next time I will just use SQLite, and bring up MS Access only to explain why we are not using MS Access.
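For reference, querying SQLite from R takes only a few lines with the DBI and RSQLite packages. A sketch, assuming the ACS person records were loaded into a table named person (the file and table names here are hypothetical; AGEP and PINCP are the ACS age and income variables):

```r
library(DBI)
library(RSQLite)

# Connect to an on-disk SQLite database -- no server, no size surprises
con <- dbConnect(SQLite(), "acs_puma.sqlite")

# Push the filtering into SQL and pull back only the columns needed
incomes <- dbGetQuery(con, "
  SELECT AGEP AS age, PINCP AS income
  FROM person
  WHERE PINCP IS NOT NULL
")

dbDisconnect(con)
```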
2. Learning R. In a pre-class survey, the entire class indicated a complete lack of confidence in their ability to program to fulfill a task (expected). The standard belief that it is always easier to learn a second programming language failed in this case, because I did not realize just how bad their first experience was. While the first month was very intentionally a confidence-building exercise, I think that a portion of the class really needed to start from scratch. Next time around, I will spend an entire period doing nothing but walking the class through R.
3. Data manipulation. This included covering data structures (text, dates, data frames), regular expressions, plyr, reshape, and missing-value imputation; essentially the Hadleyverse v. 1. One issue here was the wide variety of potential topics. While I think every topic got used by someone in their semester project, some of the student evaluations complained about my teaching topics that were not on the exam. Essentially, for people who are only used to computing on numbers, the entire topic of data manipulation is a heavy cognitive load.
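A minimal sketch of the plyr/reshape style in question, using reshape2 (the successor to the reshape package) and a made-up toy dataset:

```r
library(plyr)
library(reshape2)  # successor to the reshape package

# A toy wide dataset: one row per subject, one column per measurement
wide <- data.frame(
  id   = c(1, 2, 3),
  pre  = c(10, 12, 9),
  post = c(14, 15, 11)
)

# reshape: wide -> long ("melt"), the shape plyr and ggplot2 prefer
long <- melt(wide, id.vars = "id",
             variable.name = "phase", value.name = "score")

# plyr: split-apply-combine -- mean score per phase
ddply(long, "phase", summarize, mean_score = mean(score))
```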
4. Visualization. I taught qplot, but I should have gone straight to ggplot. Either I go the traditional route and build every type of plot as an individual entity, or I present the grammar of graphics approach and build plots from components. Now that I've taught it, I don't think qplot helps with either, and it is a lot less capable (every group's final project pretty much had to transition to ggplot).
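To illustrate the contrast, here is the same scatter plot both ways, using the mpg dataset that ships with ggplot2; only the ggplot form grows cleanly into layers and facets:

```r
library(ggplot2)

# qplot: quick, but a dead end once you need layers or facets
qplot(displ, hwy, data = mpg, colour = class)

# The same plot in grammar-of-graphics form, plus one line for small multiples
ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point() +
  facet_wrap(~ class)
```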
5. Projects. I let the students find their own datasets and questions, subject to the requirement that they write the project purpose using the guidelines we covered for thinking about data. The big division in the quality of the projects was the richness of the dataset. Next time I will be a lot stricter about the dataset; in particular, I had a subjective guideline that the dataset should be big enough that looking at the whole thing by eye would be impractical. In some cases what they chose was still a very small volume of data, and it made for a trivial and uninteresting report.
6. Thinking about data. I used Max Shron's Thinking with Data framework, where for a data project one should identify the Context, Need, Vision, and Outcome. Every week we read a contemporary news article that included a data component (mostly from the fivethirtyeight.com website). Each discussion opened with the class summarizing the article into this CONVO framework, followed by a discussion of the analysis in the article itself. This actually worked out pretty well. Each exam had at least one CONVO-focused question, and generally they did well (and among the people who did not, there were no surprises given class participation).
7. News articles. I had a wide range of news articles that we covered in a weekly discussion, drawn mostly from fivethirtyeight.com, the Upshot column at the New York Times, and the data series at the Washington Post. Each article was assigned at the end of the week, for discussion in the Tuesday morning lecture. Discussion opened with a summary based on the CONVO framework; then we evaluated the data analysis presented in the article, followed by how we could change it to make it better or to answer a different question. These class periods were fun. My goal was to take 15 minutes for each article; in a few cases we were on a roll, so we let it go to 30 minutes. I had good participation. And it showed on exams: people generally did well when I asked them to imagine a data analysis based on data presented on a test (this was the last part of a multi-part question, where the other parts were about the data presented). One disappointing thing was that when course evaluations came in, I was rated poorly on how the class material relates to the everyday world (as all engineering courses are). I have to figure this one out.
8. Course evaluations. When course evaluations came in, they were roughly a uniform distribution, which makes them very hard to interpret. In addition, the comments that expressed weaknesses were mirrored in the comments that expressed strengths. That meant I had terrible averages and a chat with my department chair. Fortunately for me, the generally accepted belief is that the broad spread in the teaching evaluations is due to pushing the students harder (i.e., making them do programming again) and that this is part of improving the department as a whole. Hopefully when my chair meets the dean to review the faculty, the dean agrees with this assessment as well.
9. Class projects. About a quarter of the projects (teams of 1, 2, or 3) were genuinely impressive: projects with 100,000s of records, a few with millions of records, several dimensions, and data analysis that used layered visualizations to explore. Most projects were a little more modest, with thousands of data points and reasonable visualizations. Some projects were personal in nature (looking at issues in their home towns); others were fun (several revolved around music or sports). A number showed evidence of a lack of confidence, in the form of very unambitious datasets. The issue with this group is how hard to push. One of the known problems with CS0 or CS1 is that they can completely destroy people's confidence in programming, and a substantial portion of those who take one of these courses leave the field entirely or, in the case of engineers, avoid programming at all costs in the future.
Next time around:
1. Using a framework like CONVO (Max Shron) works. I am pretty sure everyone at least learned how to think about problems and settings.
2. Skip MS Access. I probably spent too much time on databases and working with the MS Access interface. Next time, going straight to SQL is probably enough, given that the limits on MS Access mean that we cannot use interesting datasets.
3. I liked using three datasets for the entire course. Some groups even used the American Community Survey for their semester projects (after reading in multiple PUMAs, e.g., an entire metropolitan area instead of only one PUMA).
4. One question that I will have to think about is how much of a do-over of CS0 this course should be. Clearly, as it stands, most of the class seems to get it the second time around, and a good portion are pretty impressive. But there is a pretty large fraction that finished CS0 absolutely convinced that programming is forever beyond them.
Thursday, July 09, 2015
Data Manipulation with R by Spector: Book Review
Data Manipulation with R by Phil Spector
My rating: 4 of 5 stars
The quality that programming-language-based data analysis environments have that menu-driven or batch environments do not is the ability to manipulate data. That means transforming data into usable forms, but it also means cleaning data, manipulating text, transforming data formats, and extracting data from free text. While R falls into this category of data analysis environment, almost all of the available material focuses on the application of statistical methods in R. This book fills a much-needed niche in how to process data. I still do not regard R as my go-to tool for data manipulation, but this book means I am more likely to stay in R than otherwise. I used it as a textbook in a lower-division data analysis course, and the class went from a group that only half-remembered Matlab to being able to process and analyze fairly large datasets. A comment I received: "I looked back on the work done in this project and I cannot believe I actually did that!"
The first part of the book covers reading in data and writing out results. It discusses both text formats (CSV, delimited, fixed-width) and working with relational databases. One note: the database the book uses is MySQL. This was easily convertible to SQLite, which is what I used in my class because my students are not IT savvy. I also used supplementary material for SQL (which is readily available). The section ends with putting things together into data frames.
Next comes a series of data types: datetimes, factors, numbers. For people who have only worked in Excel, these are the deal breakers. Even in Excel, these are areas that often go unnoticed by students and lead to problems.
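Two of the classic traps in this territory, the kind spreadsheets quietly paper over (dates are made up for illustration):

```r
# Dates: character strings sort alphabetically, not chronologically
d <- c("2/1/2015", "12/1/2015")
sort(d)                                # alphabetic: December before February
sort(as.Date(d, format = "%m/%d/%Y"))  # chronological, as intended

# Factors: numbers read in as factors silently become level codes
f <- factor(c("10", "20", "30"))
as.numeric(f)                # level indices, not the values
as.numeric(as.character(f))  # the actual numbers
```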
The chapter on character manipulation covers working with strings and gives a gentle introduction to regular expressions. Many of my students had never manipulated text programmatically before, so this chapter was quite successful. For regular expressions, it provides a taste, enough to solve the lab assignment. I supplemented it with other material, but no one was going to learn regular expressions in 5 pages.
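A taste of what that chapter enables, using base R's pattern functions on made-up strings (the deliberately loose email pattern is for illustration only):

```r
addresses <- c("ann@example.edu", "not-an-email", "bob@example.org")

# Which elements look like an email address?
grepl("^[^@ ]+@[^@ ]+\\.[a-z]+$", addresses)

# Extract the domain from the ones that contain an @
sub("^[^@]*@", "", addresses[grepl("@", addresses)])
```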
The best part of the book is the sections on aggregating and reshaping data. This is what made what my students were doing with R start to look like magic: aggregations using the apply family of functions, reshape to convert data into long or wide formats, combining data frames, and an introduction to vectorization. This is not going to make anyone a functional programmer, but these are key idioms, and Spector spends a lot of time here.
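A sketch of the aggregation idioms in question, run against the built-in iris data:

```r
# Mean sepal length per species, three ways
tapply(iris$Sepal.Length, iris$Species, mean)

# aggregate() keeps the result as a data frame
aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

# sapply over columns: vectorized thinking instead of a loop
sapply(iris[, 1:4], mean)
```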
I am not going to prefer R over Python for working with text and manipulating data, but Data Manipulation with R shows how to do some non-obvious things. The examples are all interesting enough to be useful, and they all work as is. And the book goes deep enough into some pretty powerful capabilities that it expanded my students' understanding of what is possible. While it is becoming dated (an update would have to include dplyr), the approaches it provides put the reader well on the way to being an accomplished R programmer, not just someone who feeds data into functions.
View all my reviews