This is an update on my Citibike data/D3 charts experiment. The chart above represents a little over a month’s worth of Citibike station data at one station. One can kind of see a pattern here, but there’s a lot of noise.
As I watched the data for this chart fill in over the last month, I began to see the story of this station. It is, unsurprisingly, different on weekdays versus weekends or holidays. During the week, there are very few, if any, bikes available at this station overnight. Between 06:00 and 09:00, the station fills up. It usually remains that way until the evening rush, and it has emptied out by around 20:00. On weekends and holidays, things are more erratic. I also suspect that the weather has a much bigger effect on the weekend.
Prior to collecting this data, my sporadic observation (usually on weeknights and weekends) led me to believe that there were rarely any bikes available at this station. In fact, there are large swaths of the day when this station is lousy with bikes. So, I learned something.
Unfortunately, my original vision of how this chart would work isn’t doing a great job of getting the point across. There’s too much noise. Adding filters to allow one to turn off different day types (weekday vs. weekend) might help. I had also thought about making the older data less opaque, thereby giving newer data more prominence. But, I’m not sure if that would help much. Perhaps the line chart isn’t the way to go at all. The first chart Abe Stanway uses in his analysis of Citibike weather is much cleaner and the pattern is clear.
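One way to cut the noise before it ever reaches the chart would be to split the samples by day type and average them by hour. Here is a minimal sketch of that idea in Python. The sample shape, a list of (datetime, bikes available) pairs, and the function names are assumptions for illustration, the values are made up, and holidays are not handled.

```python
from collections import defaultdict
from datetime import datetime


def day_type(ts):
    """Classify a timestamp as 'weekday' or 'weekend' (holidays are ignored)."""
    return "weekend" if ts.weekday() >= 5 else "weekday"


def hourly_average(samples, wanted):
    """Average available bikes per hour of day, for one day type only.

    samples: iterable of (datetime, bikes_available) pairs (assumed shape).
    """
    totals = defaultdict(lambda: [0, 0])  # hour -> [sum of bikes, sample count]
    for ts, bikes in samples:
        if day_type(ts) == wanted:
            totals[ts.hour][0] += bikes
            totals[ts.hour][1] += 1
    return {hour: s / n for hour, (s, n) in sorted(totals.items())}


# Made-up sample values, just to show the shape of the input.
samples = [
    (datetime(2017, 3, 6, 8, 0), 20),   # a Monday
    (datetime(2017, 3, 7, 8, 30), 24),  # a Tuesday
    (datetime(2017, 3, 11, 8, 0), 5),   # a Saturday
]
weekday_avg = hourly_average(samples, "weekday")
weekend_avg = hourly_average(samples, "weekend")
```

Two averaged lines (one per day type) would lose the day-to-day variation, but they might show the weekday pattern far more clearly than thirty overlapping ones.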
I’m not sure what’s next for this experiment. Rather than refine this chart, I’m thinking about going in a slightly different direction. In thinking more about the story here, I want to dive into the trip data and see where all these bikes are coming from. Since central Park Slope has good Citibike coverage, but not many subway stations, I suspect that most of the trips to Park Place & 7th Avenue on weekday mornings originate from that part of the neighborhood, but I could be wrong. A lot of the data visualizations done on Citibike data have shown bike movement, so there should be plenty of examples for me to learn from.
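A first pass at the "where do these bikes come from" question could be as simple as counting start stations for trips that end at my station. The sketch below assumes the trip CSV has "start station name" and "end station name" columns, which I have not verified against the current files. The origin stations and rows are placeholders, and filtering to weekday mornings is left out.

```python
import csv
import io
from collections import Counter


def top_origins(rows, destination, n=3):
    """Count where trips ending at `destination` started; return the top n."""
    counts = Counter(
        row["start station name"]
        for row in rows
        if row["end station name"] == destination
    )
    return counts.most_common(n)


# Placeholder rows; real trip files have many more columns.
sample_csv = """start station name,end station name
Station A,Park Place & 7th Avenue
Station A,Park Place & 7th Avenue
Station B,Park Place & 7th Avenue
Station A,Somewhere Else
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
origins = top_origins(rows, "Park Place & 7th Avenue")
```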
I’ve been wanting to gain some more proficiency in D3.js. I went through several tutorials at some point in 2016, but I really needed a project to work on. More recently, I’ve become interested in Citibike data, specifically, the patterns of available bikes at given locations. Why not graph it?
Citibike maintains a comprehensive dataset of all trip data, but does not seem to maintain a historical dataset for stations. So, I hacked together a Python script[1] to gather some historical data from the station near my apartment.
I’m a total D3 neophyte, so I found a sample multi-series line chart that I could use to start viewing the data I collected. I hacked another script[1] to transform the data into something that would be easier to chart.
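For anyone curious, a collector like this can be very small. The following is a rough sketch rather than my actual script. It assumes the station feed follows the GBFS station_status layout (a "data" object holding a "stations" list with "station_id", "num_bikes_available" and "num_docks_available"). The feed URL, station ID and output path are parameters you would have to supply.

```python
import json
import time
import urllib.request
from datetime import datetime, timezone


def fetch_feed(url):
    """Download and decode the JSON station feed."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def extract_record(feed, station_id, now=None):
    """Pull one station's counts out of a feed (assumed GBFS-style layout)."""
    now = now or datetime.now(timezone.utc)
    for station in feed["data"]["stations"]:
        if station["station_id"] == station_id:
            return {
                "timestamp": now.isoformat(),
                "bikes": station["num_bikes_available"],
                "docks": station["num_docks_available"],
            }
    return None  # station not present in this snapshot


def poll(url, station_id, out_path, interval_s=300):
    """Append one JSON line per snapshot, forever (stop it with Ctrl-C)."""
    while True:
        record = extract_record(fetch_feed(url), station_id)
        if record is not None:
            with open(out_path, "a") as f:
                f.write(json.dumps(record) + "\n")
        time.sleep(interval_s)
```

Keeping the parsing in `extract_record`, separate from the download, makes it possible to test against a saved snapshot without hitting the network.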
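The transformation is roughly a pivot: a multi-series line chart wants one row per time of day and one column per series (here, one per date). The sketch below illustrates that reshaping and is not the script I actually used. The record shape, an ISO timestamp plus a bike count, is an assumption, and the values are made up.

```python
from collections import defaultdict
from datetime import datetime


def to_wide(records):
    """Pivot timestamped records into one row per time of day.

    records: list of {"timestamp": ISO 8601 string, "bikes": int} (assumed shape).
    Returns (date columns, rows); a missing sample shows up as None.
    """
    by_time = defaultdict(dict)  # "HH:MM" -> {date: bikes}
    days = set()
    for rec in records:
        ts = datetime.fromisoformat(rec["timestamp"])
        day = ts.date().isoformat()
        days.add(day)
        by_time[ts.strftime("%H:%M")][day] = rec["bikes"]
    columns = sorted(days)
    rows = [
        {"time": t, **{d: by_time[t].get(d) for d in columns}}
        for t in sorted(by_time)
    ]
    return columns, rows


# Made-up records, just to show the reshaping.
records = [
    {"timestamp": "2017-03-06T08:00:00", "bikes": 20},
    {"timestamp": "2017-03-06T08:05:00", "bikes": 21},
    {"timestamp": "2017-03-07T08:00:00", "bikes": 24},
]
columns, rows = to_wide(records)
```

The rows can then be written out with `csv.DictWriter`, using `["time"] + columns` as the field names.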
After a bit of modification to the sample chart, I got this:
It looks cool, but it’s really not what I was going for, and it’s pretty useless.
After some more work, I’m getting closer:
…but, it still needs a lot of work. Specifically, the x-axis labels are worthless. Also, I really don’t have enough data there to show what I want to show. I think I need at least 30 days as opposed to the 7 I currently have. It doesn’t help that the system was closed for a couple of those days due to winter weather.
I’ll check back in a couple weeks to see how it is shaping up.
Back in 2007, there was some talk about personal unit tests. The idea was to apply unit testing, a tenet of Test Driven Development, to some of the mundane, yet important daily tasks of one’s life. Done properly, one could see what their pass rate was, and address problem areas.
This sounded like a great idea. I created a spreadsheet complete with conditional formatting to track small tasks like “exercise”, “healthy lunch” and “practice guitar”. While it was useful to see how I was doing, the overhead of tracking all of these little tasks was very high. If it could only be automated, like unit tests in TDD, it would be so much better.
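What the spreadsheet computed boils down to a pass rate per task. Here is a minimal sketch in Python. The log format (one dict of task results per day) is an assumption for illustration, and the entries are made up.

```python
def pass_rate(log, task):
    """Fraction of logged days on which `task` passed, or None if never logged."""
    results = [day[task] for day in log if task in day]
    return sum(results) / len(results) if results else None


# Made-up log: one dict per day, True means the "test" passed.
log = [
    {"exercise": True, "healthy lunch": True, "practice guitar": False},
    {"exercise": False, "healthy lunch": True, "practice guitar": False},
    {"exercise": True, "healthy lunch": True},  # guitar not recorded that day
]
tasks = ("exercise", "healthy lunch", "practice guitar")
rates = {task: pass_rate(log, task) for task in tasks}
```

Days where a task was never recorded are skipped rather than counted as failures, which is a design choice: a missing entry says more about the tracking overhead than about the task.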
Now, 7 years later, our devices are tracking all sorts of things about us. Perhaps most of these unit tests could be automated by querying the repositories of personal data that are being created. Data that can’t be obtained automatically could come from something like Reporter. I’ve seen several gorgeous visualizations of this data (Aprilzero immediately comes to mind), but I don’t think they are all that actionable throughout the day. This is where unit tests could really shine, maybe.