Planes, Trains, and Automobiles

Introduction

Today’s blog post title comes from the 1987 classic Planes, Trains, and Automobiles or A Ticket for Two in German. I just got back from a whirlwind trip to the United States where I was in four cities in 10 days (San Francisco, CA, Chicago, IL, Cleveland, OH, and Princeton, WV) all with a starting and ending point of Paris, France. To accomplish all of this travel I took airplanes, trains (if you include the local Chicago CTA), and automobiles. In this post we’ll look at the most time and cost effective way to do this trip. There’s no difficult math or modeling in this post, but it can give you an idea of the power of data manipulation and descriptive statistics in R using the package dplyr.

TAKE AWAY POINTS FROM THIS POST

  • The package dplyr is a useful and clean way to organize data and get descriptive information.
  • The package dplyr can also be used to create new columns to manipulate data.
  • Don’t fly from Cleveland, OH to Princeton, WV unless you have a lot of money to burn.

Data

My data contains legs for four one-way trips: 1) San Francisco to Chicago, 2) Chicago to Cleveland, 3) Cleveland to Princeton, and 4) Princeton to Cleveland. I then found out the time and price for three methods of travel: 1) airplane, 2) train, and 3) automobile. For the airplane data, I took the cheapest one-way flight based on a search of a website that aggregates across airlines and included the price and estimated duration of the flight. Since there was no available flight into Princeton, WV my airport was Beckley, WV. For the train data, I used Amtrak train estimates taking the time and price of the first available, cheapest ticket. Since there was no available train into Princeton, WV my station was in Norfolk, VA. For the automobile data, I assumed you already had a car and were only paying for gas. I used the current national average of $2.29 and a car with an MPG of 22, which is roughly the average for a mid-sized non-hybrid car. Below is the data set.

origin

destination

method

time

price

san francisco

chicago

plane

4.233333333

89

san francisco

chicago

train

53

134

san francisco

chicago

auto

31

221.9218182

chicago

cleveland

plane

1.283333333

96

chicago

cleveland

train

6.083333333

45

chicago

cleveland

auto

5.266666667

35.80727273

cleveland

princeton

plane

1.983333333

2300

cleveland

princeton

train

18.73333333

98

cleveland

princeton

auto

5.166666667

35.70318182

princeton

cleveland

plane

1.983333333

2300

princeton

cleveland

train

20.71666667

98

princeton

cleveland

auto

5.216666667

35.80727273

Best Itinerary Based on Time

To find out the best itinerary based on time we can simply see which method within each leg has the shortest duration of travel. The code for how to do this is shown below. First I group by my origin and destination so I can look at each leg separately. To do this I use the “group_by()” verb. Then I filter my data so that within each leg I only keep the method with the shortest time, the minimum time. Intuitively this done via the “filter()” verb. I end by “ungroup()”ing the data so that any future analyses are on the whole data set, not specific to my groupings.

library(dplyr)

itinerary_time = data %>%
  group_by(origin, destination) %>%
  filter(time == min(time)) %>%
  ungroup()

origin

destination

method

time

price

san francisco

chicago

plane

4.233333333

89

chicago

cleveland

plane

1.283333333

96

cleveland

princeton

plane

1.983333333

2300

princeton

cleveland

plane

1.983333333

2300

 

With this method we see that we have a total travel time of 9.48 hours, but our price is pretty high at $4785. Regarding method, we see all legs are by plane. If you look at the data you can see that this high price is almost entirely driven by the cost of the flight to and from Princeton, which comes in at $2300 per leg and requires an air taxi. Instead of the shortest trip, let’s see what our itinerary looks like if want the cheapest price.

Best Itinerary Based on Price

Here we do the same thing as before, but now filtering based on the lowest (minimum) price. The code is shown below.

itinerary_price = data %>%
  group_by(origin, destination) %>%
  filter(price == min(price)) %>%
  ungroup()

origin

destination

method

time

price

san francisco

chicago

plane

4.233333333

89

chicago

cleveland

auto

5.266666667

35.80727273

cleveland

princeton

auto

5.166666667

35.70318182

princeton

cleveland

auto

5.216666667

35.80727273

 

With this method we see that we have a total travel price of $196.32, but now our travel time is much higher at 19.88 hours, over 10 hours longer than when we based our trip on travel time. Regarding method, we have one leg by plane (San Francisco to Chicago) and the rest by automobile. At least in the United States, it seems it is never most advantageous to travel via train, either regarding time or price. A total price of $196.32 is pretty inexpensive for such a long trip, but what if we’re willing to pay a little more to get a shorter total travel time?

Best Itinerary Based on Time and Price

In this last section we play with how much we’re willing to pay to “buy” less travel time. We’ll say that we’re willing to pay $50 for each hour less of travel time. For example, if traveling by auto costs $50 but takes 3 hours I would be willing to pay $100 for a plane if it only took 2 hours. The code is shown below. To do this I’ll create a new column called “cost” that first multiples the total time by 50 and then adds the price of the trip. In effect we’re saying each hour costs $50 (thus total time x 50), so longer trips will “cost” more. We then add the actual price of the trip to take that into account when computing our total cost. To create this new column we use the “mutate()” verb. Finally, we’ll filter by the minimum total cost to get the best trip given our willingness to pay $50 for each hour less of travel time.

itinerary_cost = data %>%
  group_by(origin, destination) %>%
  mutate(cost = time * 50 + price) %>%
  filter(cost == min(cost)) %>%
  ungroup()

origin

destination

method

time

price

cost

san francisco

chicago

plane

4.233333333

89

300.66666665

chicago

cleveland

plane

1.283333333

96

160.16666665

cleveland

princeton

auto

5.166666667

35.70318182

294.03651517

princeton

cleveland

auto

5.216666667

35.80727273

296.64060608

 

Using our cost benefit analysis method we now find that the total price of our trip is $256.51 and that it will take 15.9 hours. Our modes of transportation have only changed a bit, we now fly from Chicago to Cleveland instead of driving as we did in the cheapest trip version. Actually this is the trip I ended up doing, which is good to know (also, I didn’t own a car for the trip from Chicago to Cleveland, so that would have made the trip cost more than this analysis predicts).

Conclusion

R (and more specifically dplyr) can be a useful tool to get descriptive information about your data, both as a whole and grouped. We can also create new columns to analyze our data with new variables. You may have noticed that my example here doesn’t take into account the fact that in a car I can drive directly to my destination, while with a plane I need to get from the airport to my final destination. With a little more data we can also take this into account using the same methods we did in this post. For example we can add a variable of “trip” and have multiple methods within a single trip. By grouping by “trip”” we can get summary costs (the verb “summarise()” in dplyr) and then run the same comparisons we did above. I hope you enjoyed playing with the data from my recent United States trip. Your most important take away should be that flying from Cleveland, OH to Princeton, WV is pretty much out of the question, although it would be pretty awesome to take an air taxi.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: