Table of Contents

Introduction

Today’s blog post title comes from the 1987 classic Planes, Trains, and Automobiles or A Ticket for Two in German. I just got back from a whirlwind trip to the United States where I was in four cities in 10 days (San Francisco, CA, Chicago, IL, Cleveland, OH, and Princeton, WV) all with a starting and ending point of Paris, France. To accomplish all of this travel I took airplanes, trains (if you include the local Chicago CTA), and automobiles. In this post we’ll look at the most time and cost effective way to do this trip. There’s no difficult math or modeling in this post, but it can give you an idea of the power of data manipulation and descriptive statistics in R using the package dplyr.

TAKE AWAY POINTS FROM THIS POST

The package dplyr is a useful and clean way to organize data and get descriptive information.
The package dplyr can also be used to create new columns to manipulate data.
Don’t fly from Cleveland, OH to Princeton, WV unless you have a lot of money to burn.

Data

My data contains legs for four one-way trips: 1) San Francisco to Chicago, 2) Chicago to Cleveland, 3) Cleveland to Princeton, and 4) Princeton to Cleveland. I then found out the time and price for three methods of travel: 1) airplane, 2) train, and 3) automobile. For the airplane data, I took the cheapest one-way flight based on a search of a website that aggregates across airlines and included the price and estimated duration of the flight. Since there was no available flight into Princeton, WV my airport was Beckley, WV. For the train data, I used Amtrak train estimates taking the time and price of the first available, cheapest ticket. Since there was no available train into Princeton, WV my station was in Norfolk, VA. For the automobile data, I assumed you already had a car and were only paying for gas. I used the current national average of $2.29 and a car with an MPG of 22, which is roughly the average for a mid-sized non-hybrid car. Below is the data set.

origin	destination	method	time	price
san francisco	chicago	plane	4.233333333	89
san francisco	chicago	train	53	134
san francisco	chicago	auto	31	221.9218182
chicago	cleveland	plane	1.283333333	96
chicago	cleveland	train	6.083333333	45
chicago	cleveland	auto	5.266666667	35.80727273
cleveland	princeton	plane	1.983333333	2300
cleveland	princeton	train	18.73333333	98
cleveland	princeton	auto	5.166666667	35.70318182
princeton	cleveland	plane	1.983333333	2300
princeton	cleveland	train	20.71666667	98
princeton	cleveland	auto	5.216666667	35.80727273

Best Itinerary Based on Time

To find out the best itinerary based on time we can simply see which method within each leg has the shortest duration of travel. The code for how to do this is shown below. First I group by my origin and destination so I can look at each leg separately. To do this I use the “group_by()” verb. Then I filter my data so that within each leg I only keep the method with the shortest time, the minimum time. Intuitively this done via the “filter()” verb. I end by “ungroup()”ing the data so that any future analyses are on the whole data set, not specific to my groupings.

library(dplyr)

itinerary_time = data %>%
  group_by(origin, destination) %>%
  filter(time == min(time)) %>%
  ungroup()

origin	destination	method	time	price
san francisco	chicago	plane	4.233333333	89
chicago	cleveland	plane	1.283333333	96
cleveland	princeton	plane	1.983333333	2300
princeton	cleveland	plane	1.983333333	2300

With this method we see that we have a total travel time of 9.48 hours, but our price is pretty high at $4785. Regarding method, we see all legs are by plane. If you look at the data you can see that this high price is almost entirely driven by the cost of the flight to and from Princeton, which comes in at $2300 per leg and requires an air taxi. Instead of the shortest trip, let’s see what our itinerary looks like if want the cheapest price.

Best Itinerary Based on Price

Here we do the same thing as before, but now filtering based on the lowest (minimum) price. The code is shown below.

itinerary_price = data %>%
  group_by(origin, destination) %>%
  filter(price == min(price)) %>%
  ungroup()

origin	destination	method	time	price
san francisco	chicago	plane	4.233333333	89
chicago	cleveland	auto	5.266666667	35.80727273
cleveland	princeton	auto	5.166666667	35.70318182
princeton	cleveland	auto	5.216666667	35.80727273

With this method we see that we have a total travel price of $196.32, but now our travel time is much higher at 19.88 hours, over 10 hours longer than when we based our trip on travel time. Regarding method, we have one leg by plane (San Francisco to Chicago) and the rest by automobile. At least in the United States, it seems it is never most advantageous to travel via train, either regarding time or price. A total price of $196.32 is pretty inexpensive for such a long trip, but what if we’re willing to pay a little more to get a shorter total travel time?

Best Itinerary Based on Time and Price

In this last section we play with how much we’re willing to pay to “buy” less travel time. We’ll say that we’re willing to pay $50 for each hour less of travel time. For example, if traveling by auto costs $50 but takes 3 hours I would be willing to pay $100 for a plane if it only took 2 hours. The code is shown below. To do this I’ll create a new column called “cost” that first multiples the total time by 50 and then adds the price of the trip. In effect we’re saying each hour costs $50 (thus total time x 50), so longer trips will “cost” more. We then add the actual price of the trip to take that into account when computing our total cost. To create this new column we use the “mutate()” verb. Finally, we’ll filter by the minimum total cost to get the best trip given our willingness to pay $50 for each hour less of travel time.

itinerary_cost = data %>%
  group_by(origin, destination) %>%
  mutate(cost = time * 50 + price) %>%
  filter(cost == min(cost)) %>%
  ungroup()

origin	destination	method	time	price	cost
san francisco	chicago	plane	4.233333333	89	300.66666665
chicago	cleveland	plane	1.283333333	96	160.16666665
cleveland	princeton	auto	5.166666667	35.70318182	294.03651517
princeton	cleveland	auto	5.216666667	35.80727273	296.64060608

Using our cost benefit analysis method we now find that the total price of our trip is $256.51 and that it will take 15.9 hours. Our modes of transportation have only changed a bit, we now fly from Chicago to Cleveland instead of driving as we did in the cheapest trip version. Actually this is the trip I ended up doing, which is good to know (also, I didn’t own a car for the trip from Chicago to Cleveland, so that would have made the trip cost more than this analysis predicts).

Conclusion

R (and more specifically dplyr) can be a useful tool to get descriptive information about your data, both as a whole and grouped. We can also create new columns to analyze our data with new variables. You may have noticed that my example here doesn’t take into account the fact that in a car I can drive directly to my destination, while with a plane I need to get from the airport to my final destination. With a little more data we can also take this into account using the same methods we did in this post. For example we can add a variable of “trip” and have multiple methods within a single trip. By grouping by “trip”” we can get summary costs (the verb “summarise()” in dplyr) and then run the same comparisons we did above. I hope you enjoyed playing with the data from my recent United States trip. Your most important take away should be that flying from Cleveland, OH to Princeton, WV is pretty much out of the question, although it would be pretty awesome to take an air taxi.

Page Piccinini

Planes, Trains, and Automobiles

Introduction

Data

Best Itinerary Based on Time

Best Itinerary Based on Price

Best Itinerary Based on Time and Price

Conclusion

0 Comments

Add yours →

One Pingback

Leave a comment Cancel reply

Introduction

Data

Best Itinerary Based on Time

Best Itinerary Based on Price

Best Itinerary Based on Time and Price

Conclusion

Share this:

Related

One Pingback

Leave a comment Cancel reply