Understanding Ridesplitting Behavior with Interpretable Machine Learning Models Using Chicago Transportation Network Company Data

Abstract
As congestion levels increase in cities, it is important to analyze people’s choices among the services provided by transportation network companies (TNCs). Using machine learning techniques in conjunction with large-scale TNC data, this paper focuses on uncovering the complex relationships underlying ridesplitting market share. A real-world dataset provided by TNCs in Chicago is used to analyze ridesourcing trips from November 2018 to December 2019 and to understand trends in the city. Aggregated origin–destination trip-level characteristics, such as mean cost, mean time, and travel time reliability, are extracted and combined with origin–destination community-level characteristics. Three tree-based algorithms are then used to model the market share of ridesplitting trips. The most important factors and their marginal effects on ridesplitting behavior are extracted, with partial dependence plots used to interpret the machine learning model results. The results suggest that, overall, community-level factors are at least as important as trip-level characteristics. Additionally, the percentage of White residents, the percentage of bachelor’s degree holders, and the percentage of two-person households strongly affect ridesplitting market share. Travel time reliability and cost variability are also found to be more important than travel time and cost savings. Finally, the potential roles of taxes, crime, cultural differences, and comfort in driving the market share are discussed, and suggestions are offered for future research and data collection efforts.
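The two quantities at the core of the abstract, the ridesplitting market share of an origin–destination pair and the partial dependence of a model's prediction on one factor, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the trip records, the feature names `pct_bachelors` and `cost_std`, and the stand-in `toy_predict` function with arbitrary coefficients. It does not reproduce the paper's data, its tree-based models, or its results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trip records: (origin_area, destination_area, is_ridesplitting_trip).
trips = [
    (8, 32, True), (8, 32, False), (8, 32, False), (8, 32, True),
    (28, 8, False), (28, 8, False), (28, 8, True), (28, 8, False),
]

def market_share(trips):
    """Fraction of trips in each origin-destination pair that are ridesplitting trips."""
    counts = defaultdict(lambda: [0, 0])  # pair -> [ridesplitting trips, all trips]
    for origin, dest, shared in trips:
        counts[(origin, dest)][0] += int(shared)
        counts[(origin, dest)][1] += 1
    return {pair: s / n for pair, (s, n) in counts.items()}

def partial_dependence(predict, rows, feature, grid):
    """For each grid value, force `feature` to that value in every row
    and average the model's predictions over all rows."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(mean(preds))
    return curve

# Stand-in for a fitted model (NOT the paper's model). Any callable that maps
# a feature dict to a predicted market share works; coefficients are arbitrary.
def toy_predict(row):
    return 0.4 - 0.2 * row["pct_bachelors"] + 0.1 * row["cost_std"]

# Hypothetical origin-destination feature rows.
rows = [
    {"pct_bachelors": 0.2, "cost_std": 1.0},
    {"pct_bachelors": 0.6, "cost_std": 3.0},
]

shares = market_share(trips)
pd_curve = partial_dependence(toy_predict, rows, "pct_bachelors", [0.0, 0.5, 1.0])
```

In practice, `toy_predict` would be replaced by the prediction function of a fitted tree-based model, and the curve would be plotted against the grid to read off the marginal effect of the chosen factor.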
Funding Information
  • Northwestern University Transportation Center
