How Google and Portland’s TriMet Set the Standard for Open Transit Data
If those agencies haven't already formatted their data in the Google Transit Feed Specification (GTFS), the industry standard, they are likely rushing to do so now. How Google's specification became the common language for transit data is an interesting story and, as with many tales of transit innovation, it begins in Portland, Oregon.
After traveling internationally in the summer of 2005, Bibiana McHugh, an IT Manager at Portland's TriMet transit agency, was frustrated that she couldn't access transit information on a mapping program like Mapquest and certainly couldn't plan a trip by transit with the same ease as a driving trip. When she returned stateside, she sent inquiries to Mapquest, Yahoo!, and Google, asking each if they had plans to incorporate transit data into their mapping services and if TriMet could partner in the endeavor.
Of the three, only Google replied. As it happened, software engineer Chris Harrelson had been using his 20 percent time to interface transit data with Google Maps, work that became the Google Transit Trip Planner. TriMet worked with Google to prepare TriMet's data set in a format that would work for Google Maps, a difficult task, according to McHugh.
"Transit data is extremely complex," she said. "There is a temporal element and spatial element and it takes a relational database in order to manage all of that information."
She added, "A lot of agencies have this fear that it will be misrepresented or won’t be used accurately."
Because TriMet was proactive with its data, the subsequent GTFS very closely resembled the operator's data feed. Google Transit Trip Planner launched on December 7, 2005, and for most of the first year, TriMet was the only operator available on Google Maps. In September 2006, five more cities got on board: Eugene, OR; Honolulu, HI; Pittsburgh, PA; Seattle, WA; and Tampa, FL.
In addition to fears by some operators about misrepresentation of the data, many operators were simply reluctant to open data for fear of bad publicity, according to Joe Hughes, a Google Transit software engineer.
"Transit agencies are used to being beat up in the press. Public transit has been the underdog since the 1950s and I think it's made the agencies pretty conservative," he said.
Hughes, who began his transit mapping career in Pittsburgh in 2002, several years before joining Google, said that prior to GTFS, many software engineers had to "data scrape" operator websites or submit Freedom of Information Act requests to obtain data. Often that data was mailed on a CD and could be out of date by the time it was turned over.
With the exception of TriMet and the other early adopters, "It's been a slow and painful process to open this stuff up," said Hughes. "At first there was no infrastructure available to do this."
McHugh echoed the sentiment, suggesting that many agencies had outdated assumptions about data and were reluctant to provide it for free. "For some agencies, they are used to making money off it. When they asked why we aren't charging for our data, the answer is that the taxpayers have already paid for it and the benefits are so big for openness." In Portland, said McHugh, their "lawyers are pretty versed with open source. Having open data aligns with our agency's philosophies. We didn't even have to think about it."
In the past few years the process has sped up tremendously, according to Hughes. "If you told me even a few years ago that every significant transit agency in the country would open its data, it would have been pretty hard to believe. The U.S. is now ahead of much of the world in releasing data."
Despite this optimism, there are still obstacles to full data transparency, particularly for those software developers not named Google. A number of transit operators, particularly those in the New York Metropolitan Area, like the New York Metropolitan Transportation Authority (NYMTA) and New Jersey Transit, have licensed their data to Google but to no one else. Though Google won't pay for data, its cachet and the ubiquity of its mapping service on personal computers and mobile devices have led agencies to provide GTFS feeds only to Google.
"What Google has is a clearly useful product, PR value, and name recognition," said Hughes of the situation, arguing that the agencies' release of data to Google is a step toward openness. "At least they're sharing it with one developer, but that's not the end state. Ideally that data is available to any developer to use."
Matt Lerner, Chief Technology Officer for City-Go-Round, said that sharing with Google alone does not make the data open. Pointing to the GTFS Data Exchange and City-Go-Round's top-ten list of transit operators that don't open their data, he said millions of people in metropolitan regions don't have access to open data, though they easily could.
"All the operators have to do is provide a URL where someone can download the feed. They already have the data, all they have to do is let the data be downloaded. It's not open until they give that URL."
Lerner also lauded Google for its impact: "The agencies wouldn't have ever put their data into a standard format if it weren't for Google. It took a really big company to get the agencies to have a standard format at all."
Now that GTFS is the baseline, Google is considering dropping its name from the title and changing it to the "General Transit Feed Specification." Hughes proposed the change on several transit developer listservs and said the renaming could come as early as next week.
"The name Google Transit Feed Specification is a non-name," said Hughes. "We didn't want to be presumptuous by saying this would become the standard when we started. Having Google in there doesn't really reflect all the different apps that are being used with the format."
Hughes reiterated that the goal from the beginning was to make transit directions and maps as commonplace and simple as driving directions.
"I hope we're helping to bring transit back on equal footing with driving."