Many Internet travel sites let you check airline schedules and buy
tickets online. You specify the origin, destination,
and travel dates, and they tell you what
flights are available for those dates.
Site59 (http://www.site59.com)
approached ArsDigita with a plan for a site that would sell
travel with a very different spin. Instead of picking the origin and
destination cities, you pick your mood: romantic, adventurous,
extravagant, and so on. They then match you up with a travel package
that suits your mood.
To support this application I built a software module that connects
to the Worldspan
Computerized Reservation System (CRS).
This CRS interface module provides a building block for
integrating travel-related
services into a web site.
Of course, the CRS interface module is not limited to the unique travel approach
of Site59. Individual companies can use it to interface a travel
reservation system to their web site.
If you're a large company
with an in-house travel staff, you could use the module to build a
system to automate internal travel requests and booking.
Say, for example, that you are arranging a meeting of people from
remote offices in your company. When you schedule the meeting in
your intranet calendar, the travel system could automatically suggest
flight and hotel options for bringing employees to the meeting
location.
Even a small company could make use of integrated travel
information. For a firm that often sends employees to client sites,
you could connect your client scheduling system with the interface module to
help automatically book travel and schedule meetings to reduce travel
costs.
In short, if you have a business function that involves travel,
the CRS interface module will allow you to build a system
to support it. In this article, I explain how the module is built
and discuss how Site59 used it in their web site.
A brief history of airline reservation systems
There are four major airline reservation systems today:
Sabre (http://www.sabre.com/),
Worldspan (http://www.worldspan.com/),
Galileo (http://www.galileo.com/), which owns Apollo,
and
Amadeus (http://www.amadeus.com/).
The history of
these systems predates computers. In the 1950s, the manual system that
Sabre later replaced consisted of a room full of women sorting cards into pigeonholes;
hence the term "Computerized Reservation System" to distinguish it
from the old manual reservation system. All of the modern systems used
to be part of the airlines' internal information technology
departments: Sabre was part of American Airlines, Apollo was part of
United, Worldspan started as TWA's PARS reservation system in 1971, and
Amadeus started as a joint effort of several European airlines in the
1980s.
All of the systems interoperate and list each other's flights, but
the U.S. and European governments decided that it was a conflict of
interest for the airlines to own the systems that were pricing and
selling their competitors' products. The reservation systems have
been spun off as subsidiaries or separate companies, which leads to
the curious situation of the airlines not being able to tell you
how much their product costs without going through a third party.
Many aspects of the systems are regulated by the government, including
the type of information that must be provided (such as
on-time performance) and the order in which results are returned by
the system, to prevent airlines from favoring their own flights over
their competitors'.
The core technology behind the reservation system has not changed much over
time. Each system essentially has one big mainframe that has all its
data, with thousands of terminals connected to it for reservation
agents to type in commands. Travel agents are the user interface to
the mainframe: you ask them "how much is a flight to Albuquerque?" and
they type in a cryptic command, then interpret the equally cryptic
response for you.
Talking to a reservation system
Because of this terminal-centered design,
the only way to get an automated program to interact with
the CRS is to build a screen scraper that
fills in forms on
an imaginary 3270 terminal screen and
interprets the results that come back in
predefined screen fields.
Writing a screen-scraper to encode and decode this kind of data is
error-prone and time-consuming.
A few companies like
Datalex (http://www.datalex.com)
have developed travel booking engines that do this work for you. They
typically give you a nice modular, high-level interface for
querying and booking flights that you can easily plug into
your web site back end.
Unfortunately, these products are tremendously expensive, and at the
time we started our project they didn't support connecting to
Worldspan, which was a requirement for our client.
Luckily,
Worldspan has a system called the Structured Message Interface, or
SMI, which provides a much more machine-friendly interface to the
reservation system mainframe than decoding screens yourself. It's
still a low-level protocol, but it takes care of most of the messy
data translation.
Instead of talking directly to the Worldspan mainframe, you talk to
the message processor,
which is a system that sits between you and the mainframe and
does the screen scraping for you.
An application using this interface
can send a number of different messages
to request flight availability, price and book tickets,
get details on a flight and so forth.
The application constructs the message and
sends it to the message processor, which converts it into Worldspan screens,
collects the result,
and sends it back to the application
as an easily parseable result.
The connection to the message processor is through a dedicated X.25 line to
Worldspan's data center.
A web interface to a legacy system
Since an X.25 connection requires special hardware, I decided to
separate the X.25 communication from the SMI message parsing code, so
that the server that is required to be connected to the X.25 hardware
is as small as possible. The web server that constructs SMI messages
talks to the SMI gateway over HTTP, submitting one POST form for each
SMI message to be sent; the gateway returns the SMI result to the web
server in the HTTP response.
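Here is a minimal sketch of this split, assuming Python's standard library on both sides. Several names are hypothetical and not taken from the original module: the form field name `message`, the helpers `make_gateway` and `send_via_gateway`, and the `transport` callable that stands in for the X.25 link.

```python
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable

# Hypothetical form field carrying one raw SMI message per POST.
FIELD = "message"


def make_gateway(transport: Callable[[str], str],
                 host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Gateway side: accept one SMI message per POST and relay it.

    `transport` stands in for the X.25 link to the message processor:
    it takes the raw message text and returns the raw reply text.
    """
    class Handler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", "0"))
            form = urllib.parse.parse_qs(self.rfile.read(length).decode("ascii"))
            if FIELD not in form:
                self.send_error(400, "missing message field")
                return
            reply = transport(form[FIELD][0]).encode("ascii")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=us-ascii")
            self.send_header("Content-Length", str(len(reply)))
            self.end_headers()
            self.wfile.write(reply)

        def log_message(self, format: str, *args) -> None:
            pass  # keep the sketch quiet; a real gateway would log

    return HTTPServer((host, port), Handler)


def send_via_gateway(url: str, raw_message: str, timeout: float = 120.0) -> str:
    """Web-server side: POST one raw SMI message and return the raw reply.

    The timeout is generous because reservation replies can take minutes.
    """
    body = urllib.parse.urlencode({FIELD: raw_message}).encode("ascii")
    request = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("ascii")
```

Only the small gateway process needs the X.25 hardware. The web server only needs to reach the gateway's URL, so it can be moved between machines freely.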
The SMI gateway server is very lightweight
and can be run on a
small machine that is dedicated to message service.
In our case, that is an important consideration
because the transport server is
connected to a dedicated communications line
and requires special X.25 protocol software
that is difficult to move to another computer.
We wanted the flexibility to have the main site web server on any
machine, so we could move it around as needed to balance the load on
our servers.
This design gives us what we really wanted in the first place: a
web-based interface to a travel reservation system, which we can use
to build our application.
An introduction to X.25
X.25 is a communications technology developed in the 1970s for use over
Public Switched Data Networks (PSDNs).
It uses packet switching to
break data into packets that get mingled together with other
users' data in the public
network and reconstructed at the other end. It's similar to the
way the modern Internet operates, but with some important differences.
One difference is that unlike the Internet, which uses
connectionless protocols at the lowest level, X.25 uses the
concept of virtual circuits, which are like phone calls. You
call a destination number and establish a connection to it, then
send data over the connection. (This is not surprising when you
consider that X.25 was created by the phone company.)
Packet switching is perfect for bursty communication, like
travel agents typing on terminals, since each user sends a
relatively small amount of data that can be multiplexed together
over a shared communications channel.
The alternative is to have a
dedicated line for each user, which wastes a lot of bandwidth
as the line sits idle between keystrokes.
It may seem obvious today,
but the concept of multiplexing data was a significant
innovation when it was developed.
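The multiplexing idea can be illustrated with a toy model. Real X.25 packets carry a header with a logical channel number and other control fields; here, as a simplifying assumption, a packet is just a `(channel, payload)` tuple. The function names are illustrative only.

```python
from collections import defaultdict
from itertools import zip_longest


def packetize(data: bytes, size: int) -> list[bytes]:
    """Split one user's data into fixed-size packets (the last may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def multiplex(streams: dict[int, bytes], size: int = 4) -> list[tuple[int, bytes]]:
    """Interleave packets from several virtual circuits onto one shared line.

    Each packet is tagged with its circuit's logical channel number so the
    far end can tell the users' traffic apart.
    """
    per_circuit = [[(lcn, p) for p in packetize(data, size)]
                   for lcn, data in streams.items()]
    line: list[tuple[int, bytes]] = []
    # Round-robin: one packet from each circuit that still has data.
    for batch in zip_longest(*per_circuit):
        line.extend(pkt for pkt in batch if pkt is not None)
    return line


def demultiplex(line: list[tuple[int, bytes]]) -> dict[int, bytes]:
    """Reassemble each circuit's data from the interleaved packet stream."""
    circuits: dict[int, bytearray] = defaultdict(bytearray)
    for lcn, payload in line:
        circuits[lcn].extend(payload)
    return {lcn: bytes(buf) for lcn, buf in circuits.items()}
```

Idle circuits contribute no packets, so a quiet terminal costs the shared line nothing between keystrokes.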
X.25 is especially widespread in Europe, where the economics
of data communication are significantly different from those in North
America. (An E1 line, the European equivalent of the North
American T1 line, is typically two to three times as expensive.)
It is gradually being supplanted by Frame Relay and other
technologies, but legacy X.25 applications will continue to be
around for many years.
Sending messages
A typical request message is "Check availability for flights from
SJC to DFW on 18 June around 8:00 a.m." The raw message format
is ASCII in a set of fixed fields; the response is a list of flights
that satisfy your criteria. You can perform fairly complex queries by
constraining the number of connections, allowable connecting airports,
airlines, or specific cities and times. Other messages request
pricing information or perform a booking for a particular series of
flight segments.
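Here is a sketch of fixed-field ASCII encoding. The field names, widths, order and date style in `AVAILABILITY_LAYOUT` are invented for illustration and are not Worldspan's actual SMI layout. The point is only the padding and slicing technique.

```python
from datetime import date

# Hypothetical availability-request layout: (field name, width in characters).
# NOT the real SMI format; it only illustrates fixed-width fields.
AVAILABILITY_LAYOUT = [
    ("msg_type", 4),
    ("origin", 3),
    ("destination", 3),
    ("date", 5),            # e.g. 18JUN
    ("time", 4),            # 24-hour clock, e.g. 0800
    ("max_connections", 1),
    ("airline", 2),         # blank means any carrier
]

MONTHS = "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split()


def crs_date(d: date) -> str:
    """Format a date as two-digit day plus three-letter month, e.g. 18JUN."""
    return f"{d.day:02d}{MONTHS[d.month - 1]}"


def encode_fixed(fields: dict[str, str], layout: list[tuple[str, int]]) -> str:
    """Pack field values into space-padded, fixed-width ASCII columns."""
    parts = []
    for name, width in layout:
        value = fields.get(name, "")
        if len(value) > width or not value.isascii():
            raise ValueError(f"field {name!r} does not fit in {width} ASCII characters")
        parts.append(value.ljust(width))
    return "".join(parts)


def decode_fixed(raw: str, layout: list[tuple[str, int]]) -> dict[str, str]:
    """Slice a fixed-width record back into named, right-trimmed fields."""
    fields, pos = {}, 0
    for name, width in layout:
        fields[name] = raw[pos:pos + width].rstrip()
        pos += width
    return fields
```

For example, the SJC to DFW request for 18 June around 8:00 a.m. encodes to the 22-character string `"AVL SJCDFW18JUN0800   "` under this made-up layout.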
All of the messages, however, are geared toward the kinds of tasks
that an average travel agent would perform for a client: given two
points, construct a trip between those points based on various
criteria. For most people, price is the most important factor,
but other factors can matter too, such as
the type of aircraft, the number of stops, or the departure time.
If you want to do queries in a significantly different way,
like asking for flights based on anything other than departure and
destination city, then you'll have to build your own database on top
of the information you get out of the standard reservation system
queries.
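Here is a sketch of such a database, assuming SQLite as a stand-in for whatever database your site actually uses. The table `flight_inventory` and the helper names are hypothetical. The sketch stores rows gathered by ordinary origin/destination queries. It then answers a question that the CRS itself cannot: where can you fly from a given airport on a given day?

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS flight_inventory (
    airline      TEXT NOT NULL,
    flight_no    TEXT NOT NULL,
    origin       TEXT NOT NULL,
    destination  TEXT NOT NULL,
    departs      TEXT NOT NULL,   -- ISO 8601 local time, e.g. 2000-06-18T08:00
    seats        INTEGER NOT NULL,
    PRIMARY KEY (airline, flight_no, departs)
)
"""


def open_inventory(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_results(conn: sqlite3.Connection, rows) -> None:
    """Insert or refresh rows returned by ordinary city-pair queries."""
    with conn:  # commits on success
        conn.executemany(
            "INSERT OR REPLACE INTO flight_inventory VALUES (?, ?, ?, ?, ?, ?)",
            rows)


def destinations_from(conn: sqlite3.Connection, origin: str, day: str,
                      min_seats: int) -> list[str]:
    """A query the CRS cannot answer directly: where can I go from here?"""
    cur = conn.execute(
        "SELECT DISTINCT destination FROM flight_inventory "
        "WHERE origin = ? AND substr(departs, 1, 10) = ? AND seats >= ? "
        "ORDER BY destination",
        (origin, day, min_seats))
    return [row[0] for row in cur]
```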
Case study
Site59 (www.site59.com)
sells spontaneous travel: last-minute getaways at a discount.
They buy distressed inventory from their travel partners and bundle
components together into packages. Their selling window is typically
seven to fourteen days before the package departure time, since they
are targeting last-minute travelers.
They don't operate like a traditional travel agent, who will sell
you a ticket to a destination of your choosing. Instead, Site59 has a
preselected list of departure and destination cities for which they
collect a list of flights, hotel rooms and car rentals. During the
selling period, they periodically check the available inventory of
airline seats, rooms and cars to ensure that they only present
site visitors with packages that Site59 can fulfill.
Site59 uses the CRS interface module to query for flights and hotel rooms in
their city pairs each week when they load new inventory, to poll
Worldspan for up-to-date inventory information, and to book customers'
flights, hotel rooms and car rentals.
For air travel, their travel automation system
incorporates heuristics about which flights are acceptable based on the
length of layovers, the pricing rules of the air carriers involved,
the desirability of airports in the area,
and other factors based on their contracts with the airlines.
The system uses these rules to select the flights
to make available in their inventory list.
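Rules of this kind might be expressed as a simple itinerary filter. This is a hypothetical sketch, not Site59's actual rule set. The layover bounds and the `avoid` airport set are placeholder parameters, and pricing and contract rules are left out entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Segment:
    flight: str
    origin: str
    destination: str
    departs: datetime
    arrives: datetime


# Placeholder thresholds; real values would come from business rules.
MIN_LAYOVER = timedelta(minutes=45)
MAX_LAYOVER = timedelta(hours=3)


def acceptable(itinerary: list[Segment],
               min_layover: timedelta = MIN_LAYOVER,
               max_layover: timedelta = MAX_LAYOVER,
               avoid: frozenset[str] = frozenset()) -> bool:
    """Return True if every segment and connection passes the rules."""
    if not itinerary:
        return False
    # Reject itineraries touching airports considered undesirable.
    for seg in itinerary:
        if seg.origin in avoid or seg.destination in avoid:
            return False
    # Check each connection: same airport, layover within bounds.
    for inbound, outbound in zip(itinerary, itinerary[1:]):
        if inbound.destination != outbound.origin:
            return False
        layover = outbound.departs - inbound.arrives
        if not min_layover <= layover <= max_layover:
            return False
    return True
```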
The next section describes the different layers of
the module and how they work together to bridge the gap between a web
service and a legacy mainframe reservation system.
The message processor
The message processing library has three layers:
- A high-level API that provides
access to common requests such as determining flight
availability;
- An intermediate-layer API that constructs a raw message from
a structured message list; and
- A low-level API that sends the message and receives the response
from the reservation host.
At the highest layer the functions operate without having to know
the specifics of the lower layers. An example is a function to
perform an availability query, which takes as arguments
the airline code, departure and
arrival airports, departure and arrival times, and number of seats
required.
The result is either a list of possible flights or a result code
that indicates why no flights were returned (usually because there
were no flights available).
The next layer implements the functions that are needed to perform the
work required by the upper layer. Here we have functions such as the
one to construct a raw SMI message. Its arguments are the message
type and the message data. When called, it constructs the message
in the logical format required by the reservation system,
sends it to the SMI server, and returns the result.
The lowest level knows how to talk to the physical network interface
that is connected to the reservation system. Functions at this level
take raw message data, encapsulate it in the message protocol required
by the physical interface (which is different for X.25 and TCP),
read the response from the reservation system and
return the response data to the caller.
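Here is a sketch of how the three layers might stack, under stated assumptions. The "TYPE;key=value" wire layout, the `OK`/result-code reply convention and all function names are invented for illustration. The lowest layer is abstracted as a `transport` callable, which could wrap an X.25 link, a TCP socket or the HTTP gateway.

```python
from dataclasses import dataclass
from typing import Callable, Union

# Lowest layer: sends raw bytes to the reservation host, returns raw reply.
Transport = Callable[[bytes], bytes]


def send_raw(msg_type: str, fields: dict[str, str],
             transport: Transport) -> list[str]:
    """Intermediate layer: build a raw message from a structured field
    list, send it, and return the reply split into lines.

    The "TYPE;key=value;..." layout is hypothetical.
    """
    body = ";".join([msg_type] + [f"{k}={v}" for k, v in fields.items()])
    return transport(body.encode("ascii")).decode("ascii").splitlines()


@dataclass(frozen=True)
class FlightOption:
    flight: str
    departs: str
    arrives: str
    seats: int


def check_availability(airline: str, origin: str, destination: str,
                       depart_time: str, arrive_time: str, seats: int,
                       transport: Transport) -> Union[list[FlightOption], str]:
    """High-level layer: knows nothing about message layout or transport.

    Returns a list of flights, or a result code explaining why none came back.
    """
    lines = send_raw("AVAIL", {"airline": airline, "from": origin,
                               "to": destination, "dep": depart_time,
                               "arr": arrive_time, "seats": str(seats)},
                     transport)
    if not lines:
        return "NOREPLY"        # hypothetical code for an empty reply
    status, rows = lines[0], lines[1:]
    if status != "OK":
        return status           # e.g. a hypothetical "NOAVAIL"
    options = []
    for row in rows:
        if not row.strip():
            continue
        flight, dep, arr, n = row.split()
        options.append(FlightOption(flight, dep, arr, int(n)))
    return options
```

Because only the bottom layer touches the physical interface, moving from X.25 to TCP would mean swapping the `transport` callable and leaving the upper layers alone.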
Performance
Reservation systems are relatively slow.
Turnaround time for messages
is on the order of seconds or tens of seconds, but
times as long as a few minutes are not uncommon.
When you consider the amount of information involved and the volume of
requests that Worldspan receives, this isn't an unreasonable amount
of time.
Depending on the kind of queries you want to do, though, 30 seconds
can be way too long. For a service like Expedia that responds to a
user request for flights to a specific destination, it's fine to have
the user wait while performing the query. But for a service like
Site59, where a user may be presented with packages to several
destinations on the same page, it is completely impractical to perform
availability queries on demand. In that case, you'll have to do some
sort of advance query and cache the results in a database.
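Here is a minimal sketch of the "query in advance, serve from cache" pattern. An in-memory dictionary stands in for the database, and `fetch` stands in for a slow CRS query. The class and method names are hypothetical. Page requests call `get`, which never waits on the CRS. A background job calls `refresh` for keys that `stale_keys` reports.

```python
import time
from typing import Any, Callable, Hashable


class AvailabilityCache:
    """Availability answers fetched ahead of time, with an age limit."""

    def __init__(self, fetch: Callable[[Hashable], Any], max_age_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # slow CRS query (stand-in)
        self._max_age = max_age_s
        self._clock = clock
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def refresh(self, key: Hashable) -> None:
        """Run the slow query now and store the answer with a timestamp."""
        self._entries[key] = (self._clock(), self._fetch(key))

    def get(self, key: Hashable) -> Any:
        """Return the cached answer, or None if missing or too old."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stamp, value = entry
        if self._clock() - stamp > self._max_age:
            return None
        return value

    def stale_keys(self) -> list[Hashable]:
        """Keys the background job should refresh next."""
        now = self._clock()
        return [k for k, (stamp, _) in self._entries.items()
                if now - stamp > self._max_age]
```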
Even then you will have to think carefully about how you
get the flight inventory data. Consider the case of
caching 500 flights and updating the number of seats available
hourly, so that you can search them by arbitrary criteria. If each
flight takes even ten seconds to update, then it will take nearly an
hour and a half to update the entire database, longer than the time
available.
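Spelled out, the sequential sweep time for this example is shown below. If $k$ queries can be in flight at once, the time would drop to roughly $5000/k$ seconds. That assumes, as an unverified simplification, that the host's turnaround does not slow down under the extra load.

```latex
T_{\text{sweep}} = 500 \text{ flights} \times 10\ \text{s/flight}
                 = 5000\ \text{s} \approx 83\ \text{min} > 60\ \text{min}
```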
What's more, this may turn out to be prohibitively expensive. The
reservation system might be charging you for each request, or for
requests in excess of a certain number. If you're doing 500 requests
every 90 minutes, then you're liable to start racking up the charges
pretty quickly.
Some simple heuristics can drastically reduce
the number of messages sent. For example, you don't have to bother
querying flights that have already departed or are already full.
Checking the number of
seats available for a flight that is three months in the future
is a waste of time, too.
Depending on what you know about the types of flights you are dealing
with, you can build in other rules that can eliminate useless
checking of seldom-full flights while doing more frequent checks of
flights that are likely to fill up quickly. (If you think you might
get some help from the airlines in this regard, think again:
predicting flight availability and traffic patterns is a key
to maximizing their profit,
so they are not going to give away anything
they know about it to their customers.)
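Rules like these can be collected into a single predicate that decides whether a flight is worth a message right now. The thresholds and the `fills_quickly` flag are hypothetical. The 14-day horizon simply echoes Site59's selling window described in the case study.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TrackedFlight:
    number: str
    departs: datetime
    seats_left: int
    last_checked: datetime
    fills_quickly: bool  # hypothetical flag from your own route knowledge


# Hypothetical tuning parameters.
HORIZON = timedelta(days=14)        # beyond this, don't bother yet
FAST_INTERVAL = timedelta(hours=1)  # routes that tend to sell out
SLOW_INTERVAL = timedelta(hours=6)  # seldom-full routes


def needs_check(f: TrackedFlight, now: datetime) -> bool:
    """Decide whether spending a CRS message on this flight is worthwhile."""
    if f.departs <= now:
        return False              # already departed
    if f.seats_left <= 0:
        return False              # already full
    if f.departs - now > HORIZON:
        return False              # too far in the future to matter
    interval = FAST_INTERVAL if f.fills_quickly else SLOW_INTERVAL
    return now - f.last_checked >= interval
```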
One technique we used in Site59's system to aggregate availability
checking is to use general availability queries between city pairs.
We group flights to be checked by
origin and destination city, and then group by airline and sort by
departure time.
We query for all available flights for that city pair, starting at
12:01 a.m. Then we scan the list of returned flights and compare
against our inventory list. Matching flight numbers have their
inventory updated and are removed from the list.
Flights in our list that were not returned, but whose departure time falls
between the first and last returned flights, are moved to a "questionable" list.
When we reach the end of the returned flight list, if there are more
flights left in our list of flights to be queried,
we compare the departure time of the last returned flight to the
departure time of the first remaining flight.
If the remaining flight departs after the last returned flight,
we repeat our availability query, starting with the time of the last
returned flight from the previous query.
Otherwise, we move the rest of the remaining flights to the
questionable list, assuming that we have seen all the flights for the day.
The questionable flights are then checked one by one using the
specific flight number so we get an exact inventory on them.
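The city-pair sweep can be sketched for a single city pair as follows. As simplifications, this version does not group by airline, and `query(start)` is a hypothetical callable. It returns a limited, time-sorted page of `(flight, seats)` pairs departing at or after `start`. The `last > start` guard is an addition not described in the text; it prevents an endless loop when a page fails to advance.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable


@dataclass(frozen=True)
class Flight:
    number: str
    departs: time


# Hypothetical general availability query for ONE city pair.
Query = Callable[[time], list[tuple[Flight, int]]]


def sweep_city_pair(wanted: list[Flight],
                    query: Query) -> tuple[dict[str, int], list[Flight]]:
    """Update seat counts using as few general queries as possible.

    Returns (seats by flight number, questionable flights that must be
    checked one by one with a specific flight-number query).
    """
    remaining = sorted(wanted, key=lambda f: f.departs)
    seats: dict[str, int] = {}
    questionable: list[Flight] = []
    start = time(0, 1)  # first query starts at 12:01 a.m.
    while remaining:
        returned = query(start)
        if not returned:
            questionable.extend(remaining)
            break
        first, last = returned[0][0].departs, returned[-1][0].departs
        found = {f.number: n for f, n in returned}
        unmatched = []
        for f in remaining:
            if f.number in found:
                seats[f.number] = found[f.number]  # matched: update inventory
            elif first <= f.departs <= last:
                questionable.append(f)              # should have been listed
            else:
                unmatched.append(f)
        remaining = unmatched
        # Page forward only if the next wanted flight departs after the
        # last returned flight and the window actually moved.
        if remaining and remaining[0].departs > last and last > start:
            start = last
        else:
            questionable.extend(remaining)  # assume we saw the whole day
            remaining = []
    return seats, questionable
```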
A simple way to reduce the time it takes to
process inventory is to run multiple queries in
parallel. You can pick an arbitrary number of flights to query at
once, as long as you have a sufficient number of available channels on
your communication line. This reduces the overall processing time at
the cost of increasing the number of messages sent.
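One way to cap concurrency at the number of available channels is a fixed-size thread pool. This is a sketch using Python's `concurrent.futures`. The `query` callable and the `channels` count are assumptions you would supply from your own setup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def query_in_parallel(items: Iterable[T], query: Callable[[T], R],
                      channels: int) -> list[R]:
    """Run `query` on each item with at most `channels` requests in flight.

    Results come back in the same order as `items`; an exception raised by
    any query propagates to the caller.
    """
    if channels < 1:
        raise ValueError("need at least one channel")
    with ThreadPoolExecutor(max_workers=channels) as pool:
        return list(pool.map(query, items))
```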
Implementation challenges
One of the most difficult things about this project has been the
problem of knowing what to expect from the reservation system.
There is a great deal of knowledge built into Worldspan's SMI interface
that comes from their terminal-based system.
Unless you are already a Worldspan expert, you will have a hard time
interpreting the meaning of many parts of the SMI messages and their
responses. Worldspan's developer support teams in Kansas City and
Atlanta have been invaluable in translating the intricacies of their
travel interface into a form that is understandable to mere mortal
programmers.
To facilitate debugging,
the CRS interface module includes the ability to record and
examine all the messages sent and received by the system. You can
quickly see if you're getting back timeout errors, which may mean you
need to examine your communications channel. Or, if you're getting
back application errors from the reservation system server, you'll
have all the information you need to debug the problem in conjunction
with the reservation system support staff.
Recording every message sent back and forth between the application
systems can generate a tremendous amount of data. A day's worth of
messages can easily be tens of thousands of rows, most of which will
never be examined again. The production system in use at Site59
cleans out the recorded message queue a few times an hour, both to
keep the total number of messages down and to avoid deleting a large
number of rows at one time. The messages are also kept in a separate
tablespace to make it easier to export the rest of the site's data
without having to process the message history table.
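Here is a sketch of recording messages and purging old ones in bounded batches, assuming SQLite as a stand-in. SQLite has no tablespaces, so that part of the production setup is not reflected here. The table and function names are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Optional


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS smi_message_log (
            id        INTEGER PRIMARY KEY,
            logged_at TEXT NOT NULL,   -- ISO 8601, second precision
            direction TEXT NOT NULL CHECK (direction IN ('sent', 'received')),
            body      TEXT NOT NULL
        )""")
    return conn


def record(conn: sqlite3.Connection, direction: str, body: str,
           when: Optional[datetime] = None) -> None:
    """Log one message sent to or received from the reservation system."""
    stamp = (when or datetime.now()).isoformat(timespec="seconds")
    with conn:
        conn.execute(
            "INSERT INTO smi_message_log (logged_at, direction, body) "
            "VALUES (?, ?, ?)", (stamp, direction, body))


def purge_old(conn: sqlite3.Connection, keep: timedelta,
              batch_size: int = 1000, now: Optional[datetime] = None) -> int:
    """Delete entries older than `keep`, one bounded batch per transaction.

    Meant to run several times an hour so no single delete is large.
    Returns the number of rows deleted.
    """
    cutoff = ((now or datetime.now()) - keep).isoformat(timespec="seconds")
    total = 0
    while True:
        with conn:
            cur = conn.execute(
                "DELETE FROM smi_message_log WHERE id IN ("
                " SELECT id FROM smi_message_log WHERE logged_at < ?"
                " ORDER BY id LIMIT ?)", (cutoff, batch_size))
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```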
The split between the parser and transport layers also makes it easy to
run a test system that uses the same communications line as your
production system. A typical strategy is to divide your
dedicated line bandwidth into two channels, one for test and one for
production. Your development server can then run on a separate machine
from your live web server while still connecting to the dedicated SMI
gateway machine for access to the test line. You can hack on your
test environment with reckless abandon while feeling confident that
you won't crash your production server.
Future enhancements
We plan to optimize Site59's flight inventory queries even further to
reduce the number of messages sent. We are designing a scheme that
will respond to users' requests for travel package information to
target the flight queries that will be performed.
Also, we hope that someday a dedicated line to Worldspan won't be needed. A
secure TCP/IP connection, perhaps using PPTP tunneling, would reduce
a lot of the facility overhead in establishing service with them.
Worldspan is now offering TCP/IP connections over dedicated lines,
so we will be able to migrate away from X.25, which has been
a significant effort to maintain.
Further off in the future is using XML to exchange travel data
among travel sites. The Open
Travel Alliance (http://www.opentravel.org) is coordinating an effort to standardize XML
representations of travel-related data such as customer
preferences, travel itineraries and electronic tickets. Once the
industry players agree on an XML schema, you could theoretically
transfer travel information between the CRS interface module and other travel
web services. Like any standards process, the progress is slow; it
will take some time before specifications are developed and adopted
widely enough to be useful.
Thanks to Gregory Galperin for contributing to this article.