A look into airline loyalty programs

Introduction

In the pursuit of targeting and retaining customers, loyalty programs have grown to significant size. In 2015, businesses spent approximately $5.6 billion on loyalty programs, awarding points to customers that customers could later redeem for goods or services[1]. With this growth comes the inevitable need for accurate financial estimation of points flowing through the programs, in accordance with strict revenue recognition rules[2]. Estimation relies on predictive models of points usage patterns, but predictive models are only as strong as their underlying data, which is often controlled by IT and Marketing teams with limited consideration of potential financial use cases. We offer some cautions and prescriptions for the gathering and use of loyalty points data for financial estimation based on our experience in the airline industry.

At a basic level, most loyalty programs collect demographic data on a customer, assign that customer a member number, and award points to the customer based on interaction with the firm. These outstanding points represent a liability on the firm’s books. If the points are redeemed, the liability washes; if the points expire (or “break”), the firm recognizes the associated revenue. Revenue recognition for unredeemed gift cards is well documented, particularly in accountancy journals, but historically, loyalty programs have not been material enough to warrant similar consideration.

The airline industry was an early adopter of loyalty programs that have grown into materially relevant business segments[3]. We have observed challenges and best practices through our work with several airlines to develop their breakage estimation methodology. For airlines, as with most industries, loyalty program initiatives are typically created and administered by marketing groups, but the loyalty program data are maintained by IT groups. Marketing prioritizes customer satisfaction, while IT prioritizes data efficiency. When a loyalty program becomes materially relevant, Finance must use the transactional data underlying the loyalty program to estimate financial outcomes. Design and maintenance decisions made by Marketing and IT can thus have meaningful impact on a firm’s financial picture that is often not given due consideration.

The consequences of this disconnect are most evident in three arenas: (i) program design, wherein the loyalty program is designed and altered without consideration of the potential impact on financial estimation; (ii) data handling, wherein the data are altered or updated without consideration of how the changes affect the overall cleanliness and consistency of the data; and (iii) data maintenance, wherein the data are collected and maintained with an eye towards optimizing computational efficiency, but without consideration of how much historical data is needed for accurate estimation of future conditions. This is not intended to be an exhaustive catalog of issues in loyalty databases, but the starting point of a discussion regarding how such data are collected, curated, and used.

Loyalty Program Design

Many design elements exist simultaneously in a single loyalty program. While they all might influence points accruals and probable redemptions, points flows are not the primary focus of the design choices. Marketing designs the loyalty program to achieve its business goals. They may, for example, examine the data to build overall pictures of the demographics and habits of their customer base, or assign the loyalty customers into groups to run A/B tests[4] on various types of promotions. They may use the information collected to help accurately target ads, discounts, and other incentives towards their customer base or to design and test new structural elements of the program. They might focus on their most profitable segment of customers; in the case of an airline, customers who are already highly engaged with the airline, such as co-branded credit card users, may be offered promotions tied to their use of the card. Alternatively, Marketing might seek to increase loyalty among users who engage via partners, such as car rental firms, by bundling offerings with the partner and increasing the number of points awarded for the bundled purchase over the base purchase. Finally, Marketing may wish to increase overall customer retention by introducing incentives to casual users who earn only occasionally and only through direct engagement with the airline, such as two-tiered usage systems that award more points once a certain threshold level of points is achieved.

Yet program design can have a significant effect on the proportion of points expected to be redeemed or expired. For example, redemption channels can influence customer behavior.

In programs where booking flights is the only redemption channel offered, this presents a significant barrier to redemption for casual users. Few users who earn points only through flight will earn enough points to “afford” a ticket, even if the expiration window is sufficiently long that their occasional travel allows them to continuously “refresh” their points. Those casual users who do ultimately redeem will rarely redeem more than once given the time and number of accrual activities required to earn enough points for a flight.
When programs allow points to be redeemed on “lower cost” items, such as magazines, that barrier to redemption is lowered to the level of the lowest point-cost item, and in programs where users can donate points to charity, theoretically all points could ultimately be redeemed. In those cases, casual users are more likely to engage in redemption behavior and to redeem multiple times. From a marketing perspective, multiple redemption channels might be favorable, because they increase the appeal of the program to customers. From a financial perspective, this design element moves many users who would typically expire into the redeemer category and reduces the revenue recognized from breakage.

Some program design elements do not directly affect consumer behavior, but do affect timing of that behavior, creating variation in the airline’s financial picture depending on the point in time at which the data is examined. Timing issues are well-illustrated by considering expiration windows (i.e., the length of time an account may remain inactive before some or all points expire). While the window itself does not affect any individual’s behavior very much, the existence and length of that window affect the dominant user and point types in the mix of data at any given time, and thus the proportion of points likely to be redeemed.

Consider a program in which there is no expiration at all. In this situation, all loyalty members exist in the data as “active” users and so their points remain valid forever. Meanwhile, users who redeem continue to do so at their regular rate and timing. This implies that the proportion of outstanding points expected to expire in the long run (when the firm ends the program, for instance) is ever-rising, asymptotically approaching 100%. Claiming such a breakage rate would likely be immensely difficult for any airline regardless of the truth behind it. What was conceived as a marketing gimmick (“Points never expire!”) becomes a challenge for booking revenue recognition.
Less obviously, consider an airline where approximately half the loyalty users ultimately expire their points and half redeem them. An airline with an 18-month expiration window and a nine-month median time to redemption will have data skewed towards having more users that will ultimately expire than redeem in the data at any single snapshot in time. An airline with a three-month expiration window and a four-month median time to redemption will have a roughly accurate mix of expiring to redeeming users in their data at any point in time.
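
The snapshot arithmetic in this comparison can be sketched in a few lines. This is an illustration under simplifying assumptions of our own, not a description of any airline's model: members join at a steady rate, a member who will expire stays in the data for exactly the expiration window, a member who will redeem leaves the data at redemption, and the median time to redemption stands in for the average. The function name `snapshot_expirer_share` is hypothetical.

```python
def snapshot_expirer_share(expire_fraction, expiration_window_months, months_to_redemption):
    """Share of members visible at one snapshot who will ultimately expire.

    With steady inflow, the number of members of a given type present at any
    moment is proportional to (share of joiners) x (months that type stays
    in the data).
    """
    expirers = expire_fraction * expiration_window_months
    redeemers = (1.0 - expire_fraction) * months_to_redemption
    return expirers / (expirers + redeemers)


# Half of all joiners ultimately expire in both hypothetical programs.
long_window = snapshot_expirer_share(0.5, 18, 9)
short_window = snapshot_expirer_share(0.5, 3, 4)

print(f"18-month window, 9 months to redeem: {long_window:.0%} of snapshot members will expire")
print(f"3-month window, 4 months to redeem: {short_window:.0%} of snapshot members will expire")
```

Under these assumptions, about two-thirds of the members in the first airline's snapshot are eventual expirers even though only half of joiners expire, while the second airline's snapshot sits close to the true even split.
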
It is important to note here that there is a difference between the probability of any given user redeeming their points versus the probability that any given outstanding point is redeemed. Likelihood of any user redeeming is highly correlated with certain characteristics: for instance, co-branded credit card holders are highly likely to redeem points and tend to have high volumes of points flowing through their accounts, because they earn points every time they use their credit card. Loyalty members who only ever earn points through flight (low-engagement users), on the other hand, are likely to expire and have low volumes of points flowing through their accounts. Thus, an expiration window wherein the user types are equally mixed may skew towards a larger percentage of total points in the pool being redeemed than one where low-engagement users are over-represented. Knowing how design elements affect or appear to affect user behavior can help determine how various types of points and the overall pool of outstanding points are or appear to be affected by changes.
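
The gap between the user-level and the point-level view can be made concrete with a small calculation. The two segments, their sizes, probabilities, and balances below are invented for illustration, and the sketch assumes that a user who redeems eventually redeems the whole balance, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    users: int
    redeem_probability: float  # chance that a user in this segment redeems
    points_per_user: float     # average outstanding points per user


def user_level_rate(segments):
    """Probability that a randomly chosen user redeems."""
    total_users = sum(s.users for s in segments)
    return sum(s.users * s.redeem_probability for s in segments) / total_users


def point_level_rate(segments):
    """Probability that a randomly chosen outstanding point is redeemed."""
    total_points = sum(s.users * s.points_per_user for s in segments)
    redeemed = sum(s.users * s.points_per_user * s.redeem_probability for s in segments)
    return redeemed / total_points


# Hypothetical mix: few high-volume card holders, many low-volume flyers.
segments = [
    Segment("co-branded card holders", users=200, redeem_probability=0.9, points_per_user=50_000),
    Segment("flight-only members", users=800, redeem_probability=0.2, points_per_user=3_000),
]

print(f"share of users expected to redeem:  {user_level_rate(segments):.0%}")
print(f"share of points expected to redeem: {point_level_rate(segments):.0%}")
```

With these made-up numbers, roughly a third of users redeem but about three-quarters of outstanding points are redeemed, because the high-volume segment dominates the point pool.
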
For an example of an expiration design element affecting user behavior rather than just timing, consider a program that allows redemption activity to “refresh” points for users to avoid expiration, as opposed to only accrual activities refreshing points. In that case, some users might choose to redeem when they would otherwise have accrued, changing the timing of their redemption behavior.

One last example of a program element significantly influencing customer behavior patterns is that of “tiered” loyalty programs, which are common among airlines. The tiers might be structured to encourage customers to make that extra leap to a higher-status tier, which carries perks such as first class lounge access or free checked baggage. In some loyalty programs, the achievement and maintenance of that higher tier status becomes a goal of its own, leading to users who continuously aggregate points, perhaps even seeking out a co-branded credit card to do so, but never redeem those points despite otherwise having user characteristics that would predict redemption[5]. For these customers, the perks of their status tier outweigh the utilization of a “free” flight. If a loyalty program contains design elements that encourage this behavior, these users might fit the general characteristics typical of a redeemer but in fact never exhibit redemption behavior in the existing data. This then raises the question of the time horizon for which the estimates are generated: in the very long run these users would likely redeem before the airline went out of business, but in the short run they will continue to collect points without redeeming to maintain their status.

The above examples impact financial estimates either by truly changing outcomes through altering user behavior or by appearing to do so. While major program design changes are generally discussed with teams from Finance in the airline industry, these discussions are uncommon in less mature industries. All decisions should be discussed across business units to ensure, if not consensus, at least informed clarity. In practice, however, because loyalty programs are considered the purview of Marketing, their design is largely left to them, as are many aspects of data collection.

Data Collection in Loyalty Databases

While it is tempting to assume that all loyalty database transactions are generated automatically, it is critical to note that loyalty programs are subject to a high degree of data entry error because they can be modified by customer service representatives. This is especially true if the business rules underlying the database are not clearly defined and stated, which leaves loyalty team members or customer service representatives with no authoritative source to consult when faced with unclear situations. Accurate transactional history is critical in building user profiles to predict behavior, and as we will demonstrate, estimation can be highly biased if data are polluted by manual entry.

To illustrate some problems that might occur and their potential impact on financial estimation, consider the case of a dissatisfied customer contacting a customer service representative because a recent flight awarded fewer points than expected. The representative’s goal is to resolve the issue quickly and to the satisfaction of both the customer and the airline; to do so, the representative might enter a manual adjustment to the customer’s loyalty account, crediting additional miles to the account regardless of whether the customer was correct in their initial belief that they were under-awarded points for that flight. The customer service representative should tag the new transaction according to some pre-determined list of options and should be trained to choose the appropriate option based on the customer’s complaint and the resolution. The representative should know that sometimes transactions can be slow to enter and have some idea of how long to wait before trying to re-initiate the transaction and how to check whether the transaction posted correctly. Finally, the representative should have training regarding how to link the adjustment with the original transaction without conflating data entry dates with transaction dates. Here we see the potential for data collection issues to manifest:

In this case there should be a “Manual Adjustment – Flight” option of some sort to indicate that this is a manual adjustment of awards points related to a flight taken. If customer service representatives are inconsistently trained and have no clear documentation to refer to, the representative might choose any number of potential tags, such as “Bonus Credit” that were not intended for use in manual adjustments. The result is inconsistently coded transactions that, on a large scale, can appear to be actual deviations in customer behavior patterns.
Worse, if the input fields are not limited at all, the representative might make an error in spelling or even substitute similar characters such as the letter “O” and the digit “0” if their keyboard is broken[6]. While some such errors can be cleaned through pattern-matching, others cannot, leaving large swathes of data virtually unusable.
If the system is slow or seems to hiccup, the customer service representative might re-enter the transaction without checking to see if it posted correctly, resulting in a duplicate transaction with identical details[7].
Many loyalty databases have transactional tables that feed into summary tables which are then displayed to loyalty team members through programs called dashboards. These summary tables display aggregate characteristics of users, such as total flight points earned or total credit card points earned. In this case, the account balance in the summary data is now incorrect.
Down the line, a loyalty program team member may notice the incorrect account balance and adjust the customer’s account summary to reflect the correct points balance without deleting or cancelling out the duplicate transaction in the transactional table. The transactional data and the summary member data no longer line up and it is impossible to tell with any certainty where they diverged. Financial estimates will be constructed on a points base that does not reconcile with summary data but that cannot be corrected.
Depending on the structure of the database, the customer service representative might connect the additional points with the original flight, resulting in the transaction appearing to have happened in the past with no indication that while it refers to a past date, it was posted on the present date. This results in snapshots of past periods being impossible to reproduce due to retroactive transactions and a dynamically changing database.
The examples above have obvious impacts on estimation, but there is a less obvious impact as well: if there is a larger underlying pattern to the representatives who make these errors, such as all being in a single customer service location that primarily serves a single geographic region, there is now a systematic bias in the data specific to the underlying pattern. This might make it appear that users in a particular place, such as California, have substantially different user behavior patterns than users in a neighboring state, such as Nevada, when it is merely the result of error in the data.
There is an additional issue to consider: financial audits can often require a full replication of data protocols, especially in the wake of high-profile data breaches, and identifying all such cases of the behavior above is time-consuming and likely impossible.
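
One way to surface the re-entered transactions described above is to flag rows that match on every business field and were posted within minutes of each other. The sketch below is a minimal illustration, assuming a hypothetical record layout (`txn_id`, `account`, `code`, `points`, `activity_date`, `posted_at`) and a hypothetical `MANUAL_ADJ_FLIGHT` code; real loyalty schemas will differ. It also assumes the database stores the posting timestamp separately from the date of the underlying activity. As footnote 7 notes, a match does not prove that either row is invalid, so the output is a review list, not a deletion list.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_possible_duplicates(transactions, window=timedelta(minutes=10)):
    """Return (earlier_id, later_id) pairs that look like re-entered transactions.

    Two rows are paired when account, code, points, and activity date all match
    and their posting timestamps fall within `window` of each other.
    """
    groups = defaultdict(list)
    for txn in transactions:
        key = (txn["account"], txn["code"], txn["points"], txn["activity_date"])
        groups[key].append(txn)

    pairs = []
    for rows in groups.values():
        rows = sorted(rows, key=lambda txn: txn["posted_at"])
        for earlier, later in zip(rows, rows[1:]):
            if later["posted_at"] - earlier["posted_at"] <= window:
                pairs.append((earlier["txn_id"], later["txn_id"]))
    return pairs


# Hypothetical rows: 1 and 2 are the same adjustment keyed in twice.
transactions = [
    {"txn_id": 1, "account": "A100", "code": "MANUAL_ADJ_FLIGHT", "points": 500,
     "activity_date": "2016-03-01", "posted_at": datetime(2016, 3, 9, 14, 2)},
    {"txn_id": 2, "account": "A100", "code": "MANUAL_ADJ_FLIGHT", "points": 500,
     "activity_date": "2016-03-01", "posted_at": datetime(2016, 3, 9, 14, 5)},
    {"txn_id": 3, "account": "A100", "code": "FLIGHT", "points": 500,
     "activity_date": "2016-03-01", "posted_at": datetime(2016, 3, 9, 14, 6)},
    {"txn_id": 4, "account": "B200", "code": "MANUAL_ADJ_FLIGHT", "points": 500,
     "activity_date": "2016-03-01", "posted_at": datetime(2016, 3, 9, 14, 3)},
]

print(flag_possible_duplicates(transactions))
```

Here only transactions 1 and 2 are paired. The ten-minute window is an arbitrary choice for the example and would need to be tuned to how the actual system behaves when it is slow.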

While Marketing may encounter problems in its own exercises resulting from some or all of the situations described above, its tolerance for error is likely to be much higher than that of Finance. A 5 percentage-point error rate in the data may be acceptable for marketing models but can be material for financial estimation. These data errors exist in both older loyalty programs and newer programs with more recent data. Marketing may care primarily about testing customer response to various marketing policies and is unlikely to require estimates of actual volumes of points accrued or redeemed that conform to financial reporting standards. Similarly, Marketing has little need to consider the strength of the data controls with respect to the loyalty database, and few commercial loyalty database systems hold themselves to the standards of financial reporting. With respect to many of the “backend” decisions, marketing teams rely on the IT experts tasked with maintaining the data to make appropriate decisions. This leads into a separate problem: data maintenance without an understanding of future use cases.

Data Management in Loyalty Databases

Loyalty programs are designed and run by Marketing, but Marketing rarely has the wherewithal to handle the more technical aspects of data collection and management. The loyalty data itself is usually maintained by IT, which typically has teams specializing in databases and data management infrastructure.

Database expert C.J. Date notes that the goal of the database design process is to “produce a design that’s independent of all considerations having to do with either physical implementation or specific applications”[8]. That is, in theory, databases should be designed to be flexible enough to accommodate any future use cases. Date further elaborates that database design determines “what tables a database should contain, what columns those tables should have, and what integrity constraints those columns and tables are subject to”[9]. Ideally, a loyalty database should be designed via collaboration between all business units who might use the data now or in the future, and the technical team who must build and implement it. In reality, most databases are designed for their present use only, with the business unit that holds ownership of the project describing the type of data they wish to collect, and IT implementing it as they understand the specification, but potentially without any thorough understanding of how the data are intended to be used.

With respect to financial models, this is illustrated by considering the case of “old” data. While data storage is cheap, data processing efficiency is prized in data management. Thus, most sensible database administrators will have a series of rules surrounding the data tables to keep the database streamlined and clean. These rules may create challenges in using data for predictive models, particularly with regard to older data, because predictive models need some history from which to predict. For instance, an administrator might create various rules regarding inactive accounts: since these accounts are not accessed frequently, a sensible database administrator may not wish to keep them in a tablespace that is optimized for frequent access.

One solution might be to move such accounts to slow or offline storage, effectively “retiring” them. This allows access to that data on the rare occasions it is needed while simultaneously freeing up faster storage for more active accounts. This scenario is optimal from the perspective of predictive modeling as the older data remains accessible and usable.
If points expire for inactive users, a database administrator might propose to delete expired accounts older than a given age to clear up space and free up database resources. From the marketing perspective, this may be totally acceptable, particularly if the marketing team feels it has learned all that it can about the demographics of users who exit the program permanently. In this case, survivor bias has been introduced into the data. In trying to predict which users will redeem, estimates will skew high because information regarding users who never redeemed has been lost.
If older accounts are purged and a past user re-joins the program, the potentially valuable information that the user opted to reactivate an idle account is not preserved. A user’s prior transactional history is now unavailable for future examination and analysis.
A variant of this type of decision is a rule purging all transactions older than a certain age. Again, from the marketing perspective, that may not matter, because more recent transactions are more relevant and summary tables give a good overall snapshot of the user. From a user modeling perspective, however, important information about how user behavior has changed over time is being discarded, and transactional tables may not reconcile with summary tables, leading to inaccurate financial estimates.[10]
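
The survivor bias introduced by purging expired accounts can be shown with a toy dataset. All counts, years, and field names below are invented for illustration; the point is only the direction of the distortion.

```python
def redemption_rate(accounts):
    """Share of accounts in the data whose points were ultimately redeemed."""
    redeemed = sum(1 for a in accounts if a["outcome"] == "redeemed")
    return redeemed / len(accounts)


def purge_old_expired(accounts, cutoff_year):
    """Mimic a housekeeping rule that deletes expired accounts last active before cutoff_year."""
    return [
        a for a in accounts
        if not (a["outcome"] == "expired" and a["last_activity_year"] < cutoff_year)
    ]


# Hypothetical history: 100 accounts, 40 of which redeemed.
accounts = (
    [{"outcome": "redeemed", "last_activity_year": 2012}] * 30
    + [{"outcome": "expired", "last_activity_year": 2012}] * 50
    + [{"outcome": "redeemed", "last_activity_year": 2016}] * 10
    + [{"outcome": "expired", "last_activity_year": 2016}] * 10
)

full_rate = redemption_rate(accounts)
purged_rate = redemption_rate(purge_old_expired(accounts, cutoff_year=2014))

print(f"redemption rate with full history: {full_rate:.0%}")
print(f"redemption rate after the purge:   {purged_rate:.0%}")
```

In this example the observed redemption rate doubles from 40% to 80% once the old expired accounts are deleted, even though no customer behaved differently.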

The data handling protocols above assume a loyalty database that was originally created for a single firm. An even more complex set of issues arises in the event of two airlines merging, and the subsequent merge of their loyalty programs. If this merge event is not approached with extreme care and consideration, it is possible to render the legacy data virtually useless for predictive modeling purposes.

To continue with the theme of inactive accounts, there may be a temptation to migrate only active user accounts from the loyalty program database whose use is being discontinued. This creates survivor bias, but only among users from the discontinued program, and may require the two sets of users to be modeled separately. Prediction for the migrated group is inaccurate without any information regarding lapsed users, and users of each airline may be markedly different, making estimates of the fully-documented loyalty program an imperfect substitute for the migrated group.
Transaction codes and partner codes may not be fully cross-walked between the programs, resulting in multiple codes for the same partners or transaction types depending on which program the user came from or the point in time (pre- or post-merge) of the transaction. This can require careful consideration and extensive cleanup for accurate financial estimation.
Codes and history from the discontinued database may not be preserved, leaving large gaps in knowledge and an inability to correctly classify transaction and user types.
A set of rules regarding the merging of customer accounts that exist in both databases must be clearly defined; otherwise such incidents are resolved at the discretion of customer service representatives, often with little consistency and with high error rates.
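
A code crosswalk of the kind described above can be kept as an explicit lookup keyed on both the source program and the original code. The sketch below is a minimal illustration; the program labels, codes, and the `harmonize` function are hypothetical. It reports codes that have no mapping instead of guessing, since an unmapped code is exactly the knowledge gap that needs to go back to Marketing or IT.

```python
# Hypothetical crosswalk: (source program, original code) -> unified code.
CROSSWALK = {
    ("legacy", "CR01"): "CAR_RENTAL",
    ("legacy", "FL"): "FLIGHT",
    ("surviving", "CAR"): "CAR_RENTAL",
    ("surviving", "FLT"): "FLIGHT",
}


def harmonize(transactions, crosswalk):
    """Attach a unified code to each transaction and collect unmapped codes."""
    mapped, unmapped = [], set()
    for txn in transactions:
        key = (txn["source_program"], txn["code"])
        if key in crosswalk:
            mapped.append({**txn, "unified_code": crosswalk[key]})
        else:
            unmapped.add(key)
    return mapped, unmapped


sample = [
    {"source_program": "legacy", "code": "CR01", "points": 250},
    {"source_program": "surviving", "code": "CAR", "points": 300},
    {"source_program": "legacy", "code": "ZZ9", "points": 100},  # no documentation survives
]

mapped, unmapped = harmonize(sample, CROSSWALK)
print([txn["unified_code"] for txn in mapped])
print(sorted(unmapped))
```

Keying on the source program as well as the code matters because the same code string can mean different things in the two legacy systems.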

In addition to general rules surrounding data handling that may not be constructed with an eye towards future use cases, there is often insufficient documentation of databases. It is common for database policies such as those listed above to come to the attention of Finance only during the data exploration phase as a predictive modeling project begins. Sometimes, Finance has no clear record of historical partner and transaction codes and relies entirely on institutional knowledge to determine how to handle those codes. Finance teams also tend to be unfamiliar with what tables exist in the database, how they are related, and what might be most useful for modeling. Even the structure of the points data can be unfamiliar to them: for instance, some databases treat all transactions as having positive numbers of points attached, and the transaction code determines which points should be treated as negative, while others use negative points for all negative transactions, and some do a mix, such as negative points for expirations but positive points for all other transactions. When data dictionaries and codebooks are requested from IT and Marketing, they are often incomplete if they exist at all, and this lack of codified knowledge tends to be more severe in situations where programs have merged, where database providers have changed over time, or where IT and Marketing have had high turnover. In financial estimation, where single percentage point shifts can be materially relevant, lack of clear documentation can lead to inaccurate estimates.
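
The three sign conventions mentioned above can be reconciled with a small normalization step once the convention in use is actually known. The sketch below is illustrative only: the convention labels, the transaction codes, and the set `NEGATIVE_CODES` are assumptions, and in practice the list of point-reducing codes has to come from the program's own codebook.

```python
# Hypothetical set of codes that reduce a balance.
NEGATIVE_CODES = {"REDEMPTION", "EXPIRATION"}


def signed_points(code, points, convention):
    """Return points with a consistent sign: accruals positive, reductions negative."""
    if convention == "signed":
        # The stored sign is already meaningful.
        return points
    if convention in ("all_positive", "mixed"):
        # The stored sign is absent or only partly reliable, so the code decides.
        magnitude = abs(points)
        return -magnitude if code in NEGATIVE_CODES else magnitude
    raise ValueError(f"unknown convention: {convention}")


def balance(history, convention):
    return sum(signed_points(code, points, convention) for code, points in history)


# The same account history stored under each convention.
all_positive = [("ACCRUAL", 1000), ("REDEMPTION", 400), ("EXPIRATION", 300)]
signed = [("ACCRUAL", 1000), ("REDEMPTION", -400), ("EXPIRATION", -300)]
mixed = [("ACCRUAL", 1000), ("REDEMPTION", 400), ("EXPIRATION", -300)]

print(balance(all_positive, "all_positive"), balance(signed, "signed"), balance(mixed, "mixed"))
```

All three representations normalize to the same balance of 300 points. Summing the raw `mixed` column without normalization would instead give 1,100, which is the kind of silent error that missing documentation invites.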

Conclusion

The growth of loyalty programs into material components of revenue, in conjunction with financial accounting standards requiring accurate estimation of these components, has led to a strong demand for predictive models of points usage patterns among customers. The application of models intended for use in financial forecasting to loyalty program data raises questions regarding the reliability of that data. Using airline loyalty programs as a case study of an industry with mature programs, we note a series of potential pitfalls that would undermine the validity of any models applied to such data if not detected. As described above, Marketing and IT make decisions with respect to the loyalty program data with little if any consideration of other use cases for that data, and occasionally with little communication between them regarding how the data is handled. Often, data about points is provided to financial teams in summary workbooks, revealing little if any of the underlying user data, and it is never clearly communicated to the loyalty or technical team that detailed historical user data may at some point be required for financial modeling, particularly when loyalty programs are being formed. Therefore, it behooves managers seeking modeling exercises to explore the transactional data provided with extreme care, taking note of any gaps, inconsistencies, oddities in patterns, or unexpected information, and to communicate frequently with clients to discuss such findings. Often this involves multiple back-and-forth exchanges and requires the manager to go back to Marketing and IT to find answers, but it ultimately results in datasets that, while not clean, are at least understood in their limitations.

The ultimate solution to such situations is rooted in better communication and documentation: increased communication between technical and business unit teams, and between marketing and financial teams. In the best of circumstances, clients have had dedicated technical team members assigned to their business units. These individuals then facilitated the data acquisition for modeling per the finance team’s instructions and gave guidance and feedback to the main database team that improved data quality for future modeling. All decision rules were documented centrally and maintained for any user to reference. When these practices were implemented, projects went smoothly and quickly, without multiple rounds of data cleaning. For industries and sectors where rewards programs are growing into materially relevant business segments, clear communication and thoughtful construction of loyalty programs and their databases can avoid some of the growing pains experienced by the airline industry.

[1] Incentive Federation Incorporated. 2015 Incentive Marketplace Estimate Research Study.

[2] The ASC 606 requirements have been discussed primarily with respect to gift cards but apply equally to rewards programs that reflect a material financial commitment to customers.

[3] United Airlines reported $4.88 billion in deferred revenue from frequent flier miles in 2016, according to their 2016 Annual Report (http://otp.investis.com/clients/us/united_continental_holdings/SEC/sec-outline.aspx?FilingId=11879093&Cik=0000100517&PaperOnly=0&HasOriginal=1).

[4] A/B testing is a two-sample testing method. For example, a website might randomly display to visitors two versions of its webpage that are identical in every aspect except the color of a single link in order to see which color generates more click-throughs.

[5] Some status-oriented users will even take short flights and immediate return flights in order to earn enough points to maintain their status. An apocryphal example of this is seen in the movie “Up in the Air,” starring George Clooney.

[6] Yes, we have seen this in real datasets.

[7] Optimally, each of these transactions would at least enter the database with their own unique transaction identifier as a primary key, but it is still impossible to know whether both transactions are valid without further information.

[8] Date, C.J. What is Database Design, Anyway? O’Reilly Media. 2016.

[9] Date, C.J. What is Database Design, Anyway? O’Reilly Media. 2016.

[10] This can also happen if there is a rule that sets the summary table points total to zero for any user whose transactions sum to a negative number. In this case, loyalty teams may never know that the underlying data is corrupted, as they may generally interact only with the accounts at the summary data levels.
