I was first introduced to FRED sometime in 2007, shortly after joining WSJ.com as Senior Programmer on the News Graphics team. My first projects in that role were designed, in part, to demonstrate the scope and efficiency that data technologies could bring to our work. Though we succeeded in publishing more comprehensive, explorable pieces, many of our early efforts required heavy curation: data collected by hand, file structures built from scratch. New challenges immediately appeared, perhaps the thorniest of which was: how will we keep this work up-to-date?
APIs: help for managing published data
And then a colleague mentioned that FRED – the Federal Reserve Economic Data repository – also had an API. Though such services are ubiquitous today, in the pre-Twitter, “what’s-this-iPhone-thing-all-about” world of 2007 a useful, reliable, trustworthy data API like FRED’s was a major – and welcome – anomaly. For the cost of requesting a free API key, FRED would supply access to the most current figures for tens of thousands of economic indicators, in simple, one-element-per-entry XML.
Yet solving our first updating issue only highlighted a related problem: just as out-of-date data-driven graphics are fated for irrelevance, fixed content about data topics can become tragicomically inaccurate when it references auto-updating data elements. This crops up constantly with economic data, which at times can seem intentionally fickle: in March 2013 the economy added 88,000 – no, wait – 138,000 jobs; GDP for Q1 2013 wasn’t +2.4%, it was only +1.8%.
“Data revisions can be significant, and for many reasons,” says Keith Taylor, Data Desk Coordinator at the Economic Research Division of the St. Louis Federal Reserve. “Revisions come up because more data is gathered about the past: methodologies may change; units might change. For example, the CPI (Consumer Price Index) base period may change. For series that are seasonally adjusted, the seasonal weights may change.”
The result of these revisions, as illustrated above, is that today’s unemployment figure for June may not be September’s unemployment figure for that same month, making it difficult to assess historical economic contexts. To deal with this, Senior Web Developer George Essig built a twin for FRED, known as ALFRED: the ArchivaL Federal Reserve Economic Data. Though FRED and ALFRED share both a code base and an API, their perspective on data differs slightly.
“FRED is ‘What do we know today about the past?’” says Essig. “ALFRED is, ‘On a given day in the past, what did we know?’”
The distinction can be crucial, particularly when trying to evaluate research or make sense of policies that may now seem out of sync with reality. Since policy makers are among the primary “clients” of the Economic Research Division’s work, understanding the exact information available to them at the point of decision-making offers essential context for analyzing policy decisions.
Data Docs is all about providing context for data: understanding its nuances and variability. So as part of developing our plugins and platform, we also want to share what we’re learning about our data sources as we go. Fortunately, we were able to spend some time last week talking with Taylor and Essig, who know everything there is to know about FRED, quite literally.
“George has been working on this for more than a decade,” says Taylor. “If you could say that FRED is a person, George is the proverbial father.”
Making FRED your friend
The key to making best use of the AL/FRED API is to understand its date terminology, specifically the distinction between “realtime” and “observation” dates. “Observation” dates describe the month(s), for example, about which you want data (e.g. nonfarm payrolls for March, 2013). The “realtime” dates, meanwhile, function essentially as “as-of” dates (e.g. nonfarm payrolls for March, 2013, as of April 15, 2013), sometimes also described as the “vintage” of the data. The default for “observation” dates is the earliest information available, up to the most recent release. The default for “realtime” dates is today.
For example, let’s say you wanted to check up on the aforementioned nonfarm payrolls report and revision. To check the first report (published April 5, 2013), your query would look something like this:
http://api.stlouisfed.org/fred/series/observations?series_id=PAYEMS&observation_start=2013-03-01&observation_end=2013-03-01&realtime_start=2013-04-05&realtime_end=2013-04-05&units=chg&api_key=xxxxxxxxxxxxxxxxx
Which, in a sentence, says: “Give me the change (units=chg) in nonfarm payrolls (series_id=PAYEMS) for March 2013 (observation_start=2013-03-01, observation_end=2013-03-01) as of April 5, 2013 (realtime_start=2013-04-05, realtime_end=2013-04-05).”
The result of which looks something like this:
<observations realtime_start="2013-04-05" realtime_end="2013-04-05" observation_start="2013-03-30" observation_end="2013-03-30" units="chg" output_type="1" file_type="xml" order_by="observation_date" sort_order="asc" count="1" offset="0" limit="100000"> <observation realtime_start="2013-04-05" realtime_end="2013-04-05" date="2013-03-01" value="88"/> </observations>
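For anyone who would rather assemble the query in code than by hand, here is a minimal sketch in Python. The function name build_observations_url and its arguments are hypothetical, made up for illustration; only the endpoint and the parameter names come from the query above.

```python
from urllib.parse import urlencode

# Endpoint exactly as it appears in the query above.
BASE_URL = "http://api.stlouisfed.org/fred/series/observations"


def build_observations_url(series_id, observation_date, as_of, api_key, units="chg"):
    """Build a query for one observation date as it was known on one 'as-of' date.

    Hypothetical helper: the name and signature are illustrative only.
    """
    params = {
        "series_id": series_id,
        # Same start and end: we want a single observation period.
        "observation_start": observation_date,
        "observation_end": observation_date,
        # Same start and end: we want a single vintage of the data.
        "realtime_start": as_of,
        "realtime_end": as_of,
        "units": units,
        "api_key": api_key,
    }
    return BASE_URL + "?" + urlencode(params)


# March 2013 payrolls as first reported on April 5, 2013.
first_report = build_observations_url(
    "PAYEMS", "2013-03-01", "2013-04-05", "xxxxxxxxxxxxxxxxx"
)
print(first_report)
```

Keeping the “as-of” date as its own argument means a later vintage of the same observation is a one-argument change.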
To see how the figure changes by May, we can simply adjust the realtime_start and realtime_end values:
http://api.stlouisfed.org/fred/series/observations?series_id=PAYEMS&observation_start=2013-03-01&observation_end=2013-03-01&realtime_start=2013-05-03&realtime_end=2013-05-03&units=chg&api_key=xxxxxxxxxxxxxxxxx
And the reading changes accordingly:
<observations realtime_start="2013-05-03" realtime_end="2013-05-03" observation_start="2013-03-01" observation_end="2013-03-01" units="chg" output_type="1" file_type="xml" order_by="observation_date" sort_order="asc" count="1" offset="0" limit="100000"> <observation realtime_start="2013-05-03" realtime_end="2013-05-03" date="2013-03-01" value="138"/> </observations>
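To compare the two vintages in code, a sketch along these lines might work. It uses Python's standard xml.etree.ElementTree module; the two XML strings are trimmed copies of the responses above (most attributes on the root element are dropped for brevity), and the helper name read_observations is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the two responses shown above.
APRIL_RESPONSE = """
<observations realtime_start="2013-04-05" realtime_end="2013-04-05" units="chg" count="1">
  <observation realtime_start="2013-04-05" realtime_end="2013-04-05" date="2013-03-01" value="88"/>
</observations>
"""

MAY_RESPONSE = """
<observations realtime_start="2013-05-03" realtime_end="2013-05-03" units="chg" count="1">
  <observation realtime_start="2013-05-03" realtime_end="2013-05-03" date="2013-03-01" value="138"/>
</observations>
"""


def read_observations(xml_text):
    """Return (observation date, as-of date, raw value) for each <observation>.

    Values are left as strings here because AL/FRED does not always
    return a number.
    """
    root = ET.fromstring(xml_text.strip())
    return [
        (obs.get("date"), obs.get("realtime_start"), obs.get("value"))
        for obs in root.iter("observation")
    ]


april = read_observations(APRIL_RESPONSE)
may = read_observations(MAY_RESPONSE)

# Size of the revision to the March 2013 figure, in thousands of jobs.
revision = int(may[0][2]) - int(april[0][2])
print(revision)  # 50
```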
Note that if you want to see all of the measurements of nonfarm payrolls for March 2013 that existed between, say, April 5 and June 5, you could change the realtime_start and realtime_end to match that range, and the API will return each value, with the “realtime_start” and “realtime_end” attributes describing the dates (within the range provided) for which the value was considered “current.” This is only allowed for raw measurements, however, not for calculated units (which in our first two cases was “chg”). That’s in part because transformations of the data (usually specified via the “units” or “frequency” parameters) are calculated on the fly by AL/FRED, not provided by the original data source (such as the Bureau of Labor Statistics).
For this reason, too, there can be some irregularities in what the API returns for aggregated data, occasionally resulting in an entry being returned by the API whose value is simply “.” As this post describes, the “.” value means that no appropriate reading exists for the particular combination of realtime/observation dates and/or frequency/units entered. Here at Data Docs, we first encountered this last February, as we tried to pull the most recent dozen or so quarters of unemployment data for our first episode. Because by the middle of February some of the data exists for Q1 2013 (e.g. the January and February monthly readings), the result of the API call contains an element for that quarter. However, because FRED cannot calculate an accurate quarterly unemployment reading until March’s data is available, the value returned in that entry will simply be “.” So if you’re planning to make use of AL/FRED’s transformation tools, make sure you build in handling for that flag.
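One way to build in that handling is to treat “.” as a missing reading before converting anything to a number. The sketch below is only an illustration: parse_value is a hypothetical helper, and the list of raw values is made up to show the pattern rather than taken from a real response.

```python
def parse_value(raw):
    """Convert an AL/FRED value attribute to a float.

    Returns None when the value is ".", the flag for "no reading
    available for this combination of dates, frequency and units".
    """
    raw = raw.strip()
    if raw == ".":
        return None
    return float(raw)


# Made-up example: two complete quarters followed by an incomplete one.
raw_values = ["7.8", "7.8", "."]

readings = [parse_value(v) for v in raw_values]

# Keep only the quarters that actually have a reading.
complete = [r for r in readings if r is not None]
print(readings)
print(complete)
```

Returning None instead of 0 keeps an incomplete quarter from being mistaken for a real reading in a chart or an average.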
While ALFRED’s vintages for big indicators may go back as far as the fifties, says Taylor, versioned data for many others may only be available from 2005 or later. “When the series start varies,” warns Essig. To check the earliest vintages available for a particular indicator, use the vintage dates portion of the API. In general, a vintage date is only entered if an actual release or revision of the series occurred on that date.
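As a rough sketch of that check in Python: the code below assumes the vintage dates request lives at fred/series/vintagedates and that the XML response lists each date in a vintage_date element. Both are assumptions worth confirming against the API documentation. The helper names are hypothetical, and the sample document is illustrative only: it reuses the two release dates from the payrolls example rather than reproducing a real response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint path; confirm against the AL/FRED API documentation.
VINTAGE_URL = "http://api.stlouisfed.org/fred/series/vintagedates"


def build_vintage_dates_url(series_id, api_key):
    """Build a request for the vintage dates of one series (hypothetical helper)."""
    return VINTAGE_URL + "?" + urlencode({"series_id": series_id, "api_key": api_key})


def earliest_vintage(xml_text):
    """Return the earliest vintage date in a response, or None if there are none.

    Assumes each date arrives as <vintage_date>YYYY-MM-DD</vintage_date>.
    ISO-formatted dates sort correctly as plain strings.
    """
    root = ET.fromstring(xml_text.strip())
    dates = [el.text for el in root.iter("vintage_date") if el.text]
    return min(dates) if dates else None


# Illustrative sample only, not a real API response.
SAMPLE = """
<vintage_dates count="2">
  <vintage_date>2013-04-05</vintage_date>
  <vintage_date>2013-05-03</vintage_date>
</vintage_dates>
"""

print(build_vintage_dates_url("PAYEMS", "xxxxxxxxxxxxxxxxx"))
print(earliest_vintage(SAMPLE))
```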
Though Taylor and Essig’s teams continue to expand the series available through AL/FRED, the dozen or so people who make up their teams have other responsibilities as well. Still, they’re eager to have AL/FRED be useful to a wider audience, even promising to consider the possibility of supporting JSON output. In the meantime, if you’d like to learn a bit more about the AL/FRED API, try exploring the links below, or reach out to them directly at: firstname.lastname@example.org. They’ll even answer the telephone!