Wednesday, October 31, 2012

More on Likely Liars and the Ignored, Important 2.5%

Many (ok, I assume many, since I'm not sure anyone really was all that troubled by it) might have been troubled by the little bit of sleight of hand in the previous post (Likely Liars) where I magically assumed that people who ex post lie about voting have similar profiles and incentives to those who ex ante lie about expecting to vote.  Luckily, via Sasha Issenberg, we have evidence that this isn't purely speculation. A paper by Todd Rogers and Masa Aida (the wonderfully named "Why Bother Asking?") cuts directly to the point, using a similar (but temporally reversed) version of the methodology in Ansolabehere and Hersh. That is, they check the answers that pollsters received about intention to vote prior to the election against public voting databases, tabulate the discrepancies, and analyze the determinants of the differences. The key results:

First, consistent with research on social desirability, in each election a meaningful fraction of those who say they will vote, do not. Second, in each election a surprising proportion of respondents who say they will not vote, in fact, do vote -- a proportion that rivals the proportion who erroneously predict that they will vote but, in fact do not vote. Third, past vote history is a much better predictor of turnout than self-reported intention to vote.
Unfortunately, we still know precious little about the demographics that drive this decision, except that the decision to lie tends to be correlated with the benefits of maintaining a self-image consistent with that of an informed "voter". This is exactly the sort of mechanism driving the results in Ansolabehere and Hersh, so I'm now less troubled about the temporal inversion. I AM incredibly interested in what drives people to say they don't intend to vote (or are not likely to) but then actually vote. This isn't a huge percentage of the population (about 5%ish), but over half of them vote anyway, and we don't even measure their intentions.

Thursday, October 25, 2012

Likely Liars, or Why You Should Ignore Anything You Hear After "Poll of Likely Voters" If You've Ever Been To a Bar

Confused yet about what a "likely voter" is? Yeah, me too. I mean, I know what a registered voter is, and I know what an actual voter is, but "likely"? That sounds like statistical mumbo-jumbo to me. Luckily, that's exactly what we're here to clear up: what might be a relatively significant bias in recent (i.e. this year and 2008) polling models.  But bear with me; the statistics get a bit trickier before they get any clearer, although the payoff is fairly substantial, in my humble opinion.

Wanna skip the stats? "Likely" voter polling results are likely to be wrong, because the same people who are most likely to say they'll vote are also the same people who are most likely to lie about it. At best, this makes the forecast a lot more error-prone; at worst, it adds serious bias to the results.

First, let's just be clear about what (most) polling firms do. They select one or more modes of contacting eligible voters (think calling random landline or cellphone numbers, sending mail or email, etc.), and from the respondents they form a (noisy) sample meant to represent the voter population. Often this involves overweighting or underweighting survey responses to properly match the known demographics of registered voters (or more simply, just the adult population) in a particular polling region. From this, they extrapolate support within a given confidence interval for a particular candidate if (assuming that their weighting is right) all registered voters voted.
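That reweighting step is a form of post-stratification: scale each response so that its demographic group's weighted share matches the group's known share of registered voters. Here's a minimal sketch; the group shares and respondents are made up, and real firms weight on many more dimensions than age:

```python
# Post-stratification weighting: a minimal sketch with made-up numbers.
# Each respondent gets weight = (known population share of their group) /
# (observed sample share of their group), so the weighted sample matches
# the target demographics.

respondents = [
    {"age": "18-29", "candidate": "A"},
    {"age": "18-29", "candidate": "B"},
    {"age": "65+",   "candidate": "A"},
    {"age": "65+",   "candidate": "A"},
    {"age": "65+",   "candidate": "A"},
    {"age": "65+",   "candidate": "B"},
]
population_share = {"18-29": 0.5, "65+": 0.5}  # hypothetical registered-voter shares

n = len(respondents)
sample_share = {g: sum(r["age"] == g for r in respondents) / n
                for g in population_share}
for r in respondents:
    r["weight"] = population_share[r["age"]] / sample_share[r["age"]]

total = sum(r["weight"] for r in respondents)
support_a = sum(r["weight"] for r in respondents if r["candidate"] == "A")
print(round(support_a / total, 3))  # 0.625: the underrepresented young got upweighted
```

The raw sample has candidate A at 4 of 6 responses (67%); upweighting the undersampled young group pulls the weighted estimate down to 62.5%.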

But only 90% of registered voters ACTUALLY vote (yeah, I know, commentary on the state of participation etc. for another time, although this number is much higher than I thought it was, given the wide discrepancy between registered and "likely" results), so polling firms try to adjust their survey respondents (again through weighting, or in some cases trimming) to more accurately represent those who are actually going to put the time in on election day. Their methods for doing this are mostly proprietary, but Gallup has some notes on how they adjust here. To quickly summarize, you only count as a likely voter if you said all of the following:
  • you know where your polling place is
  • you have voted there in a past election
  • you voted in the last presidential election
  • you "always" vote
Gallup has some adjustments for very young voters (i.e. < 21) and other minor tweaks. Gallup simply drops responses that don't meet these constraints and generates a new "likely" voter result. Other polls may employ a more sophisticated weighting scheme based on the core demographics of actual turnout in the last few elections. Of course, it is well known that this biases the survey toward an even older, more stable population than the registered-voter result, and so it is often assumed that the likely voter model tends to magnify Republican support. Which is exactly what you want, since traditionally these demographics are associated with actual voter turnout.
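The screen described above amounts to a simple AND over the survey answers, with failing responses dropped rather than reweighted. A sketch, with hypothetical field names (not Gallup's actual questionnaire):

```python
# A Gallup-style likely-voter screen as a simple AND filter.
# Field names are hypothetical, not Gallup's actual questionnaire.

def is_likely_voter(r):
    return (r["knows_polling_place"]
            and r["voted_there_before"]
            and r["voted_last_presidential"]
            and r["says_always_votes"])

respondents = [
    {"knows_polling_place": True, "voted_there_before": True,
     "voted_last_presidential": True, "says_always_votes": True},
    {"knows_polling_place": True, "voted_there_before": False,
     "voted_last_presidential": True, "says_always_votes": True},
]

# Responses failing any criterion are simply dropped, not reweighted.
likely = [r for r in respondents if is_likely_voter(r)]
print(len(likely))  # 1
```

Note that everything the filter sees is self-reported, which is exactly where the trouble starts below.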

But what if people are lying?

More accurately, what if lying is correlated systematically with various partisan demographics?

What you have is a situation in which the likely voter model not only adds noise, but actually biases the results, such that you would be better served not doing the adjustment at all. Some analysis here and here suggests that in 2008 there was a breakdown in the accuracy of likely voter models versus registered voter results.

If people who vote look like people who don't vote (that is, if the 1/3rd that doesn't vote looks exactly like the 2/3rds that does), then you might as well choose "likely" voters at random. The main effect here is that the result will be the same as the registered-voter survey, but with a higher variance.
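That claim is easy to check by simulation: if the screen keeps a random 2/3rds of respondents, the likely-voter estimate stays centered on the registered-voter number but gets noisier. A quick sketch, with every number made up for illustration:

```python
# If non-voters look exactly like voters, a likely-voter screen is just
# random subsampling: same expected result, bigger sampling error.
# All numbers are made up for illustration.
import random
import statistics

random.seed(0)
TRUE_SUPPORT = 0.52   # hypothetical support for candidate A
N_REGISTERED = 900    # registered-voter sample size
N_LIKELY = 600        # the 2/3rds a screen would keep

def poll(n):
    """Share supporting A in one simulated survey of n respondents."""
    return sum(random.random() < TRUE_SUPPORT for _ in range(n)) / n

reg = [poll(N_REGISTERED) for _ in range(2000)]
lik = [poll(N_LIKELY) for _ in range(2000)]

# Both estimates hover around 0.52, but the smaller sample is noisier.
print(round(statistics.mean(reg), 2), round(statistics.mean(lik), 2))
print(statistics.stdev(reg) < statistics.stdev(lik))  # True
```

So a screen that merely discards respondents at random costs you precision and buys you nothing. The interesting question is what happens when the discarding isn't random.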

What do we need to have that happen? People who report as likely voters need to be lying about voting in greater proportion than those who don't. Huh? How can we ever know that?

Enter the very smart Stephen Ansolabehere and Eitan Hersh with their recent and hopefully no longer overlooked paper: "Validation: What Big Data Reveal About Survey Misreporting and the Real Electorate." Main point:
We show that studies of representation and participation based on survey reports dramatically mis-estimate the differences between voters and non-voters.
What do they do to get there? They carefully compare survey reports of election behavior with actual voting outcomes, and then investigate the demographic determinants of misreporting, which I less charitably call lying. They find that people lie when it's important to them to lie; the most likely to lie about voting are politically engaged, well-educated, church-going, partisan (for both parties) males.

Now to be fair to Ansolabehere and Hersh, I'm going to play fast and loose with their study from here out, but that's why it's my blog and not theirs. I'll at least tell you what I'm doing, in as straightforward a manner as I can.

First, their study asks people after they voted (or didn't) whether they voted, not their ex ante intentions measured against their actual outcomes. To link this to current surveys, I need to assume that these are temporally stable; that is, the same people who lie about having voted are also the same people who lie about expecting to vote, or at least there isn't a massive degree of difference between them.

Formally, what Ansolabehere and Hersh report is the probability that you report you voted conditional on the fact that you didn't. What we need is the probability that you don't vote conditional on the fact that you reported you would (i.e. the inverse). But no worries; that's what Bayes' rule, plus our temporal assumption above, is for. Given values for the probability of not voting and the probability of reporting that you will vote, we can generate what we need easily. The bottom line: For the values reported in their paper, there's about a 20% chance that you report you'll vote when in fact you won't. If this were random, it would just add significantly to the reported errors in the polling results. But it's not.
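Here's that inversion spelled out, with illustrative round numbers (not the paper's exact values) chosen to land near the 20% figure:

```python
# Bayes inversion: from P(claim you'll vote | you won't vote) back to
# P(you won't vote | you claimed you'd vote). All inputs below are
# illustrative round numbers, NOT the paper's exact values.

def p_no_vote_given_claim(p_claim_given_no_vote, p_no_vote, p_claim):
    """P(no vote | claim) = P(claim | no vote) * P(no vote) / P(claim)."""
    return p_claim_given_no_vote * p_no_vote / p_claim

p = p_no_vote_given_claim(
    p_claim_given_no_vote=0.5,  # chance a non-voter claims they'll vote anyway
    p_no_vote=0.33,             # overall chance of not voting
    p_claim=0.8,                # overall chance of saying "I'll vote"
)
print(round(p, 2))  # 0.21, i.e. roughly the 20% in the text
```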

Since political science believes in data replication, I can very easily regenerate these values for various dimensions directly from their data. For simplicity (it's a blog dammit), we'll look at men vs. women. Unsurprisingly, men lie about voting 25% of the time, while women only lie about 14%; in other words, men lie about voting nearly 80% more often than women do. Yeah, I know, I've been to a bar or three in my time, so this isn't the interesting thing.
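The relative rate is just arithmetic on the two figures quoted above:

```python
# Relative misreporting rates, using the two figures quoted above.
men, women = 0.25, 0.14
excess = (men - women) / women
print(f"men lie about voting {excess:.0%} more often than women")
# prints "men lie about voting 79% more often than women"
```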

The bottom line? If "likely voter" numbers are based on self-reporting and men favor Republicans more than women do, then the likely voter results are biased versus what will actually occur. Why? Because men are more likely liars than women. And that's a different type of gender gap, one that potentially biases the "likely voter" polls that virtually all polling organizations will be reporting between now and November 6th.

My advice? Just look at the registered voter polling numbers; a lot of the people who report as "likely" are unlikely to actually show up.

And we're not even talking about the people who say they won't show up, but actually do.

Wednesday, October 10, 2012

Doubling Down on Stupid, or Ignorant or Stupid, Take 2

Ok. Now it's gotten just batshit crazy here. Jack Welch, famous for "earnings management" and accounting fraud, is questioning the employment report by the BLS.  He apparently thinks that the BLS operates like GE or something. Now I know we all get a bit emotional. It's natural to get excited, and sometimes in the heat of the moment we all make mistakes. Thanks to Twitter, these mistakes are memorialized forever, and they can reach over a million people (# of Jack Welch followers...who ARE you people?) instantly. It used to be that when you said something stupid at a party, or to your friends, you'd just apologize and life would roll on. That's life. What's really, well, dumb, is to double down on stupid by writing an opinion piece in the WSJ. But sure enough, that's what Jack Welch did. And it's shameful for the reasons so eloquently written here. Go read that. I'll wait.

Ok, now onto the actual editorial. Leaving aside all of the crazy conspiracy that would be needed to fake the jobs report (virtually no one outside the land of conferences on obscure time-series-filtering econometrics, myself included, has any idea how the seasonal adjustments are made, for instance), let's just look at the big question.

Jack Welch, where were you in December 2010? That's the last time the unemployment rate fell by more than 3/10ths of a percent. Did you claim conspiracy then? Or is that when it started... Where were you the 63 other months when the rate fell by at least that much?  Here's the jobs report from December 2010. It shows a drop from 9.8% to 9.4% with an establishment jobs number that increased by 104,000 jobs. And no one even noticed.

Hey Jack. About that massive hiring spree the government went on. From February to April of this year, the government shed 721,000 jobs, and, Jack is right, in the last two months the government added 602,000 jobs. So the Obama administration fired all those people and then hired them back? I want some of those good drugs, Jack. Oh yeah, and all of those Republican governors and mayors and county officials were in on it too, since most of those employees are state and local.

What about labor-force participation? Yeah, it's trending down, and been trending down since the giant recession we just had. It's at a low level now, but it was at a low level last month and the month before and the month before, etc. Welcome to the land of labor statistics, Jack, where the trend is your friend.

Now I think that Jack Welch knows a little bit about labor statistics, but not all that much. I know he read the page on the website about data collection and he apparently thinks that makes him an econometrician. That makes him ignorant. But doubling down and equating the USA with Soviet Russia? That's stupid.

Tuesday, October 9, 2012

Our Bad. The IMF Dumps The Chicago Story and You Can Too!

Via Krugman and (much more in depth) Antonio Fatas, we get the biggest economic news of the week: it turns out that when governments spend money, it's not immediately offset by something else, and economies tend to grow. Those of you who may have missed the last 10 or 20 years might have trouble understanding what all the commotion is about here, since this used to be the consensus on this topic, and quite frankly it seems logical to anyone who hasn't been forced to sit through advanced macroeconomic seminars or a tea party rally. Which of these two stories seems to make more sense?
  1. The government contracts to build a road, and spends money to do so. It pays workers, pays the aggregate companies for the asphalt, etc. So wages to construction workers and profits to companies that make road materials go up, and thereby total income in the US goes up by at least as much as the wages and profits in total. Hey, if the workers ever go out for a drink, or some dinner, or use the money to buy a car, it might even go up a bit more, as other folks' incomes rise too (the waitress, the auto salesman, etc.).
  2. The government contracts to build a road, and spends money to do so. It pays everyone like before, and the road gets built. But somewhere, someplace else in the US, another company (or person) thinks, "Wait! The government borrowed that money, and sooner or later it's gonna ask me for more money. Therefore, instead of building that factory (or buying a new washing machine, or going out to dinner) I'm going to be super smart and postpone all of that purchasing, save the money in a rainy day escrow fund to protect against that inevitable government tax increase."  All that postponing offsets the initial road building income, and total production/income in the US stays the same.
I know, #2 sounds a bit far-fetched, but maybe you ARE already an economist and you still buy into these "magically offsetting" hyperrational permanent-income ideas. Now, let's assume that interest rates are roughly zero, so that by saving I basically mean stuffing all of your money into a mattress. Still believe that folks don't want to invest or buy washing machines because they'd rather save in their mattress for the inevitable doomsday tax increase someday? Keep in mind that the government borrowed at 0%, so to "save" for a rainy day and offset the initial purchase you'd have to have NO investments that yield any positive return (and zero convenience yield on any purchases), because otherwise you'd still build the factory and use its profits to pay off the future tax increase. Still buy story #2? Congratulations, welcome to your career in macroeconomics at the University of Chicago. Out in the real world where actual people live, we're going with story #1.

Oh yeah. You know which other crazy radicals believe story #1? The IMF. Here's the statistical box...see all of those -1's in the left column? 

Let me interpret. Those are the errors in the multipliers used to estimate the impact of government spending on growth. Multiplier? Huh? Remember story #1? If government spending leads to an increase in income of exactly the same amount, we say the spending had a multiplier of 1 (or that no one "offset" any of it). If story #2 were correct, we'd see a multiplier much less than 1 (say 0, if you want to make Eugene Fama happy!). But what the IMF is saying here is that they messed up. They used 0.5 as the multiplier and it was really about 1.5. Oh, and sorry about the austerity and all that suffering and unemployment the past two years. Our bad. We'll buy you a cognac and some foie gras to apologize. Oh, but you don't have anything to wear to our club? And you want a job? Tsk tsk. Why don't you just be glad we admitted we were wrong.
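To see what that error means in practice, run a hypothetical austerity package through both multipliers (the 0.5 and 1.5 come from the IMF discussion above; the size of the cut is a made-up illustration):

```python
# The cost of a mis-specified fiscal multiplier. The 0.5 and 1.5 come from
# the IMF discussion above; the size of the cut is a made-up illustration.
fiscal_cut = 100          # hypothetical austerity package, in billions
assumed_multiplier = 0.5  # what the forecasts used
actual_multiplier = 1.5   # what the IMF now estimates

predicted_hit = fiscal_cut * assumed_multiplier  # GDP loss the forecast expected
actual_hit = fiscal_cut * actual_multiplier      # GDP loss that actually showed up
print(predicted_hit, actual_hit)  # 50.0 150.0: three times the predicted pain
```

That factor of three is why the growth forecasts for austerity countries kept coming in wrong in the same direction.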

The punchline? Multipliers are bigger than 1, unless interest rates are already high or we're at full employment. And if you tell me that .25% is high or that we're always at full employment (or there now) let me wait while I go find a wall to punch and a bottle of whiskey to drink.

Friday, October 5, 2012

Meat Loaf, the Rear View, and Payroll Employment

You know that warning about "Objects in the Rear View Mirror Are Closer Than They Appear"? I love that. It's also a song by Meat Loaf, which is a subject for a whole other day, or maybe lifetime.  What might not be apparent is that safety, Meat Loaf, and the Payroll Employment Report have something in common. What? They aren't the same looking backwards. There are a lot of good reasons for this, and it's something that the BLS is very up front about, since the initial estimate of jobs comes from relatively poor data, whereas the later revised data is much more accurate. Let's go to the BLS:
Given this short collection cycle for the first preliminary estimates, many establishments are not able to provide their payroll information in time to be included in these estimates. Therefore, CES sample responses for the reference month continue to be collected for two more months and are incorporated into the second preliminary and final sample-based estimates published in subsequent months. Additional sample receipts are the primary source of the monthly CES employment revisions.
As we replace statistically projected data with real data, things get better. What is not as well known is that these revisions are not necessarily uniform through the business cycle. Let's add up the total job revisions in each year and graph them:
Big negative revisions in recession years (really, lagging the onset of the recession), positive revisions elsewhere. All this means is that the statistical projection used by the BLS is conservative; it doesn't want to go nuts every month and scream about jobs and then have to explain this whole "statistics" thing over and over again. In normal times, this works great. But in recessions and recoveries, this conservatism means that downturns are initially understated, as are the subsequent recoveries. While you weren't watching, 877,000 jobs disappeared in 2008, and a million more appeared from 2009 to the present.
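The bookkeeping behind that graph is just a per-year sum of final-minus-first estimates. A sketch with made-up monthly numbers, not actual BLS data:

```python
# Per-year totals of payroll revisions (final estimate minus first estimate).
# The monthly numbers below are made up, not actual BLS data.
from collections import defaultdict

revisions = [  # (year, month, revision in thousands of jobs)
    (2008, 1, -40), (2008, 2, -65), (2008, 3, -30),
    (2009, 1, 20), (2009, 2, 55), (2009, 3, 35),
]

by_year = defaultdict(int)
for year, _month, rev in revisions:
    by_year[year] += rev

print(dict(by_year))  # {2008: -135, 2009: 110}
```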

News media? We're waiting...

Thursday, October 4, 2012

Ignorant or Stupid, Part 1

William King has an article in the Houston Chronicle titled "The Real Story Behind Rising Unemployment," which he pitches as a post-partisan, reasoned discussion of the impact of government regulation on employment in America. It is wrong in its conclusions and misleading at best in its use of "facts". His primary argument is that the unemployment rate has trended upward since 1950, that (he asserts) workplace regulation has grown steadily over the same period, and that therefore one trend must cause the other. I'll quote:

So if unemployment has been getting worse for the last 60 years, including periods when both parties were in power, it would seem to raise the question whether there is a cause for it other than the ideological hogwash to which we are constantly subjected by both parties.
The answer is no mystery. Anyone who has been an employer for the last several decades, as I have been, can easily provide the answer. Our tax and regulatory policies have increasingly discouraged hiring more employees.
 I'm going to leave all of the easy criticisms to the side...well, I'll just list them at the bottom so we can get on with the really wrong stuff. 

Problem #1: The unemployment rate is a rate. If you're going to write about something, you should know what it is...if you don't know, then you're merely ignorant. If you do know what it is, but still get things wrong about it, well then, you're stupid. The unemployment rate is the number of unemployed people divided by the labor force. What's the labor force? Well, I'm glad you asked. It's a construct of the Bureau of Labor Statistics that measures the number of persons available for work. This changes every month along with the number of unemployed persons, and for that reason, virtually all serious economists, and most assuredly the Federal Reserve, ignore the unemployment rate and look at the employment-to-population ratio. This directly measures the number of jobs as a percentage of the total population, which is exactly the measure you'd like for a long-run analysis of trend employment. (Actually, for this ratio you'd like the payroll employment number, to avoid problems with self-employment and agriculture effects, but both measures track each other very well.)
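To see why the distinction matters: the unemployment rate can fall simply because discouraged workers stop looking, while the employment-to-population ratio correctly reports that no one found a job. A toy example with made-up numbers:

```python
# The unemployment rate can "improve" with zero new jobs; illustrative numbers.
def unemployment_rate(employed, unemployed):
    labor_force = employed + unemployed  # only people working or actively looking
    return unemployed / labor_force

def epop(employed, population):
    return employed / population

pop = 1000
# Month 1: 600 employed, 60 unemployed, 340 not in the labor force.
# Month 2: same 600 employed, but 10 discouraged workers stop looking.
u1 = unemployment_rate(600, 60)
u2 = unemployment_rate(600, 50)
print(round(u1, 3), round(u2, 3))      # 0.091 0.077: the rate falls anyway
print(epop(600, pop), epop(600, pop))  # 0.6 0.6: no one actually found a job
```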

Problem #2: The employment-to-population ratio soared over the very period Mr. King claims was the height of regulatory burden for US employers. Let's take a look:

From 1960 to 2000, the employed share of the population went from 55% to 65%, with most drops associated with...recessions (unemployment does rise in a recession when demand drops; that assertion, at least, is true). This is the giant period of regulation that Mr. King will be telling us about in subsequent weeks; it's associated with massive job creation, even above and beyond the absorption of all those new workers entering the labor force. In fact, the most anemic rebound from a recessionary drop is associated with the (financially, at least) relatively deregulated 2000s.

This is what happens when you start with conclusions and try to find facts. Play me out, Talking Heads:
Facts are simple and facts are straight 
Facts are lazy and facts are late 
Facts all come with points of view 
Facts don't do what I want them to 

P.S. I didn't forget about all the other problems.
  • Correlation/causation, blahblahblahblah
  • No evidence provided about the trend of business regulation and its covariation with the unemployment rate...
  • International comparisons to prove the point...wait, they mostly go the opposite way. Why is unemployment so low in China, for instance, but the US ranks 4th in the World Bank Doing Business Index and China ranks 91st?
  • And finally, for all you "mathematicians" out there, I'll note that regressing the unemployment rate on ANY measure of business regulatory burden is a horribly wrong thing to do, since the labor force is endogenously determined by the regulatory climate.