Online Labor: October 2011

Saturday, October 29, 2011

Should Online Labor Markets Set a Minimum Wage?

Some critics of online labor markets mistakenly believe that the platform creators have an incentive to keep wages low. Employers have that incentive, but all else equal, the creators of online labor markets want to see wages rise, since almost all of them take a percentage of earnings. At least from a revenue standpoint, they should be indifferent between more work at a lower price and less work at a higher price, so long as the wage bill stays the same. It would be a different story if the platforms could somehow tax employers on the value added by the platform, but so far, that's not possible.

One tool for raising wages might be for the platform to impose a minimum wage. It's certainly possible that imposing a binding minimum wage would increase platform revenue---it depends on the relative labor supply and demand elasticities, as well as the marginal cost of intermediating work. A platform's cost of goods sold (i.e., the intermediation service) is not precisely zero---there are server costs, customer service costs, fraud risks, etc.---so there is some wage below which the platform would be better off not allowing parties to contract.
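To make the elasticity point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: a constant-elasticity labor demand curve, a 10% fee, a fixed intermediation cost per hour, and a minimum wage that binds so that hours are determined by demand. None of these numbers are estimates from any real platform.

```python
def platform_profit(wage, elasticity, fee_rate=0.10, cost_per_hour=0.05, scale=1000.0):
    """Platform profit at a given hourly wage (toy model, all parameters hypothetical).

    Hours demanded follow a constant-elasticity curve: hours = scale * wage**(-elasticity).
    The platform keeps fee_rate of the wage bill and pays cost_per_hour to intermediate.
    """
    hours = scale * wage ** (-elasticity)
    revenue = fee_rate * wage * hours
    cost = cost_per_hour * hours
    return revenue - cost


# Compare a $3 market wage with a $4 minimum wage under two demand elasticities.
for elasticity in (0.5, 1.5):
    before = platform_profit(3.0, elasticity)
    after = platform_profit(4.0, elasticity)
    direction = "rises" if after > before else "falls"
    print(f"elasticity {elasticity}: profit {direction} ({before:.2f} -> {after:.2f})")

# Below cost_per_hour / fee_rate (here $0.50/hour) every hour loses money.
print(platform_profit(0.40, 1.0) < 0)
```

In this toy setup the higher wage helps the platform when demand is inelastic and hurts when it is elastic, and any wage below the break-even level loses money on every hour intermediated.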

Moving from generalities, let's look at workers from the Philippines, who (a) make up a big chunk of the workforce on oDesk and (b) generally do relatively low-paid work (e.g., data entry, customer service, writing, etc.) and thus would be most affected by a minimum wage imposition. If we look from about 2009 on (when the Philippines first started to become important), we can see that wages are basically flat, perhaps with a slight rise in some categories.

We can see that mean hourly wages range from $3 (for low-skilled data entry work) to about $8/hour for software development. By US standards, $3/hour is quite low---it's less than half the US federal minimum wage. However, let's look at where a $3/hour wage puts someone in the Philippine household income distribution, assuming they work 40 hours a week, 50 weeks a year:

Unfortunately, I couldn't get a more refined measure of income, but my eyeball estimate is that $3.00/hour is at about the 50th percentile of the distribution. The equivalent hourly wage for median household income in the US is about $31/hour (using a 2006 measure from Wikipedia), using the same 50 weeks a year, 40 hours a week formulation. It's important to note that this is household income, meaning that in many cases it is the combined income of a husband, wife and working-age children. And although online work does require a computer and a good internet connection, it does not require spending money on transportation, work clothes, food prepared outside the home, etc. It also probably lets workers economize on child care, e.g., I might be willing to let a 10 year old watch a 3 year old if I'm in the next room, but not if I'm across town.
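For reference, the conversion between hourly wages and annual income used above is just the following arithmetic. This is a small Python sketch; the function names are mine, and the only inputs are the 40 hours a week, 50 weeks a year assumption and the hourly figures quoted in the text.

```python
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 50


def annual_income(hourly_wage):
    """Annual income implied by an hourly wage at 40 hours/week, 50 weeks/year."""
    return hourly_wage * HOURS_PER_WEEK * WEEKS_PER_YEAR


def equivalent_hourly(annual):
    """Hourly wage implied by an annual income under the same assumption."""
    return annual / (HOURS_PER_WEEK * WEEKS_PER_YEAR)


print(annual_income(3.0))   # annual income implied by $3/hour
print(annual_income(31.0))  # annual income implied by $31/hour
```

At 2,000 hours a year, $3/hour works out to $6,000 a year, and the $31/hour figure corresponds to $62,000 a year.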

So, what's the conclusion?

From a platform perspective, I can concede that imposing a minimum wage could be revenue-increasing, but it depends on some pretty hard-to-estimate factors: how well do we know the elasticities? Are the long- and short-term elasticities the same? What happens if we can get our intermediation costs down? Implementation-wise, enforcement might be very hard---I could easily imagine workers giving under-the-table rebates.

From a worker/welfare perspective, a minimum wage would clearly help some but hurt others. Any binding minimum wage is going to price some workers out of the market. How do we weigh their lost opportunities against the increased wages paid to those that see a bump? This starkly highlights one of the real drawbacks of a minimum wage as social policy, which is that it might be globally progressive and yet highly locally regressive for workers on the bad side of the cut-off.

I'd love to hear both employer and worker perspectives on this---feel free to comment here & I'll respond.

Like this? Follow me on twitter.

Thursday, October 27, 2011

Workers-as-Bundled-Goods

Image: "Bigger Big Mac" by Simon Miller, on Flickr

A standard pricing strategy in many industries is bundling goods, e.g., productivity "suites" like Microsoft Office, value meals at fast food restaurants, hotel and flight combos, etc. In the labor market, we also see a kind of bundling, though not by design: each worker is a collection of skills and attributes that can't be broken apart and purchased separately by the firm. For example, by hiring me, my company gets my writing, meeting attendance, programming, etc.; they can't choose to not buy my low-quality expense-report-filing service.

Good managers deal with this bundling by keeping workers engaged at their highest-value activity. However, every activity has decreasing marginal returns, so even activities that start out as high-value eventually reach the "flat of the curve" where the marginal benefit of more of X gets pretty small. This phenomenon gives large firms an advantage, in that their (generally) larger problems give workers more runway to ply their best skills (by the same token, small firms have to worry much more about "fit" within their existing team).

While pervasive, this flat-of-the-curve dynamic and the resulting small-firm handicap is not a fundamental feature of organizations or labor markets---it springs from the binary nature of employment. It goes away, or at least is diminished, if a worker can instead be partly employed (i.e., freelance) at a number of firms, each paying the worker to do what they do best. To date, the stated value proposition of most freelancing sites has been that they allow for global wage arbitrage. Obviously that's important, but I suspect this "unbundling" efficiency gain will, in the long term, have a more profound effect on how firms organize and how labor markets function.

Saturday, October 8, 2011

Some light data munging with R, with an application to ranking NFL Teams

I recently submitted this blog to R-bloggers, which aggregates R-related blog posts. It's a fantastic site and has been invaluable to me as I've learned R. One of my favorite kinds of articles is the hands-on, "hello world"-style weekend project that dips into a topic/technology, so here's my first attempt at one in this style.

First, some background: I've been working with Greg on a project that analyzes the results of two-person contests. An important part of the problem is comparing different ranking systems that can adjust for the strength of the opponent (e.g., Elo rating system, TrueSkill, Glicko, etc.). As I understand it, all of these systems are working around the intractability of a purely Bayesian solution and try to deal with things like trends in ability, the distribution of the unobserved component, etc.

We're still collecting data from a pilot, but in the interim, I wanted to start getting my feet wet with some real competition data. Sports statistics provide a readily available source of competition data, so my plan was:

1) Pull some data on NFL games from the 2011 season to date.
2) Fit a simple model that produces a rank ordering of teams.
3) Pull data on ESPN's PowerRanking of NFL teams (based on votes by their columnists), using the XML package.
4) Make a comparison plot, showing how the two ranks compare, using ggplot2.

For the model, I wanted something really simple (hoping no one from FootballOutsiders is reading this). In my model, the difference in scores between the two teams is simply the difference in their "abilities," plus an error term:

$\Delta S_{ij} = \alpha^H_i - \alpha^A_j + \epsilon$

where the alphas are team-and-venue (i.e., home or away) specific random effects. For our actual rating, we can order teams based on the sum of their estimated home and away effects, i.e.:

$\hat{\alpha}_i^H + \hat{\alpha}_i^A$

Estimating the 32 x 2 parameters separately---given how little data we actually have---would probably lead to poor results. Instead, I used the excellent lme4 package, which approximates a Bayesian estimation where we start with a prior that the alpha parameters are normally distributed.
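To give a feel for what the normal prior does, here is a rough Python sketch. It is not the lme4 fit described above. It swaps in a plain ridge (penalized least squares) estimate, which corresponds to the posterior mode under a normal prior only if the ratio of error variance to prior variance is fixed in advance; lme4 estimates the variances from the data. The `penalty` value, the function name `fit_ratings`, and the three-team schedule are all made up for illustration.

```python
from collections import defaultdict


def fit_ratings(games, penalty=1.0, sweeps=200):
    """Ridge estimate of home/away effects by coordinate descent.

    games: iterable of (home_team, away_team, home_score, away_score).
    Model: margin = home_effect[home] - away_effect[away] + error.
    Returns {team: home_effect + away_effect}.
    """
    home_games = defaultdict(list)  # team -> [(opponent, home margin)]
    away_games = defaultdict(list)
    for home, away, home_score, away_score in games:
        margin = home_score - away_score
        home_games[home].append((away, margin))
        away_games[away].append((home, margin))

    home_eff = defaultdict(float)
    away_eff = defaultdict(float)
    for _ in range(sweeps):
        # Each update solves for one effect, holding the others fixed.
        for team, played in home_games.items():
            total = sum(margin + away_eff[opp] for opp, margin in played)
            home_eff[team] = total / (len(played) + penalty)
        for team, played in away_games.items():
            total = sum(home_eff[opp] - margin for opp, margin in played)
            away_eff[team] = total / (len(played) + penalty)

    teams = set(home_games) | set(away_games)
    return {team: home_eff[team] + away_eff[team] for team in teams}


# Hypothetical round robin: A beats everyone, B beats C.
toy_games = [
    ("A", "B", 30, 10), ("B", "C", 24, 14), ("C", "A", 7, 28),
    ("A", "C", 35, 3), ("B", "A", 17, 20), ("C", "B", 10, 21),
]
ratings = fit_ratings(toy_games)
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

The penalty pulls every effect toward zero, and more strongly for teams with few games, which is the practical benefit of the prior when each team has played only a handful of times.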

Putting the last thing first, here's the result of 4), comparing my "homebrew" ranking to the ESPN ranking, as of Week 5 (before the October 9th games):

No real comment on my model other than that it (a) thinks ESPN vastly overrates the Chargers and (b) thinks more highly of the Ravens than ESPN does.

The code for all the steps is posted below, with explanatory comments:

Sunday, October 2, 2011

All public government data should be easily machine readable

The Bureau of Labor Statistics (BLS) has an annual budget of over $640 million (FY 2011), a budget they use to create and then distribute detailed labor market data and analysis to policy makers, researchers, journalists and the general public. I can't speak to the "creation" part of their mission, but on the "distribution" part, they are failing---organizations with tiny fractions of their resources do a far better job.

It's not the case that government IT is invariably bad---the Federal Reserve Bank of St. Louis has an amazing interface (FRED) and API for working with their data. Unfortunately, not all government statistics are available here, especially some of the more interesting BLS series.

The essential problem with BLS is that all of their work products---reports, tables etc.---are designed to be printed out, not accessed electronically. Many BLS tables are embedded in PDFs, which makes the data they contain essentially impossible to extract; non-PDF, text-based tables, which are better, are difficult to parse electronically: structure is conveyed by tabs and white space, column headings are split over multiple lines with no separators; heading lengths vary etc.
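As a small illustration of what "structure conveyed by white space" means for a parser, here is a hedged Python sketch that recovers a nested hierarchy from indentation. It assumes a simplified layout (two spaces per nesting level, one numeric value at the end of each row); real BLS tables are messier than this. The function name, the sub-industry labels, and the values are placeholders, not BLS data.

```python
def breadcrumbs(lines, indent_width=2):
    """Yield (path, value) pairs from an indentation-nested text table.

    Assumes each non-blank row is '<indent><label><spaces><number>' and that
    nesting depth is the number of leading spaces divided by indent_width.
    """
    stack = []  # labels of the current row's ancestors, outermost first
    for line in lines:
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent_width
        label, _, value = line.strip().rpartition(" ")
        del stack[depth:]          # drop labels at this depth or deeper
        stack.append(label.strip())
        yield " > ".join(stack), float(value)


# Placeholder table: labels below "Durable Goods" and all values are made up.
toy_table = [
    "Manufacturing        1",
    "  Durable Goods      2",
    "    Sub-industry A   3",
    "    Sub-industry B   4",
    "Other industry       5",
]
rows = list(breadcrumbs(toy_table))
for path, value in rows:
    print(path, value)
```

Each row comes out with its full path (for example, "Manufacturing > Durable Goods > Sub-industry A"), which is the piece of information the printed layout only implies.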

Why does it matter? For one, when users can access data electronically, via an API, they can combine it with other sources, look for patterns, test hypotheses, find bugs / measurement errors, create visualizations and do all sorts of other things that make the data more useful.

BLS does offer a GUI tool for downloading data, but it's kludgy, requires a Java Applet, requires series to be hand-selected and then returns an Excel(!) spreadsheet w/ extraneous headers and formatting. Furthermore, it's not clear what series and what transformations are needed from GUI-data to make the more refined, aggregated tables.

To illustrate how hard it is to get the data out, I wrote a Python script to extract the results from this table (which shows the expected and estimated changes in employment for a number of industries). What I wanted to do was make this, which I think is far easier to understand than the table alone:

To actually create this figure, I needed to get data into R by way of a CSV file. The code required to get table data into a useful CSV file, while not rocket science, isn't trivial---there are lots of one-off/hacky things to work around the limitations of the table. Getting the nested structure of the industries (e.g., "Durable Goods" is a subset of "Manufacturing" and "Durable Goods" has 4 sub-classifications) required recursion (see the "bread_crumb" function). FWIW, here's the code:

Most of the code is dealing with the problems shown in this sketch:

My suggestion: BLS should borrow someone from FRED to help them create a proper API.

Saturday, October 1, 2011

We can always get jobs working at the local robot factory

The key quote from Slate's recent "robots-will-take-our-jobs" article:

"Most economists aren't taking these worries very seriously. The idea that computers might significantly disrupt human labor markets--and, thus, further weaken the global economy--so far remains on the fringes."

And rightly so. Obviously predicting the future is a fool's errand, and perhaps advances in AI and robotics will ultimately radically reduce the demand for human labor, but a recent article highlights how technological advances can just as easily increase labor demand: NPR reports on an oil-related boom in North Dakota that's driven unemployment close to zero and brought thousands into the state. The cause of the boom is not high oil prices per se---it's technological developments like fracking & horizontal drilling that make formerly non-viable deposits economical to extract.

Obviously technology can displace human laborers and have large effects on the composition of jobs and the returns to different skills, but history and economics both suggest that technology-driven fears about labor markets are overblown.