Online Labor: February 2012

Tuesday, February 21, 2012

Economics of the Cold Start Problem in Talent Discovery

Tyler Cowen recently highlighted this paper by Marko Terviö as an explanation for labor shortages in certain areas of IT. The gist of the model is that in hiring novices, firms cannot fully recoup their hiring costs if the novices' true talents will become common knowledge post-hire. It's a great paper, but what people might not know is that the theory it proposes has been tested and found to perform very well. For her job market paper, Mandy Pallais conducted a large experiment on oDesk where she essentially played the role of the talent-revealing firm.

Here's the abstract from her paper:

... I formalize this intuition in a model of the labor market in which positive hiring costs and publicly observable output lead to inefficiently low novice hiring. I test the model's relevance in an online labor market by hiring 952 workers at random from an applicant pool of 3,767 for a 10-hour data entry job. In this market, worker performance is publicly observable. Consistent with the model's prediction, novice workers hired at random obtain significantly more employment and have higher earnings than the control group, following the initial hiring spell. A second treatment confirms that this causal effect is likely explained by information revelation rather than skills acquisition. Providing the market with more detailed information about the performance of a subset of the randomly-hired workers raised earnings of high productivity workers and decreased earnings of low-productivity workers.

In a nutshell, as a worker, you can't get hired unless you have feedback, and you can't get feedback unless you've been hired. This "cold start" problem is one of the key challenges of online labor markets, where there are far fewer signals about a worker's ability and less common knowledge about what different signals even mean (quick: what's the MIT of Romania?). I would argue that scalable talent discovery and revelation is the most important applied problem in online labor/crowdsourcing.

Although acute in online labor markets, the problem of talent discovery and revelation is no cakewalk in traditional markets either. Not surprisingly, several new start-ups (e.g., smarterer and gild) are focusing on scalable skill assessment, and there is excitement in the tech community about using talent-revealing sites like StackOverflow and Github as replacements for traditional resumes. It is not hard to imagine these low-cost tools or their future incarnations being paired with scalable tools for building human capital, like the automated training programs and courses offered by Udacity, Khan Academy, Codecademy and MITx. Taken together, they could create a kind of substitute for the combined training/signaling role that traditional higher education plays today.


Monday, February 20, 2012

Solvate joins the deadpool

TechCrunch and Betabeat are reporting that Solvate, a platform for remote work, is shutting down. Unlike oDesk, Elance, Freelancer, etc., they were not trying to create a true marketplace: they were trying to offer more of a high-touch, human-in-the-loop matching service.

In the email Solvate sent to their users about the shutdown, they explicitly cited scalability issues, which I'm guessing refers to the unsustainable effort and cost of hand-matching buyers and sellers. I wouldn't say this is definitive proof that the high-touch matching business model doesn't work (my outsider impression is that GLG is killing it), but it is a reminder that the value added by your human-in-the-loop matching has to be high enough that you can recoup your costs: you can't take a hit on every unit sold but make it up on volume.

I think it's too bad they are shutting down; I would have liked to see how their approach to online labor evolved. That being said, I personally found their emphasis (at least in their marketing copy) on US-based workers off-putting. Solvate's CEO was quoted extensively in a Gigaom article, in which he claimed that online labor markets were undermining US workers. He also suggested that by relying only on US-based workers, Solvate could promise a higher level of talent and expertise. All online labor markets have to find ways to help workers credibly demonstrate their talents, and using crude geography-based proxies for talent is one approach, but not a particularly admirable one. To me, the whole ethical/moral "so what" of online work is that geography and nationality don't have to matter.

As a coda, here is my response to the original Gigaom article:

Full disclosure: I’m the staff economist at oDesk and these opinions represent my own views.

A couple of thoughts:

Like any competitive market, the forces of supply and demand are going to determine prices in these online markets. With the opening up of new countries that have large, reasonably well-educated, internet-savvy populations, supply increases, which will tend to drive down wages. On the other hand, these markets (and the ability to break work up into small, outsourceable bits) also make it possible to outsource more work, increasing demand and hence prices.
At least within oDesk, we haven’t seen strong trends in wages, though presumably this article is talking about freelancers in general and we obviously don’t have visibility on their wages.
As a practical matter, I don't accept that workers in developed countries like the US can't compete in these markets: they actually have a lot of advantages, such as perfect English, the same time zone, and familiarity with US business culture and expectations. Further, price matters, but it's not the only thing. For what it's worth, I work with many oDesk contractors, and the breakdown is 1 x US, 1 x Italy, 1 x Russia, 1 x Pakistan and 2 x Philippines.
The efficiency and distributional effects of information and communications technology are complex and the evidence is ambiguous, so I’d be skeptical of anyone offering a definite answer to these kinds of questions. There was an interesting Quora thread on this topic.
I think focusing on what these markets do for relatively well-paid workers in developed countries misses one of the most important moral facts about these markets, which is that they generate new, relatively well-paid, meaningful work opportunities for people in developing countries. It's obviously not a random sample of our workers, but if you spend a few minutes on oDesk's Facebook fanpage and look at the comments and stories, it's clear that online work is improving lives in a pretty dramatic way.

Saturday, February 18, 2012

High-wage skills on oDesk (or why you might want to learn Clojure if you're not a lawyer)

Update: Hello HackerNews readers. One thing that I discussed but probably didn't emphasize enough is that this data show the correlation between listed skills and offered wages---you absolutely cannot infer a causal relationship (my cheeky title notwithstanding). Unless I get to create and run a massive skills training program experiment, it's going to be hard to get at causality. But I can do something about the offered/earned distinction. If you don't want to miss my follow-on blog post where I explore the relationship between skills and actual earned wages from actual projects, follow me on twitter.

oDesk recently introduced a controlled, centralized vocabulary of about 1,400 skills for buyers and contractors to use when posting jobs and creating profiles. The primary motivation for the change was to make it easier for buyers and sellers to find each other: without a standardized vocabulary, would-be traders can fail to match simply because they use different terms for the same skill.

A side effect of this transition is that high-quality data on the relationships between skills and wages are now available. I recently built a dataset of contractors' hourly wages by skill: for each skill, I identified all contractors listing that skill on their profiles and averaged their offered hourly wages. Although contractors are free to offer any hourly wage they like, in my experience, wages offered closely map to actual earnings. Still, to reduce the influence of outliers, I restricted the sample to contractors offering between 50 cents and 100 dollars per hour. I also included only skills with 30 or more observations.
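The aggregation just described (filter the wage window, group contractors by listed skill, drop thin skills, average) can be sketched in a few lines. This is a minimal sketch, not the code actually used: the `(hourly_wage, skills)` record layout is a hypothetical stand-in for the real profile data, and treating the $0.50 and $100 bounds as inclusive is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Filters described in the post. Inclusive bounds are an assumption.
MIN_WAGE, MAX_WAGE = 0.50, 100.0
MIN_OBS = 30


def average_wage_by_skill(profiles, min_wage=MIN_WAGE, max_wage=MAX_WAGE, min_obs=MIN_OBS):
    """Average offered hourly wage per skill.

    `profiles` is an iterable of (hourly_wage, skills) pairs, one per
    contractor. This layout is hypothetical, not oDesk's actual schema.
    """
    wages_by_skill = defaultdict(list)
    for wage, skills in profiles:
        if not (min_wage <= wage <= max_wage):
            continue  # drop outliers outside the wage window
        for skill in set(skills):  # count each contractor once per skill
            wages_by_skill[skill].append(wage)
    # Keep only skills with enough observations to give a stable average.
    return {
        skill: mean(wages)
        for skill, wages in wages_by_skill.items()
        if len(wages) >= min_obs
    }


def top_skills(averages, n=50):
    """Skills sorted by average wage, highest first (e.g., the top 50)."""
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Filtering on wage before grouping means an outlier contractor is dropped from every skill they list, which matches the "restrict the sample to contractors" wording rather than filtering skill by skill.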

In the bar chart below (made using the very cool googleVis package for R), I plotted the top 50 skills, ordered by average hourly wage (here is a "live" version with mouse-over). The top of the list is dominated by high-end consulting areas (e.g., patents and venture capital consulting) or hot newer technologies (e.g., redis and Amazon RDS). The programming language that commands the highest wage is Clojure, which is a rather esoteric skill: it's a Lisp dialect that compiles to Java Virtual Machine (JVM) bytecode. Perhaps this is the market reflecting Paul Graham's "Python Paradox":

"if a company chooses to write its software in a comparatively esoteric language, they'll be able to hire better programmers, because they'll attract only those who cared enough to learn it. And for programmers the paradox is even more pronounced: the language to learn, if you want to get a good job, is a language that people don't learn merely to get a job."

At the time Graham wrote this, Python was a far less mainstream language, probably regarded much as Clojure is today. It's an interesting pattern, and although they'd cut up my economist membership card if I claimed a causal link between knowing Clojure and being able to command higher wages, I'm intrigued by the idea of using online labor markets as a bellwether to help guide human capital choices.

Thursday, February 16, 2012

Why aren't we all freelancers?

Investors typically hold diverse portfolios of assets, with the goal of reducing risk. While diversification is commonplace in investing, most of us have no diversification in our labor income streams: we work at one job at a time, for a single employer. However, the "returns" to a job vary like returns on investments, especially on non-financial dimensions (e.g., engagement, learning, co-workers, working conditions). As in investing, there is also a significant amount of direct financial risk in holding one job---the firm may impose layoffs or go out of business. Given the similarities between jobs and assets, why isn't there a similar impetus to diversify, i.e., why don't we all hold a portfolio of small jobs at the same time, with many different employers [0]?
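To make the investing analogy concrete, the standard result for an equally weighted portfolio can be applied to labor income. Assume, hypothetically, that you split your time equally across n jobs, that each job's full-time-equivalent income has standard deviation sigma, and that every pair of jobs has correlation rho. The variance of total income is then sigma^2 * (rho + (1 - rho) / n). This is a sketch under those simplifying assumptions, not an empirical model of job risk.

```python
def income_variance(sigma, rho, n):
    """Variance of income split equally across n jobs.

    Hypothetical assumptions: each job's full-time-equivalent income has
    standard deviation `sigma`, and every pair of jobs has correlation `rho`.
    Equally weighted portfolio result: sigma^2 * (rho + (1 - rho) / n).
    """
    if n < 1:
        raise ValueError("need at least one job")
    return sigma ** 2 * (rho + (1 - rho) / n)


# Illustrative parameters only, not estimates.
for n in (1, 2, 4, 10):
    print(n, income_variance(sigma=1.0, rho=0.2, n=n))
```

As n grows, the variance falls toward rho * sigma^2. Job-specific shocks (one employer's layoffs) diversify away, but shocks common to all jobs (a recession) do not.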

Some workers (freelancers and independent consultants) do follow this diversified model, but it's hardly the norm among workers generally. Below, I lay out a laundry list of potential economic explanations for why the portfolio/freelancing approach is not more common. What's interesting to me, both academically and as someone working at oDesk, is that many of these points are not set-in-stone attributes of the production process but are instead things that smart features or policies might change.

Non-linearity in costs of searching/vetting/bargaining
Hiring a freelancer for a small project is like picking out a fancy restaurant; hiring a full time employee is more like buying a house. The effort of searching and vetting (and thus the cost) is related to the stakes of the hire. However, there is no guarantee that those costs scale linearly with the stakes. Suppose it takes nearly as much effort to find a small job as it does to find a large job---then a portfolio approach will generate larger search costs per dollar earned in wages [1].
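A quick way to see the non-linearity argument is a toy cost model. Assume, hypothetically, that finding each job costs a fixed amount F plus a small fraction v of the job's value. If total earnings W are split across n equal jobs, search cost per dollar earned is n*F/W + v, which grows with the number of jobs whenever F > 0. With F = 0 (costs that scale linearly with the stakes), the portfolio carries no search penalty at all. The parameter values below are illustrative, not data.

```python
def search_cost_per_dollar(total_earnings, n_jobs, fixed_cost, variable_rate):
    """Search/vetting cost per dollar earned when earnings are split across n jobs.

    Hypothetical cost model: finding each job costs `fixed_cost` (dollars of
    time and effort) plus `variable_rate` per dollar of that job's value.
    """
    job_value = total_earnings / n_jobs
    cost_per_job = fixed_cost + variable_rate * job_value
    return n_jobs * cost_per_job / total_earnings


# Illustrative numbers: $50,000 of annual earnings, $500 fixed cost per search.
for n in (1, 5, 20):
    cost = search_cost_per_dollar(50_000, n, fixed_cost=500, variable_rate=0.01)
    print(f"{n:>2} jobs: {cost:.2f} search cost per dollar earned")
```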

Non-linearity in job size and productivity
If you can make X widgets or Y schwidgets in 1 hour, it doesn't mean you can make X/2 widgets and Y/2 schwidgets in 1 hour. Every job has some fixed set-up costs---getting out the materials, remembering the key details, etc. The larger the costs, the less attractive the small job. On the other hand, productivity eventually wanes from boredom, physical fatigue, etc. ("I'm really getting bored with this TPS report---time for some Facebook"). The optimal size job (from a productivity standpoint) might be near or above the current 40 hours per week, 50 weeks a year paradigm, in which case going smaller means getting less efficient.
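The set-up-cost and fatigue story implies an interior optimal job size. Here is a sketch with a hypothetical production function (not estimated from anything): the first `setup` hours produce nothing, each later hour produces `rate` units, and fatigue subtracts `fatigue * hours**2`. For hours beyond the set-up period, average output per hour is rate - rate*setup/h - fatigue*h, which peaks at h* = sqrt(rate * setup / fatigue). Larger set-up costs push h* up, making small jobs relatively less efficient.

```python
import math


def output(hours, rate, setup, fatigue):
    """Hypothetical production function for a job lasting `hours`.

    The first `setup` hours produce nothing (getting out materials,
    recalling key details); each remaining hour produces `rate` units,
    minus a quadratic fatigue penalty.
    """
    return rate * max(0.0, hours - setup) - fatigue * hours ** 2


def productivity(hours, rate, setup, fatigue):
    """Average output per hour worked."""
    return output(hours, rate, setup, fatigue) / hours


def best_job_length(rate, setup, fatigue, max_hours=2000):
    """Grid search over whole-hour job lengths for the highest productivity."""
    return max(range(1, max_hours + 1),
               key=lambda h: productivity(h, rate, setup, fatigue))


def analytic_best_length(rate, setup, fatigue):
    """Closed-form optimum h* = sqrt(rate * setup / fatigue)."""
    return math.sqrt(rate * setup / fatigue)
```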

Complementarities with team members that grow over time
One of the advantages of team production is that workers can share knowledge with each other, motivate each other and generally create an environment where everyone is more productive than they would be working alone. There's no reason teams of freelancers working together cannot achieve the same complementarities with each other, but if these complementarities take time to develop, larger jobs become more attractive.

Firm-specific human capital

If a job requires lots of firm-specific human capital, the per-job learning requirement is high, which tends to encourage larger jobs [2].

Monitoring & policing costs
Once you get a sense of the character and reputation of some trading partner, you don't need to constantly monitor that person/firm. After some level of trust has been established, these costs would fall. This again pushes for larger jobs. This is probably clearer in terms of firms monitoring workers, since the big fear is shirking, but it does go both ways: workers need to make sure their checks don't bounce, that their employers aren't skimming from the 401K, using malk for the coffee service instead of milk, etc.

Employer concerns about IP (broadly defined)
I do not think we are likely to see workers working simultaneously for direct competitors [3], though the interests of most firms are fairly orthogonal to each other.

Existing public policy
At least in the US, at the present time, certain necessities (health insurance, access to credit, etc.) are easier to obtain as a full-time employee.

[0] Note that this isn't a theory of the firm argument or discussion. I'm assuming that one can be a full employee and reap all the benefits of firm organization / team production even with fractional employment.

[1] One of the reasons Mechanical Turk is semi-dysfunctional is that when problems arise (about the scope of work, payment terms, etc.), all the surplus generated by the relationship is quickly destroyed: spending even one minute thinking, talking and haggling about a task paying pennies is likely to be economically wasteful. This was one motivation for hagglebot.

[2] I think this is why the ideal use of online labor is not so much a 1-for-1 replacement of some traditional job, but a decomposition of jobs into easily outsourceable pieces and pieces that require deep firm-specific knowledge.

[3] McKinsey excepted.

Tuesday, February 7, 2012

Writing Smell Detector (WSD) - a tool for finding problematic writing

tl;dr version: WSD is a python tool to help find problems in your writing. Here's the source and here's example output.

In grad school, I wrote a program that used a series of regular expressions to detect "writing smell" (analogous to code smell), i.e., telltale signs of bad writing and mistakes. The rules for smelliness were loosely based on one of my favorite writing how-to's: Style: Toward Clarity and Grace by Joseph Williams.

The program took a text file as input and produced an annotated report with snippets of the offending bits. I used it for all my papers and found it really helpful, but the code was very, um, academic (i.e., written for use by the person who wrote it), and it was written in Mathematica [1], which was the language I knew best at the time. FWIW, here is my original version.

For a long time, I've wanted to port it to some other language and make it accessible and capable of receiving new rule contributions and explanations. To this end, I recently commissioned an oDesk contractor (utapyngo) to make a more polished, modular version in Python. I think he totally outdid himself. It now has a nice modular design that lets you easily incorporate new rules, and he greatly improved upon my often-flawed regular expressions. Be forewarned: the documentation is non-existent and the rules aren't explained, but I plan to fix this over time as I use it.

It's open source (courtesy of oDesk, who paid the bills) and available here on github (live example output). To use it, just clone it, install the python package jinja2 and then do:

$ python wsd.py -o output_file.html your_masterpiece.tex

Here's a screenshot of what the HTML output looks like, illustrating the a/an rule (i.e., that it's "an ox" but "a cat"):

Note the statement of the rule, the patterns that it looks for and the snippets. It also has a hyperlink to the full text, which is available at the bottom of the document.
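To show the general idea of a regex-based smell rule (statement, patterns, snippets), here is a minimal sketch of an a/an check. It is not WSD's actual rule format or code: the `Rule` class and `find_smells` function are hypothetical, and the rule looks only at the first letter of the next word, so it misfires on words like "hour" or "university", where the sound and the spelling disagree. Flagged snippets still need a human read.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """A hypothetical 'writing smell' rule: a statement plus regex patterns."""
    name: str
    statement: str
    patterns: list


# Crude first-letter heuristic for the a/an rule ("an ox" but "a cat").
A_AN_RULE = Rule(
    name="a/an",
    statement='Use "an" before vowel sounds ("an ox") and "a" otherwise ("a cat").',
    patterns=[
        r"\ba\s+[aeiou]\w*",                   # "a" followed by a vowel
        r"\ban\s+[bcdfghjklmnpqrstvwxyz]\w*",  # "an" followed by a consonant
    ],
)


def find_smells(text, rule, context=20):
    """Return (match, snippet) pairs for every hit of the rule's patterns."""
    hits = []
    for pattern in rule.patterns:
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            start = max(0, m.start() - context)
            hits.append((m.group(), text[start:m.end() + context]))
    return hits


def report(text, rule):
    """Plain-text report: rule statement, the patterns, then each snippet."""
    print(f"Rule {rule.name}: {rule.statement}")
    for pattern in rule.patterns:
        print(f"  pattern: {pattern}")
    for match, snippet in find_smells(text, rule):
        print(f"  ...{snippet}...  [{match}]")


report("I saw a ox and an cat near a dog.", A_AN_RULE)
```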

A few thoughts:

If you're interested in contributing (rules or features), let me know.
It might be nice to turn this into a web-service, though my instinct is that someone interested in algorithmically evaluating their LaTeX/structured text isn't going to find cloning the repository & then running a script to be a big obstacle. And they probably don't want to make their writing public.
A few weeks ago, I read this usesthis profile of CS professor Matt Might. In the software section of the interview, he said that he had some shell scripts that do something similar. I haven't really investigated, but maybe there are ideas there worth incorporating.

[1] When I told the other members of the oDesk Research / Match Team that I had code for doing this writing smell thing, they were impressed and wanted a copy; when I told them it was written in Mathematica, they thought this was hilarious and mocked me for several minutes. I tried to explain that Mathematica actually has great tools for pattern matching, but this fell on deaf ears.

Monday, February 6, 2012

Minimum Viable Academic Research

A non-viable product in minimum form, courtesy of Flickr

One of the most talked about ideas in the world of start-ups is the notion of the minimum viable product (MVP). The rationale for MVP is clear: you don't want to build products that customers don’t want, never mind waste time polishing and optimizing those unwanted products. "Minimally viable" doesn't even require the product to exist yet---the viability refers to whether it will give you the feedback you need to see if the project has potential. For example, you might do an A/B test where you buy keywords for some new feature, but then just have a landing page where people can enter their email address, thereby gauging interest. The important thing is that it is market feedback, not just opinions of people near you.

In academia, a big part of the day-to-day work is getting feedback on ideas. Each new paper or project is like a product you’re thinking of making. So you float ideas with colleagues, your advisers, your spouse, etc., and you might present some preliminary ideas at a workshop or seminar. The problem is that in most workshops and seminars, where you could potentially get something close to a sample of what the research community will think of the final product, the feedback is usually friendly and limited to implementation (e.g., "How convincing is the answer you are providing to the question you've framed?"), instead of "market" feedback on how much "value" your work is creating.

The academic analogue to market feedback on value will come later, in two forms: (a) journal reviews / editor decisions and (b) citations. By value, I mean something like (importance of question) x (usefulness of your answer). At least in economics, knowing what is important is difficult. There is no Hilbert's list of big and obvious open questions. A few such questions do exist, but they tend to be so sweeping (e.g., "Why are some countries rich and some countries poor?" and "Why do vacancies and unemployed workers co-exist?") that no single work can decisively answer them. To do real research, you need to pick some important part of a question and work on that.

A fundamental problem is that the institutional framework in some disciplines (economics is one example, though not every field works this way: see this recent NYTimes op-ed on scientific works being too short, and see here for an economist's take on the topic) requires you to do lots and lots of polishing before you know (via journal rejection/acceptance) whether even the most polished form of your work is going to score high enough on the importance-of-question measure. At seminars, people are usually too polite to say, "Why are you working on this?" or "Even if I believed your answer, I wouldn't care" or "So what?" But that's the kind of painful feedback that would be most useful at early stages. There are some academics who will give that kind of "Why are you doing this?" critique, and while they are notorious and induce fear in grad students, the world needs more of them. (I once gave a seminar talk where an audience member asked, "How does this study have any external validity?" And I had to admit he was right: it had none. I dropped the project shortly thereafter, after spending the better part of 3 months working on it.)

It's not that people won't be critical in seminars. You'll generally get lots of grief about your modeling assumptions, econometrics, framing, etc. But those are easy critiques (and they let the critics show off a little). It's the more fundamental critiques about importance/significance that are both rare and useful. In academia, you really, really need the importance/significance critique because you can work on basically anything you want, literally for years, without anyone directly questioning your judgment and choices. And while this gives you tons of freedom and flexibility, you might waste significant fractions of your career on marginalia. I also don't think it's the case that if you're good, you'll simply know: I've heard from several super-star academics that their most cited paper is one they didn't think much of when they wrote it and that their favorite paper has languished in relative obscurity. One interpretation (beyond Summer's law) is that you aren't the best judge of what's important.

How does one get more importance-of-question feedback?

In economics, there's a tendency (need?) to write papers that are 60-page behemoths, filled with robustness checks, enormous literature reviews, extensive proofs that formalize somewhat obvious things, etc. This long, polished version really is the minimally viable version of the paper, in that you can't safely distribute more preliminary, less polished work (people might think you don't know the difference). I think on the whole, this is probably a good thing. But it's often not the minimally viable version of an idea. Often the "so what" of a paper is summarized by the abstract, a blog post, a single regression, etc.

I'm not sure what the solution is, but one intriguing bit of advice I recently received from a very successful (albeit non-traditional) researcher was to essentially live-blog my research. There's actually very little chance of being "scooped"; if anything, being public about what you're doing is likely to deter others. And, because it's "just" a blog post, you nullify the "they don't know the difference between polished and unpolished work" objection. The flip side is that I think there's a kind of folk wisdom in academia that blogging pre-tenure is a bad idea (I imagine the advice is even stronger for a grad student pre-job market). But if you were doing it for MVP reasons / feedback reasons, the slight reputation hit you'd take might be offset by the superior "so what" feedback you might get from doing such a thing. Anyway, still thinking about this strategy.*

* Beyond the purely professional strategic concerns, it might actually move science along a little faster and make research a bit more democratic and open.

Saturday, February 4, 2012

Stereotypes about animals (and children) as revealed by Google auto-suggest

I saw this tweet by @m_sendhil, which had a screenshot of Google's auto-suggest for "why are indians so," which contained a collection of (often contradictory) stereotypes (e.g., fat and skinny). I began doing the same exercise for other nationalities and ethnic groups, products, animals etc.

Here is the screenshot for turtles (which apparently have lots of fans):

It was interesting to me how many of the supposed attributes showed up repeatedly across entities. This gave me an idea: I should turn this procrastination/time-wasting into something more useful, which was to learn how to make graph/network plots with the python package networkx (code below). Here is the result, using the top 4 auto-suggests for cats, children, cows, dogs, frogs, goldfish, hamsters, mice, turtles and pigs. Entities are in blue, attributes in red. Edges are drawn if that attribute was auto-suggested for that entity.
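For readers who want to try the same exercise, here is a minimal sketch of building and drawing such an entity/attribute graph with networkx. It is a hypothetical illustration, not the script used for the plot: only the goldfish and pig attributes mentioned in this post are filled in, and the rest of the dictionary would come from copying the top 4 auto-suggestions for each entity. `draw_graph` assumes matplotlib is installed and writes to a hypothetical `stereotypes.png`.

```python
import networkx as nx

# Hypothetical input: entity -> auto-suggested attributes. Only entries
# mentioned in the post are included; extend with the top 4 suggestions
# for each entity (cats, children, cows, dogs, ...).
SUGGESTIONS = {
    "goldfish": ["addicting", "good"],
    "pigs": ["salty"],
}


def build_graph(suggestions):
    """Graph with entities and attributes as nodes, one edge per suggestion."""
    graph = nx.Graph()
    for entity, attributes in suggestions.items():
        graph.add_node(entity, kind="entity")
        for attribute in attributes:
            graph.add_node(attribute, kind="attribute")
            graph.add_edge(entity, attribute)
    return graph


def draw_graph(graph, filename="stereotypes.png"):
    """Draw entities in blue and attributes in red, then save to a file."""
    import matplotlib
    matplotlib.use("Agg")  # render without a display
    import matplotlib.pyplot as plt

    colors = ["blue" if graph.nodes[n]["kind"] == "entity" else "red"
              for n in graph.nodes]
    pos = nx.spring_layout(graph, seed=42)  # fixed seed for a repeatable layout
    nx.draw_networkx(graph, pos, node_color=colors, with_labels=True)
    plt.axis("off")
    plt.savefig(filename)
    plt.close()
```

Attributes shared by several entities end up as high-degree red nodes, which is what makes the repeated stereotypes stand out in the plot.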

Some observations
I'm guessing the "addicting" and "good" attributes of goldfish refer to the cheesy snack cracker and not the actual fish. People seem to be rather ambivalent about children. I'm kind of surprised that people were not wondering why dogs are smart. Finally, are pigs actually salty (this seems unlikely), or is this just how pork is usually prepared?

The code:

Wednesday, February 1, 2012

Employer recruiting intensity

I was reading/skimming this paper by Davis et al. and in the abstract, they write:

"This paper is the first to study vacancies, hires, and vacancy yields at the establishment level in the Job Openings and Labor Turnover Survey, a large sample of U.S. employers. ... We show that (a) employers rely heavily on other instruments, in addition to vacancy numbers, as they vary hires, (b) the hiring technology exhibits strong increasing returns to vacancies at the establishment level, or both. We also develop evidence that effective recruiting intensity per vacancy varies over time, accounting for about 35% of movements in aggregate hires."

In a nutshell, they document that recruiting intensity varies across time and that this variation has a big effect on the number of aggregate hires. What's interesting is that the labor literature tends to focus on search intensity by workers, with firm search intensity comparatively understudied, but this paper suggests that ignoring employer efforts is likely to give a (very) incomplete impression. My guess is that this bias in the literature comes from the comparative lack of employer data on matching, though JOLTS (which this paper uses) is ameliorating the problem.

On oDesk, we've got excellent visibility on employer recruiting. Below is the "so what" plot from a recent experiment where we "recommended" contractors to employers (based on our analysis of what the job consisted of). The recommendations came immediately after the employer posted the job. We also made it easier for that employer to invite those recommended contractors to apply. The y-axis is the fraction of jobs where the employer made at least one invitation; treatment and control are side-by-side. We can see that regardless of category, the treatment generally increased the share of employers who sent at least one invitation. But I think the striking thing is how much variation there is in "levels" of recruiting by category: in the control admin group, less than 10% of employers recruited, while in sales, it's almost 25%.
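The outcome in this plot is an extensive-margin measure: whether an employer invited anyone at all, not how many contractors were invited. A minimal sketch of computing it by category and experimental arm follows. The `(category, arm, n_invitations)` record layout is hypothetical, and the toy records in the usage lines are made up for illustration; they are not the experiment's data.

```python
from collections import defaultdict


def invite_rates(jobs):
    """Share of jobs with at least one employer invitation, by (category, arm).

    `jobs` is an iterable of (category, arm, n_invitations) tuples; this
    record layout is hypothetical, not oDesk's actual schema.
    """
    totals = defaultdict(int)
    with_invite = defaultdict(int)
    for category, arm, n_invitations in jobs:
        key = (category, arm)
        totals[key] += 1
        if n_invitations > 0:  # extensive margin: any invitation at all
            with_invite[key] += 1
    return {key: with_invite[key] / totals[key] for key in totals}


# Toy records for illustration only.
toy_jobs = [
    ("admin", "control", 0),
    ("admin", "control", 2),
    ("admin", "treatment", 1),
    ("sales", "control", 0),
]
for (category, arm), rate in sorted(invite_rates(toy_jobs).items()):
    print(category, arm, rate)
```

Comparing treatment and control within each category, rather than pooling, separates the effect of the intervention from the large category-level differences in baseline recruiting.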

Presumably the difference depends on a number of factors: how many applicants the job will get organically, how close substitutes the different applicants are for one another, the value to the firm of filling the vacancy and so on. It also clearly matters how easy it is to search/recruit, given the effectiveness of our pretty lightweight intervention. From a welfare standpoint, this last point about the role of search/recruiting costs is potentially interesting, as technologically reducing employer search frictions/costs is, at least in online labor markets, a highly scalable proposition.