Online Labor

Would a job by any other name pay as much?

2013-11-30T07:45:00.004-08:00

I'm working on a project where it would be useful to know what an oDesk job is likely to pay at the time it is posted. Although there are plenty of structured predictors available (e.g., the category, skills, estimated duration etc.), presumably the job description and the job title contain lots of wage-relevant information. The title in particular is likely to identify the main skill needed, the task to be done and perhaps the quality of the person the would-be employer is looking for (e.g., "beginner", or "senior").

Unfortunately, I haven't done any natural language processing before, so I'm a bit out of my element. However, there are good tutorials online as well as R packages that can guide you through the rough parts. I thought writing up my explorations might be useful to others that want to get started with this approach. A gist of the code I wrote is available here.

What I did:

1) I took 20K recent hourly oDesk jobs where the freelancer worked at least 5 hours. I calculated the log wage over the course of the contract. Incidentally, the log of oDesk wages---like the log of real wages---is pretty well approximated by a normal distribution.

2) I used the RTextTools package to create a document term matrix from the job titles (this is just a matrix of 1 & 0 where the rows are jobs and the columns are relatively frequent words that are not common English words---if the job title contained that word, it gets a 1, otherwise a 0).

3) I fit a linear model using the lasso for regularization (using the glmnet package). I used cross validation to select the best lambda. A linear model probably isn't ideal for this, but at least it gives nicely interpretable coefficients.
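The three steps above can be sketched in a few lines. This is only an illustration in plain Python, not the R code from the gist: the job titles and wages are made-up toy data, the stopword list is a stand-in for what RTextTools does, the lasso is a hand-rolled coordinate descent rather than glmnet, and the penalty `lam` is picked by hand instead of by cross validation.

```python
import math

# Hypothetical toy data, not real oDesk jobs: (job title, hourly wage in dollars).
JOBS = [
    ("senior ios application developer", 40.0),
    ("ios application developer", 35.0),
    ("php developer for website", 20.0),
    ("seo link building", 4.0),
    ("seo article writer", 5.0),
    ("data entry for website", 6.0),
    ("senior php developer", 30.0),
    ("seo data entry", 3.0),
]
STOPWORDS = {"a", "an", "the", "for", "and", "of", "to"}


def build_dtm(titles, min_count=2):
    """Binary document-term matrix: one row per title, one column per
    non-stopword that appears in at least `min_count` titles."""
    tokenized = [set(t.lower().split()) - STOPWORDS for t in titles]
    counts = {}
    for words in tokenized:
        for w in words:
            counts[w] = counts.get(w, 0) + 1
    vocab = sorted(w for w, c in counts.items() if c >= min_count)
    rows = [[1.0 if w in words else 0.0 for w in vocab] for words in tokenized]
    return vocab, rows


def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n) * sum((y - b0 - X.b)^2) + lam * sum(|b|).
    The intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # inner product of column j with the partial residual
            norm = 0.0  # squared length of column j
            for i in range(n):
                fit_without_j = intercept + sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - fit_without_j)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm else 0.0
        residuals = [
            y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)
        ]
        intercept = sum(residuals) / n
    return intercept, beta


titles = [title for title, _ in JOBS]
y = [math.log(wage) for _, wage in JOBS]
vocab, X = build_dtm(titles)
intercept, beta = lasso(X, y, lam=0.05)  # lam chosen by hand for the toy data
coefs = dict(zip(vocab, beta))
for word, b in sorted(coefs.items(), key=lambda kv: kv[1]):
    if b != 0.0:
        print(f"{word:12s} {b:+.3f}")
```

With a large enough `lam` every coefficient is shrunk to exactly zero, which is the property that makes the lasso useful for picking out a short list of wage-relevant words.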

So, how does it do? Here is a sample of the coefficients that didn't get set to zero by the lasso, ordered by magnitude (point sizes are scaled by the log number of times that word appears in the 10K training sample):

The coefficients can be interpreted as % changes from the mean wage in the sample when that corresponding word (or word fragment) is present in the title. Nothing too surprising I think: at the extremes, SEO is a very low paying job, whereas developing true applications is high paying.
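Because the outcome is a log wage, reading a coefficient directly as a percent change is an approximation that works well for small coefficients and less well for large ones. A minimal sketch of the exact conversion follows; the coefficient values are hypothetical and are not taken from the fitted model.

```python
import math


def pct_change(log_coef):
    """Exact percent change in the wage implied by a coefficient on log wage."""
    return 100.0 * (math.exp(log_coef) - 1.0)


# Hypothetical coefficient values, chosen only to show where the
# "coefficient = percent change" shortcut starts to drift.
for b in (-0.5, -0.1, 0.1, 0.5):
    print(f"coef {b:+.2f}: shortcut {100 * b:+.0f}%, exact {pct_change(b):+.1f}%")
```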

In terms of out-of-sample prediction, the R-squared was a little over 0.30. I'll have to see how much of an improvement can be obtained from using some of the structured data available, but explaining 30% of the variation just using the titles is higher than I would have expected before fitting the model.

Economics for skeptical social scientists

2013-09-22T11:46:00.000-07:00

I recently gave a talk at the "Training school on Virtual Work," which was held at the University of Malta. The participants were mostly graduate students and junior faculty at European universities studying some aspect of virtual work, e.g., Wikipedia editors, gold farmers, Current TV contributors, MTurk workers etc. Most were coming from very different methodological backgrounds than my own and the people I usually work with---sociology, anthropology, media studies, gender studies etc. I think it is fair to say that most participants have a fairly dim view of economics.

One of the organizers felt that few participants would have encountered the economic perspective on online work. I was asked to present a kind of non-straw-man version of economics and the basic tools for how economists think about labor markets. Below is the result---a kind of apologia for economics, combined with a smattering of basic labor economics. I'm not the best judge obviously, but I think it was reasonably well received.

Economics and Online Work (a slightly misleading title though - see description) from John Horton

PS - I should write more about the school later, but one of the main take-aways for me was (a) how pervasive the acceptance of the labor theory of value (LTV) was among participants and (b) how this leads to very different conclusions about almost everything that matters with respect to online work. It would be interesting to try to analyze a couple of different online work phenomena using the LTV and the marginalist approach to value.

Impressions from a visit to a large call center

2013-09-22T02:35:00.003-07:00

Paul Krugman has a piece of research advice, which is to "listen to the gentiles." What he means is to pay attention to what smart practitioners say about their business in order to get economic ideas and insights. I recently had the chance to take a tour of a large call center for a major global financial institution. It was interesting throughout and I thought I would share some of my notes.

Recruiting and Training
The company uses a tiered screening approach. They start with an online test. Some proceed to phone interviews and the final round is in-person interviews. The HR director felt that the simulated work environment test was the primary predictor of future job success---this matches up quite well with the industrial psychology literature on employer screening. He also felt that the best predictor of retention was how comfortable a worker seemed up front with the demands of shift-work.

The company made extensive use of their existing employees to recruit new ones. Referrals were highly valued because they were more likely to bring in candidates who understood the reality of shift-based call center work (and thus were less likely to turn over). It seemed to be less about bonding or reducing formal recruitment costs.

Long company-specific training period, but no general training (as Becker would have predicted). However, there are several other call centers in this region and the company does lose employees to them. The company-specific training period was surprisingly long (on the order of 3 months) and was conducted by more senior employees during low call volume periods.

Compensation

The company acts like a price taker with respect to wages. Compensation for new employees was determined by doing yearly market research into what competitors were paying. The company did have very high turn-over (though about in line with the industry), but there was no mention of raising wages as a solution. Their approach seems to be to wait for people who find that they cannot handle shift work to "sort out" of the job. I meant to ask about explicit performance incentives but didn't get a chance to. However, my impression was that rewards came through promotion and one-off bonuses rather than through relating payment to specific actions, despite performance being quite measurable.

They had surprisingly rich amenities. Although pay was not high, amenities were reminiscent of a Silicon Valley start-up: pleasant office, cheap and free food, free gym, concierge service etc. Some of these things seemed like amenities the firm could more cheaply offer than their competitors because of their larger size, since some were club goods. In other words, they could amortize a concierge over many more employees.

Operations

Customers are segmented by value and routed accordingly. The company is multi-national and has call centers in several locations, including Europe, Southwest Asia and East Asia. The company's clients are segmented based on value and routed to the call center that roughly corresponds to the skill level of the workers at the call center, e.g., the best customers get the European call center and low-tier customers do not.

They are highly sophisticated at demand and supply management. Perhaps unsurprisingly, they are good at forecasting call volumes and staffing accordingly---with all of this done semi-automatically. They can adjust supply on the fly by calling off training, meetings etc. if demand spikes via building-wide announcements of status changes.

There was little evidence that much technology-driven productivity improvement was on the horizon. Although the tasks are highly structured, all the big gains from automation occurred many years ago (e.g., the ubiquitous "Press 1 for Accounts"). There was no talk of Watson-like automation of responses to customer queries. The one technology they really wanted---and that would radically reduce their costs---was some easy way to verify customer identities over the phone. This alone would increase their productivity by about 20-30%.

There was little evidence that this work could be easily distributed. Most of the firm's workplace policies seemed to be driven by concerns about regulatory compliance and fear of losing sensitive customer information and required a great deal of monitoring and control. It is difficult to imagine a substantial chunk of this work being done by a geographically distributed workforce.

You Can Sometimes Trust Research Done on Mechanical Turk, But It Depends on the Research Question

2013-07-10T21:50:00.004-07:00

Dan Kahan has an interesting post on some of the validity problems with research conducted on Mechanical Turk (MTurk). I think I largely agree with his main point, which is that the evolution of the marketplace has been such that it's become less useful for conducting certain kinds of research. However, I do worry there's a potential baby/bathwater problem if researchers decide that "unrepresentative" or "experiment-savvy" means a useless subject pool (e.g., Andrew Gelman titled his blog post about Kahan's article "Don't Trust the Turk").

I haven't done MTurk research in several years, but the external validity issue raised by the blog post is something I thought about quite a bit when I was running experiments on the platform. I wrote a section about external validity in my ExpEcon paper with Richard Zeckhauser and Dave Rand. The key portion is excerpted below (the source code and data for that paper are available here):

Representativeness

People who choose to participate in social science experiments represent a small segment of the population. The same is true of people who work online. Just as the university students who make up the subjects in most physical laboratory experiments are highly selected compared to the U.S. population, so too are subjects in online experiments, although along different demographic dimensions.

The demographics of MTurk are in flux, but surveys have found that U.S.-based workers are more likely to be younger and female, while non-U.S. workers are overwhelmingly from India and are more likely to be male (Ipeirotis, 2010). However, even if subjects "look like" some population of interest in terms of observable characteristics, some degree of self-selection of participation is unavoidable. As in the physical laboratory, and in almost all empirical social science, issues related to selection and "realism" exist online, but these issues do not undermine the usefulness of such research (Falk, 2009).

Estimates of changes versus estimates of levels

Quantitative research in the social sciences generally takes one of two forms: it is either trying to estimate a level or a change. For "levels" research (for example, what is the infant mortality in the United States? Did the economy expand last quarter? How many people support candidate X?), only a representative sample can guarantee a credible answer. For example, if we disproportionately surveyed young people, we could not assess X's overall popularity.

For "changes" research (for example, does mercury cause autism? Do angry individuals take more risks? Do wage reductions reduce output?), the critical concern is the sign of the change's effect; the precise magnitude of the effect is often secondary. Once a phenomenon has been identified, "changes" research might make "levels" research desirable to estimate magnitudes for the specific populations of interest. These two kinds of empirical research often use similar methods and even the same data sources, but one suffers greatly when subject pools are unrepresentative, the other much less so.
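A small worked example may make the levels/changes distinction concrete. Every number below is hypothetical, and the sketch assumes the treatment effect has the same sign in both groups; it shows an unrepresentative sample getting the level badly wrong while still getting the sign of the change right.

```python
# Hypothetical group shares: the population of interest vs. an online subject pool.
POPULATION = {"young": 0.30, "old": 0.70}
SAMPLE = {"young": 0.80, "old": 0.20}

# Hypothetical mean outcomes by group, without and with some treatment.
CONTROL = {"young": 0.60, "old": 0.30}
TREATED = {"young": 0.70, "old": 0.35}


def weighted_mean(values, weights):
    """Average of group-level values using the given group shares."""
    return sum(values[group] * weights[group] for group in values)


# "Levels" question: what is the average outcome?
level_population = weighted_mean(CONTROL, POPULATION)
level_sample = weighted_mean(CONTROL, SAMPLE)

# "Changes" question: does the treatment raise or lower the outcome?
effect_population = weighted_mean(TREATED, POPULATION) - level_population
effect_sample = weighted_mean(TREATED, SAMPLE) - level_sample

print(f"level:  population {level_population:.3f}, sample {level_sample:.3f}")
print(f"effect: population {effect_population:+.3f}, sample {effect_sample:+.3f}")
```

The magnitudes of the two effect estimates differ, which is why "levels" follow-up work on the population of interest can still be worthwhile once the sign is established.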

Laboratory investigations are particularly helpful in "changes" research that seeks to identify phenomena or to elucidate causal mechanisms. Before we even have a well-formed theory to test, we may want to run experiments simply to collect more data on phenomena. This kind of research requires an iterative process of generating hypotheses, testing them, examining the data and then discarding hypotheses. More tests then follow and so on. Because the search space is often large, numerous cycles are needed, which gives the online laboratory an advantage due to its low costs and speedy accretion of subjects.

Platforms can tax externalities and generate costly signals

2013-05-24T11:25:00.001-07:00

The word "URGENT" should cost at least $100 per usage.
— Merlin Mann (@hotdogsladies) May 24, 2013

One thing that's great about platforms is that socially efficient, signal-generating Pigovian taxation like the kind proposed in this tweet is not a joke---you can actually do things like this, which may be one of the great advantages of markets mediated by a powerful third party.

Country-Specific Minimum Wage Data, Courtesy of Wikipedia

2013-05-21T10:24:00.000-07:00

I was looking for some data on minimum wages in various countries and found that Wikipedia (perhaps unsurprisingly) has a very nice, well-annotated table. After downloading the data & cleaning it a bit (harder than it should be), I made several plots. There were too many countries for one plot, so I made one for each (approximate) quartile. At the end of the blog post is the R code I used for fetching the data & making the plots.

Fourth Quartile

NB: Some countries have exemption policies for worker or occupation characteristics, so for a more complete understanding of, say, why Australia appears to have a minimum wage more than 2x the US minimum wage, check the Wikipedia table.

Third Quartile

Second Quartile

First Quartile

Distribution of Minimum Wages

Below is some R code for grabbing the table of country-specific minimum wages from Wikipedia.

The Indian blackouts & oDesk

2012-07-31T21:58:00.003-07:00

A nationwide blackout in India has left some 600 million people without electricity. Given that a large number of the contractors on oDesk are from India, I assumed that effects of the blackout would show up readily in the oDesk data. This evening, I wrote a query to get the hours worked each day by Indian contractors during the last month and the number of applications sent. I divided these counts by the respective totals for that day for all of oDesk. From this time series, we can get a sense of what was supposed to happen today and compare it to what actually happened. The time series for applications (top) and hours worked (bottom) are plotted below [1], with today annotated in red. Each percentage estimate has a 95% confidence interval.
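The share calculation described above can be sketched as follows. The daily counts are hypothetical placeholders rather than oDesk data, and the interval uses the normal approximation for a proportion, which is an assumption on my part; the post does not say how its confidence intervals were computed.

```python
import math


def share_with_ci(group_count, total_count, z=1.96):
    """Share of the total attributable to one group, with a
    normal-approximation confidence interval (95% when z = 1.96)."""
    share = group_count / total_count
    half_width = z * math.sqrt(share * (1.0 - share) / total_count)
    return share, max(0.0, share - half_width), min(1.0, share + half_width)


# Hypothetical daily counts: (day, applications from India, applications overall).
DAILY_APPLICATIONS = [
    ("2012-07-29", 2100, 10000),
    ("2012-07-30", 2500, 10000),
    ("2012-07-31", 2450, 10000),
]

for day, india, overall in DAILY_APPLICATIONS:
    share, low, high = share_with_ci(india, overall)
    print(f"{day}: {100 * share:.1f}% (95% CI {100 * low:.1f}% to {100 * high:.1f}%)")
```

Treating hours worked the same way is cruder, since hours are not independent yes/no draws, so an interval built like this is best read as a rough guide for that series.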

Some observations

There is a very easy-to-detect drop-off in the hours worked---my eyeball calculation says Indian contractors should have been responsible for around 22% of the hours worked today, while the actual number is closer to 17.5%. This is far less of a fall-off than we would naively predict from the "1/2 of Indians without power" headline. Presumably many contractors have access to private generators, or perhaps oDesk is over-represented in parts of the country that were less affected by the blackout.
There is no corresponding obvious drop-off in the fraction of applications. I don't have a good explanation for this, but perhaps non-affected Indian contractors have made up the difference and exploited the now-thinner market. If I can get some data on what parts of the country are actually being affected by the blackout, I could test this notion since I do have contractor locations down to the city level.
Indian contractors take weekends off, both in terms of working and job finding (or at least more so than their oDesk counter-parts from other countries). Remember that this time series is the fraction for a given day, so there's no reason for a strong weekend/weekday pattern. See oDesk Country Explorer for more of this kind of data.
Indian contractors are generally over-represented in the application pool, making up ~25% of applications but only ~20% of hours worked, though this could easily reflect differences in the kinds of categories Indian contractors work in---there is a great deal of variance in the average number of applications per opening across the different job categories.

Code for the plots (done in ggplot2):

Digitization of the supply side of the labor market

2012-07-25T21:35:00.001-07:00

Note: This blog post also contains a short review of Google's new Consumer Surveys service. See the end of the blog post for details.

On most electronic commerce sites, information about the supply side is digitized and publicly available while information about the demand side is generally not: Amazon, Expedia, iTunes, Etsy etc., all collect and display detailed data about the items for sale, but there is generally little or no information about the consumers with the demands. If we look at the labor market, the reverse is true, in that it is the demand side that's digitized. On online job boards like CareerBuilder, Monster.com, Indeed, SimplyHired etc., vacancies are described via detailed textual descriptions about the nature of the work, skills required, location and approximate salary, but the job seekers---the sellers---generally do not create profiles that describe themselves to the marketplace.

While we might think that there is some fundamental reason for this difference, I don't think this is the case, for the simple reason that in labor markets the supply side is being digitized, primarily through LinkedIn (in a big way) and through sites like oDesk (in a comparatively smaller, but more comprehensive way). On these sites, workers create permanent, searchable profiles for employers that contain rich, employment-relevant data about themselves.

With the rise of LinkedIn, we are witnessing an unprecedented, voluntary data collection and digitization of the supply side of the labor market. On LinkedIn, individuals can create public profiles and list their education, professional credentials, associations, skills, current and past work experiences and, critically, their other professional connections (indicated by approved links to other LinkedIn users). As of yesterday (July 24th, 2012), approximately 19% of the US-based Internet using population had a LinkedIn profile [* see note below for interesting background for this 19% figure]. According to LinkedIn, as of March 12, 2012, over 160 million people have created profiles, and in many industries, a LinkedIn profile is expected of all applicants. I talked recently to oDesk's corporate recruiter, asking her how many candidates had LinkedIn profiles. She responded:

I'd say it is close to 100% (and certainly 100% for viable candidates). I can't think of an example of someone who I have screened who didn't have a profile on LinkedIn.

I think this supply digitization is likely to prove consequential, because once the supply side of the labor market is digitized, platforms can begin making data-driven, highly contextualized recommendations to both sides of the market. The recommendations made by a platform can have the advantage of being potentially informed by the platform's holistic perspective on the marketplace. In computer-mediated marketplaces, by necessity essentially every piece of data that goes into or is generated by the marketplace is captured in an electronic database that could conceivably be used to make recommendations.

Of course, job boards do try to make recommendations by suggesting vacancies to workers, but they are limited to conditioning those recommendations on whatever search terms and perhaps geographic and/or salary constraints a job-seeker enters in a relatively brief search session. The platform cannot condition its recommendations on a worker's employment history, educational background, skills, current employment status, professional connections, certifications, personality, test scores and other match-relevant factors, never mind try to balance recommendations to navigate the twin shoals of market thinness and market congestion.

Unfortunately, I think a lot of this work on recommendations will happen within companies in a state of semi-secrecy, but hopefully enough will be made public that others can contribute, à la the Netflix challenge. It's a little sad that to date society has expended more machine learning research effort trying to predict taste in movies rather than fit for jobs, despite the enormous welfare consequences of the labor market. However, I predict this will change and expect a lot more work on this topic from computer scientists and market designers in the coming years.

[*] The Origin of "19% of the US Population has a LinkedIn profile" Number

In writing this blog post, I wanted to get an accurate number for what fraction of the US population has a LinkedIn profile. This number was proving hard to come by, so I decided to try a relatively new service launched by Google called Google Consumer Surveys. For 10 cents an answer, you can pose questions to a supposedly representative sample of US-based Internet users. You also get some of the respondent's basic demographics, such as inferred age, gender and income. I launched a one-question survey and got 1511 responses in less than a day. The screenshot below shows the main results, but the service also includes some neat tools for looking at the data in different ways. I made the survey public---check it out here. I'm quite pleased with the service and plan to use it again.
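For a rough sense of how precise that 19% figure is, and what it cost, here is a back-of-the-envelope calculation. It uses only the numbers quoted above (19%, 1511 responses, 10 cents per answer) and assumes a simple random sample with no weighting, which may not match how Google Consumer Surveys actually produces its estimates.

```python
import math

RESPONSES = 1511           # number of survey responses, from the post
SHARE_WITH_PROFILE = 0.19  # rounded estimate, from the post
PRICE_PER_ANSWER = 0.10    # dollars per answer, from the post

# Standard error and 95% margin of error for a proportion under
# simple random sampling (an assumption; no survey weights applied).
standard_error = math.sqrt(SHARE_WITH_PROFILE * (1 - SHARE_WITH_PROFILE) / RESPONSES)
margin_of_error = 1.96 * standard_error
total_cost = RESPONSES * PRICE_PER_ANSWER

print(f"estimate: {100 * SHARE_WITH_PROFILE:.0f}% "
      f"+/- {100 * margin_of_error:.1f} percentage points")
print(f"cost: ${total_cost:.2f}")
```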

Shrimponomics, Complements & BPOs

2012-07-05T09:33:00.003-07:00

Most relevant image available from doing a
Google Image search for "Shrimp using a computer"

A few years ago, there was a Freakonomics post about how people reason about economic situations and phenomena. The phenomenon in question was shrimp consumption: the amount of shrimp people eat in the US per capita tripled between 1982 and 2007. When asked to explain this rise, non-economists mainly give demand reasons (changes in preferences), while economists are more likely to also give supply reasons (improved fishing efficiency, rise of aquaculture etc.).

If I had to offer an explanation for this focus on demand explanations, my guess is that demand explanations come more easily to us because it is the side of the market that is more familiar to us: most of us have eaten shrimp & bought shrimp---very few of us have worked in commercial fishing. So when asked "why are people consuming more shrimp?" we start with "why might I consume more shrimp?" and although price is certainly a reason (and a path of thought that would help lead to a supply explanation), it's not as salient or even as interesting as things like changing tastes, health trends, exciting new shrimp-based dishes etc.

So this blog post isn't about shrimp and it isn't about supply & demand. It's about complements and substitutes. I think there is a similar psychological tendency to focus more on goods-as-substitutes than on goods-as-complements. At the individual level where we are making choices, we are usually thinking in terms of substitutes: do I want coffee or tea? Should I take a vacation to Las Vegas or Hawaii? Mac or PC? It's a bit more subtle to think about "if I had X, would it make Y more useful to me" which is at the heart of all complementarity stories.

This is a long-winded introduction to my real topic, which is that in my last blog post, I made the argument that online work could disrupt the BPO industry by serving as a substitute for what BPOs offer. A point I didn't think of---but that in retrospect seems pretty obvious---is that complementarity could just as easily be the dominant effect. After my blog post, my CTO at oDesk, Odysseas, emailed me with his thoughts:

The primary benefit of BPOs is not that of labor cost arbitrage. Thats typically the motive/benefit for offshore staff augmentation firms - but BPOs are business process outsourcers. BPO is ADP [Automatic Data Processing] that outsources your payroll or a business that outsource your HR process etc... We often tend to think of BPOs as an offshore firm that does a little bit of everything having as sole pivot point its lower cost of labor - thats true, but its an abuse of the term and I would agree there that the particular type of business is going to be affected in the years to come from online labor.

This part is basically my substitutes story---now the complements part:

However, the more interesting effect would be the effect of online labor to the real BPOs..

There BPOs will not be negatively affected - the opposite. The availability of online labor would allow BPOs to become more flexible lower their overall fixed costs force them to become more automated and streamline (their virtual nature will require that), allowing them to lower even the cost per customer, allowing them to focus on smaller projects, smaller customers allowing to address smaller/different market segments. They will become less relying on an enterprise sales force customer acquisition model which is dramatically affecting their cost structure.

We are seing examples of what the new BPOs will become in companies that outsource the process of testing (uTest) of seo writing (Mediapiston) etc.

He's of course exactly right---and he's a CS PhD, not an economist, so shame on me :). If you think of true BPOs in the sense that Odysseas is talking about, then the complementarity story becomes more important. These true BPOs would be big buyers in the inputs market and would benefit greatly from a liquid, efficient market for labor.

Will online labor markets disrupt the traditional BPO firm?

2012-06-26T21:02:00.002-07:00

Today I spoke on a panel on something called "impact sourcing" at the BPO World Forum. The idea of impact sourcing, in a nutshell, is that online work is a tool for development and that for-profit firms outsourcing some part of their business should look beyond traditional BPO firms and consider non-profits like Samasource and Digital Divide Data. It was a good audience for this pitch, as many of the attendees were CIOs from big companies that are accustomed to signing multi-million dollar IT outsourcing deals with the likes of traditional BPO firms like Wipro, Infosys, Tata Consultancy etc.

After the panel, I was at a reception where I talked to someone fairly high up in a traditional BPO. When I described my elevator pitch version of oDesk's business---clients post jobs, contractors make bids, clients make a hire, we intermediate the work and take a percentage---he said, literally "what are you doing here at this conference? You guys are like the Antichrist." What he meant (in a half joking, half serious way) is that oDesk and similar companies threaten the model of the BPO.

My perception is that the traditional BPO model is possible because of two facts: (a) the enormous, purely place-based differences in wages and (b) the difficulty of actually arbitraging those differences without help. BPOs stand ready to help companies reap the benefits of (a) by giving the help necessitated by (b). The world is still very far away from (a) no longer being true, but if oDesk and similar companies can radically lower the barriers to arbitraging differences by making it easy to hire, manage and pay workers regardless of geography, then (b) starts to become less true. If we get to the point where the qualitative differences of online remote and in-person work diminish and assessing and hiring workers is simple and easy, it would obviate the need for much of what the BPO firm is selling.

This is not to say that there isn't still a huge space for IT consulting---outsourcing an entire process is hard and BPOs with lots of experience have something very valuable to offer. Furthermore, besides the pure cost level, one of the motivations for business process outsourcing is the ability to change cost structure, namely by turning a fixed cost into a variable cost. But these caveats aside, on the margin, the mediation aspect of the BPO role seems likely to get less attractive over time as technology improves and online labor markets mature.

Resources for online social science

2012-06-06T21:16:00.001-07:00

The Economist recently had an article about the growing use of online labor markets as subject pools in psychology research; ReadWriteWeb wrote a follow-up. If you've been following this topic, there wasn't very much new, but if you're a researcher that would like to use these methods, the articles were pretty light on useful links. This blog post is an attempt to point out some of the resources/papers available. This is my own very biased, probably idiosyncratic view of the resources, so hopefully people will send me corrections/additions and I can update this post.

To start, let's have this medium pay tribute to itself by running through some blogs and their creators.

Blogs

There is the "Follow the Crowd" blog which I believe is associated with the HCOMP conference. It's definitely more CS than Social Science, but I think it's filled with good examples of high-quality research done w/ MTurk and with other markets.
There's Gabriel Paolacci's (now at Erasmus University) "Experimental Turk" blog which was mentioned in the article and is probably the best resource for examples of social and psychological science research being done with MTurk.
Panos Ipeirotis (at NYU and who is now academic-in-residence at oDesk) has a great blog "Behind-enemy-lines" that's basically all things relating to online work.
The defunct "Deneme blog" by Greg Little (who also works at oDesk) and Lydia Chilton (at University of Washington).

Guides / How-To (Academic Papers)

A number of researchers have written guides to using MTurk for research. I think the first stop for social scientists should be the paper by Jesse Chandler, Gabriel Paolacci and Panos Ipeirotis:

Chandler, J., Paolacci, G. and Ipeirotis, P. Running Experiments on Mechanical Turk,
Judgment and Decision Making (paper) (bibtex)

My own contribution is a paper with Dave Rand (who will be starting as a new assistant professor at Yale) and Richard Zeckhauser (at Harvard). The paper contains a few replication studies, but the real meat and the part I think is most important is the part discussing precisely why/how you can do valid causal inference online (I'm stealing this write-up/links of the paper from Dave's website):

Horton JJ, Rand DG, Zeckhauser RJ. (2011) The Online Laboratory: Conducting Experiments in a Real Labor Market. Experimental Economics. 14 399-425. (PDF) (bibtex)

Press: NPR's Morning Edition Marketplace [audio], The Atlantic, Berkman Luncheon Series [video], National Affairs, Crowdflower, Marginal Revolution, Experimental Turk, My Heart's in Accra, Joho blog, Veracities blog

Software

Unfortunately there hasn't been too much sharing of software for doing online experiments. Since a lot of the experimentation is done by computer scientists who do not feel daunted by making their own one-off, ad hoc applications, there are a lot of one-off, ad hoc applications. Hopefully people know of other tools out there that they can open source or share links to.

"Randomizer"

Basically, it lets you provide subjects one link that will automatically redirect them (at random) to one of a collection of URLs you've specified. I made the first really crummy version of this and then got a real developer to re-do it so it runs on Google App Engine.
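The core idea is simple enough to sketch. The following is not the Randomizer's actual code, just a minimal illustration of picking a destination at random; the URLs and both function names are hypothetical, and the second function is an optional variant for when a returning subject should always land in the same arm.

```python
import hashlib
import random

# Hypothetical experiment arms; swap in the real survey or task URLs.
ARM_URLS = [
    "https://example.com/control",
    "https://example.com/treatment-a",
    "https://example.com/treatment-b",
]


def assign_url(urls, rng=random):
    """Pick one destination uniformly at random for a new visitor."""
    return rng.choice(urls)


def assign_url_for(subject_id, urls):
    """Deterministic variant: hash the subject's ID so that the same
    subject is always sent to the same destination."""
    digest = hashlib.sha256(subject_id.encode("utf-8")).digest()
    return urls[int.from_bytes(digest[:8], "big") % len(urls)]


print(assign_url(ARM_URLS))
print(assign_url_for("worker-123", ARM_URLS))
```

A web handler would then answer the single public link with an HTTP redirect to whichever URL the function returns.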

"QuickLime"
This is a tool for quickly setting up LimeSurvey (an open source alternative to Qualtrics & SurveyMonkey) on a new EC2 machine. This was made courtesy of oDesk research. I haven't fully tested it yet, so as with all this software, caveat oeconomus.

"oDesk APIs"
There haven't been a lot of experiments done on oDesk by social scientists, but there's no reason it cannot be done. While it currently is not as convenient or as low-cost as doing experiments on MTurk, I think long-term oDesk workers would make a better subject pool since you can more carefully control experiments, it's easier to get everyone online at the same time to participate in an experiment, there are no spammers etc. If you're looking for some ideas or pointers, feel free to email me.

"Boto"
This is a python toolkit for working with Amazon Web Services (AWS). It's fantastic and saved me a lot of time when I was doing lots of MTurk experiments.

"Seaweed"
This was Lydia Chilton's master's thesis. The idea was to create tools for conducting economics experiments online. I don't think it ever moved beyond the beta stage, but if you (a) have some grant money and (b) are thinking about porting z-tree to the web, you should email Lydia and see where the codebase is & if anyone is working on it.

Here's a little javascript snippet I wrote for doing randomization within the page of an MTurk task.

People

I'm not going to try to do a who-is-who of Crowdsourcing, but if you're looking for some contacts of other people (particularly those in CS) who are doing work in this field, you can check out the list of recent participants at "CrowdCamp" which was a workshop prior to HCI.

History

Probably the first paper I'm aware of that pointed out that experiments (i.e., user studies) were possible on MTurk was by Ed Chi, Niki Kittur and Bongwon Suh. As far as I know, the first social science done on MTurk was Duncan Watts and Winter Mason's paper on financial incentives and the performance of crowds.

The Innovation of StackOverflow

2012-06-01T23:35:00.001-07:00

So as I write this, there is an egg timer ticking away next to me, set with 10 minutes of time. What am I waiting for? 10 minutes is how much time I predicted it would take to get my programming question answered on StackOverflow (SO):

http://stackoverflow.com/questions/10860020/output-a-vector-in-r-in-the-same-format-used-for-inputting-it-into-r

The back story was that I was writing some R code and I got to a point where I was stuck: there was something I wanted to do and I remembered that there was a built-in function that could accomplish my goal. Unfortunately, I couldn't remember that function's name. After some fruitless googling, I posted the question on SO.

So, how long did it actually take to get the right answer? About 6 1/2 minutes. As I write this sentence, I'm waiting for some more time to elapse so I can actually approve the answer:

This has been my general experience with SO---amazingly high-quality answers delivered almost immediately. I feel sheepish that I haven't been able to answer as many questions as I've asked, but one of the animating ideas of the community is that asking high-quality, answerable questions is a way of contributing.

What's interesting to me is that SO is an example of a primarily social---as opposed to technological---innovation. There's nothing really technically innovative about SO: the site is fast, search works well, tagging works well etc., but lots of sites have those things. What's special about SO is that through a carefully designed system of incentives and policies, they have created a community that is literally---and I think profoundly---changing how people program computers.

The reason I point out the social nature of the innovation is that it's become popular to lament the shallowness or perceived frivolity of many start-ups that are built around social rather than technological innovations (e.g., Facebook, Twitter, Instagram etc.). The idea seems to be that if you aren't making solar panels or cancer-curing drugs, you're not doing something socially useful. I personally don't share that bias, but if we are going to judge companies on the basis of some more "serious" metric like productivity or social surplus, then SO is a great example of how a purely social innovation can succeed spectacularly on those metrics.

Data openness by private firms

2012-05-22T10:40:00.000-07:00

The New York Times has a story today about social scientists working with company data and being unable or unwilling to make it public. The story begins:

When scientists publish their research, they also make the underlying data available so the results can be verified by other scientists.

I think the first sentence is probably more a description of how we'd like the world to be than how it actually is right now, especially in the social sciences. The main so-what of the story is that private companies are collecting enormous amounts of high quality data that lets you do fascinating social science, but companies are understandably reluctant to make this data public, primarily for privacy reasons (and probably also because they are afraid of giving up some competitive advantage).

I think the options for any organization that does or might do research are:

1) Do research for business purposes. Make neither the findings nor the data public.
2) Do research for business purposes. Make the findings but not the full data public.
3) Do research for business purposes. Make the findings and data public.
4) Do research. Make findings and data public.

Most companies probably aren't interested in (4) and this is probably academia's biggest comparative advantage. Barring (4), I think from a social perspective, privacy issues aside, the best outcomes in order are (3) > (2) > (1). I can understand (1) in some cases, but at least in the kind of companies I'm familiar with, the advantages of keeping everything secret probably aren't that great.

The advantages of (2) or (3) over (1):

a) If you're a software company and you release a feature that works, it will probably get copied anyway, regardless of whether you publish a paper, so you might as well get the thought leadership credit for coming up with the idea in the first place. This paper is/was the basis for Google's secret sauce---posting it to the InfoLab servers back in 1999 didn't doom the company and probably did a lot to increase the perceptions that they were doing something smarter (even though there were antecedents of this idea going back many years---including in Economics, by my academic grandfather).

b) If you give outside academics access and let them publish, you can get them to work on your problems for free (the Netflix prize is an obvious example). You can recruit those academics to come work for you, or at least get their grad students to come work for you.

c) If you let your internal researchers publish, you can get them to work at reduced cost or get researchers you otherwise wouldn't be able to attract (see Scott Stern's paper on scientists "paying" to do science).

On (2) versus (3), I think there is a real dilemma: openness and privacy concerns are in tension. Furthermore, just releasing more aggregated or somehow obfuscated versions of the data is not risk free: there's actually an emerging literature in Computer Science on how to release data in ways that are guaranteed to still have the right privacy properties (~~CMU~~ UPenn professor Aaron Roth recently taught a course on the topic). The fact that smart people are working on it is exciting, since they might figure out provably risk-free ways to release data publicly, but it's also evidence that this isn't a trivially easy problem---seemingly innocuous data disclosures would let someone unravel the obfuscation.

As a coda, I have a personal anecdote to share about this story. One of the people discussed in the article is Bernardo Huberman:

The chairman of the conference panel — Bernardo A. Huberman, a physicist who directs the social computing group at HP Labs here — responded angrily. In the future, he said, the conference should not accept papers from authors who did not make their data public. He was greeted by applause from the audience.

When I was a grad student, I taught a course to Harvard sophomore economics majors called "Online Labor" (syllabus). I assigned some of Huberman's papers on motivation. I emailed him to ask for the data from one of his papers. He wrote back:

Dear Dr. Horton:
Thank you for your interest in my work and I certainly feel pleased when I learn that you liked my paper enough to assign it to your class.
As to your request, let me talk with the person who now handles the youtube data (we lately used it to uncover the persistence paradox) and I'll get back to you.
Incidentally if you are interested in the role that attention and status (its marker) play among people I could send you a paper that reports on a experiment (as opposed to observational data) that elucidates it quite cleanly across cultures.
Best,
Bernardo

I got the data within days---I can state that he privately practices what he preaches publicly.

Update: I incorrectly stated that Aaron Roth was a professor at CMU---he did his PhD at CMU. He's a professor at UPenn. Apologies.

Location of India-Based Contractors on oDesk

2012-03-06T13:38:00.001-08:00

My favorite R package, ggplot2, recently introduced enhanced support for choropleth maps. I'd like to make some of these kinds of maps with oDesk data, but as a first step, I thought I'd just plot the locations of all of our India-based contractors by city. In the plot below, dot sizes are log-scaled by the number of contractors reporting that city. The massive light blue dot near Delhi is the default coordinate we use when we're missing the city.

For those of you who know India, anything surprising/interesting here?

Here's the associated R code to make this figure:

Economics of the Cold Start Problem in Talent Discovery

2012-02-21T09:36:00.001-08:00

Tyler Cowen recently highlighted this paper by Marko Terviö as an explanation for labor shortages in certain areas of IT. The gist of the model is that in hiring novices, firms cannot fully recoup their hiring costs if the novices' true talents will become common knowledge post-hire. It's a great paper, but what people might not know is that the theory it proposes has been tested and found to perform very well. For her job market paper, Mandy Pallais conducted a large experiment on oDesk where she essentially played the role of the talent-revealing firm.

Here's the abstract from her paper:

... I formalize this intuition in a model of the labor market in which positive hiring costs and publicly observable output lead to inefficiently low novice hiring. I test the model's relevance in an online labor market by hiring 952 workers at random from an applicant pool of 3,767 for a 10-hour data entry job. In this market, worker performance is publicly observable. Consistent with the model's prediction, novice workers hired at random obtain significantly more employment and have higher earnings than the control group, following the initial hiring spell. A second treatment confirms that this causal effect is likely explained by information revelation rather than skills acquisition. Providing the market with more detailed information about the performance of a subset of the randomly-hired workers raised earnings of high productivity workers and decreased earnings of low-productivity workers.

In a nutshell, as a worker, you can't get hired unless you have feedback, and you can't get feedback unless you've been hired. This "cold start" problem is one of the key challenges of online labor markets, where there are far fewer signals about a worker's ability and less common knowledge about what different signals even mean (quick: what's the MIT of Romania?). I would argue that scalable talent discovery and revelation is the most important applied problem in online labor/crowdsourcing.

Although acute in online labor markets, the problem of talent discovery and revelation is no cake walk in traditional markets. Not surprisingly, several new start-ups (e.g., smarterer and gild) are focusing on scalable skill assessment, and there is excitement in the tech community about using talent revealing sites like StackOverflow and Github as replacements for traditional resumes. It is not hard to imagine these low-cost tools or their future incarnations being paired with scalable tools to create human capital, like the automated training programs and courses offered by Udacity, Khan Academy, Codecademy and MITx. Taken together, they could create a kind of substitute for the combined training/signaling role that traditional higher education plays today.

Solvate joins the deadpool

2012-02-20T10:34:00.000-08:00

Techcrunch and Betabeat are reporting that Solvate, a platform for remote work, is shutting down. Unlike oDesk, Elance, Freelancer etc., they were not trying to create a true marketplace: they were trying to do more of a high-touch, human-in-the-loop matching service.

In the email Solvate sent to their users about the shutdown, they explicitly cited scalability issues, which I'm guessing refers to the non-sustainable effort and cost of hand-matching buyers and sellers. I wouldn't say this is definitive proof that the high-touch matching business model doesn't work (my outsider impression is that GLG is killing it), but it is a reminder that the value-added from your human-in-the-loop matching has to be sufficiently high that you can recoup your costs: you can't take a hit on every unit sold but make it up on volume.

I think it's too bad they are shutting down---I would have liked to see how their approach to online labor would have evolved. That being said, I personally found their emphasis (at least in their marketing copy) on US-based workers off-putting. Solvate's CEO was quoted extensively in a Gigaom article, in which he claimed that online labor markets were undermining US workers. He also suggested that by relying only on US-based workers, Solvate could promise a higher level of talent and expertise. All online labor markets have to find ways to help workers credibly demonstrate their talents, and using crude geography-based proxies for talent is an approach, but not a particularly admirable one. To me, the whole ethical/moral "so what" of online work is that geography and nationality don't have to matter.

As a coda, here is my response to the original Gigaom article:

Full disclosure: I’m the staff economist at oDesk and these opinions represent my own views.

A couple of thoughts:

Like any competitive market, the forces of supply and demand are going to determine prices in these online markets. With the opening up of new countries that have large, reasonably well-educated, internet savvy populations, supply increases which will tend to drive down wages. On the other hand, these markets (and the ability to break work up into small, outsourceable bits) also make it possible to outsource more work, increasing demand, and hence prices.
At least within oDesk, we haven’t seen strong trends in wages, though presumably this article is talking about freelancers in general and we obviously don’t have visibility on their wages.
As a practical matter, I think workers in developed countries like the US can compete in these markets. They actually have a lot of advantages: perfect English, same time-zone, familiarity with US business culture/expectations etc. Further, price matters, but it’s not the only thing. For what it’s worth, I work with many oDesk contractors and the break-down is 1 x US, 1 x Italy, 1 x Russia, 1 x Pakistan and 2 x Philippines.
The efficiency and distributional effects of information and communications technology are complex and the evidence is ambiguous, so I’d be skeptical of anyone offering a definite answer to these kinds of questions. There was an interesting Quora thread on this topic.
I think focusing on what these markets do for relatively well-paid workers in developed countries misses one of the most important moral facts about these markets, which is that they generate new, relatively well-paid, meaningful work opportunities for people in developing countries. It’s obviously not a random sample of our workers, but if you spend a few minutes on oDesk’s Facebook fanpage and look at the comments and stories, it’s clear that online work is improving lives in a pretty dramatic way.

High-wage skills on oDesk (or why you might want to learn Clojure if you're not a lawyer)

2012-02-18T09:59:00.000-08:00

Update: Hello HackerNews readers. One thing that I discussed but probably didn't emphasize enough is that these data show the correlation between listed skills and offered wages---you absolutely cannot infer a causal relationship (my cheeky title notwithstanding). Unless I get to create and run a massive skills training program experiment, it's going to be hard to get at causality. But I can do something about the offered/earned distinction. If you don't want to miss my follow-on blog post where I explore the relationship between skills and actual earned wages from actual projects, follow me on twitter.

oDesk recently introduced a controlled, centralized vocabulary of about 1,400 skills for buyers and contractors to use when posting jobs and creating profiles. The primary motivation for the change was to make it easier for buyers and sellers to find each other: without a standardized vocabulary, would-be traders can fail to match simply because they use different terms for the same skill.

A side effect of this transition is that high quality data on the relationships between skills and wages are now available. I recently built a dataset of contractors' hourly wages by skill: for each skill, I identified all contractors listing that skill on their profiles and averaged their offered hourly wages. Although contractors are free to offer any hourly wage they like, in my experience, wages offered closely map to actual earnings. However, to reduce the influence of outliers, I restricted the sample to contractors offering between 50 cents and 100 dollars per hour. I also only included skills for which there were 30 or more observations.
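For anyone who wants to reproduce this kind of table on their own data, the steps above can be sketched in Python. This is not the code I used: the function name `mean_wage_by_skill`, the `(wage, skills)` input format and the toy numbers are all made up for illustration, and I'm assuming the 50 cent and 100 dollar bounds are inclusive.

```python
from collections import defaultdict


def mean_wage_by_skill(profiles, min_wage=0.50, max_wage=100.0, min_obs=30):
    """Average offered hourly wage per skill.

    profiles: iterable of (offered_wage, skills) pairs, one per contractor.
    """
    wages = defaultdict(list)
    for wage, skills in profiles:
        if not (min_wage <= wage <= max_wage):
            continue  # drop outliers outside the wage window
        for skill in set(skills):  # count each contractor once per skill
            wages[skill].append(wage)
    # Keep only skills with enough observations to average sensibly.
    return {
        skill: sum(ws) / len(ws)
        for skill, ws in wages.items()
        if len(ws) >= min_obs
    }


if __name__ == "__main__":
    # Made-up profiles; the third is excluded as an outlier.
    toy = [
        (20.0, ["python"]),
        (40.0, ["python", "clojure"]),
        (500.0, ["clojure"]),
    ]
    print(mean_wage_by_skill(toy, min_obs=1))
```

The `min_obs=1` in the toy call is only there so the tiny example returns something; the analysis in this post corresponds to the default of 30.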

In the bar chart below (made using the very cool googleVis package for R), I plotted the top 50 skills, ordered by average hourly wage (here is a "live" version with mouse-over). The top of the list is dominated by high-end consulting areas (e.g., patents and venture capital consulting) or hot newer technologies (e.g., redis and Amazon RDS). The programming language that commands the highest wage is Clojure, which is a rather esoteric skill: it's a lisp dialect that compiles to the Java Virtual Machine (JVM). Perhaps this is the market reflecting Paul Graham's "Python Paradox":

"if a company chooses to write its software in a comparatively esoteric language, they'll be able to hire better programmers, because they'll attract only those who cared enough to learn it. And for programmers the paradox is even more pronounced: the language to learn, if you want to get a good job, is a language that people don't learn merely to get a job."

At the time Graham wrote this, Python was a far less mainstream language, probably analogous to how Clojure is regarded today. It's an interesting pattern, and although they'd cut up my economist membership card if I made a causal claim between knowing Clojure and being able to command higher wages, I'm intrigued by the idea of using online labor markets as a bellwether to help guide human capital choices.

Why aren't we all freelancers?

2012-02-16T09:35:00.000-08:00

Investors typically hold diverse portfolios of assets, with the goal of reducing risk. While diversification is commonplace in investing, most of us have no diversification in our labor income streams: we work at one job at a time, for a single employer. However, the "returns" to a job vary like returns on investments, especially on non-financial dimensions (e.g., engagement, learning, co-workers, working conditions). As in investing, there is also a significant amount of direct financial risk in holding one job---the firm may impose layoffs or go out of business. Given the similarities between jobs and assets, why isn't there a similar impetus to diversify, i.e., why don't we all hold a portfolio of small jobs at the same time, with many different employers [0]?

Some workers---freelancers and independent consultants---do follow this diversified model, but it's hardly the norm for workers generally. Below, I lay out a laundry list of potential economic explanations for why the portfolio/freelancing approach is not more common. What's interesting to me both academically and as someone working at oDesk is that many of these points are not set-in-stone attributes of the productive process but are instead things that smart features or policies might change.

Non-linearity in costs of searching/vetting/bargaining
Hiring a freelancer for a small project is like picking out a fancy restaurant; hiring a full time employee is more like buying a house. The effort of searching and vetting (and thus the cost) is related to the stakes of the hire. However, there is no guarantee that those costs scale linearly with the stakes. Suppose it takes nearly as much effort to find a small job as it does to find a large job---then a portfolio approach will generate larger search costs per dollar earned in wages [1].
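To put rough numbers on this, here is a small Python sketch. Everything in it is an assumption made for illustration: the function name `search_cost_per_dollar`, the choice to value search time at the same hourly wage as paid work, and the specific hours and wage.

```python
def search_cost_per_dollar(search_hours, job_hours, hourly_wage):
    """Search cost, valued at the hourly wage, per dollar of wages earned."""
    search_cost = search_hours * hourly_wage
    earnings = job_hours * hourly_wage
    return search_cost / earnings


# Made-up numbers: 10 hours of search lands either kind of job, at $20/hour.
full_time = search_cost_per_dollar(10, 2000, 20.0)  # one 2,000-hour job
small_job = search_cost_per_dollar(10, 50, 20.0)    # one of many 50-hour jobs

if __name__ == "__main__":
    print(f"full-time job: {full_time:.3f} of search cost per dollar earned")
    print(f"small job:     {small_job:.3f} of search cost per dollar earned")
```

Under these assumptions the wage cancels out and the ratio is just search hours over job hours, so the cost per dollar rises in direct proportion as jobs shrink while search effort stays fixed.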

Non-linearity in job size and productivity
If you can make X widgets or Y schwidgets in 1 hour, it doesn't mean you can make X/2 widgets and Y/2 schwidgets in 1 hour. Every job has some fixed set-up costs---getting out the materials, remembering the key details, etc. The larger the costs, the less attractive the small job. On the other hand, productivity eventually wanes from boredom, physical fatigue, etc. ("I'm really getting bored with this TPS report---time for some Facebook"). The optimal size job (from a productivity standpoint) might be near or above the current 40 hours per week, 50 weeks a year paradigm, in which case going smaller means getting less efficient.

Complementarities with team members that grow over time
One of the advantages of team production is that workers can share knowledge with each other, motivate each other and generally create an environment where everyone is more productive than they would be working alone. There's no reason teams of freelancers working together cannot achieve the same complementarities with each other, but if these complementarities take time to develop, larger jobs become more attractive.

Firm-specific human capital

If a job requires lots of firm-specific human capital, the per-job learning requirement is high, which tends to encourage larger jobs [2].

Monitoring & policing costs
Once you get a sense of the character and reputation of some trading partner, you don't need to constantly monitor that person/firm. After some level of trust has been established, these costs would fall. This again pushes for larger jobs. This is probably clearer in terms of firms monitoring workers, since the big fear is shirking, but it does go both ways: workers need to make sure their checks don't bounce, that their employers aren't skimming from the 401K, using malk for the coffee service instead of milk, etc.

Employer concerns about IP (broadly defined)
I do not think workers are likely to end up working simultaneously for direct competitors [3]; the interests of most firms are fairly orthogonal to each other.

Existing public policy
At least in the US, at the present time, certain realities (health insurance, getting financial credit etc.) favor full-time employees.

[0] Note that this isn't a theory of the firm argument or discussion. I'm assuming that one can be a full employee and reap all the benefits of firm organization / team production even with fractional employment.

[1] One of the reasons Mechanical Turk is semi-dysfunctional is that when problems arise (about the scope of work, payment terms etc.), all the surplus generated by the relationship is quickly destroyed: one minute thinking, talking and haggling about a task paying pennies is likely to be economically wasteful. This was one motivation for hagglebot.

[2] I think this is why the ideal use of online labor is not so much a 1 for 1 replacement of some traditional job, but a decomposition of jobs into easily outsource-able pieces and pieces that require deep firm-specific knowledge.

[3] McKinsey excepted.

Writing Smell Detector (WSD) - a tool for finding problematic writing

2012-02-07T22:00:00.000-08:00

tl;dr version: WSD is a python tool to help find problems in your writing. Here's the source and here's example output.

In grad school, I wrote a program that used a series of regular expressions to detect "writing smell" (analogous to code smell), i.e., telltale signs of bad writing and mistakes. The rules for smelliness were loosely based on one of my favorite writing how-to's: Style: Toward Clarity and Grace by Joseph Williams.

The program took as input a text file and output was an annotated report with snippets of the offending bits. I used it for all my papers and found it really helpful, but the coding was very, um, academic (i.e., written for use by the person who wrote it) and it was written in Mathematica [1], which was the language I knew best at the time. FWIW, here is my original version.

For a long time, I've wanted to port it to some other language and make it accessible and capable of receiving new rule contributions and explanations. To this end, I recently commissioned an oDesk contractor (utapyngo) to make a more polished, modular version in Python. I think he totally outdid himself. It's got a nice modular model now that lets you easily incorporate new rules and he greatly improved upon my often-flawed regular expressions. Be forewarned---the documentation is non-existent and the rules aren't explained, but I plan to fix this over time, while I'm using it.

It's open source (courtesy of oDesk, who paid the bills) and available here on github (live example output). To use it, just clone it, install the python package jinja2 and then do:

$ python wsd.py -o output_file.html your_masterpiece.tex

Here's a screenshot of what the HTML output looks like, illustrating the a/an rule (i.e., that it's "an ox" but "a cat"):

Note the statement of the rule, the patterns that it looks for and the snippets. It also has a hyperlink to the full text, which is available at the bottom of the document.
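To give a flavor of what a regex-based rule looks like, here is a minimal Python sketch of an a/an check. It is not taken from the WSD source: the pattern names and the function `find_article_smells` are hypothetical, and the rule is a crude spelling-based heuristic that will misfire on cases driven by pronunciation (e.g., "an hour" or "a university").

```python
import re

# "a" followed by a word starting with a vowel letter, e.g. "a ox".
A_BEFORE_VOWEL = re.compile(r"\ba\s+[aeiou]\w*", re.IGNORECASE)
# "an" followed by a word starting with a consonant letter, e.g. "an cat".
AN_BEFORE_CONSONANT = re.compile(r"\ban\s+[b-df-hj-np-tv-z]\w*", re.IGNORECASE)


def find_article_smells(text):
    """Return the suspicious snippets in the order they appear in text."""
    hits = []
    for pattern in (A_BEFORE_VOWEL, AN_BEFORE_CONSONANT):
        hits.extend((m.start(), m.group(0)) for m in pattern.finditer(text))
    return [snippet for _, snippet in sorted(hits)]


if __name__ == "__main__":
    sample = "I saw a ox and an cat, then an ox and a cat."
    for snippet in find_article_smells(sample):
        print(snippet)
```

A rule like this only flags candidates for a human to look at, which is why the report shows each snippet alongside the statement of the rule rather than trying to fix anything automatically.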

A few thoughts:

If you're interested in contributing (rules or features), let me know.
It might be nice to turn this into a web-service, though my instinct is that someone interested in algorithmically evaluating their LaTeX/structured text isn't going to find cloning the repository & then running a script to be a big obstacle. And they probably don't want to make their writing public.
A few weeks ago, I read this usethis profile of CS professor Matt Might. In the software section of the interview, he said that he had some shell scripts that do something similar. I haven't really investigated, but maybe there are ideas here worth incorporating.

[1] When I told the other members of the oDesk Research / Match Team that I had code for doing this writing smell thing, they were impressed and wanted a copy; when I told them it was written in Mathematica, they thought this was hilarious and mocked me for several minutes. I tried to explain that Mathematica actually has great tools for pattern matching, but this fell on deaf ears.

Minimum Viable Academic Research

2012-02-06T08:42:00.000-08:00

A non-viable product in minimum form, courtesy of Flickr

One of the most talked about ideas in the world of start-ups is the notion of the minimum viable product (MVP). The rationale for MVP is clear: you don't want to build products that customers don’t want, never mind waste time polishing and optimizing those unwanted products. "Minimally viable" doesn't even require the product to exist yet---the viability refers to whether it will give you the feedback you need to see if the project has potential. For example, you might do an A/B test where you buy keywords for some new feature, but then just have a landing page where people can enter their email address, thereby gauging interest. The important thing is that it is market feedback, not just opinions of people near you.

In academia, a big part of the day-to-day work is getting feedback on ideas. Each new paper or project is like a product you’re thinking of making. So you float ideas with colleagues, your advisers, your spouse, etc., and you might present some preliminary ideas at a workshop or seminar. The problem is that in most workshops and seminars, where you could potentially get something close to a sample of what the research community will think of the final product, the feedback is usually friendly and limited to implementation (e.g., "How convincing is the answer you are providing to the question you've framed?"), instead of "market" feedback on how much "value" your work is creating.

The academic analogue to market feedback on value comes later, in two forms: (a) journal reviews / editor decisions and (b) citations. By value, I mean something like (importance of question) x (usefulness of your answer). At least in economics, knowing what is important is difficult. There is no Hilbert's list of big and obvious open questions. A few such questions do exist, but they tend to be so sweeping in nature---e.g., "Why are some countries rich and some countries poor?" and "Why do vacancies and unemployed workers co-exist?"---that no single work can decisively answer them. To do real research, you need to pick some important part of a question and work on that.

A fundamental problem is that the institutional framework in some disciplines (economics being one example, though not all---see this recent NYTimes op-ed on scientific works being too short; see here for an economist's take on the topic) requires you to do lots and lots of polishing before you know (via journal rejection/acceptance) whether even the most polished form of your work is going to score high enough on the importance-of-question measure. At seminars, people are usually too polite to say, "Why are you working on this?" or "Even if I believed your answer, I wouldn't care" or "So what?" But that's the kind of painful feedback that would be most useful at early stages. There are some academics that will give that kind of "Why are you doing this?" critique, and while they are notorious and induce fear in grad students, the world needs more of them. (I once gave a seminar talk where an audience member asked, "How does this study have any external validity?" And I had to admit he was right---it had none. I dropped the project shortly thereafter, after spending the better part of 3 months working on it.)

It's not that people won't be critical in seminars. You'll generally get lots of grief about your modeling assumptions, econometrics, framing etc. But those are easy critiques (and they let the critics show off a little). It's the more fundamental critiques about importance/significance that are both rare and useful. In academia, you really, really need the importance/significance critique because you can work on basically anything you want, literally for years, without anyone directly questioning your judgment and choices. And while this gives you tons of freedom and flexibility, you might waste significant fractions of your career on marginalia. I also don't think it's the case that if you're good, you'll simply know: I've heard from several super-star academics that their most cited paper is one they didn't think much of when they wrote it and their favorite paper has languished in relative obscurity. One interpretation (beyond Summer's law) is that you aren't the best judge of what's important.

How does one get more importance-of-question feedback?

In economics, there's a tendency (need?) to write papers that are 60 page behemoths, filled with robustness checks, enormous literature reviews, extensive proofs that formalize somewhat obvious things, etc. This long, polished version really is the minimally viable version of the paper, in that you can't safely distribute more preliminary, less polished work (people might think you don't know the difference). I think on the whole, this is probably a good thing. But it's often not the minimally viable version of an idea. Often the "so what" of a paper is summarized by the abstract, a blog post, a single regression, etc.

I'm not sure what the solution is, but one intriguing bit of advice I recently received from a very successful (albeit non-traditional) researcher was to essentially live-blog my research. There's actually very little chance of being "scooped"; if anything, being public about what you're doing is likely to deter others. And, because it's "just" a blog post, you nullify the "they don't know the difference between polished and unpolished work" worry. The flip side is that I think there's a kind of folk wisdom in academia that blogging pre-tenure is a bad idea (I imagine the advice is even stronger for a grad student pre-job market). But if you were doing it for MVP reasons / feedback reasons, the slight reputation hit you'd take might be offset by the superior "so what" feedback you might get from doing such a thing. Anyway, still thinking about this strategy.*

* Beyond the purely professional strategic concerns, it might actually move science along a little faster and make research a bit more democratic and open.

Stereotypes about animals (and children) as revealed by Google auto-suggest

2012-02-04T13:07:00.000-08:00

I saw this tweet by @m_sendhil, which had a screenshot of Google's auto-suggest for "why are indians so," which contained a collection of (often contradictory) stereotypes (e.g., fat and skinny). I began doing the same exercise for other nationalities and ethnic groups, products, animals etc.

Here is the screenshot for turtles (which apparently have lots of fans):

It was interesting to me how many of the supposed attributes showed up repeatedly across entities. This gave me an idea: I should turn this procrastination/time-wasting into something more useful, which was to learn how to make graph/network plots with the python package networkx (code below). Here is the result, using the top 4 auto-suggests for cats, children, cows, dogs, frogs, goldfish, hamsters, mice, turtles and pigs. Entities are in blue, attributes in red. Edges are drawn if that attribute was auto-suggested for that entity.
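For readers who want to try the same kind of plot, here is a minimal sketch of an entity/attribute graph in networkx. It is not the original script. The `SUGGESTIONS` dictionary holds only the pairs named in this post (the full top-4 lists are not reproduced), and the function names and output path are made up for illustration.

```python
# Entity -> auto-suggested attributes. Only pairs mentioned in the post are
# listed here; the real input was the top 4 Google auto-suggests per entity.
SUGGESTIONS = {
    "goldfish": ["addicting", "good"],
    "pigs": ["salty"],
}


def build_edges(suggestions):
    """Return sorted (entity, attribute) pairs, one per auto-suggestion."""
    return sorted((e, a) for e, attrs in suggestions.items() for a in attrs)


def draw(suggestions, path="autosuggest.png"):
    """Draw the graph with entities in blue and attributes in red."""
    # Imported lazily so build_edges() works without the plotting stack.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    import networkx as nx

    G = nx.Graph()
    G.add_edges_from(build_edges(suggestions))
    # A node is an entity if it is a key of the dict, otherwise an attribute.
    colors = ["blue" if n in suggestions else "red" for n in G.nodes()]
    pos = nx.spring_layout(G, seed=0)  # fixed seed for a repeatable layout
    nx.draw(G, pos, node_color=colors, with_labels=True)
    plt.savefig(path)
    plt.close()
```

Calling `draw(SUGGESTIONS)` writes the picture to a file. Attributes shared by several entities become single red nodes with multiple edges, which is what makes the repeated stereotypes visible.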

Some observations
I'm guessing the "addicting" and "good" attributes of goldfish refer to the cheesy snack cracker and not the actual fish. People seem to be rather ambivalent about children. I'm kind of surprised that people were not wondering why dogs are smart. Finally, are pigs actually salty (this seems unlikely), or is this just how pork is usually prepared?

The code:

Employer recruiting intensity

2012-02-01T08:59:00.000-08:00

I was reading/skimming this paper by Davis et al. and in the abstract, they write:

"This paper is the first to study vacancies, hires, and vacancy yields at the establishment level in the Job Openings and Labor Turnover Survey, a large sample of U.S. employers. ... We show that (a) employers rely heavily on other instruments, in addition to vacancy numbers, as they vary hires, (b) the hiring technology exhibits strong increasing returns to vacancies at the establishment level, or both. We also develop evidence that effective recruiting intensity per vacancy varies over time, accounting for about 35% of movements in aggregate hires."

In a nutshell, they document that recruiting intensity varies across time and that this variation has a big effect on the number of aggregate hires. What's interesting is that the labor literature tends to focus on search intensity by workers, with firm search intensity comparatively understudied, but this paper suggests that ignoring employer efforts is likely to give a (very) incomplete impression. My guess is that this bias in the literature comes from the comparative lack of employer data on matching, though JOLTS (which this paper uses) is ameliorating the problem.

On oDesk, we've got excellent visibility on employer recruiting. Below is the "so what" plot from a recent experiment where we "recommended" contractors to employers (based on our analysis of what the job consisted of). The recommendations came immediately after the employer posted the job. We also made it easier for that employer to invite those recommended contractors to apply. The y-axis is the fraction of jobs where the employer made at least one invitation; treatment and control are side-by-side. We can see that regardless of category, the treatment was generally effective in increasing the number of invitations. But I think the striking thing is how much variation there is in "levels" of recruiting by category: in the control admin group, less than 10% of employers recruited, while in sales, it's almost 25%.

Presumably the difference depends on a number of factors: how many applicants the job will get organically, how close the different applicants are as substitutes, the value to the firm of filling the vacancy and so on. It also clearly matters how easy it is to search/recruit, given the effectiveness of our pretty lightweight intervention. From a welfare standpoint, this last point about the role of search/recruiting cost is potentially interesting, as reducing employer search frictions/costs technologically is, at least in online labor markets, a highly scalable proposition.

What do contractors in machine learning charge by the hour?

2011-11-26T11:18:00.001-08:00

I saw this question on Quora this morning: "What do contractors in machine learning charge by the hour?"

Obviously the answer depends on the skill level, but based on the few machine learning contractors I've worked with on oDesk, I would guess about $30-$40/hour is average. However, I wanted to check with some real data. I have access to our full database, but a more useful answer would allow people to see how to do this for other skills by simply scraping our search results (we have an API, but I don't know how to use it yet and I wanted to try the BeautifulSoup package).

First step was to reverse-engineer our search syntax and the HTML for profile rates. A contractor search of "machine learning" gives this:

"https://www.odesk.com/contractors?nbs=1#q=machine+learning"

and if I click on the next 10 results, the url is:

"https://www.odesk.com/contractors?nbs=1#q=machine+learning&skip=10"

I then checked to see what happens if I set "skip=0"---it returns the same thing as the first search URL, so I now know that I can just write one function that returns the query URL, with parameters of "q" and "skip."
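A function along those lines might look like the sketch below. The URL pattern is copied from the two examples above; the names `BASE` and `query_url` are mine, and the address itself may well no longer resolve.

```python
from urllib.parse import quote_plus

# Pattern taken from the two search URLs quoted in the post.
BASE = "https://www.odesk.com/contractors?nbs=1"


def query_url(q, skip=0):
    """Build the contractor-search URL for query `q`, skipping `skip` results."""
    # quote_plus turns spaces into "+", matching "machine+learning".
    return f"{BASE}#q={quote_plus(q)}&skip={skip}"


print(query_url("machine learning", skip=10))
```

Since `skip=0` gives the same page as the bare query, the first page needs no special case.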

Next, I needed to find out how the rates are stored. Using Chrome's awesome "Inspect element" feature, I found that the rates for the 10 returned results are listed as "rate_1", "rate_2", and so on:
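To make the extraction step concrete, here is a rough sketch that pulls numbers out of elements labelled "rate_1", "rate_2", etc. Several things here are assumptions: that the label is the element's `id` attribute, that the text looks like "$16.67/hr", and the `SAMPLE` markup itself, which is invented. It also uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without any installs.

```python
import re
from html.parser import HTMLParser


class RateParser(HTMLParser):
    """Collect numeric rates from elements whose id looks like rate_<n>."""

    def __init__(self):
        super().__init__()
        self._in_rate = False
        self.rates = []

    def handle_starttag(self, tag, attrs):
        # Assumption: the "rate_N" label is the element's id attribute.
        elem_id = dict(attrs).get("id") or ""
        self._in_rate = re.fullmatch(r"rate_\d+", elem_id) is not None

    def handle_data(self, data):
        if not self._in_rate:
            return
        match = re.search(r"\d+(?:\.\d+)?", data)  # first number in the text
        if match:
            self.rates.append(float(match.group()))
            self._in_rate = False

    def handle_endtag(self, tag):
        self._in_rate = False


def extract_rates(html):
    """Return the hourly rates found in one page of search results."""
    parser = RateParser()
    parser.feed(html)
    parser.close()
    return parser.rates


# Invented markup standing in for one results page.
SAMPLE = (
    '<ul>'
    '<li><span id="rate_1">$16.67/hr</span></li>'
    '<li><span id="rate_2">$100.00/hr</span></li>'
    '<li><span id="name_1">Not a rate: $5</span></li>'
    '</ul>'
)
print(extract_rates(SAMPLE))
```

With BeautifulSoup the same idea would be a lookup by id for each of the ten results on a page.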

The rest was pretty easy---I just wrote two loops to collect up wages. I saw that we had some clear false positives, which I filtered out. This actually brings up a big problem on oDesk that we've been working on---namely that until recently, we had no standardization of skills, which made it hard to match people or do really good, highly specific queries. We've now moved to a closed (but expandable) vocabulary of skills, à la StackOverflow, which in the long run will make it much easier to do matching and recommendations (and little data projects like this). That's a topic for another blog post. So, returning to the original question, my answer based on a pretty tiny sample:

Min: 16.67
Max: 100.0
Mean: 39.6983333333
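The summary itself is just a min, max and mean over the collected rates. A small sketch, with a made-up `rates` list (not the sample behind the numbers above) and a hypothetical `summarize` helper:

```python
from statistics import mean


def summarize(rates):
    """Return min, max and mean (rounded to cents) of a list of hourly rates."""
    if not rates:
        raise ValueError("no rates collected")
    return {"min": min(rates), "max": max(rates), "mean": round(mean(rates), 2)}


# Illustrative values only; these are not the scraped rates.
rates = [20.0, 35.0, 50.0]
print(summarize(rates))
```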

And here's my pretty crappy but for-the-moment-functional code on GitHub:

Should Online Labor Markets Set a Minimum Wage?

2011-10-29T11:44:00.000-07:00

Some critics of online labor markets mistakenly believe that the platform creators have an incentive to keep wages low. Employers have that incentive, but all else equal, the creators of online labor markets want to see wages rise, since almost all of them take a percentage of earnings. At least from a revenue standpoint, they should be indifferent between more work at a lower price or less work at a higher price, so long as the wage bill stays the same. It would be a different story if the platforms could somehow tax employers on the value-added by the platform, but so far, that's not possible.

One tool for raising wages might be for the platform to impose a minimum wage. It's certainly possible that imposing a binding minimum wage would increase platform revenue---it depends on the relative labor supply and demand elasticities, as well as the marginal cost of intermediating work. A platform's cost of goods sold (i.e., the intermediation service) is not precisely zero---there are server costs, customer service costs, fraud risks etc., so there is some wage below which the platform would be better off not allowing parties to contract.
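The logic can be made concrete with a toy model. Everything in this sketch is an assumption chosen for illustration: a constant-elasticity labor demand curve, a binding wage floor (so hours are set by demand), a 10% fee, and a fixed intermediation cost of $0.05 per hour. Platform profit is then (fee x wage - cost per hour) x hours demanded.

```python
def hours_demanded(wage, scale=1000.0, elasticity=0.5):
    """Constant-elasticity demand: hours = scale * wage ** (-elasticity)."""
    return scale * wage ** (-elasticity)


def platform_profit(wage, fee=0.10, cost_per_hour=0.05, **demand):
    """Fee revenue minus intermediation cost at a binding wage floor."""
    hours = hours_demanded(wage, **demand)
    return (fee * wage - cost_per_hour) * hours


# Raising the floor from $3 to $4 under inelastic vs. elastic demand.
for eps in (0.5, 1.5):
    before = platform_profit(3.0, elasticity=eps)
    after = platform_profit(4.0, elasticity=eps)
    print(f"elasticity={eps}: {before:.2f} -> {after:.2f}")
```

With inelastic demand (0.5) the higher floor raises profit; with elastic demand (1.5) it lowers it. In this toy model the platform loses money on any hour paid below cost_per_hour / fee, here $0.50, which is the "better off not allowing parties to contract" region.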

Moving from generalities, let's look at workers from the Philippines, who (a) make up a big chunk of the workforce on oDesk and (b) generally do relatively low-paid work (e.g., data entry, customer service, writing etc.) and thus would be most affected by a minimum wage imposition. If we look from about 2009 on (when the Philippines first started to become important), we can see that wages are basically flat, perhaps with a slight rise in some categories.

We can see that mean hourly wages range from $3 (for low-skilled data entry work) to about $8/hour for software development. By US standards, $3/hour is quite low---it's less than half the US federal minimum wage. However, let's look at where a $3/hour wage puts someone in the Philippine household income distribution, assuming they work 40 hours a week, 50 weeks a year:

Unfortunately, I couldn't get a more refined measure of income, but my eye-ball estimate is that $3.00/hour is at about the 50th percentile of the distribution. The equivalent hourly wage for median household income in the US is about $31/hour (using 2006 measure from Wikipedia) using the same 50 weeks a year, 40 hours a week formulation. It's important to note that this is household income, meaning that in many cases it is the combined income of a husband, wife and working-age children. And although online work does require a computer and a good internet connection, it does not require spending money on transportation, work clothes, food prepared outside the home etc. It also probably lets workers economize on child care; e.g., I might be willing to let a 10 year old watch a 3 year old if I'm in the next room, but not if I'm across town.
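The conversion between hourly wages and annual income used here is simple arithmetic under the 40 hours a week, 50 weeks a year assumption. A small sketch (the function names are mine, and the dollar figures are just the ones quoted above, not checked against any income data):

```python
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 50


def hourly_to_annual(hourly_wage):
    """Annual income from an hourly wage at 40 hours/week, 50 weeks/year."""
    return hourly_wage * HOURS_PER_WEEK * WEEKS_PER_YEAR


def annual_to_hourly(annual_income):
    """Hourly-wage equivalent of an annual income under the same assumption."""
    return annual_income / (HOURS_PER_WEEK * WEEKS_PER_YEAR)


print(hourly_to_annual(3.0))   # the $3/hour data entry wage
print(hourly_to_annual(31.0))  # the $31/hour US figure quoted in the post
```

So $3/hour corresponds to $6,000 a year, and each extra dollar per hour adds $2,000 a year.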

So, what's the conclusion?

From a platform perspective, I can concede that imposing a minimum wage could be revenue-increasing, but it depends on some pretty hard to estimate factors: how well do we know the elasticities? Are the long and short-term elasticities the same? What happens if we can get our intermediation costs down? Implementation-wise, enforcement might be very hard---I could easily imagine workers giving under-the-table rebates.

From a worker/welfare perspective, a minimum wage would clearly help some but hurt others. Any binding minimum wage is going to price some workers out of the market. How do we weigh their lost opportunities against the increased wages paid to those that see a bump? This starkly highlights one of the real drawbacks of a minimum wage as social policy, which is that it might be globally progressive and yet highly locally regressive for workers on the bad side of the cut-off.

I'd love to hear both employer and worker perspectives on this---feel free to comment here & I'll respond.


Workers-as-Bundled-Goods

2011-10-27T20:50:00.000-07:00

Image by Simon Miller

A standard pricing strategy in many industries is bundling goods, e.g., productivity "suites" like Microsoft Office, value meals at fast food restaurants, hotel and flight combos, etc. In the labor market, we also see a kind of bundling, though not by design: each worker is a collection of skills and attributes that can't be broken apart and purchased separately by the firm. For example, by hiring me, my company gets my writing, meeting attendance, programming, etc.; they can't choose to not buy my low-quality expense-report-filing service.

Good managers deal with this bundling by keeping workers engaged at their highest value activity. However, every activity has decreasing marginal returns, so even activities that start out as high-value eventually reach the "flat of the curve" where the marginal benefit of more of X gets pretty small. This phenomenon gives large firms an advantage, in that their (generally) larger problems give workers more runway to ply their best skills (by the same token, small firms have to worry much more about "fit" within their existing team).

While pervasive, this flat-of-the-curve dynamic and the resulting small-firm handicap is not a fundamental feature of organizations or labor markets---it springs from the binary nature of employment. It goes away, or at least is diminished, if a worker can instead be partly employed (i.e., freelance) at a number of firms, each paying the worker to do what they do best. To date, the stated value proposition of most freelancing sites has been that they allow for global wage arbitrage. Obviously that's important, but I suspect this "unbundling" efficiency gain will, in the long term, have a more profound effect on how firms organize and how labor markets function.