Online Labor: June 2012

Tuesday, June 26, 2012

Will online labor markets disrupt the traditional BPO firm?

Today I spoke on a panel on something called "impact sourcing" at the BPO World Forum. The idea of impact sourcing, in a nutshell, is that online work is a tool for development and that for-profit firms outsourcing some part of their business should look beyond traditional BPO firms and consider non-profits like Samasource and Digital Divide Data. It was a good audience for this pitch, as many of the attendees were CIOs from big companies that are accustomed to signing multi-million dollar IT outsourcing deals with traditional BPO firms like Wipro, Infosys, Tata Consultancy etc.

After the panel, I was at a reception where I talked to someone fairly high up in a traditional BPO. When I described my elevator pitch version of oDesk's business---clients post jobs, contractors make bids, clients make a hire, we intermediate the work and take a percentage---he said, literally "what are you doing here at this conference? You guys are like the Antichrist." What he meant (in a half joking, half serious way) is that oDesk and similar companies threaten the model of the BPO.

My perception is that the traditional BPO model is possible because of two facts: (a) the enormous, purely place-based differences in wages and (b) the difficulty of actually arbitraging those differences without help. BPOs stand ready to help companies reap the benefits of (a) by giving the help necessitated by (b). The world is still very far away from (a) no longer being true, but if oDesk and similar companies can radically lower the barriers to arbitraging differences by making it easy to hire, manage and pay workers regardless of geography, then (b) starts to become less true. If we get to the point where the qualitative differences between online remote work and in-person work diminish and assessing and hiring workers is simple and easy, it would obviate the need for much of what the BPO firm is selling.

This is not to say that there isn't still a huge space for IT consulting---outsourcing an entire process is hard and BPOs with lots of experience have something very valuable to offer. Furthermore, beyond lowering the level of costs, one of the motivations for business process outsourcing is the ability to change the cost structure, namely by turning a fixed cost into a variable cost. But these caveats aside, on the margin, the mediation aspect of the BPO role seems likely to get less attractive over time as technology improves and online labor markets mature.

Wednesday, June 6, 2012

Resources for online social science

The Economist recently had an article about the growing use of online labor markets as subject pools in psychology research; ReadWriteWeb wrote a follow-up. If you've been following this topic, there wasn't very much new, but if you're a researcher that would like to use these methods, the articles were pretty light on useful links. This blog post is an attempt to point out some of the resources/papers available. This is my own very biased, probably idiosyncratic view of the resources, so hopefully people will send me corrections/additions and I can update this post.

To start, let's have this medium pay tribute to itself by running through some blogs and their creators.

Blogs

There is the "Follow the Crowd" blog, which I believe is associated with the HCOMP conference. It's definitely more CS than Social Science, but I think it's filled with good examples of high-quality research done w/ MTurk and with other markets.
There's Gabriel Paolacci's (now at Erasmus University) "Experimental Turk" blog which was mentioned in the article and is probably the best resource for examples of social and psychological science research being done with MTurk.
Panos Ipeirotis (at NYU and now academic-in-residence at oDesk) has a great blog, "Behind-enemy-lines", that's basically all things relating to online work.
The defunct "Deneme blog" by Greg Little (who also works at oDesk) and Lydia Chilton (at University of Washington).

Guides / How-To (Academic Papers)

A number of researchers have written guides to using MTurk for research. I think the first stop for social scientists should be the paper by Jesse Chandler, Gabriel Paolacci and Panos Ipeirotis:

Chandler, J., Paolacci, G. and Ipeirotis, P. Running Experiments on Mechanical Turk,
Judgment and Decision Making (paper) (bibtex)

My own contribution is a paper with Dave Rand (who will be starting as a new assistant professor at Yale) and Richard Zeckhauser (at Harvard). The paper contains a few replication studies, but the real meat, and the part I think is most important, is the discussion of precisely why and how you can do valid causal inference online (I'm stealing this write-up/links of the paper from Dave's website):

Horton JJ, Rand DG, Zeckhauser RJ. (2011) The Online Laboratory: Conducting Experiments in a Real Labor Market. Experimental Economics. 14 399-425. (PDF) (bibtex)

Press: NPR's Morning Edition Marketplace [audio], The Atlantic, Berkman Luncheon Series [video], National Affairs, Crowdflower, Marginal Revolution, Experimental Turk, My Heart's in Accra, Joho blog, Veracities blog

Software

Unfortunately there hasn't been too much sharing of software for doing online experiments. Since a lot of the experimentation is done by computer scientists who do not feel daunted by making their own one-off, ad hoc applications, there are a lot of one-off, ad hoc applications. Hopefully people know of other tools out there that they can open source or share links to.

"Randomizer"

Basically, it lets you give subjects one link that will automatically redirect them (at random) to one of a collection of URLs you've specified. I made the first really crummy version of this and then got a real developer to re-do it so it runs on Google App Engine.
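
To make the idea concrete, here is a minimal Python sketch of the core logic. This is not the Randomizer code itself: the example URLs and the function names pick_destination and redirect_response are made up for illustration, and the sketch assumes assignment is uniform at random with no attempt to balance the number of subjects across conditions.

```python
import random
from collections import Counter

# Hypothetical destination URLs, one per experimental condition.
CONDITION_URLS = [
    "https://example.com/survey?arm=control",
    "https://example.com/survey?arm=treatment",
]


def pick_destination(urls, rng=random):
    """Return one URL chosen uniformly at random from `urls`."""
    if not urls:
        raise ValueError("need at least one destination URL")
    return rng.choice(urls)


def redirect_response(urls, rng=random):
    """Build the (status, headers) pair for an HTTP 302 redirect."""
    return "302 Found", [("Location", pick_destination(urls, rng))]


if __name__ == "__main__":
    # Seeded only so the demo is repeatable; a live service would not seed.
    rng = random.Random(2012)
    counts = Counter(pick_destination(CONDITION_URLS, rng) for _ in range(1000))
    for url, n in counts.items():
        print(url, n)
```

In a real deployment the status and headers would be handed to whatever web framework serves the single link you give to subjects. Simple uniform assignment like this can leave conditions slightly unbalanced in small samples, which is usually fine for analysis but worth knowing about.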

"QuickLime"
This is a tool for quickly setting up a LimeSurvey instance (an open source alternative to Qualtrics & SurveyMonkey) on a new EC2 machine. This was made courtesy of oDesk research. I haven't fully tested it yet, so as with all this software, caveat oeconomus.

"oDesk APIs"
There haven't been a lot of experiments done on oDesk by social scientists, but there's no reason it cannot be done. While it currently is not as convenient or as low-cost as doing experiments on MTurk, I think long-term oDesk workers would make a better subject pool since you can more carefully control experiments, it's easier to get everyone online at the same time to participate in an experiment, there are no spammers etc. If you're looking for some ideas or pointers, feel free to email me.

"Boto"
This is a python toolkit for working with Amazon Web Services (AWS). It's fantastic and saved me a lot of time when I was doing lots of MTurk experiments.

"Seaweed"
This was Lydia Chilton's master's thesis. The idea was to create tools for conducting economics experiments online. I don't think it ever moved beyond the beta stage, but if you (a) have some grant money and (b) are thinking about porting z-tree to the web, you should email Lydia and see where the codebase is & if anyone is working on it.

Here's a little javascript snippet I wrote for doing randomization within the page of an MTurk task.

People

I'm not going to try to do a who-is-who of Crowdsourcing, but if you're looking for some contacts of other people (particularly those in CS) who are doing work in this field, you can check out the list of recent participants at "CrowdCamp" which was a workshop prior to HCI.

History

Probably the first paper I'm aware of that pointed out that experiments (i.e., user studies) were possible on MTurk was by Ed Chi, Niki Kittur and Bongwon Suh. As far as I know, the first social science done on MTurk was Duncan Watts and Winter Mason's paper on financial incentives and the performance of crowds.

Friday, June 1, 2012

The Innovation of StackOverflow

So as I write this, there is an egg timer ticking away next to me, set with 10 minutes of time. What am I waiting for? 10 minutes is how much time I predicted it would take to get my programming question answered on StackOverflow (SO):

http://stackoverflow.com/questions/10860020/output-a-vector-in-r-in-the-same-format-used-for-inputting-it-into-r

The back story was that I was writing some R code and I got to a point where I was stuck: there was something I wanted to do and I remembered that there was a built-in function that could accomplish my goal. Unfortunately, I couldn't remember that function's name. After some fruitless googling, I posted the question on SO.

So, how long did it actually take to get the right answer? About 6 1/2 minutes. As I write this sentence, I'm waiting for some more time to elapse so I can actually approve the answer.

This has been my general experience with SO---amazingly high-quality answers delivered almost immediately. I feel sheepish that I haven't been able to answer as many questions as I've asked, but one of the animating ideas of the community is that asking high-quality, answerable questions is a way of contributing.

What's interesting to me is that SO is an example of a primarily social---as opposed to technological---innovation. There's nothing really technically innovative about SO: the site is fast, search works well, tagging works well etc., but lots of sites have those things. What's special about SO is that through a carefully designed system of incentives and policies, they have created a community that is literally---and I think profoundly---changing how people program computers.

The reason I point out the social nature of the innovation is that it's become popular to lament the shallowness or perceived frivolity of many start-ups that are built around social rather than technological innovations (e.g., Facebook, Twitter, Instagram etc.). The idea seems to be that if you aren't making solar panels or cancer-curing drugs, you're not doing something socially useful. I personally don't share that bias, but if we are going to judge companies on the basis of some more "serious" metric like productivity or social surplus, then SO is a great example of how a purely social innovation can succeed spectacularly on those metrics.