Saturday, November 26, 2011

What do contractors in machine learning charge by the hour?

I saw this question on Quora this morning: "What do contractors in machine learning charge by the hour?"

Obviously the answer depends on the skill level, but based on the few machine learning contractors I've worked with on oDesk, I would guess about $30-$40/hour is average. However, I wanted to check with some real data. I have access to our full database, but a more useful answer would allow people to see how to do this for other skills by simply scraping our search results (we have an API, but I don't know how to use it yet and I wanted to try the BeautifulSoup package).

First step was to reverse-engineer our search syntax and the HTML for profile rates. A contractor search of "machine learning" gives this:

"https://www.odesk.com/contractors?nbs=1#q=machine+learning"

and if I click on the next 10 results, the URL is:

"https://www.odesk.com/contractors?nbs=1#q=machine+learning&skip=10"

I then checked to see what happens if I set "skip=0": it returns the same thing as the first search URL, so now I know that I can just write one function that returns the query URL, with parameters of "q" and "skip."
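A rough sketch of what that one function could look like is below. The names query_url and BASE_URL are placeholders chosen for illustration, and it assumes the only things that vary are "q" and "skip"; it is not necessarily what the code linked at the end of this post does.

```python
from urllib.parse import quote_plus

# Base of the contractor search URL, copied from the examples above.
BASE_URL = "https://www.odesk.com/contractors?nbs=1"


def query_url(q, skip=0):
    """Build the search URL for query `q`, starting at result number `skip`.

    Hypothetical helper: skip=0 is the first page, skip=10 the second, etc.
    """
    # quote_plus turns "machine learning" into "machine+learning".
    return "{}#q={}&skip={}".format(BASE_URL, quote_plus(q), skip)


print(query_url("machine learning", 10))
```

Because skip=0 behaves like the plain first-page URL, the same function covers every page, which is what makes a simple loop over skip values possible.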

Next, I needed to find out how the rates are stored. Using Chrome's awesome "Inspect element" feature, I found that the rates for the 10 returned results are listed as "rate_1", "rate_2", and so on.
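To make the idea concrete, here is a small sketch of pulling those rates out of a page. It makes several assumptions that the inspection above does not confirm: that "rate_1", "rate_2", ... are id attributes, that each of those elements contains only text, and that the text looks something like "$16.67/hr". It also uses Python's built-in html.parser instead of BeautifulSoup so that it runs with no extra installs; the names RateParser and extract_rates are made up for this example.

```python
import re
from html.parser import HTMLParser


class RateParser(HTMLParser):
    """Collect the text of elements whose id looks like rate_1, rate_2, ...

    Assumes (hypothetically) that each rate element holds plain text only,
    with no nested tags inside it.
    """

    def __init__(self):
        super().__init__()
        self.rates = {}       # element id -> text found inside it
        self._current = None  # id of the rate element we are currently inside

    def handle_starttag(self, tag, attrs):
        elem_id = dict(attrs).get("id") or ""
        if re.fullmatch(r"rate_\d+", elem_id):
            self._current = elem_id
            self.rates[elem_id] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.rates[self._current] += data

    def handle_endtag(self, tag):
        # Any closing tag ends the current rate element (no nesting assumed).
        self._current = None


def extract_rates(html):
    """Return the hourly rates found in `html` as a list of floats."""
    parser = RateParser()
    parser.feed(html)
    rates = []
    for text in parser.rates.values():
        match = re.search(r"\d+(?:\.\d+)?", text)  # first number in the text
        if match:
            rates.append(float(match.group()))
    return rates


# Made-up markup for illustration; the real page structure may differ.
sample = '<span id="rate_1">$16.67/hr</span><span id="rate_2">$100.00/hr</span>'
print(extract_rates(sample))
```

With BeautifulSoup the lookup would be shorter (something along the lines of finding each element by its id), but the overall shape is the same: find the ten rate elements on a page, then parse a number out of each.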


The rest was pretty easy: I just wrote two loops to collect up wages. I saw that we had some clear false positives, which I filtered out. This actually brings up a big problem on oDesk that we've been working on, namely that until recently we had no standardization of skills, which made it hard to match people or do really good, highly specific queries. We've now moved to a closed (but expandable) vocabulary of skills, à la Stack Overflow, which in the long run will make it much easier to do matching and recommendations (and little data projects like this). That's a topic for another blog post. So, returning to the original question, here's my answer, based on a pretty tiny sample:


Min: 16.67
Max: 100.0
Mean: 39.6983333333
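
For anyone repeating this for another skill, the summary itself is just three numbers over the collected list of rates. A minimal sketch, assuming the rates are already in a plain Python list of floats; the function name summarize and the example values are placeholders, not the data behind the numbers above:

```python
def summarize(rates):
    """Return the min, max, and mean of a non-empty list of hourly rates."""
    return {
        "min": min(rates),
        "max": max(rates),
        "mean": sum(rates) / len(rates),
    }


# Placeholder values for illustration only.
example_rates = [10.0, 20.0, 30.0]
stats = summarize(example_rates)
print("Min:", stats["min"])
print("Max:", stats["max"])
print("Mean:", stats["mean"])
```

With a sample this small, one or two very high or very low rates will move the mean a lot, so the median would be worth reporting too.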


And here's my pretty crappy but for-the-moment-functional code on GitHub: