Obviously the answer depends on the skill level, but based on the few machine learning contractors I've worked with on oDesk, I would guess about $30-40/hour is average. However, I wanted to check with some real data. I have access to our full database, but a more useful answer would show people how to do this for other skills by simply scraping our search results (we have an API, but I don't know how to use it yet, and I wanted to try the BeautifulSoup package).
The first step was to reverse-engineer our search syntax and the HTML where profile rates appear. A contractor search for "machine learning" gives this URL:
"https://www.odesk.com/contractors?nbs=1#q=machine+learning"
and if I click through to the next 10 results, the URL becomes:
"https://www.odesk.com/contractors?nbs=1#q=machine+learning&skip=10"
I then checked to see what happens if I set "skip=0"---it returns the same thing as the first search URL, so now I know I can write a single function that builds the query URL from two parameters, "q" and "skip."
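As a quick sketch (the full script further down does essentially the same thing, minus the encoding step, since I pass the hyphenated skill name), a URL builder might look like this; quote_plus handles the space-to-"+" encoding for multi-word queries:

import urllib

def search_url(skill, skip=0):
    """Hypothetical helper: builds the contractor-search URL for a skill query.
    skip should be a multiple of 10 (0 gives the first page of results)."""
    return "https://www.odesk.com/contractors?nbs=1&q=%s&skip=%s" % (
        urllib.quote_plus(skill), skip)

print search_url("machine learning")      # ...q=machine+learning&skip=0
print search_url("machine learning", 10)  # ...q=machine+learning&skip=10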
Next, I needed to find out how the rates are stored. Using Chrome's awesome "Inspect element" feature, I found that the rates for the 10 returned results are listed in elements named "rate_1", "rate_2", and so on.
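In simplified form (this snippet is just an illustration---the real results page has a lot more markup around each rate, and the tag type may differ), the relevant bits look roughly like this, and BeautifulSoup's findAll can grab each one by that name attribute:

from BeautifulSoup import BeautifulSoup

# simplified stand-in for one search-results page; the name="rate_N"
# attribute is the hook the scraper below relies on
html = """
<div class="result"><span name="rate_1">$38.89/hr</span></div>
<div class="result"><span name="rate_2">$45.00/hr</span></div>
"""

soup = BeautifulSoup(html)
print soup.findAll(attrs={"name": "rate_1"})[0].string  # $38.89/hr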
The rest was pretty easy---I just wrote two loops to collect the wages. I saw that we had some clear false positives, which I filtered out.
This actually brings up a big problem on oDesk that we've been working on---namely that until recently, we had no standardization of skills, which made it hard to match people or do really good, highly specific queries. We've now moved to a closed (but expandable) vocabulary of skills, à la StackOverflow, which in the long run will make it much easier to do matching and recommendations (and little data projects like this). That's a topic for another blog post.
So, returning to the original question, my answer based on a pretty tiny sample:
Min: 16.67
Max: 100.0
Mean: 39.6983333333
And here's my pretty crappy but for-the-moment-functional code on GitHub:
# John Horton
# www.john-joseph-horton.com
# Description: Answer to Quora question about machine learning hourly rates
# "http://www.quora.com/Machine-Learning/What-do-contractors-in-machine-learning-charge-by-the-hour"

from BeautifulSoup import BeautifulSoup
import urllib2


def contractors(skill, offset):
    """gets search results for skills; offset should be a multiple of 10"""
    base_url = "https://www.odesk.com/contractors?nbs=1&q=%s&skip=%s"
    return base_url % (skill, offset)


def get_wage(x):
    """extracts the hourly wage from the returned HTML;
    verbose because John sucks at regular expressions"""
    return float(x.split(">")[1].split("<")[0].replace("$", "").replace("/hr", ""))


def wages(skill, n):
    """gets at least n contractors (if they are available) who have that skill,
    returning a list"""
    pages = n / 10 + 1
    wages = []
    for i in range(pages):
        url = contractors(skill, 10 * i)
        f = urllib2.urlopen(url)
        soup = BeautifulSoup(f)
        # each results page lists 10 rates, named rate_1 through rate_10
        for r in range(1, 11):
            x = soup.findAll(attrs={"name": "rate_%s" % r})
            if x:
                wages.append(get_wage(str(x[0])))
    return wages

# there were a couple of false positives (we're working on this)
# so I excluded everyone listing less than $15/hour
cleaned_wages = [w for w in wages("machine-learning", 30) if w > 15]

print """
Min: %s
Max: %s
Mean: %s""" % (min(cleaned_wages),
               max(cleaned_wages),
               round(sum(cleaned_wages) / float(len(cleaned_wages)), 2))
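Since the whole point was to make this reusable for other skills, running it for a different query is a one-line change. For example (with a hypothetical skill tag---check the actual tag names on oDesk before using it):

# hypothetical example: same pipeline for a different skill tag
nlp_wages = [w for w in wages("natural-language-processing", 30) if w > 15]
print "Mean rate: %s" % round(sum(nlp_wages) / float(len(nlp_wages)), 2)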