DrivenData: Crowd-sourcing Data Science

Since I just signed up for one of their competitions, I thought I’d mention DrivenData.

What is DrivenData?

At DrivenData, we bring cutting-edge practices in data science and crowdsourcing to some of the world’s biggest social challenges and the organizations taking them on. We host online challenges, usually lasting 2-3 months, where a global community of data scientists competes to come up with the best statistical model for difficult predictive problems that make a difference. – “How the process works”


While you can visit their page to learn more about the organization itself, the short, three-point version of their process is this:

  1. Work with third parties to identify and frame a predictive question that can be answered with data.
  2. Host the data science competition, inviting freelance developers and data scientists to submit statistical models; these models are then ranked based on how well they predict data withheld from the competitors.
  3. Integrate the best statistical model into the third-party organization’s workflow at the conclusion of the competition.
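
To make step 2 a little more concrete, here is a minimal sketch of how submissions might be ranked against withheld data. Everything in it is an assumption for illustration: the `accuracy` and `rank_submissions` functions, the team names, and the labels are hypothetical, and plain accuracy stands in for whatever metric a given competition actually uses.

```python
def accuracy(predicted, actual):
    """Fraction of withheld labels that a submission predicted correctly."""
    if not actual:
        raise ValueError("no withheld labels to score against")
    if len(predicted) != len(actual):
        raise ValueError("prediction and label counts differ")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def rank_submissions(submissions, withheld_labels):
    """Score every submission on the withheld labels; best score first."""
    scores = {
        name: accuracy(predictions, withheld_labels)
        for name, predictions in submissions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical withheld labels, which competitors never see.
withheld_labels = [1, 3, 2, 2, 1]

# Hypothetical predictions submitted by three teams.
submissions = {
    "team_a": [1, 3, 2, 1, 1],
    "team_b": [1, 2, 2, 1, 3],
    "team_c": [1, 3, 2, 2, 1],
}

leaderboard = rank_submissions(submissions, withheld_labels)
for name, score in leaderboard:
    print(f"{name}: {score:.2f}")
```

Because competitors only ever see their score and never the withheld labels, the ranking reflects how well a model generalizes rather than how well it memorized the training data.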


As part of the competition guidelines, all competition submissions must be published and released under the permissive MIT License.

You can review winning models from this GitHub repository.

I was encouraged to start my journey into data science through DrivenData because of the humanitarian nature of some of their competitions. In particular, I have about eight months to come up with models that predict the level of damage to buildings caused by the 2015 Gorkha earthquake in Nepal. That’s just one example; I hope that, as they work with more organizations, they will continue to host competitions that address other humanitarian crises from the perspective of data.

Here’s hoping that this encourages others to take a look at DrivenData.
