I put together this grading rubric based on this paper published by the World Economic Forum (terrible name, I know). The idea behind the rubric is that you have a checklist of some sort where you can see exactly how you're doing in terms of responsibly building an AI system. It's deliberately meant to be difficult to build something that scores green across the board, so I'd suspect most projects out there wouldn't grade very well on it.
However, most of the papers being published on AI have similar requirements for what they would consider “good,” so I think an actionable grading rubric like this is a good starting point for evaluating where you're at.
You can find the grading rubric at this Google Docs link. Have any feedback? I'd love to hear it here, especially if there's something you feel is missing!
Do you have any suggested candidate projects to run through the rubric? I'm interested in trying this out, but I'm not a subject-matter expert on AI or popular AI-related projects. It would help me walk through the rubric if I had an example of a good or bad project to try it out on. Some of the criteria, like Team Diversity and Fairness Definition, are hard to measure from an external perspective.
Yeah, unfortunately, unlike the community grading rubric, it's not quite as simple as looking at a GitHub repository to get all of the information needed to grade a project. The majority of AI systems are proprietary simply because they're often so context-specific that there's little use in sharing them. The rubric is really meant as a self-evaluation, or as something a certifying authority could use, but doing so would require working directly with the team to evaluate where they're at.
This makes sense, but it also makes it difficult for me to know where to start with feedback without seeing a model of what applying this rubric looks like. It might help to seek out feedback from folks with a deeper AI/ML background. Perhaps @jxr8142 or @deejoe have feedback, since they were interested in the FOSDEM 2020 proposal.
For sure. It's always easier to evaluate things when they're used in public, and it stinks that doing so is so difficult in this context. The kind of feedback I'm looking for here is from a rights perspective: of the things evaluated here, do you feel like certain user rights are being left out?