Datasheet for dataset template

"Document [the dataset] motivation, composition, collection process, recommended uses, and so on. [They] have the potential to increase transparency and accountability within the machine learning community, mitigate unwanted biases in machine learning systems, facilitate greater reproducibility of machine learning results, and help researchers and practitioners select more appropriate datasets for their chosen tasks.''

The proposal was inspired by the electronics industry, where every component has a datasheet that describes its operating characteristics and recommended uses. In machine learning, data is the input for model training. Using the wrong dataset, using a dataset outside of its original intent, or not understanding a dataset's limitations well enough can have dire consequences for the model. However, ``[d]espite the importance of data to machine learning, there is no standardized process for documenting machine learning datasets. To address this gap, we propose datasheets for datasets.''

Source

\documentclass
\usepackage[utf8]
% Used in the explanation text
\usepackage[colorlinks]
% Used by the template
\usepackage
\usepackage
\usepackage[dvipsnames]

\title
\author
\date

\begin
\maketitle

\section
\href ``document [the dataset] motivation, composition, collection process, recommended uses, and so on. [They] have the potential to increase transparency and accountability within the machine learning community, mitigate unwanted biases in machine learning systems, facilitate greater reproducibility of machine learning results, and help researchers and practitioners select more appropriate datasets for their chosen tasks.''

The motivation behind the proposal was the electronics industry, where every component has a datasheet that describes its operating characteristics and recommended uses. In machine learning, data is the input for model training. Using the wrong dataset, or using a dataset outside of its original intent, or even not understanding well enough the limitations of a dataset, has dire consequences for the model. However, ``[d]espite the importance of data to machine learning, there is no standardized process for documenting machine learning datasets. To address this gap, we propose datasheets for datasets.''

\section
% Color picked from the Datasheets for Datasets paper
\definecolor
\newcommand[1]\selectfont \textbf<\textcolor>> >
\newcommand[1]\selectfont \textcolor<\textbf>> >
\newcommand[2]\selectfont \textcolor <\textbf#2>> >
\newcommand[1] >

\begin
\begin

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\dssectionheader
\dsquestionex
\dsanswer < >
\dsquestion
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestion

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bigskip
\dssectionheader
\dsquestionex < Are there multiple types of instances (e.g., movies, users, and ratings; people and interactions between them; nodes and edges)? Please provide a description.>
\dsanswer < >
\dsquestion
\dsanswer < >
\dsquestionex < If the dataset is a sample, then what is the larger set? Is the sample representative of the larger set (e.g., geographic coverage)? If so, please describe how this representativeness was validated/verified. If it is not representative of the larger set, please describe why not (e.g., to cover a more diverse range of instances, because instances were withheld or unavailable).>
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestion

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bigskip
\dssectionheader
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestion
\dsanswer < >
\dsquestion
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestion
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestion
\dsanswer < >

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bigskip
\dssectionheader
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestion
\dsanswer < >

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bigskip
\dssectionheader
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestion
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestion

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bigskip
\dssectionheader
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestion
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestion
\dsanswer < >

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bigskip
\dssectionheader
\dsquestion
\dsanswer < >
\dsquestion
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestionex
\dsanswer < >
\dsquestion
\dsanswer < >

\end
\end
\end
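
The listing above has lost most of its command arguments, so the macro definitions cannot be read off it directly. As a rough illustration of how the four template macros fit together, here is a minimal sketch that compiles on its own. The macro bodies, the colour name `dsblue` and its RGB values, the choice of the `multicol` package, and the placeholder question text are all assumptions made for this example, not the template's actual definitions.

```latex
% Minimal sketch. Every definition below is an assumption for illustration,
% not the template's real code.
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[dvipsnames]{xcolor}
\usepackage{multicol}

% Hypothetical colour; the template defines its own with \definecolor.
\definecolor{dsblue}{RGB}{46,49,146}

% Section header: large, bold, coloured, sans-serif.
\newcommand{\dssectionheader}[1]{%
  \par\bigskip
  {\sffamily\large\textbf{\textcolor{dsblue}{#1}}}\par\smallskip}

% Question without explanatory text.
\newcommand{\dsquestion}[1]{%
  \par\smallskip
  {\sffamily\textcolor{dsblue}{\textbf{#1}}}\par}

% Question (#1, bold) followed by explanatory text (#2, regular weight).
\newcommand{\dsquestionex}[2]{%
  \par\smallskip
  {\sffamily\textcolor{dsblue}{\textbf{#1} #2}}\par}

% Answer: plain body text supplied by the dataset author.
\newcommand{\dsanswer}[1]{%
  \par\smallskip #1\par\medskip}

\begin{document}
\begin{multicols}{2}

\dssectionheader{Example section (placeholder)}

\dsquestionex{Example question (placeholder)?}{If the dataset is a sample, then what is the larger set?}
\dsanswer{Your answer goes here.}

\dsquestion{Another example question (placeholder)?}
\dsanswer{Your answer goes here.}

\end{multicols}
\end{document}
```

In this sketch, `\dsquestionex` takes the question as its first argument and the explanatory text as its second, which matches the `#2` that survives in the garbled three-line `\newcommand[2]` definition. Filling in a datasheet then amounts to replacing the body of each `\dsanswer`.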