The DataUp tool has four main features:
1. Check for best practices
The DataUp tool will parse your .xlsx or .csv file to detect potential issues that do not comply with data management best practices. These issues may prevent you from archiving your data file in the preferred .csv format, or may cause problems for future users of your data. Issues detected include:
- Embedded charts, tables, or pictures
- Embedded comments
- Special characters
- Color-coded text or cell shading
- Columns with mixed data types
- Non-contiguous data
- Merged cells
- Blank cells
- Missing header row, or more than one header row
- Multiple sheets (tabs)
In addition to identifying the locations of these problems, DataUp explains why they are potentially problematic and suggests alternatives; it can also remove embedded charts, comments, and color-coded cells in bulk. You can also ignore these suggestions and continue without addressing the issues (e.g., if you plan to archive the data in .xlsx format).
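Several of the checks listed above can be approximated for a plain .csv file. The Python sketch below is not DataUp's implementation; it is a hypothetical `check_csv` helper that assumes the first row is the header, treats any value `float()` accepts as numeric, and uses an arbitrary allow-list to decide what counts as a "special character." Checks that depend on spreadsheet structure (merged cells, comments, shading, charts, multiple sheets) are out of scope, since a .csv file cannot carry them.

```python
import csv
import io
import re

# Hypothetical allow-list: anything outside word characters, whitespace,
# and a few common punctuation marks is reported as a "special character".
SPECIAL_CHARS = re.compile(r"[^\w\s.,\-+/:()%]")


def infer_type(value):
    """Classify a non-empty cell as 'number' or 'text' (an assumption, not DataUp's rule)."""
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def check_csv(text):
    """Return (issue, location) tuples for a CSV string whose first row is the header."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [("empty file", None)]
    header, body = rows[0], rows[1:]
    issues = []
    if any(cell.strip() == "" for cell in header):
        issues.append(("blank header cell", "row 1"))
    column_types = [set() for _ in header]  # types seen in each column
    for r, row in enumerate(body, start=2):
        if len(row) != len(header):
            issues.append(("row length differs from header", f"row {r}"))
        for c, cell in enumerate(row):
            loc = f"row {r}, column {c + 1}"
            if cell.strip() == "":
                issues.append(("blank cell", loc))
                continue
            if SPECIAL_CHARS.search(cell):
                issues.append(("special character", loc))
            if c < len(column_types):
                column_types[c].add(infer_type(cell))
    # A column that contains both numbers and text has mixed data types.
    for c, types in enumerate(column_types):
        if len(types) > 1:
            issues.append(("mixed data types", f"column {c + 1}"))
    return issues


sample = "site,temp_c\nA,12.5\nB,\nC,n/a\n"
for issue in check_csv(sample):
    print(issue)
```

For the sample, the sketch reports the blank cell in row 3 and flags column 2 as mixed, because the placeholder "n/a" is text in an otherwise numeric column.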
2. Create metadata
Metadata is “data about data”: the who, what, when, where, why, and how of your dataset. Researchers often do not record these details explicitly; instead, this information is kept in lab notebooks, on spreadsheet tabs, or in their heads. One of the most challenging aspects of being a good data steward is creating high-quality, standardized metadata to accompany your dataset.
DataUp walks you through creating standard metadata using a form that becomes part of your spreadsheet, allowing for future use and sharing. When the data file is uploaded to a repository, this metadata is used to generate a metadata file in Ecological Metadata Language (EML), a metadata standard used by many Earth, environmental, and ecological researchers. Standardized metadata makes the data file easier to discover and reuse.
Metadata can be created at both the file level and the column level. File-level metadata includes dataset titles, names and email addresses of project personnel, and institutional affiliations. Column-level metadata (i.e., attribute metadata) describes the variables in your dataset, including a description of each column and its units of measure.
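The split between file-level and column-level metadata maps naturally onto a nested document. As a rough illustration only, the Python sketch below uses the standard library's `xml.etree.ElementTree` to turn two hypothetical metadata records into XML that borrows several EML element names (`dataset`, `creator`, `dataTable`, `attributeList`, `attribute`). It is not how DataUp generates EML, and the output is not schema-valid EML: real EML, for example, splits `individualName` into `givenName` and `surName` and nests units inside a `measurementScale` element.

```python
import xml.etree.ElementTree as ET

# Hypothetical file-level metadata (one record per dataset).
file_metadata = {
    "title": "Stream temperature survey",
    "creators": [{"name": "Jane Doe", "email": "jdoe@example.org",
                  "organization": "Example University"}],
}

# Hypothetical column-level (attribute) metadata (one record per column).
column_metadata = [
    {"name": "site", "description": "Sampling site code", "unit": None},
    {"name": "temp_c", "description": "Water temperature", "unit": "celsius"},
]


def build_eml_like(file_md, columns):
    """Build an XML tree loosely modeled on EML element names.

    Simplified for illustration; it will NOT validate against the EML schema.
    """
    root = ET.Element("eml")
    dataset = ET.SubElement(root, "dataset")
    # File-level metadata: title and personnel.
    ET.SubElement(dataset, "title").text = file_md["title"]
    for person in file_md["creators"]:
        creator = ET.SubElement(dataset, "creator")
        ET.SubElement(creator, "individualName").text = person["name"]
        ET.SubElement(creator, "organizationName").text = person["organization"]
        ET.SubElement(creator, "electronicMailAddress").text = person["email"]
    # Column-level metadata: one <attribute> per column.
    table = ET.SubElement(dataset, "dataTable")
    attribute_list = ET.SubElement(table, "attributeList")
    for col in columns:
        attr = ET.SubElement(attribute_list, "attribute")
        ET.SubElement(attr, "attributeName").text = col["name"]
        ET.SubElement(attr, "attributeDefinition").text = col["description"]
        if col["unit"]:
            # Simplified: real EML places units under measurementScale.
            ET.SubElement(attr, "unit").text = col["unit"]
    return root


xml_text = ET.tostring(build_eml_like(file_metadata, column_metadata),
                       encoding="unicode")
print(xml_text)
```

Keeping the two levels as separate records mirrors the form-based workflow described above: the file-level fields are filled in once, while the attribute list grows with the number of columns.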
3. Get credit for data: obtain an identifier
Valuing and incentivizing the time and effort required to manage data well is an important factor in fostering data sharing and reuse. One way to allow data producers to get credit for this work is to enable data citation. Rather than citing papers that summarize results from a dataset, researchers can begin to cite datasets themselves. For this to be possible, the data must be well documented, archived, and assigned a persistent, unique identifier (similar to a Digital Object Identifier, or DOI). DataUp will help you obtain an identifier for your dataset, which allows you and others to cite your data directly, list it on your CV, and track its impact in your research community.
4. Archive & share data
Once you have created metadata, you can connect directly to a repository via DataUp and upload your data for archiving. Currently, DataUp is connected to ONEShare, a dedicated DataUp repository that allows anyone to deposit tabular data. ONEShare is a DataONE member node, which means datasets deposited there are indexed by DataONE and available for discovery and use by the public. We expect more repositories to adopt DataONE as a way to receive data during DataUp's first year of release.
Good data management practices ensure that others can use and reuse your data well into the future. DataUp can also help you meet newly implemented funder requirements for data management. By documenting your data, making it publicly available, and providing a persistent identifier for citation, you also contribute to open science and transparent research processes.