Let non-author users provide cleaned-up versions of datasets
One of Dryad's philosophies is that it should be as easy as possible for users to contribute data. This is valuable for getting data archived, but it typically leaves the data much more difficult to work with, especially in automated ways. Users can clean the data up themselves, but then every user has to repeat the same work, and this still doesn't help efforts to automate the acquisition, cleaning, and use of data.
I proposed providing a mechanism by which someone other than the authors of a dataset can provide a restructured, cleaned-up version of the data that is more usable (e.g., a set of CSV files instead of a multi-sheet Excel file). Credit would likely not be necessary for this work, though a mechanism for blame (in case the cleanup introduced errors) might be useful.
Tim Lucas commented
This could definitely work, especially if the cleaned-up version were held to a strict set of standards. While there's a lot to be said for very few restrictions on original data (better hosted and dirty than clean and unavailable), the same is not true of cleaned-up data (better unduplicated and dirty than duplicated and still kinda dirty).
I'd say the minimum would be a reproducible pipeline for going from the dirty data to the clean data, plus stricter format controls for common data types (e.g., flat data frames).
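As a rough illustration of what that minimum might look like, here is a minimal sketch of a reproducible cleaning step in Python (standard library only). It assumes the original data is a CSV-like text table. The function `clean_csv`, the `MISSING_CODES` set, and the fields of the provenance record are all hypothetical; none of them are part of any Dryad API. The step is deterministic, enforces a flat, rectangular table with unique column names, and records checksums of both input and output. That means anyone can rerun it and confirm they get the same cleaned file, which would also give a starting point for the blame mechanism proposed in the original post.

```python
import csv
import hashlib
import io

# Hypothetical: strings the original authors might have used for missing values.
MISSING_CODES = {"", "NA", "N/A", "-", "?"}


def sha256_of(text: str) -> str:
    """Fingerprint a text blob so cleaned output can be traced back to its input."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def normalize_header(name: str) -> str:
    """Trim, lower-case, and join internal whitespace with underscores."""
    return "_".join(name.strip().lower().split())


def clean_csv(raw: str) -> tuple[str, dict]:
    """Deterministically turn a messy CSV string into a flat, rectangular CSV.

    Returns the cleaned CSV text plus a provenance record.
    Raises ValueError if the input cannot be made into a flat table.
    """
    # Drop rows that are entirely blank.
    rows = [r for r in csv.reader(io.StringIO(raw)) if any(c.strip() for c in r)]
    if not rows:
        raise ValueError("input contains no data")

    # Stricter format control: header names must be unique and non-empty.
    header = [normalize_header(h) for h in rows[0]]
    if "" in header or len(set(header)) != len(header):
        raise ValueError(f"header must have unique, non-empty names: {header}")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for i, row in enumerate(rows[1:], start=1):
        # Stricter format control: every data row must match the header width.
        if len(row) != len(header):
            raise ValueError(
                f"data row {i} has {len(row)} fields, expected {len(header)}"
            )
        # Trim cells and map the hypothetical missing-value codes to empty fields.
        writer.writerow(["" if c.strip() in MISSING_CODES else c.strip() for c in row])

    cleaned = out.getvalue()
    provenance = {
        "input_sha256": sha256_of(raw),
        "output_sha256": sha256_of(cleaned),
        "n_rows": len(rows) - 1,
        "columns": header,
    }
    return cleaned, provenance


if __name__ == "__main__":
    messy = "Site , Count\nA, 3\n\nB,NA\n"
    cleaned, prov = clean_csv(messy)
    print(cleaned, end="")
    print(prov)
```

For the sample input, the cleaned output is `site,count`, `A,3`, `B,` (one record per line). Rejecting ragged or ambiguously labeled tables outright, rather than guessing, reflects the idea that cleaned-up data should be held to a higher bar than the original upload.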