Does it need to be accessible via API (e.g. SQL) or just a spreadsheet-style web interface?
Does it need to be accessible via API (e.g. SQL) or just a spreadsheet-style web interface?
Interpreting “a previously-unrecognized weakness in X was just found” as “X just got weaker” is dangerously bad tech writing.
Agreed—but note that in this case the information was only discovered because the organizations involved (Common Crawl and LAION) do show their data. We should assume that proprietary data sets have similar issues—but this case should be seen as an opportunity to improve one of the rare open data sets, not to penalize its openness and further entrench proprietary sources.