User talk:Dušan Kreheľ/Wikipedia talk:New matrix format
Existing "ez" compressed format, as well as pageviews complete
Hi! The pageviews "complete" dump version does just this. It's a bit of a mess because the Analytics team that maintains these dumps changed a lot in the middle of a big effort to create the new dump. The relevant details are:
- pageviews_ez is the old dump that first implemented an idea similar to yours: https://dumps.wikimedia.org/other/pagecounts-ez/
- pageviews_complete is the new version that should meet your needs going forward: https://dumps.wikimedia.org/other/pageview_complete/readme.html
- lots of documentation updates are needed to make this clear
- we need to clean up old jobs that are still running and giving the impression that other datasets are supposed to be how people download data
Milimetric (WMF) (talk) 19:46, 5 September 2022 (UTC)
- @Milimetric (WMF): Thanks, I looked. My idea was to have yearly exports; pageview_complete has only daily statistics. Dušan Kreheľ (talk) 20:32, 18 September 2022 (UTC)
- @Dušan Kreheľ: Indeed, pageviews_complete has daily and monthly statistics. The monthly rollups are here, linked from the daily ones: https://dumps.wikimedia.org/other/pageview_complete/monthly/. Perhaps that should be clearer from the front page. If yearly rollups are useful as well, we should probably just add them to this dataset rather than creating a different dataset, in my opinion. What do you think? Milimetric (WMF) (talk) 13:36, 19 September 2022 (UTC)
- @Milimetric (WMF): Thanks for the comment and the links. My current answer to your question is in the Epilogue section of the article. Please take a look. Dušan Kreheľ (talk) 20:21, 16 October 2022 (UTC)
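To make the yearly rollup idea from this thread concrete, here is a minimal sketch of summing monthly counts into yearly totals. It does not read the actual dump files: the `(page, "YYYY-MM", views)` row layout, the function name `yearly_rollup`, and the sample numbers are all assumptions made up for illustration.

```python
from collections import defaultdict


def yearly_rollup(monthly_rows):
    """Sum hypothetical (page, 'YYYY-MM', views) rows into {(page, year): views}."""
    totals = defaultdict(int)
    for page, month, views in monthly_rows:
        year = month.split("-")[0]  # 'YYYY-MM' -> 'YYYY'
        totals[(page, year)] += views
    return dict(totals)


# Made-up sample rows, not real pageview data.
sample = [
    ("Main_Page", "2021-12", 10),
    ("Main_Page", "2022-01", 7),
    ("Main_Page", "2022-02", 5),
    ("Example", "2022-01", 3),
]

print(yearly_rollup(sample))
```

The same grouping would apply whether the yearly totals are published inside the existing dataset or as a separate export; only the input parsing would differ.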
Comparison for other formats
Thanks for sharing it; this is an interesting idea. I wonder how it compares to other known formats for storing a matrix with many zeros, such as those in Sparse_matrix#Storing_a_sparse_matrix. Eran (talk) 09:50, 15 October 2022 (UTC)
- @ערן: Excellent comment. I compared the examples from the section Compressed sparse row (CSR, CRS or Yale format) of the enwiki page, and my format is better. Dušan Kreheľ (talk) 07:08, 16 October 2022 (UTC), Dušan Kreheľ (talk) 07:09, 16 October 2022 (UTC)
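For readers unfamiliar with the CSR layout mentioned above, the following is a small sketch of how a mostly-zero matrix is stored in that format: one array of non-zero values, one of their column indices, and one of row start offsets. The function name `to_csr` and the sample matrix are chosen here for illustration only; this is not the new matrix format proposed in the article, and it says nothing about which of the two is more compact.

```python
def to_csr(dense):
    """Convert a list-of-lists matrix to CSR arrays (values, col_index, row_ptr)."""
    values, col_index, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)      # the non-zero entry itself
                col_index.append(j)   # the column it sits in
        # row_ptr[i + 1] marks where row i ends inside `values`
        row_ptr.append(len(values))
    return values, col_index, row_ptr


# Illustrative 4x4 matrix with four non-zero entries.
matrix = [
    [5, 0, 0, 0],
    [0, 8, 0, 0],
    [0, 0, 3, 0],
    [0, 6, 0, 0],
]

values, col_index, row_ptr = to_csr(matrix)
print(values)     # [5, 8, 3, 6]
print(col_index)  # [0, 1, 2, 1]
print(row_ptr)    # [0, 1, 2, 3, 4]
```

A size comparison against another format would count the total number of stored integers, here `len(values) + len(col_index) + len(row_ptr)`, for the same input matrix.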