
User talk:Dušan Kreheľ/Wikipedia talk:New matrix format

From Wikipedia, the free encyclopedia

Existing "ez" compressed format, as well as pageviews complete

Hi! The pageviews "complete" dump version does just this. It's a bit of a mess because the Analytics team that maintains these dumps has changed a lot in the middle of a big effort to create the new dump. But the details that are relevant are thus:

Milimetric (WMF) (talk) 19:46, 5 September 2022 (UTC)

@Milimetric (WMF): Thanks, I looked. My idea was to have yearly exports. pageview_complete has only daily statistics. Dušan Kreheľ (talk) 20:32, 18 September 2022 (UTC)
@Dušan Kreheľ: Indeed, pageviews_complete has daily and monthly statistics. The monthly rollups are here, linked from the daily ones: https://dumps.wikimedia.org/other/pageview_complete/monthly/. Perhaps that should be clearer from the front page. If yearly rollups are useful as well, we should probably just add them to this dataset rather than creating a different dataset, in my opinion. What do you think? Milimetric (WMF) (talk) 13:36, 19 September 2022 (UTC)
@Milimetric (WMF): Thanks for the comment and the links. My current answer to your question is in the Epilogue section of the article. Please take a look. Dušan Kreheľ (talk) 20:21, 16 October 2022 (UTC)

Comparison with other formats

Thanks for sharing it. This is an interesting idea. I wonder how it compares to other known formats for storing matrices with many zeros, such as those in Sparse_matrix#Storing_a_sparse_matrix. Eran (talk) 09:50, 15 October 2022 (UTC)

@ערן: Excellent comment. I compared my format against the examples in the section Compressed sparse row (CSR, CRS or Yale format) of the enwiki page, and my format is better. Dušan Kreheľ (talk) 07:08, 16 October 2022 (UTC), Dušan Kreheľ (talk) 07:09, 16 October 2022 (UTC)
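
For readers following the comparison, here is a minimal sketch of the CSR layout mentioned above. It is only an illustration of the baseline format. It is not the proposed matrix format, and the sample matrix and the names `to_csr`, `from_csr`, `values`, `col_index` and `row_ptr` are assumptions made for this example, not taken from the article.

```python
def to_csr(matrix):
    """Encode a dense matrix (list of equal-length rows) as three CSR arrays."""
    values = []      # non-zero entries, stored row by row
    col_index = []   # column of each entry in `values`
    row_ptr = [0]    # values[row_ptr[i]:row_ptr[i + 1]] holds row i
    for row in matrix:
        for col, entry in enumerate(row):
            if entry != 0:
                values.append(entry)
                col_index.append(col)
        row_ptr.append(len(values))
    return values, col_index, row_ptr


def from_csr(values, col_index, row_ptr, n_cols):
    """Rebuild the dense matrix from its CSR arrays."""
    matrix = []
    for i in range(len(row_ptr) - 1):
        row = [0] * n_cols
        for k in range(row_ptr[i], row_ptr[i + 1]):
            row[col_index[k]] = values[k]
        matrix.append(row)
    return matrix


# Hypothetical sample: mostly zeros, as in a sparse page-by-day count table.
dense = [
    [5, 0, 0, 0],
    [0, 8, 0, 0],
    [0, 0, 3, 0],
    [0, 6, 0, 0],
]
values, col_index, row_ptr = to_csr(dense)
```

CSR stores one value and one column index per non-zero entry, plus one row pointer per row and a final end marker, so any size comparison with another format depends on how many entries are non-zero and on how each number is serialized.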