[01:52:30] Would you say Abstract Wikipedia is the superset of Wikilambda, or the other way round?
[01:53:19] quick question for some napkin math: how much storage does all of Wikidata take?
[01:55:57] 60 GB in bz2, without Lexicographical data (which is probably less than 1G)
[01:56:05] https://dumps.wikimedia.org/wikidatawiki/entities/
[01:56:24] The compression is very efficient, though
[01:57:03] @deryckchan The wiki of functions is part of Abstract Wikipedia
[01:58:52] Napkin maths:
[01:58:53] I clicked "random item" 10 times. The items had "page sizes" varying from 5kB to 93kB, with a mean of 35kB.
[01:58:54] Scale up to 100 million items -> 350GB (excluding properties, lexemes, overheads, and not making use of compression) (re @whomstved: quick question for some napkin math: how much storage does all of Wikidata take?)
[01:59:13] Interesting
[02:00:41] i.e. still small enough to download onto a hard disk 🤪 (no please don't, our friends at Microsoft Cambridge tried and gave up, and instead decided to variously bootstrap and query on the fly)
[02:01:22] what was the technical issue with that?
[02:01:29] I don't know if it's actually something someone said, but I have the impression that it's over 1 TB extracted
[02:02:04] also close to half of the items are scientific articles which tend to be pretty big compared to other items
[02:02:32] No technical blocker, they simply realised after downloading the entirety of Wikidata that working off an offline dump wasn't the most efficient way of achieving what they wanted (re @whomstved: what was the technical issue with that?)
[02:02:44] Or so I was told
[02:03:15] (if someone actually knows how big it is extracted, please let me know, now I'm curious)
[02:03:16] But those items are also low on connectivity with other items (re @Nikki: also close to half of the items are scientific articles which tend to be pretty big compared to other items)
[02:03:51] I thought the question was about size not connectivity?
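As a rough check on the napkin math above, here is a minimal Python sketch. The estimate is simply the sampled mean page size times the item count: 35 kB × 100 million items = 3.5 × 10^12 bytes, i.e. about 3.5 TB (decimal units) rather than 350 GB, which sits closer to the "over 1 TB extracted" impression. The sketch also shows one way to measure the extracted size of a `.bz2` dump without unzipping it to disk. It assumes the JSON dump is a top-level array with one entity per line, and it treats the on-wiki "page size" as only a loose proxy for an item's size in the dump. All function names are hypothetical.

```python
import bz2


def estimate_total_bytes(mean_item_bytes: float, item_count: int) -> float:
    """Napkin estimate: mean sampled item size multiplied by the number of items."""
    return mean_item_bytes * item_count


def uncompressed_size(source, chunk_size: int = 1 << 20) -> int:
    """Count the decompressed bytes of a .bz2 file without writing it to disk.

    `source` may be a path or a binary file object; bz2.open also handles
    multi-stream .bz2 files.
    """
    total = 0
    with bz2.open(source, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def count_entities(source) -> int:
    """Count entity lines, assuming a JSON array with one entity per line."""
    n = 0
    with bz2.open(source, "rt", encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            # Skip the array brackets and any blank lines.
            if stripped in ("[", "]", ""):
                continue
            n += 1
    return n


if __name__ == "__main__":
    est = estimate_total_bytes(35_000, 100_000_000)
    print(f"{est / 1e12:.1f} TB")  # prints "3.5 TB"
```

Running `uncompressed_size` on a JSON dump from the entities directory linked above would answer the "how big is it extracted" question directly. It saves the disk space of a full unzip, but it is still one long single-threaded pass over the whole file.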
[02:04:41] Last time it took me about a day or two just to unzip it... But I also think it was bigger than 350G
[02:04:47] Thanks 👍 that's helpful in terms of what names I should vote for (re @vrandecic: @deryckchan The wiki of functions is part of Abstract Wikipedia)
[02:06:59] Yeah, I guess scholarly article items have lots of content on the item itself, but relatively few links to other items and few sitelinks. So I'm not convinced they are large in terms of storage space compared to items that are rich in internal links and sitelinks (re @Nikki: I thought the question was about size not connectivity?)
[02:08:57] Is there much repetition in the unzipped dump to facilitate faster querying?
[02:13:21] links to other items are just statements, 10 links to other items isn't going to be vastly different from 10 statements which aren't links to other items... and items with a lot of sitelinks only account for a small percentage... would love to try some queries but I really must sleep. maybe tomorrow
[14:48:17] That's fair. Thanks Nikki (re @Nikki: links to other items are just statements, 10 links to other items isn't going to be vastly different from 10 statements which aren't links to other items... and items with a lot of sitelinks only account for a small percentage... would love to try some queries but I really must sleep. maybe tomorrow)
[14:48:52] Most likely my tiny random sample happened to give me smaller-than-average items :p (re @vrandecic: Last time it took me about a day or two just to unzip it... But I also think it was bigger than 350G)
[16:29:22] Tomorrow (in most time zones), October 17 at 19:00 IST, there will be a presentation on Abstract Wikipedia as part of the Indian community's celebration of Wikidata's 8th birthday.
[16:29:22] You're all welcome! There will be plenty of time for questions.
[16:29:22] https://wikidata.org/wiki/Wikidata:Eighth_Birthday/India
[18:29:56] good grief, is Wikidata eight already?
[18:31:04] We all feel old, don't we (re @wmtelegram_bot: good grief, is Wikidata eight already?)
[18:31:54] if you'd asked me how old it was earlier, I'd have said probably three years. :P
[18:32:01] time sure does fly
[18:51:44] Well, most Wikimedia editors wouldn't have needed to interact with Wikidata before ~2014, when sitelinks were removed from all sister projects
[18:51:53] That's still 6 years though
[21:17:00] if there is someone who speaks an indic language well, and will help me for about ten minutes, please ping me
[21:20:16] @mahir256 ^ maybe you can help?
[21:38:12] Yeah? (re @wmtelegram_bot: if there is someone who speaks an indic language well, and will help me for about ten minutes, please ping me)
[21:38:46] (I am based in Illinois if you need to talk later, fyi)
[21:39:07] I would need help translating one slide of a talk for tomorrow. Mind if I DM you?
[21:39:17] Yeah sure go ahead
[21:40:31] Thank you
[22:47:56] resolved, thanks @mahir256