[14:26:21] hi channel
[14:27:03] does anyone here know how to count the total number of links in a wiki efficiently? I'm trying to count from the pagelinks table, but it is incredibly slow
[16:29:31] dsaez: a Wikimedia wiki? and do you want unique links or what
[16:31:03] Nemo_bis: yes, a Wikimedia wiki, like enwiki. I want to count the number of links, as a complement to the number of pages
[16:33:23] 1024612241 rows on enwiki, according to EXPLAIN
[16:33:40] * Nemo_bis downloads http://ftp.acc.umu.se/mirror/wikimedia.org/dumps/itwiki/20180601/itwiki-20180601-pagelinks.sql.gz
[16:39:45] zgrep -Eo '\),\(' itwiki-20180601-pagelinks.sql.gz | wc -l
[16:39:46] 193983181
[16:39:58] if you want something very brutal, this approach takes a few seconds ^^
[17:02:25] Nemo_bis, thanks, this is a good approximation
[17:22:11] dsaez: do you have some time tomorrow to talk with sgoel and me about Thanks? It would be great to brainstorm a bit and have your input.
[17:24:08] lzia, yes, as early as possible
[17:24:35] lzia, sgoel, or if you have time now, we can also do it.
[17:26:05] dsaez: I need to go to one of these consulates to pick up my passport now (last-minute event as usual), but if you and sgoel want to jump on a call now and have time for it, please do it.
[17:26:32] dsaez: I'm free
[17:26:55] * lzia signs out.
[17:28:03] sgoel, give me 2 mins.
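The zgrep one-liner counts the `),(` separators between value tuples, so it undercounts by one row per INSERT statement and can also match that character sequence inside a quoted page title; that is why it is an approximation. A slightly more careful sketch, assuming the dump uses the usual mysqldump layout (one `INSERT INTO ... VALUES (...),(...);` statement per line, strings in single quotes with backslash escapes), streams the gzipped file and counts only top-level tuples while skipping quoted text. The function names `count_tuples_in_line` and `count_dump_rows` are hypothetical helpers, not part of any Wikimedia tool.

```python
import gzip


def count_tuples_in_line(line: str) -> int:
    """Count top-level (...) value tuples in one INSERT statement line.

    Assumes mysqldump-style quoting: strings are wrapped in single quotes
    and special characters inside them are escaped with a backslash.
    """
    values_at = line.find(" VALUES ")
    if not line.startswith("INSERT INTO") or values_at == -1:
        return 0  # comments, CREATE TABLE, SET statements, etc.
    count = 0
    depth = 0
    in_string = False
    escaped = False
    for ch in line[values_at:]:
        if in_string:
            # Inside a quoted title: only track escapes and the closing quote.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == "'":
                in_string = False
        elif ch == "'":
            in_string = True
        elif ch == "(":
            if depth == 0:
                count += 1  # a new row starts here
            depth += 1
        elif ch == ")":
            depth -= 1
    return count


def count_dump_rows(path: str) -> int:
    """Stream a gzipped SQL dump and sum the tuples of every INSERT line."""
    total = 0
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as dump:
        for line in dump:
            total += count_tuples_in_line(line)
    return total
```

Usage would be something like `print(count_dump_rows("itwiki-20180601-pagelinks.sql.gz"))`. A character-by-character loop in Python is much slower than zgrep, so for a quick estimate a middle ground is `line.count("),(") + 1` per INSERT line, which fixes the per-statement undercount but not the quoted-title false matches.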
[17:28:22] dsaez: absolutely
[18:22:04] Quarry, Cloud-Services: GoogleDocs bot has download 125 000 csv exports in the last month - https://phabricator.wikimedia.org/T197256#4283696 (Framawiki)
[18:40:24] Quarry: Ask python scripts to use custom user agents - https://phabricator.wikimedia.org/T197258#4283738 (Framawiki)
[18:51:28] Quarry: Ask python scripts to use custom user agents - https://phabricator.wikimedia.org/T197258#4283791 (Framawiki)
[18:59:22] Quarry: Ask python scripts to use custom user agents - https://phabricator.wikimedia.org/T197258#4283799 (Framawiki) % of python requests:
```
60797 42.92% GET HTTP/1.1 /
40367 28.50% GET HTTP/1.1 /login?next=/query/new
20240 14.29% GET HTTP/1.1 /query/new
20214 14.27% GET HTTP/1.1 /query/runs/all
```...
[19:26:35] bmansurov: Hi!
[19:26:44] joal: o/
[19:26:55] bmansurov: A quick note on notebook usage with Spark
[19:27:22] bmansurov: I know it's not nice, but it's preferable to stop the notebooks once you're not using them
[19:27:38] If not, the background Spark job is not killed
[19:27:50] (I've noticed 2 long-lived processes on the cluster)
[19:28:02] joal: got it, I'll stop mine from now on
[19:28:13] bmansurov: no big deal, just saying :)
[19:28:29] thanks :)
[19:28:36] joal: ok, thanks for letting me know. I read about it on Wikitech, but forgot.
[19:29:12] As I said, no worries :) I really prefer you using the cluster and me pinging once in a while if I notice something :)
[19:29:52] ok cool
[19:33:19] i would forget too!
[19:45:05] I "forgot" ;)
[23:23:31] sgoel: I'm back. The German consulate decided that they can't give my passport to USPS, which meant I had to drive all the way to SF to pick it up. :( Then I had to spend some time finalizing my docs for the South Africa visa. A good day was spent figuring out how to cross borders. :( oh well! :D
[23:24:33] sgoel: let me know if you want to jump in a Meet. We can also talk tomorrow.
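joal's point is that an idle notebook keeps its Spark application alive on the cluster. One way to make the cleanup hard to forget, sketched here under the assumption that the notebook uses PySpark's `SparkSession.builder` (the helper name `spark_session` is hypothetical, not part of PySpark or the Wikimedia setup), is a context manager that always calls `stop()` on the session when the work is done, even if a cell raises an error.

```python
from contextlib import contextmanager


@contextmanager
def spark_session(builder):
    """Yield a Spark session and always stop it when the block exits.

    `builder` is assumed to behave like pyspark.sql.SparkSession.builder:
    getOrCreate() returns a session object that exposes stop().
    """
    spark = builder.getOrCreate()
    try:
        yield spark
    finally:
        # Stopping the session shuts down its SparkContext, which ends the
        # application instead of leaving it running in the background.
        spark.stop()
```

In a notebook this might look like `with spark_session(SparkSession.builder.appName("my-analysis")) as spark: ...`. Calling `spark.stop()` in a final cell, or shutting down the notebook kernel, has the same effect; the context manager just ties the cleanup to the end of the block.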