[04:01:46] 10Analytics, 10Pageviews-API, 10Tool-Pageviews: Pageviews agent=bot is always 0 - https://phabricator.wikimedia.org/T197277 (10Tbayer) The database table underlying the API, [[https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly|pageview_hourly]], only has two possible values for a... [06:38:43] 10Analytics, 10Analytics-Cluster: Upgrade Hive to ≥1.13 or ≥2.1 - https://phabricator.wikimedia.org/T203498 (10elukey) >>! In T203498#4566676, @mpopov wrote: > Whoops, realized I was missing a digit in the version. Sorry you are right, I have been tricked by the title! Thanks for amending :) >>! In T203498#4... [07:20:48] PROBLEM - Check the last execution of check_webrequest_partitions on analytics1003 is CRITICAL: CRITICAL: Status of check_webrequest_partitions [07:20:57] ah nice! --^ [07:21:02] this is me playing [07:21:07] I found what it wasn't working [07:21:14] elukey: Hi! Have fun ;) [07:26:07] joal: morning! :) [07:40:27] elukey: not sure if you've noticed - We ran into a firewall issue last fraiday [07:41:38] elukey: the namemapce-dowloader script (a refinery python requesting API for namespaces and writing CSV onto hdfs) borke, preventing us to have up-to-date partitions in hive, and therefore to run mediawiki-history-denormalize [07:42:00] I updated the script to use a proxy, once merged we'll need to update puppet [07:46:56] joal: ah snap! I saw a failure but didn't check what was the root case! [07:46:59] *cause! [07:47:03] (03PS1) 10Joal: Fix mediawiki-history-reduced job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/459496 [07:47:05] did it take a long time to get fixed? [07:47:12] elukey: no no - no bother [07:47:36] elukey: Only thing is it was by luck that I saw the thing failed [07:48:06] elukey: I monitored MWH jobs, and therefore saw that sqoop wass finished but MWH-spark not started, and wondered [07:48:28] elukey: Would be good to have an email (or an alarm) about those failures too [07:48:51] joal: ah wait I thought it was the email to alerts@ that I saw during the weekend [07:49:03] elukey: good candidate for sytemd-timer? Or should we wait for something else since the job is about data-import? [07:49:15] elukey: nope, it was on friday nigfht [07:49:35] elukey: The patch about this weekend alerts is just above --^ [07:49:39] I think that we are ready to use a systemd timer for it! Do you need logging? [07:49:47] that it is the only part still missing [07:49:58] elukey: It would be usefull if feasible [07:50:37] joal: so with the systemd unit we get automatic logging to journald [07:50:45] great :) [07:50:58] RECOVERY - Check the last execution of check_webrequest_partitions on analytics1003 is OK: OK: Status of the systemd unit check_webrequest_partitions [07:51:10] elukey: the cron-command needs to be update too (see https://gerrit.wikimedia.org/r/c/analytics/refinery/+/458860) [07:52:06] (03CR) 10Elukey: [C: 031] Add proxy to namespace-downloader script [analytics/refinery] - 10https://gerrit.wikimedia.org/r/458860 (owner: 10Joal) [07:52:52] joal: let's update the cron script now so we have it set, and then we can replace it later on we are ready [07:52:56] the new alarm can be done with timers [07:53:17] the main issue with journald is that journalctl is not available to all users :) [07:53:30] so I think we can just redirect its output to a file and that's it [07:53:33] in /var/log/etc.. 
[07:53:34] elukey: in case of failure, we'll ask for logs :) [07:53:42] nono it would be super boring [07:53:49] you guys need to be able to check them [07:53:53] elukey: no logs needed except in failure case I think [07:53:57] ok [07:54:40] !log Manually restarting mediawiki-reduced oozie with manual addition of missing parameter [07:54:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:57:18] joal: one thing is changing though, namely that we will not rely anymore on having stdout/stderr separated [07:57:46] they will be logged by journald all together (even if I value having logging separated between stdout/err anyway) [07:58:02] the alarm will come like it happened this morning [07:58:08] then we'll need to check [07:58:09] elukey: yes, makes sense - We'll update the python logging-lib [07:59:00] elukey: as of current config, having stderr+stdout logged together means we'll see double log lines for warning and errors [07:59:45] elukey: We also need to make sure return codes are the expected ones (namely not-0 in failures case) [07:59:45] that it is not ideal yes [08:00:08] exactly, part of the "porting" process will be to check the script and make sure it exits properly [08:00:22] it will be a long process and probably a goal-level one [08:00:26] (03PS2) 10Joal: Fix mediawiki-history-reduced job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/459496 [08:00:37] elukey: --^ also when you have a minutes :) [08:06:02] (03CR) 10Elukey: [C: 031] Fix mediawiki-history-reduced job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/459496 (owner: 10Joal) [08:09:37] 10Analytics, 10DBA, 10Growth-Team, 10Notifications: Purge all Schema:Echo data after 90 days - https://phabricator.wikimedia.org/T128623 (10elukey) Needs to be coordinated between me and @mforns when he is back from vacations. Going to put this task in our Incoming Backlog column to get triaged by my team... [08:10:01] 10Analytics, 10DBA, 10Growth-Team, 10Notifications: Purge all Schema:Echo data after 90 days - https://phabricator.wikimedia.org/T128623 (10elukey) p:05Low>03Triage [08:42:31] joal: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/459508/ [08:43:05] when the sre team approves, you guys should be able to use journalctl to check logs of every systemd unit [08:43:25] let's see if it is ok or not, it would be great not to have to create a log file for every cron [08:46:16] joal: another thing.. should we postpone the maintenance to reboot the analytics100[1-3] hosts? [09:06:29] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Refactor analytics cronjobs to alarm on failure reliably - https://phabricator.wikimedia.org/T172532 (10elukey) a:03elukey [09:09:38] 10Analytics, 10Operations, 10ops-eqiad, 10User-Elukey: rack/setup/install analytics-master100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T201939 (10elukey) @RobH we thought to schedule the maintenance window to swap analytics100[1,2] with analytics-master100[1,2] for Sept 22nd, and I'd like to sen... [09:11:02] 10Analytics, 10Analytics-Kanban: Reboot Analytics hosts for kernel security upgrades - https://phabricator.wikimedia.org/T203165 (10elukey) [09:15:04] elukey: no need to wait IMO [09:15:24] joal: ah ok, so I can drain and then proceed? [09:15:31] elukey: jobs are not 1003-dependent anymore [09:15:40] ? [09:15:52] hey, the researchers I work with have been given analytics-privatedata-users and they can ssh into stat machines, but when they try to use hive/hadoop, hadoop complains about rights. 
could be that puppet hasn't run there yet, but is it possible that hadoop requests "researchers" or "statistics-users" access groups? [09:16:09] the description of those groups is about "number crunching hosts" [09:16:56] Hi gilles - I'm goinna let elukey answer this - He's better aware of groups than I am [09:17:34] gilles: I think puppet didn't run on the hadoop master nodes, where we do the mapping, lemme do it [09:17:50] elukey: thanks [09:18:28] gilles: they should be able to log in now, puppet just created their users [09:18:52] joal: what do you mean with "jobs are not 1003-dependent anymore" ? [09:19:22] elukey: sqoop jobs are run from a script running on 1003, so rebooting it breaks that [09:19:52] joal: ah you meant that now the sqoop part is done so we can reboot [09:19:54] okok got it [09:19:57] elukey: Currently running jobs are not started by a script from 1003, therefore stae is ok [09:20:07] elukey: indeed [09:20:29] elukey: oozie will keep it's state in memory, and other jobs are launched from other places [09:22:50] sure, I'd prefer to drain a bit the cluster since on analytics1001 we'll have to stop the history server, even if for not long [09:23:10] elukey: no prob - Let's stop camus? [09:23:16] already done it :) [09:23:31] gilles: I see a drossi user making hive queries, I guess they are ok :D [09:25:05] joal: very interesting - https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+1.2.1+Release [09:25:11] hive 1.2.1 [09:25:17] hadoop 2.7.3 [09:25:49] 10Analytics, 10Analytics-Kanban: Reboot Analytics hosts for kernel security upgrades - https://phabricator.wikimedia.org/T203165 (10elukey) [09:25:59] elukey: indeed, thanks! :) [09:27:04] elukey: other interesting thing - bigtop contains alluxio :) [09:27:39] elukey: but kafka version is 0.10 :( [09:28:32] ah yes even zookeeper is not good, but we are in the same position now with cdh.. we simply use our own repo packages [09:28:56] right [09:29:03] another project that I didn't know, alluxio [09:29:04] ahahah [09:29:13] elukey: fairly new :) [09:31:19] 10Analytics, 10Analytics-Cluster: Upgrade Hive to ≥1.13 or ≥2.1 - https://phabricator.wikimedia.org/T203498 (10elukey) As FYI, the last release of Big Top doesn't seem bad: https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+1.2.1+Release ``` hive 1.2.1 hadoop 2.... [10:27:52] joal: qq - I would like to take my (long) lunch break in a bit, but there are still a lot of jobs flowing.. I'll be back in ~2h, and I am pretty sure that by that time the cluster will be ready to go. Or I can re-enable now, let it go while I am away, and then restart the procedure in the afternoon [10:28:40] mmm even if I could reboot now an1003 [10:34:23] 10Analytics, 10Analytics-Cluster: Update to CDH 6 or other up-to-date Hadoop distribution - https://phabricator.wikimedia.org/T203693 (10elukey) [10:37:31] re-enabled everything so I will not delay the cluster too much, will restart again this afternoon [10:38:21] * elukey lunch! [11:28:04] 10Analytics, 10Analytics-Kanban: Reboot Analytics hosts for kernel security upgrades - https://phabricator.wikimedia.org/T203165 (10GoranSMilovanovic) @elukey Please: is the cluster reboot, planned for September 10 (today), finished? I really need to run a bunch of Hive jobs (it takes, well, many hours to c... 
[12:47:55] Pchelolo: Hello :) A quick reminder about restbase reboot (if not already done) [12:49:59] !log disable camus as prep step for analytics100[1-3] reboots [12:50:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:50:17] elukey: super sorry I have miswsed the ping earlier :( [12:51:36] super fine! [12:56:37] elukey: One wonder I have - Is ReportUpdater running from analytics1003? [13:00:21] ah this is a good question, lemme check [13:01:41] I think some jobs from stat1005 [13:01:57] and some from 1006 [13:02:20] hm - seems weird to me :) [13:02:40] I thought RU was a single tools, not decentralized - Maybe I misu [13:02:47] nmisunderstood [13:03:55] role::statistics::private [13:04:00] # Run Hadoop/Hive reportupdater jobs here. [13:04:00] include ::profile::reportupdater::jobs::hadoop [13:04:03] this is stat1005 [13:04:13] elukey@stat1005:~$ sudo -u hdfs crontab -l [13:04:23] # Puppet Name: reportupdater_reportupdater-queries-browser [13:04:23] 0 * * * * python /srv/reportupdater/reportupdater/update_reports.py -l info /srv/reportupdater/jobs/reportupdater-queries/browser /srv/reportupdater/output/metrics/browser >> /srv/reportupdater/log/reportupdater-queries-browser.log 2>&1 [13:04:27] # Puppet Name: reportupdater_limn-language-data-interlanguage [13:04:29] 0 * * * * python /srv/reportupdater/reportupdater/update_reports.py -l info /srv/reportupdater/jobs/limn-language-data/interlanguage /srv/reportupdater/output/metrics/interlanguage >> /srv/reportupdater/log/limn-language-data-interlanguage.log 2>&1 [13:05:10] elukey: Just to be sure - This means there is one RU cron per report? [13:05:29] meanwhile for role statistics cruncher (stat1006) [13:05:30] # Reportupdater jobs that get data from MySQL analytics slaves [13:05:31] include ::profile::reportupdater::jobs::mysql [13:05:46] I am fairly ignorant about RU but it seems so yes [13:07:27] joal: any concern about RU? [13:07:49] elukey: wondering about the impact of an1003 reboot [13:07:58] elukey: completely happy with your info :) [13:09:11] elukey: still some oozie jobs, I'm monitoring [13:13:36] joal: today I was wondering about the things to do when we'll swap an1003 to the new hardware [13:14:45] we'll probably have to stop everything that uses the database, dump and migrate it to the new host, run puppet to update everything and restart services gradually [13:15:04] less trivial than swapping the hadoop master but invasive nontheless :) [13:15:15] it would be awesome if we could do it on the 22nd [13:15:38] elukey: sounds correct - Maybe we could setup a DB replication, preventing to have to dump/reimport? [13:15:45] I know andrew did that before [13:16:12] I think it will be less invasive to stop/start, and we'll avoid any inconsistency [13:16:24] no prob for men [13:16:26] me [13:16:28] it should take ~30 mins tops probably [13:16:28] meh [13:27:44] hellooooo teaam :] [13:29:20] Hello mforns!!! [13:29:28] heya joal! [13:32:17] mforns: o/ o/ o/ [13:32:30] heyyy elukey :] [13:36:35] o/////// [13:37:06] helloooo otto! 
[14:05:44] 10Analytics, 10Analytics-Cluster: Update to CDH 6 or other up-to-date Hadoop distribution - https://phabricator.wikimedia.org/T203693 (10elukey) [14:06:06] 10Analytics, 10Analytics-Cluster: Update to CDH 6 or other up-to-date Hadoop distribution - https://phabricator.wikimedia.org/T203693 (10elukey) [14:07:24] 10Analytics, 10Analytics-Cluster: Update to CDH 6 or other up-to-date Hadoop distribution - https://phabricator.wikimedia.org/T203693 (10elukey) p:05Triage>03Normal [14:10:04] 10Analytics, 10Analytics-Cluster: Update to CDH 6 or other up-to-date Hadoop distribution - https://phabricator.wikimedia.org/T203693 (10elukey) [14:10:27] yes I know a bit of spam, sorry :) [14:17:22] hey! I'm back from jury duty and have been catching up on pings and stuff [14:17:26] I miss you all!!! [14:17:40] Hi milimetric :) [14:23:15] (03PS8) 10Ottomata: Add ConfigHelper trait to auto load config files and CLI opts [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415174 (https://phabricator.wikimedia.org/T203804) [14:25:10] (03CR) 10jerkins-bot: [V: 04-1] Add ConfigHelper trait to auto load config files and CLI opts [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415174 (https://phabricator.wikimedia.org/T203804) (owner: 10Ottomata) [14:25:25] joal: thanks for that opt finder code! i adapted it a bit and kept it! [14:25:26] its nice! [14:25:43] ottomata: I'm glad you like it :) [14:26:02] * joal likes to solve problems functional-way :) [14:26:39] I have a funny edge-case in de.wikipedia - mforns, my german speaking colleague, would you have a minute?| [14:26:51] joal, in a meeting... [14:26:59] will ping you after [14:27:00] np mforns - later :) [14:31:03] (03PS9) 10Ottomata: Add ConfigHelper trait to auto load config files and CLI opts [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415174 (https://phabricator.wikimedia.org/T203804) [14:32:14] all right people going to stop hive/oozie and reboot an1003 [14:36:45] ooboy [14:41:51] all right all up [14:41:59] now analytics1001 and 1002 [14:44:08] wait a sec [14:44:08] elukey@analytics1002:~$ sudo -u hdfs /usr/bin/hdfs haadmin -getServiceState analytics1002-eqiad-wmnet [14:44:11] active [14:44:26] but yarn works? [14:45:34] elukey@analytics1001:~$ sudo -u hdfs /usr/bin/hdfs haadmin -getServiceState analytics1001-eqiad-wmnet [14:45:38] standby [14:45:46] oh yarn.wm.org works? [14:45:55] hm! [14:46:58] yes sorry yarn.w.org [14:47:12] maybe 1001 UI gentl [14:47:18] gently redirects to master? [14:47:48] in theory it does that, redirecting to an1002, that should cause a broken page [14:48:03] elukey: could it do it behind the scene? [14:48:11] it has never done it [14:48:32] but we have also moved yarn to another host, even if afaics the apache config is the same [14:48:35] oh [14:48:40] elukey: you are querying hdfs [14:48:42] not resourcemanager [14:49:06] i think 1001 is still yarn active? 
[14:49:34] ah snap you are right, pebkac [14:49:39] okok now things make sense [14:49:42] I was kinda scared :) [14:49:43] thanks [14:50:12] all right then I have only to failover one daemon :P [14:50:16] rebooting an1001 [14:51:13] (03PS3) 10Ottomata: Use ConfigHelper for RefineMonitor instead of scopt [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/458862 (https://phabricator.wikimedia.org/T203804) [14:51:33] ok now yarn doesn't work [14:51:35] goooood [15:00:33] gilles: please make sure researchers you work with are subscribed to analytics@ e-mail list , we are doing maintenance on cluster (announced on list) and today might not be the best day to try to access [15:01:13] hi fdans and nuria [15:01:37] aharoni: on meeting sorry, we can talk in 1 hr [15:01:49] OK [15:01:53] ping milimetric [15:02:21] hi milimetric :) [15:02:25] about https://phabricator.wikimedia.org/T203516 [15:02:31] I'm trying to install Dashiki [15:02:41] nuria: done [15:02:42] hi aharoni, we're in meetings now, will be looking at that soon [15:02:50] Oh, all of you :) no problem [15:04:11] ping fdans [15:04:12] hello? [15:07:12] 10Analytics, 10Analytics-Dashiki, 10Analytics-Kanban, 10CX-analytics, 10Language-2018-July-September: Setup Config:Dashiki:CX2Translations as a public chart and update the Dashiki documentation accordingly - https://phabricator.wikimedia.org/T203516 (10Amire80) >>! In T203516#4571213, @fdans wrote: Also... [15:57:06] elukey: when you have time, I have a PR in ops for you :) [15:58:36] joal: you missed my standup update, i have code reviews ready for you too! [15:58:38] joal: you wanted to talk about the new metric/community reaction? [15:58:42] I'm still in cave [15:58:48] confighelper and refinemonitor ready to go [15:58:49] joal, can I help with the de.wikipedia thing? [15:59:00] wow - that's a lot in a few lines :) [15:59:22] ottomata: Review tabs are open, will do them tonight :0 [15:59:24] :) [15:59:29] milimetric: Joining the cave ! [15:59:37] mforns: after talk with milimetric ? [15:59:44] joal, sure, ping :] [15:59:48] Thanks mforns :) [16:00:50] joal: already seen it, good to merge [16:00:50] ? [16:00:55] +1 elukey [16:01:14] team, are we grosking today? I missed it if we're supposed to meet in 10 minutes.. [16:01:59] 10Analytics, 10Analytics-Wikistats: Negative total number of bytes for German Wikipedia in 2001? - https://phabricator.wikimedia.org/T203906 (10Nuria) Negative bytes are due to deletion of revisions/pages. is that your question? [16:02:03] elukey: should we mention systemd timers for analytics in ops meeting [16:02:15] ...and ask if it would be something to move to more general ops puppet? [16:02:27] or maybe wait til we have a few more deployed and stable? [16:02:38] 10Analytics, 10Pageviews-API, 10Wikipedia-iOS-App-Backlog, 10iOS-app-Bugs, 10iOS-app-v6.1-Narwhal-On-A-Bumper-Car: Large increase on 404s from the Wikipedia IOS app - https://phabricator.wikimedia.org/T203688 (10Nuria) p:05High>03Triage [16:02:41] ottomata: I'd prefer the second, Moritz is aware of what we are doing [16:02:58] and also Brooke (we are the only ones interested atm :) [16:03:20] k coo [16:06:17] joal: usual test? depool aqs1004, test, repool and then apply to all? [16:06:29] sounds good elukey !! 
[16:08:20] joal: aqs1004 is ready for you :) [16:10:32] 10Analytics, 10Analytics-Kanban: Reboot Analytics hosts for kernel security upgrades - https://phabricator.wikimedia.org/T203165 (10elukey) [16:12:05] 10Analytics, 10Analytics-Kanban: Reboot Analytics hosts for kernel security upgrades - https://phabricator.wikimedia.org/T203165 (10GoranSMilovanovic) @elukey Thanks for the update (e-mail) on the cluster reboot. Please: you will not be rebooting `stat100[4-6] hosts` in the following 30 hours or so? Please sa... [16:15:30] 10Analytics, 10Analytics-Kanban: Reboot Analytics hosts for kernel security upgrades - https://phabricator.wikimedia.org/T203165 (10elukey) >>! In T203165#4571554, @GoranSMilovanovic wrote: > @elukey Thanks for the update (e-mail) on the cluster reboot. > > Please: you will not be rebooting `stat100[4-6] host... [16:16:45] 10Analytics, 10Readers-Web-Backlog, 10Wikimedia-Site-requests, 10MobileFrontend (MobileFrontend.js): Turn on MinervaErrorLogSamplingRate (Schema:WebClientError) - https://phabricator.wikimedia.org/T203814 (10Nuria) @Jdlrobson +1 to @Ottomata 's suggestion. I do not think sending this schema to MySQL is a... [16:18:53] elukey: sorry was talking with milimetric [16:18:59] elukey: good for me !!! [16:21:50] 10Analytics, 10Readers-Web-Backlog, 10Wikimedia-Site-requests, 10MobileFrontend (MobileFrontend.js): Turn on MinervaErrorLogSamplingRate (Schema:WebClientError) - https://phabricator.wikimedia.org/T203814 (10Nuria) >. In such an event if it was detected we'd turn the sampling rate down to 0 until the probl... [16:25:22] joal: good - repooling it [16:26:08] !log restarting eventlogging-processors to pick up blacklist of WebClientError schema for MySQL - T203814 [16:26:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:26:11] T203814: Turn on MinervaErrorLogSamplingRate (Schema:WebClientError) - https://phabricator.wikimedia.org/T203814 [16:26:24] 10Analytics, 10Community-consensus-needed: Decide whether enable per-editor edits stats (community decision) - https://phabricator.wikimedia.org/T203826 (10JAllemandou) >>! In T203826#4569399, @Nuria wrote: > The data is not public as there are no public stats of edits per country. Your edit history does not i... [16:26:46] 10Analytics, 10Readers-Web-Backlog, 10Wikimedia-Site-requests, 10MobileFrontend (MobileFrontend.js), 10Patch-For-Review: Turn on MinervaErrorLogSamplingRate (Schema:WebClientError) - https://phabricator.wikimedia.org/T203814 (10Ottomata) > Let's please postpone any sampling rate changes until this schema... [16:28:08] 10Analytics, 10Community-consensus-needed: Decide whether enable per-editor edits stats (community decision) - https://phabricator.wikimedia.org/T203826 (10Nuria) Ah, my mistake totally, wrong ticket for my reply. [16:29:56] 10Analytics, 10DBA, 10Growth-Team, 10Notifications: Purge all Schema:Echo data after 90 days - https://phabricator.wikimedia.org/T128623 (10Nuria) Let's 1) stop purging 2) drop all echo tables on events and events sanitized database 3) start purging again [16:32:15] o/ [16:32:35] If I asked you the question how many edits have been made on wikidata in the past 24 hours, how would you answer it? [16:35:01] addshore: pulling from kafka seems easiest [16:35:10] need to go for diner team, will be back after [16:35:25] pulling from kafka, hmmm [16:35:26] addshore: in ~1h30 I'll be back to discuss that, or tomorrow :) [16:35:34] okay! 
:P [16:35:36] hmm, naw you can query it in hive [16:35:42] table [16:35:43] ottomata: for the last 24 hours? [16:35:49] 10Analytics, 10Analytics-Kanban: Reboot Analytics hosts for kernel security upgrades - https://phabricator.wikimedia.org/T203165 (10GoranSMilovanovic) @elukey Thanks! Thursday EU morning time should be fine with me: I am running a **huge** update of the [[ http://wdcm.wmflabs.org/ | Wikidata Concepts Monitor... [16:35:52] event.mediawiki_revision_create [16:36:07] 10Analytics, 10Community-consensus-needed: Decide whether enable per-editor edits stats (community decision) - https://phabricator.wikimedia.org/T203826 (10Johan) As @Cirdan and @MusikAnimal say, I think this could vary a lot from community to community. Anyone //could// present this information by scraping it... [16:36:08] addshore: data there should be pretty fresh [16:36:12] last hour or so might not be complete [16:36:17] do you need up to date right now? [16:36:24] or can you do say 2 hours ago - 26 hours ago? [16:37:32] well, im comparing it with data for the last 24 hours [16:37:37] but i could change my comparison [16:37:53] I actually think there is an issue with https://grafana.wikimedia.org/dashboard/db/wikidata-edits hmmm [16:37:55] you could get it up to date from kafka as joal says [16:37:57] *takes a screenshot* [16:39:05] https://usercontent.irccloud-cdn.com/file/2e6aoEQO/image.png [16:39:07] highlighted in that ^^ [16:39:18] joal: aqs should be updated now :) [16:39:29] this comes from polling recent changes currently, but I think the issue on the dashboard isn't actually an issue with the data, but with something else.... [16:40:05] the graph itself is accurate and the same in both panels, but the totals for some reason are just wrong.... [16:42:24] addshore: something like: [16:42:25] select count(*) [16:42:25] from mediawiki_revision_create [16:42:25] where `database` = "wikidatawiki" [16:42:25] and meta.dt between "2018-09-09T02:00Z" and "2018-09-10T02:00Z" [16:42:25] and year=2018 and month=9 and (day=9 or day=10) [16:42:30] that's n ot quite last 24 hours [16:42:33] but you get the point :) [16:44:00] wait, where is that query made for? mediawiki_revision_create ? [16:44:03] hive [16:44:05] event databse [16:44:10] *looks* [16:44:11] event.mediawiki_revision_create [16:44:15] btw, do you have access to superset? [16:44:16] i forget? [16:44:19] ooooooohhhh [16:44:26] i havn't seen the event db yet [16:44:28] thats awesome [16:44:34] addshore: that's where all eventlogging goes [16:44:36] as well as eventbus events [16:44:46] oh wait, event is event logging, gotcha [16:44:49] these ones https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema [16:44:49] yup [16:45:00] recentchange is there too [16:45:08] page_create, page_delete [16:45:09] et. [16:45:19] I didn't think about looking in event logging! [16:45:24] you do have superset! [16:45:30] * addshore looks at superset [16:45:50] https://superset.wikimedia.org/superset/sqllab?id=2 [16:45:51] ^ [16:45:53] addshore: [16:46:30] *loooks* [16:47:07] I dont think I have been in super set before, it wasn't in my list of ldap things in my password manager anyway... [16:47:25] so, that query says around 702k, thanks! [16:47:28] yup! [16:48:28] 702k a day, heh, average of 480 per min [16:48:46] the totals in the grafana dashboard are just totally wrong..... 
https://grafana.wikimedia.org/dashboard/db/wikidata-edits [16:48:55] one says 768k in a week [16:49:03] addshore: oh man you can get average at any given time with kafka so easily [16:49:03] the other says 418k in a week [16:49:04] wtf [16:49:07] want to try it/??!?! [16:49:07] haha [16:49:12] from stat1005 or wherever: [16:50:02] addshore: it is not eventlogging per se but rather just plain out "events" logged from mediawiki [16:50:17] addshore: same db than eventlogging events end up at [16:50:51] aaaah, okay, so the events via event bus & kafka? [16:50:53] kafkacat -b kafka-jumbo1001.eqiad.wmnet:9092 -t eqiad.mediawiki.revision-create 2>/dev/null | grep '"database": "wikidatawiki"' | pv -l > /dev/null [16:51:20] that'll print the current per second rate for hte last second every second [16:51:25] i guess that's not an average [16:51:28] ok better in grafana: [16:51:35] I figured out what is wrong with the dashboard, it has something to do with graphite not returning all of the data points due to a limit. silly grafana, if I set the limit to 999999999999 then I see 4 million edits over the past week [16:52:14] oh wait, can't filter on database in grafana [16:52:15] nm [16:52:58] grafana automaticly sets the max data points to 1 data point per pixel, so if your looking at too much data / it is too fine grained, then you just dont see all of the data and things like totals make no sense at all... [16:54:50] I might write a blog post about how stupid that is [16:55:08] ottomata: can't filter on database? no, we only do it for wikidata :) [16:57:02] addshore: i just mean i was going to send you a grafana to events / sec but i forgot we needed wikidatawiki only [16:57:18] aaah yes :) [16:57:31] I found the edits globally per min on grafana :) [17:08:05] * elukey off! [17:09:16] 10Analytics, 10DBA, 10Growth-Team, 10Notifications: Purge all Schema:Echo data after 90 days - https://phabricator.wikimedia.org/T128623 (10mforns) > Let's 1) stop purging 2) drop all echo tables on events and events sanitized database 3) start purging again Makes sense. Also MySQL right, or are those alr... [17:21:58] 10Analytics, 10DBA, 10Growth-Team, 10Notifications: Purge all Schema:Echo data after 90 days - https://phabricator.wikimedia.org/T128623 (10Nuria) Looks all tables in mySQL db also need to be deleted. [17:22:57] 10Analytics-Kanban, 10Outreach-Programs-Projects, 10Google-Summer-of-Code (2018): [Analytics] Improvements to Wikistats2 front-end - https://phabricator.wikimedia.org/T189210 (10mforns) 05Open>03Resolved Resolved! Kudos to Sahil [17:27:04] 10Analytics, 10Analytics-Wikistats: Negative total number of bytes for German Wikipedia in 2001? - https://phabricator.wikimedia.org/T203906 (10JAllemandou) My understanding of the question is: How come is it possible that the global-sum of net-bytes since the beginning of de.wikipedia.org can be negative? My... [17:27:36] (03PS4) 10Ottomata: Use ConfigHelper for RefineMonitor instead of scopt [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/458862 (https://phabricator.wikimedia.org/T203804) [17:28:07] mforns: forgot to ping you before going to diner - I actually didn't need your help :) [17:28:19] * joal hopes mforns has not been waiting ... [17:28:33] hey! 
[17:28:39] no, been doing stufffff [17:28:41] :] [17:28:46] I can imagine :) [17:42:59] 10Analytics, 10Analytics-Cluster: Upgrade Hive to ≥1.13 or ≥2.1 - https://phabricator.wikimedia.org/T203498 (10Neil_P._Quinn_WMF) [17:47:44] ottomata: joal if you are at all interested in the confusion I just had I wrote https://addshore.com/2018/09/grafana-graphite-and-maxdatapoints-confusion-for-totals/ :) [17:48:59] Thanks addshore, wiil read :) [17:49:41] 10Analytics, 10Analytics-Cluster: Upgrade Hive to ≥1.13 or ≥2.1 - https://phabricator.wikimedia.org/T203498 (10Neil_P._Quinn_WMF) @mpopov, I'm actually confused now. I'm looking at [Hive downloads page](https://hive.apache.org/downloads.html), which has the best version history I could find, and neither Hive 1... [18:19:09] (03CR) 10Joal: "Comments inline. Let's discuss on IRC for some of them:)" (037 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415174 (https://phabricator.wikimedia.org/T203804) (owner: 10Ottomata) [18:20:20] o/ [18:20:23] Heya ! [18:20:36] As you'll see, some minor nits, some less :) [18:21:12] And reviewing the second patch makes me think of a new comment for the first ! [18:22:04] haha ok! [18:22:12] very small though :) [18:22:16] i get some of your recurssion comments, let's talk about those last [18:22:27] sue [18:22:31] configureImpl vs configureArgsImpl tho [18:22:55] i wanted to expose an function that allowed you to not pass in the Array of config files [18:23:03] if i could have a single funciton, i would [18:23:07] but i can't use defaults with macros [18:23:09] ottomata: missing comment is: since params are extracted with underscores, let's default the config-file to config_file ? [18:23:14] and i can't seem to use overloading either [18:23:37] joal: the only reason for the underscores is because properties are more likely to use them [18:23:44] they could use hyphens too [18:23:52] oh [18:23:59] except for variable names in case class [18:24:10] we'd ahve to backtick them all if we wanted hyphens [18:24:12] and everything (almost) will be defined as variable [18:24:15] i'd be ok with config_file [18:24:29] but it was kinda nice seeing as it isn't actually a property or a variable in a case class [18:24:33] makes our life simpler (underscores for the win) [18:24:34] to use a more conventional CLI opt [18:24:40] but yeah, maybe better to be consistent [18:24:59] ok fine i'll do config_file :p [18:25:01] --config_file [18:25:05] so [18:25:07] Imple [18:25:08] Impl* [18:25:14] not sure i undersand your comment [18:25:18] I hear that - coherence vs correction :) [18:25:20] but what I want to be able to do is both: [18:25:25] configure(args) [18:25:27] * ebernhardson cries at mixing _ and - [18:25:31] :D [18:25:57] configure(files, args) (or args, files, whatever) [18:26:10] ebernhardson: generally i agree, but i've actually found cases where it makes sense! [18:26:14] e.g. in snake_case [18:26:22] _ is not a concept separator! [18:26:25] its a word separator [18:26:43] so sometimes, its really hard to indicate groupings of concepts [18:27:24] anyway, an aside! [18:27:33] ottomata: we could try to make configureArgs a function calling configure? [18:27:45] you can try joal, but the compiler will complain! [18:27:50] because in configure T is not known [18:27:50] heh, yea i don't mean to get all off on tangents it's just a small pet peeve :) [18:27:52] really? 
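A minimal sketch of the two-entry-point approach being discussed above (a configure taking config files plus CLI args, and a configureArgs taking args only); this is not the actual refinery ConfigHelper code, and the names, signatures and elided bodies here are illustrative assumptions. Macro defs do not support default arguments, and the messages that follow show why the configureArgs macro def cannot simply call configure either, so any sharing has to happen between the macro implementations themselves:

    import scala.language.experimental.macros
    import scala.reflect.macros.blackbox

    object ConfigHelperMacros {
        // Hypothetical impl: would build an instance of T from parsed .properties
        // files plus CLI overrides; body elided in this sketch.
        def configureImpl[T: c.WeakTypeTag](c: blackbox.Context)
                (files: c.Expr[Seq[String]], args: c.Expr[Array[String]]): c.Expr[T] = ???

        // The args-only impl can delegate to configureImpl directly, since both are
        // ordinary methods at this level (no macro expansion is involved yet).
        def configureArgsImpl[T: c.WeakTypeTag](c: blackbox.Context)
                (args: c.Expr[Array[String]]): c.Expr[T] =
            configureImpl[T](c)(c.universe.reify(Seq.empty[String]), args)
    }

    trait ConfigHelper {
        // Two separately named macro defs instead of one overloaded/defaulted method.
        def configure[T](files: Seq[String], args: Array[String]): T =
            macro ConfigHelperMacros.configureImpl[T]
        def configureArgs[T](args: Array[String]): T =
            macro ConfigHelperMacros.configureArgsImpl[T]
    }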
[18:27:56] let me try again [18:27:59] i tried so many ways [18:28:09] hm [18:29:19] ok i get this: [18:29:22] Error:(150, 21) macro implementation not found: configure [18:29:22] (the most common reason for that is that you cannot use macro implementations in the same compilation run that defines them) [18:29:22] configure[T](Array.empty, args) [18:30:19] hmm, maybe if i move object ConfigHelperMacros into its own file.. [18:31:24] nope same problem [18:35:26] ottomata: trying quickly [18:35:56] joal: i think it has somethign to do with multiple compiles needed [18:35:58] reading [18:36:03] hm [18:36:03] lhttps://docs.scala-lang.org/overviews/macros/overview.html [18:36:40] " The separate compilation restriction requires macros to be placed in a separate project. [18:36:40] " [18:54:28] 10Analytics, 10Operations, 10ops-eqiad: rack/setup/install stat1007.eqiad.wmnet (stat1005 user replacement) - https://phabricator.wikimedia.org/T203852 (10RobH) [18:55:51] ottomata: looks like the thing to do if we want to avoid the problem is to create a separate module for macros :( [18:55:57] :( [18:56:35] joal we could put it in refinery spark? hmm, no, we'd need it lower than core [18:56:37] yargh [18:56:37] or [18:56:37] ottomata: Let's keep them as 2 macros with a comment explaining why [18:56:40] hm [18:56:46] we could put the Macros object in core [18:56:50] and put the Helper in spark [18:56:50] ? [18:56:51] meh [18:57:19] joal btw, i'm playing with spark structured streaming [18:57:21] pretty cooool [18:57:27] :) [18:57:48] to figure out what schemas we write to mysql [18:57:50] Dataframe querying over streams :) [18:58:02] yeah the in memory output format is awesome [18:58:11] you write the streaming results to an in memory table [18:58:13] then query it with sql [18:58:14] !!! [18:58:37] I didn't know the way :) [18:59:20] val schemaCounts = schemas.groupBy("schema").count() [18:59:23] val q = schemaCounts.writeStream.queryName("schema_counts").outputMode("complete").format("memory").start() [18:59:26] spark.sql("select * from schema_counts order by count desc").show() [18:59:35] i can run that last one at any moment and get the most up to date result [18:59:56] :) [19:00:03] back to ConfigHelper [19:00:14] ottomata: with that in mind, let's imagine ACID on top then ;) [19:00:15] you think we should try to find a new home for the stuff? or just do the two macros? [19:00:18] ok back to Config [19:00:18] haha :) [19:03:48] ottomata: seems simpler to have 2 macros for now [19:03:51] I think [19:03:58] Still tring some stuff, but no luck [19:04:01] ok [19:07:27] joal: i don't understand your second comment in extractOpts [19:07:36] about matching only case Array(singleOpt) [19:07:49] don't i need the _* to match the case where the args array still has remaning elements? [19:07:50] e.g. [19:08:05] --config-file=myconf.properties --more opts --here [19:08:05] ? [19:08:07] ottomata - Let;s batcave :) [19:08:09] ok! [19:42:57] 10Analytics: Flip blacklist for MySQL eventlogging consumer to be a whilelist of allowed schemas - https://phabricator.wikimedia.org/T203596 (10Nuria) a:03Ottomata [19:43:45] 10Analytics: Flip blacklist for MySQL eventlogging consumer to be a whilelist of allowed schemas - https://phabricator.wikimedia.org/T203596 (10Nuria) Let's message analytics@ list when we get this work started. [19:46:31] joal: did we deployed the AQS code for "per-family" additive metrics? 
[19:47:29] nope [19:47:37] nuria: nope, I don't think so [19:48:15] Pchelolo: Hello - Bump again on restbase deploy :) [19:48:21] 10Analytics, 10Fundraising-Analysis: CentralNoticeImpression occasionaly fails validation on device enum field - https://phabricator.wikimedia.org/T203597 (10Nuria) [19:48:25] hi joal [19:48:52] you mean for this https://github.com/wikimedia/restbase/pull/1058 ? [19:48:58] ok, will do today [19:49:11] 10Analytics: Sqoop e-mail is emailing errors in try1 for actions that suceeed in try 3 - https://phabricator.wikimedia.org/T203811 (10Nuria) [19:49:18] correct Pchelolo I mean that - No real rush Pchelolo - Was just doing as I said [19:49:36] Pchelolo: Can be this week, but would be good not to postpone too much :) [19:50:31] ok... We don't really have any other changes in the queue for deploying... Maybe postpone till tomorrow when we get some more stuff queued up? [19:50:45] no problem Pchelolo :) [19:51:18] ok, thanks. Deploying is long now, w have some bug that affects it, so better have less of them [19:51:19] 10Analytics-Kanban: Drop old mediawiki_history_reduced snapshots - https://phabricator.wikimedia.org/T197888 (10Nuria) [19:51:27] for sure [19:56:49] (03PS10) 10Ottomata: Add ConfigHelper trait to auto load config files and CLI opts [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415174 (https://phabricator.wikimedia.org/T203804) [19:57:03] (03CR) 10Ottomata: Add ConfigHelper trait to auto load config files and CLI opts (035 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415174 (https://phabricator.wikimedia.org/T203804) (owner: 10Ottomata) [19:57:13] comments addressed joal! :) [19:57:29] reviewing :) [20:03:05] One thing ottomata - You should change the names of the parameters in the recursion: the original parameters use _ (unmatched_values), the new one camelCase (unmatchedValues) - This is super not nice :) [20:03:13] Except from that, good for me [20:06:09] 10Analytics-Kanban: Drop old mediawiki_history_reduced snapshots - https://phabricator.wikimedia.org/T197888 (10Nuria) [20:08:02] 10Analytics, 10Community-consensus-needed: Decide whether enable per-editor edits stats (community decision) - https://phabricator.wikimedia.org/T203826 (10Milimetric) The suggestion to collect use cases is great, thank you for that. I think X!Tool could use this data so it doesn't have to crunch it itself, b... [20:08:40] k [20:10:29] 10Analytics-Kanban, 10Outreach-Programs-Projects, 10Google-Summer-of-Code (2018): [Analytics] Improvements to Wikistats2 front-end - https://phabricator.wikimedia.org/T189210 (10Milimetric) @sahil505 I was a little busy at the end of your GSoC session to say this, but I wanted to make sure it's somewhere pub... [20:14:51] gone for tonight team - see you tomorrow :) [20:36:22] 10Analytics: turnilo x axis improperly labeled - https://phabricator.wikimedia.org/T197276 (10Milimetric) We got lucky and the Turnilo folks fixed this upstream: https://github.com/allegro/turnilo/pull/173 So now we just have to upgrade to 1.8.0, which is released as of last week. I'll go bug people with the r... [20:37:23] hey ottomata, that issue above is solved by Turnilo upstream, can we upgrade to 1.8? https://github.com/allegro/turnilo/pull/173 [20:51:32] 10Analytics, 10Analytics-Kanban: Flip blacklist for MySQL eventlogging consumer to be a whilelist of allowed schemas - https://phabricator.wikimedia.org/T203596 (10Ottomata) [20:56:51] milimetric: sure! 
[20:59:02] sweeeeeeeeeeeeeet [20:59:04] :) [21:05:49] (03PS1) 10Milimetric: Add cx2 dashboard [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/459637 [21:06:09] 10Analytics, 10Analytics-Kanban: Flip blacklist for MySQL eventlogging consumer to be a whilelist of allowed schemas - https://phabricator.wikimedia.org/T203596 (10Ottomata) Alright! First we need a list of active schemas that are not blacklisted. Those will all go to the eventlogging-valid-mixed topic. F... [21:08:07] (03CR) 10Milimetric: [V: 032 C: 032] "deployed at http://language-reportcard.wmflabs.org/cx2" [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/459637 (owner: 10Milimetric) [21:08:22] 10Analytics, 10Analytics-Dashiki, 10Analytics-Kanban, 10CX-analytics, 10Language-2018-July-September: Setup Config:Dashiki:CX2Translations as a public chart and update the Dashiki documentation accordingly - https://phabricator.wikimedia.org/T203516 (10Milimetric) @Amire80, Francisco's right about the se... [21:11:45] 10Analytics, 10Analytics-Wikistats: Wikistats Bug - differences in view stats for smaller wikipedias - https://phabricator.wikimedia.org/T188613 (10Milimetric) Thanks for that, @Tbayer, I'm not sure at all what could be the inconsistency, but I've only looked for trivial obvious problems so far. I'll continue... [22:03:06] 10Analytics, 10Readers-Web-Backlog, 10Wikimedia-Site-requests, 10MobileFrontend (MobileFrontend.js), 10Patch-For-Review: Turn on MinervaErrorLogSamplingRate (Schema:WebClientError) - https://phabricator.wikimedia.org/T203814 (10Jdlrobson) > EL is really not the best tool to do error logging Completely ag...
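A hedged sketch of how the schema inventory mentioned in T203596 above could be collected, reusing the Spark structured streaming "memory" sink approach shown earlier in the day; the broker, the topic name and the assumption that each event's schema name sits at $.schema in the JSON value are taken from the conversation and not verified here:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, get_json_object}

    val spark = SparkSession.builder.appName("schema-inventory").getOrCreate()

    // Read the mixed EventLogging topic and pull out each event's schema name.
    val schemas = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka-jumbo1001.eqiad.wmnet:9092")
      .option("subscribe", "eventlogging-valid-mixed")
      .load()
      .select(get_json_object(col("value").cast("string"), "$.schema").as("schema"))

    // Keep a running count per schema in an in-memory table, queryable at any time.
    val query = schemas.groupBy("schema").count()
      .writeStream
      .queryName("schema_counts")
      .outputMode("complete")
      .format("memory")
      .start()

    // After letting it run for a while:
    spark.sql("select * from schema_counts order by count desc").show(200, false)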