[03:06:29] 10Analytics, 10EventBus, 10MediaWiki-Database, 10WMF-JobQueue, and 3 others: Wikimedia\Rdbms\LoadBalancer::{closure}: found writes pending - https://phabricator.wikimedia.org/T191282 (10Krinkle) 05Open>03Resolved a:03Krinkle There are indeed a few hits still, but this has become an tracking task for... [05:25:27] 10Analytics, 10Analytics-Dashiki, 10CX-analytics, 10Language-2018-July-September: Setup Config:Dashiki:CX2Translations as a public chart and update the Dashiki documentation accordingly - https://phabricator.wikimedia.org/T203516 (10Amire80) [06:16:11] morning! [06:16:15] a lot of jobs failed [06:22:49] so for example, if I followed the chain correctly, this is the misc (why do we still have misc??) map attempts that failed in generate_sequence_statistics [06:22:52] https://yarn.wikimedia.org/jobhistory/task/task_1531216937660_190149_m_000000 [06:23:08] they all show Container [..] is running beyond physical memory limits. [06:24:06] I am re-running the two failed jobs to see [06:30:07] about the unexpected page view values, I can see that those have already been merged [06:31:13] ah but probably not uploaded to HDFS [06:31:18] also there are missing tabs [06:32:45] (03PS1) 10Elukey: pageview whitelist: replace spaces with tabs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/458111 [06:33:31] (03CR) 10Elukey: [V: 032 C: 032] pageview whitelist: replace spaces with tabs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/458111 (owner: 10Elukey) [06:35:03] !log upload new pageview whitelist to hdfs [06:35:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [06:37:23] !log re-run webrequest-load-wf-misc-2018-9-5-2 and webrequest-load-wf-upload-2018-9-4-19 via Hue [06:37:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:02:23] !log restart oozie on analytics1003 to pick up new smtp settings [07:02:24] Logged the message at 
https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:14:14] 10Analytics, 10Analytics-Dashiki, 10CX-analytics, 10Language-2018-July-September: Setup Config:Dashiki:CX2Translations as a public chart and update the Dashiki documentation accordingly - https://phabricator.wikimedia.org/T203516 (10Amire80) [08:26:23] as FYI I am rebooting aqs100[5-9] for kernel + openjdk8 upgrades [08:36:15] (03PS1) 10Elukey: refinery-dump-status-webrequest-partitions: add proper exit codes [analytics/refinery] - 10https://gerrit.wikimedia.org/r/458126 (https://phabricator.wikimedia.org/T172532) [08:38:54] (03CR) 10Elukey: "Added manually the new code to analytics1003 and tested:" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/458126 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [08:39:44] joal: o/ --^ [08:40:02] if this is ok we could replace cron and use only systemd timers for the webrequest partition check [08:40:59] checking with the team the new procedure, alarming, etc.. 
and then decide if we want to move to timers or not [08:43:13] I also realized that things like "disable all the camus crons" with timers would be way more effective and easy [09:04:49] rebooting druid1001 [09:07:53] (turnilo is now reading from druid1002 temporarily) [09:12:08] !log re-run webrequest-druid-hourly-wf-2018-9-5-7 - failed due to rebooting druid1001 [09:12:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:13:13] druid1001 back [09:36:54] and I've just rebooted aqs1009 [09:40:00] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Upgrade Archiva (meitnerium) to Debian Stretch - https://phabricator.wikimedia.org/T192639 (10elukey) [09:41:04] 10Analytics, 10Analytics-Kanban: Reboot Analytics hosts for kernel security upgrades - https://phabricator.wikimedia.org/T203165 (10elukey) [10:23:19] 10Analytics, 10Operations, 10hardware-requests: eqiad: (2) hardware refresh for analytics1003 - https://phabricator.wikimedia.org/T198685 (10elukey) @RobH quick ping to check if we can get this hardware before the end of quarter, to schedule Hadoop maintenance ops in one go (since we have to shutdown the who... [10:38:25] * elukey lunch! [11:43:28] (03PS1) 10Fdans: Release 2.3.5 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/458162 [11:49:04] (03CR) 10Fdans: [C: 032] Release 2.3.5 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/458162 (owner: 10Fdans) [12:54:51] 10Analytics-Tech-community-metrics, 10Upstream: "Wiki Editions" should be "Wiki edits" - https://phabricator.wikimedia.org/T164935 (10Aklapper) 05Open>03Resolved Software upgrade took place today; this is fixed now. [13:01:17] another job failed.. mmm [13:03:26] again Container [..] is running beyond physical memory limits. Current usage: 1.1 GB of 1 GB physical memory used; 9.1 GB of 2.1 GB virtual memory used [13:08:34] so oozie_launcher_memory is 256 [13:08:46] is it the same issue that happened the last time? 
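The YARN diagnostic quoted above carries both the usage and the limit, so a failed container can be checked against its allocation mechanically. A minimal Python sketch, assuming the message keeps the wording shown in the log; the function name `parse_physical_usage` and the unit table are illustrative and not part of any Hadoop API:

```python
import re

# Multipliers to normalise the units printed in the message into megabytes.
_UNIT_TO_MB = {"KB": 1 / 1024, "MB": 1, "GB": 1024, "TB": 1024 * 1024}

# Matches only the physical-memory half of the diagnostic line.
_USAGE_RE = re.compile(
    r"Current usage: (?P<used>[\d.]+) (?P<used_unit>[KMGT]B) of "
    r"(?P<limit>[\d.]+) (?P<limit_unit>[KMGT]B) physical memory used"
)


def parse_physical_usage(diagnostic):
    """Return (used_mb, limit_mb) from a container diagnostic, or None."""
    match = _USAGE_RE.search(diagnostic)
    if match is None:
        return None
    used = float(match["used"]) * _UNIT_TO_MB[match["used_unit"]]
    limit = float(match["limit"]) * _UNIT_TO_MB[match["limit_unit"]]
    return used, limit


# The message as pasted in the channel.
message = (
    "Container [..] is running beyond physical memory limits. "
    "Current usage: 1.1 GB of 1 GB physical memory used; "
    "9.1 GB of 2.1 GB virtual memory used"
)
used_mb, limit_mb = parse_physical_usage(message)
print(f"used {used_mb:.0f} MB of {limit_mb:.0f} MB, over limit: {used_mb > limit_mb}")
```
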
[13:13:06] fdans: hola :) [13:13:13] helloooo [13:13:31] do you have a min for a brainbounce in here? [13:13:38] I am trying to track down a mem issue :( [13:13:42] yessss cave? [13:14:01] here is fine! [13:14:22] cool [13:14:36] so it seems that some webrequest refine jobs fail for containers to big while generating statistics [13:14:55] the last time almost similar issue was due to oozie_launcher_memory set to 256 [13:15:15] that seems to be now configured in the oozie settings (checked via hue) [13:15:25] elukey: yes I set that yesterday [13:15:28] how did you restart the webrequest load jobs yesterday? [13:15:32] ah! [13:15:36] I think it is too small [13:15:41] we use 1G IIRC [13:15:50] I used the memory setting that was specified in the docs [13:15:53] sorry! [13:16:00] used the following command: [13:16:06] sudo -u hdfs oozie job --oozie $OOZIE_URL -Drefinery_directory=hdfs://analytics-hadoop$(hdfs dfs -ls -d /wmf/refinery/$(date +"%Y")* | tail -n 1 | awk '{print $NF}') -Dqueue_name=production -Doozie_launcher_queue_name=production -Doozie_launcher_memory=256 -Dstart_time=2018-09-04T15:00Z -config /srv/deployment/analytics/refinery/oozie/webrequest/load/bundle.properties -run [13:17:22] all right so I think we'd need to amend it [13:17:37] I don't recall exactly if Joseph went for 1G or 2G the last time [13:17:48] elukey joal why don't we just set the memory in the bundle properties file? [13:18:40] or specify the correct memory value per job in the docs [13:19:07] elukey: let's try with 1G? I can restart the jobs if you want [13:19:45] 10Analytics-Tech-community-metrics, 10Upstream: "Wiki Editions" should be "Wiki edits" - https://phabricator.wikimedia.org/T164935 (10Aklapper) 05Resolved>03Open Ah, one more place. Created https://github.com/chaoss/grimoirelab-sigils/pull/256 [13:20:03] fdans: can you try removing the -D option completely? 
I am reading the past chan logs and IIRC 2048 is the default somewhere [13:20:08] but if you use -D it gets overridden [13:20:24] oh sounds cool elukey [13:21:28] !log restarting webrequest load bundle, start time 11:00Z [13:21:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:23:07] elukey: restarted :) [13:23:15] fdans: looks good! [13:26:14] removed the -D option from https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Oozie/Administration#Gotchas_when_restarting_webrequest_load_bundle [13:27:27] (03CR) 10Ottomata: [C: 031] refinery-dump-status-webrequest-partitions: add proper exit codes [analytics/refinery] - 10https://gerrit.wikimedia.org/r/458126 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [13:29:14] elukey: whenever you have 5min, I have the FINAL VETTING MACHINE oiled and ready [13:30:40] fdans: just sent an email to internal@ explaining what we did so people will not freak out seeing all the alerts :D [13:31:55] thank youuuuu luca! [13:33:58] (03CR) 10Elukey: [C: 032] refinery-dump-status-webrequest-partitions: add proper exit codes [analytics/refinery] - 10https://gerrit.wikimedia.org/r/458126 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [13:34:26] fdans: sure we can do now if you want! [13:35:49] (03CR) 10Elukey: [V: 032 C: 032] refinery-dump-status-webrequest-partitions: add proper exit codes [analytics/refinery] - 10https://gerrit.wikimedia.org/r/458126 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [13:36:17] elukey: cavin'! [13:40:01] elukey: i'm proceeding with throrium reimage! [13:40:28] !log reimaging thorium to debian stretch (this will cause an announced {stats,analytics}.wm.org downtime!) 
- T192641 [13:40:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:40:31] T192641: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641 [13:41:17] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by otto on neodymium.eqiad.wmnet for hosts: ``` thorium.eqiad.wmnet ``` The log can be found in `/var/log/wmf-auto-rei... [13:41:43] ottomata: \o/ +1 [13:42:29] ACK elukey i didn't change installer to stretch ackg [13:42:37] maybe not too late! [13:42:37] haha [13:43:38] ahahhaha [13:43:50] nope, its going, ok will reimage again one minute... [13:44:31] but it will stop in partitioning right? So we can, via console, force PXE for next boot and this should do the trick no? (after setting stretch as default) [13:44:45] wmf-auto-reimage should not realize anything [13:45:18] or you can stop it and restart it with the options to avoid nuking/checking puppet certs etc. [13:45:22] as you wish :) [13:45:44] yeah [13:45:47] its halted there [13:46:01] oh right, i can reboot and pxe in console good idea [13:46:06] cool [13:46:32] elukey: its like you have some recent experience here or something [13:47:30] :D [13:50:03] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['thorium.eqiad.wmnet'] ``` Of which those **FAILED**: ``` ['thorium.eqiad.wmnet'] ``` [13:50:50] auto-reimage didn't like it ? [13:51:15] oh it iddint'! [13:51:17] it seems fine though [13:51:20] hmmmm [13:51:26] ok maybe i'll use auto reimage agian elukey [13:51:28] just to get it going? [13:51:30] it'll reboot again [13:51:31] ya? 
[13:51:49] i think mayb eit took too long [13:52:00] yeah i'm auto-reimage again [13:52:12] no[e [13:52:13] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by otto on neodymium.eqiad.wmnet for hosts: ``` thorium.eqiad.wmnet ``` The log can be found in `/var/log/wmf-auto-rei... [13:52:16] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['thorium.eqiad.wmnet'] ``` Of which those **FAILED**: ``` ['thorium.eqiad.wmnet'] ``` [13:52:24] ottomata: ack [13:53:08] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by otto on neodymium.eqiad.wmnet for hosts: ``` thorium.eqiad.wmnet ``` The log can be found in `/var/log/wmf-auto-rei... [13:53:12] ok i think i had to run this time with --no-verify [13:53:16] because it had already deleted the puppet cert [13:53:25] hm ther eis alos a --no-pxe [13:53:36] i wonder if i had finished install and then ran with that [13:53:42] if it would have just started from after install [13:53:46] oh well, its powercycling [13:55:24] super [13:55:40] ottomata: ok if I reboot druid100[2-6] ? [13:55:56] did druid1001 this morning for kernel + new jvm, all good [13:56:08] ottomata: no-pxe should not be necessary here [13:56:20] that's for a situation where the first install part went fine [13:56:33] and then e.g. it failed to reboot (due to IPMI issues or so) [13:59:49] elukey: for sure [13:59:55] moritzm: ay ok [14:33:15] thorium looking good! 
[14:33:19] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['thorium.eqiad.wmnet'] ``` and were **ALL** successful. [14:34:25] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641 (10Ottomata) Done! [14:34:41] yesssss [14:34:47] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641 (10Ottomata) [14:35:01] also new kernel deployed! [14:36:43] ohh cool [14:36:44] great [14:36:44] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 2 others: RFC: Modern Event Platform: Scalable Event Intake Service - https://phabricator.wikimedia.org/T201963 (10Ottomata) @daniel I heard you might have some questions/thoughts about this! Find me in IRC to chat? [14:40:23] anything against me deploying refinery? [14:40:32] so the new script will get out [14:42:53] !log deploying refinery (pageview whitelist and cron script change) [14:42:54] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:44:56] wow it takes ages just to fetch [14:54:34] elukey: does puppet auto delete the contents of user directories? [14:54:42] elukey: super thanks for looking at jobs [14:54:59] fdans: did you corrected docs for memory setting? [14:56:15] nuria: we did yes [14:56:21] removed the "faulty" option [14:56:23] elukey: super thanks [14:56:45] elukey: sorry re: my question above, I mean in thorium? 
[14:56:55] fdans: mmm I don't think so, unless we explicitly mark a directory as absent, but never tried [14:57:27] fdans: we have just reimaged it [14:57:41] aw [14:57:44] i see [14:58:02] elukey: that was yesterday to backup thorium srv [14:58:29] i think i stil lhave a sreen [14:58:31] will quit it [14:58:46] elukey: the files that I showed you are now gone, after I came back from lunch [14:58:47] ottomata: but you didn't run anything during the past say 30 mins? [14:58:53] no [14:59:10] fdans: maybe they are in /srv/home/fdans, did you check? [14:59:15] oh [14:59:17] lms [14:59:57] elukey: no such file [15:00:06] ah [15:00:07] it's funny because I don't have bash history either [15:00:17] we just reimaged it [15:00:19] to stretch [15:00:28] but not sure if the homes were backed up [15:00:33] ottomata: ---^ [15:00:43] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 2 others: RFC: Modern Event Platform: Scalable Event Intake Service - https://phabricator.wikimedia.org/T201963 (10daniel) @Ottomata I'm neck deep into SDC stuff, with deadlines looking... I don't have anything concrete to ask, just... [15:00:52] fdans: can you re-create those files? [15:01:06] IIRC you had them on your laptop as well right? [15:01:17] yall talking about thorium? [15:01:21] no we ddidn' save any homes there [15:01:38] elukey: yeah it's just that i have to rerun the hive queries [15:01:39] didn't even look at that, its not expected that folks keep stuff on thorium...or even really log in there [15:01:43] from thoriuM??? [15:02:50] ottomata: yeah we're checking that all the geowiki stuff that will be removed is consistent with what we have in hive [15:03:14] (03CR) 10Nuria: "Nice." 
[analytics/refinery] - 10https://gerrit.wikimedia.org/r/458126 (https://phabricator.wikimedia.org/T172532) (owner: 10Elukey) [15:03:55] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 2 others: RFC: Modern Event Platform: Scalable Event Intake Service - https://phabricator.wikimedia.org/T201963 (10Ottomata) For this component, I don't think it matters much. The event intake is just about a stanadlone HTTP API th... [15:04:20] fdans: sorry didn't think about warning Andrew before reimaging :( [15:05:51] elukey: it's no problem! i just didn't want to take editor counts out from thorium to my machine [15:06:17] confused though...oh because geowiki is on thorium? [15:06:18] sorryyyyy [15:06:21] oh i see [15:06:50] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641 (10Dzahn) There is cronspam from: Cron /usr/local/bin/published-datasets-sync -q rsync: stat "/published-datasets-rsynced/stat1006/archive/public-datasets/all/... [15:06:52] ottomata: how often does thorium reimage? can I redo this on time? [15:07:17] once every year :D [15:07:26] ottomata: it was my fault, I didn't think about warning you [15:07:37] didn't think about Fran's stuff on thorium [15:07:37] OH I SEEE THE EMAIL [15:07:45] I'm a dum [15:07:47] NO NEED TO SHOUT [15:07:54] :D [15:08:07] ok imma rerun all this [15:09:46] fdans: yaaa sorry, not even once a year, less often! [15:11:01] that's ok, nothing of special value was lost, the only thing is the script, which I have a local copy of [15:13:52] elukey ottomata: one more thing... 
it seems like the permissions of user stats to access /srv/geowiki/data-private have been revoked [15:14:00] i'm getting access denied with sudo -u [15:14:45] probably the stat's uid changed [15:15:09] lemme check [15:15:27] elukey@thorium:~$ sudo ls -l /srv/geowiki/data-private [15:15:27] total 96 [15:15:27] drwxr-x--- 2 debmonitor www-data 20480 May 31 22:06 datafiles [15:15:27] drwxr-x--- 2 debmonitor www-data 20480 May 31 22:06 datasources [15:15:28] drwxr-x--- 2 debmonitor www-data 57344 May 31 22:06 graphs [15:15:30] lol [15:15:41] yeah now the uid of stats is owned by debmonitor [15:16:51] fdans: should be better now [15:18:22] elukey: all good now! [15:19:04] thank youuuu [15:20:44] ahhh yes [15:20:45] thanks elukey [15:21:30] I chowned stat:stat since those will get nuked soon, didn't bother keeping www-data etc.. [15:21:35] but I can re-check if needed [15:22:59] yup [15:27:06] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641 (10Ottomata) I think that was failing during the reinstall, it looks fine now. [15:31:17] 10Analytics, 10Analytics-Kanban: Reboot Analytics hosts for kernel security upgrades - https://phabricator.wikimedia.org/T203165 (10elukey) [15:33:16] elukey: ops sycn? [15:33:26] sure! [15:36:13] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641 (10Dzahn) Yep, looks like it stopped. No more mails so far. thanks! [16:13:04] 10Analytics, 10Analytics-Kanban: Upgrade Analytics infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T192642 (10Ottomata) [16:25:25] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 2 others: RFC: Modern Event Platform: Schema Registry - https://phabricator.wikimedia.org/T201643 (10Tbayer) To recap what I said in last week's IRC meeting: This kind of decision should not fall under TechCom's authority. It is not... 
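The ownership change seen in the `ls -l` output above is consistent with how Unix records ownership: the filesystem stores a numeric uid, and the name is looked up at display time. If a reinstall assigns uids in a different order, existing files appear to belong to whichever account now holds the old number. A small Python sketch of that lookup; the uid values below are hypothetical and chosen only for illustration:

```python
def owner_name(uid, passwd):
    """Resolve a numeric uid to a user name using a name -> uid table."""
    for name, entry_uid in passwd.items():
        if entry_uid == uid:
            return name
    return str(uid)  # ls falls back to the bare number when no name matches


# Hypothetical account tables before and after the reimage.
passwd_before = {"stats": 498, "www-data": 33}
passwd_after = {"debmonitor": 498, "stats": 499, "www-data": 33}

file_uid = 498  # the number stored on disk does not change with the reinstall
print(owner_name(file_uid, passwd_before))
print(owner_name(file_uid, passwd_after))
```
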
[16:33:38] elukey: all the queries match, wanna take a look a second? [16:34:37] fdans: can we do it tomorrow morning? I have some stuff to finish now and you have to get the plane in a bit probably [16:35:00] elukey: sure no prob! [16:35:11] thanks! [16:38:51] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 2 others: RFC: Modern Event Platform: Schema Registry - https://phabricator.wikimedia.org/T201643 (10Tbayer) To follow up on a question @Krinkle raised in the RfC meeting: > 21:17:49 HaeB: However, I do genuinely want to... [16:40:52] https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=6&fullscreen&orgId=1 [16:40:59] I just got a page for --^ [16:41:11] not sure if it alarms in here or not [16:41:53] It seems that CitationUsage skyrocketed [16:42:18] is it expected? [16:44:52] hey just saw, am making lunch... [16:45:21] yeah they just merged https://gerrit.wikimedia.org/r/#/c/454854 [16:45:32] dialing up to 100% the schema usage [16:47:57] hmmm [16:48:51] Hmm CitationUsage is not blacklisted [16:48:55] from mysql [16:48:59] commented in https://phabricator.wikimedia.org/T191086 [16:49:28] they dialled from 1% to 100%, maybe a bit too much? [16:50:09] it is ~3x VirtualPageViews [16:50:35] ok, i'm blacklisting it [16:50:40] from mysql at least [16:50:48] that's gonna break the mysql consumer and/or delay evetything else [16:52:10] elukey: i'm going to ping them in research channel if you want to join there [16:52:20] sure, #wikimedia-research ? 
[16:54:38] ya [16:56:21] !log restarting eventlogging processors to blacklist CitationUsage - T191086 [16:56:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:56:24] T191086: Instrument and collect data via CitationUsage schema - https://phabricator.wikimedia.org/T191086 [16:57:32] elukey: citationusage is now harvesting all events yes [17:09:54] (we are reverting the change as FYI) [17:10:58] hmm, events still going into mysql [17:10:59] not sure why [17:11:01] looking into it [17:13:10] i think maybe my restart command before didn't actualy restart [17:13:30] hmmm or no ahhhh [17:13:38] its beacuse there are still so many of these events in eventlogging-valid-mixed [17:13:43] it needs to insert them all, unfortunetly [17:13:53] because the blacklisting is done on the processor when producing to valid-mixed [17:13:55] I am not sure if Timo deployed the change [17:14:01] he +2ed [17:14:06] but I don't see any deploy [17:14:14] there you go, it is happening now [17:15:24] ottomata: the alarm is really annoying, I think it needs some tweaks :) [17:15:31] oh for sure [17:15:33] its hardcoded throughputs [17:15:36] but in this case, it is correct! [17:15:40] good thing we got the alarm! [17:15:41] :) [17:15:43] yeah but it keeps paging [17:15:46] oh [17:15:47] :/ [17:15:48] one time is enough :D [17:30:29] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, 10Services (watching): Modern Event Platform: Schema Registry + Schema Usage Metadata Configuration Service - https://phabricator.wikimedia.org/T201063 (10Tbayer) >>! In T201063#4546442, @Ottomata wrote: >> 1.) these don't necessarily... [17:30:32] 10Analytics, 10Operations, 10ops-eqiad: analytics1068 doesn't boot - https://phabricator.wikimedia.org/T203244 (10Cmjohnson) @elukey analytics1068 is broke...it will not get past loading bios drivers during the post. I tired a hard reset (removing power, drain flea power) I reseated memory (sometimes memory... 
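The reason events kept reaching MySQL after the blacklist went in can be shown with a toy model: the filter runs in the processor, before events are produced to eventlogging-valid-mixed, so anything already in that topic is still consumed. A minimal Python sketch, assuming each event is a dict with a `schema` key; the function names are illustrative and are not the EventLogging code:

```python
def process(raw_events, blacklist):
    """Processor step: drop blacklisted schemas before producing to valid-mixed."""
    return [event for event in raw_events if event["schema"] not in blacklist]


def consume(valid_mixed):
    """Consumer step: inserts everything it reads; it applies no blacklist."""
    return [event["schema"] for event in valid_mixed]


incoming = [{"schema": "CitationUsage"}, {"schema": "VirtualPageViews"}]

# Events processed before the blacklist existed are already in the topic.
valid_mixed = process(incoming, blacklist=set())
# After the processor restart, new CitationUsage events are filtered out...
valid_mixed += process(incoming, blacklist={"CitationUsage"})
# ...but the consumer still has to work through the backlog.
inserted = consume(valid_mixed)
print(inserted)
```
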
[17:31:24] Analytics, Operations, ops-eqiad: analytics1068 doesn't boot - https://phabricator.wikimedia.org/T203244 (elukey) Thanks a lot! No rush I only wanted to know if draining flea power would have helped.. [17:40:20] oof i think there is a mysql consumer insertion problem [17:40:24] in meeting but will investigate [17:45:41] ah the Row size too large ? [17:46:43] CentralNoticeImpression [17:49:12] joal: your new endpoints are live https://wikimedia.org/api/rest_v1/#!/Edits_data/get_metrics_edits_per_page_project_page_title_editor_type_granularity_start_end [17:49:34] Pchelolo: \o/ [17:52:39] weird, https://meta.wikimedia.org/wiki/Schema:CentralNoticeImpression looks unchanged [17:52:56] got to go, ottomata lemme know what you find later on :) [17:52:58] * elukey off! [18:01:27] yeah, it's killing the consumer process somehow [18:01:33] which is causing events to be reconsumed a lot [18:08:31] Analytics, Analytics-EventLogging, Analytics-Kanban: CentralNoticeImpression event schema too large for MySQL - https://phabricator.wikimedia.org/T203592 (Ottomata) [18:15:01] ottomata: can i help? killing mysql consumer? 
[18:15:16] nuria: https://phabricator.wikimedia.org/T203592 andyrussg says we can blacklist it [18:15:38] k ottomata i actually thought most centralnotice events where blacklisted [18:15:57] ottomata: i am starting to think that we should have mysql whitelist rather than a blacklist [18:16:18] yeah me too [18:16:51] nuria: we could probably collect a few days worth of unique schemas in eventlogging-valid-mixed [18:16:54] to build the current whitelist [18:17:03] and even prune it if we find some that we can ask to blacklist [18:17:46] ottomata: k, creating ticket [18:18:36] !log restarted eventlogging processors blacklisting CentralNoticeImpression - T203592 [18:18:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:18:39] T203592: CentralNoticeImpression event schema too large for MySQL - https://phabricator.wikimedia.org/T203592 [18:18:51] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review: CentralNoticeImpression event schema too large for MySQL - https://phabricator.wikimedia.org/T203592 (10Ottomata) [18:20:54] 10Analytics: Flip blacklist for MySQL eventlogging consumer to be a whilelist of allowed schemas - https://phabricator.wikimedia.org/T203596 (10Nuria) [18:21:16] ottomata: ticket created , for next quarter i'd say [18:22:15] 10Analytics-EventLogging, 10Analytics-Kanban: CentralNoticeImpression occasionaly fails validation on device enum field - https://phabricator.wikimedia.org/T203597 (10Ottomata) [18:22:17] aye [18:24:31] nuria: that would also be a good step to eventuallly deprecating EL mysql [18:26:19] ottomata: YES, a major step towards that [18:38:24] ottomata: nice! [18:39:41] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 2 others: RFC: Modern Event Platform: Schema Registry - https://phabricator.wikimedia.org/T201643 (10Ottomata) Just had a shortish meeting with Tilman and Josh Minor. I don't think we resolved much, but Josh is going to work with T... 
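The whitelist idea discussed above (collect the unique schemas seen in eventlogging-valid-mixed over a few days, then prune) can be sketched roughly as follows. This is only an illustration in Python: it assumes each event is a dict with a `schema` key, and `build_whitelist` and the sample data are hypothetical rather than existing tooling:

```python
from collections import Counter


def build_whitelist(events, exclude=frozenset()):
    """Return (sorted schema names seen, per-schema counts), minus exclusions."""
    counts = Counter(event["schema"] for event in events)
    whitelist = sorted(name for name in counts if name not in exclude)
    return whitelist, counts


# Stand-in for a few days of events read from eventlogging-valid-mixed.
sample_events = [
    {"schema": "VirtualPageViews"},
    {"schema": "CitationUsage"},
    {"schema": "VirtualPageViews"},
    {"schema": "CentralNoticeImpression"},
]

# Schemas already agreed to be kept out of MySQL are pruned up front.
whitelist, counts = build_whitelist(
    sample_events, exclude={"CitationUsage", "CentralNoticeImpression"}
)
print(whitelist)
print(counts.most_common())
```

The per-schema counts are returned alongside the list so that high-volume schemas can be spotted as candidates for pruning.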
[18:39:49] Analytics-EventLogging, Analytics-Kanban, Fundraising-Analysis: CentralNoticeImpression occasionaly fails validation on device enum field - https://phabricator.wikimedia.org/T203597 (Ottomata) [18:59:45] \o/ Pchelolo :) Many thanks ! [19:03:15] nuria: https://wikimedia.org/api/rest_v1/metrics/editors/top-by-edits/es.wikipedia/user/content/monthly/20180701/20180801 [19:03:32] backend ready ! [19:05:10] nuria: and as an example: https://wikimedia.org/api/rest_v1/metrics/edits/per-editor/es.wikipedia/JuanCamacho/all-page-types/daily/20180101/20180801 [19:06:21] nuria: Do you agree we suggest Leon to add edits to the pageview-tool (by project and by page, for instance to watch correlations between edits and views) ? [19:07:52] joal: oohhhh, let's let this bake until our next snapshot and announce it then? [19:08:05] nuria: sounds good :) [19:09:57] yo! (pinged by Leon :) This new API looks stellar!!! Pageviews Analysis does show edits per page, and number of unique editors, but no per-user breakdown [19:10:12] Hi musikanimal :) [19:10:26] musikanimal: I actually used Leon not to ping you - Arf ;) [19:10:32] haha, sorry! [19:11:39] musikanimal: you could use that though, for per-page: https://wikimedia.org/api/rest_v1/metrics/edits/per-page/fr.wikipedia/Charlemagne/user/monthly/20100101/20180801 [19:12:48] nice! I'm currently using the replicas to get total edits in the timeframe, but I'm not showing edit fluctuations in the chart [19:12:57] I guess that could be done but you'd need a 2nd X-axis [19:13:33] musikanimal: can you tell me more on the "replicas?" [19:13:54] enwiki.web.db.svc.eqiad.wmflabs [19:14:02] Yes ! [19:14:04] (not just enwiki, though) [19:14:10] Makes sense - You query the DBs [19:14:15] ok [19:14:21] yep, all we had up until now! [19:14:32] musikanimal: no problemo !! 
Was just wondering :) [19:14:58] you folks at Analytics are slowly going to make XTools obsolete, hehe [19:15:14] musikanimal: New API should give you more precision in term of data (not just global count over the period), and as you said, can be done per user [19:15:37] musikanimal: Ah, forgot as well: you have the "top-by" endpoints [19:16:03] musikanimal: Now that I told you all that, let's wait a bit for anouncement as nuria said, please :) [19:16:09] yeah no problem [19:16:13] I should warn you... [19:16:13] Just to be sure the thing don't break too fast ! [19:16:27] musikanimal: yes? [19:16:48] in XTools the "month counts" (number of edits a user made, on a per-month breakdown), is considered "private" per global consensus [19:17:01] in XTools you have to opt-in to these stats [19:17:21] I know it's silly, since that data is already public [19:17:25] hm [19:17:34] I can find you the RfC, one moment [19:17:38] This is something we wondered about [19:18:05] well, also, I should mention XTools does namespace breakdowns. 
You're showing totals for all namespaces, so maybe not as bad [19:18:08] I'm gonna push a patch to remove per-editor stats then :) [19:18:32] musikanimal: split can be done on content/non_content, but that's all (for now) [19:19:36] here's one of the RfC's (there are others): https://meta.wikimedia.org/wiki/Requests_for_comment/X!%27s_Edit_Counter [19:21:10] on enwiki consensus is that all stats should be available, always, no opt-in requirement [19:21:22] I think that's the only one, other wikis require opt-in [19:22:15] ok - Many thanks for that information musikanimal I-m gonna send my patch right now [19:23:05] no problem :) again you're not doing namespace breakdowns, I don't know if the community cares [19:23:44] musikanimal: from the RfC, it seems the consensus is on 2 things: top namespaces edited and monthly stats [19:24:12] Monthly stats seems not to be related to namespaces (or at least it;s my understanding) [19:24:16] hm [19:24:30] Let's disab [19:24:37] yep, but the monthly stats also show number of edits per namespace in that month. I can't tell if that was brought up in the RfC or not [19:24:47] I hear that [19:24:58] XTools does show yearly counts, without opt-in requirement. Community doesn't seem to mind that [19:25:12] I'll wait tomorrow, to have the team opinion, and then we'll take action [19:46:25] ottomata: looks like mysql is doing OK now, correct? [19:47:30] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 2 others: RFC: Modern Event Platform: Schema Registry - https://phabricator.wikimedia.org/T201643 (10daniel) >>! In T201643#4560077, @Tbayer wrote: > To recap what I said in last week's IRC meeting: This kind of decision should not... [19:53:06] 10Quarry, 10Cloud-Services, 10Community-Wikimetrics, 10DBA, and 2 others: Evaluate future of wmf puppet module "mysql" - https://phabricator.wikimedia.org/T165625 (10Dzahn) T202588 exists for the quarry migration. that will unblock a lot of this. 
Also T162070 is a duplicate of this ticket in a way. [20:06:27] yes nuria [20:06:32] joal: got a few for scala fun [20:06:32] ? [20:07:32] ottomata: sure ! [20:07:36] ottomata: batcave? [20:09:01] nuria: was asking joal! you wanna help with some case class / type parameters ? [20:09:27] ottomata: ahem..... [20:09:29] it's like an esoteric scala question, trying to find a nice way to do something abstractly with type parameters [20:09:34] and implicits too... [20:09:37] ottomata: maybe not [20:09:41] hehe :) [21:56:01] Analytics-Kanban: Document sampling in eventlogging module - https://phabricator.wikimedia.org/T203612 (Nuria)