[09:11:47] Analytics-Kanban, Operations, ops-eqiad, User-Elukey: Analytics1034 eth0 negotiated speed to 100Mb/s instead of 1000Mb/s - https://phabricator.wikimedia.org/T172633#3523462 (elukey) I got fooled by dmesg, eth0 was operating at 100 Mbps and it was not flapping. After re-negotiation I can see this:...
[09:50:45] Analytics-Kanban, Operations, ops-eqiad, User-Elukey: Analytics1034 eth0 negotiated speed to 100Mb/s instead of 1000Mb/s - https://phabricator.wikimedia.org/T172633#3523524 (elukey) Stopped again analytics1034, @Cmjohnson I think that the cable swap didn't work :(
[10:57:01] Analytics-Kanban, Operations, Patch-For-Review, User-Elukey: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681#3523625 (elukey) @Ottomata @Pchelolo Turns out I am stupid, Event Streams might not be the culprit. I remembered this...
[12:06:16] Analytics, Analytics-Wikistats, Wikimedia-Site-requests: Add li: Wikibooks to Wikistats - https://phabricator.wikimedia.org/T165634#3523712 (Ooswesthoesbes) In that case, adding the "VPS-project-wikistats" tag would seem to be the best option for now.
[12:06:33] * elukey lunch!
[13:05:23] (CR) Mforns: "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/371496 (owner: Ottomata)
[13:05:50] hi a-team! :)
[13:05:57] (CR) Mforns: [C: 2] Add simple shell script to check if a yarn app is running [analytics/refinery] - https://gerrit.wikimedia.org/r/371496 (owner: Ottomata)
[13:10:24] ottomata: o/
[13:12:41] ottomata: I think I know what it is causing the timeouts in kafka
[13:13:12] on rhenium pmacct is running, and since it is stretch it uses librdkafka 0.11
[13:13:21] that by defaults enables version negotiation
[13:13:52] I tried to fix it in puppet but of course the version that we have, 1.6.1 does not support librdkafka configs (1.6.2 does)
[13:14:49] !
[13:15:11] 1.6.1 of what?
[13:15:19] oh pmacct?
[13:16:01] yeah
[13:16:38] also
[13:16:39] 2017-07-28
[13:16:40] 14:11 upgrading rhenium to stretch via dist-upgrade
[13:16:46] that matches perfectly :)
[13:18:24] ah!
[13:18:26] v interesting
[13:19:56] Analytics-Kanban, Operations, Patch-For-Review, User-Elukey: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681#3523813 (elukey) This time the root cause seems found: ``` 2017-07-28 14:11 upgrading rhenium to stretch vi...
[13:29:00] ottomata: stopped nfacctd (a subdaemon of pmacct) and the issue went away :)
[13:29:07] no moar exceptions in the brokers
[13:29:19] \o/
[13:29:36] let's see if it fixes also the varnishkafka timeouts and the burrow alarms
[13:33:31] Analytics-Kanban, Operations, Patch-For-Review, User-Elukey: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681#3523854 (elukey) nfacctd stopped, immediate recovery on the brokers logs (no more exceptions logged). Let's wait a bi...
[14:22:19] (PS5) Mforns: Add script to purge old mediawiki data snapshots [analytics/refinery] - https://gerrit.wikimedia.org/r/355601 (https://phabricator.wikimedia.org/T162034)
[14:27:04] (PS6) Mforns: Add script to purge old mediawiki data snapshots [analytics/refinery] - https://gerrit.wikimedia.org/r/355601 (https://phabricator.wikimedia.org/T162034)
[14:42:32] I cannot add rows to https://grafana-admin.wikimedia.org/dashboard/db/varnishkafka anymore
[14:42:37] not sure why
[14:58:33] Analytics-Kanban, Operations, ops-eqiad, User-Elukey: Analytics1034 eth0 negotiated speed to 100Mb/s instead of 1000Mb/s - https://phabricator.wikimedia.org/T172633#3524055 (Cmjohnson) The cable seemed loose and would disconnect at the server eth0 port when touched. Replaced the cable and moved i...
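Note on the 13:13 discussion above: the fix attempted in puppet amounts to turning off librdkafka's version negotiation for the pmacct client. A minimal sketch of what such overrides might look like follows. `api.version.request` and `broker.version.fallback` are librdkafka properties; the `global, <key>, <value>` file layout is an assumption about how a pmacct release with librdkafka config support reads overrides, and `0.9.0.1` is a placeholder, since the log does not state the broker version. The helper name `render_overrides` is hypothetical.

```python
# Sketch only: renders librdkafka overrides as CSV lines of the form
# "global, <key>, <value>". The file layout and the fallback version
# are assumptions, not taken from the log.

def render_overrides(overrides):
    """Return one 'global, key, value' line per override, in insertion order."""
    return "\n".join(f"global, {key}, {value}" for key, value in overrides.items())


# "0.9.0.1" is a placeholder for whatever version the brokers actually run.
OVERRIDES = {
    "api.version.request": "false",
    "broker.version.fallback": "0.9.0.1",
}

if __name__ == "__main__":
    print(render_overrides(OVERRIDES))
```

With version negotiation disabled, the client would skip the ApiVersion request and assume the fallback version instead, which is presumably what older brokers expect.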
[15:05:39] Analytics, Analytics-Wikistats, VPS-project-Wikistats, Wikimedia-Site-requests: Add li: Wikibooks to Wikistats - https://phabricator.wikimedia.org/T165634#3524068 (MarcoAurelio) Per above. Seems that it is the only remaining thing to do.
[15:36:02] Analytics, Analytics-Cluster, Operations, ops-eqiad, and 2 others: rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3524181 (Ottomata) @Cmjohnson Heyaaa, we are pretty ready and excited to start working with these. Can you let us know when they'l...
[15:37:43] Analytics, Analytics-Cluster, Operations, ops-eqiad, and 2 others: rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3524196 (Cmjohnson) @ottomata okay, I understand I will get them going as soon as I can there in my being worked on queue with a fe...
[15:44:01] nuria_, ping? :]
[15:44:19] Analytics, Analytics-Cluster, Operations, ops-eqiad, and 2 others: rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3524215 (Ottomata) Great, thank you!
[15:47:03] Analytics-Kanban, Operations, hardware-requests: Decommission stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T173097#3524220 (Nuria)
[15:49:04] Analytics, Discovery-Analysis: Get 'sparklyr' working on stats1005 - https://phabricator.wikimedia.org/T139487#3524226 (Nuria)
[15:58:05] Analytics, Analytics-EventLogging: Alarm on errors on /var/log/upstart/eventlogging* files - https://phabricator.wikimedia.org/T170620#3524241 (Nuria) Instrument code to send errors to graphite? That would work for errors but not process flapping.
[16:05:49] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, Community-Tech-Sprint: Add index to mediawiki_page_create_1 table - https://phabricator.wikimedia.org/T170990#3450014 (Nuria) Looks like indexes are there: UNIQUE KEY `ix_mediawiki_page_create_2_meta_id` (`meta_id`), KEY...
[16:05:54] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, and 2 others: Visualize page create events for all wikis - https://phabricator.wikimedia.org/T170850#3524279 (Nuria)
[16:05:56] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, Community-Tech-Sprint: Add index to mediawiki_page_create_1 table - https://phabricator.wikimedia.org/T170990#3524278 (Nuria) Open>Resolved
[16:08:51] Analytics, Easy: Site for Wikimedia Analytics lacks clear license - https://phabricator.wikimedia.org/T169270#3524282 (Nuria) Open>Resolved
[16:09:18] Analytics, Easy: Site for Wikimedia Analytics lacks clear license - https://phabricator.wikimedia.org/T169270#3393022 (Nuria) Please take a look, http://analytics.wikimedia.org see cc-o license note
[16:12:04] Analytics, Analytics-Cluster: hdfs password file for mysql should be re-generated when the password file is changed by puppet - https://phabricator.wikimedia.org/T170162#3524285 (Nuria) The pw file on hdfs is used for scoop, for example. hdfs dfs -ls /user/hdfs/mysql-analytics-research-client-pw.txt Pi...
[16:15:14] Analytics, Performance-Team: Explore NavigationTiming by faceted properties - EventLogging refine - https://phabricator.wikimedia.org/T166414#3524302 (Nuria) Seems that importing plainly NavigationTiming in Druid is the 1st step towards doing what gilles is requesting
[16:15:56] Analytics, Research: productionize ClickStream dataset - https://phabricator.wikimedia.org/T158972#3524327 (Nuria) p:High>Low
[16:18:04] Analytics: Weird performance of sqoop job on Edit Reconstruction - https://phabricator.wikimedia.org/T172579#3524336 (Nuria)
[16:38:11] Analytics, Operations, Traffic, Varnish: Sort out analytics service dependency issues for cp* cache hosts - https://phabricator.wikimedia.org/T128374#3524406 (elukey) Question for the Traffic team: is this task still valid after T138747 or shall we call it done?
[16:40:55] !log analytics1034 back in service after swapping the eth cable - T172633
[16:40:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:40:58] T172633: Analytics1034 eth0 negotiated speed to 100Mb/s instead of 1000Mb/s - https://phabricator.wikimedia.org/T172633
[16:41:52] Analytics-Kanban, Operations, ops-eqiad, User-Elukey: Analytics1034 eth0 negotiated speed to 100Mb/s instead of 1000Mb/s - https://phabricator.wikimedia.org/T172633#3524408 (elukey) Open>Resolved
[16:46:14] Analytics, Discovery, Discovery-Analysis (Current work): Reportupdater outputs files with restricted permissions - https://phabricator.wikimedia.org/T173333#3524418 (mpopov)
[16:51:50] * elukey off!
[16:51:52] byeeee
[17:01:03] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, Community-Tech-Sprint: Add index to mediawiki_page_create_1 table - https://phabricator.wikimedia.org/T170990#3524449 (Ottomata) @nuria, they were asking about an index on wiki database, not on these more generic fields.
[17:01:57] (PS1) Bearloga: Give group write permission to output files [analytics/reportupdater] - https://gerrit.wikimedia.org/r/371955 (https://phabricator.wikimedia.org/T173333)
[17:02:36] (CR) jerkins-bot: [V: -1] Give group write permission to output files [analytics/reportupdater] - https://gerrit.wikimedia.org/r/371955 (https://phabricator.wikimedia.org/T173333) (owner: Bearloga)
[17:02:47] (PS2) Bearloga: Give group write permission to output files [analytics/reportupdater] - https://gerrit.wikimedia.org/r/371955 (https://phabricator.wikimedia.org/T173333)
[17:03:39] (CR) Bearloga: "recheck" [analytics/reportupdater] - https://gerrit.wikimedia.org/r/371955 (https://phabricator.wikimedia.org/T173333) (owner: Bearloga)
[17:04:13] Analytics, Discovery, Discovery-Analysis, Patch-For-Review: Reportupdater outputs files with restricted permissions - https://phabricator.wikimedia.org/T173333#3524453 (mpopov)
[17:04:22] (CR) jerkins-bot: [V: -1] Give group write permission to output files [analytics/reportupdater] - https://gerrit.wikimedia.org/r/371955 (https://phabricator.wikimedia.org/T173333) (owner: Bearloga)
[17:08:00] mforns: is it just me or does the error in https://gerrit.wikimedia.org/r/#/c/371955/ look like it's not related to my actual patch?
[17:08:12] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, and 2 others: Visualize page create events for all wikis - https://phabricator.wikimedia.org/T170850#3524478 (Nuria)
[17:08:15] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, Community-Tech-Sprint: Add index to mediawiki_page_create_1 table - https://phabricator.wikimedia.org/T170990#3524477 (Nuria) Resolved>Open
[18:12:32] Analytics-Kanban, User-Elukey: Archive PageContentSaveComplete in hdfs while we continue collecting data - https://phabricator.wikimedia.org/T170720#3524625 (Nuria) Spot checking (note that the way I set up avro table in hive is really not performant due to avro schema being hardcoded in table but functi...
[18:15:42] Analytics-Kanban, User-Elukey: Archive PageContentSaveComplete in hdfs while we continue collecting data - https://phabricator.wikimedia.org/T170720#3524631 (Nuria) ping @elukey: I think we can drop table on mysql, please do couple selects on hive to confirm that you have access.
[18:17:46] Analytics, Operations, Traffic, Varnish: Sort out analytics service dependency issues for cp* cache hosts - https://phabricator.wikimedia.org/T128374#3524632 (BBlack) I think there's still some work here to do, if nothing else to audit the situation as it stands. There's basically two things to...
[18:49:16] (PS1) Bearloga: README.md: update link [analytics/reportupdater] - https://gerrit.wikimedia.org/r/371964
[18:50:48] (CR) jerkins-bot: [V: -1] README.md: update link [analytics/reportupdater] - https://gerrit.wikimedia.org/r/371964 (owner: Bearloga)
[18:52:03] bearloga, sorry, missed your ping, yes it looks unrelated, will have a look
[18:52:17] (CR) Bearloga: "ok, just as I thought -- something is broken with the CI" [analytics/reportupdater] - https://gerrit.wikimedia.org/r/371964 (owner: Bearloga)
[19:08:30] bearloga, I don't think it's the CI itself but just a test that is failing, the same test fails on my local machine
[19:22:43] mforns: please lmk when you fix the test so I can rebase both patches and check :)
[19:22:56] bearloga, sure :]
[20:13:22] Analytics-Tech-community-metrics, Developer-Relations: Adjust to Grimoirelab / Bitergia moving to GitLab - https://phabricator.wikimedia.org/T171290#3525003 (Aklapper) * Got access to https://gitlab.com/groups/Bitergia/c/Wikimedia ; also documented on my internal Continuity page on officewiki
[20:13:46] Analytics-Kanban: Fix reportupdater graphite tests - https://phabricator.wikimedia.org/T173345#3525004 (mforns)
[20:14:58] (PS1) Mforns: Fix graphite tests [analytics/reportupdater] - https://gerrit.wikimedia.org/r/371972 (https://phabricator.wikimedia.org/T173345)
[20:15:12] bearloga, ^
[20:16:18] if CI passes, I will merge that directly
[20:17:11] (CR) Mforns: [V: 2 C: 2] "Self-merging to unbreak CI, as it's only tests and no functionality modified."
[analytics/reportupdater] - https://gerrit.wikimedia.org/r/371972 (https://phabricator.wikimedia.org/T173345) (owner: Mforns)
[20:17:36] bearloga, done, I guess you'll be able to unbreak by rebasing now
[20:18:34] Analytics-Kanban, Patch-For-Review: Fix reportupdater graphite tests - https://phabricator.wikimedia.org/T173345#3525025 (mforns)
[20:20:54] (CR) Mforns: Give group write permission to output files (1 comment) [analytics/reportupdater] - https://gerrit.wikimedia.org/r/371955 (https://phabricator.wikimedia.org/T173333) (owner: Bearloga)
[21:18:19] (PS2) Bearloga: README.md: update link [analytics/reportupdater] - https://gerrit.wikimedia.org/r/371964
[21:18:21] (CR) Bearloga: "recheck" [analytics/reportupdater] - https://gerrit.wikimedia.org/r/371964 (owner: Bearloga)
[21:18:58] (PS3) Bearloga: Give group write permission to output files [analytics/reportupdater] - https://gerrit.wikimedia.org/r/371955 (https://phabricator.wikimedia.org/T173333)
[21:24:08] (CR) Bearloga: Give group write permission to output files (1 comment) [analytics/reportupdater] - https://gerrit.wikimedia.org/r/371955 (https://phabricator.wikimedia.org/T173333) (owner: Bearloga)
[23:52:18] (CR) Mforns: Give group write permission to output files (1 comment) [analytics/reportupdater] - https://gerrit.wikimedia.org/r/371955 (https://phabricator.wikimedia.org/T173333) (owner: Bearloga)