Wikimedia IRC logs browser - #wikimedia-operations

Displaying 1226 items:

2018-02-26 00:04:03 <wikibugs> (CR) Chad: "Couple of minor inlines, but otherwise lgtm" (3 comments) [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414598 (owner: Paladox)
2018-02-26 00:05:23 <wikibugs> (PS5) Paladox: Add BUILD files to build plugin [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414598
2018-02-26 00:05:25 <wikibugs> (CR) Paladox: Add BUILD files to build plugin (3 comments) [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414598 (owner: Paladox)
2018-02-26 00:11:13 <wikibugs> (CR) Chad: [V: 2 C: 2] Add BUILD files to build plugin [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414598 (owner: Paladox)
2018-02-26 00:19:26 <wikibugs> (PS1) Chad: Add a few more things to gitignore [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414599
2018-02-26 00:30:15 <wikibugs> (PS2) Chad: Add a few more things to gitignore [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414599
2018-02-26 00:30:39 <wikibugs> (CR) Paladox: [V: 2 C: 2] Add a few more things to gitignore [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414599 (owner: Chad)
2018-02-26 00:57:43 <wikibugs> (PS1) Chad: Adding symlink to about.md for README.md [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414604
2018-02-26 00:58:18 <wikibugs> (PS2) Paladox: Adding symlink to about.md for README.md [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414604 (owner: Chad)
2018-02-26 00:58:22 <wikibugs> (CR) Paladox: [V: 2 C: 2] Adding symlink to about.md for README.md [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414604 (owner: Chad)
2018-02-26 01:06:10 <wikibugs> (PS1) Chad: Basic bootstrapping for Github project creation listener [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414605
2018-02-26 01:22:16 <wikibugs> (CR) Paladox: Basic bootstrapping for Github project creation listener (1 comment) [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414605 (owner: Chad)
2018-02-26 01:25:26 <wikibugs> (CR) Chad: Basic bootstrapping for Github project creation listener (1 comment) [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414605 (owner: Chad)
2018-02-26 01:31:16 <wikibugs> (PS2) Chad: Basic bootstrapping for Github project creation listener [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414605
2018-02-26 01:31:54 <wikibugs> (CR) Paladox: [V: 2 C: 2] Basic bootstrapping for Github project creation listener [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414605 (owner: Chad)
2018-02-26 02:15:42 <XioNoX> !log disabling ALGs on MR routers
2018-02-26 02:15:57 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 02:21:55 <wikibugs> (PS1) Chad: WIP: Adding a "Deployed to" bit for the "Included In" header [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/414607
2018-02-26 02:30:30 <icinga-wm> PROBLEM - HHVM jobrunner on mw1299 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
2018-02-26 02:31:30 <icinga-wm> RECOVERY - HHVM jobrunner on mw1299 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
2018-02-26 02:43:50 <logmsgbot> !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.22) (duration: 07m 12s)
2018-02-26 02:44:06 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 02:55:49 <XioNoX> !log labs->cloud vlan rename in codfw - T187933
2018-02-26 02:56:03 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 02:56:04 <stashbot> T187933: Labs to Cloud renaming for networking equipment - https://phabricator.wikimedia.org/T187933
2018-02-26 03:01:11 <icinga-wm> PROBLEM - puppet last run on restbase1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
2018-02-26 03:15:50 <wikibugs> (CR) Krinkle: "Needs careful testing by someone who knows how to test these endpoints on mwdebug (or beta). I personally don't know." [mediawiki-config] - https://gerrit.wikimedia.org/r/414310 (owner: Umherirrender)
2018-02-26 03:26:51 <icinga-wm> RECOVERY - Check systemd state on rhenium is OK: OK - running: The system is fully operational
2018-02-26 03:29:51 <icinga-wm> PROBLEM - Check systemd state on rhenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
2018-02-26 03:31:11 <icinga-wm> RECOVERY - puppet last run on restbase1011 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
2018-02-26 04:27:11 <icinga-wm> RECOVERY - Check systemd state on rhenium is OK: OK - running: The system is fully operational
2018-02-26 04:30:11 <icinga-wm> PROBLEM - Check systemd state on rhenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
2018-02-26 05:09:01 <wikibugs> (PS1) Legoktm: ExtensionDistributor: Ignore empty repositories [mediawiki-config] - https://gerrit.wikimedia.org/r/414612
2018-02-26 05:26:40 <icinga-wm> RECOVERY - Check systemd state on rhenium is OK: OK - running: The system is fully operational
2018-02-26 05:29:40 <icinga-wm> PROBLEM - Check systemd state on rhenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
2018-02-26 06:15:11 <marostegui> !log Stop MySQL on db1115 tendril database to copy it to db2093. Tendril (dbtree) service will be down for maintenance - T184704
2018-02-26 06:15:26 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 06:15:26 <stashbot> T184704: Setup tendril database monitoring on 2 new hosts, one on eqiad and one on codfw - https://phabricator.wikimedia.org/T184704
2018-02-26 06:29:51 <wikibugs> (PS1) Marostegui: db-codfw.php: Depool db2070 and db2055 [mediawiki-config] - https://gerrit.wikimedia.org/r/414614
2018-02-26 06:31:39 <wikibugs> (CR) Marostegui: [C: 2] db-codfw.php: Depool db2070 and db2055 [mediawiki-config] - https://gerrit.wikimedia.org/r/414614 (owner: Marostegui)
2018-02-26 06:33:07 <wikibugs> (Merged) jenkins-bot: db-codfw.php: Depool db2070 and db2055 [mediawiki-config] - https://gerrit.wikimedia.org/r/414614 (owner: Marostegui)
2018-02-26 06:33:21 <wikibugs> (CR) jenkins-bot: db-codfw.php: Depool db2070 and db2055 [mediawiki-config] - https://gerrit.wikimedia.org/r/414614 (owner: Marostegui)
2018-02-26 06:35:07 <logmsgbot> !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2055 and db2070 (duration: 01m 07s)
2018-02-26 06:35:15 <marostegui> !log Stop MySQL db2070 and db2055 to copy data to db2055 (and upgrade kernel and mariadb)
2018-02-26 06:35:20 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 06:35:32 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 06:49:20 <wikibugs> (PS1) Marostegui: db-eqiad.php: Depool db1103:3312 [mediawiki-config] - https://gerrit.wikimedia.org/r/414617 (https://phabricator.wikimedia.org/T187089)
2018-02-26 06:50:58 <wikibugs> (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1103:3312 [mediawiki-config] - https://gerrit.wikimedia.org/r/414617 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
2018-02-26 06:52:22 <wikibugs> (Merged) jenkins-bot: db-eqiad.php: Depool db1103:3312 [mediawiki-config] - https://gerrit.wikimedia.org/r/414617 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
2018-02-26 06:52:43 <wikibugs> (CR) jenkins-bot: db-eqiad.php: Depool db1103:3312 [mediawiki-config] - https://gerrit.wikimedia.org/r/414617 (https://phabricator.wikimedia.org/T187089) (owner: Marostegui)
2018-02-26 06:53:50 <logmsgbot> !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 (duration: 00m 56s)
2018-02-26 06:54:02 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 06:55:14 <wikibugs> (PS1) Marostegui: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414618
2018-02-26 06:55:28 <wikibugs> (CR) Elukey: "> I think it would be nicer if you do this first: https://gerrit.wikimedia.org/r/413889" [puppet] - https://gerrit.wikimedia.org/r/413685 (https://phabricator.wikimedia.org/T187805) (owner: Elukey)
2018-02-26 06:56:46 <wikibugs> (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414618 (owner: Marostegui)
2018-02-26 06:58:09 <wikibugs> (Merged) jenkins-bot: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414618 (owner: Marostegui)
2018-02-26 06:59:13 <logmsgbot> !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 (duration: 00m 54s)
2018-02-26 06:59:25 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 06:59:26 <marostegui> !log Stop MySQL on db1103:3312 and 3314 to upgrade it and kernel
2018-02-26 06:59:26 <wikibugs> (CR) jenkins-bot: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414618 (owner: Marostegui)
2018-02-26 06:59:36 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 07:08:04 <marostegui> !log Deploy schema change on db1103:3312 - T187089 T185128 T153182
2018-02-26 07:08:18 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 07:08:18 <stashbot> T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089
2018-02-26 07:08:18 <stashbot> T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182
2018-02-26 07:08:19 <stashbot> T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128
2018-02-26 07:11:21 <wikibugs> (PS10) Elukey: Introduce role::kafka::monitoring [puppet] - https://gerrit.wikimedia.org/r/413728 (https://phabricator.wikimedia.org/T187805)
2018-02-26 07:14:03 <wikibugs> (PS1) Marostegui: db-eqiad.php: Slowly repool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414619
2018-02-26 07:17:54 <wikibugs> (CR) Marostegui: [C: 2] db-eqiad.php: Slowly repool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414619 (owner: Marostegui)
2018-02-26 07:19:19 <wikibugs> (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414619 (owner: Marostegui)
2018-02-26 07:19:30 <wikibugs> (CR) jenkins-bot: db-eqiad.php: Slowly repool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414619 (owner: Marostegui)
2018-02-26 07:20:36 <logmsgbot> !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Slowly repool db1103:3314 after mariadb and kernel upgrade (duration: 00m 56s)
2018-02-26 07:20:49 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 07:21:14 <wikibugs> (CR) Elukey: "pcc after rebase: https://puppet-compiler.wmflabs.org/compiler02/10135/" [puppet] - https://gerrit.wikimedia.org/r/413728 (https://phabricator.wikimedia.org/T187805) (owner: Elukey)
2018-02-26 07:32:34 <wikibugs> (PS1) Marostegui: db-eqiad.php: Increase traffic db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414621
2018-02-26 07:34:07 <wikibugs> (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414621 (owner: Marostegui)
2018-02-26 07:34:17 <wikibugs> (PS2) Elukey: role::configcluster: update zookeeper's ferm rule [puppet] - https://gerrit.wikimedia.org/r/413685 (https://phabricator.wikimedia.org/T187805)
2018-02-26 07:34:23 <wikibugs> (CR) Elukey: "pcc: https://puppet-compiler.wmflabs.org/compiler02/10137/" [puppet] - https://gerrit.wikimedia.org/r/413685 (https://phabricator.wikimedia.org/T187805) (owner: Elukey)
2018-02-26 07:35:31 <wikibugs> (Merged) jenkins-bot: db-eqiad.php: Increase traffic db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414621 (owner: Marostegui)
2018-02-26 07:36:46 <logmsgbot> !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic db1103:3314 (duration: 00m 56s)
2018-02-26 07:36:51 <wikibugs> (CR) jenkins-bot: db-eqiad.php: Increase traffic db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414621 (owner: Marostegui)
2018-02-26 07:36:59 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 07:43:47 <wikibugs> (CR) Elukey: [C: 2] role::configcluster: update zookeeper's ferm rule [puppet] - https://gerrit.wikimedia.org/r/413685 (https://phabricator.wikimedia.org/T187805) (owner: Elukey)
2018-02-26 07:45:44 <wikibugs> (PS1) Marostegui: db-eqiad.php: Increase traffic for db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414622
2018-02-26 07:48:30 <wikibugs> (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic for db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414622 (owner: Marostegui)
2018-02-26 07:49:55 <wikibugs> (Merged) jenkins-bot: db-eqiad.php: Increase traffic for db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414622 (owner: Marostegui)
2018-02-26 07:50:07 <wikibugs> (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414622 (owner: Marostegui)
2018-02-26 07:51:58 <logmsgbot> !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic db1103:3314 (duration: 00m 56s)
2018-02-26 07:52:11 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 08:02:37 <wikibugs> (PS1) Marostegui: db-eqiad.php: Fully repool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414623
2018-02-26 08:05:30 <wikibugs> (CR) Elukey: [C: 2] Introduce role::kafka::monitoring [puppet] - https://gerrit.wikimedia.org/r/413728 (https://phabricator.wikimedia.org/T187805) (owner: Elukey)
2018-02-26 08:05:35 <wikibugs> (PS11) Elukey: Introduce role::kafka::monitoring [puppet] - https://gerrit.wikimedia.org/r/413728 (https://phabricator.wikimedia.org/T187805)
2018-02-26 08:09:04 <wikibugs> (CR) Marostegui: [C: 2] db-eqiad.php: Fully repool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414623 (owner: Marostegui)
2018-02-26 08:10:30 <icinga-wm> PROBLEM - puppet last run on kafkamon1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[burrow]
2018-02-26 08:10:32 <logmsgbot> !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 (duration: 00m 56s)
2018-02-26 08:10:33 <wikibugs> (Merged) jenkins-bot: db-eqiad.php: Fully repool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414623 (owner: Marostegui)
2018-02-26 08:10:41 <icinga-wm> PROBLEM - puppet last run on kafkamon2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[burrow]
2018-02-26 08:10:43 <wikibugs> (CR) jenkins-bot: db-eqiad.php: Fully repool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/414623 (owner: Marostegui)
2018-02-26 08:10:46 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 08:11:15 <elukey> failures on kafkamon are mine, burrow is not on the stretch apt repo
2018-02-26 08:11:51 <logmsgbot> !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully repool db1103:3314 (duration: 00m 56s)
2018-02-26 08:12:02 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 08:12:22 <wikibugs> (CR) Muehlenhoff: [C: 2] Fix verbose logging in debdeploy-deploy [debs/debdeploy] - https://gerrit.wikimedia.org/r/413758 (owner: Muehlenhoff)
2018-02-26 08:35:41 <wikibugs> (PS3) Jcrespo: tendril: Add memcache to tendril web frontend [puppet] - https://gerrit.wikimedia.org/r/414502 (https://phabricator.wikimedia.org/T133906)
2018-02-26 08:38:25 <wikibugs> Operations: TransparencyReport-private is not auto deploying - https://phabricator.wikimedia.org/T188224#4000455 (Peachey88) >>! In T188224#4000339, @Prtksxna wrote: > Also, would it be possible to move https://transparency.wikimedia.org/private to https://private.transparency.wikimedia.org/. Happy to raise...
2018-02-26 08:43:45 <wikibugs> (CR) Jcrespo: [C: 2] tendril: Add memcache to tendril web frontend [puppet] - https://gerrit.wikimedia.org/r/414502 (https://phabricator.wikimedia.org/T133906) (owner: Jcrespo)
2018-02-26 08:45:38 <wikibugs> (PS4) Gehel: Increas bulk insert threadpool for relforge [puppet] - https://gerrit.wikimedia.org/r/413810 (owner: EBernhardson)
2018-02-26 08:46:45 <wikibugs> (CR) Gehel: [C: 2] "LGTM (and puppet compiler agrees: https://puppet-compiler.wmflabs.org/compiler02/10138/)" [puppet] - https://gerrit.wikimedia.org/r/413810 (owner: EBernhardson)
2018-02-26 08:48:40 <icinga-wm> PROBLEM - Host rutherfordium is DOWN: PING CRITICAL - Packet loss = 100%
2018-02-26 08:48:40 <icinga-wm> PROBLEM - Host bohrium is DOWN: PING CRITICAL - Packet loss = 100%
2018-02-26 08:48:41 <icinga-wm> PROBLEM - Host dubnium is DOWN: PING CRITICAL - Packet loss = 64%, RTA = 11892.24 ms
2018-02-26 08:48:41 <icinga-wm> PROBLEM - Host chlorine is DOWN: PING CRITICAL - Packet loss = 0%, RTA = 2930.83 ms
2018-02-26 08:48:41 <icinga-wm> PROBLEM - Host planet1001 is DOWN: PING CRITICAL - Packet loss = 28%, RTA = 4503.55 ms
2018-02-26 08:48:41 <icinga-wm> PROBLEM - Host install1002 is DOWN: PING CRITICAL - Packet loss = 100%
2018-02-26 08:48:41 <icinga-wm> PROBLEM - Host logstash1007 is DOWN: PING CRITICAL - Packet loss = 100%
2018-02-26 08:48:55 <elukey> i guess ganeti1006 down :)
2018-02-26 08:49:00 <icinga-wm> PROBLEM - Host hassium is DOWN: PING CRITICAL - Packet loss = 50%, RTA = 4426.14 ms
2018-02-26 08:49:03 <icinga-wm> PROBLEM - Host mwdebug1002 is DOWN: PING CRITICAL - Packet loss = 8%, RTA = 4510.35 ms
2018-02-26 08:49:10 <icinga-wm> RECOVERY - Host chlorine is UP: PING WARNING - Packet loss = 0%, RTA = 1891.89 ms
2018-02-26 08:49:11 <icinga-wm> PROBLEM - Host webperf1001 is DOWN: PING CRITICAL - Packet loss = 0%, RTA = 2671.33 ms
2018-02-26 08:49:20 <icinga-wm> PROBLEM - Host netmon1003 is DOWN: PING CRITICAL - Packet loss = 54%, RTA = 7861.66 ms
2018-02-26 08:49:30 <icinga-wm> RECOVERY - Host hassium is UP: PING WARNING - Packet loss = 73%, RTA = 1379.77 ms
2018-02-26 08:50:41 <icinga-wm> PROBLEM - SSH on ganeti1006 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 08:50:50 <icinga-wm> PROBLEM - Host chlorine is DOWN: PING CRITICAL - Packet loss = 100%
2018-02-26 08:51:09 <wikibugs> (PS3) Gehel: Resize the Cirrus LTR model cache [puppet] - https://gerrit.wikimedia.org/r/413407 (https://phabricator.wikimedia.org/T188015) (owner: EBernhardson)
2018-02-26 08:51:31 <icinga-wm> PROBLEM - Host hassium is DOWN: PING CRITICAL - Packet loss = 100%
2018-02-26 08:51:51 <icinga-wm> PROBLEM - SSH on releases1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 08:51:51 <icinga-wm> PROBLEM - HTTP on releases1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 08:54:48 <wikibugs> (PS1) Marostegui: Revert "db-codfw.php: Depool db2070 and db2055" [mediawiki-config] - https://gerrit.wikimedia.org/r/414627
2018-02-26 08:54:50 <icinga-wm> RECOVERY - SSH on ganeti1006 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0)
2018-02-26 08:54:51 <icinga-wm> RECOVERY - Host webperf1001 is UP: PING WARNING - Packet loss = 80%, RTA = 180.23 ms
2018-02-26 08:55:00 <icinga-wm> RECOVERY - Host dubnium is UP: PING OK - Packet loss = 0%, RTA = 2.83 ms
2018-02-26 08:55:00 <icinga-wm> RECOVERY - Host hassium is UP: PING OK - Packet loss = 0%, RTA = 2.96 ms
2018-02-26 08:55:00 <icinga-wm> RECOVERY - Host netmon1003 is UP: PING OK - Packet loss = 0%, RTA = 2.81 ms
2018-02-26 08:55:00 <icinga-wm> RECOVERY - Host rutherfordium is UP: PING OK - Packet loss = 0%, RTA = 2.34 ms
2018-02-26 08:55:00 <icinga-wm> RECOVERY - Host planet1001 is UP: PING OK - Packet loss = 0%, RTA = 2.48 ms
2018-02-26 08:55:10 <icinga-wm> RECOVERY - Host chlorine is UP: PING OK - Packet loss = 0%, RTA = 2.55 ms
2018-02-26 08:55:10 <icinga-wm> RECOVERY - Host install1002 is UP: PING OK - Packet loss = 0%, RTA = 2.80 ms
2018-02-26 08:55:11 <icinga-wm> PROBLEM - Check systemd state on ganeti1006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
2018-02-26 08:55:40 <icinga-wm> RECOVERY - Host mwdebug1002 is UP: PING OK - Packet loss = 0%, RTA = 0.84 ms
2018-02-26 08:55:50 <icinga-wm> RECOVERY - SSH on releases1001 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u2 (protocol 2.0)
2018-02-26 08:55:50 <icinga-wm> RECOVERY - Host logstash1007 is UP: PING OK - Packet loss = 0%, RTA = 0.42 ms
2018-02-26 08:55:51 <icinga-wm> RECOVERY - HTTP on releases1001 is OK: HTTP OK: HTTP/1.1 200 OK - 15234 bytes in 2.091 second response time
2018-02-26 08:56:00 <icinga-wm> RECOVERY - Host bohrium is UP: PING OK - Packet loss = 0%, RTA = 0.95 ms
2018-02-26 08:56:22 <wikibugs> (CR) Filippo Giunchedi: "Thanks Paladox! Did you test this in labs? I'd like to run some tests myself as well." [puppet] - https://gerrit.wikimedia.org/r/391336 (https://phabricator.wikimedia.org/T184562) (owner: Paladox)
2018-02-26 09:03:32 <wikibugs> (CR) Gehel: [C: 2] "LGTM, puppet compiler agrees: https://puppet-compiler.wmflabs.org/compiler02/10139/" [puppet] - https://gerrit.wikimedia.org/r/413407 (https://phabricator.wikimedia.org/T188015) (owner: EBernhardson)
2018-02-26 09:06:47 <_joe_> anyone doing something about ganeti1006?
2018-02-26 09:07:11 <_joe_> it went down twice
2018-02-26 09:07:33 <wikibugs> (CR) Ema: "Please use self.report to log the main events (check is up, a icmp destination unreachable has been received). See how the idleconnection " [debs/pybal] - https://gerrit.wikimedia.org/r/413211 (https://phabricator.wikimedia.org/T178151) (owner: Vgutierrez)
2018-02-26 09:07:35 <elukey> I tried to join the mgmt console but then it was up and showing recoveries
2018-02-26 09:07:44 <elukey> didn't check logs though
2018-02-26 09:13:20 <icinga-wm> RECOVERY - Check systemd state on ganeti1006 is OK: OK - running: The system is fully operational
2018-02-26 09:15:10 <icinga-wm> PROBLEM - Request latencies on chlorine is CRITICAL: CRITICAL - apiserver_request_latencies is 14273695 https://grafana.wikimedia.org/dashboard/db/kubernetes-api
2018-02-26 09:15:11 <icinga-wm> PROBLEM - etcd request latencies on chlorine is CRITICAL: CRITICAL - etcd_request_latencies is 14240109 https://grafana.wikimedia.org/dashboard/db/kubernetes-api
2018-02-26 09:17:10 <icinga-wm> RECOVERY - Request latencies on chlorine is OK: OK - apiserver_request_latencies is 2055 https://grafana.wikimedia.org/dashboard/db/kubernetes-api
2018-02-26 09:17:11 <icinga-wm> RECOVERY - etcd request latencies on chlorine is OK: OK - etcd_request_latencies is 1511 https://grafana.wikimedia.org/dashboard/db/kubernetes-api
2018-02-26 09:23:54 <elukey> !log copied burrow 0.1 from jessie-wikimedia to stretch-wikimedia
2018-02-26 09:24:09 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 09:25:31 <icinga-wm> RECOVERY - puppet last run on kafkamon1001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
2018-02-26 09:26:24 <wikibugs> (CR) Marostegui: [C: 2] Revert "db-codfw.php: Depool db2070 and db2055" [mediawiki-config] - https://gerrit.wikimedia.org/r/414627 (owner: Marostegui)
2018-02-26 09:26:40 <wikibugs> (Abandoned) Gehel: T136696 Including a .policy file to grant permission to send logs to logstash [puppet] - https://gerrit.wikimedia.org/r/295129 (owner: Nicko)
2018-02-26 09:28:10 <wikibugs> (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2070 and db2055" [mediawiki-config] - https://gerrit.wikimedia.org/r/414627 (owner: Marostegui)
2018-02-26 09:28:12 <wikibugs> (CR) jenkins-bot: Revert "db-codfw.php: Depool db2070 and db2055" [mediawiki-config] - https://gerrit.wikimedia.org/r/414627 (owner: Marostegui)
2018-02-26 09:29:21 <logmsgbot> !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2055 and db2070 (duration: 00m 55s)
2018-02-26 09:29:33 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 09:30:41 <icinga-wm> RECOVERY - puppet last run on kafkamon2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
2018-02-26 09:31:43 <wikibugs> (PS1) Gilles: Add Thumbor private container user configuration keys [puppet] - https://gerrit.wikimedia.org/r/414631 (https://phabricator.wikimedia.org/T187822)
2018-02-26 09:32:46 <wikibugs> (PS1) Elukey: prometheus::ops|analytics: update Kafka Burrow's exporter config [puppet] - https://gerrit.wikimedia.org/r/414632 (https://phabricator.wikimedia.org/T180442)
2018-02-26 09:34:00 <wikibugs> (PS5) Gehel: maps: Icinga alert when OSM replication lags [puppet] - https://gerrit.wikimedia.org/r/410172 (https://phabricator.wikimedia.org/T167549)
2018-02-26 09:34:21 <wikibugs> (CR) Elukey: [C: 2] prometheus::ops|analytics: update Kafka Burrow's exporter config [puppet] - https://gerrit.wikimedia.org/r/414632 (https://phabricator.wikimedia.org/T180442) (owner: Elukey)
2018-02-26 09:36:31 <icinga-wm> PROBLEM - HHVM jobrunner on mw1300 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
2018-02-26 09:37:31 <icinga-wm> RECOVERY - HHVM jobrunner on mw1300 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
2018-02-26 09:38:03 <wikibugs> (PS6) Gehel: maps: Icinga alert when OSM replication lags [puppet] - https://gerrit.wikimedia.org/r/410172 (https://phabricator.wikimedia.org/T167549)
2018-02-26 09:38:42 <wikibugs> (PS2) Gilles: Add Thumbor private container user configuration keys [puppet] - https://gerrit.wikimedia.org/r/414631 (https://phabricator.wikimedia.org/T187822)
2018-02-26 09:39:16 <wikibugs> (CR) Gehel: [C: 2] "Looks good: https://puppet-compiler.wmflabs.org/compiler02/10140/" [puppet] - https://gerrit.wikimedia.org/r/410172 (https://phabricator.wikimedia.org/T167549) (owner: Gehel)
2018-02-26 09:45:25 <wikibugs> Operations, Discovery, Icinga, Maps, and 2 others: Create Icinga alert when OSM replication lags on maps - https://phabricator.wikimedia.org/T167549#4000604 (Gehel) Those alerts are now available on Icinga and passing. I'll keep an eye on them for the next few days to make sure we don't have fals...
2018-02-26 09:48:22 <wikibugs> (PS3) Gehel: maps: icinga alert if tiles are not being generated [puppet] - https://gerrit.wikimedia.org/r/410136 (https://phabricator.wikimedia.org/T175243)
2018-02-26 09:49:27 <wikibugs> (PS1) Elukey: role::kafka::monitoring: add lag monitoring for Jumbo [puppet] - https://gerrit.wikimedia.org/r/414636 (https://phabricator.wikimedia.org/T180442)
2018-02-26 09:51:14 <wikibugs> (PS2) Elukey: role::kafka::monitoring: add lag monitoring for Jumbo [puppet] - https://gerrit.wikimedia.org/r/414636 (https://phabricator.wikimedia.org/T180442)
2018-02-26 09:52:01 <wikibugs> (CR) Elukey: [C: 2] role::kafka::monitoring: add lag monitoring for Jumbo [puppet] - https://gerrit.wikimedia.org/r/414636 (https://phabricator.wikimedia.org/T180442) (owner: Elukey)
2018-02-26 09:54:52 <wikibugs> (PS1) Gehel: Resize the Cirrus LTR model cache [puppet] - https://gerrit.wikimedia.org/r/414637 (https://phabricator.wikimedia.org/T188015)
2018-02-26 09:56:42 <wikibugs> (CR) DCausse: [C: ] Resize the Cirrus LTR model cache [puppet] - https://gerrit.wikimedia.org/r/414637 (https://phabricator.wikimedia.org/T188015) (owner: Gehel)
2018-02-26 09:56:52 <wikibugs> (CR) Gehel: [C: 2] Resize the Cirrus LTR model cache [puppet] - https://gerrit.wikimedia.org/r/414637 (https://phabricator.wikimedia.org/T188015) (owner: Gehel)
2018-02-26 09:57:37 <icinga-wm> PROBLEM - puppet last run on kafkamon1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
2018-02-26 09:58:28 <elukey> checking --^
2018-02-26 10:02:29 <wikibugs> Operations: netfilter software at WMF: iptables vs nftables - https://phabricator.wikimedia.org/T187994#4000623 (aborrero) >>! In T187994#3992756, @aborrero wrote: > [...] > Our use cases could benefit from nftables in several aspects: > * performance, by using sets, maps, dicts and concatenations instead of...
2018-02-26 10:04:34 <wikibugs> (PS1) Gehel: logstash: kafka analytics cluster isn't available from deployment-prep [puppet] - https://gerrit.wikimedia.org/r/414638
2018-02-26 10:10:06 <moritzm> !log rebooting mw canaries for kernel security update
2018-02-26 10:10:20 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 10:31:42 <wikibugs> (PS1) Elukey: role::kafka::monitoring: fix wrong cut in previous change [puppet] - https://gerrit.wikimedia.org/r/414639 (https://phabricator.wikimedia.org/T180442)
2018-02-26 10:32:15 <wikibugs> (CR) Elukey: [C: 2] role::kafka::monitoring: fix wrong cut in previous change [puppet] - https://gerrit.wikimedia.org/r/414639 (https://phabricator.wikimedia.org/T180442) (owner: Elukey)
2018-02-26 10:32:20 <wikibugs> (PS2) Elukey: role::kafka::monitoring: fix wrong cut in previous change [puppet] - https://gerrit.wikimedia.org/r/414639 (https://phabricator.wikimedia.org/T180442)
2018-02-26 10:35:37 <wikibugs> (PS2) Arturo Borrero Gonzalez: toollabs: apt_pinning: extend pinnigs for pam libs [puppet] - https://gerrit.wikimedia.org/r/413780 (https://phabricator.wikimedia.org/T187193)
2018-02-26 10:36:43 <wikibugs> (CR) Arturo Borrero Gonzalez: [C: 2] toollabs: apt_pinning: extend pinnigs for pam libs [puppet] - https://gerrit.wikimedia.org/r/413780 (https://phabricator.wikimedia.org/T187193) (owner: Arturo Borrero Gonzalez)
2018-02-26 10:37:37 <icinga-wm> RECOVERY - puppet last run on kafkamon1001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
2018-02-26 10:38:19 <hashar> kart_: talking with zeljkof looks like we could deploy your ULS patch to maintenance/ULSCompactLinksDisablePref.php right now? ( https://gerrit.wikimedia.org/r/#/c/414609/ )
2018-02-26 10:39:06 <hashar> ohh no that is more complicated :D
2018-02-26 10:39:23 <hashar> the script is on hold until it is switched to vslow replicas https://gerrit.wikimedia.org/r/#/c/414608/
2018-02-26 10:39:27 <hashar> which I guess we can also deploy right now
2018-02-26 10:40:35 <hashar> zeljkof: we should get an early morning swat slot just for Kartik :]
2018-02-26 10:41:31 <zeljkof> hashar, kart_: let's compromise :) and review/merge the patch now (so we don't have to wait for CI during SWAT) but deploy during EU SWAT
2018-02-26 10:41:45 <zeljkof> if kart_ promises to be around during EU SWAT ;)
2018-02-26 10:42:00 <hashar> they only touch a maintenance script which is run manually
2018-02-26 10:42:10 <hashar> but yeah lets see what kart_ has to say about it :)
2018-02-26 10:42:14 <hashar> I am not worried for sure
2018-02-26 10:46:53 <wikibugs> ('PS1) ''Marostegui: site.pp: Add a comment about db1113 [puppet] - ''https://gerrit.wikimedia.org/r/414641 (https://phabricator.wikimedia.org/T184704)'
2018-02-26 10:48:02 <wikibugs> ('CR) ''Marostegui: [C: ''2] site.pp: Add a comment about db1113 [puppet] - ''https://gerrit.wikimedia.org/r/414641 (https://phabricator.wikimedia.org/T184704) (owner: ''Marostegui)'
2018-02-26 10:49:49 <kart_> hashar: zeljkof basically, script won't be run today. It is scheduled to run on Wednesday.
2018-02-26 10:50:25 <kart_> hashar: those two patches can be deployed in SWAT or before. I'm fine. Just added as per normal routine procedure.
2018-02-26 10:51:01 <kart_> hashar: I'll take most of free slot of Wednesday to run the script actually..
2018-02-26 10:52:04 <zeljkof> kart_: can both of your patches be merged before SWAT, and deployed together?
2018-02-26 10:52:20 <zeljkof> (deployed during SWAT)
2018-02-26 10:52:31 <kart_> zeljkof: yes. doable.
2018-02-26 10:52:40 <zeljkof> or should I merge and deploy patches one by one?
2018-02-26 10:53:05 <zeljkof> just curious, to make the swat quicker
2018-02-26 10:53:19 <kart_> zeljkof: merge both and deploy. Less chance. No need to separately deploy.
2018-02-26 10:53:33 <kart_> zeljkof: no testing needed, except checking both changes are in wmf.22.
2018-02-26 10:54:24 <zeljkof> kart_: cool, I'll merge them now and deploy during SWAT
2018-02-26 10:54:31 <kart_> OK!
2018-02-26 10:58:21 <wikibugs> ('CR) ''Alexandros Kosiaris: [C: ''-2] "I don't think this are for any reason special hosts. They are ordinary hosts (alongside many others in that stanza that are wrongfully pla" [puppet] - ''https://gerrit.wikimedia.org/r/413889 (https://phabricator.wikimedia.org/T187805) (owner: ''Dzahn)'
2018-02-26 11:00:05 <jouncebot> jan_drewniak: #bothumor I � Unicode. All rise for Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180226T1100).
2018-02-26 11:00:05 <jouncebot> No GERRIT patches in the queue for this window AFAICS.
2018-02-26 11:01:51 <moritzm> !log powercycling mw1264 (stuck after reboot)
2018-02-26 11:02:04 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 11:02:10 <wikibugs> ('PS1) ''Jdrewniak: Bumping portals to master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414643 (https://phabricator.wikimedia.org/T128546)'
2018-02-26 11:04:21 <wikibugs> ('CR) ''Jdrewniak: [C: ''2] Bumping portals to master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414643 (https://phabricator.wikimedia.org/T128546) (owner: ''Jdrewniak)'
2018-02-26 11:05:48 <wikibugs> ('Merged) ''jenkins-bot: Bumping portals to master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414643 (https://phabricator.wikimedia.org/T128546) (owner: ''Jdrewniak)'
2018-02-26 11:06:56 <wikibugs> ('CR) ''jenkins-bot: Bumping portals to master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414643 (https://phabricator.wikimedia.org/T128546) (owner: ''Jdrewniak)'
2018-02-26 11:11:08 <wikibugs> 'Operations, ''Puppet, ''Patch-For-Review: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562#4000729 (''fgiunchedi) I am trying at least to get `role::puppetmaster::standalone` going on stretch, so far not a whole lot of luck, namely the server 500s when cont...'
2018-02-26 11:11:23 <logmsgbot> !log jdrewniak@tin Synchronized portals/prod/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:402805|Bumping portals to master (T128546)]] (duration: 00m 58s)
2018-02-26 11:11:37 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 11:11:38 <stashbot> T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
2018-02-26 11:12:21 <logmsgbot> !log jdrewniak@tin Synchronized portals: Wikimedia Portals Update: [[gerrit:402805|Bumping portals to master (T128546)]] (duration: 00m 57s)
2018-02-26 11:12:33 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 11:14:47 <wikibugs> ('CR) ''Lucas Werkmeister (WMDE): "wmf.22 is deployed now." [mediawiki-config] - ''https://gerrit.wikimedia.org/r/413724 (https://phabricator.wikimedia.org/T184812) (owner: ''Lucas Werkmeister (WMDE))'
2018-02-26 11:15:51 <wikibugs> 'Operations, ''HHVM, ''MW-1.31-release-notes (WMF-deploy-2018-02-13 (1.31.0-wmf.21)), ''Performance-Team (Radar): HHVM hangs on the API cluster - https://phabricator.wikimedia.org/T184048#4000758 (''Joe) ''Open>''Resolved a:''Joe'
2018-02-26 11:19:26 <wikibugs> 'Operations, ''Continuous-Integration-Infrastructure, ''MediaWiki-Core-Tests, ''HHVM: Readd complete URL parsing fix from 3.18.7 release - https://phabricator.wikimedia.org/T185024#4000766 (''MoritzMuehlenhoff) p:''Unbreak!>''Normal'
2018-02-26 11:20:10 <wikibugs> 'Operations: netfilter software at WMF: iptables vs nftables - https://phabricator.wikimedia.org/T187994#4000768 (''MoritzMuehlenhoff) p:''Triage>''Normal'
2018-02-26 11:21:53 <wikibugs> 'Operations, ''media-storage: Have swift metrics available in Prometheus - https://phabricator.wikimedia.org/T187991#4000771 (''MoritzMuehlenhoff) p:''Triage>''Normal'
2018-02-26 11:24:24 <wikibugs> 'Operations: Define a special range in constants.pp for the LVS hosts - https://phabricator.wikimedia.org/T187910#3989817 (''MoritzMuehlenhoff) @Andrew : There is $CACHE_MISC already as a network constant / ferm macro.'
2018-02-26 11:25:06 <wikibugs> 'Operations, ''ops-codfw: db2049 management unable to login via ssh - https://phabricator.wikimedia.org/T187534#4000781 (''MoritzMuehlenhoff) p:''Triage>''Normal a:''Papaul'
2018-02-26 11:25:27 <wikibugs> 'Operations, ''ops-eqiad, ''DBA: Disk #5 (count starts at #0) of db1111 has corrupted sectors - https://phabricator.wikimedia.org/T187526#4000783 (''MoritzMuehlenhoff) p:''Triage>''Normal'
2018-02-26 11:27:55 <wikibugs> 'Operations, ''Puppet, ''Patch-For-Review: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562#4000789 (''fgiunchedi) After manually running `puppet master --debug --no-daemonize --masterport 8142` and then interrupting it, apparently now also phusion is able t...'
2018-02-26 11:41:08 <icinga-wm> PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
2018-02-26 11:42:37 <_joe_> whoa what a peak
2018-02-26 11:42:46 <_joe_> it's already gone, but still
2018-02-26 11:42:47 <icinga-wm> PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
2018-02-26 11:43:11 <_joe_> someone should take a look, this looks genuinely like a small outage
2018-02-26 11:49:17 <icinga-wm> RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
2018-02-26 11:49:47 <icinga-wm> RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
2018-02-26 12:06:39 <wikibugs> ('PS2) ''Alexandros Kosiaris: apache: Support IPv6 in status [puppet] - ''https://gerrit.wikimedia.org/r/411193'
2018-02-26 12:06:42 <wikibugs> ('CR) ''Alexandros Kosiaris: [V: ''2 C: ''2] apache: Support IPv6 in status [puppet] - ''https://gerrit.wikimedia.org/r/411193 (owner: ''Alexandros Kosiaris)'
2018-02-26 12:09:47 <wikibugs> ('PS1) ''Arturo Borrero Gonzalez: apt: apt_upgrade: include link to wikitech docs [puppet] - ''https://gerrit.wikimedia.org/r/414649 (https://phabricator.wikimedia.org/T181647)'
2018-02-26 12:10:48 <wikibugs> ('CR) ''Arturo Borrero Gonzalez: [C: ''2] apt: apt_upgrade: include link to wikitech docs [puppet] - ''https://gerrit.wikimedia.org/r/414649 (https://phabricator.wikimedia.org/T181647) (owner: ''Arturo Borrero Gonzalez)'
2018-02-26 12:10:58 <wikibugs> ('PS1) ''Ladsgroup: Add patrol rights/groups to fawikisource [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414650 (https://phabricator.wikimedia.org/T187662)'
2018-02-26 12:11:42 <wikibugs> ('CR) ''Paladox: "> Thanks Paladox! Did you test this in labs? I'd like to run some" [puppet] - ''https://gerrit.wikimedia.org/r/391336 (https://phabricator.wikimedia.org/T184562) (owner: ''Paladox)'
2018-02-26 12:25:59 <wikibugs> 'Operations, ''ops-codfw, ''DBA: db2048: RAID with predictive failure - https://phabricator.wikimedia.org/T187983#4000988 (''jcrespo) ''Resolved>''Open This failed again, I guess because using a bad disk: Predictive Failure: 1I:1:1'
2018-02-26 12:26:31 <wikibugs> 'Operations, ''ops-codfw, ''DBA: db2048: RAID with predictive failure - https://phabricator.wikimedia.org/T187983#4000990 (''Marostegui) a:''Marostegui>''Papaul'
2018-02-26 12:28:10 <wikibugs> ('CR) ''Sau226: "I've filed a request on the page the admin linked in the phab task and will add relevant info to the task when consensus is acquired." [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414509 (https://phabricator.wikimedia.org/T184959) (owner: ''Sau226)'
2018-02-26 12:30:59 <wikibugs> 'Operations: TransparencyReport-private is not auto deploying - https://phabricator.wikimedia.org/T188224#4001006 (''akosiaris) p:''Triage>''Normal It's updating fine from what I see. Both https://gerrit.wikimedia.org/r/#/admin/projects/wikimedia/TransparencyReport-private and the repo on the server are at...'
2018-02-26 12:33:48 <wikibugs> 'Operations, ''Discovery, ''Traffic, ''WMDE-Analytics-Engineering, and 3 others: Allow access to wdqs.svc.eqiad.wmnet on port 8888 - https://phabricator.wikimedia.org/T176875#4001024 (''Addshore) Bump as this is probably trivial but needs the right pair of hands to get it done.'
2018-02-26 12:58:42 <wikibugs> ('PS1) ''Ladsgroup: Enable statement usage tracking in several wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414654 (https://phabricator.wikimedia.org/T151717)'
2018-02-26 13:09:34 <moritzm> !log rebooting video scalers in eqiad for kernel security update
2018-02-26 13:09:48 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 13:22:46 <wikibugs> ('PS1) ''Arturo Borrero Gonzalez: toollabs: tools-clush-generator: introduce clush group 'one_of_each' [puppet] - ''https://gerrit.wikimedia.org/r/414657 (https://phabricator.wikimedia.org/T181647)'
2018-02-26 13:26:30 <wikibugs> ('PS2) ''Arturo Borrero Gonzalez: toollabs: tools-clush-generator: introduce clush group 'one_of_each' [puppet] - ''https://gerrit.wikimedia.org/r/414657 (https://phabricator.wikimedia.org/T181647)'
2018-02-26 13:29:30 <wikibugs> ('CR) ''Phuedx: [C: ''] "Given the comment in the header of pp_stage1_raw.dblist, I think that the pp_*.dblist can be deleted safely. If you've queued this for dep" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/413978 (owner: ''Krinkle)'
2018-02-26 13:34:56 <wikibugs> ('PS7) ''Lokal Profil: Drop the medlem user group and editallpages user right [mediawiki-config] - ''https://gerrit.wikimedia.org/r/404942 (https://phabricator.wikimedia.org/T184981)'
2018-02-26 13:37:55 <wikibugs> ('CR) ''Elukey: "Me and Gehel had a chat over IRC, we have a Kafka cluster in deployment prep but its name is not analytics, but 'jumbo-deployment-prep'. T" [puppet] - ''https://gerrit.wikimedia.org/r/414638 (owner: ''Gehel)'
2018-02-26 13:39:29 <wikibugs> 'Operations, ''Cloud-VPS, ''cloud-services-team, ''hardware-requests: eqiad: (2) systems for labstore expansion (labstore1008 & labstore1009) - https://phabricator.wikimedia.org/T186931#4001142 (''chasemp) @robh poke'
2018-02-26 13:40:36 <wikibugs> 'Operations, ''ops-eqiad, ''Patch-For-Review: rack/setup/install labvirt102[12] - https://phabricator.wikimedia.org/T183937#4001145 (''chasemp) @robh or @Cmjohnson any luck figuring out what the NIC situation is here? We will have to figure out something fairly soon if we need to order different NICs.'
2018-02-26 13:59:11 <wikibugs> 'Operations, ''Puppet, ''Patch-For-Review: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562#4001197 (''fgiunchedi) a:''fgiunchedi'
2018-02-26 13:59:26 <wikibugs> 'Operations, ''Puppet, ''Patch-For-Review, ''User-fgiunchedi: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562#3888192 (''fgiunchedi)'
2018-02-26 14:00:04 <jouncebot> addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Time to snap out of that daydream and deploy European Mid-day SWAT(Max 8 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180226T1400).
2018-02-26 14:00:05 <jouncebot> Jhs, kart_, Lucas_WMDE, and Amir1: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
2018-02-26 14:00:13 <Amir1> o/
2018-02-26 14:00:16 <zeljkof> I can SWAT today
2018-02-26 14:00:21 <Amir1> I also have another one coming
2018-02-26 14:00:55 <kart_> is here
2018-02-26 14:00:58 <zeljkof> kart_: I will deploy your commits first, since they are already merged, is there anything to test?
2018-02-26 14:01:06 <zeljkof> or should I just deploy?
2018-02-26 14:01:07 <kart_> zeljkof: nope.
2018-02-26 14:01:19 <kart_> Just deploy. I'll verify quickly in branch.
2018-02-26 14:01:21 <zeljkof> kart_: ok, I'll let you know once it's deployed
2018-02-26 14:01:33 <wikibugs> 'Operations, ''Phabricator, ''Patch-For-Review, ''Release-Engineering-Team (Kanban), and 2 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#4001208 (''MoritzMuehlenhoff) >>! In T182832#3982284, @elu...'
2018-02-26 14:02:28 <Lucas_WMDE> is here
2018-02-26 14:03:31 <wikibugs> ('PS1) ''Awight: Restore ORES celery worker count; kill defaults [puppet] - ''https://gerrit.wikimedia.org/r/414666'
2018-02-26 14:03:40 <wikibugs> ('PS1) ''Ladsgroup: Enable reading full entity id from wb_terms table in three wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414667 (https://phabricator.wikimedia.org/T114903)'
2018-02-26 14:04:03 <Amir1> The third one: https://gerrit.wikimedia.org/r/414667
2018-02-26 14:04:11 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] Restore ORES celery worker count; kill defaults [puppet] - ''https://gerrit.wikimedia.org/r/414666 (owner: ''Awight)'
2018-02-26 14:04:42 <wikibugs> 'Operations, ''Ops-Access-Requests: Add Ian Marlier to udp2log-users group - https://phabricator.wikimedia.org/T188042#4001214 (''MoritzMuehlenhoff) p:''Triage>''Normal a:''MoritzMuehlenhoff'
2018-02-26 14:06:32 <icinga-wm> PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 792.21 seconds
2018-02-26 14:07:20 <jynus> checking
2018-02-26 14:08:45 <wikibugs> ('PS1) ''Muehlenhoff: Add imarlier to udp2log-users [puppet] - ''https://gerrit.wikimedia.org/r/414668 (https://phabricator.wikimedia.org/T188042)'
2018-02-26 14:10:38 <logmsgbot> !log zfilipin@tin Synchronized php-1.31.0-wmf.22/extensions/UniversalLanguageSelector/maintenance/ULSCompactLinksDisablePref.php: SWAT: [[gerrit:414609|Added option to continue script from particular User ID]] [[gerrit:414608|Use a replica dedicated to slow queries (if available) (T187880)]] (duration: 00m 58s)
2018-02-26 14:10:52 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 14:10:54 <stashbot> T187880: Improve preference migration script - https://phabricator.wikimedia.org/T187880
2018-02-26 14:11:18 <zeljkof> kart_: deployed, please check and thanks for deploying with #releng ;)
2018-02-26 14:11:23 <kart_> Okay
2018-02-26 14:11:26 <wikibugs> ('PS2) ''Awight: Restore ORES celery worker count; kill defaults [puppet] - ''https://gerrit.wikimedia.org/r/414666'
2018-02-26 14:11:28 <zeljkof> Amir1: do you want to deploy next?
2018-02-26 14:11:48 <zeljkof> (I need some time to review other patches)
2018-02-26 14:12:18 <Amir1> zeljkof: at some sort of meeting atm :/
2018-02-26 14:12:32 <zeljkof> Amir1: should I deploy your patches?
2018-02-26 14:12:40 <zeljkof> or will you do it later in the swat window?
2018-02-26 14:12:40 <Amir1> zeljkof: it would be great
2018-02-26 14:12:44 <zeljkof> Amir1: sure, will do
2018-02-26 14:13:05 <zeljkof> Jhs, Lucas_WMDE: do you want to deploy your patches, if you can?
2018-02-26 14:13:40 <Lucas_WMDE> zeljkof: I don’t have deploy rights, so I’ll have to depend on the lovely #releng folks to help me :)
2018-02-26 14:13:53 <wikibugs> ('PS1) ''Giuseppe Lavagetto: Add the --hostname switch to simple node actions. [software/conftool] - ''https://gerrit.wikimedia.org/r/414669'
2018-02-26 14:13:55 <wikibugs> ('PS1) ''Giuseppe Lavagetto: Make full path of the object seen in the output for any change in SetAction and EditAction [software/conftool] - ''https://gerrit.wikimedia.org/r/414670'
2018-02-26 14:13:56 <Lucas_WMDE> but I can test on the debug servers
2018-02-26 14:14:12 <zeljkof> Lucas_WMDE: will do :) as far as I remember, Jhs also can not deploy
2018-02-26 14:14:45 <wikibugs> 'Operations, ''Cloud-VPS, ''cloud-services-team (Kanban): rack/setup/install labnodepool1002.eqiad.wmnet - https://phabricator.wikimedia.org/T168407#4001238 (''hashar) So yes: lets decommission labnodepool1002.eqiad.wmnet :]'
2018-02-26 14:15:23 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] Add the --hostname switch to simple node actions. [software/conftool] - ''https://gerrit.wikimedia.org/r/414669 (owner: ''Giuseppe Lavagetto)'
2018-02-26 14:15:28 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] Make full path of the object seen in the output for any change in SetAction and EditAction [software/conftool] - ''https://gerrit.wikimedia.org/r/414670 (owner: ''Giuseppe Lavagetto)'
2018-02-26 14:15:37 <moritzm> !log rebooting scb in codfw for kernel security updates
2018-02-26 14:15:43 <zeljkof> Jhs: around for SWAT?
2018-02-26 14:15:50 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 14:16:22 <zeljkof> Lucas_WMDE: you are next, I'll let you know when your patch is at mwdebug1002
2018-02-26 14:16:27 <Lucas_WMDE> ok thanks
2018-02-26 14:16:29 <zeljkof> in a few minutes
2018-02-26 14:17:27 <wikibugs> ('CR) ''Zfilipin: [C: ''2] "SWAT" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/413724 (https://phabricator.wikimedia.org/T184812) (owner: ''Lucas Werkmeister (WMDE))'
2018-02-26 14:17:34 <kart_> zeljkof: looks good. Thanks!
2018-02-26 14:17:46 <zeljkof> kart_: /me thumbs up ;)
2018-02-26 14:17:49 <kart_> (sorry, took more minutes than I assumed)
2018-02-26 14:18:24 <zeljkof> kart_: no problem, it's in a separate place from the other things, so I could do other stuff in parallel :)
2018-02-26 14:18:59 <wikibugs> ('Merged) ''jenkins-bot: Enable caching of constraint check results [mediawiki-config] - ''https://gerrit.wikimedia.org/r/413724 (https://phabricator.wikimedia.org/T184812) (owner: ''Lucas Werkmeister (WMDE))'
2018-02-26 14:19:13 <wikibugs> ('CR) ''jenkins-bot: Enable caching of constraint check results [mediawiki-config] - ''https://gerrit.wikimedia.org/r/413724 (https://phabricator.wikimedia.org/T184812) (owner: ''Lucas Werkmeister (WMDE))'
2018-02-26 14:19:51 <wikibugs> ('PS4) ''Niedzielski: New: add chromium_render service [puppet] - ''https://gerrit.wikimedia.org/r/409996 (https://phabricator.wikimedia.org/T178166)'
2018-02-26 14:21:21 <wikibugs> ('CR) ''Filippo Giunchedi: "> > Thanks Paladox! Did you test this in labs? I'd like to run some" [puppet] - ''https://gerrit.wikimedia.org/r/391336 (https://phabricator.wikimedia.org/T184562) (owner: ''Paladox)'
2018-02-26 14:21:30 <wikibugs> ('CR) ''Filippo Giunchedi: [C: ''-1] puppetmaster: Use ruby-mysql2 over ruby-mysql and migrate servermon to it [puppet] - ''https://gerrit.wikimedia.org/r/391336 (https://phabricator.wikimedia.org/T184562) (owner: ''Paladox)'
2018-02-26 14:21:30 <jynus> there is this research query blocking enwiki replication to dbstore1002
2018-02-26 14:22:05 <zeljkof> Lucas_WMDE: your patch is at mwdebug1002, please test and let me know if I can deploy it
2018-02-26 14:22:13 <Lucas_WMDE> zeljkof: already testing, thank you :)
2018-02-26 14:22:18 <Lucas_WMDE> looking good so far
2018-02-26 14:22:23 <wikibugs> ('CR) ''Paladox: "> > > Thanks Paladox! Did you test this in labs? I'd like to run some" [puppet] - ''https://gerrit.wikimedia.org/r/391336 (https://phabricator.wikimedia.org/T184562) (owner: ''Paladox)'
2018-02-26 14:22:48 <wikibugs> ('PS1) ''Filippo Giunchedi: puppetmaster: ruby-activerecord-deprecated-finders not in stretch [puppet] - ''https://gerrit.wikimedia.org/r/414674 (https://phabricator.wikimedia.org/T184562)'
2018-02-26 14:22:51 <wikibugs> ('PS1) ''Filippo Giunchedi: WIP ruby-mysql2 [puppet] - ''https://gerrit.wikimedia.org/r/414675 (https://phabricator.wikimedia.org/T184562)'
2018-02-26 14:24:12 <Jhs> zeljkof, i'm here now,
2018-02-26 14:24:15 <Jhs> almost forgot
2018-02-26 14:24:23 <zeljkof> Jhs: ok, you are next, please stand by :)
2018-02-26 14:24:43 <zeljkof> your patch will be at mwdebug1002 in 5-10 minutes
2018-02-26 14:24:54 <zeljkof> I'll let you know when it's there
2018-02-26 14:25:06 <Jhs> cool
2018-02-26 14:26:33 <Lucas_WMDE> zeljkof: okay, everything seems to be working so far…
2018-02-26 14:26:41 <wikibugs> ('CR) ''Volans: [C: ''] "LGTM" [software/conftool] - ''https://gerrit.wikimedia.org/r/414670 (owner: ''Giuseppe Lavagetto)'
2018-02-26 14:26:44 <Lucas_WMDE> and I think I’m done with testing
2018-02-26 14:26:52 <wikibugs> 'Operations, ''Puppet, ''Patch-For-Review, ''User-fgiunchedi: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562#4001287 (''fgiunchedi) So `role::puppetmaster::standalone` with the patches proposed above works on stretch. For production AFAIK it isn't trivia...'
2018-02-26 14:26:54 <zeljkof> Lucas_WMDE: ok to deploy?
2018-02-26 14:26:54 <wikibugs> 'Operations, ''Discovery-Wikidata-Query-Service-Sprint: Activate kafka-based recent change poller for wikidata query service - https://phabricator.wikimedia.org/T188252#4001288 (''Gehel)'
2018-02-26 14:27:02 <Lucas_WMDE> yes, I think it is
2018-02-26 14:27:07 <zeljkof> Lucas_WMDE: deploying
2018-02-26 14:27:22 <Amir1> I'll be ready for deploy in five minutes or so
2018-02-26 14:27:27 <Amir1> the meeting has finished
2018-02-26 14:27:45 <addshore> jynus: I'll look at that ticket and altering the query asap :)
2018-02-26 14:28:10 <logmsgbot> !log zfilipin@tin Synchronized wmf-config/Wikibase-production.php: SWAT: [[gerrit:413724|Enable caching of constraint check results (T184812)]] (duration: 00m 55s)
2018-02-26 14:28:24 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 14:28:24 <stashbot> T184812: Enable constraint result caching on Wikidata - https://phabricator.wikimedia.org/T184812
2018-02-26 14:28:36 <zeljkof> Amir1: ok, just to deploy Jhs's patch and the swat is yours :)
2018-02-26 14:28:53 <zeljkof> Lucas_WMDE: deployed! please test and thanks for deploying with #releng ;)
2018-02-26 14:29:05 <zeljkof> Jhs: merging your patch
2018-02-26 14:29:05 <Lucas_WMDE> zeljkof: thank you, always a pleasure :)
2018-02-26 14:29:30 <jynus> addshore: https://phabricator.wikimedia.org/T175790#4001318
2018-02-26 14:29:47 <jynus> that will also fix it when the analytics topology is changed
2018-02-26 14:30:01 <wikibugs> ('CR) ''Zfilipin: [C: ''2] "SWAT" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/407901 (https://phabricator.wikimedia.org/T186393) (owner: ''Zoranzoki21)'
2018-02-26 14:30:03 <wikibugs> ('PS1) ''BBlack: interface::rps: change IRQ count without reboot [puppet] - ''https://gerrit.wikimedia.org/r/414676'
2018-02-26 14:30:06 <jynus> e.g. if you have a single staging db but enwiki is "outside"
2018-02-26 14:30:11 <Amir1> cool
2018-02-26 14:30:15 <Amir1> let me know when it's done
2018-02-26 14:30:32 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] interface::rps: change IRQ count without reboot [puppet] - ''https://gerrit.wikimedia.org/r/414676 (owner: ''BBlack)'
2018-02-26 14:31:43 <wikibugs> ('Merged) ''jenkins-bot: Add namespaces to urwiktionary [mediawiki-config] - ''https://gerrit.wikimedia.org/r/407901 (https://phabricator.wikimedia.org/T186393) (owner: ''Zoranzoki21)'
2018-02-26 14:31:57 <wikibugs> ('CR) ''jenkins-bot: Add namespaces to urwiktionary [mediawiki-config] - ''https://gerrit.wikimedia.org/r/407901 (https://phabricator.wikimedia.org/T186393) (owner: ''Zoranzoki21)'
2018-02-26 14:32:58 <zeljkof> Jhs: your patch is at mwdebug1002, please test and let me know if I can deploy it
2018-02-26 14:33:57 <Jhs> zeljkof, looks good as far as I can tell (Y)
2018-02-26 14:34:04 <Jhs> remember to run the script :)
2018-02-26 14:34:15 <Zoranzoki21> Wait
2018-02-26 14:34:20 <Zoranzoki21> I want to test same
2018-02-26 14:34:23 <Zoranzoki21> No deploy
2018-02-26 14:34:24 <Jhs> I don't think there are any conflicting pages
2018-02-26 14:34:43 <wikibugs> ('PS2) ''BBlack: interface::rps: change IRQ count without reboot [puppet] - ''https://gerrit.wikimedia.org/r/414676'
2018-02-26 14:35:57 <zeljkof> Jhs: uh, which script? the patch and the task are both big, can not find it
2018-02-26 14:36:21 <Jhs> zeljkof, mwscript namespaceDupes.php urwiktionary --fix
2018-02-26 14:36:33 <Jhs> (from memory)
2018-02-26 14:36:40 <Zoranzoki21> Jhs: help
2018-02-26 14:36:42 <Zoranzoki21> Jhs: http://prntscr.com/ijyrqw
2018-02-26 14:36:46 <Zoranzoki21> Jhs: Is it good?
2018-02-26 14:36:52 <wikibugs> 'Operations, ''hardware-requests: Site: (2) hardware access request for videoscalers - https://phabricator.wikimedia.org/T188075#4001344 (''Joe) I think reusing the imagescalers (which are quite beefy machines) to this purpose is a good idea. I don't think that the normal load of the videoscalers cluster meri...'
2018-02-26 14:37:15 <zeljkof> Jhs: thanks, will do
2018-02-26 14:37:33 <Jhs> Zoranzoki21, looks right. click the talk page tab as well and check that there is only one colon : in the title, not two
2018-02-26 14:37:42 <Zoranzoki21> ok
2018-02-26 14:37:44 <Zoranzoki21> looks good
2018-02-26 14:37:50 <Zoranzoki21> zeljkof: I tested in more detail
2018-02-26 14:37:57 <Zoranzoki21> zeljkof: Let's deploy it
2018-02-26 14:38:34 <zeljkof> Jhs, Zoranzoki21: ok, deploying
2018-02-26 14:40:04 <logmsgbot> !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:407901|Add namespaces to urwiktionary (T186393)]] (duration: 00m 56s)
2018-02-26 14:40:18 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 14:40:19 <stashbot> T186393: Localize namespaces on urwiktionary - https://phabricator.wikimedia.org/T186393
2018-02-26 14:40:19 <zeljkof> Jhs, Zoranzoki21: deployed, running the script
2018-02-26 14:40:22 <wikibugs> ('CR) ''Alexandros Kosiaris: [C: ''-1] "No, not really. There are clearly calls in the code that belong to mysql gem API and are not present in the mysql2 gem API. e.g. things li" [puppet] - ''https://gerrit.wikimedia.org/r/391336 (https://phabricator.wikimedia.org/T184562) (owner: ''Paladox)'
2018-02-26 14:42:41 <zeljkof> Jhs, Zoranzoki21: the script is done https://phabricator.wikimedia.org/T186393#4001362 please check and thanks for deploying with #releng! ;)
2018-02-26 14:43:15 <Jhs> thanks zeljkof :) o/
2018-02-26 14:43:19 <zeljkof> Amir1: the SWAT is all yours! :)
2018-02-26 14:43:27 <Amir1> Thanks!
2018-02-26 14:43:38 <wikibugs> ('PS2) ''Ladsgroup: Enable statement usage tracking in several wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414654 (https://phabricator.wikimedia.org/T151717)'
2018-02-26 14:43:47 <wikibugs> ('CR) ''Ladsgroup: [C: ''2] "SWAT" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414654 (https://phabricator.wikimedia.org/T151717) (owner: ''Ladsgroup)'
2018-02-26 14:43:48 <zeljkof> Amir1: don't forget to close the window with !log EU SWAT finished :)
2018-02-26 14:43:51 <icinga-wm> RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 287.79 seconds
2018-02-26 14:43:57 <Amir1> zeljkof: Sure
2018-02-26 14:45:24 <addshore> Amir1: how many patches do you have?
2018-02-26 14:45:30 <wikibugs> ('Merged) ''jenkins-bot: Enable statement usage tracking in several wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414654 (https://phabricator.wikimedia.org/T151717) (owner: ''Ladsgroup)'
2018-02-26 14:45:32 <Amir1> three
2018-02-26 14:45:35 <addshore> okay
2018-02-26 14:45:43 <addshore> I might add 1 thing to the end of swat
2018-02-26 14:47:04 <wikibugs> ('CR) ''jenkins-bot: Enable statement usage tracking in several wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414654 (https://phabricator.wikimedia.org/T151717) (owner: ''Ladsgroup)'
2018-02-26 14:47:27 <wikibugs> 'Operations, ''hardware-requests: Site: (2) hardware access request for videoscalers - https://phabricator.wikimedia.org/T188075#4001386 (''faidon) a:''RobH Sounds good. Note that eqiad has 6 imagescalers (mw1293-mw1298) and codfw has 4 now ( mw2244-2245/mw2150-2151) but let's go with reassigning 4+4 for vi...'
2018-02-26 14:47:52 <Amir1> I put v instead of the Shift + V, so the log is a little bit weird, sorry
2018-02-26 14:48:01 <logmsgbot> !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:V|Enable statement usage tracking in several wikis (T151717)]] (duration: 00m 57s)
2018-02-26 14:48:19 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 14:48:19 <stashbot> T151717: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717
2018-02-26 14:48:20 <wikibugs> ('CR) ''Ladsgroup: [C: ''2] "SWAT" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414650 (https://phabricator.wikimedia.org/T187662) (owner: ''Ladsgroup)'
2018-02-26 14:48:27 <wikibugs> ('PS2) ''Ladsgroup: Add patrol rights/groups to fawikisource [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414650 (https://phabricator.wikimedia.org/T187662)'
2018-02-26 14:48:38 <wikibugs> ('CR) ''Ladsgroup: [C: ''2] "SWAT" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414650 (https://phabricator.wikimedia.org/T187662) (owner: ''Ladsgroup)'
2018-02-26 14:50:10 <godog> !log upload puppetdb 4.4.0-1~wmf1 to stretch-wikimedia - T177253
2018-02-26 14:50:10 <wikibugs> ('Merged) ''jenkins-bot: Add patrol rights/groups to fawikisource [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414650 (https://phabricator.wikimedia.org/T187662) (owner: ''Ladsgroup)'
2018-02-26 14:50:22 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 14:50:23 <stashbot> T177253: Upgrade PuppetDB to version 4.4 - https://phabricator.wikimedia.org/T177253
2018-02-26 14:50:28 <wikibugs> ('CR) ''jenkins-bot: Add patrol rights/groups to fawikisource [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414650 (https://phabricator.wikimedia.org/T187662) (owner: ''Ladsgroup)'
2018-02-26 14:50:38 <wikibugs> 'Operations, ''hardware-requests: eqiad/codfw: (4)+(4) hardware access request for videoscalers - https://phabricator.wikimedia.org/T188075#4001397 (''faidon) ''Open>''stalled p:''Triage>''Normal'
2018-02-26 14:51:22 <icinga-wm> PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 14:52:44 <Amir1> confirming the patch works fine, moving forward
2018-02-26 14:52:58 <gehel> !log rebooting relforge for kernel upgrade
2018-02-26 14:53:12 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 14:53:12 <icinga-wm> RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.145 second response time
2018-02-26 14:53:56 <wikibugs> ('PS2) ''Ladsgroup: Enable reading full entity id from wb_terms table in three wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414667 (https://phabricator.wikimedia.org/T114903)'
2018-02-26 14:54:25 <wikibugs> ('CR) ''Ladsgroup: [C: ''2] "SWAT" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414667 (https://phabricator.wikimedia.org/T114903) (owner: ''Ladsgroup)'
2018-02-26 14:54:38 <logmsgbot> !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:414650|Add patrol rights/groups to fawikisource (T187662)]] (duration: 00m 56s)
2018-02-26 14:54:51 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 14:54:51 <stashbot> T187662: Add autopatrol and related rights to fawikisource - https://phabricator.wikimedia.org/T187662
2018-02-26 14:56:31 <icinga-wm> PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 14:58:51 <wikibugs> ('PS14) ''Elukey: [WIP] eventlogging: add systemd support [puppet] - ''https://gerrit.wikimedia.org/r/413362'
2018-02-26 15:00:36 <wikibugs> ('PS3) ''BBlack: interface::rps: change IRQ count without reboot [puppet] - ''https://gerrit.wikimedia.org/r/414676'
2018-02-26 15:01:03 <wikibugs> ('Merged) ''jenkins-bot: Enable reading full entity id from wb_terms table in three wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414667 (https://phabricator.wikimedia.org/T114903) (owner: ''Ladsgroup)'
2018-02-26 15:01:16 <wikibugs> ('CR) ''jenkins-bot: Enable reading full entity id from wb_terms table in three wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414667 (https://phabricator.wikimedia.org/T114903) (owner: ''Ladsgroup)'
2018-02-26 15:04:58 <addshore> Amir1: are you all done?
2018-02-26 15:05:10 <Amir1> not yet
2018-02-26 15:05:12 <addshore> okay
2018-02-26 15:05:16 <addshore> ping me when you are :)
2018-02-26 15:06:56 <wikibugs> 'Operations, ''Pybal, ''Traffic, ''Patch-For-Review: Pybal stuck at BGP state OPENSENT while the other peer reached ESTABLISHED - https://phabricator.wikimedia.org/T188085#4001464 (''Vgutierrez) On pybal-test2001 the following behaviour can be observed: ``` pybal -d | grep -i bgp pybal -d 2>&1 | grep -i b...'
2018-02-26 15:08:07 <wikibugs> ('PS3) ''Zoranzoki21: Add mushroomobserver.org to wgCopyUploadsDomains [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414401 (https://phabricator.wikimedia.org/T188203)'
2018-02-26 15:09:20 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] Add mushroomobserver.org to wgCopyUploadsDomains [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414401 (https://phabricator.wikimedia.org/T188203) (owner: ''Zoranzoki21)'
2018-02-26 15:09:38 <Amir1> works fine on mwdebug1002, moving forward
2018-02-26 15:11:02 <icinga-wm> PROBLEM - Disk space on rhenium is CRITICAL: DISK CRITICAL - free space: / 1763 MB (3% inode=96%)
2018-02-26 15:11:49 <logmsgbot> !log ladsgroup@tin Synchronized wmf-config/Wikibase-production.php: [[gerrit:414667|Enable reading full entity id from wb_terms table in three wikis (T114903)]] (duration: 00m 56s)
2018-02-26 15:11:55 <gehel> !log reboot of relforge completed, cluster is green again
2018-02-26 15:12:02 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 15:12:03 <stashbot> T114903: Migrate wb_terms to using prefixed entity IDs instead of numeric IDs - https://phabricator.wikimedia.org/T114903
2018-02-26 15:12:14 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 15:12:38 <Amir1> !log This might have performance implications roll it back if it affects these wikis too much
2018-02-26 15:12:49 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 15:13:00 <Amir1> addshore: the floor is yours, don't forget to log when EU SWAT is finished
2018-02-26 15:13:02 <icinga-wm> RECOVERY - Disk space on rhenium is OK: DISK OK
2018-02-26 15:13:17 <addshore> thanks
2018-02-26 15:13:18 <addshore> will do!
2018-02-26 15:19:38 <logmsgbot> !log addshore@tin Started scap: Updated mediawiki/extensions/AdvancedSearch i18n files for some translations
2018-02-26 15:19:52 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 15:20:00 <addshore> that's the last thing in EU swat
2018-02-26 15:21:18 <wikibugs> 'Operations: Remove imagescaler cluster (aka 'rendering') - https://phabricator.wikimedia.org/T188062#4001493 (''MoritzMuehlenhoff) p:''Triage>''Normal a:''MoritzMuehlenhoff'
2018-02-26 15:27:44 <paladox> akosiaris hi, how would I replace fetch_row please? I've been looking but the only thing I've come across is
2018-02-26 15:27:45 <paladox> https://stackoverflow.com/questions/14064649/ruby-mysql-fetching-single-row-but-still-using-each
2018-02-26 15:28:06 <paladox> also would this if rs.num_rows.zero? become if rs.count > 0 ?
2018-02-26 15:30:30 <akosiaris> paladox: I honestly don't know. I'll have to study the mysql2 API (which I haven't found the time to do yet)
2018-02-26 15:31:08 <logmsgbot> !log addshore@tin Finished scap: Updated mediawiki/extensions/AdvancedSearch i18n files for some translations (duration: 11m 29s)
2018-02-26 15:31:18 <akosiaris> paladox: a quick look at e.g. https://github.com/brianmario/mysql2/search?utf8=%E2%9C%93&q=num_rows&type= is what tipped me off that the API is different
2018-02-26 15:31:20 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 15:31:37 <akosiaris> but that's about the depth to which I have gone up to now
2018-02-26 15:32:09 <godog> paladox: btw I took a stab at https://gerrit.wikimedia.org/r/#/c/414675/ but again it'll need testing
2018-02-26 15:32:13 <godog> also WIP
2018-02-26 15:32:19 <addshore> !log EU SWAT done
2018-02-26 15:32:33 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 15:32:35 <paladox> godog thanks.
2018-02-26 15:32:44 <paladox> akosiaris oh
2018-02-26 15:33:07 <akosiaris> I think that godog is on a good path though
2018-02-26 15:33:28 <wikibugs> ('CR) ''Paladox: WIP ruby-mysql2 (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/414675 (https://phabricator.wikimedia.org/T184562) (owner: ''Filippo Giunchedi)'
2018-02-26 15:33:48 <paladox> akosiaris yep, just one comment though
2018-02-26 15:33:51 <icinga-wm> PROBLEM - Check systemd state on rhenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
2018-02-26 15:33:54 <paladox> which i posted :)
2018-02-26 15:34:05 <godog> akosiaris: so in theory to test it on a puppetmaster what's needed is reports = servermon and the db* config values (?)
2018-02-26 15:34:20 <akosiaris> godog: yes
2018-02-26 15:34:31 <icinga-wm> PROBLEM - puppet last run on actinium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
2018-02-26 15:34:47 <akosiaris> what I always end up doing is having a puppetmaster set with very low load and live testing it in production :P
2018-02-26 15:36:14 <godog> hahah live_testing_in_production.jpg
2018-02-26 15:41:24 <andrewbogott> !log marking wikitech read-only (via a local edit to CommonSettings.php) for https://phabricator.wikimedia.org/T188029
2018-02-26 15:41:26 <stashbot> andrewbogott: Failed to log message to wiki. Somebody should check the error logs.
2018-02-26 15:41:41 <andrewbogott> oh, of course
2018-02-26 15:42:07 <andrewbogott> !log marking wikitech read-only (via a local edit to CommonSettings.php) for https://phabricator.wikimedia.org/T188029
2018-02-26 15:42:19 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 15:43:12 <cmjohnson1> !log swapping failed disk db1068
2018-02-26 15:43:19 <cmjohnson1> marostegui ^
2018-02-26 15:43:23 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 15:43:26 <marostegui> cmjohnson1: thanks!!
2018-02-26 15:45:12 <icinga-wm> RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.359 second response time
2018-02-26 15:45:36 <andrewbogott> !log made wikitech read/write again pending a bit more preliminary work
2018-02-26 15:45:48 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 15:46:19 <wikibugs> ('CR) ''Dzahn: "ok, i didn't know where we draw the line between "special" host and "regular" host. I figured these are monitoring hosts and that's why." [puppet] - ''https://gerrit.wikimedia.org/r/413889 (https://phabricator.wikimedia.org/T187805) (owner: ''Dzahn)'
2018-02-26 15:46:31 <cmjohnson1> marostegui disk in slot 2 is rebuilding. Ping me after and I can swap the other
2018-02-26 15:46:35 <wikibugs> ('Abandoned) ''Dzahn: network::constants: add kafkamon servers [puppet] - ''https://gerrit.wikimedia.org/r/413889 (https://phabricator.wikimedia.org/T187805) (owner: ''Dzahn)'
2018-02-26 15:47:26 <marostegui> cmjohnson1: I can see the disk now being rebuilt, thanks. It will probably take a while…probably the other one will be done tomorrow I guess.
2018-02-26 15:47:29 <wikibugs> ('PS2) ''Muehlenhoff: Switch debdeploy clients to Python 3 (WIP) [debs/debdeploy] - ''https://gerrit.wikimedia.org/r/413397'
2018-02-26 15:47:46 <cmjohnson1> okay, just let me know or ping me in task
2018-02-26 15:48:00 <wikibugs> ('PS1) ''Hashar: Tweak gbp to use 'master' has the upstream branch [software/conftool] - ''https://gerrit.wikimedia.org/r/414694'
2018-02-26 15:48:08 <marostegui> cmjohnson1: will do - thanks a lot
2018-02-26 15:48:10 <wikibugs> 'Operations, ''ops-eqiad, ''DBA: Degraded RAID on db1068 - https://phabricator.wikimedia.org/T188187#4001627 (''Marostegui) Thanks Chris: ``` root@db1068:~# megacli -PDRbld -ShowProg -PhysDrv [32:2] -aALL Rebuild Progress on Device at Enclosure 32, Slot 2 Completed 1% in 1 Minutes. ``` Once this is finish...'
2018-02-26 15:48:31 <icinga-wm> PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 15:49:29 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] Tweak gbp to use 'master' has the upstream branch [software/conftool] - ''https://gerrit.wikimedia.org/r/414694 (owner: ''Hashar)'
2018-02-26 15:50:04 <wikibugs> ('PS2) ''Hashar: Tweak gbp to use 'master' has the upstream branch [software/conftool] - ''https://gerrit.wikimedia.org/r/414694'
2018-02-26 15:50:06 <wikibugs> ('PS1) ''Hashar: Typo in changelog: jesse -> jessie [software/conftool] - ''https://gerrit.wikimedia.org/r/414695'
2018-02-26 15:50:36 <wikibugs> ('Abandoned) ''Dzahn: introduce role(kafkamon) and make new VMs use it [puppet] - ''https://gerrit.wikimedia.org/r/413672 (https://phabricator.wikimedia.org/T187805) (owner: ''Dzahn)'
2018-02-26 15:51:32 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] Typo in changelog: jesse -> jessie [software/conftool] - ''https://gerrit.wikimedia.org/r/414695 (owner: ''Hashar)'
2018-02-26 15:51:37 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] Tweak gbp to use 'master' has the upstream branch [software/conftool] - ''https://gerrit.wikimedia.org/r/414694 (owner: ''Hashar)'
2018-02-26 15:52:26 <wikibugs> ('CR) ''Hashar: "recheck" [software/conftool] - ''https://gerrit.wikimedia.org/r/414694 (owner: ''Hashar)'
2018-02-26 15:52:56 <wikibugs> ('PS4) ''BBlack: interface::rps: change IRQ count without reboot [puppet] - ''https://gerrit.wikimedia.org/r/414676'
2018-02-26 15:53:00 <wikibugs> ('PS1) ''BBlack: numa_networking: remove "isolate" experiment [puppet] - ''https://gerrit.wikimedia.org/r/414697'
2018-02-26 15:53:53 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] Tweak gbp to use 'master' has the upstream branch [software/conftool] - ''https://gerrit.wikimedia.org/r/414694 (owner: ''Hashar)'
2018-02-26 15:55:30 <wikibugs> ('Abandoned) ''Dzahn: webserver_misc_apps: remove kafka related includes [puppet] - ''https://gerrit.wikimedia.org/r/413673 (https://phabricator.wikimedia.org/T187805) (owner: ''Dzahn)'
2018-02-26 15:55:31 <icinga-wm> RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.618 second response time
2018-02-26 15:56:31 <wikibugs> ('PS2) ''Andrew Bogott: wikitech: grants for the new labswiki db on m5 [puppet] - ''https://gerrit.wikimedia.org/r/413884 (https://phabricator.wikimedia.org/T188029)'
2018-02-26 15:58:41 <icinga-wm> PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 16:00:53 <jynus> mobrovac: is pdfrender under your umbrella? it is flopping
2018-02-26 16:00:58 <jynus> the one on 1004
2018-02-26 16:01:05 <wikibugs> ('Draft1) ''Paladox: Add build documentation on building the plugin [software/gerrit/plugins/wikimedia] - ''https://gerrit.wikimedia.org/r/414698'
2018-02-26 16:01:06 <wikibugs> ('PS2) ''Paladox: Add build documentation on building the plugin [software/gerrit/plugins/wikimedia] - ''https://gerrit.wikimedia.org/r/414698'
2018-02-26 16:01:22 <wikibugs> 'Operations, ''ops-codfw: db2049 management unable to login via ssh - https://phabricator.wikimedia.org/T187534#4001687 (''Papaul) @Marostegui can you please depool the system for me? Thanks'
2018-02-26 16:01:47 <icinga-wm> PROBLEM - MariaDB disk space on silver is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=57%)
2018-02-26 16:01:54 <jynus> ah
2018-02-26 16:01:58 <marostegui> andrewbogott: ^
2018-02-26 16:02:07 <jynus> andrewbogott: did you try to do a local backup?
2018-02-26 16:02:08 <andrewbogott> yep, I'm on it
2018-02-26 16:02:48 <icinga-wm> RECOVERY - MariaDB disk space on silver is OK: DISK OK
2018-02-26 16:02:51 <icinga-wm> RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.674 second response time
2018-02-26 16:03:33 <wikibugs> ('PS2) ''Dzahn: lists: apache -> httpd module [puppet] - ''https://gerrit.wikimedia.org/r/409480'
2018-02-26 16:03:36 <wikibugs> ('PS1) ''Marostegui: db-codfw.php: Depool db2049 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414699 (https://phabricator.wikimedia.org/T187534)'
2018-02-26 16:04:18 <wikibugs> ('CR) ''BBlack: [C: ''2] "PCC says all-ok here as expected (functional no-op, no hosts are currently configured with "isolate")" [puppet] - ''https://gerrit.wikimedia.org/r/414697 (owner: ''BBlack)'
2018-02-26 16:04:31 <icinga-wm> RECOVERY - puppet last run on actinium is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
2018-02-26 16:04:52 <wikibugs> ('PS1) ''Subramanya Sastry: Enable RemexHtml on all wikinews wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414700 (https://phabricator.wikimedia.org/T188000)'
2018-02-26 16:04:54 <wikibugs> ('PS1) ''Subramanya Sastry: Enable RemexHtml on all private wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414701'
2018-02-26 16:04:56 <wikibugs> ('PS1) ''Subramanya Sastry: Enable RemexHtml on a few miscellaneous wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414702'
2018-02-26 16:04:59 <wikibugs> ('PS3) ''Dzahn: lists: apache -> httpd module [puppet] - ''https://gerrit.wikimedia.org/r/409480'
2018-02-26 16:05:22 <cmjohnson1> marostegui can you make the failed/failed ssd blink on db1111
2018-02-26 16:05:35 <marostegui> cmjohnson1: let me seeee
2018-02-26 16:05:38 <wikibugs> ('CR) ''Marostegui: [C: ''2] db-codfw.php: Depool db2049 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414699 (https://phabricator.wikimedia.org/T187534) (owner: ''Marostegui)'
2018-02-26 16:05:51 <icinga-wm> PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 16:06:22 <wikibugs> 'Operations, ''Puppet, ''Patch-For-Review, ''User-fgiunchedi: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562#4001762 (''fgiunchedi) I mocked some configuration values and installed mariadb on `puppetmaster-filippo-stretch2` to test `servermon.rb` reporte...'
2018-02-26 16:07:21 <wikibugs> ('CR) ''Zoranzoki21: "recheck Jenkins ill" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414401 (https://phabricator.wikimedia.org/T188203) (owner: ''Zoranzoki21)'
2018-02-26 16:07:24 <wikibugs> ('Merged) ''jenkins-bot: db-codfw.php: Depool db2049 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414699 (https://phabricator.wikimedia.org/T187534) (owner: ''Marostegui)'
2018-02-26 16:07:34 <wikibugs> ('CR) ''jenkins-bot: db-codfw.php: Depool db2049 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414699 (https://phabricator.wikimedia.org/T187534) (owner: ''Marostegui)'
2018-02-26 16:07:38 <wikibugs> ('CR) ''Zoranzoki21: "recheck" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414401 (https://phabricator.wikimedia.org/T188203) (owner: ''Zoranzoki21)'
2018-02-26 16:08:24 <wikibugs> ('PS4) ''Dzahn: lists: apache -> httpd module [puppet] - ''https://gerrit.wikimedia.org/r/409480'
2018-02-26 16:08:47 <logmsgbot> !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2049 - T187534 (duration: 00m 56s)
2018-02-26 16:09:19 <marostegui> !log Stop MySQL db2049 to get its mgmt network fixed - T187534
2018-02-26 16:09:44 <wikibugs> ('PS1) ''Urbanecm: New throttle rule [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414704'
2018-02-26 16:11:00 <wikibugs> ('CR) ''Alexandros Kosiaris: "Well.. the "special" part was me back in Icd266ac5f1c0edd40d07de041be90422f8003daf. I specifically wanted to create 2 sets of hosts (monit" [puppet] - ''https://gerrit.wikimedia.org/r/413889 (https://phabricator.wikimedia.org/T187805) (owner: ''Dzahn)'
2018-02-26 16:11:23 <marostegui> cmjohnson1: let me know if you see it blinking now
2018-02-26 16:11:46 <wikibugs> ('PS2) ''Urbanecm: New throttle rule [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414704 (https://phabricator.wikimedia.org/T188129)'
2018-02-26 16:11:54 <marostegui> !log Poweroff db2049 for maintenance - T187534
2018-02-26 16:12:13 <cmjohnson1> i see it
2018-02-26 16:12:14 <wikibugs> 'Operations, ''ops-codfw, ''Patch-For-Review: db2049 management unable to login via ssh - https://phabricator.wikimedia.org/T187534#4001815 (''Marostegui)  @papaul db2049 is now off'
2018-02-26 16:12:21 <godog> ugh, stashbot left?
2018-02-26 16:12:21 <marostegui> cmjohnson1: cool!
2018-02-26 16:12:30 <godog> bd808: ^
2018-02-26 16:12:34 <Urbanecm> jouncebot, now
2018-02-26 16:12:34 <jouncebot> No deployments scheduled for the next 1 hour(s) and 47 minute(s)
2018-02-26 16:12:46 <cmjohnson1> !log replacing disk slot 5 db1111
2018-02-26 16:14:01 <icinga-wm> RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 4.666 second response time
2018-02-26 16:15:52 <wikibugs> 'Operations, ''ops-codfw, ''Patch-For-Review: db2049 management unable to login via ssh - https://phabricator.wikimedia.org/T187534#4001817 (''Papaul) Thanks'
2018-02-26 16:16:27 <wikibugs> ('CR) ''Dzahn: "http://puppet-compiler.wmflabs.org/10142/fermium.wikimedia.org/" [puppet] - ''https://gerrit.wikimedia.org/r/409480 (owner: ''Dzahn)'
2018-02-26 16:16:51 <wikibugs> 'Operations, ''ops-eqiad, ''DBA: Disk #5 (count starts at #0) of db1111 has corrupted sectors - https://phabricator.wikimedia.org/T187526#4001826 (''Cmjohnson) The ssd was replaced, @marostegui please confirm and resolve after rebuild Return shipping information USPS 9202 3946 5301 2438 0714 10 FEDEX 961191...'
2018-02-26 16:17:02 <wikibugs> 'Operations, ''Puppet, ''Patch-For-Review, ''User-fgiunchedi: Upgrade Puppet Master Infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T184562#4001827 (''Paladox) @fgiunchedi could that be the heap? https://stackoverflow.com/questions/20297524/c-free-invalid-pointer'
2018-02-26 16:17:11 <icinga-wm> PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 16:20:13 <wikibugs> 'Operations, ''ops-eqiad, ''DBA: Disk #5 (count starts at #0) of db1111 has corrupted sectors - https://phabricator.wikimedia.org/T187526#4001835 (''Marostegui) @Cmjohnson looks like storage crashed and the FS became read-only. We are investigating why...'
2018-02-26 16:20:23 <mutante> !log no logging
2018-02-26 16:24:24 <wikibugs> 'Operations, ''ops-eqiad, ''DBA: Disk #5 (count starts at #0) of db1111 has corrupted sectors - https://phabricator.wikimedia.org/T187526#4001842 (''Marostegui) This is all we have from the HW logs: ``` /admin1/system1/logs1/log1-> show record3 properties CreationTimestamp = 20180226161220.000000-360...'
2018-02-26 16:24:52 <icinga-wm> PROBLEM - Host db2049.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
2018-02-26 16:25:37 <mutante> restarting stashbot on toolforge
2018-02-26 16:26:02 <mutante> !log restarted stashbot on toolforge because it didn't react to !log
2018-02-26 16:26:30 <wikibugs> 'Operations, ''Ops-Access-Requests, ''Analytics-Kanban, ''Patch-For-Review: Add Tilman to analytics-admins - https://phabricator.wikimedia.org/T178802#4001844 (''elukey) @HaeB Hi! Do you still need these perms or can we roll them back?'
2018-02-26 16:26:42 <mutante> !log test !log
2018-02-26 16:26:45 <mutante> ...
2018-02-26 16:26:56 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 16:27:14 <godog> mutante: thanks!
2018-02-26 16:27:20 <wikibugs> ('PS2) ''Filippo Giunchedi: WIP ruby-mysql2 [puppet] - ''https://gerrit.wikimedia.org/r/414675 (https://phabricator.wikimedia.org/T184562)'
2018-02-26 16:27:20 <wikibugs> ('PS1) ''Filippo Giunchedi: hieradata: depool rhodium [puppet] - ''https://gerrit.wikimedia.org/r/414706 (https://phabricator.wikimedia.org/T184562)'
2018-02-26 16:27:22 <wikibugs> ('PS1) ''Filippo Giunchedi: install_server: reinstall rhodium with Stretch [puppet] - ''https://gerrit.wikimedia.org/r/414707 (https://phabricator.wikimedia.org/T184562)'
2018-02-26 16:27:22 <mutante> godog: but it did not fix it :/
2018-02-26 16:27:33 <mutante> godog: oh, now it did :)
2018-02-26 16:27:46 <godog> mutante: yeah I think it might be related to wikitech being ro
2018-02-26 16:27:49 <mutante> i was already in "./bin/stashbot.sh tail" heh
2018-02-26 16:27:57 <mutante> found the docs on wikitech
2018-02-26 16:28:28 <mutante> dzahn user could not do it, but root could "become stashbot"
2018-02-26 16:29:11 <mutante> !log restarted stashbot on toolforge because it didn't react to !log
2018-02-26 16:29:26 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 16:30:02 <icinga-wm> RECOVERY - Host db2049.mgmt is UP: PING WARNING - Packet loss = 44%, RTA = 36.99 ms
2018-02-26 16:31:12 <icinga-wm> RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.679 second response time
2018-02-26 16:31:36 <papaul> !log Maintenance: removing Msw-d4-codfw for replacement:T187534
2018-02-26 16:31:50 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 16:31:51 <stashbot> T187534: db2049 management unable to login via ssh - https://phabricator.wikimedia.org/T187534
2018-02-26 16:33:23 <wikibugs> ('PS1) ''Rush: openstack: labtestcontrol2003 to jessie [puppet] - ''https://gerrit.wikimedia.org/r/414708 (https://phabricator.wikimedia.org/T188266)'
2018-02-26 16:33:43 <marostegui> !log Reboot db1111 storage crashed - T187526
2018-02-26 16:33:58 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 16:33:58 <stashbot> T187526: Disk #5 (count starts at #0) of db1111 has corrupted sectors - https://phabricator.wikimedia.org/T187526
2018-02-26 16:34:21 <icinga-wm> PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 16:35:38 <wikibugs> ('CR) ''Rush: [C: ''2] openstack: labtestcontrol2003 to jessie [puppet] - ''https://gerrit.wikimedia.org/r/414708 (https://phabricator.wikimedia.org/T188266) (owner: ''Rush)'
2018-02-26 16:35:52 <icinga-wm> PROBLEM - Host ps1-d4-codfw is DOWN: PING CRITICAL - Packet loss = 100%
2018-02-26 16:36:11 <icinga-wm> PROBLEM - Host ores2008.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
2018-02-26 16:36:18 <wikibugs> ('CR) ''Elukey: "As FYI zookeeper ferm rules are already in hiera: https://gerrit.wikimedia.org/r/#/c/413685/2/hieradata/role/common/configcluster.yaml" [puppet] - ''https://gerrit.wikimedia.org/r/413889 (https://phabricator.wikimedia.org/T187805) (owner: ''Dzahn)'
2018-02-26 16:39:22 <icinga-wm> RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.735 second response time
2018-02-26 16:39:40 <andrewbogott> !log making wikitech read-only (via a local patch) while I migrate the database to m5
2018-02-26 16:39:55 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 16:40:01 <icinga-wm> RECOVERY - Host ps1-d4-codfw is UP: PING OK - Packet loss = 0%, RTA = 38.97 ms
2018-02-26 16:41:21 <icinga-wm> RECOVERY - Host ores2008.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.93 ms
2018-02-26 16:42:31 <icinga-wm> PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 16:43:21 <icinga-wm> RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 276 bytes in 0.610 second response time
2018-02-26 16:46:32 <icinga-wm> PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 16:47:03 <wikibugs> ('PS2) ''Herron: WIP: puppet_compiler: add support for puppetdb4 and local postgresql [puppet] - ''https://gerrit.wikimedia.org/r/413881'
2018-02-26 16:47:40 <wikibugs> ('PS1) ''Vgutierrez: Provide testing for FSM.BGPTimer [debs/pybal] - ''https://gerrit.wikimedia.org/r/414711 (https://phabricator.wikimedia.org/T188085)'
2018-02-26 16:47:44 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] WIP: puppet_compiler: add support for puppetdb4 and local postgresql [puppet] - ''https://gerrit.wikimedia.org/r/413881 (owner: ''Herron)'
2018-02-26 16:50:17 <wikibugs> ('PS2) ''Giuseppe Lavagetto: Add the --hostname switch to simple node actions. [software/conftool] - ''https://gerrit.wikimedia.org/r/414669'
2018-02-26 16:50:19 <wikibugs> ('PS2) ''Giuseppe Lavagetto: Make full path of the object seen in the output for any change in SetAction and EditAction [software/conftool] - ''https://gerrit.wikimedia.org/r/414670'
2018-02-26 16:50:53 <wikibugs> 'Operations, ''netops: cr1-eqsin faulty interfaces - https://phabricator.wikimedia.org/T187807#4001998 (''ayounsi) Most recent update was: > We are pushing the delivery by next week if there everything is smooth and no customs clearance issue. Send on the 24th. Still asking for a more accurate ETA.'
2018-02-26 16:51:28 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] Add the --hostname switch to simple node actions. [software/conftool] - ''https://gerrit.wikimedia.org/r/414669 (owner: ''Giuseppe Lavagetto)'
2018-02-26 16:51:50 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] Make full path of the object seen in the output for any change in SetAction and EditAction [software/conftool] - ''https://gerrit.wikimedia.org/r/414670 (owner: ''Giuseppe Lavagetto)'
2018-02-26 16:51:53 <wikibugs> 'Operations, ''ops-codfw, ''netops: codfw: mgmt switch replacement in D4 - https://phabricator.wikimedia.org/T187816#4002005 (''Papaul) ''Open>''Resolved Switch replacement complete. - Racktables update - Test serial console of all 3 servers connected to msw-d4-codfw Resolving this task.'
2018-02-26 16:52:20 <wikibugs> 'Operations, ''ops-codfw, ''Patch-For-Review: db2049 management unable to login via ssh - https://phabricator.wikimedia.org/T187534#4002008 (''Marostegui) Looks like this is back to life: ``` root@db2049.mgmt.codfw.wmnet's password: User:root logged-in to ILO2M245205HN.(10.193.1.99 / FE80::FE15:B4FF:FE92:E...'
2018-02-26 16:54:29 <wikibugs> 'Operations: replace bast1001 (new hardware) - https://phabricator.wikimedia.org/T183412#4002028 (''Dzahn) p:''Triage>''High doing this now with hi prio ( using misc system from T184480#3938314)'
2018-02-26 16:57:41 <icinga-wm> RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.276 second response time
2018-02-26 16:58:27 <wikibugs> ('PS1) ''Giuseppe Lavagetto: Release 1.0.0 [software/conftool] - ''https://gerrit.wikimedia.org/r/414715'
2018-02-26 16:59:40 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] Release 1.0.0 [software/conftool] - ''https://gerrit.wikimedia.org/r/414715 (owner: ''Giuseppe Lavagetto)'
2018-02-26 17:00:29 <wikibugs> ('PS1) ''Vgutierrez: Provide unique logging for BGP instances [debs/pybal] - ''https://gerrit.wikimedia.org/r/414716 (https://phabricator.wikimedia.org/T188085)'
2018-02-26 17:00:44 <wikibugs> ('CR) ''Giuseppe Lavagetto: "recheck" [software/conftool] - ''https://gerrit.wikimedia.org/r/414715 (owner: ''Giuseppe Lavagetto)'
2018-02-26 17:01:52 <icinga-wm> PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 17:01:52 <wikibugs> 'Operations, ''ops-eqiad, ''DBA: Disk #5 (count starts at #0) of db1111 has corrupted sectors - https://phabricator.wikimedia.org/T187526#4002101 (''Marostegui) Looks like the new disk has not been added automatically to the RAID. I have been digging around the PERC menu, but it is terribly slow from here,...'
2018-02-26 17:04:03 <wikibugs> ('CR) ''Jforrester: [C: ''] Enable RemexHtml on all wikinews wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414700 (https://phabricator.wikimedia.org/T188000) (owner: ''Subramanya Sastry)'
2018-02-26 17:04:18 <wikibugs> ('CR) ''Jforrester: [C: ''] Enable RemexHtml on all private wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414701 (owner: ''Subramanya Sastry)'
2018-02-26 17:05:12 <wikibugs> ('CR) ''Jforrester: [C: ''] Enable RemexHtml on a few miscellaneous wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414702 (owner: ''Subramanya Sastry)'
2018-02-26 17:07:17 <wikibugs> 'Operations, ''ops-codfw, ''Patch-For-Review: db2049 management unable to login via ssh - https://phabricator.wikimedia.org/T187534#4002144 (''Papaul) ''Open>''Resolved - Power drain server - Reset ILO Server is back up.'
2018-02-26 17:09:02 <wikibugs> 'Operations: Remove 'moodbar-admin' from 'staff' global group - https://phabricator.wikimedia.org/T188278#4002152 (''MarcoAurelio)'
2018-02-26 17:13:38 <wikibugs> ('CR) ''Andrew Bogott: "I've manually applied these changes on db1009" [puppet] - ''https://gerrit.wikimedia.org/r/413884 (https://phabricator.wikimedia.org/T188029) (owner: ''Andrew Bogott)'
2018-02-26 17:14:11 <icinga-wm> PROBLEM - HHVM jobrunner on mw1305 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
2018-02-26 17:15:01 <wikibugs> 'Operations, ''Phabricator, ''Patch-For-Review, ''Release-Engineering-Team (Kanban), and 2 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#4002197 (''mmodell) Thanks @MoritzMuehlenhoff! Please let...'
2018-02-26 17:15:11 <icinga-wm> RECOVERY - HHVM jobrunner on mw1305 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.001 second response time
2018-02-26 17:16:19 <wikibugs> 'Operations: Remove 'moodbar-admin' from 'staff' global group - https://phabricator.wikimedia.org/T188278#4002199 (''MarcoAurelio) Also, `sendemail-new-users` was recently added to the steward group, but it is not listed as an available permission anymore. Perhaps it should be taken away too.'
2018-02-26 17:17:19 <wikibugs> ('CR) ''Alexandros Kosiaris: "FWIW rhodium is old enough to warrant refresh. At the same time we are doing quite well perf wise so to even justify doing without it. But" [puppet] - ''https://gerrit.wikimedia.org/r/414707 (https://phabricator.wikimedia.org/T184562) (owner: ''Filippo Giunchedi)'
2018-02-26 17:20:38 <tgr> jynus: hi, do you know when T188048 might go out?
2018-02-26 17:20:39 <stashbot> T188048: Deploy ReadingLists schema change for efficient count(*) handling - https://phabricator.wikimedia.org/T188048
2018-02-26 17:21:15 <wikibugs> 'Operations, ''Pybal, ''Traffic, ''Patch-For-Review: Pybal stuck at BGP state OPENSENT while the other peer reached ESTABLISHED - https://phabricator.wikimedia.org/T188085#4002221 (''Vgutierrez) As suggested by @mark, https://gerrit.wikimedia.org/r/414716 provides unique logging for bgp.py classes based o...'
2018-02-26 17:25:07 <jynus> tgr: on a meeting and a possible outage
2018-02-26 17:25:18 <jynus> ask on whatever ticket is for an estimation
2018-02-26 17:26:15 <wikibugs> ('CR) ''Bstorm: "Would it be possible to use random.choice instead of [0]? I haven't looked at the data structure, but that seems like it would work here." [puppet] - ''https://gerrit.wikimedia.org/r/414657 (https://phabricator.wikimedia.org/T181647) (owner: ''Arturo Borrero Gonzalez)'
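A minimal sketch of the suggestion above, contrasting indexing with `[0]` against Python's `random.choice`. The list `candidates`, its contents, and both helper names are hypothetical, since the data structure in change 414657 is not shown in this log.

```python
import random

# Hypothetical stand-in for whatever list the patch indexes with [0];
# the real data structure is not shown in this log.
candidates = ["instance-a", "instance-b", "instance-c"]


def pick_first(items):
    """Always return the first element (the `[0]` behaviour)."""
    return items[0]


def pick_random(items, rng=random):
    """Return a uniformly random element via choice().

    `rng` may be the `random` module or a seeded random.Random instance.
    """
    return rng.choice(items)


print(pick_first(candidates))
print(pick_random(candidates))
```

Both variants raise `IndexError` on an empty list, so swapping one for the other would not change how the empty case has to be handled.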
2018-02-26 17:30:12 <wikibugs> 'Operations, ''Continuous-Integration-Infrastructure, ''Goal, ''Release-Engineering-Team (Watching / External), and 2 others: Add Prometheus exporter to Jenkins instances - https://phabricator.wikimedia.org/T182759#4002278 (''greg)'
2018-02-26 17:30:21 <wikibugs> 'Operations, ''Gerrit, ''Release-Engineering-Team (Watching / External): Add prometheus exporter to Gerrit - https://phabricator.wikimedia.org/T184086#4002280 (''greg)'
2018-02-26 17:32:23 <akosiaris> !log shutdown sca1004 on ganeti1005 for T181121
2018-02-26 17:32:43 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 17:32:43 <stashbot> T181121: Kernels errors on ganeti1005- ganeti1008 under high I/O - https://phabricator.wikimedia.org/T181121
2018-02-26 17:32:54 <wikibugs> ('CR) ''Arturo Borrero Gonzalez: "> Would it be possible to use random.choice instead of [0]? I" [puppet] - ''https://gerrit.wikimedia.org/r/414657 (https://phabricator.wikimedia.org/T181647) (owner: ''Arturo Borrero Gonzalez)'
2018-02-26 17:34:12 <icinga-wm> PROBLEM - Host sca1004 is DOWN: PING CRITICAL - Packet loss = 100%
2018-02-26 17:34:43 <wikibugs> ('CR) ''Rush: "would it be possible to grab the non-floating-ip assigned instance in an HA pair for this pool I wonder?" [puppet] - ''https://gerrit.wikimedia.org/r/414657 (https://phabricator.wikimedia.org/T181647) (owner: ''Arturo Borrero Gonzalez)'
2018-02-26 17:34:51 <jynus> !log deploying new query killer to db1109
2018-02-26 17:35:04 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 17:36:31 <wikibugs> 'Operations, ''Mobile-Content-Service, ''ORES, ''Reading-Infrastructure-Team-Backlog, and 2 others: Limit resources used by ORES - https://phabricator.wikimedia.org/T146664#4002357 (''awight) ''Open>''Resolved a:''awight We've moved to a dedicated cluster—the best possible way to limit resources ;-)'
2018-02-26 17:36:42 <icinga-wm> PROBLEM - HP RAID on db2048 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Failed: 1I:1:1 - Controller: OK - Battery/Capacitor: OK
2018-02-26 17:36:43 <icinga-wm> ACKNOWLEDGEMENT - HP RAID on db2048 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Failed: 1I:1:1 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T188286
2018-02-26 17:36:51 <wikibugs> 'Operations, ''ops-codfw: Degraded RAID on db2048 - https://phabricator.wikimedia.org/T188286#4002363 (''ops-monitoring-bot)'
2018-02-26 17:37:23 <wikibugs> ('PS1) ''Chad: Revert "Enable caching of constraint check results" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414720'
2018-02-26 17:39:23 <wikibugs> ('PS2) ''Chad: Revert "Enable caching of constraint check results" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414720'
2018-02-26 17:39:30 <wikibugs> ('CR) ''Chad: [V: ''2 C: ''2] Revert "Enable caching of constraint check results" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414720 (owner: ''Chad)'
2018-02-26 17:39:31 <icinga-wm> RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.846 second response time
2018-02-26 17:39:48 <wikibugs> ('CR) ''Bstorm: ">" [puppet] - ''https://gerrit.wikimedia.org/r/414657 (https://phabricator.wikimedia.org/T181647) (owner: ''Arturo Borrero Gonzalez)'
2018-02-26 17:39:50 <wikibugs> ('CR) ''jenkins-bot: Revert "Enable caching of constraint check results" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414720 (owner: ''Chad)'
2018-02-26 17:39:56 <wikibugs> ('CR) ''Lucas Werkmeister (WMDE): [C: ''] "Doesn’t hurt too much even if it’s unrelated – it just means some API requests will take longer again, just as they did before today’s SWA" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414720 (owner: ''Chad)'
2018-02-26 17:40:34 <wikibugs> 'Operations, ''ops-codfw, ''DBA: db2048: RAID with predictive failure - https://phabricator.wikimedia.org/T187983#4002381 (''Papaul) a:''Papaul>''Marostegui Disk placement complete.'
2018-02-26 17:40:48 <wikibugs> ('CR) ''Arturo Borrero Gonzalez: "> would it be possible to grab the non-floating-ip assigned instance" [puppet] - ''https://gerrit.wikimedia.org/r/414657 (https://phabricator.wikimedia.org/T181647) (owner: ''Arturo Borrero Gonzalez)'
2018-02-26 17:41:02 <logmsgbot> !log demon@tin Synchronized wmf-config/Wikibase-production.php: Revert "Enable caching of constraint check results" (duration: 00m 57s)
2018-02-26 17:41:05 <stashbot> demon@tin: Failed to log message to wiki. Somebody should check the error logs.
2018-02-26 17:41:05 <no_justification> marostegui, jynus ^^^^
2018-02-26 17:41:14 <greg-g> hu, failed to log?
2018-02-26 17:41:16 <jynus> thanks
2018-02-26 17:41:17 <marostegui> no_justification: is it deployed then?
2018-02-26 17:41:23 <marostegui> greg-g: yeah, !log is under maintenance
2018-02-26 17:41:27 <greg-g> gotcha
2018-02-26 17:41:29 <jynus> I am going to kill again
2018-02-26 17:41:37 <jynus> to see they don't come back
2018-02-26 17:41:42 <icinga-wm> PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 22 probes of 289 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
2018-02-26 17:41:58 <jynus> if someone else can check errors also go down
2018-02-26 17:41:58 <marostegui> jynus: cool, I can see some small downs on the graph, but those are from the killing probably
2018-02-26 17:42:02 <marostegui> I am
2018-02-26 17:44:32 <icinga-wm> PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 17:45:00 <jynus> things seem under control
2018-02-26 17:45:17 <jynus> I am almost sure the revert fixed it
2018-02-26 17:45:23 <marostegui> yes
2018-02-26 17:45:28 <jynus> or, in other words, the patch was the cause
2018-02-26 17:45:32 <icinga-wm> RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.945 second response time
2018-02-26 17:47:06 <Lucas_WMDE> okay, so now I need to figure out what the hell my code was doing wrong…
2018-02-26 17:47:15 <greg-g> :)
2018-02-26 17:48:25 <jynus> breaking wikidata :-)
2018-02-26 17:48:30 <marostegui> xddd
2018-02-26 17:48:34 <jynus> you should communicate with users
2018-02-26 17:48:41 <icinga-wm> PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 17:48:42 <jynus> and explain there was some slowdown
2018-02-26 17:48:48 <Lucas_WMDE> but I wasn’t the one who reverted the change, so I don’t deserve a T-shirt? ;)
2018-02-26 17:48:53 <jynus> maybe an incident report
2018-02-26 17:49:13 <jynus> the slowdown was slow to build up
2018-02-26 17:49:28 <jynus> so it was not detected by monitoring immediately
2018-02-26 17:50:46 <jynus> Lucas_WMDE: this is the best way to see it: https://grafana.wikimedia.org/dashboard/db/mysql-aggregated?panelId=11&fullscreen&orgId=1&from=1519592991194&to=1519667375278&var-dc=eqiad%20prometheus%2Fops&var-group=core&var-shard=All&var-role=All
2018-02-26 17:51:07 <Lucas_WMDE> oh dear lord
2018-02-26 17:51:12 <jynus> 24-second extra latency while normally it should be 200 ms
2018-02-26 17:51:39 <marostegui> Lucas_WMDE: and this was the network used on an affected server: https://grafana.wikimedia.org/dashboard/file/server-board.json?panelId=10&fullscreen&orgId=1&var-server=db1109&var-network=eth0&from=now-6h&to=now&refresh=1m
2018-02-26 17:51:41 <icinga-wm> RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 11 probes of 289 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
2018-02-26 17:51:54 <jynus> marostegui: actually it was affecting all servers
2018-02-26 17:52:04 <marostegui> jynus: yeah, it was an example of a server :)
2018-02-26 17:52:10 <jynus> ah, ok
2018-02-26 17:52:22 <jynus> a prometheus alert would be nice
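A minimal sketch of the kind of check such an alert could encode, assuming latency samples in seconds. The 0.2 s baseline comes from the discussion above; the factor, the window length, the sample values, and the function name are hypothetical. This is plain Python for illustration, not Prometheus alerting-rule syntax.

```python
def sustained_latency_breach(samples, baseline=0.2, factor=10.0, window=3):
    """Return True if the last `window` samples all exceed baseline * factor.

    samples: latency readings in seconds, oldest first.
    Requiring several consecutive breaches avoids firing on a single spike.
    """
    if window <= 0 or len(samples) < window:
        return False
    threshold = baseline * factor
    return all(s > threshold for s in samples[-window:])


# Latency that builds up slowly, as described above (values are made up).
readings = [0.2, 0.3, 0.9, 2.5, 8.0, 24.0]
print(sustained_latency_breach(readings))
```

A lower `factor` or a shorter `window` would flag a slow build-up earlier, at the cost of more false alarms.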
2018-02-26 17:52:30 <Lucas_WMDE> I really don’t understand how this could happen, though…
2018-02-26 17:53:41 <icinga-wm> RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.496 second response time
2018-02-26 17:56:27 <wikibugs> 'Operations, ''Ops-Access-Requests, ''Patch-For-Review: Give 'sudo -u yarn' asccess to joal on analytics-hadoop-workers nodes - https://phabricator.wikimedia.org/T187723#4002456 (''RobH) >>! In T187723#3983455, @elukey wrote: > I support the request, and it might be wise to allow this simple diff for the wh...'
2018-02-26 17:56:47 <wikibugs> ('PS2) ''RobH: admin::data: allow analytics-admins to sudo as yarn [puppet] - ''https://gerrit.wikimedia.org/r/412704 (https://phabricator.wikimedia.org/T187723) (owner: ''Elukey)'
2018-02-26 17:56:51 <icinga-wm> PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 17:57:00 <robh> elukey: did you wanna merge your patchset for yarn sudo?
2018-02-26 17:57:11 <robh> (can wait post meeting but didnt wanna let it sit abandoned)
2018-02-26 17:57:41 <elukey> robh: sure I can do it now
2018-02-26 17:57:54 <logmsgbot> !log mobrovac@tin Started restart [electron-render/deploy@94d27d7]: Stuck, restart - T174916
2018-02-26 17:57:57 <stashbot> mobrovac@tin: Failed to log message to wiki. Somebody should check the error logs.
2018-02-26 17:57:58 <stashbot> T174916: electron/pdfrender hangs - https://phabricator.wikimedia.org/T174916
2018-02-26 17:58:04 <wikibugs> 'Operations, ''ops-eqiad, ''DBA: Degraded RAID on db1068 - https://phabricator.wikimedia.org/T188187#4002471 (''Marostegui) The rebuilt failed for this disk, I guess this disk was not in a good state: ``` PD: 0 Information Enclosure Device ID: 32 Slot Number: 2 Drive's position: DiskGroup: 0, Span: 1, Arm:...'
2018-02-26 17:58:35 <wikibugs> ('CR) ''Elukey: [C: ''2] admin::data: allow analytics-admins to sudo as yarn [puppet] - ''https://gerrit.wikimedia.org/r/412704 (https://phabricator.wikimedia.org/T187723) (owner: ''Elukey)'
2018-02-26 17:58:51 <robh> cool
2018-02-26 17:59:41 <icinga-wm> RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.003 second response time
2018-02-26 18:00:04 <jouncebot> gehel: #bothumor I � Unicode. All rise for Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180226T1800).
2018-02-26 18:00:04 <jouncebot> No GERRIT patches in the queue for this window AFAICS.
2018-02-26 18:00:06 <wikibugs> 'Operations, ''Ops-Access-Requests, ''Patch-For-Review: Give 'sudo -u yarn' asccess to joal on analytics-hadoop-workers nodes - https://phabricator.wikimedia.org/T187723#4002487 (''elukey) ''Open>''Resolved'
2018-02-26 18:00:27 <gehel> jouncebot: Deploy new Updater, new GUI and new whitelist.txt coming up...
2018-02-26 18:02:10 <wikibugs> ('CR) ''Chad: [V: ''2 C: ''2] Add build documentation on building the plugin [software/gerrit/plugins/wikimedia] - ''https://gerrit.wikimedia.org/r/414698 (owner: ''Paladox)'
2018-02-26 18:02:49 <wikibugs> ('CR) ''Chad: [C: ''2] ExtensionDistributor: Ignore empty repositories [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414612 (owner: ''Legoktm)'
2018-02-26 18:04:20 <wikibugs> ('Merged) ''jenkins-bot: ExtensionDistributor: Ignore empty repositories [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414612 (owner: ''Legoktm)'
2018-02-26 18:04:24 <wikibugs> 'Operations, ''hardware-requests: eqiad/codfw: (4)+(4) hardware access request for videoscalers - https://phabricator.wikimedia.org/T188075#4002505 (''brion) Nice. :D'
2018-02-26 18:05:21 <logmsgbot> !log gehel@tin Started deploy [wdqs/wdqs@4edbbaa]: new update, GUI and whitelist.txt
2018-02-26 18:05:23 <stashbot> gehel@tin: Failed to log message to wiki. Somebody should check the error logs.
2018-02-26 18:06:57 <wikibugs> ('CR) ''jenkins-bot: ExtensionDistributor: Ignore empty repositories [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414612 (owner: ''Legoktm)'
2018-02-26 18:08:37 <wikibugs> ('PS1) ''Awight: Enable Extension:JADE on all beta cluster wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414729 (https://phabricator.wikimedia.org/T176333)'
2018-02-26 18:10:05 <logmsgbot> !log gehel@tin Finished deploy [wdqs/wdqs@4edbbaa]: new update, GUI and whitelist.txt (duration: 04m 44s)
2018-02-26 18:10:08 <stashbot> gehel@tin: Failed to log message to wiki. Somebody should check the error logs.
2018-02-26 18:10:29 <gehel> anyone knows which error logs to check for stashbot?
2018-02-26 18:11:17 <wikibugs> ('PS1) ''Greg Grossmeier: beta: add fr.wikipedia for LE cert [puppet] - ''https://gerrit.wikimedia.org/r/414730 (https://phabricator.wikimedia.org/T188288)'
2018-02-26 18:11:27 <gehel> SMalyshev: deployment completed, tests are green (except wdqs1004, still down - T188045)
2018-02-26 18:11:28 <stashbot> T188045: wdqs1004 broken - https://phabricator.wikimedia.org/T188045
2018-02-26 18:11:47 <greg-g> gehel: 17:41:23 marostegui | greg-g: yeah, !log is under maintenance
2018-02-26 18:12:06 <gehel> Oh, I missed that one...
2018-02-26 18:12:13 <gehel> greg-g: thanks!
2018-02-26 18:12:14 <greg-g> marostegui: who's maintenance'ing logs/stashbot?
2018-02-26 18:12:32 <marostegui> greg-g: andrewbogott is working on wikitech and it is set to read_only
2018-02-26 18:13:02 <andrewbogott> I hope to have it back soon but there are a fair number of unknowns
2018-02-26 18:13:06 <greg-g> marostegui: ahhhhhh
2018-02-26 18:13:11 <greg-g> andrewbogott: godspeed
2018-02-26 18:13:51 <logmsgbot> !log demon@tin Synchronized wmf-config/CommonSettings.php: ExtensionDistributor: Ignore empty repositories (duration: 00m 56s)
2018-02-26 18:14:08 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 18:16:39 <wikibugs> ('PS5) ''Dzahn: lists: apache -> httpd module [puppet] - ''https://gerrit.wikimedia.org/r/409480'
2018-02-26 18:16:45 <wikibugs> ('CR) ''Greg Grossmeier: "Is this all that needs to be done to get a LE cert for a domain in Beta Cluster?" [puppet] - ''https://gerrit.wikimedia.org/r/414730 (https://phabricator.wikimedia.org/T188288) (owner: ''Greg Grossmeier)'
2018-02-26 18:18:31 <icinga-wm> RECOVERY - Host sca1004 is UP: PING OK - Packet loss = 0%, RTA = 0.43 ms
2018-02-26 18:20:42 <wikibugs> ('PS1) ''Andrew Bogott: wikitech: use 'labswiki' database on m5-master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414733 (https://phabricator.wikimedia.org/T188029)'
2018-02-26 18:22:23 <wikibugs> ('CR) ''Dzahn: [C: ''] "wasn't involved in setting this up, but after looking at code.. looks like this is all, yea" [puppet] - ''https://gerrit.wikimedia.org/r/414730 (https://phabricator.wikimedia.org/T188288) (owner: ''Greg Grossmeier)'
2018-02-26 18:22:32 <wikibugs> ('CR) ''Dzahn: [C: ''2] beta: add fr.wikipedia for LE cert [puppet] - ''https://gerrit.wikimedia.org/r/414730 (https://phabricator.wikimedia.org/T188288) (owner: ''Greg Grossmeier)'
2018-02-26 18:23:45 <greg-g> thanks mutante :)
2018-02-26 18:24:50 <mutante> greg-g: you're welcome. .. when logging in on that machine i see though that last puppet run was 19072 minutes ago
2018-02-26 18:24:56 <greg-g> mutante: added you because of https://gerrit.wikimedia.org/r/c/386077 (you +2'd it) :)
2018-02-26 18:24:58 <logmsgbot> !log gehel@tin Started deploy [wdqs/wdqs@f74cbd1]: new forAllCategoryWikis.sh
2018-02-26 18:25:03 <greg-g> mutante: ugh
2018-02-26 18:25:12 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 18:25:42 <mutante> yea, there is usually some other issue
2018-02-26 18:25:51 <mutante> but means we cant apply it
2018-02-26 18:26:08 <mutante> "not find data item profile::cache::kafka::webrequest::kafka_cluster_name in any Hiera data file"
2018-02-26 18:27:42 <wikibugs> 'Operations, ''ops-eqiad, ''Discovery, ''Wikidata, and 2 others: wdqs1004 broken - https://phabricator.wikimedia.org/T188045#4002694 (''Gehel) a:''Gehel Hardware diagnostic is running, I'll report back with the results when completed.'
2018-02-26 18:31:27 <logmsgbot> !log gehel@tin Finished deploy [wdqs/wdqs@f74cbd1]: new forAllCategoryWikis.sh (duration: 06m 28s)
2018-02-26 18:31:38 <gehel> SMalyshev: ^ done !
2018-02-26 18:31:40 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 18:33:42 <wikibugs> ('CR) ''Rush: "while it should be possible to know the floating ip in these cases before hand and to verify which instance it is assigned to during dynam" [puppet] - ''https://gerrit.wikimedia.org/r/414657 (https://phabricator.wikimedia.org/T181647) (owner: ''Arturo Borrero Gonzalez)'
2018-02-26 18:36:14 <wikibugs> 'Operations, ''Discovery-Wikidata-Query-Service-Sprint: Activate kafka-based recent change poller for wikidata query service - https://phabricator.wikimedia.org/T188252#4002716 (''Smalyshev) p:''Triage>''Normal'
2018-02-26 18:44:14 <wikibugs> 'Operations, ''ops-eqiad, ''Analytics-Cluster, ''Analytics-Kanban: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4002756 (''RobH) p:''Triage>''Normal'
2018-02-26 18:49:38 <wikibugs> ('PS1) ''Dzahn: deployment-prep: set profile::cache::kafka::webrequest::kafka_cluster_name [puppet] - ''https://gerrit.wikimedia.org/r/414738 (https://phabricator.wikimedia.org/T188288)'
2018-02-26 18:50:04 <wikibugs> ('PS5) ''BBlack: rps: change IRQs without reboot on bnx2x [puppet] - ''https://gerrit.wikimedia.org/r/414676'
2018-02-26 18:50:06 <wikibugs> ('PS1) ''BBlack: Add net_driver fact [puppet] - ''https://gerrit.wikimedia.org/r/414739'
2018-02-26 18:50:08 <wikibugs> ('PS1) ''BBlack: lvs - use new fact to determine bnx2x [puppet] - ''https://gerrit.wikimedia.org/r/414740'
2018-02-26 18:51:00 <wikibugs> ('PS2) ''Dzahn: deployment-prep: set profile::cache::kafka::webrequest::kafka_cluster_name [puppet] - ''https://gerrit.wikimedia.org/r/414738 (https://phabricator.wikimedia.org/T188288)'
2018-02-26 18:51:23 <wikibugs> ('CR) ''Dzahn: [C: ''2] deployment-prep: set profile::cache::kafka::webrequest::kafka_cluster_name [puppet] - ''https://gerrit.wikimedia.org/r/414738 (https://phabricator.wikimedia.org/T188288) (owner: ''Dzahn)'
2018-02-26 18:51:48 <Lucas_WMDE> greg-g / : T184812 no longer needs to be UBN!, right? you reverted the change, everything fine again for now. (and it’s a config change, so we don’t need to worry about the next train reintroducing the issue either.)
2018-02-26 18:51:48 <stashbot> T184812: Enable constraint result caching on Wikidata - https://phabricator.wikimedia.org/T184812
2018-02-26 18:52:07 <Lucas_WMDE> oops, IRC client dropped the mention – marostegui ^
2018-02-26 18:52:18 <greg-g> Lucas_WMDE: right
2018-02-26 18:52:24 <Lucas_WMDE> ok thanks
2018-02-26 18:52:34 <Lucas_WMDE> updated
2018-02-26 18:54:19 <icinga-wm> PROBLEM - mysqld processes on silver is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
2018-02-26 18:54:46 <herron> !log disabling puppet agents and rebooting codfw puppet masters for kernel update
2018-02-26 18:54:51 <chasemp> ^ should be known, apologies. andrewbogott ^^ silver
2018-02-26 18:55:00 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 18:55:22 <andrewbogott> fights with icinga to mute
2018-02-26 18:59:30 <James_F> is here, for when the bot pings.
2018-02-26 19:00:04 <jouncebot> James_F, Zoranzoki21, and Urbanecm: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
2018-02-26 19:00:04 <jouncebot> addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: (Dis)respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180226T1900). Please do the needful.
2018-02-26 19:01:16 <andrewbogott> James_F: I'm going to want to deploy my change either last or not at all… will update in a few
2018-02-26 19:01:22 <herron> !log codfw puppet master kernel updates complete — re-enabling puppet agents
2018-02-26 19:01:25 <stashbot> herron: Failed to log message to wiki. Somebody should check the error logs.
2018-02-26 19:01:26 <James_F> OK.
2018-02-26 19:01:38 <andrewbogott> um… also of course my change isn't on wikitech because it's read-only :)
2018-02-26 19:01:43 <herron> !log --
2018-02-26 19:01:45 <stashbot> herron: Failed to log message to wiki. Somebody should check the error logs.
2018-02-26 19:01:55 <herron> !log codfw puppet master kernel updates complete re-enabling puppet agents
2018-02-26 19:01:57 <stashbot> herron: Failed to log message to wiki. Somebody should check the error logs.
2018-02-26 19:02:56 <wikibugs> 'Operations, ''ops-eqiad, ''Patch-For-Review: rack/setup/install labvirt102[12] - https://phabricator.wikimedia.org/T183937#3867914 (''faidon) So for some reason (WMCS bad luck!), these seem to have been ordered with Intel NIC daughter cards. We have had Intel NICs only in the distant past, 99% of our 10G f...'
2018-02-26 19:03:14 <wikibugs> 'Operations, ''ops-eqiad, ''Patch-For-Review: rack/setup/install labvirt102[12] - https://phabricator.wikimedia.org/T183937#4002878 (''faidon) a:''Cmjohnson>''RobH'
2018-02-26 19:04:19 <RoanKattouw> Is anyone doing the SWAT or should I do it?
2018-02-26 19:05:07 <James_F> RoanKattouw: Could you?
2018-02-26 19:05:37 <icinga-wm> RECOVERY - mysqld processes on silver is OK: PROCS OK: 1 process with command name mysqld
2018-02-26 19:06:19 <wikibugs> 'Operations, ''ops-eqiad, ''Analytics-Cluster, ''Analytics-Kanban: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4002884 (''elukey) > OS Version: Existing hadoop worker nodes use Jessie. Can these new hosts be stretch? We still haven't tested Hadoop packages on stretch...'
2018-02-26 19:06:35 <wikibugs> 'Operations, ''ops-eqiad, ''Analytics-Cluster, ''Analytics-Kanban, ''User-Elukey: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4002886 (''elukey)'
2018-02-26 19:06:42 <andrewbogott> James_F: sorry for the commotion; I've rolled back my changes for the moment and you should be able to log normally for now
2018-02-26 19:06:59 <James_F> OK.
2018-02-26 19:07:56 <RoanKattouw> James_F: So help me understand your config patch
2018-02-26 19:08:05 <herron> !log codfw puppet master kernel updates complete re-enabling puppet agents
2018-02-26 19:08:19 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 19:08:28 <wikibugs> 'Operations, ''ops-eqiad, ''Analytics-Cluster, ''Analytics-Kanban, ''User-Elukey: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4002892 (''RobH)'
2018-02-26 19:09:19 <James_F> RoanKattouw: Stack of patches, starting with that, then once that's deployed, https://gerrit.wikimedia.org/r/#/c/413656/ will go out in the train.
2018-02-26 19:09:40 <James_F> RoanKattouw: Changing the meaning of wgVisualEditorEnableWikitext and adding wgVisualEditorEnableWikitextBetaFeature.
2018-02-26 19:09:58 <wikibugs> ('CR) ''Dzahn: [C: ''2] "this remove one error but due to:" [puppet] - ''https://gerrit.wikimedia.org/r/414738 (https://phabricator.wikimedia.org/T188288) (owner: ''Dzahn)'
2018-02-26 19:11:01 <wikibugs> ('PS6) ''Dzahn: lists: apache -> httpd module [puppet] - ''https://gerrit.wikimedia.org/r/409480'
2018-02-26 19:11:56 <wikibugs> ('PS4) ''Catrope: Add mushroomobserver.org to wgCopyUploadsDomains [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414401 (https://phabricator.wikimedia.org/T188203) (owner: ''Zoranzoki21)'
2018-02-26 19:12:01 <wikibugs> ('CR) ''Catrope: [C: ''2] Add mushroomobserver.org to wgCopyUploadsDomains [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414401 (https://phabricator.wikimedia.org/T188203) (owner: ''Zoranzoki21)'
2018-02-26 19:12:31 <wikibugs> 'Operations, ''ops-eqiad, ''Patch-For-Review: rack/setup/install labvirt102[12] - https://phabricator.wikimedia.org/T183937#4002903 (''chasemp) >>! In T183937#4002875, @faidon wrote: >Any disagreements? Nope, please and thank you. Really appreciate you and DC Ops working through our puzzles.'
2018-02-26 19:12:35 <RoanKattouw> James_F: Oh I see, you're setting a config var that doesn't exist yet
2018-02-26 19:12:42 <icinga-wm> PROBLEM - HHVM rendering on mw2150 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 19:12:47 <James_F> RoanKattouw: Yeah, rather than break the world with the train. :-)
2018-02-26 19:12:47 <wikibugs> ('CR) ''Catrope: [C: ''2] 2017 wikitext editor: Simplify config part 1 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/413651 (owner: ''Jforrester)'
2018-02-26 19:13:27 <wikibugs> ('Merged) ''jenkins-bot: Add mushroomobserver.org to wgCopyUploadsDomains [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414401 (https://phabricator.wikimedia.org/T188203) (owner: ''Zoranzoki21)'
2018-02-26 19:13:41 <wikibugs> ('CR) ''jenkins-bot: Add mushroomobserver.org to wgCopyUploadsDomains [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414401 (https://phabricator.wikimedia.org/T188203) (owner: ''Zoranzoki21)'
2018-02-26 19:13:41 <icinga-wm> RECOVERY - HHVM rendering on mw2150 is OK: HTTP OK: HTTP/1.1 200 OK - 82081 bytes in 1.060 second response time
2018-02-26 19:14:18 <wikibugs> ('CR) ''Catrope: [C: ''2] New throttle rule [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414704 (https://phabricator.wikimedia.org/T188129) (owner: ''Urbanecm)'
2018-02-26 19:14:27 <wikibugs> 'Operations: TransparencyReport-private is not auto deploying - https://phabricator.wikimedia.org/T188224#4002915 (''APalmer_WMF) Thanks, everyone! We spoke with @Catrope last week, and he was able to get it working again. Is there any way to determine why it happened / make sure it doesn't happen again?'
2018-02-26 19:15:46 <wikibugs> ('Merged) ''jenkins-bot: New throttle rule [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414704 (https://phabricator.wikimedia.org/T188129) (owner: ''Urbanecm)'
2018-02-26 19:16:33 <RoanKattouw> Lucas_WMDE: Are you around to verify the SWAT deployment of https://gerrit.wikimedia.org/r/#/c/414714/ ?
2018-02-26 19:16:44 <Lucas_WMDE> I am around
2018-02-26 19:17:01 <Lucas_WMDE> but currently writing an incident report for an outage caused by a change I had in the earlier SWAT
2018-02-26 19:17:03 <wikibugs> ('CR) ''jenkins-bot: New throttle rule [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414704 (https://phabricator.wikimedia.org/T188129) (owner: ''Urbanecm)'
2018-02-26 19:17:08 <Lucas_WMDE> so, just so you know… :)
2018-02-26 19:17:18 <Lucas_WMDE> I won’t be mad if you say “this guy can’t be trusted right now not to break the wiki”
2018-02-26 19:17:28 <Lucas_WMDE> I think this change is fine, but I thought so of the other one too
2018-02-26 19:18:22 <wikibugs> ('PS2) ''Catrope: 2017 wikitext editor: Simplify config part 1 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/413651 (owner: ''Jforrester)'
2018-02-26 19:18:29 <wikibugs> ('CR) ''Catrope: 2017 wikitext editor: Simplify config part 1 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/413651 (owner: ''Jforrester)'
2018-02-26 19:18:32 <wikibugs> ('CR) ''Catrope: [C: ''2] 2017 wikitext editor: Simplify config part 1 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/413651 (owner: ''Jforrester)'
2018-02-26 19:19:58 <RoanKattouw> haha OK no worries
2018-02-26 19:20:00 <wikibugs> ('Merged) ''jenkins-bot: 2017 wikitext editor: Simplify config part 1 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/413651 (owner: ''Jforrester)'
2018-02-26 19:20:17 <wikibugs> ('CR) ''jenkins-bot: 2017 wikitext editor: Simplify config part 1 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/413651 (owner: ''Jforrester)'
2018-02-26 19:21:53 <greg-g> side-eyes both of you
2018-02-26 19:22:21 <logmsgbot> !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Add mushroomobserver.org to wgCopyUploadsDomains (T188203) (duration: 00m 57s)
2018-02-26 19:22:25 <awight> puts on an extremely trustworthy expression in anticipation of the scap stare
2018-02-26 19:22:35 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 19:22:35 <stashbot> T188203: Please add <http://mushroomobserver.org> to the wgCopyUploadsDomains whitelist of Wikimedia Commons - https://phabricator.wikimedia.org/T188203
2018-02-26 19:23:27 <Amir1> I'm around in the office for a while in case things break
2018-02-26 19:23:41 <icinga-wm> PROBLEM - High lag on wdqs1003 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1800.0] https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
2018-02-26 19:23:49 <wikibugs> ('CR) ''Herron: [C: ''] hieradata: depool rhodium [puppet] - ''https://gerrit.wikimedia.org/r/414706 (https://phabricator.wikimedia.org/T184562) (owner: ''Filippo Giunchedi)'
2018-02-26 19:24:06 <wikibugs> 'Operations, ''ops-codfw: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4002973 (''RobH) p:''Triage>''Normal'
2018-02-26 19:24:28 <wikibugs> 'Operations, ''ops-codfw: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4002987 (''RobH) Before these are racked, I'd like someone to review my racking proposal: Racking Proposal: mw systems in codfw have been racked in the #3 and #4 racks in each row. Presently, there is a bi...'
2018-02-26 19:26:51 <wikibugs> 'Operations, ''Phabricator, ''Patch-For-Review, ''Release-Engineering-Team (Someday): Add support for stretch in the phabricator puppet class - https://phabricator.wikimedia.org/T187127#4002998 (''greg)'
2018-02-26 19:26:52 <logmsgbot> !log catrope@tin Synchronized wmf-config/throttle.php: Add throttle rule (T188129) (duration: 00m 56s)
2018-02-26 19:27:07 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 19:27:08 <stashbot> T188129: Request for allowance of multiple account registers from same IP for 2018-02-27 - https://phabricator.wikimedia.org/T188129
2018-02-26 19:28:18 <MaxSem> RoanKattouw: could you ping me when you're done? need to deploy some config cleanups
2018-02-26 19:28:34 <RoanKattouw> Sure will do
2018-02-26 19:29:20 <logmsgbot> !log catrope@tin Synchronized wmf-config/CommonSettings.php: Simplify 2017 wikitext editor config (part 1) (duration: 00m 54s)
2018-02-26 19:29:33 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 19:29:57 <wikibugs> ('CR) ''Dzahn: [C: ''2] lists: apache -> httpd module [puppet] - ''https://gerrit.wikimedia.org/r/409480 (owner: ''Dzahn)'
2018-02-26 19:32:10 <wikibugs> ('CR) ''Dzahn: [C: ''2] "no-op on fermium .. nothing at all" [puppet] - ''https://gerrit.wikimedia.org/r/409480 (owner: ''Dzahn)'
2018-02-26 19:32:16 <James_F> RoanKattouw: Thanks.
2018-02-26 19:35:01 <wikibugs> ('CR) ''Herron: [C: ''] "in addition I think it would be more intuitive if all puppet masters followed the same naming convention. but for the purposes of upgradi" [puppet] - ''https://gerrit.wikimedia.org/r/414707 (https://phabricator.wikimedia.org/T184562) (owner: ''Filippo Giunchedi)'
2018-02-26 19:35:17 <wikibugs> 'Operations, ''Electron-PDFs, ''Proton, ''Readers-Web-Backlog, and 4 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#4003039 (''phuedx)'
2018-02-26 19:36:35 <wikibugs> ('PS1) ''Dzahn: lists: move httpd class to role [puppet] - ''https://gerrit.wikimedia.org/r/414748'
2018-02-26 19:38:33 <wikibugs> ('CR) ''Dzahn: [C: ''2] "see how this was ok but did not fix the style violation of including from another module. https://gerrit.wikimedia.org/r/#/c/414748/ will" [puppet] - ''https://gerrit.wikimedia.org/r/409480 (owner: ''Dzahn)'
2018-02-26 19:39:18 <wikibugs> ('CR) ''Dzahn: "@akosiaris since we recently talked about the location of the httpd declaration. this is an example why i move it to role classes" [puppet] - ''https://gerrit.wikimedia.org/r/414748 (owner: ''Dzahn)'
2018-02-26 19:40:37 <wikibugs> ('CR) ''Dzahn: [C: ''2] lists: move httpd class to role [puppet] - ''https://gerrit.wikimedia.org/r/414748 (owner: ''Dzahn)'
2018-02-26 19:40:40 <RoanKattouw> Lucas_WMDE: OK, your patch is on mwdebug1002, please test
2018-02-26 19:40:57 <RoanKattouw> Sorry that it took so long, Jenkins took a while and then I got distracted and briefly forgot that I was doing the SWAT
2018-02-26 19:41:07 <Lucas_WMDE> no problem
2018-02-26 19:41:11 <Lucas_WMDE> seems to be working
2018-02-26 19:41:16 <Lucas_WMDE> and I’ll quickly check that no grafana boards are exploding ;)
2018-02-26 19:41:29 <Lucas_WMDE> (though I guess that wouldn’t happen just from mwdebug, hm)
2018-02-26 19:42:12 <wikibugs> ('CR) ''Dzahn: [C: ''2] "still everything no-op on fermium" [puppet] - ''https://gerrit.wikimedia.org/r/414748 (owner: ''Dzahn)'
2018-02-26 19:42:42 <Lucas_WMDE> RoanKattouw: everything okay as far as I can tell
2018-02-26 19:42:44 <Lucas_WMDE> crosses fingers
2018-02-26 19:43:37 <wikibugs> ('PS2) ''Dzahn: varnish: add misc director for design.wm.org -> bromine [puppet] - ''https://gerrit.wikimedia.org/r/413986 (https://phabricator.wikimedia.org/T185282)'
2018-02-26 19:44:05 <Hauskatze> Keegan: beta eswiki is locked (closed) so I doubt users could do much testing (this is in reply to your announcement on my talk; oh and fwiw I've been testing them already :) )
2018-02-26 19:44:29 <Keegan> Ha, okay, thanks. I'll remove it :)
2018-02-26 19:44:53 <wikibugs> ('CR) ''Dzahn: [C: ''2] varnish: add misc director for design.wm.org -> bromine [puppet] - ''https://gerrit.wikimedia.org/r/413986 (https://phabricator.wikimedia.org/T185282) (owner: ''Dzahn)'
2018-02-26 19:45:01 <RoanKattouw> Alright here goes
2018-02-26 19:46:26 <mutante> !log running puppet on cache::misc servers to add new director for design.wm
2018-02-26 19:46:36 <logmsgbot> !log catrope@tin Synchronized php-1.31.0-wmf.22/extensions/WikibaseQualityConstraints/: T184937 (duration: 01m 03s)
2018-02-26 19:46:40 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 19:46:53 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 19:46:54 <stashbot> T184937: Change wbcheckconstraints’ status parameter’s default value to cacheable value - https://phabricator.wikimedia.org/T184937
2018-02-26 19:54:12 <wikibugs> ('PS1) ''Pmiazga: Enable HTML Previews on all wikipedias [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414751 (https://phabricator.wikimedia.org/T182319)'
2018-02-26 19:56:59 <wikibugs> ('CR) ''Jdlrobson: [C: ''] Enable HTML Previews on all wikipedias (''1 comment) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414751 (https://phabricator.wikimedia.org/T182319) (owner: ''Pmiazga)'
2018-02-26 19:57:23 <andrewbogott> James_F: is SWAT all done or still in progress? (I'm going to break wikitech again)
2018-02-26 19:57:57 <wikibugs> ('PS2) ''Dzahn: xhgui: apache -> httpd module [puppet] - ''https://gerrit.wikimedia.org/r/410620'
2018-02-26 19:58:44 <wikibugs> ('CR) ''Dzahn: [C: ''2] "http://puppet-compiler.wmflabs.org/10143/tungsten.eqiad.wmnet/ and "delta -3"" [puppet] - ''https://gerrit.wikimedia.org/r/410620 (owner: ''Dzahn)'
2018-02-26 19:59:51 <icinga-wm> RECOVERY - High lag on wdqs1003 is OK: OK: Less than 30.00% above the threshold [600.0] https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
2018-02-26 20:00:30 <wikibugs> ('CR) ''Dzahn: [C: ''2] "no-op on tungsten. https://performance.wikimedia.org/xhgui is fine" [puppet] - ''https://gerrit.wikimedia.org/r/410620 (owner: ''Dzahn)'
2018-02-26 20:00:43 <wikibugs> ('PS1) ''Urbanecm: Add new throttle rule [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414755'
2018-02-26 20:01:03 <wikibugs> 'Operations, ''ops-codfw: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4002973 (''Papaul) @Robh since I have rack space to covert in B3 (9-17) what about not put anything in A3 and put 7 hosts in B3 see below |rack|systems| |A4|5| |B3|7| |D3|10| |D4|10|'
2018-02-26 20:01:41 <wikibugs> ('PS2) ''Urbanecm: Add new throttle rule [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414755 (https://phabricator.wikimedia.org/T188292)'
2018-02-26 20:02:26 <Lucas_WMDE> RoanKattouw: thanks for the deploy btw!
2018-02-26 20:02:59 <Lucas_WMDE> server board and mysql-aggregated still look okay as well
2018-02-26 20:03:19 <wikibugs> 'Operations, ''ops-codfw: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4003134 (''RobH) >>! In T188301#4003113, @Papaul wrote: > @Robh since I have rack space to covert in B3 (9-17) what about not put anything in A3 and put 7 hosts in B3 see below > |rack|systems| > |A4|5| >...'
2018-02-26 20:03:44 <SMalyshev> anybody knows who owns noc.wikimedia.org?
2018-02-26 20:03:57 <no_justification> Nobody really, but what's up?
2018-02-26 20:04:01 <wikibugs> 'Operations, ''ops-codfw, ''Patch-For-Review: rack/setup/install wdqs200[4-6] - https://phabricator.wikimedia.org/T187800#4003141 (''Papaul)'
2018-02-26 20:04:01 <no_justification> docroots are in wmf-config
2018-02-26 20:04:04 <SMalyshev> there has been some change in URLs there and I am trying to figure out where it comes from
2018-02-26 20:04:10 <no_justification> Yes, I did some stuff
2018-02-26 20:04:11 <andrewbogott> RoanKattouw: still swatting?
2018-02-26 20:04:12 <no_justification> Friday :)
2018-02-26 20:04:14 <SMalyshev> ahh, ok, so it's wmf-config
2018-02-26 20:04:23 <RoanKattouw> andrewbogott: Sorry I'm done now
2018-02-26 20:04:25 <RoanKattouw> cc MaxSem
2018-02-26 20:04:30 <andrewbogott> great, thanks
2018-02-26 20:04:32 <wikibugs> ('CR) ''Pmiazga: "All wikis were using the RestbasePlain endpoint, now we want to use the HTML endpoint. I'll check with services whether is possible to en" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414751 (https://phabricator.wikimedia.org/T182319) (owner: ''Pmiazga)'
2018-02-26 20:04:35 <andrewbogott> I'm going to disable !log again
2018-02-26 20:04:50 <wikibugs> ('PS2) ''Awight: Enable Extension:JADE on all beta cluster wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414729 (https://phabricator.wikimedia.org/T176333)'
2018-02-26 20:04:52 <awight> aww
2018-02-26 20:05:01 <no_justification> Ahhh, the "raw" links are still busted
2018-02-26 20:05:04 <no_justification> I can fix that!
2018-02-26 20:05:44 <SMalyshev> no_justification: so you moved dblists to subdir on noc, correct?
2018-02-26 20:06:11 <awight> MaxSem: wanna throw https://gerrit.wikimedia.org/r/#/c/414729/ into your deployment, or just lmk when you’re done and I can deploy?
2018-02-26 20:06:11 <no_justification> Yep
2018-02-26 20:06:19 <SMalyshev> no_justification: is there some config which lists this path? it broke categories dump to wdqs pipeline, since URLs were all wrong
2018-02-26 20:06:28 <wikibugs> ('PS1) ''Urbanecm: Publish throttle-analyze at noc [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414758 (https://phabricator.wikimedia.org/T187894)'
2018-02-26 20:06:33 <no_justification> It's just what you see in the noc docroot, no config
2018-02-26 20:06:42 <no_justification> But basically, put them in that subdirectory
2018-02-26 20:06:51 <SMalyshev> I fixed the URL but I'd like for it not to happen again
2018-02-26 20:06:54 <no_justification> noc.wm.o/conf/all.dblist -> noc.wm.o/conf/dblists/all.dblist
2018-02-26 20:06:57 <no_justification> It won't
2018-02-26 20:07:00 <no_justification> It was a one-time change
2018-02-26 20:07:35 <wikibugs> ('CR) ''Ppchelko: Enable HTML Previews on all wikipedias (''1 comment) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414751 (https://phabricator.wikimedia.org/T182319) (owner: ''Pmiazga)'
2018-02-26 20:08:02 <SMalyshev> so no config that has that path? would appreciate an announcement next time then :)
2018-02-26 20:08:45 <no_justification> mea culpa, I'll mention it next time
2018-02-26 20:08:57 <no_justification> And nope, no config. noc.wm.o is very hacky and ad hoc
2018-02-26 20:08:57 <wikibugs> ('PS2) ''Dzahn: admins: Add imarlier to udp2log-users [puppet] - ''https://gerrit.wikimedia.org/r/414668 (https://phabricator.wikimedia.org/T188042) (owner: ''Muehlenhoff)'
2018-02-26 20:09:34 <SMalyshev> ok, I'll keep hardcoding it then :)
2018-02-26 20:09:44 <wikibugs> ('CR) ''Dzahn: [C: ''] "right group for access to oxygen as requested (and a little bit more, but seems good)" [puppet] - ''https://gerrit.wikimedia.org/r/414668 (https://phabricator.wikimedia.org/T188042) (owner: ''Muehlenhoff)'
2018-02-26 20:09:58 <wikibugs> ('PS1) ''Chad: noc.wm.o: Stop urlencoding filenames [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414759'
2018-02-26 20:10:55 <wikibugs> ('PS1) ''Ppchelko: [JoqbQueue] Switch refreshLinks for all but wikipedia and wiktionary. [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414760 (https://phabricator.wikimedia.org/T185052)'
2018-02-26 20:11:12 <wikibugs> 'Operations, ''Wikimedia-Site-requests: Remove 'moodbar-admin' from 'staff' global group - https://phabricator.wikimedia.org/T188278#4003163 (''Dzahn)'
2018-02-26 20:12:18 <wikibugs> 'Operations, ''ops-codfw, ''netops: switch port configuration for wdq200[4-6] - https://phabricator.wikimedia.org/T188303#4003165 (''Papaul) p:''Triage>''Normal'
2018-02-26 20:13:48 <awight> MaxSem: If you’re not deploying yet, then I’ll jump in to SWAT my config change, shall I?
2018-02-26 20:15:04 <awight> ^ I shall…
2018-02-26 20:15:24 <wikibugs> ('CR) ''Awight: [C: ''2] Enable Extension:JADE on all beta cluster wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414729 (https://phabricator.wikimedia.org/T176333) (owner: ''Awight)'
2018-02-26 20:16:27 <wikibugs> 'Operations, ''Wikimedia-Site-requests: Remove 'moodbar-admin' from 'staff' global group - https://phabricator.wikimedia.org/T188278#4003216 (''demon) ''Open>''Resolved a:''demon Done.'
2018-02-26 20:16:41 <icinga-wm> RECOVERY - HP RAID on db2048 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK
2018-02-26 20:16:49 <wikibugs> ('CR) ''Legoktm: "Has this extension been reviewed yet?" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414729 (https://phabricator.wikimedia.org/T176333) (owner: ''Awight)'
2018-02-26 20:16:57 <wikibugs> ('Merged) ''jenkins-bot: Enable Extension:JADE on all beta cluster wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414729 (https://phabricator.wikimedia.org/T176333) (owner: ''Awight)'
2018-02-26 20:17:14 <wikibugs> 'Operations, ''Wikimedia-Site-requests: Remove 'moodbar-admin' from 'staff' global group - https://phabricator.wikimedia.org/T188278#4003236 (''demon) (Just the moodbar one, I'd like more info on the sendemail-new-users bit)'
2018-02-26 20:17:17 <wikibugs> ('CR) ''jenkins-bot: Enable Extension:JADE on all beta cluster wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414729 (https://phabricator.wikimedia.org/T176333) (owner: ''Awight)'
2018-02-26 20:17:18 <legoktm> awight: uhh
2018-02-26 20:17:23 <wikibugs> ('CR) ''Awight: [C: ''2] "> Has this extension been reviewed yet?" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414729 (https://phabricator.wikimedia.org/T176333) (owner: ''Awight)'
2018-02-26 20:17:34 <wikibugs> ('Draft1) ''Paladox: gerrit: Ajust scap files (DO NOT MERGE) [software/gerrit] (stable-2.14) - ''https://gerrit.wikimedia.org/r/414763'
2018-02-26 20:17:35 <legoktm> awight: https://www.mediawiki.org/wiki/Review_queue#Preparing_for_deployment
2018-02-26 20:17:36 <wikibugs> ('Draft2) ''Paladox: gerrit: Ajust scap files (DO NOT MERGE) [software/gerrit] (stable-2.14) - ''https://gerrit.wikimedia.org/r/414763'
2018-02-26 20:17:52 <wikibugs> ('CR) ''Legoktm: "https://www.mediawiki.org/wiki/Review_queue#Preparing_for_deployment" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414729 (https://phabricator.wikimedia.org/T176333) (owner: ''Awight)'
2018-02-26 20:17:58 <awight> legoktm: This is just the beta cluster, is that necessary?
2018-02-26 20:18:02 <legoktm> yes
2018-02-26 20:18:06 <awight> kk lemme revert
2018-02-26 20:18:15 <wikibugs> ('PS1) ''Awight: Revert "Enable Extension:JADE on all beta cluster wikis" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414764'
2018-02-26 20:18:22 <wikibugs> ('CR) ''Awight: [C: ''2] Revert "Enable Extension:JADE on all beta cluster wikis" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414764 (owner: ''Awight)'
2018-02-26 20:18:24 <wikibugs> ('PS3) ''Paladox: gerrit: Ajust scap files (DO NOT MERGE) [software/gerrit] (stable-2.14) - ''https://gerrit.wikimedia.org/r/414763'
2018-02-26 20:18:52 <awight> legoktm: You’re right, thanks for calling that out.
2018-02-26 20:19:47 <awight> I read “production deployment tracking task arf arf arf”, if you’re a Gary Larson fan…
2018-02-26 20:21:02 <greg-g> thanks legoktm and awight :) yes, it's required :)
2018-02-26 20:21:05 <no_justification> Also: please please please do not use extension-list-labs
2018-02-26 20:21:25 <no_justification> Oh wait you did it right
2018-02-26 20:21:30 <wikibugs> ('Merged) ''jenkins-bot: Revert "Enable Extension:JADE on all beta cluster wikis" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414764 (owner: ''Awight)'
2018-02-26 20:21:33 <no_justification> I reflexively was upset!
2018-02-26 20:21:34 <no_justification> <3
2018-02-26 20:21:45 <awight> lol
2018-02-26 20:21:49 <wikibugs> 'Operations, ''ops-codfw: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4003262 (''Papaul) a:''Papaul'
2018-02-26 20:22:10 <awight> no_justification: I did notice the file, and decided I didn’t want to keep that sort of company
2018-02-26 20:22:14 <wikibugs> ('CR) ''Mobrovac: Enable HTML Previews on all wikipedias (''1 comment) [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414751 (https://phabricator.wikimedia.org/T182319) (owner: ''Pmiazga)'
2018-02-26 20:24:11 <wikibugs> ('Abandoned) ''Krinkle: highlight.php: Don't use the escaped URL for the raw URL either [mediawiki-config] - ''https://gerrit.wikimedia.org/r/413939 (owner: ''Chad)'
2018-02-26 20:24:16 <wikibugs> ('CR) ''Krinkle: [C: ''] noc.wm.o: Stop urlencoding filenames [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414759 (owner: ''Chad)'
2018-02-26 20:27:01 <icinga-wm> RECOVERY - Check systemd state on rhenium is OK: OK - running: The system is fully operational
2018-02-26 20:27:54 <no_justification> awight: That file is about to be deleted ;-)
2018-02-26 20:28:30 <awight> I also didn’t edit noc config :p
2018-02-26 20:28:53 <wikibugs> ('PS1) ''Chad: Move FileImporter/FileExporter to general extension setup [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414767'
2018-02-26 20:30:05 <icinga-wm> PROBLEM - Check systemd state on rhenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
2018-02-26 20:30:46 <wikibugs> ('CR) ''Chad: [C: ''2] noc.wm.o: Stop urlencoding filenames [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414759 (owner: ''Chad)'
2018-02-26 20:32:16 <wikibugs> ('PS1) ''Pmiazga: Enable VirtualPagePreviews events on beta cluster [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414769 (https://phabricator.wikimedia.org/T184793)'
2018-02-26 20:32:20 <wikibugs> ('Merged) ''jenkins-bot: noc.wm.o: Stop urlencoding filenames [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414759 (owner: ''Chad)'
2018-02-26 20:33:06 <wikibugs> ('PS1) ''Awight: [DNM] Enable Extension:JADE on all beta cluster wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414771 (https://phabricator.wikimedia.org/T176333)'
2018-02-26 20:33:18 <wikibugs> ('PS2) ''Jdlrobson: Enable VirtualPagePreviews events on beta cluster [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414769 (https://phabricator.wikimedia.org/T186728) (owner: ''Pmiazga)'
2018-02-26 20:33:21 <wikibugs> ('PS3) ''Pmiazga: beta: enable VirtualPagePreviews events on beta cluster [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414769 (https://phabricator.wikimedia.org/T184793)'
2018-02-26 20:33:35 <wikibugs> ('CR) ''Awight: "Waiting for security review: T188308" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414771 (https://phabricator.wikimedia.org/T176333) (owner: ''Awight)'
2018-02-26 20:33:50 <wikibugs> ('CR) ''Jdlrobson: [C: ''] beta: enable VirtualPagePreviews events on beta cluster [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414769 (https://phabricator.wikimedia.org/T184793) (owner: ''Pmiazga)'
2018-02-26 20:34:11 <logmsgbot> !log demon@tin Synchronized docroot/noc/conf/: Fix urlencoding (duration: 00m 57s)
2018-02-26 20:34:16 <stashbot> demon@tin: Failed to log message to wiki. Somebody should check the error logs.
2018-02-26 20:34:30 <awight> yuck
2018-02-26 20:34:34 <wikibugs> ('CR) ''jenkins-bot: Revert "Enable Extension:JADE on all beta cluster wikis" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414764 (owner: ''Awight)'
2018-02-26 20:34:36 <wikibugs> ('CR) ''jenkins-bot: noc.wm.o: Stop urlencoding filenames [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414759 (owner: ''Chad)'
2018-02-26 20:35:08 <no_justification> Krenair: All the links work again!
2018-02-26 20:35:09 <no_justification> Yay!
2018-02-26 20:35:15 <wikibugs> ('PS4) ''Pmiazga: beta: enable VirtualPagePreviews events on beta cluster [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414769 (https://phabricator.wikimedia.org/T184793)'
2018-02-26 20:37:37 <wikibugs> 'Operations, ''Wikimedia-Incident: Detect high server load earlier – prometheus alert? - https://phabricator.wikimedia.org/T188317#4003428 (''Lucas_Werkmeister_WMDE)'
2018-02-26 20:37:51 <wikibugs> ('CR) ''Pmiazga: "@Ppchelko, @Mobrovac - thanks for your input. For now, we decided to keep it for wikis only. Later we will have another review of all proj" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414751 (https://phabricator.wikimedia.org/T182319) (owner: ''Pmiazga)'
2018-02-26 20:43:41 <wikibugs> ('PS1) ''Rush: openstack: groundwork for labtestn on mitaka [puppet] - ''https://gerrit.wikimedia.org/r/414773 (https://phabricator.wikimedia.org/T188266)'
2018-02-26 20:52:26 <wikibugs> ('PS2) ''Rush: openstack: groundwork for labtestn on mitaka [puppet] - ''https://gerrit.wikimedia.org/r/414773 (https://phabricator.wikimedia.org/T188266)'
2018-02-26 20:55:11 <wikibugs> ('PS2) ''Ppchelko: [JoqbQueue] Switch refreshLinks for all but wikipedia and wiktionary. [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414760 (https://phabricator.wikimedia.org/T185052)'
2018-02-26 20:58:47 <wikibugs> ('PS3) ''Rush: openstack: groundwork for labtestn on mitaka [puppet] - ''https://gerrit.wikimedia.org/r/414773 (https://phabricator.wikimedia.org/T188266)'
2018-02-26 20:59:18 <wikibugs> ('PS4) ''Rush: openstack: groundwork for labtestn on mitaka [puppet] - ''https://gerrit.wikimedia.org/r/414773 (https://phabricator.wikimedia.org/T188266)'
2018-02-26 21:00:05 <jouncebot> cscott, arlolra, subbu, bearND, halfak, and Amir1: (Dis)respected human, time to deploy Services – Parsoid / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180226T2100). Please do the needful.
2018-02-26 21:00:05 <jouncebot> No GERRIT patches in the queue for this window AFAICS.
2018-02-26 21:00:22 <subbu> arlo will be doing a parsoid deploy
2018-02-26 21:00:22 <icinga-wm> PROBLEM - HHVM rendering on mw2121 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 21:01:12 <icinga-wm> RECOVERY - HHVM rendering on mw2121 is OK: HTTP OK: HTTP/1.1 200 OK - 82088 bytes in 0.258 second response time
2018-02-26 21:05:37 <wikibugs> ('CR) ''Rush: [C: ''2] openstack: groundwork for labtestn on mitaka [puppet] - ''https://gerrit.wikimedia.org/r/414773 (https://phabricator.wikimedia.org/T188266) (owner: ''Rush)'
2018-02-26 21:14:41 <icinga-wm> PROBLEM - Disk space on rhenium is CRITICAL: DISK CRITICAL - free space: / 1767 MB (3% inode=96%)
2018-02-26 21:17:09 <logmsgbot> !log mholloway-shell@tin Started deploy [mobileapps/deploy@9970f97]: Update mobileapps to 8aa38e7
2018-02-26 21:17:11 <stashbot> mholloway-shell@tin: Failed to log message to wiki. Somebody should check the error logs.
2018-02-26 21:17:41 <icinga-wm> PROBLEM - Disk space on rhenium is CRITICAL: DISK CRITICAL - free space: / 1489 MB (3% inode=96%)
2018-02-26 21:20:02 <icinga-wm> PROBLEM - haproxy failover on dbproxy1005 is CRITICAL: CRITICAL check_failover servers up 2 down 1
2018-02-26 21:21:44 <wikibugs> ('PS1) ''Legoktm: Fix $wgShellRestrictionMethod typo [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414830 (https://phabricator.wikimedia.org/T188039)'
2018-02-26 21:21:46 <wikibugs> ('CR) ''Legoktm: [C: ''2] Fix $wgShellRestrictionMethod typo [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414830 (https://phabricator.wikimedia.org/T188039) (owner: ''Legoktm)'
2018-02-26 21:23:02 <wikibugs> ('Merged) ''jenkins-bot: Fix $wgShellRestrictionMethod typo [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414830 (https://phabricator.wikimedia.org/T188039) (owner: ''Legoktm)'
2018-02-26 21:23:16 <wikibugs> ('CR) ''jenkins-bot: Fix $wgShellRestrictionMethod typo [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414830 (https://phabricator.wikimedia.org/T188039) (owner: ''Legoktm)'
2018-02-26 21:23:42 <logmsgbot> !log mholloway-shell@tin Finished deploy [mobileapps/deploy@9970f97]: Update mobileapps to 8aa38e7 (duration: 06m 33s)
2018-02-26 21:23:44 <stashbot> mholloway-shell@tin: Failed to log message to wiki. Somebody should check the error logs.
2018-02-26 21:24:46 <logmsgbot> !log legoktm@tin Synchronized wmf-config/InitialiseSettings.php: Fix $wgShellRestrictionMethod typo - T188039 (duration: 00m 57s)
2018-02-26 21:24:49 <stashbot> legoktm@tin: Failed to log message to wiki. Somebody should check the error logs.
2018-02-26 21:25:43 <wikibugs> ('PS1) ''Legoktm: Revert "Fix $wgShellRestrictionMethod typo" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414831'
2018-02-26 21:25:46 <wikibugs> ('CR) ''Legoktm: [C: ''2] Revert "Fix $wgShellRestrictionMethod typo" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414831 (owner: ''Legoktm)'
2018-02-26 21:27:30 <logmsgbot> !log arlolra@tin Started deploy [parsoid/deploy@cf9b02e]: Updating Parsoid to 24c783c
2018-02-26 21:27:32 <stashbot> arlolra@tin: Failed to log message to wiki. Somebody should check the error logs.
2018-02-26 21:28:25 <wikibugs> ('CR) ''Legoktm: [V: ''2 C: ''2] Revert "Fix $wgShellRestrictionMethod typo" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414831 (owner: ''Legoktm)'
2018-02-26 21:28:48 <subbu> arlolra, wikitech wiki is currently locked for migration. so, you will have to manually update the sal page once this is back to read-write
2018-02-26 21:29:02 <subbu> locked for *db migration
2018-02-26 21:29:42 <subbu> ah, looks like you are not the only one.
2018-02-26 21:29:47 <logmsgbot> !log legoktm@tin Synchronized wmf-config/InitialiseSettings.php: Revert Fix $wgShellRestrictionMethod typo - T188039 (duration: 00m 55s)
2018-02-26 21:29:49 <stashbot> legoktm@tin: Failed to log message to wiki. Somebody should check the error logs.
2018-02-26 21:30:44 <wikibugs> ('CR) ''jenkins-bot: Revert "Fix $wgShellRestrictionMethod typo" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414831 (owner: ''Legoktm)'
2018-02-26 21:32:53 <no_justification> subbu: Ideally someone can grab all the missed !log entries and do it en masse :)
2018-02-26 21:33:07 <subbu> no_justification, yup.
2018-02-26 21:33:47 <wikibugs> 'Operations, ''ops-eqiad, ''fundraising-tech-ops, ''Patch-For-Review: Rack/setup frmon1001 - https://phabricator.wikimedia.org/T186073#4003598 (''cwdent) @cmjohnson - the console pw needs changed, i will get it to you securely'
2018-02-26 21:40:42 <Krenair> '<no_justification> Krenair: All the links work again!'
2018-02-26 21:40:42 <Krenair> '<no_justification> Yay!'
2018-02-26 21:40:45 <Krenair> I think this was for Krinkle
2018-02-26 21:40:52 <no_justification> Whoops, yes it was
2018-02-26 21:40:53 <no_justification> Haha
2018-02-26 21:40:57 <no_justification> But yay anyway?
2018-02-26 21:40:58 <no_justification> :p
2018-02-26 21:42:25 <Krenair> yay indeed
2018-02-26 21:42:26 <logmsgbot> !log arlolra@tin Finished deploy [parsoid/deploy@cf9b02e]: Updating Parsoid to 24c783c (duration: 14m 57s)
2018-02-26 21:42:29 <stashbot> arlolra@tin: Failed to log message to wiki. Somebody should check the error logs.
2018-02-26 21:45:04 <Krinkle> no_justification: nice
2018-02-26 21:47:05 <no_justification> Krinkle: That killed /70/ symlinks
2018-02-26 21:47:13 <no_justification> Yay for less indirection in wmf-config!
2018-02-26 21:48:23 <wikibugs> ('CR) ''Chad: [C: ''2] Move FileImporter/FileExporter to general extension setup [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414767 (owner: ''Chad)'
2018-02-26 21:51:57 <icinga-wm> PROBLEM - Nginx local proxy to apache on mw1346 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.006 second response time
2018-02-26 21:52:53 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] Move FileImporter/FileExporter to general extension setup [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414767 (owner: ''Chad)'
2018-02-26 21:52:58 <icinga-wm> RECOVERY - Nginx local proxy to apache on mw1346 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.050 second response time
2018-02-26 21:57:43 <wikibugs> ('CR) ''Chad: [C: ''2] "recheck" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414767 (owner: ''Chad)'
2018-02-26 21:59:17 <wikibugs> ('Merged) ''jenkins-bot: Move FileImporter/FileExporter to general extension setup [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414767 (owner: ''Chad)'
2018-02-26 21:59:28 <wikibugs> ('CR) ''jenkins-bot: Move FileImporter/FileExporter to general extension setup [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414767 (owner: ''Chad)'
2018-02-26 22:00:04 <jouncebot> bawolff and Reedy: I, the Bot under the Fountain, allow thee, The Deployer, to do Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180226T2200).
2018-02-26 22:00:04 <jouncebot> No GERRIT patches in the queue for this window AFAICS.
2018-02-26 22:00:57 <wikibugs> ('PS2) ''Catrope: Enable ORES filters on simplewiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/395818 (https://phabricator.wikimedia.org/T182012)'
2018-02-26 22:02:55 <wikibugs> ('PS2) ''Andrew Bogott: wikitech: use 'labswiki' database on m5-master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414733 (https://phabricator.wikimedia.org/T188029)'
2018-02-26 22:03:16 <andrewbogott> !log testing the log by logging a test
2018-02-26 22:03:28 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 22:04:30 <wikibugs> ('PS3) ''Herron: WIP: puppet_compiler: add support for puppetdb4 and local postgresql [puppet] - ''https://gerrit.wikimedia.org/r/413881'
2018-02-26 22:05:34 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] WIP: puppet_compiler: add support for puppetdb4 and local postgresql [puppet] - ''https://gerrit.wikimedia.org/r/413881 (owner: ''Herron)'
2018-02-26 22:05:42 <andrewbogott> !log logging a log to test logging a log
2018-02-26 22:05:56 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 22:07:32 <andrewbogott> !log made mysql on silver read-only, hopefully for good. T188029
2018-02-26 22:07:45 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 22:07:46 <stashbot> T188029: Move labswiki database to m5 - https://phabricator.wikimedia.org/T188029
2018-02-26 22:09:22 <andrewbogott> !log hotfixed mediawiki on silver to use m5-master for wikitech. This will be finalized with the merge of https://gerrit.wikimedia.org/r/#/c/414733/
2018-02-26 22:09:35 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 22:10:17 <andrewbogott> subbu: if you don't mind being a test subject, you can try logging things now. Worst case is I'll have to revert back to the read-only version earlier.
2018-02-26 22:11:06 <subbu> andrewbogott, ok.
2018-02-26 22:12:13 <icinga-wm> PROBLEM - puppet last run on labtestcontrol2003 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 2 minutes ago with 3 failures. Failed resources (up to 3 shown): Package[python-glanceclient],Package[python-openstackclient],Package[python-designateclient]
2018-02-26 22:13:02 <subbu> andrewbogott, success .. https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=1783718&oldid=1783711
2018-02-26 22:13:19 <subbu> but, looks like RoanKattouw beat me to it by 8 mins! :)
2018-02-26 22:13:50 <RoanKattouw> Hm?
2018-02-26 22:13:53 <RoanKattouw> I didn't log anything?
2018-02-26 22:13:55 <wikibugs> ('CR) ''Herron: [C: ''-1] puppetmaster: use puppetdb-termini on stretch (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/413690 (https://phabricator.wikimedia.org/T184562) (owner: ''Filippo Giunchedi)'
2018-02-26 22:14:52 <andrewbogott> RoanKattouw: I think he was talking about the deployment schedule
2018-02-26 22:15:33 <subbu> RoanKattouw, sorry .. yes .. deployments page. i had been blocked earlier because of the db migration and andrew asked me to test whether i can get through now.
2018-02-26 22:22:13 <icinga-wm> RECOVERY - puppet last run on labtestcontrol2003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
2018-02-26 22:25:24 <Hauskatze> so where should "we" run sql commands for wikitech?
2018-02-26 22:25:34 <Hauskatze> (if silver is now read-only)
2018-02-26 22:34:00 <wikibugs> ('CR) ''MarcoAurelio: [C: ''] "If you want to have this merged, do not forget to add it to https://wikitech.wikimedia.org/wiki/Deployments or ask someone else to do it f" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/404942 (https://phabricator.wikimedia.org/T184981) (owner: ''Lokal Profil)'
2018-02-26 22:36:00 <wikibugs> ('PS1) ''Rush: openstack: deal with competing version priorties on jessie [puppet] - ''https://gerrit.wikimedia.org/r/414842 (https://phabricator.wikimedia.org/T188266)'
2018-02-26 22:36:11 <wikibugs> ('PS2) ''Rush: openstack: deal with competing version priorties on jessie [puppet] - ''https://gerrit.wikimedia.org/r/414842 (https://phabricator.wikimedia.org/T188266)'
2018-02-26 22:36:27 <Krinkle> andrewbogott: Regarding CommonSettings.php, any reason not to commit the change to git? It should be fine to have it in an if conditional.
2018-02-26 22:36:36 <Krinkle> That way it won't be overwritten, nor require a lock.
2018-02-26 22:37:22 <wikibugs> ('CR) ''Rush: [C: ''2] openstack: deal with competing version priorties on jessie [puppet] - ''https://gerrit.wikimedia.org/r/414842 (https://phabricator.wikimedia.org/T188266) (owner: ''Rush)'
2018-02-26 22:39:46 <wikibugs> 'Operations, ''hardware-requests: eqiad/codfw: (4)+(4) hardware access request for videoscalers - https://phabricator.wikimedia.org/T188075#4003858 (''RobH) >>! In T188075#4001386, @faidon wrote: > Sounds good. Note that eqiad has 6 imagescalers (mw1293-mw1298) and codfw has 4 now ( mw2244-2245/mw2150-2151) b...'
2018-02-26 22:46:31 <icinga-wm> RECOVERY - Disk space on rhenium is OK: DISK OK
2018-02-26 22:47:21 <icinga-wm> PROBLEM - Check systemd state on rhenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
2018-02-26 22:49:43 <no_justification> +1 to Krinkle
2018-02-26 22:54:22 <wikibugs> 'Operations, ''Wikimedia-Incident: Detect high server load earlier – prometheus alert? - https://phabricator.wikimedia.org/T188317#4003947 (''Lucas_Werkmeister_WMDE)'
2018-02-26 22:54:43 <wikibugs> 'Operations, ''Wikimedia-Incident: Detect high server load earlier – prometheus alert? - https://phabricator.wikimedia.org/T188317#4003428 (''Lucas_Werkmeister_WMDE)'
2018-02-26 22:54:50 <wikibugs> 'Operations, ''Ops-Access-Requests: reinstate ezachte's access - https://phabricator.wikimedia.org/T188335#4003953 (''RobH) p:''Triage>''Normal'
2018-02-26 22:54:59 <wikibugs> 'Operations, ''Patch-For-Review: setup/install bast1002(WMF4749) - https://phabricator.wikimedia.org/T186623#4003966 (''Dzahn) @RobH i see it's in site.pp with role spare::system. are some more of the checkboxes done meanwhile?'
2018-02-26 22:56:03 <logmsgbot> !log demon@tin Synchronized wmf-config/InitialiseSettings.php: fileimporter/fileexporter improvements (duration: 00m 57s)
2018-02-26 22:56:16 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 22:57:18 <logmsgbot> !log demon@tin Synchronized wmf-config/: fileimporter/fileexporter improvements (duration: 00m 58s)
2018-02-26 22:58:38 <andrewbogott> Krinkle, no_justification, I'm not sure I understand… what CommonSettings flag are you talking about?
2018-02-26 22:59:06 <Krinkle> andrewbogott: The setting of readonly mode could've been committed to git/gerrit and deployed normally instead of local patch on silver.
2018-02-26 22:59:12 <Krinkle> So that it doesn't get overridden or require scap locks.
2018-02-26 22:59:17 <wikibugs> 'Operations, ''Patch-For-Review: setup/install bast1002(WMF4749) - https://phabricator.wikimedia.org/T186623#4003976 (''Dzahn) a:''RobH>''Dzahn'
2018-02-26 22:59:24 <andrewbogott> Even though I only wanted it set for a few hours?
2018-02-26 22:59:32 <Krinkle> andrewbogott: We make commits that last minutes :)
2018-02-26 22:59:36 <andrewbogott> You would commit/deploy and then two hours later revert/deploy?
2018-02-26 22:59:39 <andrewbogott> *shrug* ok
2018-02-26 22:59:40 <Krinkle> Yep
2018-02-26 22:59:51 <andrewbogott> I'm trying to be polite and not merge things except via the swat process
2018-02-26 22:59:53 <Krinkle> andrewbogott: I don't mind, but it's for your own convenience, and predictability.
2018-02-26 23:00:05 <Krinkle> Given that it did.. not go as expected.
2018-02-26 23:00:18 <andrewbogott> yeah, in retrospect that would've helped :)
2018-02-26 23:02:42 <wikibugs> ('PS1) ''Rush: openstack: keystone running on mitaka setup [puppet] - ''https://gerrit.wikimedia.org/r/414847 (https://phabricator.wikimedia.org/T188266)'
2018-02-26 23:03:05 <wikibugs> ('CR) ''jerkins-bot: [V: ''-1] openstack: keystone running on mitaka setup [puppet] - ''https://gerrit.wikimedia.org/r/414847 (https://phabricator.wikimedia.org/T188266) (owner: ''Rush)'
2018-02-26 23:03:50 <wikibugs> 'Operations: setup/install deploy1001/wmf4750 - https://phabricator.wikimedia.org/T188337#4003997 (''RobH) p:''Triage>''Normal'
2018-02-26 23:06:17 <wikibugs> ('PS1) ''Dzahn: site: turn bast1002 into a bastion host [puppet] - ''https://gerrit.wikimedia.org/r/414848 (https://phabricator.wikimedia.org/T186623)'
2018-02-26 23:08:53 <andrewbogott> Krinkle: given that 'scap lock' doesn't work and the config just updated itself on silver despite my having a lock...
2018-02-26 23:09:02 <andrewbogott> would you consider just merging https://gerrit.wikimedia.org/r/#/c/414733/ so I can stop fighting this?
2018-02-26 23:09:33 <bawolff> umm. wikitech's db seems to have gone away
2018-02-26 23:09:52 <andrewbogott> or, no_justification, same question?
2018-02-26 23:09:55 <Hauskatze> it's down
2018-02-26 23:09:56 <Krinkle> andrewbogott: I prefer not to, but Jaime or Chad might. I can't verify this right now.
2018-02-26 23:10:53 <Krinkle> andrewbogott: afaik locks can only be held on the deployment host (tin), not on clients.
2018-02-26 23:10:54 <wikibugs> ('PS1) ''Nfontes: Add Apache 2.0 license. [puppet/zookeeper] - ''https://gerrit.wikimedia.org/r/414851'
2018-02-26 23:11:13 <Krinkle> But that is admittedly something scap doesn't realise when you run it locally.
2018-02-26 23:11:34 <Krinkle> andrewbogott: Should mysql be restarted on silver then? same read-only as before?
2018-02-26 23:11:42 <wikibugs> 'Operations, ''Patch-For-Review: setup/install bast1002(WMF4749) - https://phabricator.wikimedia.org/T186623#4004040 (''Dzahn) i was able to login on DRAC and get a console, then i saw Build : 4239.35 ePSA Pre-boot System Assessment and shortly after the system rebooted into Debian installer There i...'
2018-02-26 23:11:56 <Krinkle> Seems like that should happen *after* switching the mw side of reading.
2018-02-26 23:12:14 <Krinkle> Anyway, I'm confused and too busy in areas I shouldn't be putting my nose in :)
2018-02-26 23:12:27 <wikibugs> 'Operations: update hostname label and racktables for deploy1001/wmf4750 - https://phabricator.wikimedia.org/T188339#4004049 (''RobH) p:''Triage>''Normal'
2018-02-26 23:17:06 <wikibugs> 'Operations, ''hardware-requests: hardware request for tin replacement - https://phabricator.wikimedia.org/T184481#4004076 (''RobH)'
2018-02-26 23:17:08 <wikibugs> 'Operations: setup/install deploy1001/wmf4750 - https://phabricator.wikimedia.org/T188337#4004072 (''RobH) ''Open>''Invalid Already have tin replacement on T175288.'
2018-02-26 23:17:22 <wikibugs> 'Operations: setup/install deploy1001/wmf4750 - https://phabricator.wikimedia.org/T188337#4004082 (''RobH)'
2018-02-26 23:17:24 <wikibugs> 'Operations: update hostname label and racktables for deploy1001/wmf4750 - https://phabricator.wikimedia.org/T188339#4004078 (''RobH) ''Open>''Invalid Already have tin replacement on T175288.'
2018-02-26 23:18:07 <wikibugs> 'Operations, ''hardware-requests: hardware request for tin replacement - https://phabricator.wikimedia.org/T184481#3884434 (''RobH) So it seems this was already requested on T174452, and setup is blocked for onsite work on T175288.'
2018-02-26 23:18:44 <wikibugs> ('CR) ''Nfontes: "Hi everyone," [puppet/zookeeper] - ''https://gerrit.wikimedia.org/r/414851 (owner: ''Nfontes)'
2018-02-26 23:19:39 <wikibugs> 'Operations, ''ops-eqdfw, ''Patch-For-Review, ''Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4004108 (''RobH) To be clear, mgmt responds, but the mgmt password doesn't work.'
2018-02-26 23:19:57 <wikibugs> 'Operations, ''ops-eqiad, ''Patch-For-Review, ''Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4004112 (''RobH)'
2018-02-26 23:20:51 <wikibugs> ('PS2) ''Rush: openstack: keystone running on mitaka setup [puppet] - ''https://gerrit.wikimedia.org/r/414847 (https://phabricator.wikimedia.org/T188266)'
2018-02-26 23:25:39 <wikibugs> ('CR) ''BryanDavis: [C: ''2] wikitech: use 'labswiki' database on m5-master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414733 (https://phabricator.wikimedia.org/T188029) (owner: ''Andrew Bogott)'
2018-02-26 23:25:53 <no_justification> Lock is on the master, not the slaves
2018-02-26 23:26:37 <no_justification> Anyway: committing is important. Local hacks can be (and are) routinely overwritten
2018-02-26 23:26:44 <no_justification> Only way to avoid that *for certain* is to depool it
2018-02-26 23:27:12 <wikibugs> ('Merged) ''jenkins-bot: wikitech: use 'labswiki' database on m5-master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414733 (https://phabricator.wikimedia.org/T188029) (owner: ''Andrew Bogott)'
2018-02-26 23:27:26 <wikibugs> ('CR) ''jenkins-bot: wikitech: use 'labswiki' database on m5-master [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414733 (https://phabricator.wikimedia.org/T188029) (owner: ''Andrew Bogott)'
2018-02-26 23:27:39 <wikibugs> 'Operations, ''ops-codfw, ''netops: switch port configuration for wdq200[4-6] - https://phabricator.wikimedia.org/T188303#4004161 (''ayounsi) ''Open>''Resolved Descriptions added, enabled (for the 1 that was not already), moved to the private vlan.'
2018-02-26 23:27:45 <wikibugs> 'Operations, ''Patch-For-Review, ''Release-Engineering-Team (Watching / External), ''Scoring-platform-team (Current), ''Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#3778593 (''awight) Do not merge patches, currently blocked on scap 3.8 dep...'
2018-02-26 23:30:31 <bd808> config change seems to be the expected no-op on mwdebug1001
2018-02-26 23:30:57 <bd808> andrewbogott: I'm going to pull the change to silver now
2018-02-26 23:31:04 <andrewbogott> great
2018-02-26 23:31:56 <bd808> !log Pulled T188029 change to silver
2018-02-26 23:32:10 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 23:32:11 <stashbot> T188029: Move labswiki database to m5 - https://phabricator.wikimedia.org/T188029
2018-02-26 23:32:20 <bd808> andrewbogott: it's live there now. let's check it out before I sync everywhere else
2018-02-26 23:33:34 <andrewbogott> bd808: I logged out and in and made a trivial edit with VE
2018-02-26 23:33:48 <bd808> yeah, looks good to me too. Let's do it
2018-02-26 23:34:16 <wikibugs> ('CR) ''Hashar: [C: ''] Add Apache 2.0 license. [puppet/zookeeper] - ''https://gerrit.wikimedia.org/r/414851 (owner: ''Nfontes)'
2018-02-26 23:34:30 <logmsgbot> !log bd808@tin Started scap: wikitech: use 'labswiki' database on m5-master (T188029)
2018-02-26 23:34:42 <wikibugs> 'Operations, ''Patch-For-Review, ''Release-Engineering-Team (Watching / External), ''Scoring-platform-team (Current), ''Wikimedia-Incident: [Blocked] Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#4004179 (''awight) ''Open>''stalled'
2018-02-26 23:34:44 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 23:35:36 <bd808> wonders why scap just sat there for 30s...
2018-02-26 23:35:58 <wikibugs> 'Operations, ''Patch-For-Review: setup/install bast1002(WMF4749) - https://phabricator.wikimedia.org/T186623#4004188 (''Dzahn) a:''Dzahn>''RobH'
2018-02-26 23:37:21 <icinga-wm> PROBLEM - Apache HTTP on mw2130 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2018-02-26 23:37:51 <logmsgbot> !log bd808@tin Finished scap: wikitech: use 'labswiki' database on m5-master (T188029) (duration: 03m 21s)
2018-02-26 23:38:06 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2018-02-26 23:38:08 <stashbot> T188029: Move labswiki database to m5 - https://phabricator.wikimedia.org/T188029
2018-02-26 23:38:11 <icinga-wm> RECOVERY - Apache HTTP on mw2130 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.126 second response time
2018-02-26 23:39:06 <bd808> no_justification: is "error: request has exceeded memory limit in /srv/mediawiki/php-1.31.0-wmf.22/includes/parser/StripState.php on line 137" a known thing?
2018-02-26 23:39:22 <wikibugs> ('CR) ''Paladox: [C: ''] "bump" [puppet] - ''https://gerrit.wikimedia.org/r/410069 (owner: ''Andrew Bogott)'
2018-02-26 23:40:06 <thcipriani> bd808: yes https://phabricator.wikimedia.org/T187833
2018-02-26 23:41:34 <bd808> thanks thcipriani
2018-02-26 23:42:56 <no_justification> Yeah what he said
2018-02-26 23:44:15 <Krinkle> bd808: Aye, RE: 30s waiting in scap. That was in 'scap pull', right?
2018-02-26 23:44:18 <Krinkle> I've been seeing the same thing
2018-02-26 23:44:36 <thcipriani> the scap pull one is the cdb rebuild without it telling you what it's doing
2018-02-26 23:44:46 <bd808> Krinkle: this was in "scap sync"
2018-02-26 23:44:49 <Krinkle> Whenever I first use scap on a server on a given day, after it is visibly done (in terms of shell output), it still does something for 30s before returning the prompt.
2018-02-26 23:44:50 <thcipriani> for scap sync I imagine it is linting without telling you what it's doing
2018-02-26 23:44:56 <Krinkle> Subsequent syncs are much quicker though
2018-02-26 23:45:11 <Krinkle> But it happens after it's done in terms of output
2018-02-26 23:45:18 <Krinkle> Ah, okay
2018-02-26 23:45:19 <thcipriani> https://phabricator.wikimedia.org/T162207
2018-02-26 23:45:22 <Krinkle> the cdb happens last?
2018-02-26 23:45:23 <bd808> if we could put the cdbs on a ram disk... it would be sooo much faster
2018-02-26 23:45:37 <Krinkle> Thanks, I'll subscribe there, no problem.
2018-02-26 23:45:40 <wikibugs> 'Operations, ''ops-eqiad, ''Patch-For-Review, ''Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4004238 (''RobH) a:''Cmjohnson>''RobH'
2018-02-26 23:46:21 <no_justification> bd808: If we could get rid of CDBs it'd be even faster
2018-02-26 23:46:23 <no_justification> ;-)
2018-02-26 23:46:38 <awight> perks up
2018-02-26 23:46:44 <wikibugs> ('PS1) ''Catrope: Enable ORES filters on svwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414859 (https://phabricator.wikimedia.org/T174560)'
2018-02-26 23:46:51 <bd808> no_justification: sure :) has anyone tested to see if TC cache can survive it these days?
2018-02-26 23:46:57 <no_justification> YES
2018-02-26 23:47:09 <no_justification> Why does everyone think the TC isn't fixed?
2018-02-26 23:47:15 <awight> Beating VE in #staff, burning CDB here… it’s an exciting evening
2018-02-26 23:47:16 <bd808> sweet! where's ori when you need him to roll out that then?
2018-02-26 23:48:07 <bd808> no_justification: I just by default assume that hhvm bugs are forever
2018-02-26 23:48:50 <awight> RoanKattouw: /me points at https://sv.wikipedia.beta.wmflabs.org
2018-02-26 23:48:56 <no_justification> Some aren't forever, they just come back again ;-) https://github.com/facebook/hhvm/pull/8139
2018-02-26 23:49:14 <RoanKattouw> awight: Are you saying you want me to deploy to beta first?
2018-02-26 23:49:23 <awight> :) it seems prudent
2018-02-26 23:49:39 <awight> I’m not gonna pretend it’s a requirement, though
2018-02-26 23:49:59 <awight> it’s just that… I’ve been hurt before :)
2018-02-26 23:51:13 <wikibugs> ('PS3) ''Dzahn: icinga: apache -> httpd module [puppet] - ''https://gerrit.wikimedia.org/r/409204'
2018-02-26 23:55:03 <wikibugs> ('PS1) ''Awight: Enable Swedish on the beta cluster [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414860 (https://phabricator.wikimedia.org/T174560)'
2018-02-26 23:55:20 <awight> Looks like more of a pain than I imagined.
2018-02-26 23:57:36 <wikibugs> ('PS2) ''Awight: Enable Swedish and Spanish Wikibooks on the beta cluster [mediawiki-config] - ''https://gerrit.wikimedia.org/r/414860 (https://phabricator.wikimedia.org/T174560)'
2018-02-26 23:59:08 <awight> Am I supposed to use addwiki.php for this...
