[00:17:35] Yippee, build fixed! [00:17:36] Project selenium-Flow » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #208: 09FIXED in 1 min 35 sec: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/208/ [00:17:58] Yippee, build fixed! [00:17:59] Project selenium-Flow » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #208: 09FIXED in 1 min 58 sec: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/208/ [00:20:20] Project beta-update-databases-eqiad build #12827: 04STILL FAILING in 19 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12827/ [00:25:38] 10Beta-Cluster-Infrastructure, 06WMDE-TLA-Team, 13Patch-For-Review, 15User-Ladsgroup: Make beta German Wiktionary - https://phabricator.wikimedia.org/T150764#2797723 (10Mattflaschen-WMF) All the steps are done, except: * Merge the last two changes above (Parsoid and RESTBase, they are automatically deploye... [00:46:11] twentyafterfour: still here? what's the rationale for https://phabricator.wikimedia.org/D448 ? I'm asking because I see version.py would be committed with 3.3.1 [00:55:51] hrm? version.py shouldn't be in 3.3.1...right? [00:56:42] should be where the tag is: https://github.com/wikimedia/scap/tree/80c0dd09ababa3c3e762868513a0ef1dce6db9f3 [01:00:43] Completely untested, but should be fun :) https://phabricator.wikimedia.org/D454 [01:04:45] neat [01:20:18] Project beta-update-databases-eqiad build #12828: 04STILL FAILING in 17 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12828/ [01:21:05] thcipriani: could be an error, I'll flag it [02:20:21] Project beta-update-databases-eqiad build #12829: 04STILL FAILING in 20 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12829/ [02:45:19] godog: the rational is to be able to see which version is installed, especially useful on beta [02:47:25] twentyafterfour: *nod* see also the comments about version.py [02:48:28] godog: I set it up so that the debian packaging scripts will update the version before packaging [02:48:35] based on the newest debian/changelog [02:50:13] I could have it use `git describe` for the version when running it from a dev environment [02:52:01] yup that'd work too, but yeah not sure how feasible it is what I suggested, I don't have a python debian package in mind that does it [02:52:31] godog: I think it's quite feasible [02:53:12] here's how I updated the version in the packaging script: https://phabricator.wikimedia.org/D448#d2b2229a [02:54:16] 10Continuous-Integration-Infrastructure (phase-out-gallium), 10releng-201617-q1, 06Operations, 10ops-eqiad: decom gallium (data center) - https://phabricator.wikimedia.org/T150316#2797918 (10Dzahn) a:05Dzahn>03None [02:55:01] 10Continuous-Integration-Infrastructure (phase-out-gallium), 06Operations, 10ops-eqiad: decom gallium (data center) - https://phabricator.wikimedia.org/T150316#2781777 (10Dzahn) [02:55:18] 10Continuous-Integration-Infrastructure (phase-out-gallium), 06Operations, 10ops-eqiad: decom gallium (data center) - https://phabricator.wikimedia.org/T150316#2781777 (10Dzahn) p:05High>03Normal [02:56:14] twentyafterfour: yeah that has the problem ostriches was mentioning in the comments [02:56:27] anyways we can chat on the review too, I have to run! 
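For readers following the D448 discussion above: the two version sources being weighed are the committed version.py (rewritten by the packaging script from debian/changelog) and `git describe` when running from a checkout. A rough sketch of how the two could be combined, purely illustrative — the paths and variable names here are assumptions, not scap's actual layout or the D448 implementation:

```
# Sketch: prefer git metadata in a dev checkout, fall back to the packaged changelog.
# /srv/scap and SCAP_VERSION are placeholders, not real scap paths/names.
if git -C /srv/scap rev-parse --git-dir >/dev/null 2>&1; then
    SCAP_VERSION="$(git -C /srv/scap describe --tags --always --dirty)"
else
    SCAP_VERSION="$(dpkg-parsechangelog -l /srv/scap/debian/changelog -S Version)"
fi
echo "scap version: ${SCAP_VERSION:-unknown}"
```

This mirrors what twentyafterfour describes: the Debian package gets its version baked in from debian/changelog at build time, while a dev environment can ask git directly.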
[03:20:15] Project beta-update-databases-eqiad build #12830: 04STILL FAILING in 15 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12830/ [04:20:18] Project beta-update-databases-eqiad build #12831: 04STILL FAILING in 18 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12831/ [05:20:16] Project beta-update-databases-eqiad build #12832: 04STILL FAILING in 16 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12832/ [05:44:27] PROBLEM - App Server Main HTTP Response on deployment-mediawiki04 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:48:29] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:49:25] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:56:03] 03Scap3, 10ContentTranslation-CXserver, 10MediaWiki-extensions-ContentTranslation, 06Services, and 3 others: Enable Scap3 config deploys for CXServer - https://phabricator.wikimedia.org/T147634#2798041 (10KartikMistry) @mobrovac Ping! [05:58:53] I'm looking to see what's going on [06:01:31] logstash says permission error [06:20:04] Project beta-update-databases-eqiad build #12833: 04STILL FAILING in 3.9 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12833/ [06:20:35] twentyafterfour: around? [06:21:20] PROBLEM - App Server Main HTTP Response on deployment-mediawiki06 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50409 bytes in 1.203 second response time [06:25:40] Amir1: yo [06:25:59] twentyafterfour: It seems the whole beta cluster is down [06:26:04] https://en.wikipedia.beta.wmflabs.org/wiki/Special:Preferences#mw-prefsection-betafeatures [06:26:26] logstash says it gets a permission error [06:26:48] https://logstash-beta.wmflabs.org/ to make image [06:27:15] Amir1: looking into it [06:27:29] Thanks, tell me if I can help [06:27:46] I logged in on deployment-mediawiki06 and see apache is running but .. [06:28:03] can't connect to backend [06:30:09] hhvm crashed [06:30:15] fatal error: stack overflow [06:30:46] !log restarting hhvm on deployment-mediawiki06 [06:30:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [06:31:39] hhvm[22362]: Fatal error: Stack overflow in /srv/mediawiki/php-master/includes/libs/objectcache/BagOStuff....line 754 [06:32:06] and now it says it is running again (on that instance) [06:32:19] PROBLEM - App Server Main HTTP Response on deployment-mediawiki05 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 50624 bytes in 1.503 second response time [06:32:32] okay, so.
Let's restart hhvm on nodes [06:33:26] !log ladsgroup@deployment-mediawiki05:~$ sudo service hhvm restart [06:33:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [06:33:38] on deployment-mediawiki05 it says the status of hhvm is running and no such error [06:33:45] even though shinken-wm just told us the above [06:33:50] ah [06:33:56] I just restarted it [06:34:02] yep:) [06:34:04] maybe that's it [06:34:15] !log restarting hhvm on deployment-mediawiki04 [06:34:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [06:34:46] hey it's back [06:34:56] Now it says db is locked [06:35:10] hmm [06:38:32] * twentyafterfour doesn't know the mariadb password [06:41:49] I guess it goes down again [06:41:50] yea, uhm, i also dont really know about the db [06:42:10] stackoverflow error [06:55:01] FSFileBackend::doPrepareInternal: cannot create directory /srv/mediawiki/php-master/images/thumb/9/9d/Commons-... [06:55:38] "Server deployment-db04 (#1) is not replicating?" "All replica DBs lagged. Switch to read-only mode" [06:55:55] I can't figure out how to get into the slave db [06:57:22] ssh to the machine and sudo -i mysql? [06:58:31] oh, nope, apparently that's not set up [07:00:09] lots of confd.service errors [07:06:19] twentyafterfour, need me to take it down and set a root pass? [07:06:47] ah ha! [07:06:51] ? [07:07:05] 161116 5:42:13 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave [07:07:13] [ERROR] Slave SQL: Error 'Can't drop database 'dewiktionary'; database doesn't exist' on query. .. [07:07:25] matt_flaschen, ^ [07:08:17] I think this one was me [07:08:38] per https://phabricator.wikimedia.org/T150764 [07:08:42] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'deployment-db03-bin.000009' position 592967822 [07:09:09] this one wasn't me [07:09:21] so if someone can get into mysql console on deployment-db04 we can restart the slave replicating [07:09:32] mutante: ^ [07:10:33] Krenair: you know how to reset the password? [07:12:19] RECOVERY - App Server Main HTTP Response on deployment-mediawiki05 is OK: HTTP OK: HTTP/1.1 200 OK - 1546 bytes in 0.653 second response time [07:12:59] Amir1: eh, this? [07:13:00] ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2 "No such file or directory") [07:13:10] what do you want me to try [07:13:21] that's on db04? [07:13:25] yes [07:13:29] I just turned it off to have a go at changing the password [07:13:35] ok [07:16:18] 10Beta-Cluster-Infrastructure, 06WMDE-TLA-Team, 13Patch-For-Review, 15User-Ladsgroup: Make beta German Wiktionary - https://phabricator.wikimedia.org/T150764#2795757 (10mmodell) So this happened: ``` [ERROR] Slave SQL: Error 'Can't drop database 'dewiktionary'; database doesn't exist' on query. Default da... [07:17:07] huh [07:20:02] Project beta-update-databases-eqiad build #12834: 04STILL FAILING in 1.6 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12834/ [07:24:12] oh, right [07:24:38] Hello guys! Was this released already? 
https://phabricator.wikimedia.org/T150604 [07:26:29] okay, that did the trick [07:27:18] twentyafterfour, you'll find the new mysql root password for -db04 at /tmp/newmysqlpass [07:28:23] PROBLEM - App Server Main HTTP Response on deployment-mediawiki05 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 50624 bytes in 1.385 second response time [07:29:28] marostegui, looking at git log at tin.eqiad.wmnet:/srv/deployment/ocg/ocg, no [07:29:51] Krenair: Roger, thank you [07:30:17] the task it's supposed to fix still occurs too [07:30:57] -rw-r--r-- 1 udp2log udp2log 5947 Nov 16 05:25 {channel}.log [07:30:57] wut [07:32:22] how is beta so broken today? [07:34:30] it's here to take shots for prod [07:35:01] okay, I have restarted hhvm on the -mediawiki boxes [07:35:15] they got stuck with 'Fatal error: Stack overflow in /srv/mediawiki/php-master/includes/libs/objectcach...ine 183' [07:36:22] okay, nope, they broke again [07:36:22] RECOVERY - App Server Main HTTP Response on deployment-mediawiki06 is OK: HTTP OK: HTTP/1.1 200 OK - 44273 bytes in 3.200 second response time [07:36:26] aand it's back [07:36:29] thanks Krenair [07:36:50] um, did someone restart again? [07:37:19] maybe I hit a node you already restarted [07:38:06] Nov 16 07:37:46 deployment-mediawiki04 hhvm[16066]: Fatal error: Stack overflow in /srv/mediawiki/php-master/includes/libs/rdbms/data...ine 660 [07:38:38] Nov 16 07:37:46 deployment-mediawiki04 hhvm[16066]: [Wed Nov 16 07:37:46 2016] [hphp] [16066:7f077cfff700:7:000001] [] \nFatal error: Stack overflow in /srv/mediawiki/php-master/includes/libs/rdbms/database/DatabaseMysqlBase.php on line 660 [07:40:53] and the other is [07:40:55] Nov 16 07:38:18 deployment-mediawiki04 hhvm[16185]: [Wed Nov 16 07:38:18 2016] [hphp] [16185:7f640f3ff700:3:000001] [] \nFatal error: Stack overflow in /srv/mediawiki/php-master/includes/libs/objectcache/BagOStuff.php on line 754 [07:41:55] why do I have a funny feeling this might be related to https://gerrit.wikimedia.org/r/#/c/317304/ [07:42:17] PROBLEM - App Server Main HTTP Response on deployment-mediawiki06 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 100697 bytes in 1.430 second response time [07:44:50] well anyway, this stuff is way beyond my area [07:44:54] someone else can deal with it [07:46:09] twentyafterfour [08:18:19] Krenair: ok? [08:20:02] Project beta-update-databases-eqiad build #12835: 04STILL FAILING in 1.7 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12835/ [08:20:47] it looks like you got the replication going ... [08:58:57] looks like deployment-mediawiki instances have bunch of issues [08:59:08] and https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad/ broke yesterday bah [08:59:37] !log beta database update broken with: MediaWiki 1.29.0-alpha Updater\n\nYour composer.lock file is up to date with current dependencies! 
[08:59:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:04:58] 10Beta-Cluster-Infrastructure, 06WMDE-TLA-Team, 13Patch-For-Review, 15User-Ladsgroup: Make beta German Wiktionary - https://phabricator.wikimedia.org/T150764#2798189 (10hashar) p:05Triage>03Normal [09:05:28] 10Beta-Cluster-Infrastructure, 06Operations, 10Thumbor: Thumbor keeps losing Swift auth on beta - https://phabricator.wikimedia.org/T150649#2792043 (10hashar) p:05Triage>03Normal [09:05:42] 10Beta-Cluster-Infrastructure, 06Labs, 10Wikimedia-General-or-Unknown, 13Patch-For-Review: rename -labs.php to -beta.php - https://phabricator.wikimedia.org/T150268#2780096 (10hashar) p:05Triage>03Low [09:12:08] 10Beta-Cluster-Infrastructure, 07HHVM: beta cluster app servers no more respond to http request / beta web access is down - https://phabricator.wikimedia.org/T150833#2798202 (10hashar) [09:12:28] !log deployment-mediawiki04 stopping hhv [09:12:30] !log deployment-mediawiki04 stopping hhvm [09:12:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:12:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:15:20] hashar: by any chance you know if this will be deployed today? https://phabricator.wikimedia.org/T150604#2797315 [09:18:27] PROBLEM - Puppet run on deployment-mediawiki05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [09:18:50] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta update.php broken since Nov 15th 19:20 - https://phabricator.wikimedia.org/T150834#2798220 (10hashar) [09:20:02] Project beta-update-databases-eqiad build #12836: 04STILL FAILING in 1.7 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12836/ [09:20:39] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta update.php broken since Nov 15th 19:20 - https://phabricator.wikimedia.org/T150834#2798232 (10hashar) And there is a nice stack overflow: Fatal error: Stack overflow in /srv/mediawiki/php-master/includes/libs/objectcache/BagOStuff.php on line... [09:23:35] 10Beta-Cluster-Infrastructure, 07HHVM: beta cluster app servers no more respond to http request / beta web access is down - https://phabricator.wikimedia.org/T150833#2798235 (10hashar) And there is a nice stack overflow: Fatal error: Stack overflow in /srv/mediawiki/php-master/includes/libs/objectcache/Ba... 
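The recovery steps scattered through the morning boil down to: confirm HHVM is wedged, restart it, and log the action to the SAL. A minimal sketch of that loop on one affected instance — the log file path and the curl check are assumptions, only the restart itself is what was actually run:

```
# e.g. on deployment-mediawiki04 (repeat for -mediawiki05 / -mediawiki06 as needed)
sudo service hhvm status                 # often still reports "running" while wedged
sudo tail -n 30 /var/log/hhvm/error.log  # path is an assumption; look for the Stack overflow fatals
sudo service hhvm restart

# sanity check that the box answers again
curl -sI -H 'Host: en.wikipedia.beta.wmflabs.org' http://localhost/wiki/Main_Page | head -n 1

# then: "!log restarting hhvm on deployment-mediawikiNN" in this channel
```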
[09:30:54] PROBLEM - SSH on deployment-mediawiki04 is CRITICAL: Connection refused [09:31:25] PROBLEM - Host deployment-mediawiki04 is DOWN: CRITICAL - Host Unreachable (10.68.19.128) [09:34:25] RECOVERY - Host deployment-mediawiki04 is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms [09:35:55] RECOVERY - SSH on deployment-mediawiki04 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [09:36:35] 10Beta-Cluster-Infrastructure, 07HHVM: On beta cluster: Fatal error: Stack overflow in /srv/mediawiki/php-master/includes/libs/objectcache/BagOStuff.php on line 754 - https://phabricator.wikimedia.org/T150833#2798240 (10hashar) [09:39:20] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 17537 bytes in 4.502 second response time [09:39:20] RECOVERY - App Server Main HTTP Response on deployment-mediawiki04 is OK: HTTP OK: HTTP/1.1 200 OK - 44265 bytes in 2.913 second response time [09:40:24] 10Beta-Cluster-Infrastructure, 07HHVM: On beta cluster: Fatal error: Stack overflow in /srv/mediawiki/php-master/includes/libs/objectcache/BagOStuff.php on line 754 - https://phabricator.wikimedia.org/T150833#2798249 (10hashar) On deployment-mediawiki04 under /var/log/hhvm HHVM has a coredump ``` name=/var/lo... [09:43:28] RECOVERY - Puppet run on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0] [09:44:47] 10Beta-Cluster-Infrastructure, 07HHVM: On beta cluster: Fatal error: Stack overflow in /srv/mediawiki/php-master/includes/libs/objectcache/BagOStuff.php on line 754 - https://phabricator.wikimedia.org/T150833#2798267 (10hashar) p:05Triage>03Unbreak! [09:45:26] PROBLEM - App Server Main HTTP Response on deployment-mediawiki04 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:45:26] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:46:23] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta update.php broken since Nov 15th 19:20 - https://phabricator.wikimedia.org/T150834#2798220 (10mmodell) @hashar: I believe it was broken by {T150764} [09:46:42] marostegui: for OCG, I guess that is the services team deploying it in their own window [09:46:52] marostegui: maybe Marko would know how to push that one [09:47:37] doc being apparently https://wikitech.wikimedia.org/wiki/OCG#Deploying_changes [09:47:56] hashar: Ah ok - thanks! [09:48:17] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta update.php broken since Nov 15th 19:20 - https://phabricator.wikimedia.org/T150834#2798275 (10mmodell) see also {rMWb47ce21cec3a4340dd37c773210a514350f10297} [09:49:48] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta update.php broken since Nov 15th 19:20 - https://phabricator.wikimedia.org/T150834#2798277 (10hashar) I havent tried to find the triggered exception yet. I am looking at T150833 which causes: Fatal error: Stack overflow in /srv/mediawiki/ph... [09:51:13] !log marking deployment-tin offline so I can live hack mediawiki code / scap for T150833 and T15034 [09:51:18] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:51:22] 03Scap3, 10Parsoid, 06Services (done), 15User-Joe, 15User-mobrovac: Enable Scap3 config deploys for Parsoid - https://phabricator.wikimedia.org/T144596#2798287 (10mobrovac) 05Open>03Resolved Transition completed, resolving. 
[09:53:54] what a mess [09:55:01] hashar: indeed, if you read scrollback you'll see that several of us worked on it for a long time and it's still broken [09:55:14] oh [09:55:19] I lack the scroll back though : D [09:55:37] did it start happening with the introduction of dewiktionary on beta? [09:56:01] that's what caused beta-update-databases-eqiad to start failing [09:56:15] yeah [09:56:25] and on top of that there is the segfault/Fatal error in objectcache.php [09:56:29] both are probably unrelated [09:56:32] and the stack overflows I believe started with rMWb47ce21cec3a4340dd37c773210a514350f10297 or the related commits [09:56:43] yes the two are unrelated [09:56:59] I am trying to find out when fatal errors started [09:57:02] but... the dewiktionary stuff DID break replication on deployment-db04 [09:57:14] which krenair fixed, apparently [09:57:50] fatals started right around when b47ce21cec3a4340dd37c773210a514350f10297 landed [10:00:33] logstash shows the first stack overflow at 5:16am UTC [10:00:45] and the code got updated at 5:13 with https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/130347/console [10:00:58] that shows the bump of mw core ea42d90..32f3a99 [10:02:59] * 32f3a99 Merge "objectcache: Remove broken cas() method from WinCacheBagOStuff" [10:02:59] |\ [10:02:59] | * d1b53e3 objectcache: Remove broken cas() method from WinCacheBagOStuff [10:03:00] * b47ce21 objectcache: detect default getWithSetCallback() set options [10:04:09] poor Aaron :( [10:04:49] 10Beta-Cluster-Infrastructure, 07HHVM: On beta cluster: Fatal error: Stack overflow in /srv/mediawiki/php-master/includes/libs/objectcache/BagOStuff.php on line 754 - https://phabricator.wikimedia.org/T150833#2798347 (10hashar) From logstash, the first stack overflow occurred on 2016-11-16T05:16:16 The Jenkin... [10:05:17] twentyafterfour: yup the objectcache.php change is most probably the reason [10:05:27] I wanted to back up the claim with logs/traces etc and double confirm [10:05:43] + public function declareUsageSectionEnd( $id ) { [10:05:43] + return $this->__call( __FUNCTION__, func_get_args() ); [10:05:44] ... [10:06:26] I am taking a break [10:06:31] will revert both patch on beta cluster [10:06:33] scap it [10:06:38] and see whether that fix the issue [10:06:43] then revert both patches in mediawiki/core [10:06:53] but that is after more coffee / natural call etc :-} [10:26:44] !log Reverting mediawiki/core b47ce21cec3a4340dd37c773210a514350f10297 on beta cluster T150833 [10:26:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [10:32:19] RECOVERY - App Server Main HTTP Response on deployment-mediawiki06 is OK: HTTP OK: HTTP/1.1 200 OK - 44264 bytes in 5.213 second response time [10:33:20] deal [10:33:22] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 44706 bytes in 2.460 second response time [10:35:22] RECOVERY - App Server Main HTTP Response on deployment-mediawiki04 is OK: HTTP OK: HTTP/1.1 200 OK - 44274 bytes in 3.499 second response time [10:36:23] 10Beta-Cluster-Infrastructure, 07HHVM, 13Patch-For-Review: On beta cluster: Fatal error: Stack overflow in /srv/mediawiki/php-master/includes/libs/objectcache/BagOStuff.php on line 754 - https://phabricator.wikimedia.org/T150833#2798447 (10hashar) I have reverted b47ce21cec3a4340dd37c773210a514350f10297 on t... 
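The narrowing-down above (logstash shows the first fatal at 05:16, the code update landed at 05:13) can be reproduced from any mediawiki/core checkout: list what arrived with that update, using the before/after SHAs printed by the beta-code-update console output. A sketch:

```
# commits pulled in by the 05:13 code update (range taken from the Jenkins console output)
git log --graph --oneline ea42d90..32f3a99

# narrow to the file the fatals point at
git log --oneline ea42d90..32f3a99 -- includes/libs/objectcache/BagOStuff.php
```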
[10:38:19] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta update.php broken since Nov 15th 19:20 - https://phabricator.wikimedia.org/T150834#2798450 (10hashar) I had the commit triggering T150833 reverted on beta cluster but update.php (`mwscript update.php --wiki=enwiki --quick`) still fails. So that is... [10:39:20] !log Removing revert b47ce21cec3a4340dd37c773210a514350f10297 from deployment-tin and reenabling jenkins job. https://gerrit.wikimedia.org/r/321857 will get it fixed [10:39:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [10:39:55] Project beta-update-databases-eqiad build #12837: 04STILL FAILING in 1.7 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12837/ [10:40:00] Project beta-code-update-eqiad build #130375: 15ABORTED in 7.4 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/130375/ [10:40:08] Project beta-scap-eqiad build #129111: 15ABORTED in 7.9 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/129111/ [10:44:48] Project beta-scap-eqiad build #129112: 04FAILURE in 0.27 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/129112/ [10:46:25] twentyafterfour: I am reverting the mwcore patch so it is all good :} thx for the pointer [10:54:47] Project beta-scap-eqiad build #129113: 04STILL FAILING in 0.27 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/129113/ [10:58:59] Project beta-scap-eqiad build #129114: 04STILL FAILING in 0.27 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/129114/ [11:00:41] OSError: [Errno 17] File exists: '/var/lock/scap' [11:00:42] bahh [11:02:51] Yippee, build fixed! [11:02:51] Project beta-scap-eqiad build #129115: 09FIXED in 1 min 52 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/129115/ [11:04:44] 10Beta-Cluster-Infrastructure, 07HHVM, 05MW-1.29-release-notes, 13Patch-For-Review, 05WMF-deploy-2016-11-29_(1.29.0-wmf.4): On beta cluster: Fatal error: Stack overflow in /srv/mediawiki/php-master/includes/libs/objectcache/BagOStuff.php on line 754 - https://phabricator.wikimedia.org/T150833#2798533 (10h... [11:20:02] Project beta-update-databases-eqiad build #12838: 04STILL FAILING in 1.7 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12838/ [11:30:00] 10Continuous-Integration-Config, 10Wikidata: Run Wikidata browser tests on testwikidata via Jenkins - https://phabricator.wikimedia.org/T105985#2798658 (10Tobi_WMDE_SW) [11:30:02] 10Continuous-Integration-Config, 10Wikidata, 07Browser-Tests, 07Story: [Story] Run browsertests regularly on test.wikidata.org via Jenkins - https://phabricator.wikimedia.org/T101497#2798659 (10Tobi_WMDE_SW) [11:30:05] 10Browser-Tests-Infrastructure, 10Wikidata, 07Tracking: Wikidata Browsertests (tracking) - https://phabricator.wikimedia.org/T88541#2798661 (10Tobi_WMDE_SW) [11:30:08] 10Browser-Tests-Infrastructure, 06Release-Engineering-Team, 07Epic, 05MW-1.28-release-notes, and 3 others: Fix scenarios that fail at en.wikipedia.beta.wmflabs.org or do not run them daily - https://phabricator.wikimedia.org/T94150#2798660 (10Tobi_WMDE_SW) [11:34:29] 10Browser-Tests-Infrastructure, 10Wikidata: mediawiki_api::log_in does not work due to gzip issue - https://phabricator.wikimedia.org/T127309#2798667 (10Tobi_WMDE_SW) 05Open>03Resolved a:03Tobi_WMDE_SW I was not able to reproduce this, so it probably got fixed in the meantime or in a later version of the... 
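The aborted beta-scap-eqiad builds above left the deploy lock behind, hence the `OSError: [Errno 17] File exists: '/var/lock/scap'` on the next runs. A sketch of how one might clear a stale lock by hand on the deployment master, assuming nothing is genuinely mid-deploy (the check below is illustrative):

```
# on deployment-tin: make sure no scap run is actually in flight
pgrep -af scap || echo "no scap processes running"

# if the lock is genuinely stale, remove it and let the Jenkins job retry
sudo rm -rf /var/lock/scap
```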
[11:36:18] 10Browser-Tests-Infrastructure: selenium fails to connect to firefox (headless not sauce) - https://phabricator.wikimedia.org/T117561#2798671 (10Tobi_WMDE_SW) [11:46:58] 03Scap3, 10ContentTranslation-CXserver, 10MediaWiki-extensions-ContentTranslation, 05Language-Engineering October-December 2016, and 4 others: Enable Scap3 config deploys for CXServer - https://phabricator.wikimedia.org/T147634#2798685 (10mobrovac) a:05KartikMistry>03mobrovac The two patches above need... [11:48:46] https://en.wikipedia.beta.wmflabs.org/wiki/Special:Preferences still gives out database locked [11:51:09] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta update.php broken since Nov 15th 19:20 - https://phabricator.wikimedia.org/T150834#2798696 (10hashar) ``` $ strace -f -y -s1024 /usr/local/bin/mwscript update.php --wiki=aawiki --quick execve("/usr/local/bin/mwscript", ["/usr/local/bin/mwscript",... [11:52:13] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta update.php broken since Nov 15th 19:20 - https://phabricator.wikimedia.org/T150834#2798697 (10hashar) p:05Triage>03Unbreak! [11:52:14] :-/ [11:52:28] can't believe I am going to spend my whole day just to unbreak all that mess [12:00:10] :( [12:01:31] :/ [12:04:37] and you know our code is crap [12:04:54] grepping "Set $wgShowExceptionDetails = true; in LocalSettings.php to show detailed debugging information. [12:04:55] " [12:04:58] shows 6 occurrences :( [12:05:03] COPY PASTE IS EVIL [12:07:17] AHHH [12:09:04] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta update.php broken since Nov 15th 19:20 - https://phabricator.wikimedia.org/T150834#2798713 (10hashar) I could not figure out why `$wgShowExceptionDetails` is not true when running update.php so I have just live hacked the related PHP files: * incl... [12:13:54] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta update.php broken since Nov 15th 19:20 - https://phabricator.wikimedia.org/T150834#2798715 (10hashar) On the slave we have Error 'Can't drop database 'dewiktionary'; database doesn't exist' on query. Default database: 'dewiktionary'. Query: '... [12:24:28] !log beta: created dewiktionary table on the Database slave. Restarted replication with START SLAVE; T150834 T150764 [12:24:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [12:24:56] fixed [12:25:06] Project selenium-RelatedArticles » chrome,beta-mobile,Linux,contintLabsSlave && UbuntuTrusty build #211: 04FAILURE in 24 min: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-mobile,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/211/ [12:25:08] Project selenium-RelatedArticles » chrome,beta-desktop,Linux,contintLabsSlave && UbuntuTrusty build #211: 04FAILURE in 24 min: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-desktop,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/211/ [12:26:49] Project beta-update-databases-eqiad build #12839: 04STILL FAILING in 17 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12839/ [12:26:52] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta update.php broken since Nov 15th 19:20 - https://phabricator.wikimedia.org/T150834#2798731 (10hashar) 05Open>03Resolved a:03hashar That has been caused by the addition of the German Wiktionary ( T150764 ). The MySQL master/slave replication...
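The fix hashar logs at 12:24 amounts to giving the slave's stuck replicated DROP DATABASE something to drop and then resuming replication. A sketch of the slave-side commands on deployment-db04 — how you authenticate is glossed over here (the credentials situation only gets sorted out later in the day):

```
# on deployment-db04, with a client that can reach the local mysqld
mysql -e "SHOW SLAVE STATUS\G"           # Last_SQL_Error: Can't drop database 'dewiktionary'; database doesn't exist
mysql -e "CREATE DATABASE dewiktionary"  # give the replicated DROP something to drop
mysql -e "START SLAVE"
mysql -e "SHOW SLAVE STATUS\G"           # expect Slave_SQL_Running: Yes
```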
[12:27:28] 10Beta-Cluster-Infrastructure, 06WMDE-TLA-Team, 13Patch-For-Review, 15User-Ladsgroup: Make beta German Wiktionary - https://phabricator.wikimedia.org/T150764#2795757 (10hashar) From T150834: That has been caused by the addition of the German Wiktionary ( T150764 ). The MySQL master/slave replication got... [12:27:50] oh my [12:28:26] hashar i belive the problem with $wgShowExceptionDetails = true; is a known problem, i think i found having that problem too [12:28:32] when trying to debug postgres [12:29:34] hashar: thanks! [12:29:57] Sorry it happened. Mostly it was because the creation wasn't completed due to an issue in flow tables [12:30:21] sorry, flow external datasources. You can find it in the patch [12:30:28] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta update.php broken since Nov 15th 19:20 - https://phabricator.wikimedia.org/T150834#2798740 (10hashar) 05Resolved>03Open And the job fails: ``` $ mwscript update.php --wiki=dewiktionary --quick #!/usr/bin/env php MediaWiki 1.29.0-alpha Update... [12:30:38] 10Beta-Cluster-Infrastructure, 06WMDE-TLA-Team, 13Patch-For-Review, 15User-Ladsgroup: Make beta German Wiktionary - https://phabricator.wikimedia.org/T150764#2798742 (10hashar) And the job fails: ``` $ mwscript update.php --wiki=dewiktionary --quick #!/usr/bin/env php MediaWiki 1.29.0-alpha Updater Your... [12:34:14] I am just reverting [12:37:25] It's blocked on https://gerrit.wikimedia.org/r/321810 [12:37:38] once this is tested and merged we are good to go [12:46:06] Project selenium-GettingStarted » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #210: 04FAILURE in 24 min: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/210/ [12:52:57] https://en.wikipedia.beta.wmflabs.org/w/api.php gateway timeout bah [12:54:24] Yippee, build fixed! [12:54:24] Project beta-update-databases-eqiad build #12840: 09FIXED in 1 min 7 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12840/ [12:54:44] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 13Patch-For-Review: Beta update.php broken since Nov 15th 19:20 - https://phabricator.wikimedia.org/T150834#2798806 (10hashar) 05Open>03Resolved 12:54 UTC Project beta-update-databases-eqiad build #12840: FIXED in 1 min 7 sec: https://integration... [12:55:53] 10Beta-Cluster-Infrastructure, 06WMDE-TLA-Team, 13Patch-For-Review, 15User-Ladsgroup: Make beta German Wiktionary - https://phabricator.wikimedia.org/T150764#2798810 (10hashar) To unblock T150763, the `dewiktionary` database is no more on the MySQL master and slave. I also removed it from the dblist and wi... [12:56:47] 10Beta-Cluster-Infrastructure: https://en.wikipedia.beta.wmflabs.org/w/api.php 504 Server Error: Gateway Time-out - https://phabricator.wikimedia.org/T150849#2798812 (10hashar) [12:56:54] 10Beta-Cluster-Infrastructure: https://en.wikipedia.beta.wmflabs.org/w/api.php 504 Server Error: Gateway Time-out - https://phabricator.wikimedia.org/T150849#2798824 (10hashar) p:05Triage>03Unbreak! 
[13:02:33] !log Restarted HHVM on deployment-mediawiki05 was not honoring requests T150849 [13:02:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [13:02:56] 10Beta-Cluster-Infrastructure: https://en.wikipedia.beta.wmflabs.org/w/api.php 504 Server Error: Gateway Time-out - https://phabricator.wikimedia.org/T150849#2798830 (10hashar) [13:03:20] RECOVERY - App Server Main HTTP Response on deployment-mediawiki05 is OK: HTTP OK: HTTP/1.1 200 OK - 44262 bytes in 3.517 second response time [13:03:30] Yippee, build fixed! [13:03:30] Project selenium-GettingStarted » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #211: 09FIXED in 48 sec: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/211/ [13:03:33] 10Beta-Cluster-Infrastructure: https://en.wikipedia.beta.wmflabs.org/w/api.php 504 Server Error: Gateway Time-out - https://phabricator.wikimedia.org/T150849#2798812 (10hashar) 05Open>03Resolved a:03hashar Fixed by restarting HHVM on deployment-mediawiki05. [13:05:58] 10Beta-Cluster-Infrastructure: https://en.wikipedia.beta.wmflabs.org/w/api.php 504 Server Error: Gateway Time-out - https://phabricator.wikimedia.org/T150849#2798835 (10hashar) From IRC logs: ``` lang=irc [06:32:19] PROBLEM - App Server Main HTTP Response on deployment-mediawiki05 is CRITICAL: HTTP... [13:06:08] 10Beta-Cluster-Infrastructure: https://en.wikipedia.beta.wmflabs.org/w/api.php 504 Server Error: Gateway Time-out - https://phabricator.wikimedia.org/T150849#2798840 (10hashar) [13:06:10] 10Beta-Cluster-Infrastructure, 07HHVM, 05MW-1.29-release-notes, 13Patch-For-Review, 05WMF-deploy-2016-11-29_(1.29.0-wmf.4): On beta cluster: Fatal error: Stack overflow in /srv/mediawiki/php-master/includes/libs/objectcache/BagOStuff.php on line 754 - https://phabricator.wikimedia.org/T150833#2798202 (10h... [13:46:47] Yippee, build fixed! 
[13:46:47] Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #214: 09FIXED in 2 min 46 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/214/ [14:44:30] Project selenium-Wikibase » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #178: 15ABORTED in 1 hr 9 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/178/ [14:44:31] Project selenium-Wikibase » chrome,test,Linux,contintLabsSlave && UbuntuTrusty build #178: 15ABORTED in 1 hr 9 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/178/ [15:32:16] PROBLEM - Puppet run on deployment-sca01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [15:34:31] PROBLEM - Puppet run on deployment-sca02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [15:35:45] PROBLEM - Puppet run on deployment-sca03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [16:00:46] RECOVERY - Puppet run on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0] [16:07:17] RECOVERY - Puppet run on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:09:29] RECOVERY - Puppet run on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:17:54] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 13Patch-For-Review, 07Regression: doc.wikimedia.org displays "403 Forbidden" for coverage sub directories - https://phabricator.wikimedia.org/T150727#2799454 (10hashar) https://gerrit.wikimedia.org/r/321651 makes Apache to honor .htaccess... [16:30:32] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 06Operations, 10hardware-requests: codfw: 1 hardware access request for continuous integration - https://phabricator.wikimedia.org/T150865#2799514 (10hashar) [16:30:48] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team: Secondary production Jenkins for CI - https://phabricator.wikimedia.org/T150771#2795939 (10hashar) [16:32:30] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 06Operations, 10hardware-requests: codfw: 1 hardware access request for continuous integration - https://phabricator.wikimedia.org/T150865#2799534 (10hashar) contint1001 is a rather large machine and I am not aware of what is available in... [16:32:56] thcipriani: good morning. I got the hardware request filled for a contint2001.codfw.wmnet machine ( https://phabricator.wikimedia.org/T150865 ) you are subscribed obviously [16:33:48] hashar: howdy, yup, saw that in email :) [16:34:10] also got the CI staging setup we talked about filed: https://phabricator.wikimedia.org/T150772 [16:34:22] if they get some server available in codfw that match contint1001, that will probably be a fast allocation [16:34:29] neat! 
[16:34:55] and eventually we will want to start puppetizing Jenkins [16:35:21] yeah, puppetizing jenkins plugins will be interesting [16:35:45] I think the CI staging area will let us try out some different ideas there [16:37:59] this, for instance, seems like a very bad idea, https://git.openstack.org/cgit/openstack-infra/puppet-jenkins/tree/manifests/plugin.pp#n67 [16:44:05] PROBLEM - Puppet run on integration-slave-precise-1002 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:02:37] thcipriani: I was with Greg on video [17:02:51] and saying exactly the same: leverage staging to experiment new things / puppetize from scratch :D [17:02:52] blame it all on me [17:02:57] oh right [17:02:58] :) [17:03:03] ahh --no-check-certificate booh [17:03:29] thcipriani: for the plugins, we can probably build them from source and push them to archiva.wikimedia.org [17:03:37] it is used / published to by analytics [17:03:43] or [17:04:09] wrap them in Debian packages. [17:08:02] legoktm: they (cloudbees) emailed me too anyways :) Hopefully my "I'm the manager in charge of Release Engineering and QA" and "no" is sufficient. [17:10:02] Debian Java Team has packaged Jenkins embedded libraries [17:10:07] they are in Jessie [17:10:15] but the stripped Jenkins is not included/got removed ( https://packages.qa.debian.org/j/jenkins.html ) [17:10:18] ... [17:11:08] https://lists.debian.org/debian-release/2015/04/msg00209.html [17:11:18] Jenkins LTS cycle does not match Debian ones. [17:11:21] so makes sense [17:12:09] hi hashar & greg-g :) [17:12:25] hello [17:15:03] oh Luke081515 [17:15:09] :) [17:15:11] hi :) [17:15:34] yep, using a different nick is ometimes a bit confusing ;) [17:16:05] /whois to the rescue [17:16:26] yep, it do it too usually :) [17:16:44] first thing I did :D [17:16:55] anyway I am disappearing / commuting back home [17:17:10] dewiktionary on beta will need to be recreated [17:17:15] though it is blocked on a change in Flow iirc [17:17:19] hm [17:17:36] hashar: I have some ressources for some bouncers left at my server, if you want one ;) [17:18:14] I am on some private channels though :D [17:18:23] and lack of a bouncer is actually a good thing [17:18:29] A) I can pretend I am not aware of something [17:18:45] B) I dont have to spend an hour every morning reading bunch of backlogs and thus start with a white sheet of paper [17:19:02] (eventually I should delete all my pending emails at midnight as well) [17:19:07] to start all fresh and anew every morning! [17:19:14] thx for the offer though :} [17:19:37] :D interesting logic [17:24:04] RECOVERY - Puppet run on integration-slave-precise-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [17:51:12] 64GB RAM for contint2001 ? [17:51:23] would that be enough, heh [17:52:26] i checked the spares list, looks like we have one that is matching contint1001 [17:52:44] but gotta go through the procurement process, comented on ticket [17:54:41] thanks mutante ! :) [18:24:22] PROBLEM - Host integration-puppetmaster is DOWN: CRITICAL - Host Unreachable (10.68.16.42) [18:24:50] twentyafterfour, hasharAway: did you guys end up using mysql root on -db04? [18:30:20] Krenair: I logged in with it but didn't end up doing anything [18:30:29] just verified that replication was running [18:45:38] twentyafterfour, at some point we should take down -db03 and set the password there too [18:57:23] Thanks, Krenair, twentyafterfour. 
I don't know how they created dewiktionary only on the master before, though (and in general how the tables were created). [18:58:29] I'll follow up later today and hopefully addWiki.php just works. [18:58:42] If they created it on the master, it should have replicated to the slave. [19:36:32] greg-g: Any idea why this is not triggering gate and submit: https://gerrit.wikimedia.org/r/#/c/320405/ ? [19:40:10] kaldari: I'm not sure, it's not showing a dependent patch anymore... [19:40:27] should I just submit it? [19:40:56] kaldari you need to re c+2 [19:40:59] ok [19:41:38] Thanks [19:41:48] it says the parent is not current? [19:42:03] greg-g: What does that mean? [19:42:17] no idea! [19:42:48] greg-g patch set 5 shows parent as 44a9c1bf59b110e3d04b6a5083242f981f4de832 [19:43:06] so the change should merge once he removes c+2 and re do c+2 [19:43:31] I search for that change in gerrit and it can't find it [19:43:35] https://gerrit.wikimedia.org/r/#/q/44a9c1bf59b110e3d04b6a5083242f981f4de832 [19:43:56] Oh your correct [19:44:00] I'll rebase and +2 [19:44:03] kaldari try doing rebase [19:44:08] and put in master [19:44:26] "not current" apparently means "doesn't exist" thanks gerrit [19:45:16] there it goes! :) [19:45:22] :) [19:45:25] ah, of course :P [19:51:47] Krenair: I could not found the beta cluster databases credential and ended up with wikiadmin: sql --write aawiki [19:52:01] hasharAway i found the bug https://phabricator.wikimedia.org/T148957 [19:52:08] for why you coulden debug the updater [19:52:09] Krenair: then create database dewiktionary; on slave + SLAVE START [19:52:31] paladox: hello :) [19:52:36] Hi :) [19:52:47] no clue what is up with update.php [19:53:08] wikiadmin could start replication? interesting [19:53:15] Ok [19:53:51] Krenair: I guess perms are wide open [19:55:07] GRANT ALL PRIVILEGES ON *.* TO 'wikiadmin'@'%' IDENTIFIED BY PASSWORD [19:55:08] rre [19:55:09] ffs* [19:57:29] hashar, okay, taking the master down to fix perms [19:59:29] auth will be via unix socket rather than password for root [20:03:00] 10Beta-Cluster-Infrastructure, 06WMDE-TLA-Team, 13Patch-For-Review, 15User-Ladsgroup: Make beta German Wiktionary - https://phabricator.wikimedia.org/T150764#2795757 (10greg) >>! In T150764#2800153, @Mattflaschen-WMF wrote: >>>! In T150764#2798735, @hashar wrote: >> That has been caused by the addition of... [20:03:08] So in prod, wikiadmin has these grants: [20:03:37] GRANT PROCESS, REPLICATION CLIENT ON *.* TO 'wikiadmin'@'10.%' IDENTIFIED [20:03:43] GRANT ALL PRIVILEGES ON `%wik%`.* TO 'wikiadmin'@'10.%' [20:03:47] GRANT SELECT ON `heartbeat`.`heartbeat` TO 'wikiadmin'@'10.%' [20:06:27] wikiuser: [20:06:29] GRANT PROCESS, REPLICATION CLIENT ON *.* TO 'wikiuser'@'10.64.%' [20:06:36] GRANT SELECT, INSERT, UPDATE, DELETE ON `%wik%`.* TO 'wikiuser'@'10.64.%' [20:06:40] GRANT SELECT ON `heartbeat`.`heartbeat` TO 'wikiuser'@'10.64.%' [20:06:56] let's replicate those, almost (our network is setup a little differently so obviously not 10.64) [20:07:47] hm, looks like we don't have the heartbeat db set up :/ [20:14:24] Krenair: sync with jaime please [20:14:32] he is working on mysql over socket iirc [20:17:38] does he deal with deployment-prep dbs? [20:21:30] hashar: o/ anything against me increaseing verbosity of mod_rewrite on mediawiki-06 to test a thing? [20:21:47] Krenair: yes [20:22:16] Krenair: DBA are providing assistance guidance for the database. 
Cause that is a narrow field and nobody knows how to get it right beside an actual DBA :] [20:22:39] elukey: do do do :]  dont forget to remove it eventually [20:22:56] elukey: you might want to disable puppet as well or it might well overwrite your hack [20:23:10] elukey: beta being a shared platform... {{be bold}} [20:23:13] hashar, you can tell him [20:25:40] Krenair: I was refering to https://gerrit.wikimedia.org/r/#/c/321878/ done/merged today [20:26:54] marxarelli might know how the permission got setup for the beta cluster database. He did the migration with jaime [20:27:32] hashar: which permissions? [20:27:49] everything was migrated from db1, including the mysql database [20:28:53] !log temporary increasing verbosity of mod_rewrite on deployment-mediawiki06 as test [20:28:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:29:06] hashar, right, I left a note there that I put it to use [20:30:06] marxarelli, btw, one thing I noticed about that [20:30:28] | user | host | password | [20:30:31] | root | deployment-db1 | | [20:30:59] there is still a user for the old host? [20:32:23] ah, looks like it [20:32:26] should delete that [20:34:15] PROBLEM - Puppet run on deployment-elastic07 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:34:57] PROBLEM - Puppet run on integration-saltmaster is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:35:02] ugh, hang on [20:35:03] PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:35:19] I forgot one thing on -db04 [20:35:31] PROBLEM - Puppet run on deployment-redis01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:35:39] PROBLEM - Puppet run on deployment-ms-be02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:36:34] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 06Operations, 10hardware-requests: codfw: 1 hardware access request for continuous integration - https://phabricator.wikimedia.org/T150865#2800470 (10RobH) a:03mark contint1001 has Dual Intel® Xeon® Processor E5-2640 v3 (2.6GHz/8c), dua... 
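For reference, the production grants Krenair pasted above (20:03–20:06), adapted for beta, would look roughly like the following. The `10.68.%` source range is an assumption based on the instance IPs seen in this log, the passwords are placeholders, and the heartbeat grant is dropped since beta has no heartbeat database:

```
# run on the beta master (deployment-db03); repeat on -db04 if the grants don't replicate
sudo mysql <<'SQL'
GRANT PROCESS, REPLICATION CLIENT ON *.* TO 'wikiadmin'@'10.68.%' IDENTIFIED BY '<redacted>';
GRANT ALL PRIVILEGES ON `%wik%`.* TO 'wikiadmin'@'10.68.%';
GRANT PROCESS, REPLICATION CLIENT ON *.* TO 'wikiuser'@'10.68.%' IDENTIFIED BY '<redacted>';
GRANT SELECT, INSERT, UPDATE, DELETE ON `%wik%`.* TO 'wikiuser'@'10.68.%';
FLUSH PRIVILEGES;
SQL
```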
[20:36:40] PROBLEM - Puppet run on deployment-memc05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:37:37] okay, now it seems to work properly there too [20:37:42] PROBLEM - Puppet run on integration-slave-trusty-1011 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:38:00] PROBLEM - Puppet run on deployment-eventlogging04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:38:18] PROBLEM - Puppet run on deployment-db04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:38:20] PROBLEM - Puppet run on deployment-elastic05 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:38:48] PROBLEM - Puppet run on deployment-restbase01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:38:58] Krenair: just dropped the root@deployment-db1, fyi [20:39:15] the root@deployment-db1 *user* [20:39:27] ty [20:39:30] PROBLEM - Puppet run on deployment-cache-upload04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:40:43] PROBLEM - Puppet run on integration-slave-precise-1011 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:41:28] Project selenium-Echo » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #212: 04FAILURE in 27 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/212/ [20:41:35] Project selenium-Echo » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #212: 04FAILURE in 34 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/212/ [20:43:47] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:43:51] PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:43:57] PROBLEM - Puppet run on integration-slave-precise-1012 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:44:05] PROBLEM - Puppet run on integration-slave-trusty-1003 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:44:28] PROBLEM - Puppet run on deployment-conf03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:45:04] PROBLEM - Puppet run on integration-slave-precise-1002 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:45:23] PROBLEM - Puppet run on deployment-kafka04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:45:38] PROBLEM - Puppet run on deployment-parsoid09 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:45:51] PROBLEM - Puppet run on integration-slave-jessie-1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:46:55] PROBLEM - Puppet run on repository is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:47:29] PROBLEM - Puppet run on deployment-zookeeper01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:47:29] PROBLEM - Puppet run on deployment-logstash2 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:47:41] PROBLEM - Puppet run on deployment-redis02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:47:49] PROBLEM - Puppet run on integration-slave-jessie-1002 
is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:48:36] PROBLEM - Puppet run on deployment-elastic06 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:48:46] PROBLEM - Puppet run on integration-slave-trusty-1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:48:48] PROBLEM - Puppet run on deployment-fluorine02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [20:49:08] PROBLEM - Puppet run on deployment-jobrunner02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:49:10] PROBLEM - Puppet run on deployment-salt02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:49:27] PROBLEM - Puppet run on deployment-mediawiki05 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:49:29] PROBLEM - Puppet run on deployment-urldownloader is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:50:01] PROBLEM - Puppet run on deployment-mira is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:50:05] PROBLEM - Puppet run on deployment-apertium02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:50:13] PROBLEM - Puppet run on deployment-zotero01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:50:23] PROBLEM - Puppet run on deployment-changeprop is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:50:29] PROBLEM - Puppet run on deployment-pdfrender02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:51:46] PROBLEM - Puppet run on deployment-sca03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:52:58] PROBLEM - Puppet run on deployment-prometheus01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:53:02] PROBLEM - Puppet run on castor is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:53:20] !log restored apache2 config on deployment-mediawiki06 [20:53:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:53:31] (results of the experiment - https://phabricator.wikimedia.org/T57857#2800519) [20:53:32] PROBLEM - Puppet run on integration-puppetmaster01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:53:57] hashar, I'm done messing with things in beta for now, let me know if you notice any problems (beyond the labs-puppet-breakage above) [21:07:40] RECOVERY - Puppet run on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:08:56] 03Scap3, 10Parsoid: Env vars being overwritten - https://phabricator.wikimedia.org/T150897#2800601 (10Arlolra) [21:10:31] Krenair: marxarelli: the Jenkins job that runs update.php seems all happy :] ( https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad/12849/console ) [21:13:01] RECOVERY - Puppet run on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:13:15] RECOVERY - Puppet run on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:13:19] RECOVERY - Puppet run on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0] [21:13:25] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:14:16] RECOVERY - Puppet run on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0] 
[21:14:30] RECOVERY - Puppet run on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:14:56] RECOVERY - Puppet run on integration-saltmaster is OK: OK: Less than 1.00% above the threshold [0.0] [21:15:04] RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:15:30] RECOVERY - Puppet run on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:15:41] RECOVERY - Puppet run on deployment-ms-be02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:16:43] RECOVERY - Puppet run on deployment-memc05 is OK: OK: Less than 1.00% above the threshold [0.0] [21:17:41] RECOVERY - Puppet run on integration-slave-trusty-1011 is OK: OK: Less than 1.00% above the threshold [0.0] [21:18:48] RECOVERY - Puppet run on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:19:06] RECOVERY - Puppet run on integration-slave-trusty-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [21:20:04] RECOVERY - Puppet run on integration-slave-precise-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [21:20:44] RECOVERY - Puppet run on integration-slave-precise-1011 is OK: OK: Less than 1.00% above the threshold [0.0] [21:21:54] RECOVERY - Puppet run on repository is OK: OK: Less than 1.00% above the threshold [0.0] [21:22:42] 10Beta-Cluster-Infrastructure, 06Operations, 10Thumbor: Thumbor keeps losing Swift auth on beta - https://phabricator.wikimedia.org/T150649#2800637 (10fgiunchedi) @Krenair where are you seeing that btw? The issue afaics is that swift on `deployment-ms-fe01` doesn't have the password for `mw:thumbor` in `/et... [21:23:19] elukey: still around? [21:23:49] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:23:49] elukey: apache config test used to be a thing. 
We can excavate a bunch of already existing test scripts ;] lets poke each other tomorrow [21:23:53] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [21:23:57] RECOVERY - Puppet run on integration-slave-precise-1012 is OK: OK: Less than 1.00% above the threshold [0.0] [21:24:28] RECOVERY - Puppet run on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0] [21:24:30] RECOVERY - Puppet run on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0] [21:25:14] RECOVERY - Puppet run on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:25:20] RECOVERY - Puppet run on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:25:36] RECOVERY - Puppet run on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0] [21:25:50] RECOVERY - Puppet run on integration-slave-jessie-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [21:26:47] RECOVERY - Puppet run on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0] [21:27:29] RECOVERY - Puppet run on deployment-zookeeper01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:27:29] RECOVERY - Puppet run on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0] [21:27:51] RECOVERY - Puppet run on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [21:28:01] commented on task [21:28:01] RECOVERY - Puppet run on castor is OK: OK: Less than 1.00% above the threshold [0.0] [21:28:33] RECOVERY - Puppet run on integration-puppetmaster01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:28:35] RECOVERY - Puppet run on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0] [21:28:47] RECOVERY - Puppet run on integration-slave-trusty-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [21:29:09] RECOVERY - Puppet run on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:29:11] RECOVERY - Puppet run on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:29:30] RECOVERY - Puppet run on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [21:30:00] RECOVERY - Puppet run on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0] [21:30:06] RECOVERY - Puppet run on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:30:24] RECOVERY - Puppet run on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0] [21:30:32] RECOVERY - Puppet run on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:32:58] RECOVERY - Puppet run on deployment-prometheus01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:39:26] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 06Operations, 10hardware-requests: codfw: 1 hardware access request for continuous integration - https://phabricator.wikimedia.org/T150865#2800694 (10hashar) @RobH pointed out contint1001 does not use SSD and that might be an IO bottlenec... 
[21:58:29] Project selenium-Core » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #220: 04FAILURE in 6 min 28 sec: https://integration.wikimedia.org/ci/job/selenium-Core/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/220/ [22:05:18] 10Continuous-Integration-Infrastructure, 06Operations, 07Nodepool, 13Patch-For-Review: Clean up apt:pin of python modules used for Nodepool - https://phabricator.wikimedia.org/T137217#2800743 (10hashar) Will probably want to cleanup apt.wm.o jessie-wikimedia/backports I will reach out to European ops to... [22:17:04] what's happening on beta? "Sorry! This site is experiencing technical difficulties. Cannot access the database".... [22:20:41] wfm? [22:21:01] Project selenium-CentralAuth » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #213: 04FAILURE in 1 min 0 sec: https://integration.wikimedia.org/ci/job/selenium-CentralAuth/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/213/ [22:22:13] Krenair: i get it when trying to login [22:22:44] huh [22:23:35] oh, right [22:23:36] my bad [22:24:29] etonkovidova, greg-g try now [22:25:12] got further, just logging into LP now :) [22:25:25] yup [22:25:38] Krenair: greg-g All looks normal - thx! [22:25:49] when I was making wikiadmin/wikiuser permissions like production's, I forgot about the centralauth database [22:42:48] I've updated the arcanist installer for windows https://github.com/paladox/Arcanist-installer-for-windows/releases/tag/1.7.0 :) [22:43:06] So easy to install arcanist on windows. [22:45:14] twentyafterfour ^^ :) [22:47:39] (03PS1) 10Gergő Tisza: [EmailAuth] add standard endpoints [integration/config] - 10https://gerrit.wikimedia.org/r/322004 [22:49:42] (03CR) 10Paladox: [C: 031] [EmailAuth] add standard endpoints [integration/config] - 10https://gerrit.wikimedia.org/r/322004 (owner: 10Gergő Tisza) [23:20:21] Project beta-update-databases-eqiad build #12852: 04FAILURE in 20 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12852/ [23:34:28] 10Beta-Cluster-Infrastructure, 06WMDE-TLA-Team, 13Patch-For-Review, 15User-Ladsgroup: Make beta German Wiktionary - https://phabricator.wikimedia.org/T150764#2801053 (10Mattflaschen-WMF) addWiki failed saying the database existed already (?), so now I have to delete it again. [23:39:04] 10Beta-Cluster-Infrastructure, 06WMDE-TLA-Team, 13Patch-For-Review, 15User-Ladsgroup: Make beta German Wiktionary - https://phabricator.wikimedia.org/T150764#2801056 (10Mattflaschen-WMF) I think addWiki.php somehow creates it twice, not sure where. [23:50:05] twentyafterfour, bd808, ostriches, Reedy, any of you available to review a one-liner to fix addWiki.php (and that job)? [23:50:17] sure [23:50:26] matt_flaschen: gladly [23:50:36] twentyafterfour, thanks: https://gerrit.wikimedia.org/r/#/c/322017/ . [23:51:34] lgtm [23:51:36] how many one-liners does it really take to fix addWiki.php [23:52:11] Krenair, hopefully two. I tried to test it locally yesterday, but it just doesn't work. The Vagrant multi-wiki is too different. [23:52:18] I think this is the last one. [23:52:20] hah [23:52:22] yeah [23:52:53] once upon a time I thought that addWiki could be fixed once and for all [23:53:07] experience has shown that every fix to that script is temporary [23:53:08] None of this is specific to dewiktionary or anything new. I guess people keep just doing one-off fixes. 
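Krenair's 22:25 explanation fits the grants sketched earlier: the `%wik%` database pattern never matches `centralauth`, so the shared login database needs its own grants. Roughly, under the same subnet assumption as above:

```
sudo mysql <<'SQL'
GRANT ALL PRIVILEGES ON `centralauth`.* TO 'wikiadmin'@'10.68.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON `centralauth`.* TO 'wikiuser'@'10.68.%';
SQL
```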
[23:53:19] I need to rewrite addWiki [23:53:21] next month a new developer will find a new way to break it [23:53:24] Or not fixing the script at all and just fixing the wiki they're setting up. [23:53:27] $backlog++ [23:54:02] Okay, to be fair, the External Store is new and I broke that (then fixed it). But it didn't even get to there today before breaking. [23:54:06] when it breaks in production (which is almost as common as it running in production), we try to fix it [23:54:23] It breaks every time! [23:54:38] (03PS1) 10Dzahn: delete .htaccess files for doc/integration [integration/docroot] - 10https://gerrit.wikimedia.org/r/322020 (https://phabricator.wikimedia.org/T150727) [23:54:54] "I want to create a wiki" -> runs addwiki and it breaks -> "Shit, lemme fix this." -> creates wiki [23:54:58] ostriches: I find, when you create 4 wikis in a row, it only breaks the on the first one [23:54:59] :P [23:55:02] Go back to the start :p [23:55:25] Reedy: The solution, clearly, is to stop creating wikis! [23:55:40] Dereckson did two the other day, obviously it worked the second time because he live-hacked it to work the first time [23:55:52] They are bad for your health [23:55:58] except the first time was really the first two times because he had to run it, change it to comment the stuff that it had done, then run it again [23:56:56] It's not too bad until you have to keep deleting the whole database in production to re-run it
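For the record, the retry loop being joked about at the end looks roughly like this on beta. The addWiki.php argument order (language code, site/family, dbname, domain), the extension path, and the domain are from memory and illustrative only — check the script's --help before trusting any of it:

```
# on the database master: drop the half-created database; IF EXISTS keeps the
# slave's SQL thread from aborting again if the db never made it there
sudo mysql -e "DROP DATABASE IF EXISTS dewiktionary"

# from a host with mwscript: re-run the creation script
mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki \
    de wiktionary dewiktionary de.wiktionary.beta.wmflabs.org
```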