[00:18:13] (CR) Samwilson: [C: 1] Update squizlabs/PHP_CodeSniffer to 2.8.0 [tools/codesniffer] - https://gerrit.wikimedia.org/r/336935 (owner: Paladox)
[00:21:30] MediaWiki-Codesniffer: Release Codesniffer v0.8.0 - https://phabricator.wikimedia.org/T154715#3015402 (Samwilson) > Core will need validating against it and 0.8.0 to check nothing needs changing But core doesn't have to update to 0.8.0 immediately does it? It will stay on 0.7.2 until all sniffs pass.
[00:27:28] Scap3, Services (later), User-mobrovac: Delay repooling trending service after a restart - https://phabricator.wikimedia.org/T156687#3015422 (mobrovac) p:Normal>High Raising the priority since this is blocking the deployment of new features.
[00:28:33] Scap3, Services (later), User-mobrovac: Delay repooling trending service after a restart - https://phabricator.wikimedia.org/T156687#3015426 (mobrovac)
[00:52:30] Beta-Cluster-Infrastructure, Release-Engineering-Team, DBA, Operations: Better mysql command prompt info - https://phabricator.wikimedia.org/T157714#3015480 (Reedy)
[03:24:46] Continuous-Integration-Infrastructure, Release-Engineering-Team, Patch-For-Review: Phase out scandium.eqiad.wmnet - https://phabricator.wikimedia.org/T150936#2801865 (Dzahn) after the merge above, puppet run on scandium is unchanged. no-op
[03:26:42] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 33 seconds ago with 1 failures. Failed resources (up to 3 shown): File[/var/lib/zuul/.ssh/id_rsa]
[03:39:13] ACKNOWLEDGEMENT - puppet last run on contint2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/var/lib/zuul/.ssh/id_rsa] daniel_zahn https://gerrit.wikimedia.org/r/#/c/336807/
[03:50:37] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[03:51:30] fixed ^
[03:51:43] you have zuul-merges on contint1001 and 2001 now
[03:51:46] mergers
[03:51:57] PROBLEM - git_daemon_running on contint2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/git-core/git-daemon --user
[03:53:10] ACKNOWLEDGEMENT - git_daemon_running on contint2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/git-core/git-daemon --user daniel_zahn https://gerrit.wikimedia.org/r/#/c/336807/
[03:53:28] ACKNOWLEDGEMENT - git_daemon_running on contint1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/git-core/git-daemon --user daniel_zahn https://gerrit.wikimedia.org/r/#/c/336807/
[03:54:43] Continuous-Integration-Infrastructure, Release-Engineering-Team, Patch-For-Review: Phase out scandium.eqiad.wmnet - https://phabricator.wikimedia.org/T150936#3015778 (Dzahn) now there is just this to check https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=git_daemon git_daemon c...
[04:19:21] Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #293: FAILURE in 23 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/293/
[04:39:35] (PS1) Ejegg: Use upstream civicrm-buildkit [integration/config] - https://gerrit.wikimedia.org/r/336960
[04:41:11] (PS2) Ejegg: Use upstream civicrm-buildkit [integration/config] - https://gerrit.wikimedia.org/r/336960
[05:12:33] hmm .. what is going on here? https://integration.wikimedia.org/ci/job/parsoidsvc-source-npm-node-6-jessie/184/console
[06:14:06] hmm
[06:14:48] https://gerrit.wikimedia.org/r/#/c/336961/
[06:59:30] Yippee, build fixed!
[06:59:31] Project selenium-Wikibase » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #265: FIXED in 2 hr 19 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/265/
[07:01:34] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<30.00%)
[07:11:34] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[08:34:37] Continuous-Integration-Config, Front-end-Standards-Group: Consider moving from npm to yarn for WMF repos? - https://phabricator.wikimedia.org/T148230#3016022 (Ricordisamoa)
[08:55:03] Beta-Cluster-Infrastructure, Release-Engineering-Team, DBA, Operations, Patch-For-Review: Better mysql command prompt info - https://phabricator.wikimedia.org/T157714#3016084 (jcrespo) p:Triage>Low Let's get the ok from beta owners on any changes.
[08:57:51] https://gerrit.wikimedia.org/r/#/c/336967/
[08:58:01] ehmm, jenkins is broken
[09:01:05] Yup, can't connect to contint1001.wikimedia.org
[09:01:10] https://integration.wikimedia.org/ci/job/npm-node-6-jessie/4369/console
[09:01:40] hashar: ^ Is it going to be solved soon?
[09:01:54] Beta-Cluster-Infrastructure, Release-Engineering-Team, DBA, Operations, Patch-For-Review: Better mysql command prompt info - https://phabricator.wikimedia.org/T157714#3014147 (hashar) antoine-approve
[09:27:13] Amir1: ah yeah known
[09:27:24] that is due to an update/deployment done in puppet overnight
[09:27:30] Amir1: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=git_daemon
[09:27:41] git-daemon process are not available
[09:28:07] PROBLEM - zuul_merger_service_running on contint1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger
[09:29:17] PROBLEM - zuul_merger_service_running on contint2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger
[09:31:07] RECOVERY - zuul_merger_service_running on contint1001 is OK: PROCS OK: 1 process with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger
[09:33:23] Thanks
[09:35:07] PROBLEM - zuul_merger_service_running on contint1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger
[09:36:26] hashar: this is scary, icinga says it has been done for six hours :(
[09:36:29] *down
[09:37:17] luckily there is not much activity over night :}
[09:37:23] i am working on it
[09:37:33] I got the process stopped on contint1001 and contint2001
[09:37:44] so that should work now
[09:37:53] on scandium?
[09:37:59] (the patches should only be handled on scandium.eqiad.wmnet which is reachable / has the git-daemon running)
[09:38:06] so "just" recheck and it should pass
[09:42:07] RECOVERY - git_daemon_running on contint2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/git-core/git-daemon --user
[09:48:02] Continuous-Integration-Infrastructure, Release-Engineering-Team: zuul-merger git-daemon process is not start properly by systemd ? - https://phabricator.wikimedia.org/T157785#3016294 (hashar)
[09:50:20] Continuous-Integration-Infrastructure, Release-Engineering-Team: zuul-merger git-daemon process is not start properly by systemd ? - https://phabricator.wikimedia.org/T157785#3016310 (hashar)
[09:52:47] Amir1: all fixed now. I have filed a task to follow up on that ^^^
[09:53:07] RECOVERY - zuul_merger_service_running on contint1001 is OK: PROCS OK: 1 process with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger
[09:53:17] RECOVERY - zuul_merger_service_running on contint2001 is OK: PROCS OK: 1 process with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger
[09:57:53] Continuous-Integration-Infrastructure, Release-Engineering-Team: zuul-merger git-daemon process is not start properly by systemd ? - https://phabricator.wikimedia.org/T157785#3016340 (hashar) The puppet service definition at https://github.com/wikimedia/operations-puppet/blob/cfbdecb/modules/contint/mani...
[10:01:09] nice!
[10:01:11] Thanks
[10:07:37] PROBLEM - jenkins_service_running on contint2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war
[10:10:50] Continuous-Integration-Infrastructure, Release-Engineering-Team, Patch-For-Review: Phase out scandium.eqiad.wmnet - https://phabricator.wikimedia.org/T150936#3016384 (hashar) We now have a zuul-merger on each of contint1001 and contint2001. Assuming they are working properly we will be able to phase...
[10:30:37] RECOVERY - jenkins_service_running on contint2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war
[10:37:58] (CR) Addshore: [C: 2] Update squizlabs/PHP_CodeSniffer to 2.8.0 [tools/codesniffer] - https://gerrit.wikimedia.org/r/336935 (owner: Paladox)
[10:39:01] (Merged) jenkins-bot: Update squizlabs/PHP_CodeSniffer to 2.8.0 [tools/codesniffer] - https://gerrit.wikimedia.org/r/336935 (owner: Paladox)
[10:40:30] Amir1: I think less than 10 patches have been affected :-}
[10:41:04] And two of them were mine :D I'm sooo lucky
[10:42:40] ;D
[11:05:53] Amir1: looks like Ores has troubles on beta cluster: Failed to make ORES request to [https://ores-beta.wmflabs.org/scores/testwiki/?models=damaging%7Cgoodfaith%7Creverted&revids=707201&precache=true&format=json], There was a problem during the HTTP request: 400 BAD REQUEST
[11:06:16] apparently due to some jobs deployment-jobrunner02 wikidatawiki 1.29.0-alpha runJobs ERROR: ORESFetchScoreJob
[11:06:26] Beta-Cluster-Infrastructure, Release-Engineering-Team, DBA, Operations, Patch-For-Review: Better mysql command prompt info - https://phabricator.wikimedia.org/T157714#3016508 (jcrespo) I have merged the above patch as requested, but as I commented there, I do not think that will solve the tic...
[11:06:39] Beta-Cluster-Infrastructure, Release-Engineering-Team, DBA, Operations, Patch-For-Review: Better mysql command prompt info for Beta - https://phabricator.wikimedia.org/T157714#3016509 (jcrespo)
[11:07:00] hashar: I take care of it today, can you make a task and assign it to me?
[11:07:07] It's config issues
[11:07:13] $ curl 'https://ores-beta.wmflabs.org/scores/testwiki/?models=damaging%7Cgoodfaith%7Creverted&revids=706878&precache=true&format=json'
[11:07:13] {"error": {"code": "bad request", "message": "Models '['goodfaith']' not available for testwiki."}}
[11:07:14] :D
[11:07:23] yeah filing one
[11:07:38] Thanks!
[11:09:18] Beta-Cluster-Infrastructure, ORES, Revision-Scoring-As-A-Service-Backlog: On beta cluster, ORESFetchScoreJob got a HTTP 400 bad request from ores-beta - https://phabricator.wikimedia.org/T157790#3016510 (hashar)
[11:09:23] Amir1: https://phabricator.wikimedia.org/T157790 :)
[11:09:35] Amir1: it's just an event once every x seconds, not too concerning
[11:11:19] Beta-Cluster-Infrastructure, Release-Engineering-Team, DBA, Operations, Patch-For-Review: Better mysql command prompt info for Beta - https://phabricator.wikimedia.org/T157714#3014147 (Marostegui) We can always add it to the `[client]` section too
[11:12:46] Beta-Cluster-Infrastructure, Release-Engineering-Team, DBA, Operations, Patch-For-Review: Better mysql command prompt info for Beta - https://phabricator.wikimedia.org/T157714#3016526 (jcrespo) >>! In T157714#3016090, @hashar wrote: > antoine-approve photo You are aging well, you look younge...
[11:15:28] Beta-Cluster-Infrastructure, Release-Engineering-Team, DBA, Operations, Patch-For-Review: Better mysql command prompt info for Beta - https://phabricator.wikimedia.org/T157714#3016543 (jcrespo) >>! In T157714#3016524, @Marostegui wrote: > We can always add it to the `[client]` section too No...
[11:18:24] Beta-Cluster-Infrastructure, Release-Engineering-Team, DBA, Operations, Patch-For-Review: Better mysql command prompt info for Beta - https://phabricator.wikimedia.org/T157714#3016547 (hashar) > You are aging well, you look younger now than on the profile photo. I am actually younger on the...
[11:26:17] Great, thanks
[11:34:19] PROBLEM - Puppet run on deployment-tmh01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[11:36:00] !log Pruning some old caches from castor.integration.eqiad.wmflabs (eg node-4 jobs are gone)
[11:36:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:36:52] Beta-Cluster-Infrastructure, Release-Engineering-Team, DBA, Operations, Patch-For-Review: Better mysql command prompt info for Beta - https://phabricator.wikimedia.org/T157714#3016567 (jcrespo) Have a look on how I handle it on production with an alias- enforce the host staticly rathen than \...
[12:14:19] RECOVERY - Puppet run on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:44:58] (PS1) Addshore: Rename MoveToCommons mw extension [integration/config] - https://gerrit.wikimedia.org/r/336997 (https://phabricator.wikimedia.org/T157539)
[12:45:12] hasharLunch: ^^ would be epic to have that deployed when you get back :D
[12:45:41] Beta-Cluster-Infrastructure, Patch-For-Review: Don't throttle WMF office IP(s) for account creation in beta - https://phabricator.wikimedia.org/T87841#3016680 (Aklapper) a:JohnLewis>None
[12:47:37] addshore: not in sort order ? :} but I don't really care :}
[12:49:26] is that file actually fully in sort order?
[12:49:28] *looks*
[12:49:53] bwhahaa, just found this line part way through
[12:49:53] # Start of new currently disabled extensions checks from mass extension import - Addshore
[12:50:01] then it starts from A again :P
[12:51:59] yeah
[12:52:02] it is all messed up
[12:52:09] addshore: I am just going to merge as is
[12:52:15] [=
[12:52:29] (CR) Hashar: [C: 2] Rename MoveToCommons mw extension [integration/config] - https://gerrit.wikimedia.org/r/336997 (https://phabricator.wikimedia.org/T157539) (owner: Addshore)
[12:52:34] It's all messed up, I think due to me X months / years ago xD
[12:52:59] to be fair, it should be autogenerated
[12:53:15] (Merged) jenkins-bot: Rename MoveToCommons mw extension [integration/config] - https://gerrit.wikimedia.org/r/336997 (https://phabricator.wikimedia.org/T157539) (owner: Addshore)
[12:53:53] deployed
[12:53:58] epic!
[13:49:07] PROBLEM - Puppet run on buildlog is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[14:38:29] Continuous-Integration-Infrastructure, Release-Engineering-Team, Operations: Phase out scandium.eqiad.wmnet - https://phabricator.wikimedia.org/T150936#3016982 (hashar) @Dzahn @RobH we no more need scandium.eqiad.wmnet. It was solely running the `zuul-merger` service which is now running on contint10...
[14:42:50] bah beta cluster puppet master is stalled
[14:49:11] !log rebase beta puppet master. Fixed conflicts with https://gerrit.wikimedia.org/r/#/c/321096/ and https://gerrit.wikimedia.org/r/#/c/312523/
[14:49:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:52:15] Beta-Cluster-Infrastructure, Release-Engineering-Team, DBA, Operations, Patch-For-Review: Better mysql command prompt info for Beta - https://phabricator.wikimedia.org/T157714#3017025 (hashar) Worth noting, I think most of us use the deployment server deployment-tin.eqiad.wmnet to connect to...
[14:55:02] Beta-Cluster-Infrastructure, Release-Engineering-Team, DBA, Operations, Patch-For-Review: Better mysql command prompt info for Beta - https://phabricator.wikimedia.org/T157714#3017032 (jcrespo) Whatever you decide, I would be happy to help- this is is more of a client issue rather than server...
[14:57:06] PROBLEM - Puppet run on deployment-aqs01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[14:58:11] (PS1) Hashar: [operations/puppet/mariadb] switch to rake [integration/config] - https://gerrit.wikimedia.org/r/337028 (https://phabricator.wikimedia.org/T154894)
[14:59:08] PROBLEM - Puppet run on deployment-restbase02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[15:00:01] (CR) Hashar: [C: 2] [operations/puppet/mariadb] switch to rake [integration/config] - https://gerrit.wikimedia.org/r/337028 (https://phabricator.wikimedia.org/T154894) (owner: Hashar)
[15:01:10] (Merged) jenkins-bot: [operations/puppet/mariadb] switch to rake [integration/config] - https://gerrit.wikimedia.org/r/337028 (https://phabricator.wikimedia.org/T154894) (owner: Hashar)
[15:08:46] PROBLEM - Puppet run on deployment-restbase01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[15:42:40] jdlrobson: around? looking for the task where you asked to create users on beta cluster via the api, can not find it :|
[15:45:05] found it, please ignore https://phabricator.wikimedia.org/T152432
[16:03:27] (CR) Hashar: "Wouldn't it be simpler to update our fork wikimedia/fundraising/civicrm-buildkit ? Else make sure to remember to drop the github.com as " [integration/config] - https://gerrit.wikimedia.org/r/336960 (owner: Ejegg)
[16:03:53] Browser-Tests-Infrastructure, User-zeljkofilipin: Make it possible to execute tests as a specific (new) MediaWiki user on beta cluster - https://phabricator.wikimedia.org/T152432#3017233 (zeljkofilipin) Selenium user can create accounts via the API at beta cluster without captcha: ``` $ pry [1] pry(mai...
[16:13:45] (PS3) Ejegg: Use upstream civicrm-buildkit [integration/config] - https://gerrit.wikimedia.org/r/336960
[16:16:38] (CR) Ejegg: "hashar, I don't understand exactly what you mean by 'remove github.com'? Would that mean using zuul-cloner instead of vanilla 'git clone'," [integration/config] - https://gerrit.wikimedia.org/r/336960 (owner: Ejegg)
[16:18:06] PROBLEM - zuul_merger_service_running on scandium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger
[16:18:40] !log deployment-puppetmaster02:/var/lib/git/operations/puppet removed untracked file "how", updated submodules
[16:18:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:19:53] Continuous-Integration-Infrastructure, Release-Engineering-Team, Operations, hardware-requests: Phase out scandium.eqiad.wmnet - https://phabricator.wikimedia.org/T150936#3017347 (hashar)
[16:21:23] - zuul_merger_service_running on scandium CRITICAL
[16:21:27] that one is scandium being removed
[16:21:36] the zuul-merger now runs on contint1001 and contint2001
[16:22:36] PROBLEM - git_daemon_running on scandium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/git-core/git-daemon --user
[16:23:05] ACKNOWLEDGEMENT - git_daemon_running on scandium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/git-core/git-daemon --user daniel_zahn https://phabricator.wikimedia.org/T150936
[16:23:05] ACKNOWLEDGEMENT - zuul_merger_service_running on scandium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger daniel_zahn https://phabricator.wikimedia.org/T150936
[16:24:05] leaving for now. Have a good Friday
[16:28:28] PROBLEM - Puppet run on deployment-tin is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:28:59] ^ this is me
[16:51:38] Continuous-Integration-Infrastructure, MediaWiki-extensions-SecureSessions: Secure Sessions ships with php-geoip, but test infrastructure has it already compiled, which gives failures - https://phabricator.wikimedia.org/T157814#3017456 (Umherirrender)
[16:53:27] RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[16:54:59] PROBLEM - Puppet run on deployment-mira is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:55:43] Continuous-Integration-Infrastructure, MediaWiki-extensions-SecureSessions: Secure Sessions ships with php-geoip, but test infrastructure has it already compiled, which gives failures - https://phabricator.wikimedia.org/T157814#3017456 (Reedy) https://github.com/wikimedia/mediawiki-extensions-SecureSessi...
[16:56:48] * thcipriani checks deployment-mira as it's also likely his doing
[17:05:38] (PS1) Addshore: Add extension-unittests-generic to FileImporter [integration/config] - https://gerrit.wikimedia.org/r/337050
[17:06:51] Continuous-Integration-Infrastructure: Cannot access the database: Access denied for user 'jenkins_u0'@'127.0.0.1' to database 'closedwikis' - https://phabricator.wikimedia.org/T157815#3017501 (Umherirrender)
[17:09:09] RECOVERY - Puppet run on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:18:46] RECOVERY - Puppet run on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:20:01] RECOVERY - Puppet run on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[17:25:15] https://twitter.com/MikeRogers0/status/829850420948455424
[17:25:20] This seems relevant
[17:27:42] :)
[17:28:07] said a better way: "100% unit test coverage, no integration tests"
[17:32:03] Continuous-Integration-Infrastructure, Zuul: zuul-merger fails when repository names overlaps - https://phabricator.wikimedia.org/T157818#3017592 (hashar)
[17:32:08] RECOVERY - Puppet run on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:32:12] Continuous-Integration-Infrastructure: operations/software.git commits failing as unable to merge? - https://phabricator.wikimedia.org/T138455#2400861 (hashar) Refiled as T157818
[17:37:09] Reedy: that is a good one :-}}}}}}}
[17:37:21] be back later tonight *wave*
[17:39:54] Beta-Cluster-Infrastructure, ORES, Revision-Scoring-As-A-Service-Backlog: On beta cluster, ORESFetchScoreJob got a HTTP 400 bad request from ores-beta - https://phabricator.wikimedia.org/T157790#3017640 (Ladsgroup) I'm working on this. This is rather weird because we haven't enabled ores in wikidata...
[17:44:02] Continuous-Integration-Infrastructure, MediaWiki-extensions-SecureSessions, Patch-For-Review: Secure Sessions ships with php-geoip, but test infrastructure has it already compiled, which gives failures - https://phabricator.wikimedia.org/T157814#3017662 (Reedy) Tests now pass on that version... I won...
[17:57:25] Continuous-Integration-Infrastructure, MediaWiki-extensions-SecureSessions, Patch-For-Review: Secure Sessions ships with php-geoip, but test infrastructure has it already compiled, which gives failures - https://phabricator.wikimedia.org/T157814#3017685 (Reedy) I've just flipped the patches around
[18:02:43] Continuous-Integration-Infrastructure, Zuul: zuul-merger fails when repository names overlaps - https://phabricator.wikimedia.org/T157818#3017592 (Paladox) I thought ytterbium.wikimedia.org is no more? I thought that's cobalt now?
[18:04:05] Scap3: Support using scap on localhost without needing ssh and self hosting puppet masters - https://phabricator.wikimedia.org/T156197#3017705 (mmodell) p:Triage>Low This would be nice to have but it isn't a high priority right now.
[18:05:08] Deployment-Systems, Release-Engineering-Team, Scap3, scap, and 2 others: Error after "Finished deploy": xrange() arg 3 must not be zero - https://phabricator.wikimedia.org/T157136#3017708 (thcipriani) p:Triage>High
[18:06:24] Scap3, Patch-For-Review, User-mobrovac: Scap deploy failed to sync git-fat artifacts - https://phabricator.wikimedia.org/T147856#3017717 (thcipriani)
[18:06:28] Scap3, Analytics-Kanban, Operations, Patch-For-Review: Package + deploy new version of git-fat - https://phabricator.wikimedia.org/T155856#3017715 (thcipriani) Open>Resolved Thanks @Ottomata!
[18:07:17] Deployment-Systems, Scap3, scap: scap wikiversions compile happening too late in scap sync - https://phabricator.wikimedia.org/T156851#3017718 (mmodell) >>! In T156851#2988199, @bd808 wrote: > Moving `tasks.sync_common` after `wikiversions-compile` certainly would have caused this. Moving the sync **...
[18:07:21] Scap3, Patch-For-Review, User-mobrovac: Scap deploy failed to sync git-fat artifacts - https://phabricator.wikimedia.org/T147856#2705818 (thcipriani) Open>Resolved With the new version of git-fat deployed, this should now be fixed! Please, reopen if this recurs.
[18:09:53] Scap3, scap, Operations: Trying to scap while l10nupdate is syncing shows unhelpful error - https://phabricator.wikimedia.org/T153278#3017743 (thcipriani) Open>Resolved a:thcipriani Removed the backtrace from this error, output should be cleaner and should only show the `LockFailedError`...
[18:10:21] Beta-Cluster-Infrastructure, ORES, Revision-Scoring-As-A-Service-Backlog: On beta cluster, ORESFetchScoreJob got a HTTP 400 bad request from ores-beta - https://phabricator.wikimedia.org/T157790#3017746 (Ladsgroup) In [[https://github.com/wikimedia/operations-mediawiki-config/blob/938d28a2fc35f868036...
[18:10:35] Scap3: Make scap plugins generally useful - https://phabricator.wikimedia.org/T151470#3017747 (thcipriani) p:Triage>Low
[18:11:02] It seems ORES is being enabled in wikidata in beta even though configs are not like that: https://phabricator.wikimedia.org/T157790#3017746
[18:11:20] Deployment-Systems, Scap3 (Scap3-Adoption-Phase1), scap, Wikimedia-IEG-grant-review, and 2 others: Deploy iegreview with scap3 - https://phabricator.wikimedia.org/T129154#3017748 (mmodell) p:Triage>Low
[18:11:22] Can someone enlighten me why? :(
[18:11:28] Deployment-Systems, Scap3 (Scap3-Adoption-Phase1), scap, Wikimedia-Stream: Deploy rcstream with scap3 - https://phabricator.wikimedia.org/T129153#3017749 (mmodell) p:Triage>Normal
[18:11:34] Scap3, Phabricator: GPG Sign git tags - https://phabricator.wikimedia.org/T150696#3017750 (thcipriani) p:Triage>Normal I did this for 1 release and then haven't done it since. I think I need to update some gbp config to support requiring this as well.
[18:12:36] Scap3, Patch-For-Review: scap deploy-local should make fewer assumptions about server/directories - https://phabricator.wikimedia.org/T146602#3017752 (thcipriani) p:High>Low Have to figure out a way to deploy this cleanly and haven't done that yet. Lowering priority since it's been taking a back-...
[18:13:25] Scap3: Support using scap on localhost without needing ssh and self hosting puppet masters - https://phabricator.wikimedia.org/T156197#3017755 (mmodell)
[18:13:29] Deployment-Systems, Scap3: Deploy mediawiki release tools repo (rMREL) with scap3 - https://phabricator.wikimedia.org/T142588#3017754 (mmodell)
[18:15:26] Release-Engineering-Team, Scap3 (Scap3-MediaWiki-MVP): /srv/mediawiki on tin not being updated when using scap sync-file - https://phabricator.wikimedia.org/T152005#3017757 (thcipriani) Open>Resolved a:thcipriani `/srv/mediawiki` is now being updated on all syncs; however, I created a new bug...
[18:17:08] 06Release-Engineering-Team, 03Scap3, 10scap: Cleanup old cache dirs if still around - https://phabricator.wikimedia.org/T157743#3017782 (10thcipriani) [18:18:13] 03Scap3, 10scap, 13Patch-For-Review: Automatically clean up unused wmfXX versions - https://phabricator.wikimedia.org/T73313#3017783 (10thcipriani) [18:22:10] 06Release-Engineering-Team, 03Scap3: scap should handle changes to .gitmodules - https://phabricator.wikimedia.org/T157694#3017801 (10mmodell) [18:22:12] 10Deployment-Systems, 06Release-Engineering-Team, 03Scap3, 06Operations, 15User-Addshore: cannot delete non-empty directory: php-1.29.0-wmf.3 messages on 'scap sync' on mwdebug1002 - https://phabricator.wikimedia.org/T157030#3017802 (10mmodell) [18:22:15] 06Release-Engineering-Team, 03Scap3: Update scap to take care of -labs becoming -beta - https://phabricator.wikimedia.org/T150342#3017803 (10mmodell) [18:22:17] 03Scap3: Improve scap canary check messages - https://phabricator.wikimedia.org/T142342#3017806 (10mmodell) [18:22:20] 10Deployment-Systems, 03Scap3: Considering adding a --no-touch flag to scap that stops automatic touch of InitialiseSettings.php - https://phabricator.wikimedia.org/T149872#3017804 (10mmodell) [18:22:22] 10Deployment-Systems, 03Scap3: handle logstash timeouts separately from spikes in errors reported by logstash - https://phabricator.wikimedia.org/T144033#3017805 (10mmodell) [18:22:24] 03Scap3: Add blacklist support to scap.tasks.check_valid_syntax linter - https://phabricator.wikimedia.org/T136009#3017809 (10mmodell) [18:22:26] 10Deployment-Systems, 03Scap3, 07WorkType-NewFunctionality: Create canary deploy process for MediaWiki - https://phabricator.wikimedia.org/T136883#3017808 (10mmodell) [18:22:28] 06Release-Engineering-Team (Long-Lived-Branches), 03Scap3, 06Operations: Make git 2.2.0+ (preferably 2.8.x) available - https://phabricator.wikimedia.org/T140927#3017807 (10mmodell) [18:22:30] 03Scap3: Create sync-config subcommand - 
https://phabricator.wikimedia.org/T131809#3017810 (10mmodell) [18:22:32] 03Scap3: sync-* and scap should only lint changed files from last deploy - https://phabricator.wikimedia.org/T124171#3017811 (10mmodell) [18:22:35] 03Scap3: scap-purge-l10n-cache hanging - https://phabricator.wikimedia.org/T122008#3017812 (10mmodell) [18:22:37] 10Deployment-Systems, 03Scap3: sync-wikiversions not syncing wikiversions.json with mira - https://phabricator.wikimedia.org/T121585#3017813 (10mmodell) [18:22:38] sorry for the bugspam [18:22:39] 03Scap3: scap shouldn't log completion (it should log fail!) - https://phabricator.wikimedia.org/T110793#3017816 (10mmodell) [18:22:41] 03Scap3: Special:Version on Wikimedia wikis shows outdated commit hashes for submodules - https://phabricator.wikimedia.org/T116345#3017814 (10mmodell) [18:22:44] 06Release-Engineering-Team, 03Scap3: Scap should abort early when Keyholder is not armed - https://phabricator.wikimedia.org/T111062#3017815 (10mmodell) [18:22:46] 03Scap3: Don't continue scap if sync to all proxies failed - https://phabricator.wikimedia.org/T110791#3017817 (10mmodell) [18:22:48] 03Scap3: [scap] New command to sync all of the files touched in a given commit - https://phabricator.wikimedia.org/T108132#3017819 (10mmodell) [18:22:50] 03Scap3: Merge scap scripts "mwscriptwikiset" and "foreachwikiindblist" into one - https://phabricator.wikimedia.org/T109798#3017818 (10mmodell) [18:22:52] 03Scap3: scap should be LCStore-agnostic - https://phabricator.wikimedia.org/T105683#3017820 (10mmodell) [18:22:54] 03Scap3: [scap] Add support for a global mutex to keep multiple masters from clobbering each other - https://phabricator.wikimedia.org/T105195#3017821 (10mmodell) [18:22:56] 03Scap3: scap eats underlying commands output (such as maintenance script stacktrace) - https://phabricator.wikimedia.org/T97140#3017824 (10mmodell) [18:22:58] 03Scap3, 10Icinga, 06Operations: expose hosts in maintenance state so we can prevent scap from running on them 
- https://phabricator.wikimedia.org/T100777#3017822 (10mmodell) [18:23:01] 03Scap3: scap only one version - https://phabricator.wikimedia.org/T100575#3017823 (10mmodell) [18:23:03] 03Scap3: [scap] Log directly to logstash via syslog input - https://phabricator.wikimedia.org/T86969#3017827 (10mmodell) [18:23:05] 03Scap3, 06WMF-Legal, 07Documentation, 07Software-Licensing: mediawiki/tools/scap is lacking a license - https://phabricator.wikimedia.org/T94239#3017825 (10mmodell) [18:23:07] 03Scap3, 06Operations, 13Patch-For-Review: Decide on /var/lib vs /home as locations of homedir for mwdeploy - https://phabricator.wikimedia.org/T86971#3017826 (10mmodell) [18:23:10] 03Scap3: sync-wikiversions reporting success when all hosts failed - https://phabricator.wikimedia.org/T78024#3017829 (10mmodell) [18:23:12] 03Scap3: [scap] Suppress/de-emphasize errors from hosts marked has being under maintenance - https://phabricator.wikimedia.org/T78319#3017828 (10mmodell) [18:23:14] 03Scap3: Don't allow servers to randomly sync across DC - https://phabricator.wikimedia.org/T76658#3017830 (10mmodell) [18:23:16] 03Scap3: [scap] Add a file recording deploy information to all scap/sync-* calls - https://phabricator.wikimedia.org/T72477#3017832 (10mmodell) [18:23:18] 03Scap3: [scap] Syncing a dblist referencing a nonexistent DB should be prevented - https://phabricator.wikimedia.org/T72132#3017833 (10mmodell) [18:23:20] 03Scap3, 13Patch-For-Review: Automatically clean up unused wmfXX versions - https://phabricator.wikimedia.org/T73313#3017831 (10mmodell) [18:23:22] 03Scap3: [scap] Add a log appender to log to a local file - https://phabricator.wikimedia.org/T68857#3017835 (10mmodell) [18:23:24] 03Scap3: [scap] Sync fewer files from old deploy branches - https://phabricator.wikimedia.org/T68053#3017837 (10mmodell) [18:23:26] 03Scap3: [scap] Add a command line flag to replace DOLOGMSGNOLOG - https://phabricator.wikimedia.org/T68049#3017838 (10mmodell) [18:23:28] 03Scap3: [scap] Make the 
hostname of a failing host more prominent in the error messages - https://phabricator.wikimedia.org/T68302#3017836 (mmodell)
[18:23:30] Scap3: scap should log the names of the last N hosts - https://phabricator.wikimedia.org/T67025#3017839 (mmodell)
[18:23:32] Scap3: Include commit hash in log message for every sync - https://phabricator.wikimedia.org/T64340#3017841 (mmodell)
[18:23:34] Scap3, HHVM: [scap] Compile HHVM bytecode cache as deployment step - https://phabricator.wikimedia.org/T66272#3017840 (mmodell)
[18:23:36] Scap3: [scap] Recompute and sync git version cache when sync-* are used - https://phabricator.wikimedia.org/T38271#3017842 (mmodell)
[18:23:38] hebeejebee
[18:23:38] Scap3: [scap] Local sync script on any individual server should be atomic - https://phabricator.wikimedia.org/T22085#3017843 (mmodell)
[18:24:14] Scap, Phabricator: GPG Sign git tags - https://phabricator.wikimedia.org/T150696#3017846 (demon) Something like this? keyid = sign-tags = 1 Question is what key do we sign this with? Should we have a shared key that deployers can access? I'm thinking something like keyholder but for gpg instea...
[18:31:26] Deployment-Systems, Release-Engineering-Team, Scap, Operations, User-Addshore: cannot delete non-empty directory: php-1.29.0-wmf.3 messages on 'scap sync' on mwdebug1002 - https://phabricator.wikimedia.org/T157030#3017856 (mmodell) Open>Resolved a:mmodell
[18:33:21] Scap: [scap] Log directly to logstash via syslog input - https://phabricator.wikimedia.org/T86969#3017860 (bd808) The notes about redis input are horribly out of date. The redis input queue was killed fairly soon after being deployed. Syslog is the transport used by MediaWiki these days, but there are sever...
[19:34:57] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[19:36:25] Project beta-scap-eqiad build #141747: FAILURE in 1 min 32 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/141747/
[19:37:43] huh, error rate spike on beta
[19:38:23] "Fatal error: Call to undefined method __PHP_Incomplete_Class::hasReached() in /srv/mediawiki/php-master/includes/libs/rdbms/loadbalancer/LoadBalancer.php on line 494",
[19:38:37] greg-g: is it known that betalabs is down? there is Fatal error: Call to undefined method __PHP_Incomplete_Class::hasReached() in /srv/mediawiki/php-master/includ...
[19:43:00] greg-g: and generally the following for any attempt to edit any content - Error loading data from server: HTTP 503.
[19:45:10] hrm. things that merged recently that touched LoadBalancer.php https://gerrit.wikimedia.org/r/#/c/336578/3
[19:45:21] o.O
[19:45:36] not sure why that would cause it
[19:46:45] Yippee, build fixed!
[19:46:45] Project beta-scap-eqiad build #141748: FIXED in 1 min 53 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/141748/
[19:57:01] Browser-Tests-Infrastructure, Reading-Web-Backlog, Jenkins, Ruby, User-zeljkofilipin: MEDIAWIKI_URL may be set to incorrect value in mwext-mw-selenium job - https://phabricator.wikimedia.org/T144912#3018080 (Jdlrobson) I'm a little confused. What's the state of this bug?
[20:04:59] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[20:19:01] Continuous-Integration-Infrastructure, Zuul: zuul-merger fails when repository names overlaps - https://phabricator.wikimedia.org/T157818#3018168 (hashar)
[20:19:19] Continuous-Integration-Infrastructure, Zuul: zuul-merger fails when repository names overlaps - https://phabricator.wikimedia.org/T157818#3017592 (hashar) Good catch.
That is a copy paste from the old setup, we now use `gerrit.wikimedia.org`
[20:37:40] Continuous-Integration-Config, MediaWiki-extensions-NavigationTiming, Performance-Team: Create QUnit tests for NavigationTiming extension - https://phabricator.wikimedia.org/T157835#3018186 (Krinkle)
[20:38:09] (PS1) Krinkle: Enable QUnit job for NavigationTiming extension repo [integration/config] - https://gerrit.wikimedia.org/r/337079 (https://phabricator.wikimedia.org/T157835)
[21:23:08] (CR) Krinkle: [C: 2] Enable QUnit job for NavigationTiming extension repo [integration/config] - https://gerrit.wikimedia.org/r/337079 (https://phabricator.wikimedia.org/T157835) (owner: Krinkle)
[21:24:30] (Merged) jenkins-bot: Enable QUnit job for NavigationTiming extension repo [integration/config] - https://gerrit.wikimedia.org/r/337079 (https://phabricator.wikimedia.org/T157835) (owner: Krinkle)
[21:25:05] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/337079
[21:25:07] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:45:22] Release-Engineering-Team, Elasticsearch, Phabricator (Search): Phabricator: Support elasticsearch 5.x - https://phabricator.wikimedia.org/T155299#3018405 (mmodell)
[21:45:34] Release-Engineering-Team, Elasticsearch, Phabricator (Search): Phabricator: Support elasticsearch 5.x - https://phabricator.wikimedia.org/T155299#2939806 (mmodell)
[21:45:45] Release-Engineering-Team, Elasticsearch, Phabricator (Search): Phabricator: Support elasticsearch 5.x - https://phabricator.wikimedia.org/T155299#2939806 (mmodell)
[21:46:25] Release-Engineering-Team, Elasticsearch, Phabricator (Search): Phabricator: Support elasticsearch 5.x - https://phabricator.wikimedia.org/T155299#2939806 (mmodell) @EBernhardson We should be ready to go after merging the attached differential revisions.
[21:53:47] Release-Engineering-Team, Elasticsearch, Phabricator (Search): Phabricator: Support elasticsearch 5.x - https://phabricator.wikimedia.org/T155299#3018415 (Paladox)
[22:07:06] Continuous-Integration-Infrastructure, Zuul: zuul-merger fails when repository names overlaps - https://phabricator.wikimedia.org/T157818#3018427 (hashar) Did a basic attempt at https://review.openstack.org/#/c/432477/ . But really I dont think we can easily mimic `git clone` :/
[22:11:01] !log deployed ores:a15ec90
[22:11:04] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:12:54] Beta-Cluster-Infrastructure, MediaWiki-User-login-and-signup: Can't log in to Beta Cluster wikis - https://phabricator.wikimedia.org/T157650#3018441 (kaldari) Open>Resolved a:kaldari Seems to be working now.
[22:21:30] Anyone familiar with Wikimedia Maps? (asking in a few channels, so mind the slight repetition here)
[22:22:03] Release-Engineering-Team, Elasticsearch, Phabricator (Search): Phabricator: Support elasticsearch 5.x - https://phabricator.wikimedia.org/T155299#3018458 (Paladox)
[22:23:26] (CR) Hashar: "One problem is that the upstream repository can change at anytime and might cause the job to fail randomly. But maybe it is not much of an" [integration/config] - https://gerrit.wikimedia.org/r/336960 (owner: Ejegg)
[22:24:43] thcipriani: werent you assisting ejegg for JJB related duties?
[22:25:05] I think you helped a bit set up some fundraising related jobs
[22:25:21] I did poke at some fundraising things once upon a time :)
[22:26:19] (CR) Hashar: [C: 1] "Let me know if you need assistance to refresh the job in Jenkins though https://www.mediawiki.org/wiki/CI/JJB should cover it. Else poke #" [integration/config] - https://gerrit.wikimedia.org/r/336960 (owner: Ejegg)
[22:26:31] thcipriani: :]]]
[22:26:33] we will see
[22:32:28] thanks hashar and thcipriani, I'll try doing a test run!
[22:32:45] ejegg: do you know how to refresh the job ?
[22:33:30] I did it once, but I definitely need to refresh my memory with that page
[22:34:42] ejegg: I am heading to bed myself but pretty sure Tyler could assist
[22:34:54] k, thanks hashar
[22:34:56] else I can deploy it over the week-end, just reply on the task if you want me to do it tomorrow :]
[22:35:05] and we can follow up on monday
[22:35:07] heh, no need to rush!
[22:35:26] We'd been ignoring that repo for months
[22:35:32] :p
[22:35:52] I can update it :)
[22:36:22] thcipriani: that would be great!
[22:37:20] * thcipriani pulls up notes
[22:37:22] :)
[22:37:43] (CR) 20after4: [C: -1] "This is out of date, I need to sync it up with the current phab ci job configs." [integration/config] - https://gerrit.wikimedia.org/r/295396 (https://phabricator.wikimedia.org/T130950) (owner: 20after4)
[22:38:28] * hashar heads to bed
[22:41:57] Continuous-Integration-Config, MediaWiki-extensions-NavigationTiming, Performance-Team, Patch-For-Review: Create QUnit tests for NavigationTiming extension - https://phabricator.wikimedia.org/T157835#3018512 (Krinkle) p:Triage>Normal
[22:43:30] ejegg: just checking output of jenkins job builder locally now
[22:44:08] cool
[22:44:17] ejegg: this is just wikimedia-fundraising-civicrm correct?
[22:45:17] yep!
[22:45:38] cool beans, deploying!
[22:46:05] right on!
[22:46:32] ejegg: should be up-to-date now. If everything looks good to you, lemme know and I'll +2 and merge!
[22:46:44] k, I'll try a build
[22:46:58] okie doke
[22:48:10] Beta-Cluster-Infrastructure, MediaWiki-User-login-and-signup: Can't log in to Beta Cluster wikis - https://phabricator.wikimedia.org/T157650#3011927 (greg) Oops, sorry, I saw this, had the tab open, meant to merge with the other T157636.
[22:51:52] drat, failed, checking why
[22:53:53] thcipriani: oops, I need to clear out that dir on the integration box, one sec
[22:54:07] alrighty
[22:54:52] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slowdowns - https://phabricator.wikimedia.org/T148478#3018553 (greg)
[22:56:06] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slowdowns - https://phabricator.wikimedia.org/T148478#3018555 (JustBerry) p:Normal>Unbreak! @greg Important enough.
[22:56:14] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slowdowns - https://phabricator.wikimedia.org/T148478#3018558 (Paladox) p:Unbreak!>High Happened again. This time it is different as we have switched gc off. But cpu looks high. Happened every 2 minutes in the last 20...
[22:56:30] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slowdowns - https://phabricator.wikimedia.org/T148478#3018560 (Paladox) p:High>Unbreak!
[22:57:51] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slowdowns - https://phabricator.wikimedia.org/T148478#3018561 (JustBerry) @Andrew @daniel mentioned similar issues over IRC not long ago.
[22:57:57] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slowdowns - https://phabricator.wikimedia.org/T148478#3018564 (Paladox)
[23:03:11] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slowdowns - https://phabricator.wikimedia.org/T148478#3018571 (JustBerry) Task seems important in that it is affecting users' (such as andrew's) abilities to upload critical patches, such as patches relevant to the issues highl...
[23:04:40] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slowdowns - https://phabricator.wikimedia.org/T148478#3018573 (JustBerry) ``` 18:03 DanielK_WMDE: greg-g: it has been slow for at least half an hour, seems to be getting worse ... 18:03 greg-g: DanielK_WMDE: yep 18:03 mafk: yup...
[23:05:11] well phooey: 'ERROR: Failed to find required PHP extension "mcrypt".'
[23:05:24] thcipriani: do you mind rolling that job back?
[23:05:34] hrm
[23:05:41] yeah, I can rollback, one sec
[23:06:12] thanks. I'll -2 my patch till we can figure out what to do
[23:06:38] * thcipriani nods
[23:07:16] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slowdowns - https://phabricator.wikimedia.org/T148478#2731899 (JustBerry) ``` 18:06 icinga-wm: PROBLEM - Unmerged changes on repository puppet on labcontrol1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 18:06 ic...
[23:07:23] ejegg: should be back to normal.
[23:07:25] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slowdowns - https://phabricator.wikimedia.org/T148478#3018577 (daniel) Some observations: pushing takes about a minute, but gets through eventually. Auto-compelte for reviewers is broken in the UI (times out, i guess). Everyth...
[23:08:29] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slowdowns - https://phabricator.wikimedia.org/T148478#3018579 (Paladox) also this just popped up PROBLEM - Check whether ferm is active by checking the default input chain on cobalt is CRITICAL: ERROR ferm input dr...
[23:11:09] thanks again thcipriani !
[23:11:32] ejegg: np :)
[23:12:08] Project beta-code-update-eqiad build #142667: FAILURE in 9 min 7 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/142667/
[23:12:50] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slowdowns - https://phabricator.wikimedia.org/T148478#3018587 (JustBerry) Quick update, if I may: ``` 18:10 icinga-wm: RECOVERY - Unmerged changes on repository puppet on rhodium is OK: No changes to merge. 18:10 icinga-wm: RE...
[23:13:32] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slowdowns - https://phabricator.wikimedia.org/T148478#3018588 (demon) Yes, that's a cascading issue. We routinely get puppet failures when Gerrit/Git is down
[23:13:43] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slowdowns - https://phabricator.wikimedia.org/T148478#3018589 (Paladox) Those most likely are using gerrit to clone repo's. Which means if gerrit goes down then puppet fails on those hosts as they will be unable to clone.
[23:14:27] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3018590 (Paladox)
[23:15:08] addshore: ...
[23:16:04] Project beta-code-update-eqiad build #142668: STILL FAILING in 3 min 4 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/142668/
[23:19:37] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3018597 (JustBerry)
[23:20:06] (CR) Ejegg: [C: -1] "Upstream seems to need mcrypt extension, civicrm build fails" [integration/config] - https://gerrit.wikimedia.org/r/336960 (owner: Ejegg)
[23:24:34] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3018599 (JustBerry) + `2017-02-10 23:12 RainbowSprinkles: gerrit: restarting service` to https://wikitech.wikimedia.org/wiki/Server_Admin_Log. After the restart, a handful...
[23:24:50] Yippee, build fixed!
[23:24:50] Project beta-code-update-eqiad build #142669: FIXED in 1 min 49 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/142669/
[23:29:43] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3018602 (Paladox) p:Unbreak!>High As it is fixed now we can lower it to high.
[23:42:56] PROBLEM - Puppet run on repository is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[23:45:19] Gerrit, Operations, Patch-For-Review: Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3018631 (Dzahn) https://gerrit.wikimedia.org/r/#/c/337193/ https://gerrit.wikimedia.org/r/#/c/337193/1/modules/gerrit/templates/gerrit.config.erb