[05:39:51] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[06:04:49] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[06:57:19] PROBLEM - Puppet errors on deployment-kafka01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[07:29:03] Gerrit, Release-Engineering-Team (Backlog), Wikimedia-Logstash, Patch-For-Review, Technical-Debt: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324#2494353 (Gehel) We already have a number of solutions to ship logs from log4j to logstash. A number of appl...
[07:46:18] (CR) Hashar: [C: 2] Archive Extension:Ads [integration/config] - https://gerrit.wikimedia.org/r/377071 (https://phabricator.wikimedia.org/T175495) (owner: MarcoAurelio)
[07:47:15] (Merged) jenkins-bot: Archive Extension:Ads [integration/config] - https://gerrit.wikimedia.org/r/377071 (https://phabricator.wikimedia.org/T175495) (owner: MarcoAurelio)
[08:54:51] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[09:19:14] Gerrit, Release-Engineering-Team (Backlog), Wikimedia-Logstash, Patch-For-Review, Technical-Debt: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324#3595636 (Paladox) @gehel oh, I wasn’t aware there was gelf4j. I’m wondering could you help with that please...
[09:24:32] PROBLEM - Puppet errors on deployment-imagescaler02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[09:31:39] Gerrit, Release-Engineering-Team (Backlog), Wikimedia-Logstash, Patch-For-Review, Technical-Debt: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324#3595677 (Gehel) As hostname, you want to use the LVS service name (logstash.svc.eqiad.wmnet) and the 12201/...
[09:54:12] Do https://gerrit.wikimedia.org/r/#/admin/groups/uuid-3c0d83771371fb076d55d709f918d87625f45a60 give you error?
[09:56:46] hashar: can you please reactivate extension-Ads so I can get one patch merged?
[09:57:22] tabbycat: sure
[09:57:31] thanks
[09:57:40] well, I cannot merge it, but maybe you can?
[09:57:59] https://gerrit.wikimedia.org/r/#/c/377075/
[09:58:37] tabbycat: done !
[09:58:56] hashar: once that is merged and submitted, you can read-only the repo again :)
[09:58:58] thanks
[10:00:00] tabbycat: i did :)
[10:00:03] Hmph, rebasing https://gerrit.wikimedia.org/r/#/c/377063/ gives "internal server error"
[10:01:08] tabbycat: try to rebase it manually if you can ?
[10:01:49] hashar: I'll try something, because I don't want to clone the whole rMEXT again
[10:02:05] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<100.00%)
[10:04:32] RECOVERY - Puppet errors on deployment-imagescaler02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:05:01] hashar: new patch set https://gerrit.wikimedia.org/r/#/c/377225/
[10:05:05] different change-id, etc
[10:13:49] (PS5) MarcoAurelio: Whitelist Dvorapa on Zuul CI [integration/config] - https://gerrit.wikimedia.org/r/375765
[10:15:08] Gerrit, Release-Engineering-Team (Backlog), Wikimedia-Logstash, Patch-For-Review, Technical-Debt: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324#3595852 (Paladox) Thanks. Side note i filled upstream https://bugs.chromium.org/p/gerrit/issues/detail?id=...
[11:11:35] tabbycat: done :)
[11:11:49] Merci bien Monsieur
[11:12:33] not sure if Reedy can take care of https://github.com/wikimedia/mediawiki-extensions-Ads deletion, or you
[11:32:11] hashar: how can I update cxserver in deployment-sca02? It seems point to old revision, but I'm sure that we have not update it manually (auto deploys with cxserver/deploy patch merge)
[11:36:02] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[11:44:55] Gerrit: Make extension-CheckUser and extension-SecurePoll groups on gerrit visible to all - https://phabricator.wikimedia.org/T175547#3596138 (Huji) p:Triage>Normal
[11:45:11] Gerrit: Make extension-CheckUser and extension-SecurePoll groups on gerrit visible to all - https://phabricator.wikimedia.org/T175547#3596126 (Huji) a:Legoktm
[11:49:41] kart_: I would guess it is done manually from deployement-tin using scap ?
[12:10:59] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:19:37] Gerrit: Make extension-CheckUser and extension-SecurePoll groups on gerrit visible to all - https://phabricator.wikimedia.org/T175547#3596274 (Huji) a:Legoktm>Huji
[12:19:55] Gerrit: Make extension-CheckUser and extension-SecurePoll groups on gerrit visible to all - https://phabricator.wikimedia.org/T175547#3596126 (Huji) Open>Resolved Turns out I could do it myself! Done.
[12:20:08] (PS3) Addshore: dockerfiles: export LOG_DIR [integration/config] - https://gerrit.wikimedia.org/r/375760 (owner: Hashar)
[12:22:45] Project selenium-GettingStarted » firefox,beta,Linux,BrowserTests build #522: FAILURE in 44 sec: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/522/
[12:27:17] PROBLEM - Free space - all mounts on deployment-tin is CRITICAL: CRITICAL: deployment-prep.deployment-tin.diskspace._mnt.byte_percentfree (No valid datapoints found) deployment-prep.deployment-tin.diskspace._srv.byte_percentfree (<11.11%)
[12:40:07] (PS1) Hashar: dib: add php5.5-luasandbox [integration/config] - https://gerrit.wikimedia.org/r/377245 (https://phabricator.wikimedia.org/T161882)
[12:40:15] hashar:can i get a quick opinion? How do you feel about CI docker image tags having names such as "2017.8.3.14.40" ?
[12:40:39] is there a slightly different format or resolution of date / time you would use?
[12:42:57] maybe it should have a v prefix? "v2017.8.3.14.40" ....
[12:43:09] (CR) Hashar: [C: 2] dib: add php5.5-luasandbox [integration/config] - https://gerrit.wikimedia.org/r/377245 (https://phabricator.wikimedia.org/T161882) (owner: Hashar)
[12:44:44] hashar: I guess, we need to update: https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated :)
[12:45:03] hashar: since deployment-cxserver03 also doesn't exists.
[12:45:27] kart_: that is probably defined in scap/scap.cfg nowadays ?
[12:46:08] (Merged) jenkins-bot: dib: add php5.5-luasandbox [integration/config] - https://gerrit.wikimedia.org/r/377245 (https://phabricator.wikimedia.org/T161882) (owner: Hashar)
[12:47:18] !log Nodepool: refreshing jessie snapshot to get php5.5-luasandbox installed
[12:47:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:48:56] hashar: OK. reading scap docs.
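The tag format floated above (an optional `v` prefix followed by a dotted date and time) could be generated along these lines. This is a minimal Python sketch only: the function name `image_tag`, the use of UTC, and the zero-padded fields are assumptions for illustration, not what the build script in integration/config does.

```python
from datetime import datetime, timezone


def image_tag(now=None):
    """Return a date-stamped image tag such as 'v2017.09.11.15.14'.

    Hypothetical helper: the format is year.month.day.hour.minute,
    zero-padded, with a leading 'v'.
    """
    if now is None:
        # Assumption: use UTC so tags do not depend on the builder's timezone.
        now = datetime.now(timezone.utc)
    return now.strftime("v%Y.%m.%d.%H.%M")


print(image_tag(datetime(2017, 9, 11, 15, 14)))
```

With fixed-width, zero-padded fields, sorting the tags as plain strings also sorts them chronologically, which an unpadded form like "2017.8.3.14.40" would not guarantee.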
[12:49:14] kart_: well apparently it is not migrated to scap. So I have no idea how it is deployed on beta :(
[12:49:26] nor do I know on which instance it is running
[12:49:40] hashar: deployment-sca02 is for cxserver
[12:50:18] hashar: AFAIK, beta-cxserver-update-eqiad job was updating cxserver, which doesn't exists.
[12:50:38] hashar: Possible to bring it back?
[12:55:31] PROBLEM - Puppet errors on deployment-imagescaler02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[13:03:07] (PS1) Addshore: Tag images when building with a date stamp [integration/config] - https://gerrit.wikimedia.org/r/377249
[13:15:55] Beta-Cluster-Infrastructure: Steward rights on the beta cluster - https://phabricator.wikimedia.org/T175555#3596421 (Sau226)
[13:18:25] Beta-Cluster-Infrastructure: Steward rights on the beta cluster - https://phabricator.wikimedia.org/T175555#3596435 (Sau226)
[13:30:30] RECOVERY - Puppet errors on deployment-imagescaler02 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:45:18] Project selenium-VisualEditor » firefox,beta,Linux,BrowserTests build #519: FAILURE in 1 min 17 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/519/
[14:10:51] (CR) Addshore: [V: 1 C: -1] "So I looked into this a bit:" [integration/config] - https://gerrit.wikimedia.org/r/374507 (owner: Addshore)
[14:12:40] (CR) Addshore: [C: 2] "Already pushed to dockerhub as docker.io/wmfreleng/operations-puppet:v2017.09.11.15.14" [integration/config] - https://gerrit.wikimedia.org/r/375760 (owner: Hashar)
[14:15:09] (PS2) Addshore: Tag images when building with a date stamp [integration/config] - https://gerrit.wikimedia.org/r/377249
[14:28:56] Release-Engineering-Team (Watching / External), Operations, ops-eqdfw: setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#3596894 (Cmjohnson) Swapped the backplane with another from a similar server w/same specs and the error came back. Swapped the cables...
[14:29:26] (CR) Addshore: [V: 1 C: -1] "Although it looks like you can also fetch using --depth 1:" [integration/config] - https://gerrit.wikimedia.org/r/374507 (owner: Addshore)
[14:30:10] (Merged) jenkins-bot: dockerfiles: export LOG_DIR [integration/config] - https://gerrit.wikimedia.org/r/375760 (owner: Hashar)
[14:32:34] (PS4) Addshore: dockerfiles: puppet, just do a fetch, not a pull & fetch [integration/config] - https://gerrit.wikimedia.org/r/374507
[14:33:37] Project selenium-WikiLove » firefox,beta,Linux,BrowserTests build #513: FAILURE in 1 min 37 sec: https://integration.wikimedia.org/ci/job/selenium-WikiLove/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/513/
[14:33:57] Beta-Cluster-Infrastructure: Steward rights on the beta cluster - https://phabricator.wikimedia.org/T175555#3596898 (Sau226) a:AlvaroMolina Active trusted user on IRC. Closest active user who may be able to help me out here
[14:35:09] RECOVERY - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is OK: OK: Less than 100.00% above the threshold [0.0]
[14:40:20] oh my, the chown is horrible
[14:47:16] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[15:12:17] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:12:59] Beta-Cluster-Infrastructure: Steward rights on the beta cluster - https://phabricator.wikimedia.org/T175555#3597130 (AlvaroMolina) Open>Resolved Done, although the user does not have much trajectory in Wikimedia, I have seen its work in other external wikis and I do not see problem by now. Please use...
[15:14:23] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[15:25:31] Beta-Cluster-Infrastructure: Deployment wiki is flooded by spam and should be cleaned up, perhaps even restricted more - https://phabricator.wikimedia.org/T175197#3597213 (AlvaroMolina) Open>Resolved a:AlvaroMolina @Mainframe98: I've given you the administrator permission on the Beta Cluster so t...
[15:49:21] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:07:48] (PS1) Addshore: docker: make puppet example-run.sh executable [integration/config] - https://gerrit.wikimedia.org/r/377294
[16:09:03] (PS2) Addshore: docker: add shebang to puppet example-run.sh [integration/config] - https://gerrit.wikimedia.org/r/377294
[16:09:14] (CR) Addshore: [C: 2] docker: add shebang to puppet example-run.sh [integration/config] - https://gerrit.wikimedia.org/r/377294 (owner: Addshore)
[16:10:20] (PS2) Addshore: docker: use tox --notest when populating cache [integration/config] - https://gerrit.wikimedia.org/r/369605 (owner: Hashar)
[16:11:46] (PS5) Addshore: dockerfiles: puppet, shallow fetches [integration/config] - https://gerrit.wikimedia.org/r/374507
[16:12:32] (Merged) jenkins-bot: docker: add shebang to puppet example-run.sh [integration/config] - https://gerrit.wikimedia.org/r/377294 (owner: Addshore)
[16:17:39] (PS3) Addshore: Tag images when building with a date stamp [integration/config] - https://gerrit.wikimedia.org/r/377249
[16:18:24] (PS6) Addshore: docker: puppet, shallow fetches & no second clone [integration/config] - https://gerrit.wikimedia.org/r/374507
[16:18:25] (PS4) Addshore: docker: Tag images when building with a date stamp [integration/config] - https://gerrit.wikimedia.org/r/377249
[16:20:13] 16:18:44 mv: cannot stat ‘/tmp/cache/puppet/.tox/log/*’: No such file or directory
[16:20:24] getting this as a CI error on my reverting a patchset by someone else...
[16:20:35] oh, nm, my bad
[16:25:37] Continuous-Integration-Infrastructure (phase-out-trusty), Release-Engineering-Team (Kanban), Patch-For-Review: Package php modules for Zend 5.5 on Jessie - https://phabricator.wikimedia.org/T174972#3597469 (hashar)
[16:26:01] Beta-Cluster-Infrastructure, Multimedia, Thumbor: On beta commons, thumbnailing of 3D files is broken still - https://phabricator.wikimedia.org/T170444#3597474 (MarkTraceur) p:Normal>High
[16:27:37] Browser-Tests-Infrastructure, Release-Engineering-Team (Kanban), Developer-Relations (Jul-Sep 2017), User-zeljkofilipin: WebdriverIO tech talk - https://phabricator.wikimedia.org/T171852#3597501 (zeljkofilipin) stalled>Open
[16:29:13] robh: I think that one always appears
[16:29:21] yeah i fixed my issue
[16:29:34] i reverted a ps in gerrit and gerrit introduced a line wrap error for the commit message
[16:29:47] gerrit wasnt smart enough to wrap the line and caused it to fail a check.
[16:30:10] not sure if the gerrit revert commit message hook can be made to wrap properly
[16:40:37] Release-Engineering-Team (Watching / External), CirrusSearch, Discovery, Discovery-Search, and 2 others: Figure out why browser tests can't create suggestion box - https://phabricator.wikimedia.org/T162966#3597589 (greg)
[16:54:53] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[17:06:57] Gerrit: reverting patchsets results in gerrit not wrapping lines in commit message - https://phabricator.wikimedia.org/T175607#3597685 (RobH)
[17:07:40] Gerrit: reverting patchsets results in gerrit not wrapping lines in commit message - https://phabricator.wikimedia.org/T175607#3597698 (RobH) I found this error when reverting that patchset, and @paladox helpfully advised I should file a task about it for potential upstream filing fix in #gerrit for review.
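The failure described above came from an over-long line in the commit message generated by the revert. Purely as an illustration, re-wrapping such a message by hand could look like the Python sketch below. The 72-column limit, the helper name `wrap_commit_body`, and the sample message are all assumptions; the log does not say which limit the CI check enforces, and this is not how Gerrit itself builds revert messages.

```python
import textwrap


def wrap_commit_body(message, width=72):
    """Re-wrap over-long paragraphs of a commit message.

    The subject line is left untouched. Paragraphs whose lines already
    fit are kept as-is, so short trailer lines are not merged together.
    Words longer than `width` (for example URLs) are never split.
    """
    subject, _, body = message.partition("\n")
    wrapped = []
    for paragraph in body.strip("\n").split("\n\n"):
        if any(len(line) > width for line in paragraph.splitlines()):
            paragraph = textwrap.fill(
                paragraph,
                width=width,
                break_long_words=False,
                break_on_hyphens=False,
            )
        wrapped.append(paragraph)
    return subject + "\n\n" + "\n\n".join(wrapped) + "\n"


# Hypothetical revert message with one over-long body line.
sample = (
    'Revert "Some change"\n\n'
    "This reverts the earlier change because it broke the build on every "
    "slave and needs more work before it can be merged again.\n\n"
    "Change-Id: I0123\n"
)
print(wrap_commit_body(sample))
```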
[17:12:37] Gerrit, Upstream: reverting patchsets results in gerrit not wrapping lines in commit message - https://phabricator.wikimedia.org/T175607#3597710 (Paladox) Filled upstream at https://bugs.chromium.org/p/gerrit/issues/detail?id=7190 I will see what needs doing and see how easy it is with js code.
[17:18:03] (CR) Addshore: [V: -1] docker: puppet, shallow fetches & no second clone [integration/config] - https://gerrit.wikimedia.org/r/374507 (owner: Addshore)
[17:18:10] someone might know ... in wikimedia-config we have 'labswiki' and 'wikitech'. I'm 95% sure labswiki is the correct one, so what is wikitech? a vestige?
[17:18:44] s/wikimedia-config/mediawiki-config/
[17:19:36] Release-Engineering-Team (Kanban), Release Pipeline: Install Blubber on contint1001 - https://phabricator.wikimedia.org/T175296#3597729 (dduvall) p:Triage>Normal a:dduvall
[17:20:10] Release-Engineering-Team (Kanban), Release Pipeline: Provision Docker >= 17.05 on contint1001 - https://phabricator.wikimedia.org/T175293#3597733 (thcipriani) a:thcipriani
[17:20:29] Release-Engineering-Team (Kanban), Release Pipeline: Provision Docker >= 17.05 on contint1001 - https://phabricator.wikimedia.org/T175293#3589181 (thcipriani) p:Triage>Normal
[17:22:06] ebernhardson: labswiki = wikitech.wikimedia.org. However, there's a testlabswiki which is the "test wiki" for wikitech, so 'wikitech' is a dblist that refers to both
[17:22:11] ohia legoktm
[17:22:26] hey addshore
[17:22:48] legoktm: https://phabricator.wikimedia.org/T175260 thoughts?
[17:23:11] ugh
[17:23:41] my thought is that EditPage sucks
[17:24:47] (PS7) Addshore: docker: puppet, shallow fetches & no second clone [integration/config] - https://gerrit.wikimedia.org/r/374507
[17:24:50] (PS5) Addshore: docker: Tag images when building with a date stamp [integration/config] - https://gerrit.wikimedia.org/r/377249
[17:28:23] (CR) Addshore: [V: 1] docker: puppet, shallow fetches & no second clone [integration/config] - https://gerrit.wikimedia.org/r/374507 (owner: Addshore)
[17:29:25] addshore: commented
[17:30:09] not a clean revert, cba to rebase now :D
[17:30:13] * addshore will do it wednesday
[17:31:47] Release-Engineering-Team (Kanban), Release Pipeline (Blubber): Package Blubber - https://phabricator.wikimedia.org/T175609#3597769 (dduvall)
[17:32:31] Release-Engineering-Team (Kanban), Release Pipeline (Blubber): Package Blubber - https://phabricator.wikimedia.org/T175609#3597793 (dduvall) p:Triage>Normal a:dduvall @Joe, any experience with this?
[17:34:55] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[17:40:15] (CR) Addshore: [V: 1] "Pushed to docker hub as:" [integration/config] - https://gerrit.wikimedia.org/r/374507 (owner: Addshore)
[17:45:28] thcipriani: just nearly halved the size of the puppet image and should shouldn't notice any extra runtime really :) In-fact it should run faster!
[18:00:29] addshore: daaaaamn. Nice :)
[18:00:38] Made a bit of a chain
[18:11:06] (PS1) Addshore: docker: build.sh allow specifying a single image to build [integration/config] - https://gerrit.wikimedia.org/r/377314
[18:11:21] (CR) Addshore: [C: 1] docker: use tox --notest when populating cache [integration/config] - https://gerrit.wikimedia.org/r/369605 (owner: Hashar)
[18:15:40] (PS6) Addshore: WIP Docker: contint-mediawiki-extensions-phan [integration/config] - https://gerrit.wikimedia.org/r/371708
[18:15:43] (CR) Addshore: "PS6 is just a rebase" [integration/config] - https://gerrit.wikimedia.org/r/371708 (owner: Addshore)
[18:16:32] Beta-Cluster-Infrastructure: Deployment wiki is flooded by spam and should be cleaned up, perhaps even restricted more - https://phabricator.wikimedia.org/T175197#3598070 (Mainframe98) @AlvaroMolina thank you! I will add some of the restricted abuse filters when I have the time.
[18:17:46] (CR) Legoktm: [C: 2] Fix @returns and @throw in function docs [tools/codesniffer] - https://gerrit.wikimedia.org/r/377028 (owner: Umherirrender)
[18:23:15] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[18:28:46] (Merged) jenkins-bot: Fix @returns and @throw in function docs [tools/codesniffer] - https://gerrit.wikimedia.org/r/377028 (owner: Umherirrender)
[18:30:05] thcipriani: whats the reason for using the jenkins user and matching the UIDs and GIDs?
[18:30:47] thcipriani: how often does nodepool gracefully refresh config?
[18:31:19] chasemp: unsure
[18:31:24] * thcipriani looks for docs
[18:31:40] I know a restart is invasive so I did a soft change fyi
[18:32:21] (PS1) Addshore: docker: ops-puppet use git cache for cache-buster instead of time [integration/config] - https://gerrit.wikimedia.org/r/377320
[18:32:44] (PS2) Addshore: docker: ops-puppet use git cache for cache-buster instead of time [integration/config] - https://gerrit.wikimedia.org/r/377320
[18:32:53] thcipriani: ^^ wheee
[18:37:09] Beta-Cluster-Infrastructure: Deployment wiki is flooded by spam and should be cleaned up, perhaps even restricted more - https://phabricator.wikimedia.org/T175197#3598194 (Mainframe98) I'm not able to create restricted abuse filters, the required right (`abusefilter-modify-restricted`) is not available to us...
[18:44:48] Beta-Cluster-Infrastructure: Deployment wiki is flooded by spam and should be cleaned up, perhaps even restricted more - https://phabricator.wikimedia.org/T175197#3598221 (AlvaroMolina) @Mainframe98: I've already given you the global permission that allows you to manage restricted abuse filters. Regards.
[18:50:22] Beta-Cluster-Infrastructure: Deployment wiki is flooded by spam and should be cleaned up, perhaps even restricted more - https://phabricator.wikimedia.org/T175197#3598238 (Mainframe98) Thank you! I managed to successfully import a filter.
[18:55:53] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[19:03:14] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:08:13] Release-Engineering-Team (Kanban), Release Pipeline: Install Blubber on contint1001 - https://phabricator.wikimedia.org/T175296#3598294 (dduvall)
[19:08:15] Release-Engineering-Team (Kanban), Release Pipeline (Blubber): Package Blubber - https://phabricator.wikimedia.org/T175609#3598293 (dduvall)
[19:10:09] (PS7) Addshore: docker: mediawiki-extensions-phan image [integration/config] - https://gerrit.wikimedia.org/r/371708
[19:15:43] Continuous-Integration-Infrastructure, Release-Engineering-Team (Watching / External), Cloud-VPS, Nodepool, and 2 others: figure out if nodepool is overwhelming rabbitmq and/or nova - https://phabricator.wikimedia.org/T170492#3598327 (hashar) From this week-end logs, nova.network.manager had the...
[19:16:28] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.30.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T170635#3598345 (Etonkovidova)
[19:21:59] (CR) Addshore: [V: 1] "Pushed to dockerhub as v2017.09.11.19.08" [integration/config] - https://gerrit.wikimedia.org/r/371708 (owner: Addshore)
[19:22:22] legoktm, what was the python job you thought I should jab at?
[19:22:55] https://integration.wikimedia.org/ci/job/tox-jessie/ ?
[19:24:27] oh... looks like beta-mediawiki-config-update-eqiad is stuck in a deadlock again? as of 5-6 hours ago.
[19:24:36] Continuous-Integration-Config, Gerrit, Cleanup, Diffusion, GitHub-Mirrors: Tool to archive extensions (and do related stuff)? - https://phabricator.wikimedia.org/T175499#3598362 (demon)
[19:24:56] Continuous-Integration-Config, Gerrit, Cleanup, Diffusion, GitHub-Mirrors: Tool to archive extensions (and do related stuff)? - https://phabricator.wikimedia.org/T175499#3595061 (demon) Basically I want to automate as much of the current checklist as possible.
[19:26:26] Continuous-Integration-Config, Gerrit, Cleanup, Diffusion, GitHub-Mirrors: Tool to archive extensions (and do related stuff)? - https://phabricator.wikimedia.org/T175499#3598364 (demon) Doesn't even have to be a bot/automated -- could run as ourselves so we don't need a new bot account or any...
[19:32:42] Release-Engineering-Team (Kanban), Release Pipeline (Blubber): Package Blubber - https://phabricator.wikimedia.org/T175609#3598392 (dduvall) Looks like `dh-make-golang` is yet another golang tool that depends heavily on GitHub hosting. ``` $ dh-make-golang -allow_unknown_hoster phabricator.wikimedia.org...
[19:35:55] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[19:50:23] thcipriani: So, as always I could love some review on the docker stuff, but no rush as I'm not actually going to be working tommorrow anyway!
[19:51:17] addshore: okie doke, will try to take a look, but I won't have time until tomorrow afternoon.
[19:51:26] thats perfect :)
[19:51:30] the job we talked about on saturday was tox-jessie (looking at scrollback)
[19:51:55] great, apparently my scrollback is broken! I'll have a little look at tox-jessie this evening
[20:11:31] (PS2) Umherirrender: Skip function comments with @deprecated [tools/codesniffer] - https://gerrit.wikimedia.org/r/377025
[20:12:19] (CR) Umherirrender: "Patch Set 2: Rebased" [tools/codesniffer] - https://gerrit.wikimedia.org/r/377025 (owner: Umherirrender)
[20:19:48] PROBLEM - Puppet errors on deployment-salt02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[20:22:27] PROBLEM - Puppet errors on deployment-zookeeper02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:24:28] (CR) Jforrester: [C: 2] Skip function comments with @deprecated [tools/codesniffer] - https://gerrit.wikimedia.org/r/377025 (owner: Umherirrender)
[20:25:10] (Merged) jenkins-bot: Skip function comments with @deprecated [tools/codesniffer] - https://gerrit.wikimedia.org/r/377025 (owner: Umherirrender)
[20:29:42] (PS1) Addshore: WIP (I dont know tox) tox Dockefile [integration/config] - https://gerrit.wikimedia.org/r/377337
[20:38:17] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[20:41:50] Yippee, build fixed!
[20:41:50] Project selenium-Echo » chrome,beta,Linux,BrowserTests build #514: FIXED in 49 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/514/
[20:41:51] Yippee, build fixed!
[20:41:51] Project selenium-Echo » firefox,beta,Linux,BrowserTests build #514: FIXED in 49 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/514/
[20:56:16] (CR) Hashar: "o!!!!o!!!" (4 comments) [integration/config] - https://gerrit.wikimedia.org/r/377337 (owner: Addshore)
[20:57:26] RECOVERY - Puppet errors on deployment-zookeeper02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:59:48] RECOVERY - Puppet errors on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:01:55] (CR) Addshore: WIP (I dont know tox) tox Dockefile (4 comments) [integration/config] - https://gerrit.wikimedia.org/r/377337 (owner: Addshore)
[21:02:47] hasharAway: thanks for the review, I literally have no idea how tox works ;)
[21:04:37] addshore: it is magic :]
[21:04:44] coolio
[21:04:53] addshore: I can give you a brief overview tomorrow
[21:05:02] I wont be around much tommorrow, we have an offsite
[21:05:08] lucky one!
[21:05:26] hasharAway: for my future docker playing, what do you think is the job that runs the most on nodepool?
[21:05:40] / takes up the most nodes throughout a typical day
[21:05:52] but in short python has an utility named virtualenv, that let you create a sandbox python installation in which you cn then install whatever python modules you want
[21:06:03] tox is a way to maintain several of those sandboxes
[21:06:08] and run them all serially
[21:07:01] addshore: https://grafana.wikimedia.org/dashboard/db/zuul-job?orgId=1 should more or less give an idea
[21:07:06] at top pick a job
[21:07:25] and you can probably build a dashboard to find out the top offenders
[21:09:14] zuul.pipeline.*.job.*.*.count !
[21:09:26] addshore: do you have write access to grafana?
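The short description of tox given above (several virtualenv sandboxes, run one after another) can be illustrated with the standard library alone. This is only a sketch of the idea, not how tox is implemented: it uses the stdlib `venv` module in place of virtualenv, the environment names `lint` and `unit` and the helper `run_in_sandboxes` are made up, and nothing is installed into the sandboxes.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def run_in_sandboxes(names, code):
    """Create one virtual environment per name and run `code` in each, serially."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            env_dir = Path(tmp) / name
            # with_pip=False keeps the sketch offline and quick.
            venv.create(env_dir, with_pip=False)
            bin_dir = "Scripts" if sys.platform == "win32" else "bin"
            python = env_dir / bin_dir / "python"
            done = subprocess.run(
                [str(python), "-c", code],
                capture_output=True,
                text=True,
                check=True,
            )
            results[name] = done.stdout.strip()
    return results


# Each sandbox reports its own prefix, showing they are separate installations.
prefixes = run_in_sandboxes(["lint", "unit"], "import sys; print(sys.prefix)")
for name, prefix in prefixes.items():
    print(name, prefix)
```

A real tox run would additionally install the project and its listed dependencies into each environment before running the configured commands.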
[21:13:18] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:15:36] addshore: https://grafana.wikimedia.org/dashboard/db/zuul-job?panelId=7&fullscreen&orgId=1
[21:16:38] hasharAway: I do have write access
[21:16:52] hasharAway: I might make a draft version of mwgate-npm-node-6-jessie at some point too then
[21:17:14] mwgate-php55lint also seems trivial
[21:17:32] and the mwconfig typos and lint
[21:17:41] mwgate-php55lint is probably the most trivial of all of them
[21:17:47] hasharAway: indeed
[21:17:57] time for some sleeping now though ;)_
[21:18:06] and we would need a base image to build on top of :]
[21:18:09] yeah about time
[21:18:10] !!!
[21:18:15] have a good offsite!
[21:18:38] have a good tuesday!
[22:11:34] addshore: yep, tox-jessie :)
[22:14:30] (CR) Legoktm: WIP (I dont know tox) tox Dockefile (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/377337 (owner: Addshore)
[22:20:55] (CR) Legoktm: WIP (I dont know tox) tox Dockefile (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/377337 (owner: Addshore)
[22:36:13] (PS1) Legoktm: Add configuration for mediawiki/tools/minus-x [integration/config] - https://gerrit.wikimedia.org/r/377361
[22:37:34] Release-Engineering-Team (Kanban), User-greg: 201718Q2 RelEng related program goals - https://phabricator.wikimedia.org/T174835#3598833 (greg) Draft done, waiting on feedback.
[22:37:45] (CR) Legoktm: [C: 2] Add configuration for mediawiki/tools/minus-x [integration/config] - https://gerrit.wikimedia.org/r/377361 (owner: Legoktm)
[22:38:42] (Merged) jenkins-bot: Add configuration for mediawiki/tools/minus-x [integration/config] - https://gerrit.wikimedia.org/r/377361 (owner: Legoktm)
[22:38:54] !log deploying https://gerrit.wikimedia.org/r/377361
[22:38:57] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[23:21:48] zuul/CI is stuck
[23:21:59] thcipriani: around?
[23:22:08] yeah, I just noticed
[23:22:17] https://grafana.wikimedia.org/dashboard/db/nodepool?orgId=1 says nodepool is happy
[23:22:19] it looks like all the slaves are stuck
[23:22:35] https://integration.wikimedia.org/ci/computer/
[23:22:36] or offline
[23:22:37] legoktm: https://integration.wikimedia.org/ci/
[23:25:23] hrm, I can hit one of the permanent slaves from the cli: nc -vz 10.68.23.143 -w 1 22 => open
[23:25:30] but jenkins isn't happy with it
[23:27:04] > Caused: java.io.IOException: Unexpected termination of the channel
[23:27:34] !log restarting jenkins
[23:27:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[23:27:58] was about to ask if we should do that. it used so much memory on contint1001
[23:28:01] and told me it ran out
[23:28:04] restarting on the theory that since I can hit a permanent agent on the cli something inside the JVM is fiddled up
[23:35:25] still starting up?
[23:35:32] it is dead now
[23:35:36] and starting
[23:35:45] it wasnt actually ended before, afaict
[23:36:20] but now it looks like it's coming back "Please wait while Jenkins is getting ready to work...
[23:36:25] seems to be launching ssh agents ok
[23:36:29] according to logs
[23:36:44] working again :)
[23:36:50] thanks all
[23:36:53] fatal: The remote end hung up unexpectedly
[23:36:55] lots of this though
[23:37:00] from git-daemon
[23:37:22] mutante: did you kill anything? Or did the java process eventually stop?
[23:37:42] no, i just watched things
[23:37:55] it looked like it wasnt ended
[23:37:56] k, I did the safeRestart in the browser
[23:38:00] then eventually it did finally die
[23:38:06] aha
[23:38:24] and then I was watching, and java was going nuts, eventually I ran a stop on the jenkins process
[23:38:36] and a minute later or so it finally stopped
[23:41:18] guess I should unstick the backlog of postmerge tasks :(
[23:52:07] and postmerge queue is cleared
[23:54:07] Continuous-Integration-Config, MinusX, Patch-For-Review: Reject non-executable files with execute bits with a build check - https://phabricator.wikimedia.org/T168659#3599023 (Legoktm)
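The `nc -vz 10.68.23.143 -w 1 22` check used during the outage only shows that a TCP connection to the agent's SSH port can be opened. It says nothing about the Jenkins channel itself, which fits the theory in the log that the fault was inside the JVM rather than on the network. A rough Python equivalent of that probe, with the helper name `port_open` being an assumption, might look like this:

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `port_open("10.68.23.143", 22)` would mirror the command from the log; like `nc -z`, it only opens and closes the connection without sending any data.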