[05:59:16] Continuous-Integration-Infrastructure, BlueSpice: Autofixing commits on BlueSpiceEchoConnector - https://phabricator.wikimedia.org/T200519 (Osnard) Thanks for the explanation. I've started to add `phpcs` and `phpcbf` scripts to other BlueSpice extensions. It will surely take some time to convert all, but...
[05:59:44] Continuous-Integration-Infrastructure, BlueSpice: Autofixing commits on BlueSpiceEchoConnector - https://phabricator.wikimedia.org/T200519 (Osnard) Open>Resolved
[07:38:16] !log deployment-maps04 updated tilerator and kartotherian node modules (T195513, T200594)
[07:38:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[07:38:24] T200594: Add client identifier to requests sent from Kartotherian to WDQS - https://phabricator.wikimedia.org/T200594
[07:38:25] T195513: Load map file in non-strict mode - https://phabricator.wikimedia.org/T195513
[07:47:08] PROBLEM - Puppet errors on deployment-elastic06 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:02:01] PROBLEM - Puppet errors on deployment-elastic07 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:02:33] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:02:38] Can I be an admin on https://en.wikipedia.beta.wmflabs.org ?
[08:12:35] PROBLEM - Puppet errors on deployment-elastic05 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:21:36] bawolff_: you should probably create a phab task so this doesn't get forgotten
[08:22:27] I was kind of hoping someone would happen to be around just now who could just do it, but I will since I got no response
[08:23:19] bawolff: I have no clue if there is a process that should be followed :/
[08:23:37] This is beta after all :P
[08:24:07] For context, we are going to have an external pentest, and I wanted to give the pentesters accounts on a staging environment
[08:24:19] And to create an account with WMF in the name, I need admin rights
[10:35:35] (CR) Thiemo Kreuz (WMDE): "Lego, I tried to solve the confusion this particular sniff was causing for me in another way. Can you have another look?" [tools/codesniffer] - https://gerrit.wikimedia.org/r/438031 (owner: Thiemo Kreuz (WMDE))
[12:20:27] Release-Engineering-Team (Kanban), User-zeljkofilipin: Find top 15 target projects that could use Selenium tests to prevent incidents - https://phabricator.wikimedia.org/T199133 (zeljkofilipin)
[12:21:20] Release-Engineering-Team (Kanban), User-zeljkofilipin: Find top 15 target projects that could use Selenium tests to prevent incidents - https://phabricator.wikimedia.org/T199133 (zeljkofilipin)
[14:10:27] Release-Engineering-Team (Kanban), User-zeljkofilipin: Find top 15 target projects that could use Selenium tests to prevent incidents - https://phabricator.wikimedia.org/T199133 (zeljkofilipin)
[14:11:13] Release-Engineering-Team (Kanban), User-zeljkofilipin: Find top 15 target projects that could use Selenium tests to prevent incidents - https://phabricator.wikimedia.org/T199133 (zeljkofilipin)
[14:13:51] ba zeljkof you can jfdi in these cases (people with NDAs/on the security team)
[14:14:53] greg-g: ok, I really didn't know what the process is
[14:27:01] Release-Engineering-Team (Kanban), User-zeljkofilipin: Find top 15 target projects that could use Selenium tests to prevent incidents - https://phabricator.wikimedia.org/T199133 (zeljkofilipin)
[14:44:12] Release-Engineering-Team (Watching / External), DBA, Datasets-General-or-Unknown, Patch-For-Review, and 2 others: Automate the check and fix of object, schema and data drifts between mediawiki HEAD, production masters and slaves - https://phabricator.wikimedia.org/T104459 (jcrespo)
[14:52:17] Gerrit, Phabricator, Release-Engineering-Team (Someday), Technical-Debt: Replace deprecated phabricator conduit api calls in gerrit's its-phabricator plugin - https://phabricator.wikimedia.org/T159041 (Paladox) p:Low>Normal I've done this https://gerrit-review.googlesource.com/c/plugins/i...
[15:13:05] (CR) Reedy: [C: 2] Stop branching EducationProgram [tools/release] - https://gerrit.wikimedia.org/r/462735 (https://phabricator.wikimedia.org/T125618) (owner: Jforrester)
[15:13:43] (Merged) jenkins-bot: Stop branching EducationProgram [tools/release] - https://gerrit.wikimedia.org/r/462735 (https://phabricator.wikimedia.org/T125618) (owner: Jforrester)
[15:21:34] Release-Engineering-Team (Kanban), Education-Program-Dashboard, MediaWiki-extensions-EducationProgram, Epic, and 2 others: Deprecate and remove the EducationProgram extension from Wikimedia servers after June 30, 2018 - https://phabricator.wikimedia.org/T125618 (Reedy)
[15:27:02] Release-Engineering-Team (Kanban), Education-Program-Dashboard, MediaWiki-extensions-EducationProgram, Epic, User-greg: Deprecate and remove the EducationProgram extension from Wikimedia servers after June 30, 2018 - https://phabricator.wikimedia.org/T125618 (Reedy)
[16:08:27] PROBLEM - SSH on integration-slave-docker-1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:23:15] RECOVERY - SSH on integration-slave-docker-1021 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u7 (protocol 2.0)
[16:40:29] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.32.0-wmf.26 deployment blockers - https://phabricator.wikimedia.org/T191072 (greg) a:mmodell
[16:40:54] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.32.0-wmf.24 deployment blockers - https://phabricator.wikimedia.org/T191070 (greg) a:dduvall
[16:41:42] thcipriani: greg-g got a moment to chat?
[16:43:52] sure
[16:44:51] was hoping we three could noodle on T205563
[17:00:27] Beta-Cluster-Infrastructure, Analytics-Kanban, Operations, Patch-For-Review, and 2 others: Prometheus resources in deployment-prep to create grafana graphs of EventLogging - https://phabricator.wikimedia.org/T204088 (Jdlrobson) We're seeing [[ https://grafana-labs-admin.wikimedia.org/dashboard/db...
[17:03:47] Release-Engineering-Team (Kanban), User-zeljkofilipin: Find top 15 target projects that could use Selenium tests to prevent incidents - https://phabricator.wikimedia.org/T199133 (zeljkofilipin)
[17:07:52] greg-g, so at some point deployment-prep is gonna need to be migrated to the new region
[17:08:34] I'm wondering if we should try to get it done earlier or later in the process
[17:08:35] Release-Engineering-Team (Kanban), User-zeljkofilipin: Find top 15 target projects that could use Selenium tests to prevent incidents - https://phabricator.wikimedia.org/T199133 (zeljkofilipin)
[17:09:31] heh, uhhhh
[17:09:46] I have no real idea :)
[17:11:38] PROBLEM - Host integration-slave-docker-1030 is DOWN: CRITICAL - Host Unreachable (10.68.21.92)
[17:11:40] PROBLEM - Host integration-slave-docker-1031 is DOWN: CRITICAL - Host Unreachable (10.68.20.213)
[17:18:21] Krenair: what's your thought? I haven't been following that whole thing at all
[17:19:09] if we do it earlier then the process is likely better tested for the majority of the rest of labs
[17:19:29] if we do it later then the process is likely fairly well battle-hardened when it gets to stuff we care about
[17:23:12] Release-Engineering-Team (Kanban), User-zeljkofilipin: Find top 15 target projects that could use Selenium tests to prevent incidents - https://phabricator.wikimedia.org/T199133 (zeljkofilipin)
[17:25:16] Release-Engineering-Team (Kanban), User-zeljkofilipin: Find top 15 target projects that could use Selenium tests to prevent incidents - https://phabricator.wikimedia.org/T199133 (zeljkofilipin)
[17:26:01] PROBLEM - Host integration-slave-docker-1008 is DOWN: CRITICAL - Host Unreachable (10.68.17.85)
[17:26:39] PROBLEM - Host integration-slave-docker-1007 is DOWN: CRITICAL - Host Unreachable (10.68.19.105)
[17:53:48] !log deployment-mcs01 deployed [mobileapps/deploy@a0054ba]: Update mobileapps to 0d6c2b7
[17:53:52] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:13:44] Krenair: then wait :)
[18:14:06] ok
[18:46:28] Release-Engineering-Team (Watching / External), Scap, Operations, Datacenter-Switchover-2018, Wikimedia-Incident: Scap is checking canary servers in dormant instead of active-dc - https://phabricator.wikimedia.org/T204907 (Krinkle) In addition to the servers Scap checks, there is also the url...
[18:46:51] Release-Engineering-Team (Watching / External), Scap, Operations, Datacenter-Switchover-2018, Wikimedia-Incident: Scap is checking canary servers in dormant instead of active-dc - https://phabricator.wikimedia.org/T204907 (Krinkle) (I see James reported that at T205559.)
[19:54:52] Is there a way a RelEnger can make a sub-project of a project /without/ moving all the members and watchers to the new sub-project? (Or, at least, use their super-admin-powers to make all the people watchers in the original project again?)
[20:00:54] well from gerrit 2.16 you will be able to get GerritBot to add projects for each repo (i think) :)
[20:06:40] James_F: subprojects don't work that way afaik, you are wanting a milestone afaik
[20:07:16] p858snake|_: No I don't. :-)
[20:11:19] hi folks! our CiviCRM CI jobs have been failing with what looks like an out of memory error
[20:11:42] And I don't think we've increased the RAM requirements significantly with any recent commit
[20:11:50] is there a new, lower limit?
[20:12:39] it's funny - the main test builds fail, but the gate-and-submit builds are OK
[20:12:50] e.g. on https://gerrit.wikimedia.org/r/463169
[20:13:35] oh oops, that one seems to have a different error, sorry!
[20:13:46] but... still confusing
[20:14:33] thcipriani twentyafterfour im going to merge https://gerrit-review.googlesource.com/c/plugins/its-phabricator/+/197510 and backport to 2.15. I've tested it and it seems to work.
[20:14:34] same test passes in gate+submit as failed in main pipeline
[20:15:42] James_F: it's not really possible because super-projects can't have members
[20:16:11] James_F: https://secure.phabricator.com/book/phabricator/article/projects/#parent-projects
[20:16:14] Hmm. Can we move an existing project into a new super-project?
[20:16:17] so when you make a sub-project you convert a project into a super-project and it can't have members anymore
[20:17:36] twentyafterfour: Specifically, we want to have a Structured Data on Commons Engineering project, underneath the overall SDC project. Right now https://phabricator.wikimedia.org/project/profile/34/ is everything – management, comms, engineering, etc., except documentation which got spun off to the not-sub-project https://phabricator.wikimedia.org/project/profile/3513/
[20:18:19] So new super-project SDC, with 34 -> SDC Management, 3513 -> SDC Design and Documentation, and new -> SDC Engineering?
[20:19:50] I know I'm asking for magic. :-)
[20:25:10] wooo one last method to migrate in its-phabricator then the task is complete :)
[20:25:54] James_F: that sounds reasonable I think
[20:26:06] twentyafterfour: OK, let me check with the team that they're OK with that.
[20:33:13] twentyafterfour: Sounds like we're go. Do you have time now? Should I file a task?
[20:52:25] twentyafterfour question, what does this mean
[20:52:28] "(Deprecated.) Search for projects with a given name or hashtag using tokenizer/datasource query matching rules. This is deprecated in favor of the more powerful "query" constraint."
[20:52:49] im confused how you would use the query constraint and how you would enter a project name in there
[20:52:55] this is for https://phab.wmflabs.org/conduit/method/project.search/
[20:53:01] James_F: please file a task so that we have a record of the details
[20:53:11] I'll do it shortly, it shouldn't take too long
[20:53:13] Phabricator, Structured-Data-Commons, Wikidata: Re-organise Phabricator projects for the Structured Data on Commons programme - https://phabricator.wikimedia.org/T205664 (Jdforrester-WMF)
[20:53:18] paladox: I'm not sure
[20:53:22] ok
[20:53:26] twentyafterfour: T205664 :-)
[20:53:27] T205664: Re-organise Phabricator projects for the Structured Data on Commons programme - https://phabricator.wikimedia.org/T205664
[20:54:22] paladox: I think it just means that you can use regular search syntax for querying projects
[20:54:29] hmm
[20:54:39] i don't see on the page a way to do the searching
[20:54:48] oh
[20:54:49] nvm
[20:54:52] found the query key
[20:58:28] twentyafterfour works!
[20:58:30] { "query": "Patch-For-Review" }
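
(For reference, the `{ "query": "Patch-For-Review" }` payload paladox arrived at above maps onto a plain Conduit HTTP call roughly as follows. This is a minimal sketch, not taken from the log: the `api.token` value is a placeholder, and the `constraints[...]` form-field style and the `result.data` response shape follow the Conduit *.search conventions documented on the method page linked at 20:52:55.)

```python
#!/usr/bin/env python3
"""Sketch: query Phabricator's project.search Conduit method with the
"query" constraint, as discussed at 20:52-20:58. Assumes the `requests`
library and a valid Conduit API token (placeholder below)."""
import requests

API_URL = "https://phabricator.wikimedia.org/api/project.search"

resp = requests.post(API_URL, data={
    "api.token": "api-xxxxxxxxxxxxxxxxxxxxxxxxxxxx",  # placeholder token
    # Modern *.search endpoints take constraints as bracketed form fields;
    # this is the HTTP equivalent of {"query": "Patch-For-Review"}.
    "constraints[query]": "Patch-For-Review",
})
resp.raise_for_status()
payload = resp.json()
if payload.get("error_code"):
    raise RuntimeError(payload["error_info"])

# Each hit carries an id/phid plus a "fields" dict with the project name.
for project in payload["result"]["data"]:
    print(project["id"], project["fields"]["name"])
```

(The same constraint also works when pasted into the "query" box on the Conduit method page, which is what paladox tested above.)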
[20:59:21] James_F: cool I'm creating the parent project, I'll let you edit the details as you see fit (description, icon, etc.)
[20:59:37] Thanks!
[21:06:07] Phabricator, SDC General, Wikidata: Re-organise Phabricator projects for the Structured Data on Commons programme - https://phabricator.wikimedia.org/T205664 (mmodell) Ok done - the project hierarchy is created, please feel free to edit the project icons, colors, descriptions and hashtags as appropri...
[21:06:23] paladox: that's cool, glad the query works
[21:06:37] twentyafterfour: Thanks!
[21:06:42] James_F: you're welcome! :)
[21:06:45] Phabricator, SDC General, Wikidata: Re-organise Phabricator projects for the Structured Data on Commons programme - https://phabricator.wikimedia.org/T205664 (mmodell) Open>Resolved
[21:06:49] :)
[21:55:18] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 35.71% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[22:12:34] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:12:48] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[22:36:54] PROBLEM - Host integration-slave-docker-1024 is DOWN: CRITICAL - Host Unreachable (10.68.17.114)
[22:37:00] PROBLEM - Host integration-slave-docker-1012 is DOWN: CRITICAL - Host Unreachable (10.68.16.21)
[22:37:08] PROBLEM - Host integration-slave-docker-1013 is DOWN: CRITICAL - Host Unreachable (10.68.19.155)
[22:37:59] PROBLEM - Puppet errors on deployment-webperf11 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[22:38:35] PROBLEM - Host integration-slave-docker-1023 is DOWN: CRITICAL - Host Unreachable (10.68.16.200)
[22:38:47] PROBLEM - Host integration-slave-docker-1015 is DOWN: CRITICAL - Host Unreachable (10.68.19.76)
[22:38:55] PROBLEM - Host integration-slave-docker-1014 is DOWN: CRITICAL - Host Unreachable (10.68.19.123)
[22:39:03] PROBLEM - Host integration-slave-docker-1010 is DOWN: CRITICAL - Host Unreachable (10.68.18.61)
[22:39:45] PROBLEM - Host integration-slave-docker-1009 is DOWN: CRITICAL - Host Unreachable (10.68.21.208)
[22:39:49] PROBLEM - Host integration-slave-docker-1022 is DOWN: CRITICAL - Host Unreachable (10.68.19.33)
[22:39:53] PROBLEM - Host integration-slave-docker-1011 is DOWN: CRITICAL - Host Unreachable (10.68.23.221)
[22:40:15] hmm
[22:40:21] thcipriani ^^
[22:40:34] or marxarelli ^^
[22:44:55] ehmm. that doesnt look good
[22:45:22] those nodes/instances were removed yesterday
[22:45:29] oh
[22:45:31] ah
[22:45:33] i'm not sure why those alerts are so delayed
[22:45:51] quiet timed out?
[22:45:51] does shinken get the list of hosts from LDAP?
[22:46:00] hmm, not sure
[22:46:03] it has a script i think
[22:46:09] shinkengen
[22:47:00] RECOVERY - Puppet errors on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:47:10] RECOVERY - Puppet errors on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:47:18] if it was production it would happen if a host gets shut down but not removed from puppetdb
[22:47:34] because icinga generates checks from that
[22:47:36] RECOVERY - Puppet errors on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:47:49] now some are coming back? heh
[22:48:37] different hosts :P
[22:57:11] mutante, I don't think we've stored that kind of data in LDAP for a long time now
[22:57:22] shinkengen gets data about what hosts exist from nova
[22:57:58] RECOVERY - Puppet errors on deployment-webperf11 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:58:27] ok! thanks. just wanted to eliminate one possible reason why it could know about hosts that are already gone.
[22:58:50] specifically, puppet execs shinkengen which asks nova
[22:59:07] and then writes stuff to a config for shinken to pick up
[22:59:25] the behaviour is weird then.. either nova thinks the hosts exists or it doesnt.. right
[22:59:37] schroedingers instances
[22:59:47] yes I'm looking into it
[22:59:50] cool
[22:59:52] I can't see those hosts listed in horizon
[23:00:18] they were deleted about 24 hours ago I believe
[23:01:17] okay here's something interesting
[23:01:21] they're not in /etc/shinken
[23:01:27] which makes me think shinkengen did its job
[23:01:30] did shinken not reload?
[23:02:40] ran a service shinken reload and it's fine
[23:02:48] puppet y u no do ur job
[23:03:14] oh I know
[23:03:29] Earlier on we were running shinkengen manually for some separate problem
[23:03:52] I bet it changed shinken config without me reloading shinken, then when puppet did it there was no difference, so no reload
[23:11:00] twentyafterfour thcipriani https://gerrit-review.googlesource.com/c/plugins/its-phabricator/+/197590 :)
[23:11:06] it works in my testing too
[23:47:06] Gerrit, Phabricator, Release-Engineering-Team (Someday), Technical-Debt: Replace deprecated phabricator conduit api calls in gerrit's its-phabricator plugin - https://phabricator.wikimedia.org/T159041 (Paladox) Open>Resolved I have now done https://gerrit-review.googlesource.com/c/plugins...
[23:47:08] twentyafterfour ^^
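
(For reference, the failure mode Krenair diagnosed at 23:03:52 - a manual shinkengen run updating the config so that the next Puppet-driven run saw no change and never reloaded shinken - is the classic regenerate-then-reload-on-change pattern. A minimal sketch follows; the function and path names are hypothetical stand-ins, not the actual shinkengen code.)

```python
#!/usr/bin/env python3
"""Sketch of reload-on-change config generation, assuming a hypothetical
generate_config() in place of shinkengen's real nova query and
/etc/shinken/hosts.cfg in place of its real output files."""
import subprocess
from pathlib import Path

CONFIG_PATH = Path("/etc/shinken/hosts.cfg")  # hypothetical target file

def generate_config() -> str:
    # Stand-in for shinkengen asking nova which instances currently exist.
    return "..."

def refresh() -> None:
    new = generate_config()
    old = CONFIG_PATH.read_text() if CONFIG_PATH.exists() else ""
    if new != old:
        CONFIG_PATH.write_text(new)
        # Reload fires only when this run changed the file. If someone
        # regenerated the config by hand between runs, the next automated
        # run sees no diff and skips the reload - exactly the stale-alert
        # behaviour seen above with the deleted docker slaves.
        subprocess.run(["service", "shinken", "reload"], check=True)

if __name__ == "__main__":
    refresh()
```

(The out-of-band fix used above - running `service shinken reload` manually at 23:02:40 - is the corresponding remedy when the automated reload was skipped.)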