[00:47:27] 10Phabricator: @Phabricator_maintenance is sending email notifications - https://phabricator.wikimedia.org/T216867 (10greg) Just to be clear, does the cli bulk edit option still suppress notifications (as a workaround)? [01:39:33] So i've setup https://gerrit.gerrit.wmflabs.org/r/ with the intention of it being a multi master alongside https://gerrit.git.wmflabs.org/r/ (ie what ever is done on gerrit.git is forwarded to gerrit.gerrit (and in reverse too). [03:07:51] paladox: nice [03:09:53] 10Phabricator: @Phabricator_maintenance is sending email notifications - https://phabricator.wikimedia.org/T216867 (10mmodell) @greg: yes it should but the usefulness might be limited by {T205258} (I still need to test that more thoroughly, that's tracked in {T215079}) [04:20:26] Project beta-update-databases-eqiad build #32239: 04FAILURE in 25 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/32239/ [05:21:31] Yippee, build fixed! [05:21:32] Project beta-update-databases-eqiad build #32240: 09FIXED in 1 min 30 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/32240/ [06:00:58] Project beta-scap-eqiad build #240434: 04FAILURE in 1.4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240434/ [06:12:53] Yippee, build fixed! [06:12:54] Project beta-scap-eqiad build #240435: 09FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240435/ [06:25:45] Project beta-scap-eqiad build #240437: 04FAILURE in 1.2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240437/ [06:44:37] Yippee, build fixed! [06:44:37] Project beta-scap-eqiad build #240438: 09FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240438/ [07:02:16] (03PS3) 10Gergő Tisza: Stop running php7.1 checks on Parsoid [integration/config] - 10https://gerrit.wikimedia.org/r/494802 (https://phabricator.wikimedia.org/T216102) (owner: 10C. 
Scott Ananian) [07:02:48] (03CR) 10Gergő Tisza: "Restored phan per Legoktm." [integration/config] - 10https://gerrit.wikimedia.org/r/494802 (https://phabricator.wikimedia.org/T216102) (owner: 10C. Scott Ananian) [07:20:25] Project beta-update-databases-eqiad build #32242: 04FAILURE in 24 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/32242/ [07:20:49] Project beta-scap-eqiad build #240442: 04FAILURE in 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240442/ [07:21:00] Project beta-scap-eqiad build #240443: 04STILL FAILING in 0.98 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240443/ [07:34:35] Yippee, build fixed! [07:34:35] Project beta-scap-eqiad build #240444: 09FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240444/ [07:47:38] Project beta-scap-eqiad build #240446: 04FAILURE in 1.4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240446/ [08:04:22] Yippee, build fixed! [08:04:22] Project beta-scap-eqiad build #240447: 09FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240447/ [08:05:42] Project beta-scap-eqiad build #240448: 04FAILURE in 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240448/ [08:12:02] 10Gerrit, 10Release-Engineering-Team (Watching / External), 10Icinga, 10Operations, and 3 others: gerrit: Add a icinga check that uses the healthcheck endpoint - https://phabricator.wikimedia.org/T215457 (10Dzahn) Amended the patch to use the regular check_https_url check command and to link to the full ou... [08:15:17] (03CR) 10Giuseppe Lavagetto: "> Patch Set 2: Code-Review-1" [tools/scap] - 10https://gerrit.wikimedia.org/r/491412 (owner: 10Giuseppe Lavagetto) [08:19:32] Yippee, build fixed! [08:19:32] Project beta-scap-eqiad build #240449: 09FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240449/ [08:21:09] Yippee, build fixed! 
[08:21:10] Project beta-update-databases-eqiad build #32243: 09FIXED in 1 min 8 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/32243/ [08:35:10] 10Gerrit, 10Release-Engineering-Team (Watching / External), 10Icinga, 10Operations, and 3 others: gerrit: Add a icinga check that uses the healthcheck endpoint - https://phabricator.wikimedia.org/T215457 (10Dzahn) 05Open→03Resolved works now: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?typ... [09:45:33] (03CR) 10Jforrester: [C: 03+1] mediawiki.d: Avoid vars that look like core or wmf names [integration/quibble] - 10https://gerrit.wikimedia.org/r/494803 (owner: 10Krinkle) [09:53:30] hashar: if you have a moment: https://gerrit.wikimedia.org/r/c/integration/config/+/493340 [09:54:49] (03CR) 10Hashar: [C: 03+2] "Note that all 00_dev_settings.php is superseeded by mediawiki/core includes/DevelopmentSettings.php . It has been introduced in REL1_31, t" [integration/quibble] - 10https://gerrit.wikimedia.org/r/494803 (owner: 10Krinkle) [09:56:07] gehel: oui :) [09:56:20] hashar: merci! 
[09:56:22] gehel: ahh I had forgotten about this change :/// Poor SMalyshev [09:56:28] (Merged) jenkins-bot: mediawiki.d: Avoid vars that look like core or wmf names [integration/quibble] - https://gerrit.wikimedia.org/r/494803 (owner: Krinkle) [09:56:53] gehel: also the tests are failing :/ [09:56:57] (CR) jenkins-bot: mediawiki.d: Avoid vars that look like core or wmf names [integration/quibble] - https://gerrit.wikimedia.org/r/494803 (owner: Krinkle) [09:57:08] yep, but we can address that once it is merged [09:58:06] at least I hope we can :) [09:58:27] (CR) Hashar: [C: +2] CI configuration for Blazegraph [integration/config] - https://gerrit.wikimedia.org/r/493340 (https://phabricator.wikimedia.org/T216855) (owner: Smalyshev) [09:58:37] gehel: deploying the job and reloading zuul in a couple minutes [09:58:42] then we can "recheck" and see what happens [09:58:47] also [09:58:50] well [09:58:51] no [10:00:08] why no? [10:00:30] ignore me. I am just terribly confused this morning [10:00:43] just blame coffee like everyone else :) [10:00:48] (Merged) jenkins-bot: CI configuration for Blazegraph [integration/config] - https://gerrit.wikimedia.org/r/493340 (https://phabricator.wikimedia.org/T216855) (owner: Smalyshev) [10:01:38] gehel: you can "recheck" now ;) [10:01:43] hashar: and at the same time: https://gerrit.wikimedia.org/r/c/integration/config/+/487285 [10:01:49] I think I tried it out with the same docker container locally [10:01:53] but some unit tests were failing [10:02:01] needs your review and maybe a merge [10:02:03] one due to me having LANG=fr_FR.UTF-8 [10:02:12] but setting LANG=C like on CI worked fine [10:02:19] yeah, I tried running the tests locally and I also have some failures [10:02:23] (due to numeric separator: . 
!== , [10:02:28] maybe related to me running Java 11 [10:02:40] then some other tests failed but that was the kind of failure above my paygrade [10:03:10] next "maven wrapper" [10:03:19] I thought you already added support for it !?! [10:03:27] never been merged [10:03:35] and I forgot to ping you about it [10:03:51] ah yeah that is that change [10:03:53] bah [10:03:56] that one has been wanting some love for probably a year :( [10:04:16] ah yeah and I actually reviewed it at some point [10:05:15] recheck worked, there are tests in failure, now we have to fix them! [10:05:25] cool [10:05:43] so eventually I proposed to have a job that ran with -Dskip.Tests=true or something like that [10:05:49] but that defeats the purpose [10:05:55] also [10:05:59] is Blazegraph abandoned ? [10:06:04] yep, better to have a failing job and fix the tests [10:06:27] upstream has more or less abandoned it, but we're still using it and need to modify it sometimes [10:06:43] we're trying to find a replacement, but that's not an easy task [10:06:50] oh [10:06:53] it got acquired by AWS :( [10:07:42] so same story as DataStax (the company behind cassandra) which acquired Titan/Tinkerpop [10:07:56] I had that conversation with Nik Everett when he was evaluating potential graph databases [10:08:12] well, AWS acquired the company, but they are not really interested in the product [10:08:12] and suggested to him to look at Titan/Tinkerpop (my brother was involved in that project at the time) [10:08:36] only to have DataStax announce the acquisition / recruitment of the whole team and the project being placed / donated to Apache Foundation [10:08:55] which kind of means that without core developers, the project would have less interest / dynamic :( [10:09:06] so same situation apparently ;-((((( [10:09:12] Janus is alive (fork of titan) [10:09:33] but does not have a SPARQL endpoint, which is a BIG minus for wdqs [10:12:52] (CR) Hashar: "naming is hard" (2 comments) [integration/config] - 
https://gerrit.wikimedia.org/r/487285 (https://phabricator.wikimedia.org/T208938) (owner: Gehel) [10:13:22] (CR) Hashar: java: build maven projects with maven wrapper if it exists (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/487285 (https://phabricator.wikimedia.org/T208938) (owner: Gehel) [10:13:33] gehel: https://gerrit.wikimedia.org/r/#/c/integration/config/+/487285/4/dockerfiles/java8/mvn I am nitpicking at this point :/ [10:13:45] looking [10:13:55] gehel: the CI mvn local script prints out messages prefixed with "maven wrapper: " [10:14:00] when it is not mvnw [10:14:05] I am taking suggestions for a better prefix [10:14:52] and a suggestion to highlight that we switched to .mvnw , then the "$MAVEN_BIN" execution line has a set -x so the whole command is shown anyway [10:14:59] or in short, my last comment can be dismissed probably [10:15:07] I don't know. I am over engineering again :\ [10:17:45] I don't think the messages L18/21 need to be changed [10:18:16] but adding a message about using mvnwrapper would be nice [10:18:17] I'm on it [10:20:24] (PS5) Gehel: java: build maven projects with maven wrapper if it exists [integration/config] - https://gerrit.wikimedia.org/r/487285 (https://phabricator.wikimedia.org/T208938) [10:20:35] hashar: ^would that be sufficient? [11:06:21] just a heads up, I'll be updating seaborgium (LDAP/eqiad) to Stretch.. clients will fail over to serpens (LDAP/codfw) but there might be some auth hiccups. I'll try to minimize the downtime as much as possible but just wanted to warn you some CI jobs could fail (if they are caught in the middle of this and haven't switched to LDAP/codfw) [11:12:53] Project-Admins, PAWS, cloud-services-team: Create "zero-to-jupyterhub-k8s 0.8.0" milestone for PAWS project - https://phabricator.wikimedia.org/T217477 (Chicocvenancio) Should I create a separate task to archive #JupyterHub-0.9? 
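The "build maven projects with maven wrapper if it exists" behaviour discussed around change 487285 can be sketched roughly as below. This is only an illustration in Python, not the actual `dockerfiles/java8/mvn` script (which is a shell script and is not quoted in this log); the function name `pick_maven_command` and the `fallback` parameter are hypothetical.

```python
import os


def pick_maven_command(project_dir, fallback="mvn"):
    """Return the project's own mvnw if it exists and is executable,
    otherwise fall back to the system-wide Maven binary."""
    # "mvnw" is the conventional name of the Maven wrapper script.
    wrapper = os.path.join(project_dir, "mvnw")
    if os.path.isfile(wrapper) and os.access(wrapper, os.X_OK):
        return wrapper
    return fallback
```

A caller would then log which binary was chosen before running it, which is the kind of message suggested in the discussion above.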
[11:14:15] 10Release-Engineering-Team (Kanban): Consider and evaluate possible new CI tooling - https://phabricator.wikimedia.org/T217325 (10zeljkofilipin) [11:15:48] 10Continuous-Integration-Infrastructure: zuul should comment on gerrit when it fails to enqueue patches due to patch-chains - https://phabricator.wikimedia.org/T196910 (10zeljkofilipin) [11:18:07] 10Release-Engineering-Team, 10MediaWiki-Core-Testing, 10Epic: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740 (10zeljkofilipin) [11:19:47] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: 5 of the 15 prioritized repositories have at least 1 end-to-end test - https://phabricator.wikimedia.org/T206621 (10zeljkofilipin) 05Open→03Declined In discussion with @greg decided not to continue working on this. Already created sub-tasks will be... [11:20:12] Project beta-scap-eqiad build #240465: 04FAILURE in 1.4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240465/ [11:21:49] 10Release-Engineering-Team (Kanban), 10ContentTranslation, 10Patch-For-Review, 10User-zeljkofilipin: The first Selenium test for ContentTranslation - https://phabricator.wikimedia.org/T216424 (10zeljkofilipin) @Petar.petkovic @Nikerabbit @Etonkovidova do you need help finishing the patch? [11:25:48] gtirloni: good morning. Eventually I have replied on the task with some assumption that the threads pool handling requests might just be exhausted https://phabricator.wikimedia.org/T217280#5007822 [11:26:10] gtirloni: not knowing anything about slapd/openldap. 
Looks like we want to raise the max number of threads from 16 to something larger [11:26:42] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Develop set of metrics to assess incident reports/post mortems - https://phabricator.wikimedia.org/T206622 (10zeljkofilipin) [11:27:10] (03PS6) 10Hashar: java: build maven projects with maven wrapper if it exists [integration/config] - 10https://gerrit.wikimedia.org/r/487285 (https://phabricator.wikimedia.org/T208938) (owner: 10Gehel) [11:27:14] gehel: yeah looks good [11:27:40] 10Release-Engineering-Team (Kanban): Develop set of metrics to assess incident reports/post mortems - https://phabricator.wikimedia.org/T206622 (10zeljkofilipin) [11:27:45] gehel: I am going to lunch then will further amend the commit to also update all descendant images (that is a pain to do but docker-pkg has a pending patch to somehow make it easier) [11:27:53] then build those images [11:27:59] bump the jenkins jobs to the new containers [11:28:01] ok, lunch too [11:28:04] and I guess we can have a glass of champagne [11:28:09] bon apétit! [11:28:32] 10Release-Engineering-Team, 10MediaWiki-Core-Testing, 10Epic, 10Tracking, 10User-zeljkofilipin: Selenium framework improvements - https://phabricator.wikimedia.org/T182986 (10zeljkofilipin) [11:33:39] hasharLunch: depends on the champagne brand - I'd fancy a Vve. Clicquot-Ponserdin or a Piper Heidsieck :) [11:35:03] Yippee, build fixed! [11:35:04] Project beta-scap-eqiad build #240466: 09FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240466/ [11:41:03] (03PS2) 10Joewalsh: Remove Jenkins jobs for Android app [integration/config] - 10https://gerrit.wikimedia.org/r/494828 [11:45:53] (03CR) 10Joewalsh: "Updated to remove the Android jobs from the jjb folder. 
Phab ticket for this is https://phabricator.wikimedia.org/T198862" [integration/config] - https://gerrit.wikimedia.org/r/494828 (owner: Joewalsh) [11:53:37] hasharLunch: that's some really cool research you did there, thank you! I've increased tools-threads 1->8 (as per official recommendation) and threads 16->32 (4*procs). It seems running `sudo su -` on toolforge cluster is much faster now, but it's just a feeling [12:12:09] Project beta-scap-eqiad build #240470: FAILURE in 4.5 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240470/ [12:23:40] hauskatze: all unknown to me. But I confess I have only been drinking the same brand of Champagne for 15 years or so (some small family-owned vineyard) [12:23:57] hasharLunch: then it's probably better [12:24:04] curated with family love [12:24:16] gtirloni: tools-threads I have no idea what it is for, that sounded unrelated or for maintenance/admin tasks. I guess it does not hurt to bump it up [12:24:25] Yippee, build fixed! [12:24:26] Project beta-scap-eqiad build #240471: FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240471/ [12:24:30] gtirloni: we will see whether the Backload and Pending metrics go down :) [12:25:05] gtirloni: what would be left to figure out is the root cause of exhaustion :/ Maybe something started doing way more queries than before [12:25:15] hashar: yeah, I've applied that change manually to test it.. at 11:45 UTC -- https://grafana.wikimedia.org/d/000000181/openldap-labs?orgId=1&from=now-3h&to=now [12:25:22] Project beta-update-databases-eqiad build #32247: FAILURE in 54 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/32247/ [12:25:39] yep, agreed. at some point we have to look at the workload to figure this out. 
slapd isn't behaving well but that's not the whole story :) [12:25:49] Project beta-scap-eqiad build #240472: FAILURE in 0.98 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240472/ [12:26:42] running sudo on toolforge feels faster now but that might be wishful thinking, I don't have numbers to back this up [12:27:09] gtirloni: also on the codfw threads graph, I have noticed a spike of pending/backload every five minutes sharp ( https://grafana.wikimedia.org/d/000000181/openldap-labs?orgId=1&from=1551952405384&to=1551958526440&panelId=13&fullscreen ) [12:28:01] who knows what kind of process originates those queries [12:28:37] so I've noticed this: `nslcd` is the daemon doing the caching of LDAP lookups on the clients. if I log in to a Toolforge node after a while, I see it doing LDAP requests. If I log in immediately again, no requests are made... that's caching working. However! When I do a `sudo su -`, it does LDAP requests every single time, no caching. And the same for things running from cron [12:29:31] so yeah nslcd does cache stuff [12:29:41] it could be a bunch of cronjobs running at */5, I don't know.. but that's an interesting finding. we should be able to capture some LDAP queries and look at them [12:30:15] but some system calls / glibc calls are not cached. I had that issue due to python using an outdated system call which does not have any caching enabled [12:30:23] interesting [12:30:24] but those are hard to track down :-\ [12:30:27] yeah [12:30:36] I mentioned it on the task [12:30:38] the comment is https://phabricator.wikimedia.org/T204681#4598659 [12:31:06] and also, it seems nslcd is configured with 'shared caching' which means glibc calls can look into nslcd's cache directly without asking it.. 
I was looking into that when I saw the cache hit rate is 0% everywhere I looked [12:31:16] namely python grp.getgrall() uses POSIX getgrent() which glibc developers consider to be a bad API and they are not willing to add caching support to that [12:31:52] the rest of the explanation took me a while to capture, fixing it properly (eg proposing a patch to python to use a more modern call) is way over what I can technically achieve [12:31:53] ah interesting task, that does seem to be a complication [12:31:55] (I don't know C at all) [12:32:13] then grp.getgrall is just one offender for just python [12:32:19] python must have other similar uncacheable calls [12:32:31] and I would guess that the example you gave with su - is probably similar [12:32:54] and last night, I eventually looked at nslcd cache statistics but fell asleep on my computer before drawing any conclusion [12:33:22] understandable :) [12:33:33] (or what is nscd) [12:36:12] I don't know how they interact (nscd/nslcd) but nscd is the traditional stuff for name caching.. one wonders why it wasn't enhanced for LDAP as well but that's it... then there's `sssd` which seems to be the modern alternative to it all but I'd rather not go there (yet) :) [12:39:57] Yippee, build fixed! [12:39:58] Project beta-scap-eqiad build #240473: FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240473/ [12:40:41] gtirloni: yeah all of that is a mess imho [12:40:49] I am waiting for systemd to take over eventually ;] [12:41:00] lol [12:41:16] i have no doubt of that happening at this point :) [12:47:47] systemd-resolved ? [12:51:23] hmm looks like it depends on NSS as well (i.e. it's an opt-in from the app point of view) [13:01:10] I ran nslcd with debug logging [13:01:24] top requesters were keyholder (known bug) [13:01:28] and nscd :) [13:02:10] mutante: if systemd-resolved is able to cache those system calls yes. 
But I think it is only for DNS [13:02:21] also on Debian it annoyingly fallback to Google DNS 8.8.8.8 :-((( [13:03:01] a better fallback could be 1.1.1.1 maybe. that's Cloudflare [13:09:34] (03PS3) 10Krinkle: Remove Jenkins jobs for Android app [integration/config] - 10https://gerrit.wikimedia.org/r/494828 (https://phabricator.wikimedia.org/T198862) (owner: 10Joewalsh) [13:10:33] 10Release-Engineering-Team (Kanban), 10Scap, 10User-zeljkofilipin: Problems deploying dblists/commonsuploads.dblist - https://phabricator.wikimedia.org/T217830 (10zeljkofilipin) [13:11:21] 10Release-Engineering-Team (Kanban), 10Scap, 10User-zeljkofilipin: Problems deploying dblists/commonsuploads.dblist - https://phabricator.wikimedia.org/T217830 (10zeljkofilipin) p:05Triage→03High [13:13:08] 10Release-Engineering-Team (Kanban), 10Scap, 10User-zeljkofilipin: Problems deploying dblists/commonsuploads.dblist - https://phabricator.wikimedia.org/T217830 (10zeljkofilipin) [13:14:29] 10Release-Engineering-Team (Kanban), 10Scap, 10User-zeljkofilipin: Problems deploying dblists/commonsuploads.dblist - https://phabricator.wikimedia.org/T217830 (10zeljkofilipin) [13:17:16] bah I circle back to what I did months ago [13:17:22] nslcd has some cache system [13:17:23] but [13:17:44] Currently, only the dn2uid cache is supported that is used to remember DN to username lookups that are used when the member attribute is used [13:17:49] default 15 mins [13:17:56] so nslcd cant cache groups :) [13:18:38] so most requests that are not caught by nscd ends up hitting ldap [13:18:41] that being said [13:18:50] nscd group cache has a short ttl of 60 seconds :/ [13:19:19] 10Release-Engineering-Team (Kanban), 10Scap, 10User-zeljkofilipin: Problems deploying dblists/commonsuploads.dblist - https://phabricator.wikimedia.org/T217830 (10zeljkofilipin) [13:20:33] 10Release-Engineering-Team (Kanban), 10Scap, 10User-zeljkofilipin: Problems deploying dblists/commonsuploads.dblist - https://phabricator.wikimedia.org/T217830 
(10zeljkofilipin) [13:21:22] Yippee, build fixed! [13:21:23] Project beta-update-databases-eqiad build #32248: 09FIXED in 1 min 21 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/32248/ [13:21:28] 10Release-Engineering-Team (Kanban), 10Scap, 10User-zeljkofilipin: Problems deploying dblists/commonsuploads.dblist - https://phabricator.wikimedia.org/T217830 (10zeljkofilipin) [13:22:44] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban): Delete Jenkins job https://releases-jenkins.wikimedia.org/job/make-deploy-notes/ - https://phabricator.wikimedia.org/T217793 (10thcipriani) 05Open→03Resolved a:03thcipriani > I would delete it right away in the web UI but I hav... [13:34:56] 10Release-Engineering-Team (Kanban), 10Scap, 10User-zeljkofilipin: Problems deploying dblists/commonsuploads.dblist - https://phabricator.wikimedia.org/T217830 (10MarcoAurelio) My understanding is that the dblist is used on InitialiseSettings and/or CommonSettings, if you don't touch those the change doesn't... [13:35:11] hashar: seems worth tweaking for labs -- https://gerrit.wikimedia.org/r/c/operations/puppet/+/494922 [13:50:54] gtirloni: yeah I was looking at the same cache value. I replied on the change :/ [13:51:09] also the ldap servers are back to 16 threads [13:56:22] 10Release-Engineering-Team (Kanban), 10Scap, 10User-zeljkofilipin: Problems deploying dblists/commonsuploads.dblist - https://phabricator.wikimedia.org/T217830 (10zeljkofilipin) [14:03:20] 10Release-Engineering-Team (Kanban), 10Scap, 10User-zeljkofilipin: Problems deploying dblists/commonsuploads.dblist - https://phabricator.wikimedia.org/T217830 (10hashar) The dblist files are read and are part of the cached config. The relevant code (inline annotations are mine): ` lang=php,name=wmf-config/... 
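To illustrate the `grp.getgrall()` point made earlier in the LDAP discussion: the sketch below contrasts a full enumeration of the group database with a targeted lookup by name. Both calls are part of Python's standard `grp` module; the helper names are hypothetical. Whether the targeted lookup is actually answered from a local cache (nscd or otherwise) depends on the host's NSS configuration, which is an assumption here and not something the code can show.

```python
import grp


def group_members_by_enumeration(name):
    """Walk every group entry via grp.getgrall() and pick one out.
    Per the discussion, this enumeration path is the one that is not cached."""
    for entry in grp.getgrall():
        if entry.gr_name == name:
            return list(entry.gr_mem)
    return []


def group_members(name):
    """Ask for a single group by name instead of enumerating all of them.
    Returns [] when the group does not exist."""
    try:
        return list(grp.getgrnam(name).gr_mem)
    except KeyError:
        return []
```

When only a handful of groups are of interest, the second form avoids pulling the whole group list from the directory on every call.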
[14:09:36] yep, I reverted the manual change, it seems an hour of data was worth it to inform people's decision on that other change [14:10:00] hashar: thanks for all your research, really nice! [14:48:10] 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Release, 10Train Deployments: 1.33.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T206674 (10hashar) [15:13:28] 10Release-Engineering-Team (Kanban), 10Scap, 10User-zeljkofilipin: Problems deploying dblists/commonsuploads.dblist - https://phabricator.wikimedia.org/T217830 (10thcipriani) p:05High→03Normal Lowering priority since touching `InitialiseSettings.php` has been happening since {T60618} (circa 2014) >>! In... [15:14:42] twentyafterfour yup, pretty cool! (GerritHub are running it (which is owned by gerritforge the owner of the plugin)). [15:21:46] (03PS7) 10Hashar: java: build maven projects with maven wrapper if it exists [integration/config] - 10https://gerrit.wikimedia.org/r/487285 (https://phabricator.wikimedia.org/T208938) (owner: 10Gehel) [15:22:01] (03CR) 10jerkins-bot: [V: 04-1] java: build maven projects with maven wrapper if it exists [integration/config] - 10https://gerrit.wikimedia.org/r/487285 (https://phabricator.wikimedia.org/T208938) (owner: 10Gehel) [15:23:22] (03PS8) 10Hashar: java: build maven projects with maven wrapper if it exists [integration/config] - 10https://gerrit.wikimedia.org/r/487285 (https://phabricator.wikimedia.org/T208938) (owner: 10Gehel) [15:23:40] gehel: ok edited to update other java8 containers and I rebased the change. Will look at rebuilding those [15:25:23] hashar: thanks! 
ping me if you need anything else from me [15:37:53] 10Project-Admins: Create milestones of Deployments project - https://phabricator.wikimedia.org/T217843 (10MarkAHershberger) [15:42:34] Project beta-scap-eqiad build #240490: 04FAILURE in 1.2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240490/ [15:44:11] So, I'm trying to use Jenkins to create tarballs. I have a docker instance of jenkins to play around with. [15:45:11] 10Deployments, 10HHVM: mw conf cache is not properly invalidated - https://phabricator.wikimedia.org/T134448 (10hashar) 05Resolved→03Open Courtesy of @thcipriani : https://apenwarr.ca/log/20181113 [15:46:03] The use case I'm thinking of is "One of CindyCicaleseWMF's delegates wants to do a release." Could they go to Jenkins and push a button to do that? Or would I have to kick it off remotely somehow? (Cc: Reedy) [15:46:22] In theory, yes [15:46:35] You can kick builds off manually via the jenkins interface [15:46:44] Just give it the branch/hash/tag you want to do, and let it do whatever [15:48:11] Reedy: is there some documentation that would help me with this use case so I don't have to bug you with every little question? [15:48:45] 10Project-Admins: Create milestones of Deployments project - https://phabricator.wikimedia.org/T217843 (10Aklapper) Hmm... I'm not sure I get it. :) * Is this about creating milestones directly under https://phabricator.wikimedia.org/project/subprojects/349/ ? * Or about creating a new subproject under https://... [15:48:51] I suppose I could just "learn jenkins" and I will, but it would be nice for my learning to have direction. 
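For the "push a button, or kick it off remotely" question, Jenkins exposes parameterized jobs over HTTP at `/job/<name>/buildWithParameters`. The sketch below only builds that URL; the server address, the job name `make-tarball` and the parameter `REF` are hypothetical, and an actual trigger would be an authenticated POST (user plus API token) against a job that has been configured to accept parameters.

```python
from urllib.parse import quote, urlencode


def build_trigger_url(jenkins_url, job_name, params):
    """Return the URL that triggers a parameterized Jenkins job.
    The request itself must be sent as an authenticated POST."""
    base = jenkins_url.rstrip("/")
    job = quote(job_name, safe="")
    return f"{base}/job/{job}/buildWithParameters?{urlencode(params)}"


# Hypothetical server, job and parameter names, for illustration only.
url = build_trigger_url("https://jenkins.example.org/", "make-tarball", {"REF": "REL1_32"})
```

The same job can still be started by hand from the web interface, where Jenkins prompts for the same parameters.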
[15:49:16] I don't know, sorry [15:49:28] I know kunal created https://github.com/wikimedia/integration-config/tree/master/dockerfiles/mediawiki-tarball as some prep work towards being able to do that for MW releases [15:50:02] ty for your pointers [15:51:05] You should be able to setup jenkins as basically a gui wrapper around a python/other script [15:53:58] Yippee, build fixed! [15:53:59] Project beta-scap-eqiad build #240491: 09FIXED in 9 min 38 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240491/ [15:55:20] Project beta-scap-eqiad build #240492: 04FAILURE in 1.3 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240492/ [15:55:34] (03PS1) 10Volans: Add tox setup for external-monitoring repo [integration/config] - 10https://gerrit.wikimedia.org/r/494968 (https://phabricator.wikimedia.org/T217599) [16:01:05] gehel: error too many duties sorry :( [16:01:19] hashar: I'll ping you again! [16:01:27] We'll get this done at some point [16:01:32] (03CR) 10Hashar: [C: 03+2] java: build maven projects with maven wrapper if it exists [integration/config] - 10https://gerrit.wikimedia.org/r/487285 (https://phabricator.wikimedia.org/T208938) (owner: 10Gehel) [16:01:37] yeah [16:02:03] (03CR) 10Hashar: [C: 03+2] Add tox setup for external-monitoring repo [integration/config] - 10https://gerrit.wikimedia.org/r/494968 (https://phabricator.wikimedia.org/T217599) (owner: 10Volans) [16:03:07] (03Merged) 10jenkins-bot: java: build maven projects with maven wrapper if it exists [integration/config] - 10https://gerrit.wikimedia.org/r/487285 (https://phabricator.wikimedia.org/T208938) (owner: 10Gehel) [16:04:05] (03Merged) 10jenkins-bot: Add tox setup for external-monitoring repo [integration/config] - 10https://gerrit.wikimedia.org/r/494968 (https://phabricator.wikimedia.org/T217599) (owner: 10Volans) [16:08:02] 10Project-Admins: Create milestones of Deployments project - https://phabricator.wikimedia.org/T217843 (10MarkAHershberger) Let's number these: 1. 
I was told "milestones" and didn't realize there was an actual thing in phab called "milestones" before reading the docs you pointed me to. I initially thought the d... [16:14:32] Yippee, build fixed! [16:14:32] Project beta-scap-eqiad build #240493: 09FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240493/ [16:24:07] Project beta-scap-eqiad build #240494: 15ABORTED in 8 min 15 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240494/ [16:24:10] ^ me [16:26:33] stop breaking stuff!!!!!! [16:28:21] 10Phabricator, 10Release-Engineering-Team (Kanban): Mass-edits via @Phabricator_maintenance account stop after 11 tasks - https://phabricator.wikimedia.org/T205258 (10MBinder_WMF) FWIW, @Phabricator_maintenance seems to be batching just fine for me. The notifications are being sent, which is a problem already... [16:30:59] Reedy: would that it were so simple :( [16:45:55] 10Release-Engineering-Team (Kanban): Investigate sourcehut builds - https://phabricator.wikimedia.org/T217852 (10brennen) [17:20:24] Project beta-scap-eqiad build #240499: 04FAILURE in 6.6 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240499/ [17:34:27] Yippee, build fixed! [17:34:27] Project beta-scap-eqiad build #240500: 09FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240500/ [18:36:44] 10Continuous-Integration-Config, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Wikidata-Query-Service-Sprint, 10User-Smalyshev: Set up Blazegraph test suite on CI - https://phabricator.wikimedia.org/T216855 (10Gehel) We now have a Jenkins job for blazegraph, but the tests are currently failing. [18:56:51] (03CR) 10Hashar: [C: 03+2] "Deployed! 
:)" [integration/config] - 10https://gerrit.wikimedia.org/r/494968 (https://phabricator.wikimedia.org/T217599) (owner: 10Volans) [18:57:42] hashar: lol, I was checking exactly right now why the recheck didn't worked and was about to ask here :) [18:57:45] thanks a lot [18:58:11] you read my mind... kinda scary :) [18:58:22] 10Release-Engineering-Team (Kanban), 10Scap, 10User-zeljkofilipin: Problems deploying dblists/commonsuploads.dblist - https://phabricator.wikimedia.org/T217830 (10thcipriani) I think I have a theory, and I think it could explain how this keeps happening. Here's a diagram: | Time | Action | `InitialiseSett... [19:00:11] volans: sorry I was in a meeting / digged into ldap/nscd/nslcd / and looking at some Docker madness :) [19:00:36] no problem at all, you already solved it, that's great :) [19:01:09] if/when you've time I also have https://gerrit.wikimedia.org/r/c/integration/config/+/491793 , but no hurry at all, can wait [19:16:33] 10Continuous-Integration-Config, 10Release-Engineering-Team (Backlog), 10Discovery-Search (Current work), 10Patch-For-Review: Use maven wrapper (mvnw) to build maven based project from search platform team - https://phabricator.wikimedia.org/T208938 (10hashar) I got the images build: ` docker-registry.wiki... [19:24:38] 10Release-Engineering-Team (Kanban), 10Scap, 10User-zeljkofilipin: Problems deploying dblists/commonsuploads.dblist - https://phabricator.wikimedia.org/T217830 (10thcipriani) >>! In T217830#5009234, @thcipriani wrote: > How do we fix this? We could set the mtime way in the future in scap, I guess. This syste... [19:29:06] (03PS1) 10Hashar: Update Jenkins job to use latest java8 containers [integration/config] - 10https://gerrit.wikimedia.org/r/495022 (https://phabricator.wikimedia.org/T208938) [19:29:47] (03CR) 10Hashar: "Lets sync up and deploy together?" 
[integration/config] - https://gerrit.wikimedia.org/r/495022 (https://phabricator.wikimedia.org/T208938) (owner: Hashar) [19:49:43] Release-Engineering-Team (Kanban), Scap, User-zeljkofilipin: Problems deploying dblists/commonsuploads.dblist - https://phabricator.wikimedia.org/T217830 (hashar) From git log the filemtime has been there since at least 2011 / MediaWiki 1.17. We had php 5.2 at that time [[ https://web.archive.org/web... [19:50:10] hey all! How can I find out the precise date wmf/1.33.0-wmf.16 went live to English Wikipedia? [19:50:17] with difficulty [19:50:39] there's a schedule but sometimes deployments go back and forth a bit, let's see if this one did [19:50:50] via @Niharika: : https://www.mediawiki.org/wiki/MediaWiki_1.33/Roadmap [19:51:03] according to the schedule Thursday, 07 February 2019 [19:51:18] that seems to match that table! [19:51:29] that was what I was reading from [19:51:50] thanks @Krenair ! [19:51:57] 21:43 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.33.0-wmf.16 refs T206670 [19:51:58] T206670: 1.33.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T206670 [19:52:02] from SAL for that day [19:52:35] looks like it went fine [19:55:41] jdlrobson, I don't know how common it is now but in the past there have been occasions where a deployment had to get reverted and re-run later [19:55:58] so wasn't necessarily a singular precise date [19:56:16] probably doesn't happen very often to enwiki [19:59:00] MediaWiki-Codesniffer: Missing detection of incorrect spacing in function syntax - https://phabricator.wikimedia.org/T217861 (Anomie) [20:34:26] Release-Engineering-Team (Kanban), Scap, User-zeljkofilipin: Problems deploying dblists/commonsuploads.dblist - https://phabricator.wikimedia.org/T217830 (hashar) I am tempted to just merge this as a duplicate of the old task T181833 (and maybe update its description with Tyler comments above). For...
[20:37:45] hashar: odds are vry good that TimStarling has a copy of that old config repo, [20:38:02] he has been able to dig up stuff like that in the past [20:40:16] I have a feeling people went looking for these files before [20:58:59] Project beta-scap-eqiad build #240518: FAILURE in 1.3 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240518/ [20:59:16] Continuous-Integration-Infrastructure: phpunit drops dead on some extension tests - https://phabricator.wikimedia.org/T217384 (Smalyshev) Happened again: https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/38428/console [21:01:37] apergos: thank you. Yeah I would guess Tim/Daniel/B.rion would know :] [21:02:06] apergos: Krenair: if we found them we will look at documenting where they are on wikitech (maybe it is already documented but I could not find it out ) [21:02:16] and I am off to bed^H^H^Hfixing a computer [21:02:18] no, it would be on someone's private disks [21:02:22] awww [21:02:27] good luck, hope you get to sleep soon! [21:02:31] thanks! :) [21:20:38] Yippee, build fixed! [21:20:39] Project beta-scap-eqiad build #240519: FIXED in 16 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240519/ [21:45:31] thcipriani: marxarelli: If I wanted to add tests using the `service-pipeline-test-and-publish` template where would I do that? Is it this repo https://gerrit.wikimedia.org/g/operations/deployment-charts? [21:48:35] clarakosi: what kinds of tests? like functional/systems/end-to-end tests? [21:48:54] yeah functional tests [21:49:12] got it [21:50:06] so tl;dr: yes, you'll need a deployment chart so the application can be deployed to a "ci" cluster by the service-pipeline-test-and-publish job [21:50:42] after which, `helm test` will be executed against the deployment [21:51:34] Is there a dev environment to test this or is my local one enough? And what happens if the tests fail does it revert the merge?
[21:51:42] helm will then deploy any pods defined under the chart's `templates/tests` directory [21:52:46] the pods are meant to execute whatever system test you like [21:53:12] clarakosi: the image will remain published, but it won't be tagged as `[timestamp]-production` [21:53:25] (if `helm test` fails that is) [21:54:07] clarakosi: see mathoid's `templates/tests` directory for an example of a Pod definition used for testing https://releases.wikimedia.org/charts/mathoid/templates/tests/test-service-checker.yaml [21:55:51] hmmm we haven't implemented our service checker spec yet but I think I get the gist [21:56:40] it can be any process, not necessarily service-checker, though that is what we've decided to use for k8s hosted services thus far [21:57:07] and a very basic means of ensuring successful deployment [21:58:50] *as* a very basic [22:01:14] Project beta-scap-eqiad build #240523: FAILURE in 5.9 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240523/ [22:01:16] assuming we can figure out the cassandra aspect of it this might actually work pretty well for both our functional and integration tests [22:01:35] awesome [22:02:42] i'm not sure about the cassandra part either but definitely down to help give it a shot [22:03:27] Release-Engineering-Team (Backlog), Keyholder, Operations: Keyholder phab repo duplicate work - https://phabricator.wikimedia.org/T203003 (hashar) [22:07:03] marxarelli: thanks! And last question how does the ci go about installing dependencies? Does it automatically run `helm dep update`? [22:10:25] clarakosi: it doesn't handle dependencies... yet :) [22:10:48] er, rather we haven't tried to make it handle dependencies yet [22:12:48] ahh ok [22:16:36] Yippee, build fixed!
[22:16:36] Project beta-scap-eqiad build #240524: FIXED in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240524/ [22:18:25] marxarelli: so we currently only run helm test as part of gate-and-submit, should we also be running that for test/should that be configurable? [22:18:40] clarakosi: yeah, but looking at the helm docs, modifying the job to install dependencies _should_ be straightforward... oh, and to your question about where to test, a local minikube setup is one option [22:18:53] thcipriani: oh, right. that's true [22:20:33] the only issue i can see with having that be configurable (allowing publishing of images and test deployment) pre-merge is that we'd be polluting the registry quite a bit [22:20:35] that other part of that is, of course, we have to deploy to the "ci" namespace in staging, which means we won't be able to use wmcs hosts for that :\ [22:20:38] docker registry [22:21:08] oh right, that, too [22:21:14] Project beta-update-databases-eqiad build #32257: FAILURE in 1 min 13 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/32257/ [22:21:16] although we don't have too many projects currently [22:21:28] Release-Engineering-Team (Kanban), Developer Productivity: Add tests to local-charts / configure local-charts for CI - https://phabricator.wikimedia.org/T217868 (brennen) [22:21:57] Release-Engineering-Team, Developer Productivity, Epic: Improve Developer Tooling - https://phabricator.wikimedia.org/T212449 (brennen) [22:21:59] Release-Engineering-Team (Kanban), Developer Productivity: Add tests to local-charts / configure local-charts for CI - https://phabricator.wikimedia.org/T217868 (brennen) [22:23:37] marxarelli: thanks I'll try minikube :) [22:23:43] np! [22:25:16] thcipriani: we don't have that many projects now, but we should really think about capacity soon :/ [22:25:46] I worry more about contint1001 than I do about the docker registry in that respect.
[22:26:08] yeah, overloading contint1001 would be terrible [22:26:16] speaking of which, I know I owe you some code review for your cleanup [22:26:26] oh yeah! [22:26:37] I made a drive-by comment last week [22:27:20] i noticed :) [22:27:25] it was accurate [22:27:41] not sure where i saw that `chomp` method [22:27:49] er `chop` [22:28:43] but `collate` is right [22:29:12] Project beta-scap-eqiad build #240526: FAILURE in 5.2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240526/ [22:30:03] cool, I'll try to get that some review this week :) [22:41:35] Release-Engineering-Team (Kanban), Developer Productivity: Gather and Analyze Information Around Developer Tooling Woes - https://phabricator.wikimedia.org/T212454 (jeena) Analysis was completed and results were posted as well as emailed to engineering and wikitech-l: https://www.mediawiki.org/wiki/Devel... [22:44:24] Yippee, build fixed! [22:44:25] Project beta-scap-eqiad build #240527: FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/240527/ [22:47:12] Release-Engineering-Team (Kanban), Developer Productivity, Epic: Automate LocalSettings.php creation in local charts - https://phabricator.wikimedia.org/T217869 (jeena) p:Triage→Normal [22:48:45] Release-Engineering-Team (Kanban), Developer Productivity, Epic: Automate LocalSettings.php creation for local-charts - https://phabricator.wikimedia.org/T217869 (jeena) [23:07:26] Release-Engineering-Team, Developer Productivity, Epic: Create official docker images for Mediawiki and services used in the local development environment - https://phabricator.wikimedia.org/T217872 (jeena) p:Triage→Normal [23:15:17] Release-Engineering-Team (Kanban), Developer Productivity, local-charts: Add tests to local-charts / configure local-charts for CI - https://phabricator.wikimedia.org/T217868 (brennen) [23:15:42] Release-Engineering-Team (Kanban), Developer Productivity, local-charts, 
Epic: Automate LocalSettings.php creation for local-charts - https://phabricator.wikimedia.org/T217869 (brennen) [23:16:01] Release-Engineering-Team, Developer Productivity, local-charts, Epic: Create official docker images for Mediawiki and services used in the local development environment - https://phabricator.wikimedia.org/T217872 (brennen) [23:21:02] Yippee, build fixed! [23:21:03] Project beta-update-databases-eqiad build #32258: FIXED in 1 min 1 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/32258/ [23:30:42] Continuous-Integration-Infrastructure: phpunit drops dead on some extension tests - https://phabricator.wikimedia.org/T217384 (Smalyshev) This time the log is produced and ends with: ` PHPUnitCommand] Start test Scribunto_LuaTextLibraryTest::testLua with data set #70 Parser: using preprocessor: Preprocessor... [23:31:24] Continuous-Integration-Infrastructure, MediaWiki-extensions-Scribunto: phpunit drops dead on some extension tests - https://phabricator.wikimedia.org/T217384 (Smalyshev) [23:32:51] Continuous-Integration-Infrastructure, MediaWiki-extensions-Scribunto: phpunit drops dead on some extension tests - https://phabricator.wikimedia.org/T217384 (Smalyshev) p:Triage→Unbreak! [23:32:53] Phabricator, User-greg: Document what "routine maintenance tasks" are (performed by @Phabricator_maintenance) - https://phabricator.wikimedia.org/T142904 (greg) Open→Resolved a:greg Edited. [23:35:06] greg-g: do you know anything about this per chance: https://phabricator.wikimedia.org/T217384 [23:36:00] SMalyshev: looks like you're right in your guess [23:36:44] greg-g: has it happened before? any ideas what to do with it?
it breaks several builds for me :( [23:37:10] no updates to scribunto for a while: https://gerrit.wikimedia.org/r/q/project:mediawiki%252Fextensions%252FScribunto (non-l10n that is) [23:40:13] Continuous-Integration-Infrastructure, Editing-team, MediaWiki-extensions-Scribunto: phpunit drops dead on some extension tests - https://phabricator.wikimedia.org/T217384 (greg) Adding #editing-team per Dev/Maintainers. Please take a look at this test failure. [23:40:34] give that it's a segfault I wonder if it's some random old bug... [23:45:31] (PS3) Thcipriani: Sonar: job template for change vs branch [integration/config] - https://gerrit.wikimedia.org/r/490950 (https://phabricator.wikimedia.org/T215175)