[00:13:34] Continuous-Integration-Config, Fundraising-Backlog: wikimedia/fundraising/tools CI jobs are broken - https://phabricator.wikimedia.org/T117818#1784282 (awight) NEW
[00:18:46] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:23:36] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 30325 bytes in 0.690 second response time
[01:07:38] PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[01:33:43] RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 1.10 ms
[01:47:44] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[02:27:44] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0]
[04:22:29] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #613: FAILURE in 30 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/613/
[05:34:32] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce build #593: FAILURE in 32 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce/593/
[07:05:52] Gitblit-Deprecate, Diffusion: Replicate open patchsets to diffusion - https://phabricator.wikimedia.org/T89940#1784795 (demon) >>! In T89940#1783787, @Paladox wrote: > Oh well then how does gitblit replicate patches that haven't been merged yet. Gitblit doesn't, Gerrit replicates all of refs/* to Gitbli...
[08:34:08] Gitblit-Deprecate, Diffusion: Replicate open patchsets to diffusion - https://phabricator.wikimedia.org/T89940#1784894 (Paladox) Oh ok. Could that be a bug.
[08:52:58] (CR) Nikerabbit: "I don't see how this can work. in_array and array_key_exists do not do the same thing. One compares values, the other keys." [tools/codesniffer] - https://gerrit.wikimedia.org/r/250956 (owner: Thiemo Mättig (WMDE))
[08:54:37] Differential: Allow self-reviewing changes; make test plan optional - https://phabricator.wikimedia.org/T114575#1784907 (mmodell) I already made test plan optional.
[09:30:38] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[09:39:55] (CR) Thiemo Mättig (WMDE): "It works because we made sure that all arrays used in the new array_key_exists calls have keys to work with." [tools/codesniffer] - https://gerrit.wikimedia.org/r/250956 (owner: Thiemo Mättig (WMDE))
[09:44:06] (CR) Thiemo Mättig (WMDE): "No, because /#+/ would also match hash characters in the middle of a comment. I assume the regex should be /^##+$/ to match sequences of t" [tools/codesniffer] - https://gerrit.wikimedia.org/r/250996 (owner: Thiemo Mättig (WMDE))
[09:47:57] Project browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce-T99655 build #3: FAILURE in 58 sec: https://integration.wikimedia.org/ci/job/browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce-T99655/3/
[09:57:25] PROBLEM - Host deployment-cache-parsoid04 is DOWN: CRITICAL - Host Unreachable (10.68.19.197)
[09:58:29] (PS1) Hashar: Let fundraising/tools gate properly [integration/config] - https://gerrit.wikimedia.org/r/251208
[10:00:36] (CR) Hashar: [C: 2] ""We're a little hosed, this repo is not actually gate-and-submitting: https://gerrit.wikimedia.org/r/#/c/250746/1"" [integration/config] - https://gerrit.wikimedia.org/r/251208 (owner: Hashar)
[10:02:01] (Merged) jenkins-bot: Let fundraising/tools gate properly [integration/config] - https://gerrit.wikimedia.org/r/251208 (owner: Hashar)
[10:07:09] (CR) Nikerabbit: "Would have been helpful to mention that PHP_CodeSniffer_Tokens::$emptyTokens has same keys as values." [tools/codesniffer] - https://gerrit.wikimedia.org/r/250956 (owner: Thiemo Mättig (WMDE))
[10:15:18] Continuous-Integration-Config, Fundraising-Backlog: wikimedia/fundraising/tools CI jobs are broken - https://phabricator.wikimedia.org/T117818#1785044 (hashar) Open>Resolved a:hashar I fixed it with https://gerrit.wikimedia.org/r/#/c/251208/ , the reason is gate-and-submit only has the jobs `jsh...
[10:20:17] Differential, Gerrit-Migration: Commit hashes of landed patches need not match the latest ones shown in Differential - https://phabricator.wikimedia.org/T162#1785051 (hashar) For what it is worth, it seems arc default behavior is similar to the Gerrit //cherry-pick// merge strategy. If set on a repo, Gerr...
[10:21:48] Differential, Gerrit-Migration: Pulling patches from Phabricator does not give consistent commit hashes - https://phabricator.wikimedia.org/T136#1785052 (hashar)
[10:22:01] Differential, Gerrit-Migration: Commit hashes of landed patches need not match the latest ones shown in Differential - https://phabricator.wikimedia.org/T162#2182 (hashar)
[10:24:25] Continuous-Integration-Infrastructure, Browser-Tests-Infrastructure, MediaWiki-extensions-GettingStarted: Delete or fix failed GettingStarted browsertests Jenkins job - https://phabricator.wikimedia.org/T94154#1785061 (zeljkofilipin) duplicate>Open
[10:24:26] Release-Engineering-Epics, Browser-Tests-Infrastructure, Epic, Tracking: Fix or delete failing browser tests Jenkins jobs - https://phabricator.wikimedia.org/T94150#1785062 (zeljkofilipin)
[11:12:35] Continuous-Integration-Infrastructure, Release-Engineering-Team, Zuul: Zuul-cloner should use git-clean to reset workspace - https://phabricator.wikimedia.org/T76304#1785182 (hashar)
[11:12:36] Continuous-Integration-Infrastructure, Zuul: Zuul: python git assert error assert len(fetch_info_lines) == len(fetch_head_info) - https://phabricator.wikimedia.org/T61991#1785183 (hashar)
[11:12:38] Continuous-Integration-Infrastructure: Upgrade Zuul server to latest upstream - https://phabricator.wikimedia.org/T94409#1785179 (hashar) Open>Resolved a:hashar Upgraded Zuul on gallium and labs slave to a recent version: ``` $ git show 1cc37f7 commit 1cc37f7b469a892cdbd16db6aa1d500a1200c417 Merge:...
[11:14:43] (Abandoned) Hashar: Merge branch 'debian/precise-wikimedia' into debian/trusty-wikimedia [integration/zuul] (debian/trusty-wikimedia) - https://gerrit.wikimedia.org/r/227510 (owner: Hashar)
[11:15:41] (CR) Hashar: [C: 2] "$ ssh integration-slave-trusty-1012.integration.eqiad.wmflabs zuul-cloner --version" [integration/zuul] (debian/trusty-wikimedia) - https://gerrit.wikimedia.org/r/250442 (owner: Hashar)
[11:15:50] (CR) Hashar: [V: 2] "$ ssh integration-slave-trusty-1012.integration.eqiad.wmflabs zuul-cloner --version" [integration/zuul] (debian/trusty-wikimedia) - https://gerrit.wikimedia.org/r/250442 (owner: Hashar)
[11:18:31] Continuous-Integration-Infrastructure, Release-Engineering-Team, Zuul: Zuul-cloner should use git-clean to reset workspace - https://phabricator.wikimedia.org/T76304#1785184 (hashar) stalled>Resolved a:hashar I think I got Ori patch ( https://review.openstack.org/#/c/148121/ //Reset & clean w...
[11:24:07] Continuous-Integration-Infrastructure, Zuul: Zuul: python git assert error assert len(fetch_info_lines) == len(fetch_head_info) - https://phabricator.wikimedia.org/T61991#1785193 (hashar) stalled>Resolved a:hashar The Zuul Debian package embeds some python modules including `GitPython` which is n...
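The 09:44:06 review comment contrasts two patterns, /#+/ and /^##+$/. As a rough illustration only (Python's `re` module rather than the PHP used by tools/codesniffer, and with sample strings invented here), the difference between the unanchored and the anchored form might look like this:

```python
import re

# The two patterns quoted in the 09:44:06 review comment, written for Python.
UNANCHORED = re.compile(r"#+")    # any run of one or more '#', anywhere
ANCHORED = re.compile(r"^##+$")   # the whole string is two or more '#'

# Hypothetical comment bodies, made up for this illustration.
samples = ["issue #42 is fixed here", "####", "#", "## heading"]


def classify(text):
    """Return (unanchored_hit, anchored_hit) for one string."""
    return (UNANCHORED.search(text) is not None,
            ANCHORED.search(text) is not None)


results = {s: classify(s) for s in samples}

for text, (loose, strict) in results.items():
    print(f"{text!r}: /#+/ -> {loose}, /^##+$/ -> {strict}")
```

The unanchored pattern fires on every sample that contains a hash, which is the objection raised in the comment; the anchored one only accepts strings made up entirely of two or more hash characters.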
[11:24:29] Continuous-Integration-Infrastructure, Zuul: Upgrade Zuul server to latest upstream - https://phabricator.wikimedia.org/T94409#1785196 (hashar)
[11:47:17] Differential, Gerrit-Migration, Documentation: Create example workflows for differential showing old way and new way side by side - https://phabricator.wikimedia.org/T117058#1785237 (mmodell)
[11:48:55] would someone also add me to the wmf-deployment gerrit group?
[11:49:16] aude: ?^^
[11:51:26] jzerebecki: can you link me to the ticket for that?
[11:51:45] * aude is about to run away for a few hours but maybe can do later
[11:51:53] aude: https://phabricator.wikimedia.org/T116487
[11:51:55] k
[11:52:04] no hurry
[11:52:05] Differential, Gerrit-Migration, Documentation: Create example workflows for differential showing old way and new way side by side - https://phabricator.wikimedia.org/T117058#1785250 (mmodell)
[11:52:45] \o/
[11:52:59] so you can merge wmf/* stuff?
[11:54:45] jzerebecki: "a new build which should be prepared before a SWAT, to not overly tax the time required during a SWAT"
[11:55:16] we can merge wikibase and make the build, but not merge the wikidata deployment build before swat
[11:55:36] aude: yes
[11:56:24] most everything is documented, and as long as we follow that, then everything is normally ok
[12:05:53] jzerebecki: added :)
[12:06:33] works. thx.
[12:13:30] Beta-Cluster-Infrastructure: Logging out of Commons beta: Cannot contact the database server: Unknown database 'incubatorwiki' - https://phabricator.wikimedia.org/T71898#1785311 (hashar) Open>Resolved a:hashar There are no more any occurrences of `incubatorwiki` in the CentralAuth database: ``` mysq...
[12:26:06] Release-Engineering-Team, Browser-Tests-Infrastructure, Security: Update all repositories that use mediawiki_selenium Ruby gem to version 1.6.x - https://phabricator.wikimedia.org/T114241#1785326 (zeljkofilipin)
[12:39:10] Continuous-Integration-Infrastructure, Patch-For-Review: use one job for all CI entry points - https://phabricator.wikimedia.org/T111181#1785344 (hashar) Nodepool has the ability to spawn several nodes associated to a job. That is done on the labels: ``` labels: - name: multi-precise image: precise...
[12:50:39] Continuous-Integration-Scaling, operations, Patch-For-Review: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1785355 (hashar) @robh scandium has been installed with Trusty. Would need to reimage it to Jessie instead (sorry). Some firewall rules have been...
[12:51:09] Continuous-Integration-Scaling, operations, Patch-For-Review: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1785357 (hashar)
[14:08:35] (PS3) Hashar: [Offline] Update jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/245928 (owner: Paladox)
[14:11:40] (CR) Hashar: [C: 2] [Offline] Update jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/245928 (owner: Paladox)
[14:13:45] (Merged) jenkins-bot: [Offline] Update jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/245928 (owner: Paladox)
[14:17:02] PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[14:45:12] (PS1) Hashar: zuul: experimental debian-glue for Nodepool [integration/config] - https://gerrit.wikimedia.org/r/251244
[14:45:26] (PS2) Hashar: zuul: experimental debian-glue for Nodepool [integration/config] - https://gerrit.wikimedia.org/r/251244
[14:45:37] (CR) Hashar: [C: 2] zuul: experimental debian-glue for Nodepool [integration/config] - https://gerrit.wikimedia.org/r/251244 (owner: Hashar)
[14:47:03] (Merged) jenkins-bot: zuul: experimental debian-glue for Nodepool [integration/config] - https://gerrit.wikimedia.org/r/251244 (owner: Hashar)
[14:48:00] Deployment-Systems, Release-Engineering-Team: Move the train deployment from Thursday to Wednesday for some Wikipedia sites - https://phabricator.wikimedia.org/T115002#1785473 (Amire80)
[14:52:45] Release-Engineering-Team, Browser-Tests-Infrastructure, Security: Update all repositories that use mediawiki_selenium Ruby gem to version 1.6.x - https://phabricator.wikimedia.org/T114241#1785480 (JanZerebecki)
[15:05:19] ostriches: morning :-) Any clue how to change the HEAD of a Gerrit repository? operations/debs/nodepool has HEAD -> master but there is no master branch, only 'debian'.
[15:05:37] ostriches: seems recent gerrit versions have a 'set-head' command but it is not available on our version :/
[15:05:58] `git symbolic-ref HEAD refs/heads/master`
[15:06:05] Swap 'master' for your branch of choice
[15:06:18] I need server side
[15:06:21] for zuul merger
[15:06:39] when it updates / resets the repo it does something like:
[15:06:45] repo.head.reference = origin.refs['HEAD']
[15:06:45] repo.reset()
[15:06:52] thus it expects origin/HEAD to point to something valid
[15:07:40] or is that something we should do on ytterbium + restart Gerrit?
[15:07:47] No restart needed.
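The exchange above is about a bare repository whose HEAD names a branch that does not exist (HEAD -> master while only 'debian' exists). Below is a minimal sketch of how such a dangling HEAD could be detected, assuming git's default on-disk layout (a `HEAD` file containing `ref: <name>`, loose refs under `refs/`, and an optional `packed-refs` file). The helper name `head_is_dangling` and the throwaway directory are made up for illustration and are not part of Gerrit, Zuul or GitPython:

```python
import os
import tempfile


def head_is_dangling(git_dir):
    """Return True if HEAD is a symbolic ref whose target does not exist.

    Assumes git's classic files layout; an illustration only, not a
    replacement for asking git itself.
    """
    with open(os.path.join(git_dir, "HEAD")) as fh:
        head = fh.read().strip()
    if not head.startswith("ref: "):
        return False  # detached HEAD: it holds a commit id, not a ref name
    target = head[len("ref: "):]
    # Loose ref stored as a file, e.g. refs/heads/debian
    if os.path.isfile(os.path.join(git_dir, *target.split("/"))):
        return False
    # Packed refs: lines of "<sha> <refname>"
    packed = os.path.join(git_dir, "packed-refs")
    if os.path.isfile(packed):
        with open(packed) as fh:
            for line in fh:
                parts = line.split()
                if len(parts) == 2 and parts[1] == target:
                    return False
    return True


# Build a fake bare repo that only has a 'debian' branch (hypothetical data).
repo = tempfile.mkdtemp(suffix=".git")
os.makedirs(os.path.join(repo, "refs", "heads"))
with open(os.path.join(repo, "refs", "heads", "debian"), "w") as fh:
    fh.write("0" * 40 + "\n")  # placeholder commit id

with open(os.path.join(repo, "HEAD"), "w") as fh:
    fh.write("ref: refs/heads/master\n")
before = head_is_dangling(repo)  # HEAD -> master, but there is no master

# Same effect on the HEAD file as: git symbolic-ref HEAD refs/heads/debian
with open(os.path.join(repo, "HEAD"), "w") as fh:
    fh.write("ref: refs/heads/debian\n")
after = head_is_dangling(repo)

print(before, after)
```

On a real server the change itself would be made with `git symbolic-ref` rather than by writing the file directly; the sketch only shows why a consumer that resolves origin/HEAD trips over the first state and not the second.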
[15:07:48] Just did it
[15:07:59] gerrit2@ytterbium /var/lib/gerrit2/review_site/git/operations/debs/nodepool.git (BARE:master)$ git symbolic-ref HEAD refs/heads/debian
[15:08:11] you are a hero :-)
[15:08:16] Now shows:
[15:08:17] gerrit2@ytterbium /var/lib/gerrit2/review_site/git/operations/debs/nodepool.git (BARE:debian)$
[15:14:26] ostriches: all good
[15:21:43] Continuous-Integration-Config: Make debian-glue to use Gerrit as the upstream repository - https://phabricator.wikimedia.org/T117869#1785557 (hashar) NEW
[15:35:30] ostriches: do I merge https://gerrit.wikimedia.org/r/#/c/251028/?
[15:37:50] PROBLEM - Host deployment-cache-upload04 is DOWN: CRITICAL - Host Unreachable (10.68.18.109)
[15:38:43] !log kicking puppetmaster on integration
[15:40:08] upload04 is down?
[15:41:00] chasemp: Left myself a nitpick about it, but anyone can ya.
[15:42:10] ok I'll leave it be for your self nitpick :)
[15:42:19] !log poke
[15:42:23] stupid bots
[15:44:19] jstart -N qa-morebot /usr/lib/adminbot/adminlogbot.py --config ./confs/qa-logbot.py
[15:44:21] does the trick
[15:44:34] !log kicking puppetmaster on integration
[15:44:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:46:42] Project UploadWizard-api-commons.wikimedia.beta.wmflabs.org build #2902: FAILURE in 1 min 41 sec: https://integration.wikimedia.org/ci/job/UploadWizard-api-commons.wikimedia.beta.wmflabs.org/2902/
[16:00:16] (Abandoned) Paladox: [Translate] Add composer-test [integration/config] - https://gerrit.wikimedia.org/r/243683 (owner: Paladox)
[16:01:36] PROBLEM - Host deployment-memc04 is DOWN: CRITICAL - Host Unreachable (10.68.17.69)
[16:02:55] (PS1) Paladox: [Translate] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/251253
[16:04:34] (PS2) Paladox: [Translate] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/251253
[16:05:04] (CR) Paladox: "This can be merged now. Source patch has been merged." [integration/config] - https://gerrit.wikimedia.org/r/251253 (owner: Paladox)
[16:13:37] Continuous-Integration-Scaling, operations, Nodepool: Backport python-shade from debian/testing to jessie-wikimedia - https://phabricator.wikimedia.org/T107267#1785699 (fgiunchedi)
[16:13:39] Continuous-Integration-Scaling, operations: Backport python-os-client-config 1.3.0-1 from Debian Sid to jessie-wikimedia - https://phabricator.wikimedia.org/T104967#1785696 (fgiunchedi) Open>Resolved a:fgiunchedi ok I've uploaded `python-os-client-config` ``` root@carbon:~# reprepro -C backports...
[16:14:04] Continuous-Integration-Scaling, operations, Nodepool: Backport python-shade from debian/testing to jessie-wikimedia - https://phabricator.wikimedia.org/T107267#1491090 (fgiunchedi) all dependencies should be available now internally, please try to backport
[16:24:22] PROBLEM - Host deployment-db1 is DOWN: CRITICAL - Host Unreachable (10.68.16.193)
[16:29:13] PROBLEM - Host integration-slave-precise-1014 is DOWN: CRITICAL - Host Unreachable (10.68.17.16)
[16:35:24] Release-Engineering-Team, Commons, MediaWiki-File-management, MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1785750 (BBlack) It's been over a week since the email, which ended up going out a bit later after the releas...
[16:36:18] Project beta-update-databases-eqiad build #4166: FAILURE in 16 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/4166/
[16:36:28] (PS1) Hashar: debian-glue should use our /etc/pbuilderrc [integration/config] - https://gerrit.wikimedia.org/r/251264
[16:39:24] !log deployment-db1 instance is down. I guess beta cluster is dead now.
[16:39:28] qa-morebots: poke
[16:39:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:39:30] I am a logbot running on tools-exec-1213.
[16:39:30] Messages are logged to https://tools.wmflabs.org/sal/releng.
[16:39:30] To log a message, type !log .
[16:41:10] (CR) Hashar: [C: 2] "That is for operations/debs/nodepool which targets jessie-wikimedia" [integration/config] - https://gerrit.wikimedia.org/r/251264 (owner: Hashar)
[16:42:42] RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 1.05 ms
[16:42:55] (Merged) jenkins-bot: debian-glue should use our /etc/pbuilderrc [integration/config] - https://gerrit.wikimedia.org/r/251264 (owner: Hashar)
[16:49:22] thcipriani: good morning. I haven't looked at the zuul_uuid thing
[16:49:42] one sure thing, jenkins shells are isolated :-/
[16:49:54] so if we set uuid somewhere, the next shell command wouldn't know about it
[16:49:57] hashar: I'm not sure it'll work after talking with marxarelli so yeah the isolation thing.
[16:50:33] the whole point was to namespace the builds when sending them to integration-publisher
[16:50:33] is there a pattern for passing dynamic parameters to the next build?
[16:50:39] there is probably another way to do it
[16:50:53] maybe just use repo name / date / time
[16:51:16] that way the id would be determinable
[16:51:47] the reason I went with ZUUL_UUID is that it is already available on jobs and let me correlate a folder on integration-publisher with the event that triggered it
[16:51:52] was quite useful for the implementation
[16:51:58] but now, we can go with something else
[16:52:13] I should write on the task
[16:55:18] Sorry! This site is experiencing technical difficulties. <-- at beta, I tried to login
[16:55:35] and beta is slow
[16:56:18] Continuous-Integration-Scaling, operations, Patch-For-Review: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1785779 (RobH) a:hashar>RobH
[16:57:32] Luke081515: deployment-db1 is down
[16:57:35] that is the master database
[16:58:00] hashar: Can someone fix it, or should I create a task?
[17:01:26] Continuous-Integration-Config, Differential, Patch-For-Review: Allow `doc-publish` to be run without zuul dependency - https://phabricator.wikimedia.org/T117770#1785791 (hashar) The change above got reverted. It was setting a UUID variable but shells in Jenkins are isolated and come up with a clean e...
[17:01:39] thcipriani: replied https://phabricator.wikimedia.org/T117770#1785791
[17:02:30] hashar: awesome, thank you! I'll try that today.
[17:02:35] Luke081515: would you mind helping by filing a task about deployment-db1 being down please ?
[17:04:27] hashar: ok, wait a moment
[17:04:35] thcipriani: and mukunda suggested creating a copy of the job just for scap
[17:04:44] thcipriani: this way if something screws up it has barely any impact.
[17:05:16] be extra careful with the rsync between gallium - integration-publisher. An unset variable might end up killing the whole tree on /srv/org/wikimedia/doc/ :/
[17:05:39] done
[17:05:57] T117881
[17:05:58] hasharmeeting: yup, that's the plan for testing: make a "new" job for downstream.
[17:06:26] Luke081515: awesome thanks. Going to ping some ops list in #wikimedia-labs
[17:07:03] Beta-Cluster-Infrastructure, Database: deployment-db1 is down - https://phabricator.wikimedia.org/T117881#1785817 (Luke081515) NEW
[17:07:14] hm, wikibugs is slow
[17:07:22] might be related
[17:08:20] some labvirt** has some issue
[17:08:27] that might impact nodepool instances
[17:08:30] :-(
[17:11:55] hasharmeeting: yeah, horizon seems to be on the fritz as well
[17:12:01] libvirt issues would explain that
[17:12:32] Beta-Cluster-Infrastructure, Database: deployment-db1 is down - https://phabricator.wikimedia.org/T117881#1785830 (coren) labvirt1002 seems ill; not sure why yet. Probably affected instances are: ```| e68c80c9-e6de-4be7-83b8-e2b04fc89308 | accounts-application3 | labvirt100...
[17:14:32] marxarelli: yeah mentioned that to -labs
[17:14:48] marxarelli: also I replied earlier about scap doc / zuul uuid on https://phabricator.wikimedia.org/T117770#1785791
[17:14:48] PROBLEM - Host deployment-mathoid is DOWN: CRITICAL - Host Unreachable (10.68.17.222)
[17:14:49] Beta-Cluster-Infrastructure, Labs, Tool-Labs, Database: deployment-db1 is down - https://phabricator.wikimedia.org/T117881#1785848 (valhallasw)
[17:16:28] Beta-Cluster-Infrastructure, Labs, Tool-Labs, Database: Several hosts on virt1002 are down - https://phabricator.wikimedia.org/T117881#1785861 (valhallasw)
[17:23:39] Gitblit-Deprecate, Diffusion: redirect gerrit repo paths to diffusion callsigns - https://phabricator.wikimedia.org/T110607#1785892 (mmodell) a:mmodell
[17:23:56] Gitblit-Deprecate, Diffusion: redirect gerrit repo paths to diffusion callsigns - https://phabricator.wikimedia.org/T110607#1582193 (mmodell) p:Triage>High
[17:24:37] Release-Engineering-Team, Commons, MediaWiki-File-management, MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1785897 (BBlack) Open>Resolved a:BBlack
[17:47:13] RECOVERY - Host deployment-db1 is UP: PING OK - Packet loss = 0%, RTA = 453.13 ms
[17:48:05] RECOVERY - Host deployment-cache-upload04 is UP: PING OK - Packet loss = 0%, RTA = 0.57 ms
[17:49:38] RECOVERY - Host deployment-mathoid is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms
[17:51:16] RECOVERY - Host integration-slave-precise-1014 is UP: PING OK - Packet loss = 0%, RTA = 1.78 ms
[17:53:20] RECOVERY - Host deployment-memc04 is UP: PING OK - Packet loss = 0%, RTA = 0.97 ms
[18:10:41] !log Running mysqlcheck to verify databases on deployment-db1 after https://phabricator.wikimedia.org/T117881
[18:10:46] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[18:15:20] Beta-Cluster-Infrastructure, Labs, Tool-Labs, Database: Several hosts on virt1002 are down - https://phabricator.wikimedia.org/T117881#1786036 (dduvall) `deployment-db1` is back up and replication to `deployment-db2` is running.
[18:17:02] hey! if I have a new repo with python stuff, and I want to add generic python checker to it, how do I do that?
[18:18:32] SMalyshev: we use tox to run everything
[18:18:48] SMalyshev: https://www.mediawiki.org/wiki/Continuous_integration/Tutorials/Test_your_python
[18:19:56] legoktm: thanks!
[18:21:45] legoktm: so I don't need to do any config changes, as soon as the repo has the files it will run the tests?
[18:21:56] or I still need to patch the yaml?
[18:23:07] Yippee, build fixed!
[18:23:08] Project beta-update-databases-eqiad build #4168: FIXED in 3 min 6 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/4168/
[18:23:17] !log All deployment-db1 tables appear OK
[18:23:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[18:28:02] SMalyshev: there's a 3 line change to zuul/layout.yaml that needs to be done, I'll update the wiki page in a few minutes
[18:28:24] legoktm: thank you!
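Two points from the ZUUL_UUID discussion earlier (16:49 to 16:51) can be sketched in code: a variable set in one shell step is not visible to the next one, and an identifier built from repo name plus date/time can be recomputed by any step. This is only an illustration, using Python child processes in place of Jenkins shell steps; `DEMO_UUID`, `build_id`, the repo name and the timestamp format are assumptions made up here, not the scheme that was actually adopted:

```python
import subprocess
import sys
from datetime import datetime, timezone


def run_step(code):
    """Run one 'build step' in a fresh child process and return its stdout."""
    out = subprocess.run([sys.executable, "-c", code],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()


# Step 1 sets a variable, but only inside its own process environment.
run_step("import os; os.environ['DEMO_UUID'] = 'abc123'")
# Step 2 starts from the parent's environment and never sees that value.
seen = run_step("import os; print(os.environ.get('DEMO_UUID', 'unset'))")


def build_id(repo, when):
    """Deterministic id: the same repo and timestamp give the same string."""
    return f"{repo.replace('/', '-')}-{when.strftime('%Y%m%dT%H%M%SZ')}"


# Arbitrary example inputs, not taken from the log.
stamp = datetime(2015, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
example = build_id("example/repo", stamp)

print(seen, example)
```

Because `build_id` depends only on its inputs, a later step that knows the repo name and the triggering time can rebuild the same namespace without any shared state between shells.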
[18:28:48] legoktm: ah, I think I got it, https://www.mediawiki.org/wiki/Continuous_integration/Tutorials/Test_your_python#Editing_Zuul_configuration
[18:28:52] righit?
[18:28:57] *right
[18:29:00] no, that's outdated too :P
[18:29:04] :S
[18:29:41] heh
[18:34:17] SMalyshev, ebernhardson: https://www.mediawiki.org/wiki/Continuous_integration/Tutorials/Test_your_python#Editing_Zuul_configuration
[18:34:20] much simpler now!
[18:39:15] Beta-Cluster-Infrastructure, Deployment-Systems, Patch-For-Review, User-bd808: scap on beta does not sync deployment-bastion /srv/mediawiki - https://phabricator.wikimedia.org/T117574#1786282 (demon)
[18:42:44] legoktm: cool, thanks!
[18:46:30] Yippee, build fixed!
[18:46:31] Project UploadWizard-api-commons.wikimedia.beta.wmflabs.org build #2904: FIXED in 17 sec: https://integration.wikimedia.org/ci/job/UploadWizard-api-commons.wikimedia.beta.wmflabs.org/2904/
[18:51:09] Gitblit-Deprecate, Diffusion: Replicate open patchsets to diffusion - https://phabricator.wikimedia.org/T89940#1786308 (Paladox) Please see https://secure.phabricator.com/T4292
[18:58:51] (PS1) Smalyshev: Add wikimedia/discovery/relevancylab project with python tests [integration/config] - https://gerrit.wikimedia.org/r/251305
[19:10:51] (PS2) Legoktm: Add wikimedia/discovery/relevancylab project with python tests [integration/config] - https://gerrit.wikimedia.org/r/251305 (owner: Smalyshev)
[19:11:09] (CR) Legoktm: [C: 2] "PS2: I moved the repository up with the other wikimedia/* ones in the file" [integration/config] - https://gerrit.wikimedia.org/r/251305 (owner: Smalyshev)
[19:15:11] (Merged) jenkins-bot: Add wikimedia/discovery/relevancylab project with python tests [integration/config] - https://gerrit.wikimedia.org/r/251305 (owner: Smalyshev)
[19:16:36] !log deploying https://gerrit.wikimedia.org/r/251305
[19:16:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[19:17:16] PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[19:17:19] SMalyshev: ^^
[19:17:50] legoktm: thanks!
[19:37:33] (PS1) Legoktm: Create and use tox-jessie zuul template [integration/config] - https://gerrit.wikimedia.org/r/251313
[19:39:24] (CR) Legoktm: [C: 2] "no-op" [integration/config] - https://gerrit.wikimedia.org/r/251313 (owner: Legoktm)
[19:43:26] (Merged) jenkins-bot: Create and use tox-jessie zuul template [integration/config] - https://gerrit.wikimedia.org/r/251313 (owner: Legoktm)
[19:44:23] !log deploying https://gerrit.wikimedia.org/r/251313
[19:44:29] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:17:50] Deployment-Systems, operations, Patch-For-Review: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1786577 (demon)
[20:17:52] Deployment-Systems, Scap3, Patch-For-Review: [scap] Add support for syncing /srv/mediawiki-staging including fully working git data to warm spare deploy server - https://phabricator.wikimedia.org/T104826#1786575 (demon) Open>Resolved a:demon
[20:36:36] Deployment-Systems, operations, Patch-For-Review: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1786622 (demon) Anything left on this?
[20:59:14] Deployment-Systems, Scap3, Patch-For-Review: [scap] Add support for syncing /srv/mediawiki-staging including fully working git data to warm spare deploy server - https://phabricator.wikimedia.org/T104826#1786695 (bd808) Thanks for carrying this across the finish line @demon. :)
[21:35:13] Could someone help me to fix echo in metrolook. A year ago it was split out of personal bar in metrolook and its location is to the left of it. Recent updates to echo seem to stop it working with the code. Code looks like https://dpaste.de/LgeO and repo located at https://git.wikimedia.org/summary/mediawiki%2Fskins%2FMetrolook
[21:35:30] MediaWiki-Releasing, Release-Engineering-Team, Architecture, Parsoid, and 2 others: Evaluate and decide on a distribution strategy targeted at VMs - https://phabricator.wikimedia.org/T87774#1786827 (cscott) @Nemo_bis [suggested](https://phabricator.wikimedia.org/T113210#1768956) that {T114457} mig...
[21:35:50] Release-Engineering-Team, operations, Database, Patch-For-Review, WorkType-Maintenance: Recover missing values from user_properties tables - https://phabricator.wikimedia.org/T114899#1786833 (Ciencia_Al_Poder) Resolved>declined
[21:43:03] hello people
[21:43:08] I have a scap3 question!
[21:43:24] mostly about whether I can start using it for arbitrary things (in labs) already
[21:43:26] or if I should wait
[21:43:29] and if I can, if there are docs
[21:44:09] YuviPanda: we've been working on docs this past week or so
[21:44:23] Also, yes we could set it up for a project.
[21:44:23] https://doc.wikimedia.org/mw-tools-scap/
[21:44:32] * YuviPanda clicks
[21:44:50] doh. it's not up-to-date
[21:44:57] well, mostly
[21:45:19] marxarelli: I can rerun the update manually.
[21:45:27] can I run it locally?
[21:45:38] like if I have only one host to deploy to I don't want to setup a tin-like different host...
[21:46:13] thcipriani: i got it
[21:46:42] btw, did you see hashar's comment about factoring out ZUUL_UUID?
[21:46:59] marxarelli: yeah, seems like a good workaround
[21:47:37] oh shit. i broke the doc build by adding fontconfig to requirements.txt
[21:47:42] time to submit a patch ...
[21:47:44] "It used to mean, “Sync Common All PHP.” Now, it doesn’t make sense."
[21:47:50] I love that one
[21:47:53] +1
[21:48:34] YuviPanda: it currently depends on your repo being accessible via http://config['deploy_host']/[repo] it also expects the root of [repo] to be where you're deploying from.
[21:48:43] Once we merge scap3 and scap codebases to do the same backend stuff, we'll alias scap to deploy for old times' sake
[21:48:56] ah, so it needs http?
[21:49:05] which means I can't really just use it to replace fabric :(
[21:49:09] indeed. there's a ticket about removing the external dependency.
[21:49:10] but that's ok I guess.
[21:49:18] oh?
[21:49:20] Yeah what thcipriani said, make it self-hosted
[21:49:23] With like twisted or something
[21:49:40] https://phabricator.wikimedia.org/T116630
[21:50:25] but even then - if I'm running this locally, it can't access http on my local host...
[21:50:34] I wouldn't say this is something coming in the short term.
[21:51:06] YuviPanda: we'd talked about using a tunnel on the existing ssh connection
[21:51:16] ah interesting
[21:51:27] yeah I agree this doesn't look like it should be a priority :)
[21:51:41] I'm just slightly pissed at fabric for having an ancient ssh implementation >_>
[21:53:48] https://phabricator.wikimedia.org/D37 fixes the fontconfig issue
[21:54:31] thcipriani, ostriches ^
[21:54:39] accepted
[21:54:40] looking
[21:54:41] ah
[21:54:58] yup, worked for me too
[21:55:37] For some reason this made me laugh...
[21:55:42] chad@notsexy /a/ops/mediawiki-config (master)$ brew search tox
[21:55:42] detox
[21:56:09] * YuviPanda detoxes ostriches from all the brew he has consumed
[21:56:16] (PS1) Ori.livneh: Configure thumbor/conditional-sharpen and thumbor/purger [integration/config] - https://gerrit.wikimedia.org/r/251418
[21:56:45] We should sort out having a bot comment when things merge to scap.
[21:56:57] indeed.
[21:57:02] (CR) Gilles: [C: 1] Configure thumbor/conditional-sharpen and thumbor/purger [integration/config] - https://gerrit.wikimedia.org/r/251418 (owner: Ori.livneh)
[21:57:22] crap. "--enable-jpeg requested but jpeg not found, aborting"
[21:59:46] legoktm: do you per chance have any idea what's wrong here: https://integration.wikimedia.org/ci/job/tox-jessie/1309/console ?
[21:59:55] it's for https://gerrit.wikimedia.org/r/#/c/251302/ and it does have tox.ini
[22:00:01] ostriches: is there a phab event that can be seen on merge or do you just need a post-update hook or something?
[22:02:11] SMalyshev: 1309 is associated with https://gerrit.wikimedia.org/r/#/c/251177/ which didn't?
[22:02:31] Beta-Cluster-Infrastructure, Labs, Tool-Labs, Database: Several hosts on virt1002 are down - https://phabricator.wikimedia.org/T117881#1786928 (yuvipanda) Open>Resolved a:yuvipanda It just ran out of space again.
[22:02:47] Beta-Cluster-Infrastructure, Labs, Tool-Labs, Database: Several hosts on virt1002 are down - https://phabricator.wikimedia.org/T117881#1786931 (yuvipanda) (some instances were migrated off, has space now, and there's a critical alert for disk space on these hosts)
[22:02:57] legoktm: ah, hm, maybe I mis-matched the jobs?
[22:03:12] now at least I see what is broken...
[22:03:18] https://gerrit.wikimedia.org/r/#/c/251302/ is failing for other reasons
[22:11:33] twentyafterfour, thcipriani, ostriches: so i guess we need libjpeg-dev on ci instances to build the docs now
[22:11:48] on account of blockdiag
[22:11:58] yeah, that was my experience with it locally too
[22:12:17] heh... I would have thought it already existed since we have other blockdiag stuff
[22:13:05] it's the other thing, actdiag
[22:14:54] bd808: I'd hope we could notify on change close but cc twentyafterfour
[22:16:08] ostriches: bd808: in what sense do you want to be notified? like the phab email/in-app notifications?
[22:16:48] hello folks .. anyone know what is up with this php parser tests jenkins failures ... https://integration.wikimedia.org/ci/job/parsoidsvc-php-parsertests/6218/console ?
[22:17:24] ostriches: bd808: https://phabricator.wikimedia.org/settings/panel/emailpreferences/
[22:17:54] "[notify] (..when) a revision is closed"
[22:18:01] IRC
[22:18:03] Like we do for tasks
[22:18:30] wikibugs needs to be modified. It filters out all the differential activity
[22:19:00] We got a task for that?
[22:19:04] Really we should be using http hooks instead of polling phabricator anyway...
[22:19:12] That
[22:19:26] https://phabricator.wikimedia.org/T116330
[22:19:31] subbu: that looks related to one of Aaron's recent patches
[22:19:47] \o/
[22:20:11] Differential, Gerrit-Migration, Wikibugs: Broadcast Differential activity to IRC - https://phabricator.wikimedia.org/T116330#1787017 (mmodell) wikibugs code is scary. that is all.
[22:20:22] legoktm, i see. something that one of you guys can help fix? :)
[22:21:05] Differential, Gerrit-Migration, Wikibugs: Broadcast Differential activity to IRC - https://phabricator.wikimedia.org/T116330#1787023 (demon) >>! In T116330#1747176, @greg wrote: > Also, fwiw, you can still use Gerrit for scap dev, the scap3 team is just dogfooding differential to find the things we do...
[22:43:41] MediaWiki-Releasing, Release-Engineering-Team, Architecture, Parsoid, and 2 others: Evaluate and decide on a distribution strategy targeted at VMs - https://phabricator.wikimedia.org/T87774#1787061 (GWicke) @cscott, tcp ports can be bound to localhost. All of the most important services support th...
[22:45:18] (PS2) Legoktm: Configure thumbor/conditional-sharpen and thumbor/purger [integration/config] - https://gerrit.wikimedia.org/r/251418 (owner: Ori.livneh)
[22:46:57] (CR) Legoktm: [C: 2] "PS2: Used zuul tox-jessie template" [integration/config] - https://gerrit.wikimedia.org/r/251418 (owner: Ori.livneh)
[22:47:12] PROBLEM - Puppet failure on integration-slave-precise-1013 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[22:47:12] PROBLEM - Puppet failure on integration-slave-trusty-1017 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[22:48:38] (Merged) jenkins-bot: Configure thumbor/conditional-sharpen and thumbor/purger [integration/config] - https://gerrit.wikimedia.org/r/251418 (owner: Ori.livneh)
[22:49:57] thcipriani: can i steal the docs publisher bug or are you working on it?
[22:50:00] !log deploying https://gerrit.wikimedia.org/r/251418
[22:50:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[22:50:29] marxarelli: I was digging back in...wanna pair on it?
[22:50:53] thcipriani: yeah, that works. though i'm at the office so hangouts are more difficult
[22:51:03] but we can irc
[22:51:17] kk, I'll msg you.
[22:51:53] hold on one sec, maybe one of these rooms is free
[23:02:09] RECOVERY - Puppet failure on integration-slave-trusty-1017 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:02:09] RECOVERY - Puppet failure on integration-slave-precise-1013 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:14:01] stupid offices not having offices
[23:14:28] you can blame me, I was talking about rsi/carpel tunnel next to dan probably distracting him but he was too nice to complain
[23:22:56] (CR) Alex Monk: [C: 2] Add QuickSurveys branch [tools/release] - https://gerrit.wikimedia.org/r/251144 (https://phabricator.wikimedia.org/T110661) (owner: Jdlrobson)
[23:23:34] (Merged) jenkins-bot: Add QuickSurveys branch [tools/release] - https://gerrit.wikimedia.org/r/251144 (https://phabricator.wikimedia.org/T110661) (owner: Jdlrobson)
[23:28:12] PROBLEM - Puppet failure on integration-slave-trusty-1017 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[23:28:12] PROBLEM - Puppet failure on integration-slave-precise-1013 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[23:31:51] Differential, Gerrit-Migration, Wikibugs: Broadcast Differential activity to IRC - https://phabricator.wikimedia.org/T116330#1787193 (Legoktm) I started looking at adding differential support to wikibugs, the main blocker is a lack of transaction data which is currently provided by the `maniphest.gett...
[23:32:34] PROBLEM - Puppet failure on integration-slave-trusty-1023 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[23:35:34] PROBLEM - Puppet failure on integration-slave-trusty-1014 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[23:38:14] PROBLEM - Puppet failure on integration-lightslave-jessie-1002 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[23:39:51] PROBLEM - Puppet failure on integration-slave-trusty-1011 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[23:41:47] PROBLEM - Puppet failure on integration-slave-precise-1011 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[23:42:45] PROBLEM - Puppet failure on integration-slave-precise-1012 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[23:47:21] PROBLEM - Puppet failure on integration-slave-trusty-1015 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[23:54:52] PROBLEM - Puppet failure on pmcache is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[23:55:40] PROBLEM - Puppet failure on integration-slave-precise-1014 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[23:56:14] PROBLEM - Puppet failure on integration-slave-trusty-1016 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[23:57:12] PROBLEM - Puppet failure on integration-slave-trusty-1013 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[23:58:12] PROBLEM - Puppet failure on integration-slave-trusty-1012 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]