[01:16:39] 10Continuous-Integration-Infrastructure, 10pywikibot-core, 13Patch-For-Review, 07Pywikibot-tests: Add pypy to CI build machines - https://phabricator.wikimedia.org/T134235#2262290 (10jayvdb) Confirmed. pyflakes on pypy now passes on that patch. [02:09:21] SMalyshev: I was distracted with trying to fix deployment-tin earlier, I'm sorry that I never replied to your second question. I'm really not sure how to make it install from backports. I would assume that is intentionally overridden by the apt preferences that I mentioned. Removing wikimedia.pref might do the trick but puppet will likely put it back. [02:45:35] RECOVERY - Puppet run on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0] [03:12:52] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #1064: 04FAILURE in 30 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/1064/ [03:42:51] PROBLEM - Puppet run on deployment-mediawiki01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [04:22:58] RECOVERY - Puppet run on deployment-mediawiki01 is OK: OK: Less than 1.00% above the threshold [0.0] [05:17:20] twentyafterfour: thanks, I think I found the way: apt::pin seems to work [05:37:45] 10Deployment-Systems, 03Scap3: Rebuild Scap Debian package - https://phabricator.wikimedia.org/T134338#2262510 (10thcipriani) [06:15:43] (03PS1) 10Madhuvishy: Add partial support for maven-release-plugin [integration/jenkins-job-builder] - 10https://gerrit.wikimedia.org/r/286788 (https://phabricator.wikimedia.org/T132175) [06:22:22] (03PS2) 10Madhuvishy: Add partial support for maven-release-plugin [integration/jenkins-job-builder] - 10https://gerrit.wikimedia.org/r/286788 (https://phabricator.wikimedia.org/T132175) [06:56:55] (03PS3) 10Madhuvishy: Add partial support for maven-release-plugin [integration/jenkins-job-builder] - 
10https://gerrit.wikimedia.org/r/286788 (https://phabricator.wikimedia.org/T132175) [07:00:45] PROBLEM - Puppet run on deployment-db2 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:08:30] 05Gitblit-Deprecate, 06Release-Engineering-Team, 10Diffusion, 07WorkType-NewFunctionality: Use Diffusion as canonical location for browsing code repos (not gitblit) - https://phabricator.wikimedia.org/T752#2262600 (10Nemo_bis) Cf. https://www.mediawiki.org/w/index.php?title=Special:LinkSearch&limit=500&off... [07:10:07] 05Gitblit-Deprecate, 10Diffusion: Update mediawiki.org templates to link to Diffusion, not gitblit - https://phabricator.wikimedia.org/T108864#2262602 (10Nemo_bis) [07:10:10] 05Gitblit-Deprecate, 06Release-Engineering-Team, 10Diffusion, 07WorkType-NewFunctionality: Use Diffusion as canonical location for browsing code repos (not gitblit) - https://phabricator.wikimedia.org/T752#2262601 (10Nemo_bis) [07:11:41] 05Gitblit-Deprecate, 10Diffusion: Update mediawiki.org templates to link to Diffusion, not gitblit - https://phabricator.wikimedia.org/T108864#1532501 (10Nemo_bis) Special:LinkSearch seems to indicate that the description is outdated. [07:26:33] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:27:39] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:47:48] beta wp is down [07:55:00] HHVM locked up; restarted it. [08:03:56] still down thouhg [08:18:43] 05Gerrit-Migration, 10Differential: Align Differential with Wikimedia workflow (instead of aligning Wikimedia workflow with Differential) - https://phabricator.wikimedia.org/T130094#2262720 (10Aklapper) [08:22:39] hashar: for a nice bonjour - beta wp is down [08:28:11] mobrovac: bonjour! [08:28:15] mobrovac: is there a task ? 
;-} [08:28:19] nope [08:28:27] started happening 30 mins ago [08:28:34] ori restarted HHVM, but no luck [08:29:43] 10Beta-Cluster-Infrastructure: whole beta cluster is unreachable - https://phabricator.wikimedia.org/T134346#2262742 (10hashar) [08:29:56] mobrovac: I have filed a dummy task :-} [08:30:55] hehehe [08:30:59] thnx hashar! [08:31:04] will you investigate? [08:31:10] dcausse: good morning! are you looking at the beta cluster? I noticed you are logged on deployment-cache-text04 [08:31:13] mobrovac: yes [08:31:38] hashar: yes I was curious to see if the issue is the same we had last time [08:31:44] seems to be the varnish being crazy somehow [08:32:03] last time it was because varnish created tons of sockets that were not properly closed [08:32:33] but today I can't get the output for sudo lsof | wc -l :/ [08:32:58] and puppet is broken due to a known bug :( [08:33:20] sudo lsof -n | wc -l [08:33:22] 3705818 [08:33:24] !log fixed puppet on deployment-cache-text04 (race condition generating puppet.conf ) [08:33:28] eek [08:33:29] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [08:33:32] so varnish being crazy ? [08:33:38] yes I suppose :/ [08:33:41] a leak somewhere [08:33:54] and it shows high cpu [08:34:02] last time gehel restarted varnish frontend and it fixed the issue [08:34:16] I wonder if we can get traces for the varnish guru to investigate [08:34:36] I can keep the lsof dump if that helps [08:35:07] that is apparently the frontend varnish being stuck somehow [08:35:12] it shows high cpu [08:35:26] PID 26670 , might want a lsof of that specific process [08:36:10] capturing [08:36:12] * gehel reading back..
[08:37:06] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [08:37:16] https://phab.wmfusercontent.org/file/data/zgjt54hin4dczrsd5mto/PHID-FILE-vnyb5cu67kjqxmmd3xr3/lsof_varnish_frontend_%28beta_cluster%29 [08:37:40] it has a shit lot of CLOSE_WAIT :D [08:37:55] yes looks like exactly the same issue we had [08:38:13] netstat does not report the same amount of CLOSE_WAIT as lsof [08:38:36] so probably a bug in varnish [08:38:53] 10Beta-Cluster-Infrastructure: whole beta cluster is unreachable - https://phabricator.wikimedia.org/T134346#2262764 (10hashar) p:05Triage>03Unbreak! @dcausse @gehel looking at it as well. The varnish frontend shows high CPU usage. David captured lsof output: {F3965411} That shows a lot of CLOSE_WAIT. [08:39:04] CLOSE _WAIT should not take up resources... probably an indirect consequence [08:39:58] Last time, we did generate a lot of traffic, which was probably the cause of those issues... [08:41:10] bunch are deployment-cache-upload-04:80 -> deployment-restbase02:X [08:42:06] 10Beta-Cluster-Infrastructure: whole beta cluster is unreachable - https://phabricator.wikimedia.org/T134346#2262780 (10hashar) CLOSE_WAIT are mostly: ``` deployment-cache-upload-04:80 -> deployment-restbase02:X 127.0.0.1:X->127.0.0.1:3128 ``` Might just be a consequence. 
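Note: the CLOSE_WAIT tally discussed above can be reproduced from a saved lsof dump with a small script. This is only a sketch: it assumes the usual lsof layout in which TCP sockets end with the state in parentheses, and the sample lines (PIDs, file descriptors, ports) are made up for illustration rather than taken from the real dump.

```python
import re
from collections import Counter

# lsof prints the TCP state in parentheses at the end of the NAME column,
# e.g. "... TCP hostA:80->hostB:41234 (CLOSE_WAIT)".
STATE_RE = re.compile(r"\((?P<state>[A-Z_0-9]+)\)\s*$")


def count_tcp_states(lines):
    """Count TCP socket states found in lsof-style output lines."""
    counts = Counter()
    for line in lines:
        if " TCP " not in line:
            continue  # not a TCP socket entry
        match = STATE_RE.search(line)
        # Sockets without a trailing "(STATE)" are tallied separately.
        counts[match.group("state") if match else "NO_STATE"] += 1
    return counts


# Hypothetical sample lines; a real run would read the dump file instead.
sample = [
    "varnishd 26670 varnish 12u IPv4 111 0t0 TCP cache:80->restbase:41234 (CLOSE_WAIT)",
    "varnishd 26670 varnish 13u IPv4 112 0t0 TCP cache:80->restbase:41235 (CLOSE_WAIT)",
    "varnishd 26670 varnish 14u IPv4 113 0t0 TCP cache:80->client:50000 (ESTABLISHED)",
    "varnishd 26670 varnish  5r  REG 8,1 4096 42 /etc/varnish/secret",
]

for state, n in count_tcp_states(sample).most_common():
    print(state, n)
```

For a dump as large as the one mentioned (over 3M lines), passing an open file object instead of a list keeps memory use flat, since the function only iterates once over its input.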
[08:42:43] ideally a smart one will hook gdb to get some kind of trace [08:42:48] but that is really beyond my knowledge :-} [08:42:56] maybe a strace can help somehow [08:45:50] 10Beta-Cluster-Infrastructure: whole beta cluster is unreachable - https://phabricator.wikimedia.org/T134346#2262782 (10dcausse) full lsof dump on deployment-cache-text04:/home/dcausse/lsof_all_T134346.gz (37M compressed) [08:46:15] 10Beta-Cluster-Infrastructure: whole beta cluster is unreachable - https://phabricator.wikimedia.org/T134346#2262783 (10hashar) Random varnishlog for the backend: ``` varnishlog -n deployment-cache-text04 0 Debug - "VCL_error(200, OK)" 17 SessionOpen c 127.0.0.1 8402 :3128 17 ReqStart c 12... [08:48:10] fyi, there seem to be some wikidata updates going on in beta [08:48:18] saw them right when beta stopped working [08:48:52] hashar: those conns to deployment-restbase02 are because of dependency updates triggered by the wikidata updates ^ [08:48:54] I am trying a strace [08:48:59] If I remember correctly, last time we had close to 64K sockets in TIME_WAIT, this time I see around 1K [08:49:44] gehel: reported by netstat? [08:49:54] yes [08:50:07] there are more than 3M lines reported lsof [08:50:33] the dump is a 37M gzip :) [08:51:05] * gehel needs to understand better how lsof works... 
[08:51:35] it is crippled with so many features :-) [08:51:42] I did a strace -f on the frontend varnish [08:51:51] and cancelling it spurts a spam of "Process X detached" [08:52:05] seems it has a bunch of childs [08:52:10] gehel: the best I found was: http://serverfault.com/questions/153983/sockets-found-by-lsof-but-not-by-netstat [08:52:36] looks like a socket created but connect() was never called [08:52:47] so leaking a file descriptor resource [08:52:50] ps -p 26670 H|wc -l [08:52:50] 1049 [08:52:54] that is the # of threads [08:54:33] 10Beta-Cluster-Infrastructure: whole beta cluster is unreachable - https://phabricator.wikimedia.org/T134346#2262791 (10hashar) The frontend process has a lot of threads (?): ``` # ps -p 26670 H|wc -l 1049 ``` Which look like: ``` 26670 ? Sl 0:00 /usr/sbin/varnishd -P /run/varnish-frontend.pid -a :80... [08:54:41] I am copy pasting to the task in case it can help later on [08:54:54] I have to run a quick errand, be back as soon as I can... [08:56:15] gehel: take your time [08:56:19] it is just beta after all :-} [08:56:57] and it's not like there is much I can do / know that you don't ... [08:56:58] 10Beta-Cluster-Infrastructure, 06Operations, 10Traffic, 07Varnish: varnish text on beta is unreachable / stuck - https://phabricator.wikimedia.org/T134346#2262798 (10hashar) [08:57:53] hashar: do you have a few times to give me feedback on "Ownership of Selenium tests" e-mail sent to qa mailing list? [08:58:07] also at https://etherpad.wikimedia.org/p/T128190 [08:58:19] I would like to send it to engineering and wikitech lists [09:01:01] zeljkof: I am busy looking at the varnish issue ongoing on beta. [09:01:05] zeljkof: but yeah will look after ;} [09:01:22] mobrovac: and on restbase02 most of the requests come from deployment-changeprop [09:01:36] hashar: yes, dependecy updates [09:01:56] hashar: great, thanks, this is not urgent [09:02:00] mobrovac: can you pause the updater / or rate limit somehow? 
[09:02:10] zeljkof: the email is pretty good [09:02:20] hashar: yes, stopping now [09:02:22] mobrovac: thanks [09:02:46] hashar: done [09:03:48] zeljkof: just maybe don't explicitly say you'll take ownership, as people might stop reading after that :) [09:04:10] rather, just say those would eventually be permanently disabled [09:04:11] so change prop has way less ESTABLISHED connections [09:04:21] which were to restbase02 [09:04:25] PROBLEM - Puppet run on integration-slave-trusty-1024 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [09:04:44] hashar: way less than what? [09:05:10] change prop had a lot of 10.68.16.88:40514 10.68.17.189:7231 ESTABLISHED [09:05:15] mobrovac: good point, I wanted to say I can take ownership while jobs are green, but as soon as they need debugging I will probably not have the time to do it [09:05:23] which are from change prop to restbase02 port 7231 [09:05:42] yes hashar [09:05:50] they are now in TIME_WAIT [09:06:04] waiting on a timeout [09:06:26] 05Gitblit-Deprecate, 10Diffusion, 13Patch-For-Review: Replicate open patchsets to diffusion - https://phabricator.wikimedia.org/T89940#2262839 (10Ricordisamoa) >>! In T89940#2249117, @mmodell wrote: >>>! In T89940#2249071, @Ricordisamoa wrote: >> Also when I list my commits all sorts of intermediate patch se... [09:06:32] restbase02 has less connections as well now [09:06:51] and the cache-text04 has less as well. Maybe it is back [09:06:55] i can restart restbase there [09:07:01] to clear it [09:07:13] mobrovac: reworded it: "Jobs for repositories without contact person will be running while they are passing, and will be deleted when they start failing." [09:07:27] zeljkof: much better! 
[09:08:29] PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [09:09:40] the varnish frontend cache still has 3900 connections [09:09:45] most in CLOSE_WAIT [09:13:07] maybe because restbase did not cleanly ack the TCP connection termination [09:13:32] 10Beta-Cluster-Infrastructure, 06Operations, 10Traffic, 13Patch-For-Review, 07Varnish: Beta cluster varnish sets overly broad domain on GeoIP cookie - https://phabricator.wikimedia.org/T133936#2262859 (10Aklapper) [09:16:40] ok, i'll change restbase not to go through varnish in beta, but contact the mw host directly [09:17:25] zeljkof: looks accurate. did a few changes [09:17:33] mobrovac: or have it throttled ? [09:17:42] mobrovac: how is that going to be handled in prod? [09:18:05] hashar: in prod restbase already bypasses varnish [09:20:03] 10Beta-Cluster-Infrastructure, 06Operations, 10Traffic: varnish text on beta is unreachable / stuck - https://phabricator.wikimedia.org/T134346#2262964 (10hashar) The flow of traffic comes from deployment-changeprop, which hit restbase02 and then the text cache frontend. Marko has stopped the change prop pr...
[09:20:50] dcausse: gehel: mobrovac: I guess I will just restart the varnish frontend to flush the CLOSE_WAIT tcp connection and see whether that resumes it [09:21:02] +1 [09:21:04] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce build #800: 04FAILURE in 3.6 sec: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce/800/ [09:21:55] hashar: +1, I think it's pretty easy to reproduce the issue by just flooding varnish with ab [09:23:51] trying systemctl restart varnish-frontend.service [09:23:53] bah [09:26:25] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 43317 bytes in 1.053 second response time [09:26:38] \o/ [09:26:39] \o/ [09:26:40] \O/ [09:27:06] !log deployment-cache-text04 systemctl stop varnish-frontend.service . To clear out all the stuck CLOSE_WAIT connections T134346 [09:27:08] T134346: varnish text on beta is unreachable / stuck - https://phabricator.wikimedia.org/T134346 [09:27:11] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [09:27:25] cat /proc/sys/net/ipv4/tcp_keepalive_time [09:27:25] 300 [09:27:37] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 29875 bytes in 4.648 second response time [09:27:40] not sure whether varnish or the kernel would have disposed them [09:27:52] the kernel usually does that [09:30:12] 10Beta-Cluster-Infrastructure, 06Operations, 10Traffic: varnish text on beta is unreachable / stuck - https://phabricator.wikimedia.org/T134346#2263008 (10hashar) p:05Unbreak!>03Normal So the root cause is apparently change prop sending too many updates that ends up overloading the varnish text frontend.... 
[09:30:15] so it is back [09:30:27] I am leaving the task open in case the varnish gurus want to have a look at it [09:30:40] hashar: thanks for the edits, will wait a bit more if anybody else leaves feedback and will send it later today [09:30:53] zeljkof: just send it as is imho [09:31:06] might want to eventually re confirm with every point of contacts whether they care of the jobs or not [09:35:46] 10Beta-Cluster-Infrastructure, 06Operations, 10Traffic: varnish text on beta is unreachable / stuck - https://phabricator.wikimedia.org/T134346#2262742 (10mobrovac) >>! In T134346#2263008, @hashar wrote: > So the root cause is apparently change prop sending too many updates that ends up overloading the varni... [09:44:22] RECOVERY - Puppet run on integration-slave-trusty-1024 is OK: OK: Less than 1.00% above the threshold [0.0] [09:55:07] Yippee, build fixed! [09:55:08] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce build #801: 09FIXED in 23 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce/801/ [09:58:11] 10Deployment-Systems, 06Release-Engineering-Team, 03Scap3, 06Operations: setup automatic deletion of old l10nupdate - https://phabricator.wikimedia.org/T130317#2263057 (10ori) p:05Normal>03High @mmodell, blocking this on porting l10nupdate to scap doesn't seem reasonable. Could you simply make pruning... [10:03:03] I have a patch that was building properly yesterday but today it fails (https://gerrit.wikimedia.org/r/#/c/284918/) [10:03:21] I don't see any obvious errors from jenkins logs except maybe: Error: your composer.lock file is not up to date. Run "composer update" to install newer dependencies [10:07:26] dcausse: looking [10:07:51] oojs/oojs-ui: 0.17.0 installed, 0.17.1 required. 
[10:10:23] mediawiki/vendor has 0.17.1 [10:10:32] and https://gerrit.wikimedia.org/r/#/c/286767/ [10:10:40] but mediawiki/core still references 0.17.0 [10:11:04] oh [10:11:54] oh that is es2.x branch [10:12:05] can it be related to T90303 ? This patch is on a branch [10:12:05] T90303: Fetch dependencies using composer instead of cloning mediawiki/vendor for non-wmf branches - https://phabricator.wikimedia.org/T90303 [10:13:19] maybe I need to rebase my branch because of core dep updates? [10:13:26] dcausse: your CirrusSearch patch targets es2.x branch [10:13:40] and the job (via zuul-cloner) does checkout es2.x branch of mediawiki/vendor [10:13:49] so if you bump oojs-ui in mediawiki/vendor@es2.x that should fix it [10:14:06] ah thanks! will have a look [10:14:24] dcausse: the zuul-cloner process that clones the various repositories attempts to match branches [10:14:42] makes sense, thanks [10:14:44] hashar: But wouldn't it fall back to master if it can't find the es2.x branch? [10:17:01] dcausse: you can propose the change to mediawiki/vendor@es2.x then update the CirrusSearch change with a Depends-On and then it should work ™ [10:17:23] paladox: yeah it would fall back to master. But in this case mediawiki/vendor has an es2.x branch so zuul-cloner happily checks it out [10:17:26] hashar: got it, will do. Thanks! :) [10:17:34] Oh [10:17:47] which is definitely entirely confusing [10:17:56] but has some good use cases to ensure everything is consistent [10:18:23] dcausse https://github.com/wikimedia/mediawiki-vendor/tree/es2.x [10:21:00] hashar ^^ [10:21:44] paladox: oh that's right this patch should include the oojs-ui update? [10:21:59] dcausse Yep.
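Note: the branch matching described above (use the change's branch when a repository has it, otherwise fall back to master) can be sketched as below. This illustrates the rule as stated in the conversation, not zuul-cloner's actual code; `pick_branch`, the repository `example/other-repo` and the branch sets are hypothetical.

```python
def pick_branch(available_branches, wanted, fallback="master"):
    """Return `wanted` if the repository has that branch, else `fallback`."""
    return wanted if wanted in available_branches else fallback


# Hypothetical branch listings, for illustration only.
repos = {
    "mediawiki/vendor": {"master", "es2.x"},   # has the es2.x branch
    "example/other-repo": {"master"},          # no es2.x branch
}

for name, branches in repos.items():
    print(name, "->", pick_branch(branches, "es2.x"))
```

Under this rule a change on es2.x picks up mediawiki/vendor@es2.x, so a dependency bump merged only on vendor's master would not be seen by the job.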
[10:22:34] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:22:40] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:23:07] dcausse, you would need to backport https://gerrit.wikimedia.org/r/#/c/286770/ to that branch. [10:27:22] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 43315 bytes in 0.794 second response time [10:27:30] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 42946 bytes in 0.721 second response time [10:27:33] RECOVERY - Puppet staleness on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [3600.0] [10:29:26] paladox: thanks! will try [10:30:38] Ok [10:33:39] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:38:31] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 42956 bytes in 3.166 second response time [10:42:40] 10Beta-Cluster-Infrastructure, 06Operations, 10Traffic: beta cluster varnish cache can't apt-get upgrade nginx-full: nginx: [emerg] unknown "spdy" variable - https://phabricator.wikimedia.org/T134362#2263236 (10hashar) [10:43:09] 10Beta-Cluster-Infrastructure, 06Operations, 10Traffic: beta cluster varnish cache can't apt-get upgrade nginx-full: nginx: [emerg] unknown "spdy" variable - https://phabricator.wikimedia.org/T134362#2263249 (10hashar) ``` # grep -n spdy /etc/nginx/sites-enabled/unified 6: listen [::]:443 default_server def... 
[10:44:24] 10Browser-Tests-Infrastructure, 06Release-Engineering-Team, 07Epic, 13Patch-For-Review, and 2 others: Fix scenarios that fail at en.wikipedia.beta.wmflabs.org or do not run them daily - https://phabricator.wikimedia.org/T94150#2263263 (10zeljkofilipin) a:03zeljkofilipin [10:44:46] 10Browser-Tests-Infrastructure, 06Release-Engineering-Team, 07Epic, 13Patch-For-Review, and 2 others: Fix scenarios that fail at en.wikipedia.beta.wmflabs.org or do not run them daily - https://phabricator.wikimedia.org/T94150#2028481 (10zeljkofilipin) [10:44:58] 10Browser-Tests-Infrastructure, 06Release-Engineering-Team, 07Epic, 13Patch-For-Review, and 2 others: Fix scenarios that fail at en.wikipedia.beta.wmflabs.org or do not run them daily - https://phabricator.wikimedia.org/T94150#2030888 (10zeljkofilipin) [10:45:05] 10Browser-Tests-Infrastructure, 06Release-Engineering-Team, 07Epic, 13Patch-For-Review, and 2 others: Fix scenarios that fail at en.wikipedia.beta.wmflabs.org or do not run them daily - https://phabricator.wikimedia.org/T94150#2031852 (10zeljkofilipin) [10:48:10] !log Manually fixing nginx upgrade on deployment-cache-text04 and deployment-cache-upload04 see T134362 for details [10:48:11] T134362: beta cluster varnish cache can't apt-get upgrade nginx-full: nginx: [emerg] unknown "spdy" variable - https://phabricator.wikimedia.org/T134362 [10:48:15] qa-morebots: ping [10:48:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [10:48:16] I am a logbot running on tools-exec-1220. [10:48:16] Messages are logged to https://tools.wmflabs.org/sal/releng. [10:48:16] To log a message, type !log . [10:48:38] 10Beta-Cluster-Infrastructure, 06Operations, 10Traffic, 07WorkType-Maintenance: beta cluster varnish cache can't apt-get upgrade nginx-full: nginx: [emerg] unknown "spdy" variable - https://phabricator.wikimedia.org/T134362#2263270 (10hashar) 05Open>03Resolved a:03hashar I have moved out the config f... 
[10:51:06] 07Browser-Tests, 10MediaWiki-extensions-CentralNotice, 15User-zeljkofilipin: RSpec::Expectations::ExpectationNotMetError in CentralNotice Selenium Jenkins job - https://phabricator.wikimedia.org/T134366#2263303 (10zeljkofilipin) [10:52:22] 10Browser-Tests-Infrastructure, 06Release-Engineering-Team, 07Epic, 13Patch-For-Review, and 2 others: Fix scenarios that fail at en.wikipedia.beta.wmflabs.org or do not run them daily - https://phabricator.wikimedia.org/T94150#2043289 (10zeljkofilipin) [10:53:55] !log beta: clearing out leftover apt conf that points to unreachable web proxy : salt -v '*' cmd.run "find /etc/apt -name '*-proxy' -delete" [10:54:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [10:57:10] !log CI: mass upgrading deb packages [10:57:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [11:01:54] PROBLEM - Puppet run on integration-publisher is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [11:02:08] 10Browser-Tests-Infrastructure, 06Release-Engineering-Team, 07Epic, 13Patch-For-Review, and 2 others: Fix scenarios that fail at en.wikipedia.beta.wmflabs.org or do not run them daily - https://phabricator.wikimedia.org/T94150#2263349 (10zeljkofilipin) [11:07:00] !log restarted CI puppetmaster (out of memory leak) [11:07:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [11:10:15] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 06Operations, 10Ops-Access-Requests: Allow RelEng nova log access - https://phabricator.wikimedia.org/T133992#2263372 (10MoritzMuehlenhoff) p:05Triage>03Normal [11:16:46] 10Browser-Tests-Infrastructure, 06Release-Engineering-Team, 07Epic, 13Patch-For-Review, and 2 others: Fix scenarios that fail at en.wikipedia.beta.wmflabs.org or do not run them daily - https://phabricator.wikimedia.org/T94150#2263388 (10zeljkofilipin) [11:20:13] 
10Beta-Cluster-Infrastructure, 06Operations, 10Traffic, 07WorkType-Maintenance: beta cluster varnish cache can't apt-get upgrade nginx-full: nginx: [emerg] unknown "spdy" variable - https://phabricator.wikimedia.org/T134362#2263396 (10BBlack) 05Resolved>03Open This is because we're halfway through the... [11:21:08] 10Beta-Cluster-Infrastructure, 06Operations, 10Traffic, 07WorkType-Maintenance: beta cluster varnish cache can't apt-get upgrade nginx-full: nginx: [emerg] unknown "spdy" variable - https://phabricator.wikimedia.org/T134362#2263401 (10BBlack) Oh, I see now also that puppet auto-upgraded the package for you... [11:35:33] PROBLEM - Puppet run on integration-slave-trusty-1025 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [11:36:47] RECOVERY - Puppet run on deployment-db2 is OK: OK: Less than 1.00% above the threshold [0.0] [11:59:26] 06Release-Engineering-Team, 06Team-Practices, 06Developer-Relations (Jul-Sep-2016): Developer Summit 2017: Work with TPG and RelEng on solution to event documenting - https://phabricator.wikimedia.org/T132400#2263541 (10Qgil) [12:05:22] 10Beta-Cluster-Infrastructure, 06Operations, 10Traffic, 13Patch-For-Review, 07WorkType-Maintenance: beta cluster varnish cache can't apt-get upgrade nginx-full: nginx: [emerg] unknown "spdy" variable - https://phabricator.wikimedia.org/T134362#2263549 (10hashar) Based on @bblack input patch set `tlsproxy... 
[12:11:25] !log beta: restarted nginx on varnish caches ( systemctl restart nginx.service ) since they were not listening on port 443 #T134362 [12:11:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [12:11:48] T134362: beta cluster varnish cache can't apt-get upgrade nginx-full: nginx: [emerg] unknown "spdy" variable - https://phabricator.wikimedia.org/T134362 [12:15:29] RECOVERY - Puppet run on integration-slave-trusty-1025 is OK: OK: Less than 1.00% above the threshold [0.0] [12:21:56] (03PS1) 10Hashar: mediawiki-core-phpcs to Nodepool [integration/config] - 10https://gerrit.wikimedia.org/r/286828 (https://phabricator.wikimedia.org/T133976) [12:24:36] !log deleting Jenkins job mediawiki-core-phpcs , replaced by Nodepool version mediawiki-core-phpcs-trusty T133976 [12:24:37] T133976: Migrate mediawiki-core-phpcs job to Nodepool - https://phabricator.wikimedia.org/T133976 [12:24:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [12:25:00] (03CR) 10Hashar: [C: 032] mediawiki-core-phpcs to Nodepool [integration/config] - 10https://gerrit.wikimedia.org/r/286828 (https://phabricator.wikimedia.org/T133976) (owner: 10Hashar) [12:26:10] (03Merged) 10jenkins-bot: mediawiki-core-phpcs to Nodepool [integration/config] - 10https://gerrit.wikimedia.org/r/286828 (https://phabricator.wikimedia.org/T133976) (owner: 10Hashar) [12:30:58] 10Continuous-Integration-Config, 05Continuous-Integration-Scaling, 10releng-201516-q3, 03releng-201516-q4, and 2 others: [keyresult] Migrate php (Zend and HHVM) CI jobs to Nodepool - https://phabricator.wikimedia.org/T119139#2263624 (10hashar) [12:31:02] 05Continuous-Integration-Scaling, 13Patch-For-Review, 07WorkType-NewFunctionality: Migrate mediawiki-core-phpcs job to Nodepool - https://phabricator.wikimedia.org/T133976#2263623 (10hashar) 05Open>03Resolved [13:11:57] 10Continuous-Integration-Infrastructure, 10Thumbor, 07HHVM: CI slaves have conflict between 
libcurl4-openssl-dev libcurl4-gnutls-dev - https://phabricator.wikimedia.org/T134378#2263803 (10hashar) [13:13:15] 10Continuous-Integration-Infrastructure, 10Thumbor, 07HHVM, 13Patch-For-Review: CI slaves have conflict between libcurl4-openssl-dev libcurl4-gnutls-dev - https://phabricator.wikimedia.org/T134378#2263813 (10hashar) p:05Triage>03Low a:03hashar [13:15:59] 10Continuous-Integration-Infrastructure, 10Thumbor, 07HHVM, 13Patch-For-Review: CI slaves have conflict between libcurl4-openssl-dev libcurl4-gnutls-dev - https://phabricator.wikimedia.org/T134378#2263815 (10hashar) 05Open>03Resolved Cherry picked on CI puppetmaster. I removed both libs: ``` dpkg --pur... [13:23:40] 05Continuous-Integration-Scaling, 07WorkType-NewFunctionality: Migrate PHP extensions building jobs to Nodepool - https://phabricator.wikimedia.org/T134381#2263851 (10hashar) [13:51:41] (03PS1) 10Hashar: Rename PHP extensions building jobs [integration/config] - 10https://gerrit.wikimedia.org/r/286846 (https://phabricator.wikimedia.org/T134381) [13:52:05] 05Continuous-Integration-Scaling, 13Patch-For-Review, 07WorkType-NewFunctionality: Migrate PHP extensions building jobs to Nodepool - https://phabricator.wikimedia.org/T134381#2263965 (10hashar) [13:56:48] (03CR) 10Hashar: [C: 032] Rename PHP extensions building jobs [integration/config] - 10https://gerrit.wikimedia.org/r/286846 (https://phabricator.wikimedia.org/T134381) (owner: 10Hashar) [13:58:19] (03Merged) 10jenkins-bot: Rename PHP extensions building jobs [integration/config] - 10https://gerrit.wikimedia.org/r/286846 (https://phabricator.wikimedia.org/T134381) (owner: 10Hashar) [14:10:28] (03CR) 10Hashar: "Some notes following 1/1 with Tyler" (032 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/286207 (https://phabricator.wikimedia.org/T129357) (owner: 10Hashar) [14:14:56] Yippee, build fixed! 
[14:14:57] Project selenium-MultimediaViewer-286674 » safari,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #9: 09FIXED in 14 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer-286674/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/9/ [14:18:17] Project selenium-MultimediaViewer-286674 » internet_explorer 9.0,beta,Windows 7,contintLabsSlave && UbuntuTrusty build #9: 04FAILURE in 18 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer-286674/BROWSER=internet_explorer%209.0,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Windows%207,label=contintLabsSlave%20&&%20UbuntuTrusty/9/ [14:26:27] stephanebisson: if you have a few minutes, it would be great if you could review https://phabricator.wikimedia.org/D204 [14:26:40] I have added you as a reviewer [14:26:40] zeljkof: sure [14:26:46] great, thanks [14:27:13] oh, I've never used differential. Let me get used to it... [14:27:33] stephanebisson: I have probably used it, but not in a while :| [14:28:01] twentyafterfour: the same as stephanebisson, if you have a few minutes to review https://phabricator.wikimedia.org/D204, it would be great [14:28:13] I am doing my best, but my javascript-fu is white belt [14:30:58] zeljkof: is it linked to a phab ticket that provides some context? I can't find it on the UI [14:32:12] Yippee, build fixed! [14:32:12] Project selenium-MobileFrontend-279364 » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #11: 09FIXED in 17 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend-279364/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/11/ [14:39:04] Yippee, build fixed! 
[14:39:05] Project selenium-MobileFrontend-279364 » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #11: 09FIXED in 24 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend-279364/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/11/ [14:45:27] stephanebisson: let me check, probably there is one [14:46:09] stephanebisson: here it is https://phabricator.wikimedia.org/T132355 [14:46:15] will try to add it to the commit message [14:49:38] zeljkof: Looks good to me. Did you generate your doc to see if it works and looks ok? [14:50:17] stephanebisson: it should be generated by jenkins [14:50:26] let me check [14:52:55] hm, I have added T132355 to maniphest tasks, but it does not show anywhere on the page [14:52:55] T132355: Establish API documentation with jsduck - https://phabricator.wikimedia.org/T132355 [14:53:55] the build is too old, I guess it got deleted :| https://integration.wikimedia.org/ci/job/harbormaster-test/363/ [14:56:38] hashar, twentyafterfour: do you know how to re trigger jenkins job for https://phabricator.wikimedia.org/D204 ? 
[14:56:51] Project beta-scap-eqiad build #101170: 04FAILURE in 2 min 7 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/101170/ [14:58:05] PROBLEM - Puppet run on deployment-pdf01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [14:59:41] PROBLEM - Puppet run on deployment-restbase02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [14:59:56] Project beta-scap-eqiad build #101171: 04STILL FAILING in 1 min 49 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/101171/ [15:02:50] 00:01:16.382 rsync: getaddrinfo: deployment-tin.deployment-prep.eqiad.wmflabs 873: Name or service not known [15:03:01] PROBLEM - Puppet run on integration-slave-trusty-1014 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [15:03:35] !log beta-scap: deployment-tin.deployment-prep.eqiad.wmflabs [15:03:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [15:03:43] !log beta-scap: deployment-tin.deployment-prep.eqiad.wmflabs Name or service not known [15:03:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [15:04:21] stephanebisson: I think I have managed the build to rerun https://integration.wikimedia.org/ci/job/harbormaster-test/377/console [15:04:55] nevermind: [phabricator:send-harbormaster-uri] Error from Harbormaster: No such build target "PHID-HRUL-xiccj6sb3uvl3cithdkv"! [15:05:46] Yippee, build fixed! 
[15:05:47] Project beta-scap-eqiad build #101172: 09FIXED in 1 min 3 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/101172/ [15:06:19] zeljkof: it doesn't seem to build the doc [15:06:56] stephanebisson: yes, I am trying to rebuild the job, but I am doing something wrong :) [15:07:04] nevermind, will try building it locally [15:07:52] PROBLEM - Puppet run on integration-slave-trusty-1011 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [15:08:18] if I choose "Accept Revision" is it gonna merge it? [15:09:26] What I want to do is +1 with a comment saying that I'm ok to +2 but I'll give the opportunity to someone else to review it [15:10:01] stephanebisson: it does not merge it until the owner of the patch does it [15:10:16] so feel free to do anything that web interface lets you :) [15:10:19] oh, good [15:10:30] merging it would not be a problem anyway [15:10:38] if that happens :) [15:10:52] I am not also really familiar with differential [15:11:47] ok, it's reviewed [15:12:00] PROBLEM - Puppet run on integration-slave-precise-1012 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [15:12:07] it's such a pleasure to read specs instead of xUnit tests [15:12:52] PROBLEM - Puppet run on integration-slave-precise-1002 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [15:13:27] Project beta-scap-eqiad build #101173: 04FAILURE in 2 min 55 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/101173/ [15:13:32] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: Name or service not known [15:13:53] PROBLEM - Puppet run on deployment-upload is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [15:13:53] PROBLEM - Puppet run on integration-slave-trusty-1015 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [15:13:53] stephanebisson: I was able to run jsduck locally, generated docs look good to me [15:14:41] RECOVERY - 
Puppet run on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0] [15:14:51] zeljkof: great. Is it published somewhere so I can learn more about that framework of yours? [15:15:12] stephanebisson: about malu? [15:15:20] it is all over the place [15:15:26] Project beta-scap-eqiad build #101174: 04STILL FAILING in 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/101174/ [15:15:38] https://phabricator.wikimedia.org/diffusion/GMALU/ [15:15:53] https://phabricator.wikimedia.org/project/board/1905/ [15:16:08] https://www.npmjs.com/package/malu [15:16:35] PROBLEM - Puppet run on deployment-redis02 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [15:16:35] PROBLEM - Puppet run on deployment-db1 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [15:18:31] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 29875 bytes in 1.141 second response time [15:18:51] all over what place? [15:20:21] PROBLEM - Puppet run on deployment-zotero01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [15:22:38] PROBLEM - Puppet run on mira is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [15:22:52] PROBLEM - Puppet run on deployment-mediawiki03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [15:23:27] stephanebisson: did you see the links I have posted above? [15:24:48] zeljkof: yeah [15:25:48] Yippee, build fixed! [15:25:48] Project beta-scap-eqiad build #101175: 09FIXED in 1 min 5 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/101175/ [15:26:30] stephanebisson: did I answer your question? [15:26:53] zeljkof: I learned more by reading the code than looking at those links. When the time is right, having a one-pager with a value proposition, an architecture diagram, and a sample test cases, or something like that would go a long way. 
[15:27:23] stephanebisson: it is still in early phase [15:27:49] but agreed, a good readme should be done [15:28:11] zeljkof: looking forward to try it. Let me know when I should take it for a road test [15:28:33] stephanebisson: it should be usable already [15:28:41] we had a patch for core that was running fine [15:29:14] https://gerrit.wikimedia.org/r/#/c/256404/ [15:31:39] RECOVERY - Puppet run on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0] [15:37:35] RECOVERY - Puppet run on mira is OK: OK: Less than 1.00% above the threshold [0.0] [15:47:56] RECOVERY - Puppet run on integration-slave-trusty-1011 is OK: OK: Less than 1.00% above the threshold [0.0] [15:49:10] 10scap, 10Analytics-EventLogging, 06Analytics-Kanban, 13Patch-For-Review, 03Scap3 (Scap3-Adoption-Phase1): Use scap3 to deploy eventlogging/eventlogging - https://phabricator.wikimedia.org/T118772#2264457 (10Ottomata) Yes, it is mostly performance team for the deployment to hafnium. See T131977 [15:49:57] 10scap, 10Analytics-EventLogging, 06Analytics-Kanban, 13Patch-For-Review, 03Scap3 (Scap3-Adoption-Phase1): Use scap3 to deploy eventlogging/eventlogging - https://phabricator.wikimedia.org/T118772#2264461 (10Ottomata) Actually T110903 is more descriptive.
[15:51:30] RECOVERY - Puppet run on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0] [15:52:02] RECOVERY - Puppet run on integration-slave-precise-1012 is OK: OK: Less than 1.00% above the threshold [0.0] [15:52:51] RECOVERY - Puppet run on integration-slave-precise-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [15:53:03] RECOVERY - Puppet run on integration-slave-trusty-1014 is OK: OK: Less than 1.00% above the threshold [0.0] [15:54:02] 10Continuous-Integration-Config, 06Operations: Create a CI check for puppet/mediawiki-config to detect misspelled hostnames - https://phabricator.wikimedia.org/T134399#2264486 (10faidon) [15:54:14] beta-scap-eqiad was/is failing due to labs DNS instability [15:56:50] 10Continuous-Integration-Config, 06Release-Engineering-Team, 06Operations: Write a test to check for clearly bogus hostnames - https://phabricator.wikimedia.org/T133047#2264503 (10hashar) [15:57:03] 10Continuous-Integration-Config, 06Operations: Create a CI check for puppet/mediawiki-config to detect misspelled hostnames - https://phabricator.wikimedia.org/T134399#2264486 (10hashar) [15:57:06] 10Continuous-Integration-Config, 06Release-Engineering-Team, 06Operations: Write a test to check for clearly bogus hostnames - https://phabricator.wikimedia.org/T133047#2218145 (10hashar) [16:04:53] PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [16:07:24] 10Continuous-Integration-Config, 06Release-Engineering-Team, 06Operations: Write a test to check for clearly bogus hostnames - https://phabricator.wikimedia.org/T133047#2264567 (10hashar) ```grep: the -P option only supports a single pattern``` Will need to use Extended Regular Expressions. ``` $ cat wrong...
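(Note on the grep message above: the truncated comment says `grep -P` accepts only a single pattern, so the check would need Extended Regular Expressions. One way to picture the same idea, sketched in Python rather than grep, is to join several patterns into one alternation and scan each line once. This is only an illustration, not the job hashar wrote: the misspellings in `BOGUS_PATTERNS` and the function name `find_bogus_hostnames` are hypothetical, and the real pattern file is not shown in this log.)

```python
import re

# Hypothetical misspelled hostname fragments, for illustration only.
# The actual patterns used by the typos job are not shown in this log.
BOGUS_PATTERNS = [
    r"\.eqaid\.wmnet",
    r"\.cofdw\.wmnet",
    r"\.wikimeda\.org",
]

# Join every pattern into a single alternation so one compiled regex
# covers them all, similar in spirit to one combined pattern for grep -E.
BOGUS_RE = re.compile("|".join(f"(?:{p})" for p in BOGUS_PATTERNS))


def find_bogus_hostnames(lines):
    """Return (line_number, line) pairs for lines matching any pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if BOGUS_RE.search(line)
    ]
```

(Wrapping each pattern in a non-capturing group keeps an alternation inside one pattern from leaking into its neighbours once they are joined.)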
[16:08:58] 10Continuous-Integration-Config, 06Release-Engineering-Team, 06Operations: Write a test to check for clearly bogus hostnames - https://phabricator.wikimedia.org/T133047#2264572 (10hashar) And James pointed to https://github.com/wikimedia/mediawiki-extensions-VisualEditor/blob/master/build/typos.json [16:09:11] * hashar disappears [16:12:33] PROBLEM - Puppet run on deployment-ms-be02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [16:23:10] 10scap, 03Scap3 (Scap3-MediaWiki-MVP), 07WorkType-NewFunctionality: Remove apache dependency from scap3 deployment host - https://phabricator.wikimedia.org/T116630#2264635 (10thcipriani) One thing to investigate further is the use of git daemon. [16:38:56] Project selenium-MultimediaViewer-286674 » safari,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #11: 04FAILURE in 14 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer-286674/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/11/ [16:47:13] 06Release-Engineering-Team, 10scap: git/http operations in scap should be secure - https://phabricator.wikimedia.org/T127498#2264669 (10demon) So we've got two options going forward, neither of which are terribly hard. 1) We can generate some certs and slap them on the apache instance we use for git operations... [16:49:35] 06Release-Engineering-Team, 10scap, 03Scap3 (Scap3-MediaWiki-MVP): git/http operations in scap should be secure - https://phabricator.wikimedia.org/T127498#2264711 (10mmodell) [17:23:11] Do we know from when REL1_27 will get cut? Right now master is pre-1.28.0-wmf.1. [17:25:26] James_F: It's being done now. [17:25:33] See https://phabricator.wikimedia.org/T132078#2250108 [17:25:35] Please [17:27:04] paladox: that link is very unrelated, I'd say. :) [17:27:18] Woops sorry yes [17:27:46] paladox, where did you get the link that you pasted? 
:) [17:27:47] * andre__ curious [17:28:17] Oh from operations- [17:28:22] wikimedia-operation [17:28:35] ah, I see :) [17:28:45] https://phabricator.wikimedia.org/diffusion/MW/browse/REL1_27/ [17:29:01] Heres the correct link sorry about that [17:29:07] ^^ James_F [17:38:13] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [17:48:22] RECOVERY - Puppet run on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0] [17:49:06] (03CR) 10Legoktm: [C: 032] Factor our tokenIsNamespaced method [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/286299 (owner: 10Addshore) [17:50:08] (03CR) 10Legoktm: [C: 032] "Nice :D" [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/286301 (owner: 10Addshore) [18:01:00] (03Merged) 10jenkins-bot: Factor our tokenIsNamespaced method [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/286299 (owner: 10Addshore) [18:01:07] (03Merged) 10jenkins-bot: Speed up PrefixedGlobalFunctionsSniff [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/286301 (owner: 10Addshore) [18:09:41] (03PS1) 10Chad: Minor syntax error [tools/release] - 10https://gerrit.wikimedia.org/r/286929 [18:10:52] (03CR) 10Chad: [C: 032] Minor syntax error [tools/release] - 10https://gerrit.wikimedia.org/r/286929 (owner: 10Chad) [18:17:49] (03Merged) 10jenkins-bot: Minor syntax error [tools/release] - 10https://gerrit.wikimedia.org/r/286929 (owner: 10Chad) [18:25:39] RECOVERY - Puppet run on deployment-ms-be02 is OK: OK: Less than 1.00% above the threshold [0.0] [18:26:30] (03PS4) 10Madhuvishy: Add partial support for maven-release-plugin [integration/jenkins-job-builder] - 10https://gerrit.wikimedia.org/r/286788 (https://phabricator.wikimedia.org/T132175) [18:48:18] (03PS1) 10Hashar: typos job template can no vary [integration/config] - 10https://gerrit.wikimedia.org/r/286932 (https://phabricator.wikimedia.org/T133047) [18:50:02] (03CR) 10Hashar: [C: 032] "It is a noop for T133047" 
[integration/config] - 10https://gerrit.wikimedia.org/r/286932 (https://phabricator.wikimedia.org/T133047) (owner: 10Hashar) [18:51:40] (03Merged) 10jenkins-bot: typos job template can no vary [integration/config] - 10https://gerrit.wikimedia.org/r/286932 (https://phabricator.wikimedia.org/T133047) (owner: 10Hashar) [18:56:01] https://phabricator.wikimedia.org/T131559 < isn't this invalid? [18:58:22] 06Release-Engineering-Team, 05Release: MW-1.27.0-wmf.24 deployment blockers - https://phabricator.wikimedia.org/T131559#2170735 (10Paladox) This should be closed please since I think were going to 1.28 wmf 1 next week. [18:59:16] (03PS1) 10Hashar: Change typos job to use extended regexp [integration/config] - 10https://gerrit.wikimedia.org/r/286937 (https://phabricator.wikimedia.org/T133047) [19:07:08] 06Release-Engineering-Team, 13Patch-For-Review, 05Release: MW-1.27.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T131557#2265223 (10mmodell) [19:07:24] 06Release-Engineering-Team, 05Release: MW-1.27.0-wmf.24 deployment blockers - https://phabricator.wikimedia.org/T131559#2265226 (10mmodell) 05Open>03Invalid [19:08:04] 06Release-Engineering-Team, 05Release: 1.28.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T134249#2259711 (10mmodell) [19:17:08] How did make-extension-branches ever work? 
[19:17:17] * ostriches has been livehacking it all morning [20:03:47] PROBLEM - Puppet run on deployment-tmh01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:10:03] (03PS2) 10Hashar: Change typos job to use extended regexp [integration/config] - 10https://gerrit.wikimedia.org/r/286937 (https://phabricator.wikimedia.org/T133047) [20:10:16] (03CR) 10Hashar: "Needs puppet.git patch https://gerrit.wikimedia.org/r/#/c/286938/" [integration/config] - 10https://gerrit.wikimedia.org/r/286937 (https://phabricator.wikimedia.org/T133047) (owner: 10Hashar) [20:12:25] RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:13:52] ostriches are extensions and skins being branched to 1.27, and is mediawiki version being bumped to 1.28. [20:13:56] today. [20:14:03] 07Browser-Tests, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 15User-zeljkofilipin: RSpec::Expectations::ExpectationNotMetError in CentralNotice Selenium Jenkins job - https://phabricator.wikimedia.org/T134366#2265493 (10DStrine) [20:15:58] Yes. [20:23:15] 06Release-Engineering-Team, 13Patch-For-Review, 05Release: MW-1.27.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T131557#2265530 (10hashar) [20:23:25] ostriches: Ok, thanks. 
[20:30:47] RoanKattouw: feel free to push the Echo change for wmf/1.27.0-wmf.23 ( https://gerrit.wikimedia.org/r/#/c/286951/ ) [20:31:03] Thanks :) doing that now [20:31:24] the task detail was pretty badly formatted [20:31:27] 06Release-Engineering-Team, 13Patch-For-Review, 05Release: MW-1.27.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T131557#2265550 (10hashar) [20:31:30] but I went lazy with the json oneliner [20:31:39] I keep forgetting about json_pp [20:31:55] I was confused because the JSON seemed to point to the wrong line, but I found it eventually [20:33:34] 06Release-Engineering-Team, 13Patch-For-Review, 05Release: MW-1.27.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T131557#2265562 (10Catrope) [20:35:24] 07Browser-Tests, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Unplanned-Sprint-Work, and 2 others: RSpec::Expectations::ExpectationNotMetError in CentralNotice Selenium Jenkins job - https://phabricator.wikimedia.org/T134366#2265576 (10DStrine) [20:35:39] 07Browser-Tests, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Unplanned-Sprint-Work, and 2 others: RSpec::Expectations::ExpectationNotMetError in CentralNotice Selenium Jenkins job - https://phabricator.wikimedia.org/T134366#2263303 (10DStrine) [20:35:46] 10Continuous-Integration-Config, 10Utilities-mwdumper, 07Jenkins: Re-add mwdumper builds to continuous integration / jenkins - https://phabricator.wikimedia.org/T133456#2265582 (10brion) [20:36:16] hashar: Ugh, that exception occurs in wmf22 too :( backporting [20:36:42] RoanKattouw: eeek :) [21:28:28] !log deployed puppet FQDN domain patch for OCG: https://gerrit.wikimedia.org/r/286068 and restarted ocg on deployment-pdf0[12] [21:28:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [21:48:49] 10Deployment-Systems: mw conf cache is not properly invalidated - https://phabricator.wikimedia.org/T134448#2265882 (10hashar) [21:52:37] 
06Release-Engineering-Team, 05Release: 1.28.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T134249#2265900 (10Luke081515) [21:53:45] 06Release-Engineering-Team, 13Patch-For-Review, 05Release: MW-1.27.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T131557#2265918 (10Luke081515) [21:54:07] 10Deployment-Systems: mw conf cache is not properly invalidated - https://phabricator.wikimedia.org/T134448#2265920 (10hashar) [21:55:10] 06Release-Engineering-Team, 05Release: MW-1.28.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T134249#2265932 (10Luke081515) [21:56:38] 06Release-Engineering-Team, 05Release: MW-1.28.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T134450#2265934 (10Luke081515) [21:56:49] 06Release-Engineering-Team, 05Release: 1.28.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T134249#2259711 (10Luke081515) [21:56:52] 10Deployment-Systems: mw conf cache is not properly invalidated - https://phabricator.wikimedia.org/T134448#2265953 (10demon) 05Open>03Resolved a:03demon Reverted. [21:57:21] 06Release-Engineering-Team, 13Patch-For-Review, 05Release: MW-1.27.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T131557#2265956 (10Luke081515) [21:57:31] 06Release-Engineering-Team, 05Release: 1.28.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T134249#2265957 (10Luke081515) [21:57:47] sry for that little spam ;) [21:58:06] 10Deployment-Systems, 07HHVM: mw conf cache is not properly invalidated - https://phabricator.wikimedia.org/T134448#2265958 (10hashar) That also might be related to #HHVM ` hhvm.server.stat_cache=true`. From a quick glance at [[ https://github.com/facebook/hhvm/blob/master/hphp/runtime/base/stat-cache.h | HH... [21:59:59] hashar: could you look at this when you get a chance? 
https://gerrit.wikimedia.org/r/#/c/286788/ I wasn't sure if I should directly submit for review at openstack [22:00:58] 05Gerrit-Migration, 10Differential, 10Utilities-mwdumper: Migrate mwdumper to Differential - https://phabricator.wikimedia.org/T134434#2265963 (10Jdforrester-WMF) p:05Triage>03Normal [22:01:11] 05Gerrit-Migration, 10Differential, 10Utilities-mwdumper: Migrate mwdumper to Differential - https://phabricator.wikimedia.org/T134434#2265563 (10Jdforrester-WMF) [22:24:56] RECOVERY - Puppet run on integration-publisher is OK: OK: Less than 1.00% above the threshold [0.0] [22:27:20] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 07Jenkins, 07WorkType-Maintenance: Upgrade Jenkins from 1.642.3 to 1.651.1 - https://phabricator.wikimedia.org/T133737#2266037 (10hashar) Daniel Beck (upstream) wrote: > We will publish new Jenkins releases (mainline and 1.651.2 LTS) on W... [22:32:47] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #1038: 04FAILURE in 21 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/1038/ [22:35:52] 10Continuous-Integration-Config, 10Utilities-mwdumper, 07Jenkins: Re-add mwdumper builds to continuous integration / jenkins - https://phabricator.wikimedia.org/T133456#2266053 (10hashar) [22:47:33] (03PS1) 10Hashar: maven job for mediawiki/tools/mwdumper [integration/config] - 10https://gerrit.wikimedia.org/r/287022 (https://phabricator.wikimedia.org/T133456) [22:48:47] (03CR) 10Hashar: [C: 032] maven job for mediawiki/tools/mwdumper [integration/config] - 10https://gerrit.wikimedia.org/r/287022 (https://phabricator.wikimedia.org/T133456) (owner: 10Hashar) [22:52:08] (03Merged) 10jenkins-bot: maven job for mediawiki/tools/mwdumper [integration/config] - 10https://gerrit.wikimedia.org/r/287022 (https://phabricator.wikimedia.org/T133456) (owner: 10Hashar) [22:55:46] 10Continuous-Integration-Config, 10Utilities-mwdumper, 07Jenkins, 
13Patch-For-Review: Re-add mwdumper builds to continuous integration / jenkins - https://phabricator.wikimedia.org/T133456#2266085 (10hashar) The beauty with maven is that it is well integrated in Jenkins and the lifecycle is straightforward... [22:56:29] brion: mwdumper has a maven jenkins job now https://phabricator.wikimedia.org/T133456 :D [22:56:38] woot :D [22:56:40] thanks hashar ! [22:56:51] and analytics is doing a ton on that front via https://phabricator.wikimedia.org/T130122 [22:57:10] the idea is apparently that pushing a tag should result in a package being available on maven central [23:01:42] brion: Hi resolution switching was fixed in videojs [23:01:50] 10Continuous-Integration-Config, 06Analytics-Kanban, 13Patch-For-Review: Add JJB support for Jenkins Maven Release Plugin {hawk} - https://phabricator.wikimedia.org/T132175#2266092 (10hashar) [23:01:56] So ogv.js plugin for videojs works after updating videojs to 5.10.1. [23:02:18] 10Continuous-Integration-Config, 06Analytics-Kanban, 13Patch-For-Review: Add JJB support for Jenkins Maven Release Plugin {hawk} - https://phabricator.wikimedia.org/T132175#2190696 (10hashar) [23:02:20] paladox: awesome :) thanks for testing that! [23:02:51] hashar: i've volunteered mwdumper for migration to differential as well; would the CI still work or should I tell James_F to hold off? :) [23:02:54] http://videojs.com/ ?? [23:03:02] Your welcome, ive submitted the patch here https://gerrit.wikimedia.org/r/#/c/284375/ [23:03:16] hashar: Yes [23:03:18] brion: volunteer for Differential. Definitely [23:03:24] ok spiff [23:03:49] brion: it can still trigger job in Jenkins under the hood. That machinery is going to be on our shoulders, imho nothing you should worry about :) [23:03:58] yay! 
[23:04:03] things i don't have to worry about are the best [23:04:08] having to use arcanist would be the most complicated thing [23:04:27] hashar: I found it really easy using the installer i created [23:04:28] which is rather simple in the end, but is a very different flow than Gerrit [23:04:55] brion: havent you written a pure JS video player once upon a time ? [23:05:08] hashar: With https://phabricator.wikimedia.org/D225 you would not need to really use arc to push. [23:05:13] hashar: that's the ogv.js player :D [23:05:35] video.js is a frontend wrapper for regular