[00:51:02] Deployment-Systems, Stashbot, WorkType-NewFunctionality: Re-do Wikimedia's server admin log implementation - https://phabricator.wikimedia.org/T59343#3004021 (bd808) With Stashbot having replaced Adminbot/Morebots we do now have `!log` messages going into both Wikitech wiki pages (nice because they g...
[00:52:09] Scap3: autolog scap3 deployments in beta - https://phabricator.wikimedia.org/T156079#3004024 (bd808) Related: {T46791}
[01:26:44] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.29.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T155526#3004075 (thcipriani) The logspam that we've accrued over the past few months has made it difficult to glance at error logs after a deployment and reason about a deploy...
[01:48:27] Continuous-Integration-Infrastructure, Release-Engineering-Team, Phabricator (Upstream), Upstream: Can not delete column in project workboard - https://phabricator.wikimedia.org/T75716#777667 (srishakatux) @hashar @Aklapper unable to figure out how to change the status of a column so that they ge...
[02:23:39] Scap3, Parsoid: Saying yes (y) continues to all groups - https://phabricator.wikimedia.org/T156839#3004167 (dduvall)
[03:28:04] Continuous-Integration-Config, Cite, Parsoid: Parsoid should run tests against the Cite parser tests, not just MediaWiki core's - https://phabricator.wikimedia.org/T114256#3004272 (Arlolra) a:Arlolra
[03:58:27] Yippee, build fixed!
[03:58:28] Project selenium-MultimediaViewer » firefox,mediawiki,Linux,contintLabsSlave && UbuntuTrusty build #289: FIXED in 2 min 26 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=mediawiki,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/289/
[04:26:41] (PS1) Tim Landscheidt: operations/software/tools-manifest: Make debian-glue voting [integration/config] - https://gerrit.wikimedia.org/r/336369 (https://phabricator.wikimedia.org/T156651)
[05:32:54] Gerrit, Release-Engineering-Team, Scap3: Deploy gerrit with scap3 - https://phabricator.wikimedia.org/T157414#3004394 (demon)
[05:34:53] scap: Automatically clean up unused wmfXX versions - https://phabricator.wikimedia.org/T73313#3004408 (demon) >>! In T73313#2865620, @demon wrote: > This is easier now than before, eg: `scap clean 1.28.0-wmf.9` I'm inclined to actually close this resolved. This probably shouldn't be fully automated and with...
[08:33:23] (PS1) Hashar: ops/software/etcd-mirror: add debian-glue non voting [integration/config] - https://gerrit.wikimedia.org/r/336379
[08:33:40] (CR) Hashar: [C: 2] ops/software/etcd-mirror: add debian-glue non voting [integration/config] - https://gerrit.wikimedia.org/r/336379 (owner: Hashar)
[08:34:33] (Merged) jenkins-bot: ops/software/etcd-mirror: add debian-glue non voting [integration/config] - https://gerrit.wikimedia.org/r/336379 (owner: Hashar)
[08:41:56] PROBLEM - Puppet run on repository is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[09:01:12] PROBLEM - Host integration-slave-jessie-1001 is DOWN: CRITICAL - Host Unreachable (10.68.16.72)
[09:02:52] !log Hard rebooting integration-slave-jessie-1001 . I messed up with the DHCP client :(
[09:02:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:06:52] RECOVERY - Host integration-slave-jessie-1001 is UP: PING OK - Packet loss = 0%, RTA = 3.88 ms
[09:07:52] (CR) Hashar: [C: 2] "Well done! It indeed did pass just fine on the latest merged change https://gerrit.wikimedia.org/r/#/c/336057/" [integration/config] - https://gerrit.wikimedia.org/r/336369 (https://phabricator.wikimedia.org/T156651) (owner: Tim Landscheidt)
[09:08:03] (PS2) Hashar: operations/software/tools-manifest: Make debian-glue voting [integration/config] - https://gerrit.wikimedia.org/r/336369 (https://phabricator.wikimedia.org/T156651) (owner: Tim Landscheidt)
[09:08:12] (CR) Hashar: operations/software/tools-manifest: Make debian-glue voting [integration/config] - https://gerrit.wikimedia.org/r/336369 (https://phabricator.wikimedia.org/T156651) (owner: Tim Landscheidt)
[09:08:17] (CR) Hashar: [C: 2] operations/software/tools-manifest: Make debian-glue voting [integration/config] - https://gerrit.wikimedia.org/r/336369 (https://phabricator.wikimedia.org/T156651) (owner: Tim Landscheidt)
[09:09:08] (Merged) jenkins-bot: operations/software/tools-manifest: Make debian-glue voting [integration/config] - https://gerrit.wikimedia.org/r/336369 (https://phabricator.wikimedia.org/T156651) (owner: Tim Landscheidt)
[09:21:54] RECOVERY - Puppet run on repository is OK: OK: Less than 1.00% above the threshold [0.0]
[09:22:34] Continuous-Integration-Config, Labs, Tool-Labs, Patch-For-Review: operations/software/tools-webservice (and operations/software/tools-manifest?) do not run Debian tests - https://phabricator.wikimedia.org/T156651#3004755 (hashar) a:scfc @scfc made it happen :-}
[09:22:39] Continuous-Integration-Config, Labs, Tool-Labs, Patch-For-Review: operations/software/tools-webservice (and operations/software/tools-manifest?) do not run Debian tests - https://phabricator.wikimedia.org/T156651#3004757 (hashar) Open>Resolved
[09:25:43] Continuous-Integration-Infrastructure, Release-Engineering-Team, Phabricator (Upstream), Upstream: Can not delete column in project workboard - https://phabricator.wikimedia.org/T75716#3004758 (hashar) @srishakatux Phabricator removed support for deleting columns and instead they can only be hidd...
[10:23:53] Continuous-Integration-Infrastructure, Release-Engineering-Team, Jenkins: Upgrade Jenkins from 1.x to latest 2.x - https://phabricator.wikimedia.org/T144106#3004820 (hashar)
[10:29:44] Continuous-Integration-Infrastructure, Release-Engineering-Team, Jenkins: Upgrade Jenkins from 1.x to latest 2.x - https://phabricator.wikimedia.org/T144106#3004836 (hashar) Been reviewing the [[ https://jenkins.io/doc/upgrade-guide/ | upgrade guide ]] and I guess I will just go with the first 2.X LT...
[10:50:07] Continuous-Integration-Infrastructure, Release-Engineering-Team, Jenkins: Upload Jenkins LTS v2.7.4 to wikimedia-ex - https://phabricator.wikimedia.org/T157429#3004897 (hashar)
[10:50:58] Continuous-Integration-Infrastructure, Release-Engineering-Team, Operations, Jenkins: Upload Jenkins LTS v2.7.4 to wikimedia-ex - https://phabricator.wikimedia.org/T157429#3004915 (hashar) Need #operations to publish the package on apt.wikimedia.org and review the idea of using the `experimenta...
[10:53:02] Continuous-Integration-Infrastructure, Release-Engineering-Team, Operations, Jenkins: Upload Jenkins LTS v2.7.4 to jessie-wikimedia/experimental - https://phabricator.wikimedia.org/T157429#3004923 (hashar)
[10:57:12] Continuous-Integration-Infrastructure, Release-Engineering-Team, Operations, Jenkins: Upload Jenkins LTS v2.7.4 to jessie-wikimedia/experimental - https://phabricator.wikimedia.org/T157429#3004897 (MoritzMuehlenhoff) @hashar: jessie-wikimedia/experimental seems fine, we also used that to stage...
[10:57:32] Continuous-Integration-Infrastructure, Release-Engineering-Team, Operations, Jenkins: Upload Jenkins LTS v2.7.4 to jessie-wikimedia/experimental - https://phabricator.wikimedia.org/T157429#3004926 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[13:15:45] Release-Engineering-Team, Phabricator, Security: Update phabricator to 2017.05 - https://phabricator.wikimedia.org/T157198#3005268 (Bawolff)
[13:16:04] Release-Engineering-Team, Phabricator, Security: Update phabricator to 2017.05 - https://phabricator.wikimedia.org/T157198#3005272 (Bawolff) >>! In T157198#3003165, @Paladox wrote: > Can this be open to the public? Done
[13:18:10] Release-Engineering-Team, Phabricator, Security: Update phabricator to 2017.05 - https://phabricator.wikimedia.org/T157198#3005281 (Paladox) Thanks.
[13:21:17] hashar: https://gerrit.wikimedia.org/r/#/c/336404/
[13:21:56] \O/
[13:22:56] chasemp: beside the arrow alignments, it is all fine :}
[13:23:08] amending now
[13:23:25] if you +1 i'll merge, hashar who is "contint" group?
[13:23:29] just you or all of releng?
[13:24:50] no clue, but that is the group defined for zuul and others
[13:25:02] ha ok, I'll look into it
[13:25:35] modules/nagios_common/files/contactgroups.cfg
[13:25:41] contactgroup_name contint
[13:25:41] members amusso,irc-releng,irc
[13:25:56] so me + IRC spam to operations and releng channels
[13:26:29] alrighty, does that page you then?
[13:26:37] nop
[13:26:53] iirc it just mails me
[13:27:49] Continuous-Integration-Infrastructure, Release-Engineering-Team, Operations, Jenkins: Upload Jenkins LTS v2.7.4 to jessie-wikimedia/experimental - https://phabricator.wikimedia.org/T157429#3005314 (MoritzMuehlenhoff) jenkins 2.7.4 has been uploaded to carbon in the jessie-wikimedia/experimental...
[16:28:28] scap: Automatically clean up unused wmfXX versions - https://phabricator.wikimedia.org/T73313#3006003 (bd808) >>! In T73313#3004408, @demon wrote: > I'm inclined to actually close this resolved. This probably shouldn't be fully automated and with the logic properly hidden it's trivial to do the work. It doe...
[16:41:44] scap: Automatically clean up unused wmfXX versions - https://phabricator.wikimedia.org/T73313#3006070 (demon) >>! In T73313#3006003, @bd808 wrote: >>>! In T73313#3004408, @demon wrote: >> I'm inclined to actually close this resolved. This probably shouldn't be fully automated and with the logic properly hidd...
[16:45:16] bd808: Lots of text :D
[17:21:56] Beta-Cluster-Infrastructure, Release-Engineering-Team, scap, ORES, Revision-Scoring-As-A-Service: Running out of space when deploying on sca03 (deploy-cache) - https://phabricator.wikimedia.org/T157199#3006164 (Halfak)
[17:22:00] Deployment-Systems, Scap3, WorkType-NewFunctionality: Scap3 submodule space issues - https://phabricator.wikimedia.org/T137124#3006165 (Halfak)
[17:23:20] Hey folks, I'm looking at https://phabricator.wikimedia.org/T137124 and hoping to get a temporary resolution so that I can do a deployment of ORES in beta labs and hopefully prod sometime soon.
[17:23:32] Maybe someone could just delete some of the old cache files for me?
[17:37:25] I'd love to just have the whole deployment directory for "ores" on sca03 just blown away.
[17:39:35] hashar: may want to ping thcipriani or twentyafterfour directly :) also I know greg is currently locked up in a meeting
[17:39:39] oops I meant halfak^
[17:39:50] thanks chasemp
[17:40:19] * twentyafterfour is here
[17:40:23] o/
[17:40:55] deployment-sca03:/srv/deployment/ores/deploy is super broken due to disk space errors.
[17:41:01] Lost of zero-size files.
[17:41:05] halfak: ok ...
[17:41:13] let me see what I can do
[17:41:46] I propose deleting cache files and re-cloning the deploy repo entirely
[17:42:08] scap deploy --force doesn't get the clone to happen again :/
[17:42:21] maybe I should file that as a bug
[17:43:28] I think the problem is that the machine simply doesn't have a big enough disk
[17:43:54] Indeed that too
[17:43:56] :)
[17:44:12] there is already a bug about the submodules not using --reference to share disk space between revisions
[17:44:45] right yeah... been blocked on deployment for a while though. I was hoping to get some temporary help so that we weren't sitting her blocked.
[17:44:55] *here
[17:45:18] can we just rebuild the instances with more storage?
[17:46:17] That'd be cool with me
[17:46:40] Right now, I'm pretty sure my deploy would be broken even if the --references bug was addressed.
[17:46:47] But rebuilt instance would do it for me.
[17:46:52] Or a manual delete :)
[17:47:03] horizon says the disk is 40g but `df -h` says it's 20g
[17:47:16] * halfak squints at horizon
[17:47:37] halfak: I can delete the deployment directory but won't it start failing again after ~4 deploys?
[17:48:13] twentyafterfour, yes. But I'd be able to move forward with the current deploy while we wait on more complete solutions :)
[17:48:26] Assuming deletion is easier -- which is what I did.
[17:49:16] ok I deleted deploy-cache and deploy
[17:49:25] \o/ will try again. Thanks :)
[17:49:57] !log deploying ores 7c80636
[17:50:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:53:42] halfak: I'm attempting to configure deployment-sca04 to see if I get 40g as promised or if it gives me 20g like sca03
[17:53:55] COol. Building now?
[17:54:00] yeah
[17:56:57] http://ores-beta.wmflabs.org/
[17:56:59] \o/
[17:57:01] !
[17:57:05] Thanks twentyafterfour!
[17:57:21] :)
[17:57:39] https://horizon.wikimedia.org/project/instances/517b2cce-7f49-479f-aa23-c21a66864fb3/
[17:58:15] /dev/vda3 19G
[17:58:19] wtf ... horizon lies
[17:58:48] andrewbogott: chasemp: why do medium instances have ~20G disks when horizon promises twice as much?
[17:59:23] twentyafterfour: are you already using a puppet class to mount additional space?
[17:59:33] * andrewbogott tries it
[17:59:45] * twentyafterfour didn't know you needed a puppet class to use the full disk
[18:00:04] andrewbogott: so it gives you 20G root but there is another 20g available?
[18:00:09] Typically it only partitions 8g
[18:00:12] hmm
[18:00:14] I'm surprised you're getting a 20g root
[18:00:42] Filesystem Size Used Avail Use% Mounted on
[18:00:44] /dev/vda3 19G 1.6G 17G 9% /
[18:01:00] this is typical for medium instances, as far as I can see
[18:01:49] I'll have a look. but in the meantime, you should just apply role::labs::lvm::srv
[18:01:58] andrewbogott: thank you!
[18:02:47] twentyafterfour: what distro was that, btw? Jessie?
[18:02:53] so, halfak: I'm going to apply that puppet role to sca03, ok?
[18:02:58] andrewbogott: yes jessie
[18:03:03] 'k thanks
[18:09:30] twentyafterfour: I'm wrong about it always being 8g, apparently it defaults to 20 (which I vaguely remember changing now that I think about it). The thing about needing role::labs::lvm::srv to partition should still be true.
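The 17:47 to 18:10 exchange turns on the gap between the disk size Horizon advertises for a medium instance (40G) and what `df -h` shows mounted at `/` (19G); per the log, the remainder only becomes usable once `role::labs::lvm::srv` mounts it at `/srv`. A minimal Python sketch for checking this from the instance itself, using the standard library's `shutil.disk_usage`; the 40 GiB default and the helper names `mounted_capacity` and `unallocated_gib` are assumptions for illustration, not part of any Wikimedia tooling.

```python
import os
import shutil

GIB = 1024 ** 3  # df -h reports sizes in powers of 1024


def mounted_capacity(paths=("/", "/srv")):
    """Return {path: total bytes} for each existing path on a distinct filesystem.

    Missing paths are skipped. A path on the same device as an earlier one
    (for example /srv as a plain directory before an LVM volume is mounted
    there) is skipped too, so its space is not counted twice.
    """
    seen_devices = set()
    sizes = {}
    for path in paths:
        if not os.path.exists(path):
            continue
        device = os.stat(path).st_dev
        if device in seen_devices:
            continue
        seen_devices.add(device)
        sizes[path] = shutil.disk_usage(path).total
    return sizes


def unallocated_gib(flavor_gib, sizes):
    """Rough gap (GiB) between the advertised flavor disk and what is mounted."""
    mounted_gib = sum(sizes.values()) / GIB
    return max(flavor_gib - mounted_gib, 0.0)


if __name__ == "__main__":
    sizes = mounted_capacity()
    for path, total in sizes.items():
        print(f"{path}: {total / GIB:.1f} GiB")
    # Assumption: 40 GiB is the medium-flavor size Horizon showed in this log.
    print(f"not mounted yet: ~{unallocated_gib(40, sizes):.1f} GiB")
```

With the figures from the log (a 19G root and no separate `/srv`), the gap comes out at roughly 21 GiB; once the role added a ~20G `/srv`, it drops to about zero. The device-ID check matters because `/srv` already exists as an ordinary directory on the root filesystem before the LVM role runs, and counting it twice would hide the gap.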
[18:09:52] andrewbogott: yeah, I added that role and now I've got a 20g /srv
[18:09:58] great
[18:10:09] thank you, I simply didn't know about the extra lvm role
[18:10:15] it makes sense
[18:10:50] halfak: so I managed to break ores I think... it didn't re-deploy everything properly after remounting /srv
[18:11:14] wait I'm wrong
[18:11:17] it came back up after a while
[18:11:35] http://ores-beta.wmflabs.org/ .. and now you shouldn't run out of space for a while
[18:13:47] PROBLEM - Puppet run on deployment-sca03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[18:20:26] Continuous-Integration-Infrastructure, Release Pipeline: Mathoid CI Container Image POC - https://phabricator.wikimedia.org/T157469#3006353 (thcipriani)
[18:28:47] RECOVERY - Puppet run on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:41:55] Yippee, build fixed!
[20:41:55] Project selenium-Echo » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #297: FIXED in 53 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/297/
[20:42:06] Yippee, build fixed!
[20:42:06] Project selenium-Echo » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #297: FIXED in 1 min 5 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/297/
[20:48:20] thcipriani: monitoring for nodepool https://gerrit.wikimedia.org/r/#/c/336404/3/modules/role/manifests/labs/openstack/nodepool.pp ! :)
[20:48:28] all by Chase
[20:48:53] nice
[20:49:12] a question Chase had is who it is going to ping/notify
[20:49:32] the "contint" group in Icinga is irc to operations / releng
[20:49:42] and I am the sole one receiving the occasional email
[20:50:01] maybe we could add in our team list
[20:50:17] yeah that sounds like a good plan
[20:50:40] not sure how it's setup: I think people outside the team can mail to it?
[20:50:59] would suck to get caught in moderation
[20:51:24] yeah I think Greg clarified that a while ago
[20:52:30] We can whitelist it to bypass the moderation possibly
[20:52:38] Or manually subscribe the sender
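The closing exchange is about who the Icinga `contint` group actually notifies. Per the 13:25 messages, the group is defined in `modules/nagios_common/files/contactgroups.cfg` with `members amusso,irc-releng,irc`. A minimal sketch, assuming that file uses standard Nagios object-definition syntax (`define contactgroup { ... }`), that lists each group's members; the parser and the second `example-team` block are hypothetical and not taken from the operations/puppet repository.

```python
import re

# Nagios/Icinga object syntax: "define contactgroup { directive value ... }".
# The contint name and members come from the log; the wrapper layout and the
# "example-team" block are hypothetical, added only to show parsing two groups.
SAMPLE_CFG = """
define contactgroup {
    contactgroup_name   contint
    members             amusso,irc-releng,irc
}

define contactgroup {
    contactgroup_name   example-team
    members             alice,bob
}
"""

BLOCK_RE = re.compile(r"define\s+contactgroup\s*\{(.*?)\}", re.DOTALL)


def parse_contactgroups(text):
    """Map each contactgroup_name to the list of contacts in its members line."""
    groups = {}
    for body in BLOCK_RE.findall(text):
        directives = {}
        for raw in body.splitlines():
            line = re.split(r"[#;]", raw, maxsplit=1)[0].strip()  # drop comments
            if not line:
                continue
            parts = line.split(None, 1)  # directive name, then its value
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        name = directives.get("contactgroup_name")
        if name:
            members = directives.get("members", "")
            groups[name] = [m.strip() for m in members.split(",") if m.strip()]
    return groups


if __name__ == "__main__":
    print(parse_contactgroups(SAMPLE_CFG)["contint"])  # ['amusso', 'irc-releng', 'irc']
```

In Nagios-style configs `members` lists contact names, so adding the team mailing list as proposed would mean defining a contact for it and appending that contact's name to `contint`'s `members`; the log does not say what that contact would be called.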