[00:06:37] <bd808>	 !log striker Update to 82eb1c3 (T144710)
[00:06:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Striker/SAL
[00:06:42] <stashbot>	 T144710: Create Wikitech/LDAP accounts via a new user friendly guided workflow - https://phabricator.wikimedia.org/T144710
[00:27:36] <wikibugs>	 Labs-project-Wikistats: allthetropes is not updating on wikistats - https://phabricator.wikimedia.org/T146712#2856064 (NDKilla) @Dzahn  Shouldn't be an issue but I copy/pasted MySQL output to be sure I didn't typo the DB name before saying it's deleted. Please only purge links associated with unknown databas...
[00:58:46] <wikibugs_>	 Labs-project-Wikistats: allthetropes is not updating on wikistats - https://phabricator.wikimedia.org/T146712#2856123 (Dzahn) >>! In T146712#2856064, @NDKilla wrote > Below are wiki databases who still exits but whose stats havent been updated in over 400 hours > pnpwiki  I took one random example and it doe...
[01:00:41] <wikibugs>	 Labs-project-Wikistats: allthetropes is not updating on wikistats - https://phabricator.wikimedia.org/T146712#2856124 (Dzahn) In case you say to remove the wiki suffix, pnp is 404 as well.  https://pnp.miraheze.org/wiki/  Maybe just check those URLs  in a browser?
[01:02:55] <wikibugs>	 Labs-project-Wikistats: all kinds of mixed issues with miraheze table (was: allthetropes is not updating on wikistats) - https://phabricator.wikimedia.org/T146712#2856138 (Dzahn)
[01:15:02] <godog>	 the file listing at http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/ doesn't do the right thing I think with content-disposition, i.e. should display inline but downloads instead
[01:15:25] <godog>	 what would be the task to file or the config to change?
[01:17:17] <bd808>	 godog: works for me? I think that is just plain old apache directory listings
[01:17:57] <bd808>	 godog: but a bug report would go in https://phabricator.wikimedia.org/project/view/668/
[01:19:06] <bd808>	 those logs really could/should be organized by year or year/month too
[01:19:29] <bd808>	 just so the dir listing doesn't take forever to render
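A minimal sketch of the year/month layout bd808 suggests. It assumes the log files are named by date as `YYYYMMDD.txt`; that naming and the helper name `dated_path` are illustrative assumptions, not taken from the bot's configuration.

```python
from pathlib import PurePosixPath


def dated_path(filename):
    """Map a date-named log file to a year/month subdirectory.

    Assumes (hypothetically) that the file stem is an 8-digit YYYYMMDD date.
    """
    name = PurePosixPath(filename)
    stem = name.stem
    if len(stem) != 8 or not stem.isdigit():
        raise ValueError(f"not a date-named log file: {filename}")
    # First four digits are the year, next two the month.
    return PurePosixPath(stem[:4]) / stem[4:6] / name.name


print(dated_path("20161208.txt"))
```

Splitting by year and month keeps each directory listing to roughly 30 entries, which is the rendering concern raised above.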
[01:19:39] <godog>	 bd808: thanks for the pointer!
[01:20:29] <godog>	 works for you on chrome? but yeah ff shows inline
[01:25:49] <bd808>	 godog: I don't use Google's corporate spyware ;)
[01:26:31] <godog>	 dammit you didn't fall for it!
[01:26:54] <godog>	 though chromium would do too
[01:33:58] <bd808>	 godog: works for me in chrome 54.0.2840.9 on osx
[01:34:09] <Yvette>	 Hi.
[01:34:15] <Yvette>	 https://en.wikipedia.org/wiki/Special:Contributions/10.68.17.202
[01:34:26] <Yvette>	 Is Labs using special IP addresses? MediaWiki isn't blocking these?
[01:35:14] <bd808>	 yeah, those are edits from tool labs I think... let me reverse the ip
[01:35:39] <bd808>	 tools-exec-1401.tools.eqiad.wmflabs
[01:36:23] <godog>	 bd808: odd. thanks
[01:36:50] <bd808>	 Yvette: someone's bot editing while logged out I guess
[01:37:16] <Yvette>	 I mean, it's my bot for some of them.
[01:37:30] <bd808>	 looks like it's pretty common -- https://en.wikipedia.org/w/index.php?title=Wikipedia:Biographies_of_living_persons/Noticeboard/Watchlist&action=history
[01:37:30] <Yvette>	 Just curious that it's using that IP and that MediaWiki is allowing the edits.
[01:37:55] <Yvette>	 You'd think a special range like that would be banned or something.
[01:38:06] <bd808>	 the 10/8 range is used for internal networking at WMF
[01:38:25] <Yvette>	 I guess Labs --> production traffic is on the internal network?
[01:38:39] <bd808>	 sort of
[01:38:44] <Yvette>	 I kind of assumed it went outside.
[01:39:19] <bd808>	 it's not in the same network segment
[01:39:43] <Yvette>	 When I saw the 10. address, I thought it might be some kind of proxy issue.
[01:39:46] <Yvette>	 Like https://en.wikipedia.org/wiki/Special:Contributions/127.0.0.1
[01:39:48] <Yvette>	 But shrug.
[01:39:58] <bd808>	 but the x-forwarded-for headers that are set by the external varnish servers see the labs 10.68.x.x ip
[01:40:01] <Yvette>	 Not sure why the script isn't logging in, but I'll wait a bit to see if it keeps happening.
[01:41:59] <bd808>	 if you pass ?assert=user or ?assert=bot with your request then it will be blocked by the action api if your login failed
[01:42:11] <bd808>	 which is the safe thing to do honestly
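A minimal sketch of the `assert` parameter bd808 describes, with no network calls. The helper names `build_edit_params` and `login_assertion_failed` are hypothetical; the parameter names and the `assertuserfailed` / `assertbotfailed` error codes are those of the MediaWiki Action API.

```python
from urllib.parse import urlencode


def build_edit_params(title, text, token, assert_as="user"):
    """Build Action API edit parameters that refuse to run logged out."""
    if assert_as not in ("user", "bot"):
        raise ValueError("assert_as must be 'user' or 'bot'")
    return {
        "action": "edit",
        "format": "json",
        "title": title,
        "text": text,
        "token": token,
        # The API rejects the request if the session is not logged in
        # (assert=user) or lacks the bot right (assert=bot).
        "assert": assert_as,
    }


def login_assertion_failed(response):
    """Return True if the API response reports a failed assert."""
    code = response.get("error", {}).get("code")
    return code in ("assertuserfailed", "assertbotfailed")


params = build_edit_params("Sandbox", "hello", "dummy-token")
print(urlencode(params))
```

A bot would send these as a POST body and re-login when `login_assertion_failed` returns True. The trade-off Yvette raises still applies: with the assert, a failed login means no edit at all rather than an edit attributed to the IP.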
[01:43:34] <bd808>	 I think if we looked at the full X-forwarded-for headers on those requests we would see a public ip that belongs to the WMF. I've never fully traced it out though
[01:43:57] <Yvette>	 Yeah, I've considered adding an assert.
[01:44:14] <Yvette>	 Personally I'd rather have the edit "unattributed" than not have the edit, though.
[01:44:23] <bd808>	 :) yeah
[01:44:49] <bd808>	 as long as it doesn't raise some admin's ire to see ips making bot edits
[01:45:03] <Yvette>	 Hah, yeah.
[01:50:17] <Krenair>	 sometimes labs instances get privileged access to services on 'production' hosts that are there for labs-support reasons
[01:50:39] <Krenair>	 sometimes you expect them to get the same unprivileged access that the public internet gets
[01:51:39] <Krenair>	 right now, no NAT is done to translate labs private IPs into public IPs before they reach the public LVS machines
[01:52:04] <Krenair>	 so varnish sees the private IP and sticks that in the XFF
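A small sketch of the address check behind this discussion, using Python's standard `ipaddress` module. The example X-Forwarded-For value is hypothetical (nobody in the channel traced the real header); the 10/8 range and the two addresses come from the log.

```python
import ipaddress

# The 10/8 range is used for internal networking, per the discussion above.
INTERNAL_RANGE = ipaddress.ip_network("10.0.0.0/8")


def is_internal(addr):
    """Return True if addr falls inside the 10/8 internal range."""
    return ipaddress.ip_address(addr) in INTERNAL_RANGE


def first_public_hop(xff_header):
    """Return the first non-private address in an X-Forwarded-For value."""
    for part in xff_header.split(","):
        ip = ipaddress.ip_address(part.strip())
        if not ip.is_private:
            return str(ip)
    return None


print(is_internal("10.68.17.202"))
# Hypothetical header value, for illustration only.
print(first_public_hop("10.68.17.202, 208.80.154.147"))
```

Because no NAT happens before the request reaches the public LVS machines, the first hop recorded is the private Labs address, which is what MediaWiki then attributes the logged-out edit to.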
[02:05:12] <Krenair>	 <Krenair> AndyRussG, okay, from reading your ssh -vvv log and from that, this issue is way before you hit deployment-prep
[02:05:13] <Krenair>	 <Krenair> you can't even get into labs
[02:05:20] <Krenair>	 <AndyRussG> Krenair: yeah! huh I changed Gerrit key like 6 months ago, I was assuming that it's somehow coordinated with labs
[02:05:26] <AndyRussG>	 Krenair: sorry I didn't see what u meant by -labs
[02:05:27] <Krenair>	 <AndyRussG> But https://wikitech.wikimedia.org/wiki/Special:Preferences#mw-prefsection-openstack shows the old key, the one you pasted
[02:05:27] <Krenair>	 <AndyRussG> I should just update it there, no?
[02:05:36] <Krenair>	 you need to update it there as well as gerrit, yes
[02:05:55] <Krenair>	 gerrit's ssh key list controls SSH to gerrit.wikimedia.org:29418
[02:06:02] <Krenair>	 it is not linked to labs LDAP
[02:06:02] * AndyRussG self trout-slaps repeatedly
[02:06:20] <Krenair>	 there is a historical request laying around to link it that way somewhere
[02:06:49] <AndyRussG>	 K lemme fix dat....
[02:09:21] <AndyRussG>	 K I assume it updates on a schedule...
[02:10:06] <Krenair>	 I can't find the task I mentioned
[02:10:13] <Krenair>	 krenair@bastion-01:~$ /usr/sbin/ssh-key-ldap-lookup andyrussg
[02:10:13] <Krenair>	 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDAMmAGeTcMpB7ZNlnWlqN16H0H8c37/XkhgVJBrQAngXfNLBP6aji0ldq0wrUmJFKPmu/BXftUr0fX02Rohz87qG1po242IXovdbjhPSGCi3sa4ofqL5smUuOKBJ/hkKGqxKjsAQ2sWktI6onHcX7MrTwUHxACal2c3LF6D+mKr0VpL0kXfnzCJvn2QW4Abjgc68623lb7Z1vR8tV3peepex/nrZ2hbSBBzuH+ImcEGm3Pm9/BRxcqIy6Z6OMpkmc+WAtqEmlekZNOhoa71DrN3yt0xxp7jZ3dUQ89ll49JqUHlJ7o35+V5YQCPK0vbT6aDAx3B7lnb8zg8f5cfFwj5MU27gkHFcaKMKBsZ3EYcRy+RseLT5H0SbcHtvGOoo3My7fy88V8KmwyR4W2CV+1ETXUdx
[02:10:14] <Krenair>	 x+oA62mgtrcvfACZXqXG3FX3ZeaHviG+VzXZkWSaxFJ4oXSFR2QvFUQXvcG2ck0kysLFYMSoi02FbZBLcLP3IefsFMgtGjNICiWe6XcGeE5p6vPJ/kbIzwmexQM8DlNH1sGIKuWUSeuXCdeLGZaV99v5UVM6iDhLQQJzQwOBOQpT4v82uco/nZS/6gIOOrDcwaSmy4UDZ3bmZhL64kRteqau9PUWo+tzGgMKmOdMqAlUqY1FryXHrkdP0PcmoYJnQ8DbfyAh20aI1uUw== andrew.green.df@gmail.com
[02:10:21] <Krenair>	 you should be able to ssh in now, AndyRussG 
[02:15:36] <AndyRussG>	 Krenair: yep all good!
[02:15:46] <AndyRussG>	 Krenair: thx so much and sorry for the bother!!!!!! :D \o/
[02:42:01] <wikibugs>	 Tool-Labs-tools-Pageviews: More user-friendly errors for when there is no data - https://phabricator.wikimedia.org/T152657#2856296 (MusikAnimal)
[02:48:13] <wikibugs>	 Tool-Labs-tools-Pageviews: Vibrating page on large screens - https://phabricator.wikimedia.org/T152658#2856313 (MusikAnimal)
[02:48:30] <wikibugs_>	 Tool-Labs-tools-Pageviews: Vibrating page on large screens - https://phabricator.wikimedia.org/T152658#2856325 (MusikAnimal) p:Triage>High
[03:01:16] <grrrit-wm>	 (CR) Catrope: [V: 0 C: 2] Don't show approvals as 0 with new Gerrit version [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/325839 (owner: Alex Monk)
[03:01:21] <grrrit-wm>	 (CR) jenkins-bot: [V: 0 C: 0] Don't show approvals as 0 with new Gerrit version [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/325839 (owner: Alex Monk)
[03:07:56] <grrrit-wm>	 (Merged) jenkins-bot: Don't show approvals as 0 with new Gerrit version [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/325839 (owner: Alex Monk)
[03:25:58] <wikibugs_>	 Labs, Tool-Labs: Warnings/errors in /var/lib/gridengine/spool/qmaster/messages - https://phabricator.wikimedia.org/T152477#2856381 (scfc) T151980 changed `host_aliases`, but the grid master was probably not restarted afterwards, so it was still working with a reference to that host, therefore I decided t...
[03:44:14] <wikibugs>	 Labs, Tool-Labs: Warnings/errors in /var/lib/gridengine/spool/qmaster/messages - https://phabricator.wikimedia.org/T152477#2856386 (scfc) Rebooting `tools-webgrid-lighttpd-1208` was not enough: I had to remove the directory `/var/spool/gridengine/execd/tools-webgrid-lighttpd-1208/active_jobs/4594249.1`....
[04:13:36] <wikibugs_>	 Labs, Tool-Labs: Redis replication from tools-proxy-01 to tools-proxy-02 broken - https://phabricator.wikimedia.org/T152356#2856393 (scfc) (That was meant to say https://gerrit.wikimedia.org/r/#/c/325751/.)
[05:11:37] <yurik>	 seems like beta-cluster does not have latest code - https://gerrit.wikimedia.org/r/#/c/325732/ is missing
[05:21:26] <bd808>	 yurik: it looks like the update job is running successfully -- https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/
[05:21:55] <bd808>	 have you ssh'ed into deployment-tin and looked at the clone to see if the problem is obvious?
[05:22:07] <yurik>	 bd808, i'm looking at deployment-tin:/srv/mediawiki/php-master/extensions/JsonConfig/includes$ vi JCSingleton.php
[05:22:14] <yurik>	 its outdated
[05:22:21] <bd808>	 ok.
[05:22:35] <bd808>	 I wonder if the extensions mega repo is messed up?
[05:22:42] * bd808 logs in to poke around
[05:23:12] <yurik>	 thx :)
[05:38:18] <bd808>	 yurik: I think that whatever magic updates the mediawiki/extensions.git repo is busted
[05:38:28] <yurik>	 yepii
[05:38:33] <bd808>	 it's not showing any changes since 11 hours ago
[05:38:34] <bd808>	 https://github.com/wikimedia/mediawiki-extensions/commits/master
[05:38:36] * yurik loves magic
[05:38:55] <bd808>	 which ... would probably be around the time that the gerrit upgrade started
[05:39:18] <yurik>	 and considering that gerrit was busted for a few hours...
[05:39:19] <bd808>	 so I guess file a bug and hope that hashar fixes it
[05:41:12] <yurik>	 https://phabricator.wikimedia.org/T152663
[05:41:15] <yurik>	 thx!
[07:03:03] <grrrit-wm>	 (PS1) BryanDavis: contrib: Add ldapPublicKey when creating dummy users [labs/striker] - https://gerrit.wikimedia.org/r/325890
[07:12:01] <grrrit-wm>	 (PS3) BryanDavis: Bump static, striker, and wheels submodules [labs/striker/deploy] - https://gerrit.wikimedia.org/r/325814 (https://phabricator.wikimedia.org/T144710)
[07:38:14] <wikibugs_>	 Striker: Striker error logs not getting into ELK cluster - https://phabricator.wikimedia.org/T151422#2856532 (bd808) The ferm rules allow `$DOMAIN_NETWORKS`. Californium is `208.80.154.147/26` which must at least be a part of `$::network::constants::deployable_networks` or scap3 wouldn't work. I guess I need...
[07:45:56] <wikibugs>	 Striker: Striker error logs not getting into ELK cluster - https://phabricator.wikimedia.org/T151422#2816717 (yuvipanda) I verified that the packets are making it all the way to logstash1003, and being dropped by iptables.  here's the expanded ruleset: ``` ACCEPT     udp  --  10.128.0.0/24        anywhere...
[07:57:21] <wikibugs>	 Striker: Striker error logs not getting into ELK cluster - https://phabricator.wikimedia.org/T151422#2856575 (yuvipanda) We realized that californium is hitting logstash1003 over ipv6...  ``` yuvipanda@logstash1003:~$ sudo ip6tables -L | grep 11514  ACCEPT     udp      2620:0:860:101::/64  anywhere...
[08:29:31] <wikibugs>	 Striker: Striker error logs not getting into ELK cluster - https://phabricator.wikimedia.org/T151422#2856583 (yuvipanda) So I added a TRACE but looks like there's no traffic from californium to logstash1003 in general?
[09:07:44] <Phawkes>	 [gerrit-patch-uploader] eloquence opened pull request #33: Clarify process for patch set updates (master...patch-1) https://git.io/v1g2i
[11:30:48] <Steinsplitter>	 is there a way to move .out and .err to a subdir such as /logs? -e -o works, but it is changing the .out/.err to something random.
[11:31:23] <Steinsplitter>	 -e $HOME/logs   isn't working as described in the docs
[11:35:11] <Steinsplitter>	 Krenair maybe you know? (seen you contributed to the config git repo)
[14:51:41] <wikibugs>	 Labs-project-Wikistats: all kinds of mixed issues with miraheze table (was: allthetropes is not updating on wikistats) - https://phabricator.wikimedia.org/T146712#2857128 (NDKilla) @Dzahn all urls are suffixed with wiki (mediawiki grants are to like '%wik%'.* or something).  The databases in the second list...
[15:42:17] <wikibugs_>	 Striker: Striker error logs not getting into ELK cluster - https://phabricator.wikimedia.org/T151422#2857200 (bd808) >>! In T151422#2856583, @yuvipanda wrote: > So I added a TRACE but looks like there's no traffic from californium to logstash1003 in general?  The logging for Striker would be the only direct...
[17:02:32] <Phawkes>	 [gerrit-patch-uploader] valhallasw pushed 2 new commits to master: https://git.io/v12aV
[17:02:32] <Phawkes>	 gerrit-patch-uploader/master dc89db6 Erik Moeller: Clarify process for patch set updates
[17:02:33] <Phawkes>	 gerrit-patch-uploader/master 94ab68f Merlijn van Deen: Merge pull request #33 from eloquence/patch-1...
[17:04:07] <Phawkes>	 [gerrit-patch-uploader] valhallasw pushed 2 new commits to master: https://git.io/v12ad
[17:04:07] <Phawkes>	 gerrit-patch-uploader/master d4f15f9 Merlijn van Deen: Only add non-space characters to URL
[17:04:08] <Phawkes>	 gerrit-patch-uploader/master 5797b2d Merlijn van Deen: Merge branch 'master' of https://github.com/valhallasw/gerrit-patch-uploader
[17:14:50] <wikibugs>	 Striker: Striker error logs not getting into ELK cluster - https://phabricator.wikimedia.org/T151422#2857363 (bd808) @yuvipanda helped me debug this further by temporarily disabling Puppet on californium and increasing the log verbosity of Striker by editing /etc/striker/striker.ini. I then tailed /srv/log/s...
[17:19:41] <wikibugs_>	 Striker: Some Striker errors not getting into ELK cluster - https://phabricator.wikimedia.org/T151422#2857371 (bd808)
[17:28:28] <wikibugs_>	 Striker: Some Striker errors not getting into ELK cluster - https://phabricator.wikimedia.org/T151422#2857389 (bd808) >>! In T151422#2857363, @bd808 wrote: > The `telnet` in the initial report is bogus because this is a UDP event stream not a TCP stream. But I started testing because I had a user report of a...
[17:35:28] <wikibugs>	 Labs, Tool-Labs: ssh from tools-puppetmaster-02 and tools-bastion-03 to tools-services-01 times out - https://phabricator.wikimedia.org/T152695#2857414 (scfc)
[17:36:19] <wikibugs_>	 Labs, Tool-Labs: ssh from tools-puppetmaster-02 and tools-bastion-03 to tools-services-01 times out - https://phabricator.wikimedia.org/T152695#2857433 (scfc)
[17:54:42] <grrrit-wm>	 (PS1) BryanDavis: profile: make less output nicer [labs/tools/stashbot] - https://gerrit.wikimedia.org/r/325972
[17:54:44] <grrrit-wm>	 (PS1) BryanDavis: stashbot.sh: Add 'attach' command [labs/tools/stashbot] - https://gerrit.wikimedia.org/r/325973
[18:22:45] <shinken-wm>	 RECOVERY - Host tools-secgroup-test-103 is UP: PING OK - Packet loss = 0%, RTA = 2.95 ms
[18:29:08] <shinken-wm>	 PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22)
[18:38:45] <wikibugs_>	 Labs, Labs-Infrastructure, Monitoring: nova: Monitor existence and membership for certain projects and accounts - https://phabricator.wikimedia.org/T152708#2857711 (Andrew)
[18:48:32] <YuviPanda>	 !log tools restarted toolschecker on tools-checker-01
[18:48:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:56:40] <shinken-wm>	 RECOVERY - Host secgroup-lag-102 is UP: PING OK - Packet loss = 0%, RTA = 1.17 ms
[18:58:45] <YuviPanda>	 andrewbogott: how do I cleanup these super annoying ldap entries ^
[18:59:04] <shinken-wm>	 PROBLEM - Puppet run on tools-grid-master is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[18:59:27] <andrewbogott>	 YuviPanda: are they ldap host entries?  Or something else?
[18:59:35] <andrewbogott>	 I assumed they were part of shinken's state storage
[18:59:37] <YuviPanda>	 andrewbogott: pretty sure LDAP that gets picked up by shinken
[18:59:47] * andrewbogott checks
[18:59:47] <YuviPanda>	 andrewbogott: yeah that gets re-run every 10min
[19:00:59] * Krenair facepalms
[19:01:01] <Krenair>	 guys
[19:01:04] <Krenair>	 shinkengen gets data from ldap
[19:01:25] <Krenair>	 I have complained about stale ldap host data many times before
[19:03:26] <shinken-wm>	 RECOVERY - Host tools-secgroup-test-102 is UP: PING OK - Packet loss = 0%, RTA = 1.62 ms
[19:03:49] <andrewbogott>	 ok, I cleaned up a bunch of things, including ^
[19:04:05] <andrewbogott>	 Shinkengen will need a rewrite anyway, since ldap hosts are no longer especially accurate (obviously)
[19:04:33] <godog>	 fwiw what I did for prometheus is ask the openstack api for a list of hosts, seems to work well
[19:04:36] <Krenair>	  /etc/shinken/generated/tools.cfg:    host_name        secgroup-lag-102
[19:04:37] <Krenair>	  /etc/shinken/generated/tools.cfg-    address          10.68.17.218
[19:04:53] <Krenair>	 krenair@shinken-01:~$ ldapsearch -x aRecord=10.68.17.218 dn -LLL
[19:04:53] <Krenair>	 dn: dc=ci-jessie-wikimedia-49929.contintcloud.eqiad.wmflabs,ou=hosts,dc=wikime
[19:04:53] <Krenair>	  dia,dc=org
[19:04:54] <Krenair>	 dn: dc=ci-trusty-wikimedia-151608.contintcloud.eqiad.wmflabs,ou=hosts,dc=wikim
[19:04:54] <Krenair>	  edia,dc=org
[19:04:55] <Krenair>	 dn: dc=ci-jessie-wikimedia-229502.contintcloud.eqiad.wmflabs,ou=hosts,dc=wikim
[19:04:56] <Krenair>	  edia,dc=org
[19:05:03] <Krenair>	 3 contintcloud hosts with the same aRecord? yeah. . .
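A minimal sketch of how the stale entries Krenair found could be flagged: group host entries by their aRecord and report any address claimed by more than one host. The `(hostname, address)` pair format and the function name are assumptions for illustration; the three hostnames and the address are the ones from the ldapsearch output above.

```python
from collections import defaultdict


def duplicate_a_records(entries):
    """Return {address: [hostnames]} for addresses shared by several hosts."""
    by_ip = defaultdict(list)
    for hostname, ip in entries:
        by_ip[ip].append(hostname)
    # An address held by more than one host means at least one entry is stale.
    return {ip: hosts for ip, hosts in by_ip.items() if len(hosts) > 1}


entries = [
    ("ci-jessie-wikimedia-49929.contintcloud.eqiad.wmflabs", "10.68.17.218"),
    ("ci-trusty-wikimedia-151608.contintcloud.eqiad.wmflabs", "10.68.17.218"),
    ("ci-jessie-wikimedia-229502.contintcloud.eqiad.wmflabs", "10.68.17.218"),
]
for ip, hosts in duplicate_a_records(entries).items():
    print(ip, len(hosts))
```

This only detects collisions. Deciding which entry, if any, is still live needs an authoritative source such as the OpenStack API list godog mentions.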
[19:05:59] <shinken-wm>	 PROBLEM - Host tools-secgroup-test-102 is DOWN: CRITICAL - Host Unreachable (10.68.21.170)
[19:06:39] <shinken-wm>	 PROBLEM - Host secgroup-lag-102 is DOWN: CRITICAL - Host Unreachable (10.68.17.218)
[19:23:14] <wikibugs>	 Labs, Labs-Infrastructure, Operations, netops, and 3 others: Provide read-only access to OpenStack APIs from WMF IP space - https://phabricator.wikimedia.org/T150092#2857881 (Andrew)
[19:29:06] <shinken-wm>	 RECOVERY - Puppet run on tools-grid-master is OK: OK: Less than 1.00% above the threshold [0.0]
[19:39:46] <godog>	 I'd need to upgrade prometheus-node-exporter in labs too to the latest version, IIRC there's no salt for that and the closest is clustershell? 
[19:40:06] <YuviPanda>	 godog: tools or labs? :)
[19:42:45] <godog>	 YuviPanda: labs in this case
[19:42:56] <godog>	 I can start with tools tho
[19:43:31] <YuviPanda>	 godog: so there's salt from labcontrol1001 but that won't hit deployment-prep (or other places with their own saltmaster)
[19:43:31] <YuviPanda>	 godog: will hit tools tho
[19:43:39] <kaldari>	 bd808: Just got a whole bunch of cron daemon error emails from labs, for multiple projects. For example: "error: failed receiving gdi request response for mid=1 (got syncron message receive timeout error)."
[19:44:00] <kaldari>	 which is greek to me
[19:45:06] <godog>	 YuviPanda: ah, that'd do too, thanks! what about places with their own salt master?
[19:45:11] <kaldari>	 actually those were all about an hour ago, but just got them
[19:45:20] <kaldari>	 or rather just checked my email :)
[19:45:22] <YuviPanda>	 godog: then need to find their salt master and use it.
[19:46:23] <kaldari>	 also "error: commlib error: got read error (closing "tools-grid-master.tools.eqiad.wmflabs/qmaster/1")"
[19:47:25] <godog>	 YuviPanda: thanks, any easy way to find all salt masters in this case?
[19:48:08] <YuviPanda>	 godog: hmm, not sure. we used to have 'watroles' that helped you find instances with a role but that doesn't work right now
[19:48:22] <YuviPanda>	 godog: I know that at least deployment-prep and integration have one. so I guess maybe just do those now and see how it goes?
[19:50:35] <godog>	 YuviPanda: yeah I'll do that now, the next change is a commandline one in puppet that will break on the old version :(
[19:50:40] <valhallasw`cloud>	 kaldari: very odd -- sounds like a network connectivity issue
[19:50:53] <YuviPanda>	 godog: ah, ouch
[19:51:26] <kaldari>	 valhallasw`cloud: I'm just going to ignore them for now, but wanted to let you know just in case :)
[20:07:43] <shinken-wm>	 PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:23:16] <wikibugs_>	 Striker: Some Striker errors not getting into ELK cluster - https://phabricator.wikimedia.org/T151422#2858048 (bd808) The Python stack traces are too long to encode in a UDP packet. ``` 20:22:04.269268 IP (tos 0x0, ttl 63, id 11981, offset 0, flags [+], proto UDP (17), length 1500)     californium.wikimedia....
[20:27:11] <wikibugs_>	 Striker: Striker error log events not getting into ELK cluster due to UDP truncation of JSON payload - https://phabricator.wikimedia.org/T151422#2858064 (bd808)
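A sketch of one way to keep a JSON log event within a single UDP datagram by trimming an oversized stack trace before sending. This is not what Striker does or did; the 1400-byte budget, the `traceback` field name, and the `truncated` flag are all assumptions chosen for illustration.

```python
import json

# Assumed budget: comfortably below the 1500-byte packet seen in the capture.
MAX_DATAGRAM_BYTES = 1400


def encode_event(event, field="traceback", limit=MAX_DATAGRAM_BYTES):
    """Encode event as JSON, trimming one long text field to fit the limit."""
    payload = json.dumps(event).encode("utf-8")
    if len(payload) <= limit:
        return payload
    trimmed = dict(event)
    text = str(trimmed.get(field, ""))
    # Every character costs at least one encoded byte, so cutting `overshoot`
    # characters shrinks the payload by at least that much each pass.
    while text and len(payload) > limit:
        overshoot = len(payload) - limit
        text = text[: max(0, len(text) - overshoot)]
        trimmed[field] = text
        trimmed["truncated"] = True
        payload = json.dumps(trimmed).encode("utf-8")
    # If the other fields alone exceed the limit, the payload is still too big.
    return payload


event = {"message": "boom", "traceback": "x" * 5000}
print(len(encode_event(event)))
```

Trimming before encoding keeps the payload valid JSON, whereas a datagram cut off in transit leaves the receiver with an unparseable fragment. Sending over TCP would avoid the size limit entirely, at the cost of a connection per sender.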
[20:33:39] <wikibugs_>	 Labs, Labs-Infrastructure, Monitoring, Patch-For-Review, Wikimedia-Incident: labservices1001 crashed and sent no pages - https://phabricator.wikimedia.org/T152368#2858069 (Andrew) Open>Resolved
[20:49:44] <wikibugs>	 Labs, Tool-Labs, Community-Tech-Tool-Labs, Developer-Relations, and 2 others: Developing community norms for vital bots and tools - https://phabricator.wikimedia.org/T149312#2858100 (bd808) In a full 90 minute slot, @chasemp and I would expand the scope of this to cover: * Planning for the Tool L...
[21:02:42] <shinken-wm>	 RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:44:54] <wikibugs_>	 Labs, Labs-Infrastructure, Monitoring: nova: Monitor existence and membership for certain projects and accounts - https://phabricator.wikimedia.org/T152708#2857711 (Krenair) These are keystone things rather than nova. As I mentioned earlier I also think we should have monitoring of the $project.wmfla...
[22:25:19] <wikibugs>	 Tool-Labs-tools-Other, Possible-Tech-Projects: Fix TreeViews to provide pageviews statistics for all articles of any wikiproject etc. - https://phabricator.wikimedia.org/T56184#2858577 (Nuria)