[09:22:36] following up here because -operations is noisy, the logstash elastic 5 cluster is currently in trouble, I'm investigating
[09:23:34] godog: ack, shout if you need help, should be considered an incident, need an IC, etc...
[09:25:12] volans: ok thanks! not sure yet
[09:25:21] here also, same
[09:33:00] yeah let's start an incident please, I don't see the cluster recovering
[09:35:16] so what we know so far: the logstash elasticsearch 5 cluster was in gc-trouble earlier, I roll-restarted the cluster hosts and am observing now, but e.g. the master is now logstash1011 and that's experiencing high gc activity too
[09:36:14] hello
[09:36:20] the cluster is recovering in the sense that unassigned shards are going down
[09:36:33] creating the doc
[09:36:46] "paged" in -operations
[09:37:24] thanks
[09:37:46] XioNoX: https://wikitech.wikimedia.org/wiki/Incident_response/Quick_reference
[09:38:30] godog: some ES experts might be useful?
[09:38:31] I also noticed the disk space on logstash101[0-2] going from 71% to 80% from 1 am to 8 am, not sure if related
[09:38:57] volans: dcausse or gehel come to mind, if they are around
[09:38:58] godog, volans https://docs.google.com/document/d/1qutKDBzEYsCVVE9ndHbCJFa9LaW_McNtuSjEiimRSXk/edit?usp=sharing
[09:39:14] * gehel has been summoned?
[09:39:18] thanks XioNoX !
[09:39:59] gehel: hey, yeah we're currently having some troubles with the elasticsearch cluster for logstash 5, namely high gc activity is the symptom
[09:40:17] not sure yet of the root cause and/or how to stop it though
[09:41:13] earlier in the week we increased the number of shards from 1 to 3 for new indices of that cluster, which I reverted now out of caution
[09:41:47] can we reduce the amount of messages sent to logstash?
[09:42:38] looks like there are a lot of shards initializing, that's going to put additional load on the cluster until it is fully recovered
[09:43:03] we can stop logstash until the cluster has recovered yes
[09:43:32] apart from high GC, do we have a measurable impact?
[09:44:08] high GC will increase latency, but if logstash still has enough throughput to consume all logs, that's not really an issue
[09:44:47] no, at the moment things seem ok in terms of impact, logstash is caught up with kafka
[09:45:28] looks like there are timeouts in some of the icinga checks, so I assume that we probably also have impact on the read use cases
[09:46:25] https://logstash.wikimedia.org isn't super fast for me but queries work eventually
[09:47:39] but yeah unassigned shards are going down somewhat slowly
[09:48:15] sorry for my dumb questions, but could the cause of the issue be in the logstash logs? did someone look?
[09:49:07] what are unassigned shards?
[09:49:22] unassigned shards are related to the previous restart of the cluster
[09:49:45] when a node goes down, shards that were on that node need to move around in the cluster
[09:50:22] so even "unused" shards are seeing issues?
[09:50:53] feel free to ignore my questions :) trying to understand what's going on but I don't want to slow the investigation
[09:50:57] unassigned shards are not really an issue, just a byproduct of how ES manages scaling
[09:51:27] XioNoX: I'm taking a look in the logstash logs in case
[09:51:40] gehel: didn't you solve that issue for the main ES cluster on rolling restart? (like stopping the re-assignment for the time of the restart)
[09:51:44] looks like some of the recoveries are for old indices. I would expect those to be recovered from disk since there should be no writes on them
[09:52:06] yes, we solved (or at least optimized) for the search cluster
[09:52:18] the context is fairly different for logstash, but the same principles could apply
[09:53:13] k, added a comment to the doc for potential action item, thx
[09:53:27] I do remember seeing log messages being sent to old indices, that might be something we want to investigate (low priority)
[09:53:53] is there a graph of messages rate? maybe something spiked before?
[09:53:58] can the mediawiki fatals be related in any way?
[09:54:13] * gehel needs to leave in a few minutes (meeting at Oscar's school)
[09:54:28] gehel: thanks for stopping by! appreciate it
[09:55:16] can't find logstash logs for earlier than 8:20 today atm
[09:56:03] https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMjAvMDYvMTkvLS10aHJlYWRkdW1wLjItLTktNDgtMjg7Oy0tdGhyZWFkZHVtcC40LS05LTQ4LTI4OzstLXRocmVhZGR1bXAuMy0tOS00OC0yODs7LS10aHJlYWRkdW1wLjEtLTktNDgtMjg=
[09:56:19] ^ quick analysis of a few thread dumps
[09:56:35] Let me know if there is anyone I can try to call/ping if needed
[09:57:22] nothing jumps out except that there are a lot of threads writing to disk (but that's expected)
[09:57:57] *nod* we'll wait for full recovery of all shards and see if things stabilise
[09:58:03] * gehel needs to run, maybe dcausse has a better idea
[09:59:03] godog: recovery from the restart?
[09:59:50] FYI dcausse is not in this chan
[10:00:16] XioNoX: correct
[10:00:55] volans: ack, we can invite him here from -discovery if needed
[10:01:17] if you need him I'll go call
[10:01:28] volans: maybe ping for now so he is aware?
[10:01:33] (if he is around)
[10:01:38] ok trying
[10:01:51] thx
[10:02:48] we're at 18 unassigned shards btw, I'm looking at https://grafana.wikimedia.org/d/oXH_v3rWk/logstash-cluster-health?orgId=1&refresh=5s "elasticsearch details" row
[10:02:53] o/
[10:02:54] is there anyone on the service ops team who can look at the mediawiki alerts? PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:03:12] hello dcausse, sorry for bothering, the logstash ES cluster is having some trouble and we got some help from ge.hel that had to leave
[10:03:19] summary of current status is in the incident doc: https://docs.google.com/document/d/1qutKDBzEYsCVVE9ndHbCJFa9LaW_McNtuSjEiimRSXk/edit#
[10:04:37] much better to have that on a Friday than on a weekend :)
[10:05:10] if you want, the backlog of this chan is at https://bots.wmflabs.org/browser/index.php?start=06%2F19%2F2020&end=06%2F19%2F2020&display=%23wikimedia-sre
[10:05:23] also of note, the high gc activity on logstash1010 started yesterday at ~16, whereas on logstash101[12] it started today at ~5, I'm looking at https://grafana.wikimedia.org/d/000000462/elasticsearch-memory
[10:07:23] volans: oops I resolved your action item by accident
[10:07:50] you can undo
[10:08:11] godog: is the purge of old indices functioning?
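(aside: a minimal sketch of the kind of check being run above, assuming the ES 5 HTTP API answers on port 9200 of a logstash host, as the _stats one-liner later in this log suggests)

    # Overall cluster status plus the shard counters discussed above.
    curl -s localhost:9200/_cluster/health | jq '{status, unassigned_shards, initializing_shards, relocating_shards}'

    # Which shards are not STARTED yet, and why they are unassigned.
    curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' | awk 'NR==1 || $4 != "STARTED"'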
[10:08:52] dcausse: afaik yes, I'm checking now
[10:10:16] the increase in terms size since tonight 1am is worrisome
[10:11:10] looking at some of the exceptions, I see a bunch of "NamespaceInfo::isTalk called with non-integer (string) namespace 'arachni_name'" coming from ChangesListSpecialPage->buildQuery more or less, all dewiktionary, but they are from over 20 mins ago (kibana)
[10:11:18] I could check the raw logs
[10:11:36] apergos: is that the mediawiki exceptions?
[10:11:51] dcausse: yeah curator ran tonight at ~00:42 although it failed some actions, namely to set replicas
[10:12:03] yeah, that's the second most popular, the most popular I'm not sure is so helpful,
[10:12:12] dcausse: I'm looking at logstash1007:/var/log/elasticsearch/curator-logstash.log.1
[10:12:39] dcausse: so it could be logs coming in with more terms to be indexed than "usual" ?
[10:12:43] apergos: thanks, let me know if you think it could be linked to the logstash issue
[10:12:57] ErrorException from line 393 of /srv/mediawiki/php-1.35.0-wmf.37/includes/debug/MWDebug.php: PHP Warning: [data-update-failed]: A data update callback triggered an exception (Fail-safe exception. Avoiding infinite loop due to possibily undetectable existing records in master.
[10:12:58] godog: yes
[10:13:01] It may be due to encoding incompatibility between database values and values passed in $neededRecords parameter.) [Called from Wikibase\Repo\Content\DataUpdateAdapter::doUpdate...
[10:13:07] that's the most popular
[10:13:14] I don't know that it's related at all
[10:13:23] * apergos goes to look at the raw logs now
[10:13:26] thx
[10:14:36] godog: it could be more terms/fields, if you switched to 3 shards it also increases term usage as they're no longer shared as they were when in the same index
[10:15:32] dcausse: *nod* thanks, yeah the shards change I reverted so tomorrow's indices are back to one shard
[10:15:50] is there a way to inspect which terms/fields grew ?
[10:17:18] godog: somehow https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
[10:17:29] i'll take a look
[10:17:49] I don't see anything too interesting coming in now, which makes sense as the exception rate has recovered
[10:17:57] in the meantime you could perhaps start by closing old indices
[10:18:02] not seeing the same increase in the elk7 cluster fwiw, https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?panelId=48&fullscreen&orgId=1&from=1591957073935&to=1592561873935&var-exported_cluster=production-elk7-eqiad&var-datasource=eqiad%20prometheus%2Fops&var-instance=logstash1020&var-top5_fielddata=All&var-top10_terms=All&var-top5_completion=All
[10:18:18] dcausse: ok! I'll start with the oldest
[10:18:47] let me know what actions you're taking, I can't follow your conversation :)
[10:19:37] XioNoX: haha ok! I'll !log too
[10:20:47] thx!
[10:25:05] almost done closing march indices
[10:26:37] rgr
[10:28:17] the most namespaceinfo messages I see in a minute is 50 (and that's not sustained), and the other ones are warnings anyways (the infinite loop ones), so dead end from me
[10:28:50] s/are/were/ don't see new ones
[10:29:00] ok, thanks!
[10:30:30] godog: you set replicas to 1 for which indices? one month old?
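(aside: a sketch of the "close the March indices" action above; closed indices stay on disk but drop their segment/terms memory from the heap. The *-YYYY.MM.DD daily naming is taken from the _stats one-liner later in the log, the rest is an assumption)

    # Close all March 2020 daily indices to relieve heap pressure.
    curl -s -XPOST 'localhost:9200/*-2020.03.*/_close'

    # Once the cluster is stable again they can be reopened slowly, a day at a time.
    curl -s -XPOST 'localhost:9200/*-2020.03.01/_open'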
[10:31:29] dcausse: yes, >31 days old get 1 replica, >50 days get zero replicas
[10:31:49] dcausse: march indices are all closed now, I see field memory did decrease a little of course, can do april too
[10:39:23] somewhat slowly but shards are recovering, 8 unassigned
[10:41:39] it seems to me things are on their way to recovery *touches wood*
[10:42:12] if I'm reading stats correctly jumping from 1 to 3 shards increased terms_memory_in_bytes from ~650M to ~850M per day (curl localhost:9200/*-2020.06.18/_stats | jq ._all.total.segments.terms_memory_in_bytes)
[10:42:25] cool!
[10:42:34] does not explain the spike we see at 1am tho :/
[10:45:05] indeed, I can't find anything relevant in SAL in case it was say a deploy that changed logging format
[10:46:04] could be a giant merge or multiple ones, will check
[10:47:21] dcausse: thanks! I'm tempted to close april's indices too, what do you think ?
[10:47:33] godog: fine by me
[10:47:49] we can reopen them slowly when things are under control
[10:48:01] sounds good
[10:55:34] godog: what are the kafka topics you read?
[10:57:12] dcausse: there's a bunch, I'll get the list
[10:58:06] I see multiple elastic metrics that increase at 1am so I wonder if it's also visible on the inputs
[10:58:22] https://phabricator.wikimedia.org/P11605
[10:58:54] quite possible yeah, the udp_localhost is also mediawiki logs, I bet that's likely it
[11:00:50] also I researched "give me all messages from timestamp x to y" earlier this week for an unrelated issue and couldn't find a way with our current version of kafka
[11:01:31] you can find the closest offset at timestamp x
[11:02:31] https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/603477/3/streaming-updater/scripts/set_initial_offsets.py is a small tool to set consumer group offsets based on timestamp
[11:03:01] godog: things look under control? Do you still need an IC? :)
[11:03:11] oh nice thank you dcausse
[11:03:25] XioNoX: I _think_ things are under control now yes, thanks for your help!
[11:03:34] great, thx!
[11:05:12] feel free to add data to the doc :)
[11:07:42] *nod* will do
[11:15:53] FWIW I'm basically monitoring things now, logstash 5 in eqiad is at 81% fs utilization so I'm thinking of ditching some indices altogether later
[11:18:10] please ignore the wikimedia-extmon alert
[11:18:50] I have ack'ed it
[11:19:00] no, I did :D
[11:19:05] :-P
[11:19:09] It shows ack'ed by me
[11:19:13] yep
[11:19:35] thanks
[11:19:49] in my app it shows ack'ed by me, not by volans :-/
[11:20:12] yep looks like you won ;)
[11:20:18] I see it by volans
[11:20:18] it changed
[11:20:21] was me before
[11:20:24] WUT?
[11:20:30] uh?
[11:20:40] when I hit it, it wasn't ack'ed
[11:20:45] race?
[11:20:50] eventual consistency?
[11:20:52] mondogb?
[11:20:58] yeah I see marostegui now
[11:20:59] *mongodb
[11:21:02] I guess to :)
[11:31:20] I'll check back after lunch on the cluster
[12:03:06] hey, this morning 20 mw nodes suddenly started to emit errors on wikibase that should be pretty rare (100/h right now while it's usually around 1/day)
[12:03:07] https://logstash.wikimedia.org/goto/ab3489acf83a5695d656dbcc87ae7393
[12:03:33] maybe opcache corruption again?
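(aside: the day-over-day comparison described above, generalized into a small loop around the same _stats | jq one-liner quoted in the log; the dates are only examples)

    # Compare terms memory across a few daily indices to size the 1-to-3 shard jump.
    for day in 2020.06.16 2020.06.17 2020.06.18; do
      printf '%s\t' "$day"
      curl -s "localhost:9200/*-$day/_stats" | jq ._all.total.segments.terms_memory_in_bytes
    done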
[12:04:10] started around 8:20 this morning
[12:09:02] ok the cluster is fully recovered now, I'm deleting some old indices to reclaim some space
[12:22:48] something is pushing way more docs than usual to the cluster (since june 19 ~3:40am) (cf. elasticsearch_indices_docs{cluster="logstash", exported_cluster="production-logstash-eqiad"})
[12:23:11] can't see anything obvious in kafka metrics
[12:36:37] dcausse: interesting, is that metric on a dashboard you are looking at ?
[12:36:46] godog: sadly no
[12:37:01] we have this: https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1
[12:37:12] but it's too cirrus specific to include logstash
[12:38:11] true, yeah I was (per)using that earlier, typing 'logstash' manually into the cluster field you get metrics
[12:38:17] mostly for the IPS
[13:02:12] godog: have you ever tried installing cumin in a pontoon environment?
[13:08:50] kormat: not yet no
[13:09:28] ok, i'll let you know how i get on :)
[13:11:44] sweet -- I'm curious to see how that works out
[13:15:08] ping if I can be of any help ofc
[13:16:04] godog: so far, it's going exactly as well as i expected ;) https://phabricator.wikimedia.org/P11606
[13:17:26] haha! yeah that's fair, I don't think you have set up puppetdb yet ?
[13:17:47] i didn't realise it wasn't included by having a puppet master
[13:18:00] ofc not
[13:18:15] you didn't pay the extra pain yet ;)
[13:18:23] well crap :)
[13:18:46] java and postgres
[13:19:19] if it can help cumin can also use different backends that are not puppetdb
[13:19:26] so depending on what you're trying to do
[13:19:34] might or might not be useful
[13:20:02] volans: i'd like to have some more fine-grained selection of db hosts for cumin, so i'm adding a profile with some specific parameters that i'd like to have cumin match on,
[13:20:12] and i was hoping i could test this in my private puppet kingdom
[13:21:07] then no, you need puppetdb unfortunately
[13:21:13] but
[13:21:19] not sure if worth the effort
[13:21:35] as (almost) any parameter can be queried
[13:21:57] if you show me the profile signature and the query you'd like to do I can tell you if it will work or not
[13:21:59] volans: hieradata can't be queried, which is what led me here :)
[13:22:21] alright, i'll give up on trying to test this and just add you to the CR when i create it
[13:22:29] indeed because hieradata is abstract outside the context of a catalog compilation
[13:22:31] kormat: sth like these variables should be enough to get puppetdb working in pontoon FWIW, plus of course assigning puppetmaster::puppetdb to a new host and init the pg database
[13:22:35] kormat: https://phabricator.wikimedia.org/P11607
[13:23:46] godog: hah, nice :) needing a third vm just to test this is way into overkill, i think.
[13:23:57] (for my current task)
[13:24:47] yeah fair enough
[13:26:43] kormat: tl;dr is, if there is a puppet class (including profiles) or define with a parameter, that's queryable
[13:27:21] with very few limitations
[13:29:03] *with some limitations
[13:29:14] like if you have a hash/list as parameter, etc...
[13:29:24] or custom types
[13:30:04] volans: can you match an unset parameter?
[13:30:28] undef?
[13:31:03] or check if it's using the default value or a passed value?
[13:31:10] `sudo cumin 'R:mariadb::monitor_readonly@title = undef'`
[13:31:12] something like that
[13:31:18] well. title is a bad example.
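(aside: hedged examples of the parameter matching being discussed; cumin's PuppetDB backend selects on class/define parameters with % and on fields such as title with @. The profile and parameter names below are made up for illustration, so check the cumin grammar docs before relying on them)

    # Select hosts whose (hypothetical) profile declares shard = s1 and run a command there.
    sudo cumin 'R:profile::mariadb::core%shard = s1' 'uptime'

    # Regex match on the parameter value instead of trying to match undef directly.
    sudo cumin 'R:profile::mariadb::core%shard ~ "^s[0-9]$"' 'uptime'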
[13:31:27] imagine that's `...%param1 = undef`
[13:31:49] IIRC no because there was an issue on passing undef via the json api of puppetdb
[13:31:56] hah, ok
[13:32:03] need to recheck the patch that chaom.odus did a year ago
[13:32:12] that's included in cumin 4.0.0rc1 on cumin2001
[13:32:40] but you can do regex
[13:36:40] dcausse: still a mystery to me what the offending terms were or what led to terms memory increasing, though things seem stable and I'm tempted to stop curator on the elastic 5 cluster for the weekend in case it can trigger similar behaviours
[13:41:51] godog: going back through the earlier discussion, your restart did trigger a lot of shard initialization
[13:42:12] we can probably help on that side to have a more streamlined (and automated) restart process
[13:42:38] our current cookbooks are specific to search, but some of the ideas could be applied to logstash as well
[13:42:58] we can include ryankemper in this and see if he can adapt the cookbooks
[13:43:01] that's what I commented on the doc as an action item
[13:43:15] gehel: totally! that'd be useful
[13:43:40] * gehel is reading the doc
[13:44:15] I'm going to turn that action item into a task as a followup
[13:48:59] godog: I've added a note about writes to old indices as well
[13:49:35] that one is probably low priority
[13:51:34] *nod* thanks gehel, I'm thinking of comparing docs counts day over day to verify it but maybe there's another way ?
[13:52:28] one indirect measure is to check the number of segments per shard
[13:53:09] Since we forceMerge via curator, old indices should all have a single segment
[13:53:18] if that's not the case, it's suspicious
[13:54:09] I haven't checked the curator logs, but we might even have something in the logs that shows if forceMerge was a noop or not
[13:56:42] interesting, thanks for the lead, I'll take a quick look now to see if there are smoking guns
[14:03:56] yeah the case I'm seeing is from when the cluster was catching up with kafka, anyways low priority
[15:08:55] jbond42: hi. this is your (somewhat late) friday reminder regarding puppet debugging documentation :)
[15:11:26] godog: I'd say that term mem increase is due to more docs being indexed between 4:30am and the incident (
[15:11:28] https://grafana.wikimedia.org/explore?orgId=1&left=%5B%22now-7d%22,%22now%22,%22eqiad%20prometheus%2Fops%22,%7B%22expr%22:%22elasticsearch_indices_docs%7Bcluster%3D%5C%22logstash%5C%22,%20exported_cluster%3D%5C%22production-logstash-eqiad%5C%22%7D%22%7D,%7B%22mode%22:%22Metrics%22%7D,%7B%22ui%22:%5Btrue,true,true,%22none%22%5D%7D%5D )
[15:11:29] kormat: thanks, it's erm ... underway :S. seriously, it is still on my list but I didn't get to it today
[15:11:46] hehe. no worries :)
[15:12:16] but I have no clue why... it's not even visible in kibana
[15:14:17] I don't see such an increase on the elastic7 cluster
[15:14:45] dcausse: hehe yeah I was about to say, truly bizarre that elk7 doesn't show a similar increase
[15:15:17] there is an increase at about ~2:40 but that's it, it doesn't spiral out of control
[16:17:15] hi! if anyone is around today, I was wondering if you could help me debug this pcc error: https://puppet-compiler.wmflabs.org/compiler1001/23342/malmok.wikimedia.org/prod.malmok.wikimedia.org.err
[16:17:43] this is a bit weird because the change catalog has the correct output, and I am not referencing this class anywhere in the change: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/606459/
[16:17:57] the error it gives is: "Could not find class ::passwords::dnsdist::wikidough for malmok.wikimedia.org"
[16:18:20] but I don't have this line in the latest change! this is what the line is: include passwords::wikidough::dnsdist
[16:18:35] it seems to be fetching the line from the older commit for some reason?
[16:19:53] the reallyyy weird thing is the change catalog has the expected output: https://puppet-compiler.wmflabs.org/compiler1001/23342/malmok.wikimedia.org/change.malmok.wikimedia.org.pson (grep for "console_key" and it says "snake oil")
[16:22:57] sukhe: check this now: https://puppet-compiler.wmflabs.org/compiler1001/23344/
[16:23:07] i rebased it and compiled it one more time
[16:23:35] though noop is also not what is expected? heh
[16:23:37] mutante: thanks! but same :( https://puppet-compiler.wmflabs.org/compiler1001/23344/malmok.wikimedia.org/prod.malmok.wikimedia.org.err
[16:24:00] it is trying to find "class ::passwords::dnsdist::wikidough" which is incorrect
[16:24:15] the commit has "include passwords::wikidough::dnsdist" which is what it should be trying
[16:24:49] (the commit corresponds to the structure in puppet master and labs/private)
[16:25:38] if I commit something to labs/private, I just have to run puppet merge on master right?
[16:26:23] sukhe: that link is before the change is applied
[16:26:40] before: https://puppet-compiler.wmflabs.org/compiler1001/23344/malmok.wikimedia.org/prod.malmok.wikimedia.org.err
[16:26:43] after: https://puppet-compiler.wmflabs.org/compiler1001/23344/malmok.wikimedia.org/change.malmok.wikimedia.org.err
[16:26:58] it's because it was broken before the change is applied, and only with the change is it not broken anymore
[16:27:05] which shows up the same as a noop
[16:27:26] "no difference (or compile correctly only with the change)"
[16:27:48] but what's the reason for the noop here
[16:28:28] it's not a real noop. it is the second option "compile correctly only with the change"
[16:28:42] it should be broken now when running puppet on the host?
[16:30:19] ah I see what you mean now. it's because I changed the class name
[16:31:21] so what's a fix for this? apologies if I am missing something here. or will it look up the correct class name correctly once I apply the change?
[16:33:32] just making sure I understand "Hosts that have no differences (or compile correctly only with the change)" correctly :)
[16:33:47] sukhe: the "fix" seems to be to just merge the change, i think
[16:34:01] because the compiler tells us after doing that puppet would be unbroken
[16:34:31] ok yeah makes sense. submitted for review. thanks! first time I have come across this and wanted to make sure what the noop here meant!
[16:34:35] if you manually run puppet right now, is it broken ?
[16:34:48] i would expect it to be (before merging)
[16:35:02] then after merging the issue should be gone
[16:35:14] oh yeah
[16:35:18] broken on malmok
[16:37:42] ack, then it adds up
[16:37:52] yeah :)
[16:37:56] merge it and see the error go away
[16:38:39] +1ed
[16:38:45] thanks, trying!
[16:40:45] mutante: all good! successfully merged. thanks!
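(aside: a hedged sanity check for the class-rename confusion above: confirm the new class name autoloads from the expected file in labs/private and that no stale references to the old name remain; the paths follow standard puppet autoloading and are assumptions)

    # In a labs/private checkout: passwords::wikidough::dnsdist should autoload from here.
    ls modules/passwords/manifests/wikidough/dnsdist.pp

    # In operations/puppet: any leftover use of the old class name that pcc could still compile?
    git grep -n 'passwords::dnsdist::wikidough'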
[16:41:01] also for the question yesterday, jbond42 suggested I use merge in the lookup
[16:44:10] great! and ok, eh.. let's try that
[16:44:34] yeah that's what I did in the current commit and it worked out fine :)
[16:44:46] :)
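(aside: a sketch of what "use merge in the lookup" can look like from the CLI, via puppet's lookup subcommand; the key name is hypothetical and the right merge strategy depends on the hiera data in question. In a manifest the equivalent would be lookup('profile::dnsdist::instances', Hash, 'deep'))

    # Show how a hash key resolves across hierarchy levels with a deep merge,
    # and where each piece of the result came from.
    sudo puppet lookup --merge deep --explain --node malmok.wikimedia.org profile::dnsdist::instances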