[02:10:37] 10Analytics, 10Analytics-SWAP: Upgrade R in SWAP notebooks to 3.4+ - https://phabricator.wikimedia.org/T222933 (10Groceryheist)
[04:19:27] PROBLEM - Check the last execution of monitor_refine_eventlogging_analytics on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_analytics
[05:32:32] morning!
[05:32:36] just restarted eventlogging
[05:32:43] there were some consume errors
[05:32:50] and afaics there is some lag
[05:37:28] mmmm something is not working ok
[05:40:11] https://grafana.wikimedia.org/d/000000027/kafka?panelId=54&fullscreen&orgId=1&from=now-30d&to=now
[05:40:23] as a precaution, I have executed preferred-replica-election on jumbo
[05:40:29] 1002 seems used way more than the others
[05:45:17] I am seeing kafka.coordinator [WARNING] Heartbeat failed for group eventlogging_processor_client_side_00 because it is rebalancing
[05:45:27] for example often in the eventlogging processors' logs
[05:46:54] and camus complains about
[05:46:54] Topic not fully pulled, max task time reached at 2019-05-09
[05:46:54] T22:06:04.000Z, pulled 3872 records
[05:46:55] T22: Identify features Bugzilla users would miss in Phabricator - https://phabricator.wikimedia.org/T22
[05:46:58] for example
[05:50:06] kafka bytes out is super spiky for 1002
[05:50:16] but that has been going on for a while
[05:50:28] there must be a client that pulls data in a weird/bursty way
[05:59:08] so the first occurrence of "Topic not fully pulled, max task time reached at 2019-05-09T21:48:34.000Z, pulled 2 records"
[05:59:23] was at May 9 22:07:05 UTC, for camus eventlogging
[06:06:59] and eventlogging processors started to fail the consumer group heartbeats (with the assigned broker leader) intermittently since at least May 8
[06:07:45] there is a clear trend of kafka clients lagging a bit, but I still haven't found a clear cut with the last alarms
[06:12:43] ah we also didn't refine the last hour that errored for upload yesterday
[06:12:46] sigh
[06:30:14] !log refine with higher loss threshold webrequest upload 2019-5-8-18
[06:30:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:10:26] ok so all false positives after refinement
[07:10:33] one thing fixed :D
[07:14:53] trying now to run an el processor with max_poll_records=100 (default 500)
[07:15:24] since it fails the heartbeat with kafka, it should mean that the poll() spends too much time processing?
[07:19:58] Morning elukey - From the alert emails, I guess it's not been a quiet past 2 days :S
[07:20:38] joal: bonjour! Now it is you and Andrew with a curse! :D
[07:20:42] Andrew's one is worse
[07:20:52] kidding :)
[07:20:56] :)
[07:20:59] small things, nothing horrible
[07:21:20] but! One positive thing is that the RPC alarms + hdfs audit logs work well
[07:21:24] Is there anything I should start focusing on, or is reading emails a good start?
[07:21:31] \o/
[07:22:10] yesterday we had another spark partitioning issue causing a ton of temp files
[07:22:18] but we caught it after 5 mins
[07:22:28] nice !!
[07:22:34] still not a good fence but better than seeing the HDFS master going down :D
[07:22:49] also, the hdfs-audit.log on the master showed user + RPC action
[07:22:59] that was.. create file tmp etc.
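For reference, the max_poll_records experiment mentioned above (07:14) maps onto kafka-python's KafkaConsumer settings roughly as sketched below. The group id is the one from the heartbeat warning; the topic name, broker host and timeout values are illustrative placeholders, not the actual eventlogging processor configuration.

```python
from kafka import KafkaConsumer

# Illustrative sketch only -- not the real eventlogging processor setup.
consumer = KafkaConsumer(
    'eventlogging-client-side',                              # placeholder topic name
    bootstrap_servers=['kafka-jumbo1002.eqiad.wmnet:9092'],  # placeholder broker
    group_id='eventlogging_processor_client_side_00',
    # Fewer records per poll() means less processing time between polls,
    # which is what dropping max_poll_records from 500 to 100 is testing.
    max_poll_records=100,
    # If the group coordinator sees no heartbeat within session_timeout_ms,
    # the member is evicted and the group rebalances.
    session_timeout_ms=30000,
    heartbeat_interval_ms=3000,
)

for message in consumer:
    # Stand-in for the real per-event processing done by the processor.
    print(message.topic, message.partition, message.offset)
```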
[07:23:03] :)
[07:23:24] so please read emails, the only outstanding problem seems to be related to eventlogging
[07:23:33] but not really sure what now :)
[07:23:38] ok - reading
[07:23:43] Thanks for the heads up
[07:25:31] check webrequest status seems ok now, should recover in the next few minutes
[07:35:07] RECOVERY - Check the last execution of check_webrequest_partitions on an-coord1001 is OK: OK: Status of the systemd unit check_webrequest_partitions
[07:40:41] going afk for a bit!
[07:49:17] (03CR) 10Joal: [V: 03+1] "Ping @fdans please :)" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/502858 (https://phabricator.wikimedia.org/T220111) (owner: 10Joal)
[08:02:54] back!
[08:22:03] I am trying one setting for the kafka python consumers of Eventlogging
[08:22:12] namely setting max_poll_records to 100 (default is 500)
[08:22:18] to see if the rebalances decrease
[08:23:53] looks good so far
[08:25:26] a bit better, seeing fewer rebalances but they still happen
[08:44:40] another try made, raising timeouts
[08:46:39] nope
[08:46:46] I suspect though that the settings are not applied
[08:46:46] :(
[08:47:40] anyway joal, I think that we have a not-new issue with eventlogging, namely the processors failing to health-check with the broker (leader of their consumer group) and triggering rebalances
[08:48:01] and the last alerts from camus + eventlogging analytics refine
[08:51:50] elukey: could it be related to a version mismatch between kafka and consumers?
[08:52:26] in theory no, kafka python is 1.4.6 which is recent
[08:53:10] I am trying now one last change
[08:53:26] namely no specific timeouts (seems to get worse with these) and poll size 50
[08:53:37] that should be the max number of records to fetch for each poll
[08:55:37] joal: look how beautiful this graph is
[08:55:38] https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-30d&to=now&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=jumbo-eqiad&var-topic=All&var-consumer_group=eventlogging_consumer_client_side_events_log_00&panelId=1&fullscreen
[08:55:43] seems like the throne of GOT :D
[08:56:49] :)
[08:59:39] my current theory is that processing of some events became heavier for some reason
[08:59:54] hm
[09:00:17] so kafka-python does a poll() and then spends a lot of time on processing, causing a timeout to occur that triggers the rebalance
[09:00:26] the whole consumer group is affected
[09:00:33] but usually once it stabilizes it is good
[09:00:40] but I keep seeing constant churning
[09:00:52] so it means that processors are failing one by one periodically
[09:01:21] there are unbalanced topics elukey
[09:01:35] yes 1002 is serving more traffic
[09:01:53] I tried a preferred-replica-election today but it was not really effective
[09:01:58] elukey: https://grafana.wikimedia.org/d/000000027/kafka?panelId=12&fullscreen&orgId=1
[09:02:10] I'd say 1003 receives more traffic
[09:02:55] joal: https://grafana.wikimedia.org/d/000000027/kafka?panelId=54&fullscreen&orgId=1
[09:03:09] Nice :)
[09:03:21] 1002 gets more messages, but 1003 gets more volume!
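The theory above (09:00) is about the time spent between successive poll() calls. A rough sketch of a poll/process/commit loop in kafka-python is shown below, assuming the processors use the manual poll() API; the topic and broker are placeholders and the handler is purely hypothetical.

```python
from kafka import KafkaConsumer

def handle(value):
    # Stand-in for the real per-event work (parse, validate, enrich).
    pass

consumer = KafkaConsumer(
    'eventlogging-client-side',            # placeholder topic
    bootstrap_servers=['localhost:9092'],  # placeholder broker
    group_id='eventlogging_processor_client_side_00',
    enable_auto_commit=False,
    max_poll_records=50,                   # the "poll size 50" experiment above
)

while True:
    # poll() returns {TopicPartition: [ConsumerRecord, ...]}
    batch = consumer.poll(timeout_ms=1000)
    for tp, records in batch.items():
        for record in records:
            handle(record.value)
    # If the processing above routinely takes longer than the group allows
    # (max_poll_interval_ms in newer clients, the session timeout in older
    # ones), the coordinator assumes the member is gone and rebalances --
    # the churn described in the logs above.
    consumer.commit()
```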
[09:04:19] the interesting thing is that your graph is constant for the past 30d
[09:04:22] mine is not
[09:04:25] https://grafana.wikimedia.org/d/000000027/kafka?panelId=54&fullscreen&orgId=1&from=now-30d&to=now
[09:05:00] from the 23rd there is a clear increase
[09:05:06] that of course does not match with EL's lg
[09:05:08] *lag
[09:05:08] sigh
[09:17:44] 10Analytics, 10Analytics-Kanban: Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941 (10elukey) p:05Triage→03High
[09:22:09] 10Analytics, 10Analytics-Kanban: Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941 (10elukey) I tried with the following changes: https://gerrit.wikimedia.org/r/#/c/509341/ https://gerrit.wikimedia.org/r/#/c/operations/puppe...
[09:22:18] joal: added some thoughts --^
[09:22:33] my question is: are the recent refine failures due to --^ or something else?
[09:22:53] And actually elukey, bytes-in is higher for both 1002 and 1004 when looking at a longer time range
[09:33:07] elukey: also I think there is a partition-replication imbalance in kafka
[09:33:19] This deviation https://grafana.wikimedia.org/d/000000027/kafka?panelId=12&fullscreen&orgId=1 doesn't make sense
[09:39:02] elukey: something else - Spike in eventlogging_VirtualPageView schema between 21:20 and 23:35
[09:39:54] elukey: same spike in eventlogging_CitationUsagePageLoad
[09:40:11] elukey: this feels like nuria backfilling eventlogging data, no?
[09:42:22] hm - maybe not - seems that other eventlogging schemas don't show the pattern
[09:45:04] joal: I had the same thought but didn't find any trace of it.. we can check on the stat boxes?
[09:45:37] sorry, better question - what would be the current leading theory?
[09:45:53] something ongoing affecting the refinement, or something happened that messed it up?
[09:58:23] joal: it seems, from the kafka logs, that the broker leader of the cgroup is jumbo 1001
[09:58:44] I am wondering if it could make sense to restart kafka on it to see if the issue still persists on the next leader
[09:59:07] kinda desperate attempt I know
[10:00:27] joal: ah!!!!!
[10:00:34] https://tools.wmflabs.org/sal/analytics?p=0&q=&d=2019-04-30
[10:00:37] take a look!
[10:00:47] PERFECT MATCH
[10:00:53] that's it, 1.4.6 is at fault
[10:01:05] I didn't check in the analytics SAL earlier on
[10:01:12] only in the production one
[10:01:54] 10Analytics, 10Analytics-Kanban: Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941 (10elukey) SAL for the 04-30: https://tools.wmflabs.org/sal/production?p=0&q=&d=2019-04-30 https://tools.wmflabs.org/sal/analytics?d=2019-04-...
[10:11:13] so we surely need to either find what's wrong or roll back to 1.4.3
[10:11:20] (that needs to be rebuilt)
[10:13:39] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Operations, and 2 others: Upgrade python-kafka - https://phabricator.wikimedia.org/T221848 (10elukey) The upgrade of python-kafka to 1.4.6 on eventlog1002 coincides very well with T222941 :(
[10:15:31] IIRC andrew should be working today
[10:17:21] https://github.com/dpkp/kafka-python/blob/master/CHANGES.md#145-mar-14-2019
[10:17:25] This release is primarily focused on addressing lock contention and other coordination issues between the KafkaConsumer and the background heartbeat thread that was introduced in the 1.4 release.
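One way to double-check what the consumer-lag dashboards above show, independent of Grafana, is to compare each partition's committed offset for the group against the log end offset. A sketch with kafka-python follows; the consumer group name is the one from the dashboard URL, while the topic and broker are placeholders. The consumer below never subscribes or polls, so it does not join the group or trigger a rebalance itself.

```python
from kafka import KafkaConsumer, TopicPartition

TOPIC = 'eventlogging-client-side'   # placeholder topic
GROUP = 'eventlogging_consumer_client_side_events_log_00'

consumer = KafkaConsumer(
    bootstrap_servers=['localhost:9092'],   # placeholder broker
    group_id=GROUP,
    enable_auto_commit=False,
)

# partitions_for_topic() may do a metadata round trip on first use.
partitions = [TopicPartition(TOPIC, p)
              for p in consumer.partitions_for_topic(TOPIC)]
end_offsets = consumer.end_offsets(partitions)

for tp in partitions:
    committed = consumer.committed(tp) or 0
    print('partition %d lag: %d' % (tp.partition, end_offsets[tp] - committed))

consumer.close()
```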
[10:26:08] 10Analytics, 10Analytics-Kanban: Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941 (10elukey) https://github.com/dpkp/kafka-python/issues/1418 seems related, some workarounds are listed.
[10:34:45] 10Analytics, 10Analytics-Kanban: Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941 (10elukey) Rebuilt python-kafka_1.4.3-1_all.deb and uploaded to eventlog1002 in case we decide to rollback.
[10:55:56] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Operations, and 2 others: Upgrade python-kafka - https://phabricator.wikimedia.org/T221848 (10elukey) >>! In T221848#5137318, @MoritzMuehlenhoff wrote: > We can probably simply backport https://github.com/dpkp/kafka-python/pull/1628/commits/f12d4...
[11:08:37] elukey: may be easier to chat here, looks like it has been failing in nagios for ~1.5 days
[11:09:35] jbond42: yep yep I was about to write, Marcel did some changes and IIUC yesterday they were supposed to be working fine
[11:09:38] apparently not
[11:10:25] I am seeing that the ExecStart contains &&, not sure if systemd parses those like bash
[11:10:36] I will replace it with a bash script
[11:10:59] ack feel free to ping me for review
[11:11:34] jbond42: just as a note, what command did you run?
[11:11:48] the saltrotate one?
[11:12:17] elukey: https://phabricator.wikimedia.org/P8509 command and output
[11:12:58] ah!
[11:13:00] jbond42: <3
[11:13:20] yeah pretty sure that systemd doesn't like it
[11:13:40] yes i think you're right
[11:17:25] going to send a patch after lunch!
[11:21:10] ack
[11:21:57] going afk for lunch!
[13:02:53] 10Analytics, 10Analytics-Kanban, 10Operations, 10Traffic, 10Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (10fgiunchedi) >>! In T196066#5170415, @Ottomata wrote: > I think there are a few more branches: > > - prod...
[13:18:29] hey team!
[13:18:49] o/
[13:18:56] mforns: I have a code review for you
[13:19:08] elukey, ok
[13:19:16] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509387/
[13:19:30] elukey, I just saw you merged the one for AQS deploy, thanks!
[13:19:30] when you are ready, not super urgent :)
[13:19:39] mforns: nope I didn't!
[13:20:15] elukey erb @ vs puppet $ :)
[13:21:18] ah right!
[13:21:19] fixing
[13:22:57] ottomata: done :)
[13:23:08] oh elukey, my mistake, was looking at another patch before
[13:23:50] mforns: we can merge the AQS one on monday if you are ok, since it'll require a roll restart of aqs
[13:23:50] elukey, the one I was talking about: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509150/1/hieradata/role/common/aqs.yaml
[13:25:23] elukey, the patch you passed me looks good to me, though I don't know exactly how you pass "parameters" to the template. Is it implicit?
[13:26:32] ok elukey no prob
[13:27:48] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Operations, and 2 others: Upgrade python-kafka - https://phabricator.wikimedia.org/T221848 (10Ottomata) That'd be fine!
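On the ExecStart question above (11:10): systemd does not run Exec lines through a shell, so `&&` is not interpreted the way bash would interpret it, which is why wrapping the commands in a bash script (or `/bin/bash -c '...'`) is the usual fix. The same pitfall can be reproduced in Python with subprocess, offered here only as a rough analogy:

```python
import subprocess

# Without a shell, "&&" is just another argument passed to echo -- roughly
# analogous to putting "&&" in a systemd ExecStart= line.
subprocess.run(['/bin/echo', 'first', '&&', '/bin/echo', 'second'])
# output: first && /bin/echo second

# Asking for a shell explicitly restores the expected chaining, analogous
# to moving the unit's command into a bash wrapper script.
subprocess.run(['/bin/bash', '-c', '/bin/echo first && /bin/echo second'])
# output: first
#         second
```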
[13:28:53] mforns: yes exactly
[13:29:04] the erb templates grab them from the scope in which the file resource is defined
[13:29:09] in this case, the variables of the class
[13:29:42] ok, then LGTM
[13:30:01] ottomata: when you have time I'd like to discuss with you what to do in the short term for eventlog1002
[13:30:55] mforns: also, was it ok to run the saltrotate today manually?
[13:31:21] elukey, yes, saltrotate did (or should have done) nothing today
[13:31:30] lemme check
[13:31:51] it didn't run yesterday though
[13:31:58] (EU evening I mean)
[13:32:12] but yeah, better to triple check :D
[13:32:54] 10Analytics, 10Analytics-Kanban, 10Operations, 10Traffic, 10Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (10Ottomata) > I'm not sure I see the value in breaking up the broker name into broker_hostname, broker_id,...
[13:32:55] elukey, everything is good
[13:33:10] salt will only be rotated at the end of the quarter
[13:33:14] gooood
[13:33:37] mforns: so I can re-run it now to see that everything works?
[13:34:45] elukey: so likely those camus checker failure emails about those high volume EL topics were due to this consumer rebalance problem, yes?
[13:35:06] ottomata: not sure, it has been ongoing for a while, but it is the only weird thing that I found
[13:35:09] :(
[13:35:38] I tried some parameters today for kafka python but they were all making it worse
[13:35:46] or not changing much
[13:36:49] ok
[13:36:53] am reading this kafka-python bug
[13:39:14] elukey: you got that 1.4.3 .deb from buster?
[13:40:03] I say let's install it and see if it fixes the problem
[13:40:08] ottomata: nono I have simply used the gerrit repo, then did git reset --hard 1.4.3 on master and to your 1.4.3 release commit on debian
[13:40:11] then built
[13:40:20] oh
[13:40:38] but I wasn't sure if it was the correct source/procedure
[13:40:38] but buster has a backport for gilles' bug to 1.4.3?
[13:40:38] elukey, sure
[13:40:41] so didn't attempt
[13:41:04] elukey, can re-run, it should do nothing. Just print some logs
[13:41:05] OHHh
[13:41:05] sorry
[13:41:17] moritz was just saying that 1.4.3 is in buster
[13:42:31] ottomata: IIUC he proposed to open a bug with debian upstream if our package worked, no?
[13:43:10] yes, sorry
[13:43:12] misremembered
[13:43:13] just read
[13:43:15] better
[13:43:29] we could definitely grab the buster source, rebuild for stretch (even if no C parts are involved afaics), apply the patch and test
[13:43:31] the 1.4.3 that you put there is what we were running before
[13:43:42] nono, we have the package, we can do it too.
[13:43:48] hmm
[13:44:15] elukey: i'll try and backport this fix for gilles, make a .deb, then we can install on eventlog1002 and see if it fixes our problem
[13:44:29] if it does then we can upload to apt and go with this one?
[13:44:30] ottomata: 1.4.3 + the patch?
[13:44:33] yes
[13:44:36] +1
[13:44:42] k
[13:47:26] RECOVERY - Check the last execution of refinery-eventlogging-saltrotate on an-coord1001 is OK: OK: Status of the systemd unit refinery-eventlogging-saltrotate
[13:48:39] \o/
[13:48:47] oh elukey
[13:48:47] hm
[13:48:49] ?
[13:48:57] that PR that gilles wanted is already in 1.4.3
[13:49:14] OH nonono
[13:49:15] sorry
[13:49:19] i just branched wrong.
[13:58:35] ottomata: when you are ready to install gimme a ping so that I'll merge a change to remove the max_poll_records
[13:58:51] ok
[14:09:42] (03CR) 10Elukey: "> Given that it's not private data, analytics seems fine. Do we have" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/509016 (https://phabricator.wikimedia.org/T220971) (owner: 10Elukey)
[14:14:43] ok elukey am ready to install
[14:16:40] merging
[14:17:08] you can install, I'll merge + run puppet to refresh ok?
[14:17:17] ok
[14:17:44] !log downgrading python-kafka from 1.4.6-1~stretch1 to 1.4.3-2~wmf0 on eventlog1002 - T221848
[14:17:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:17:47] T221848: Upgrade python-kafka - https://phabricator.wikimedia.org/T221848
[14:17:49] done elukey
[14:20:57] done!
[14:21:56] ok elukey now we wait?
[14:22:43] ottomata: still seeing rebalances in the processor logs
[14:23:02] but those might have happened even before, so yes let's wait a bit
[14:23:35] I am watching https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=jumbo-eqiad&var-topic=All&var-consumer_group=eventlogging_consumer_client_side_events_log_00
[14:23:48] oh ya
[14:24:00] a lot fewer rebalances for sure afaics
[14:24:02] (from the logs)
[14:24:26] ha maybe the deadlock fix we just backported is the cause of this :p
[14:25:30] we need eventgate! :P
[14:27:44] (03CR) 10Fdans: [C: 03+1] "I like nice code :) Thank you for doing this Joseph, this looks and works way nicer. I think the solution of identifying the leaders is aw" (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/502858 (https://phabricator.wikimedia.org/T220111) (owner: 10Joal)
[14:33:17] ottomata: it seems it's not working :(
[14:40:03] ok elukey let's roll back to 1.4.3 without the backport
[14:40:04] and see
[14:40:25] oh to
[14:40:26] 1.4.1
[14:40:28] not 1.4.3, right?
[14:40:39] 1.4.3 IIRC
[14:41:01] hmmm it's just not in /var/cache/apt/archives as i'd expect
[14:41:02] hm
[14:42:26] from your commits on the repo it seemed 1.4.3
[14:46:38] unless we never installed it on eventlog1002?
[14:47:04] let's try 1.4.3 without the patch first
[14:47:09] elukey: that's the one in your homedir ya?
[14:51:02] ottomata: if you trust what I did yes :D
[14:51:08] oh you rebuilt it?
[14:51:09] hmm
[14:51:36] it is based on your commits on the repo so should be good
[14:51:43] ok i'm going to take the one that is on boron just in case
[14:51:47] that was built last year
[14:51:53] ack!
[14:52:22] I hope I didn't overwrite it
[14:52:45] ok, restarting eventlogging
[14:53:10] !log restarted eventlogging with python-kafka-1.4.3
[14:53:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:57:45] (03PS18) 10Fdans: Replace time range selector [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/499968 (https://phabricator.wikimedia.org/T219112)
[14:57:56] this might be the good one
[14:58:00] elukey: i still see heartbeats failing...
[14:58:58] ottomata let's see if they crash brutally or if they are kind of stable(ish)
[14:59:17] it must be that package, the regression matches 1:1 with your upgrade
[15:00:08] i'd assume so too....
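Given the back-and-forth above about which python-kafka build actually ends up on eventlog1002, a quick sanity check is to ask the interpreter which version the processors would import. The guard below is hypothetical, not something eventlogging itself does:

```python
import kafka

print(kafka.__version__)   # e.g. '1.4.3' after the downgrade

# A start-up guard like this (illustrative only) would make a silent
# upgrade/downgrade mismatch show up immediately in the processor logs.
EXPECTED = '1.4.3'
if kafka.__version__ != EXPECTED:
    raise RuntimeError('kafka-python %s installed, expected %s'
                       % (kafka.__version__, EXPECTED))
```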
[15:00:12] but ya let's watch the lag
[15:03:00] milimetric: ready when you are to merge the timeselector change
[15:06:03] mforns: we are investigating an issue with eventlogging that may be related to the refine failures
[15:06:14] elukey, ah
[15:06:44] that is https://phabricator.wikimedia.org/T222941
[15:06:45] elukey, are you seeing the same error?
[15:06:58] but it doesn't correlate with the occurrence of the alarm, since it started before
[15:07:43] elukey, I sent a response to the alarm thread (monitor_refine_eventlogging)
[15:08:07] (03CR) 10Ladsgroup: [C: 03+1] "> Patch Set 1:" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/509016 (https://phabricator.wikimedia.org/T220971) (owner: 10Elukey)
[15:08:14] mforns: yep saw it
[15:08:31] mforns: but there were also errors related to missing data, no?
[15:08:48] I think it's a deprecated lib? maybe? I've seen that javax.mail has been unmaintained since 2013 and it's kept for compat, but we should be using javax.mail-api
[15:09:04] elukey, yes, the monitor was about to send a failure report
[15:09:33] which is probably related to the issue you're mentioning
[15:09:37] ok, I get that
[15:12:04] mforns: the main problem is that the issue started ~10 days ago
[15:12:12] not sure why we got those alerts only now
[15:12:16] O.o
[15:12:25] ottomata: it seems stable now!
[15:12:37] ok
[15:12:56] elukey: i'm still seeing lots of spiky lag
[15:12:57] OH
[15:12:58] sorry
[15:13:01] my graph isn't updated to the latest
[15:13:37] hm so far stable but it hasn't been stable that long, want to wait a bit more
[15:13:58] ah yes but I am sure we got it
[15:16:42] the heartbeats still flap tho!
[15:17:04] yeah but once in a while was the previous behavior
[15:17:13] if it doesn't lag we are good(ish)
[15:17:51] hmm, was it also missing heartbeats every minutish before?
[15:21:15] elukey: what is your github username?
[15:21:32] elukey
[15:21:57] https://github.com/dpkp/kafka-python/issues/1418#issuecomment-491327542
[15:25:08] super thanks
[15:27:01] not sure how to handle multiple versions of the package now
[15:27:10] we need to chat with gilles about it
[15:34:57] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941 (10elukey) p:05High→03Normal Andrew deployed 1.4.3 and we are back to stable.
[15:39:23] ottomata: of course I write --^ and then lag occurs
[15:39:24] sigh
[15:39:43] let's see how it behaves over a period of some hours
[15:42:32] (but the impact is way lower now)
[15:52:49] (03CR) 10Milimetric: [V: 03+2 C: 03+2] "Good and good and good." (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/499968 (https://phabricator.wikimedia.org/T219112) (owner: 10Fdans)
[15:53:45] !log kill mediacounts-archive coordinator, chown analytics:analytics /wmf/data/archive/mediacounts + restart the coord with the analytics user
[15:53:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:56:11] 10Analytics, 10Analytics-Kanban, 10EventBus, 10serviceops, 10Services (watching): Change LVS port for eventlogging-analytics from 31192 to 33192 - https://phabricator.wikimedia.org/T222962 (10Ottomata)
[15:58:11] 10Analytics, 10Analytics-Kanban, 10EventBus, 10serviceops, 10Services (watching): Change LVS port for eventlogging-analytics from 31192 to 33192 - https://phabricator.wikimedia.org/T222962 (10Ottomata) Hm, question.
Currently mediawiki-config ProductionServices.php has: 'eventgate-analytics' => 'http...
[15:59:52] ping ottomata
[16:11:46] RECOVERY - Check the last execution of monitor_refine_eventlogging_analytics on an-coord1001 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_analytics
[16:32:30] 10Analytics, 10Performance: > 2% of API wall time spent generating UUIDs - https://phabricator.wikimedia.org/T222966 (10ori)
[16:32:48] 10Analytics, 10Performance: > 2% of API wall time spent generating UUIDs - https://phabricator.wikimedia.org/T222966 (10ori)
[16:40:29] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Replace current time range selector on Wikistats to allow for arbitrary time selections - https://phabricator.wikimedia.org/T219112 (10Milimetric) Three issues found while testing this on staging on my iphone: https://wikistats-canary.wmflabs.org/time-sel...
[16:40:50] fdans: listed the issues I found with the selector ^. If you're sick of it I can debug, up to you
[16:41:00] curious if you see them on your phone too
[16:51:23] * elukey off!
[17:09:02] team, need to pick up my mom at the (boat?)port, will be back in a bit
[18:54:38] 10Analytics, 10Puppet: modules/udp2log/manifests/instance/monitoring.pp has unreachable code - https://phabricator.wikimedia.org/T152104 (10Dzahn)
[18:55:17] 10Analytics, 10Puppet: modules/udp2log/manifests/instance/monitoring.pp has unreachable code - https://phabricator.wikimedia.org/T152104 (10Dzahn) It's been a couple years and this has been called obsolete for a long time but also it's not completely removed yet.
[18:56:32] 10Analytics, 10Operations, 10Wikimedia-Logstash, 10observability: Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (10Dzahn)
[18:56:47] 10Analytics, 10Operations, 10Wikimedia-Logstash, 10observability: Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (10Dzahn) also T152104
[19:31:27] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Replace current time range selector on Wikistats to allow for arbitrary time selections - https://phabricator.wikimedia.org/T219112 (10fdans) > when navigating to another area (Content/Contributing/Reading) from the Detail view, the time range control shri...
[23:34:26] THE TIMERANGE IS SO COOL fdans !!!!
[23:43:34] Thank youuu don't use the right handle yet tho, I already have a fix for it