[03:20:34] (PS5) Sahil505: [WIP] Added CSS custom properties using postcss [analytics/wikistats2] - https://gerrit.wikimedia.org/r/437387 (https://phabricator.wikimedia.org/T190915)
[08:15:38] hello people :)
[08:15:48] Hi Master Luca
[08:16:21] not sure Master of what but thanks :D :D :D
[08:16:40] elukey: Master of Servers
[08:16:41] :)
[08:17:17] elukey: I'm checking sqoop issue as said in the email
[08:17:24] elukey: It made me fou
[08:17:41] again -- elukey: It made me find a bug in our sqoop scripts
[08:17:54] ah nice!
[08:18:11] I didn't check the whole thing since I was trying to figure out a weird thing with burrow
[08:18:19] that led to fixing a bug in the logging config
[08:18:22] elukey: We have evolved the sqoop base script for sqooping, but we have forgotten that there was a second script to generate the jar (that has not yet evolved jointly)
[08:18:24] but I still have the original issue :P
[08:25:57] joal: let me know if you need help with anything
[08:26:06] elukey: will do :)
[08:28:50] I currently can't explain
[08:28:51] {"level":"info","ts":1528441314.9835448,"msg":"cluster or consumer not found","type":"module","coordinator":"evaluator","class":"caching","name":"default","cluster":"main-eqiad","consumer":"kafka-mirror-main-eqiad_to_main-codfw","showall":true}
[08:29:17] basically there are no lag metrics from burrow for the mirror maker that reads from main-eqiad and produces to main-codfw
[08:29:22] but other metrics are there
[08:29:26] and it seems to work fine
[08:51:59] I don't understand the log message actually
[08:53:44] in theory it seems like mirror maker is not committing any offset
[08:54:02] mehh
[08:54:05] ?
[08:54:38] Like it would not use the kafka-internal-mechanism for offsets, and use zookeeper as in old kafka versions?
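A minimal sketch (not part of the conversation), assuming Burrow writes one JSON object per log line as in the 08:28:51 message. It extracts the cluster and consumer group from a "cluster or consumer not found" entry, which may help when scanning a log file for groups Burrow is not tracking. The function name missing_consumer is hypothetical.

```python
import json

# Log line copied from the 08:28:51 message in this log.
LOG_LINE = (
    '{"level":"info","ts":1528441314.9835448,'
    '"msg":"cluster or consumer not found","type":"module",'
    '"coordinator":"evaluator","class":"caching","name":"default",'
    '"cluster":"main-eqiad",'
    '"consumer":"kafka-mirror-main-eqiad_to_main-codfw","showall":true}'
)


def missing_consumer(line):
    """Return (cluster, consumer) if the line reports a missing group, else None.

    Assumption: each log line is a single JSON object with the field names
    seen in the sample above; other layouts are simply ignored.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # not a JSON log line
    if not isinstance(record, dict):
        return None
    if record.get("msg") != "cluster or consumer not found":
        return None
    return record.get("cluster"), record.get("consumer")


if __name__ == "__main__":
    print(missing_consumer(LOG_LINE))
```

Run over a whole log file line by line, this would list which consumer groups the evaluator was asked about but could not find, which is the symptom discussed here for the main-eqiad to main-codfw mirror maker.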
[08:55:43] nono it would mean basically that the consumer is not consuming
[08:56:14] Ah - ok :)
[08:56:24] I understand better
[08:56:32] but thinking about it, might be something expected, we stopped mirroring sensitive topics due to the absence of TLS, not sure now what the remaining ones are
[08:56:41] need to review metrics and config in a better way
[08:56:45] k
[08:56:56] I should have followed those things a bit more closely
[08:58:14] (PS1) Joal: Correct sqoop-jar-generation script [analytics/refinery] - https://gerrit.wikimedia.org/r/438213
[08:58:17] PROBLEM - Kafka MirrorMaker main-eqiad_to_eqiad max lag in last 10 minutes on einsteinium is CRITICAL: 1.163e+06 gt 1e+05 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_eqiad
[08:58:49] I put the downtime on those..
[08:58:55] this is due to a fix in burrow's config
[08:59:01] others might come at this point
[09:00:18] basically https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=codfw%20prometheus%2Fops&var-lag_datasource=eqiad%20prometheus%2Fops&var-mirror_name=main-eqiad_to_main-codfw&refresh=5m&orgId=1
[09:00:30] is the one that I was talking about (main eqiad to codfw)
[09:00:34] that seems to work
[09:00:59] (in this case, on kafka200[1-3] mirror maker consumers from Kafka main eqiad and produces to main codfw)
[09:01:11] *consumes
[09:03:47] RECOVERY - Kafka MirrorMaker main-eqiad_to_eqiad max lag in last 10 minutes on einsteinium is OK: (C)1e+05 gt (W)1e+04 gt 0 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_eqiad
[10:03:37] joal: ah! found this one
[10:03:38] https://grafana.wikimedia.org/dashboard/db/kafka-consumer-lag?panelId=2&fullscreen&orgId=1&from=now-2d&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=main-eqiad&var-topic=All&var-consumer_group=kafka-mirror-main-eqiad_to_main-codfw
[10:04:04] elukey: well - This is like a real stop
[10:04:13] I already bounced the mirror makers though
[10:04:15] this morning
[10:04:15] mmmm
[10:10:59] so it matches, more or less, with the last restart of kafka
[10:11:11] that I think was due to the new produce timestamp settings
[10:11:38] so in theory, an issue with the producer could prevent the consumer from fetching
[10:12:44] but mirror main codfw to eqiad works
[10:14:27] it would help if some useful log was there
[10:16:25] and also, the consume/produce metrics are fine
[10:18:01] so, kafkacat -b localhost:9092 -t eqiad.resource_change on kafka2002 shows data
[10:18:12] that is a topic mirrored
[10:18:24] so I'd say that the issue is definitely on the Burrow side
[10:43:45] I'll keep going after lunch + errand :)
[10:43:47] * elukey lunch!
[11:07:28] (PS2) Joal: Correct sqoop-jar-generation script [analytics/refinery] - https://gerrit.wikimedia.org/r/438213
[11:12:59] (PS3) Joal: Correct sqoop-jar-generation script [analytics/refinery] - https://gerrit.wikimedia.org/r/438213
[11:45:29] !log Launching manual sqooping of revision and archive table to recover from failure
[11:45:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:54:08] Analytics-Kanban: Fix issue with prod/labs jars for sqoop - https://phabricator.wikimedia.org/T196737#4266932 (JAllemandou)
[11:54:41] Analytics-Kanban: Fix issue with prod/labs jars for sqoop - https://phabricator.wikimedia.org/T196737#4266932 (JAllemandou) p:Triage>High a:JAllemandou
[12:11:30] Analytics, Research: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207#4266965 (Tbayer) >>! In T138207#4261605, @Tbayer wrote: > Is this going to be carried forward into the [[https://www.mediawiki.org/wiki/Wikimedia_Technology/Annual_Plans/FY2019 | 2018-19 a...
[12:49:04] Analytics, Pageviews-API, Product-Analytics, Reading-analysis: Suddenly outrageous higher pageviews for main pages - https://phabricator.wikimedia.org/T141506#4266987 (Nemo_bis) Is this resolved? The English Wikipedia main page seems to be again within the 20M/d threshold: https://tools.wmflabs.o...
[13:43:52] I just restarted burrow to test logging and new alarms
[13:43:56] if anything fires it is me :)
[13:46:23] elukey: I know you're on fire - It's friday ;)
[13:52:30] I am still trying to debug one problem from this morning
[13:52:38] in the meantime I've fixed three little one
[13:52:40] *ones
[13:52:45] but it is still frustrating
[13:52:57] now I've set debug logging in Burrow
[13:53:08] and I can no longer see the log that I showed you this morning
[13:53:23] no trace of the mirror maker eqiad->codfw consumer
[14:05:36] oh yes! Alerts didn't fire \o/
[14:21:49] ah ok now I found the new logs, logrotation works as expected
[14:21:59] but no trace of the consumer group
[14:26:10] this is quite nice
[14:26:10] curl localhost:8100/v3/admin/loglevel -X POST -H "Content-Type: application/json" -d '{"level":"info"}'
[14:28:18] trying this https://github.com/linkedin/Burrow/wiki/http-request-remove-consumer-group
[15:15:35] so I confirmed with kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group kafka-mirror-main-eqiad_to_main-codfw that the offsets are committed
[15:15:38] (nice command)
[15:51:39] a-team: could i move standup 1 hour later today due to a conflict? (probably too late for elukey and otto and milimetric are out today) so it might be just joal fdans mforns and myself
[15:52:09] works for me
[15:52:56] works for me too
[15:53:41] a-team: super thanks
[15:55:23] so joal I have officially no idea what's happening with burrow
[15:55:28] :S
[15:55:31] it is not a big deal but it seems a bug
[15:55:41] elukey: will it wait until monday, or should we discuss this tonight?
[15:56:05] maybe Andrew did something to mirror maker that I don't know, but I checked and nothing really stands out
[15:56:13] nono it is fine, not urgent
[15:56:21] lag metrics not populated, but everything works
[15:57:10] nuria_: if it's just going to be 4 of us and it's friday 7pm I would vote for calling it off and sending e-scrums :)
[15:57:24] but I'll be here either way
[16:01:19] oh, just saw the standup change
[16:01:37] 1 hour later ok by me
[16:02:17] fdans: i know, i understand if you cannot make it, totally
[17:25:13] * elukey off!
[17:29:35] Analytics, Pageviews-API, Product-Analytics, Reading-analysis: Suddenly outrageous higher pageviews for main pages - https://phabricator.wikimedia.org/T141506#4267635 (Nuria) We do not plan to remove the real (if unintentional) spike of pageviews that hit our servers on 2016, is that what you mea...
[17:30:48] Quarry: Query Quarry's own database and tools one - https://phabricator.wikimedia.org/T151158#4267636 (Framawiki)
[17:32:28] hey a-team, are you guys haveing problems with Gerrit -> my changes?
[17:33:21] Quarry: Query Quarry's own database and tools one - https://phabricator.wikimedia.org/T151158#4267641 (Framawiki) Added tool' users database per user request at https://meta.wikimedia.org/wiki/Research_talk:Quarry#Tool_labs'_user_databases?.
[17:35:36] *having
[18:14:43] Analytics, Operations, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345#4267759 (RobH) a:elukey So, the difference between this request, and our current dual cpu misc spec, is we no longer put in 4 * 4TB disks (like stat1006), b...
[18:14:46] Analytics, Operations, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345#4267762 (RobH)
[18:16:51] Analytics, Operations, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345#4267765 (RobH) Please note I'll be away all next week, so if this needs quotation before I return, please chat with @Cmjohnson & @faidon.
[19:13:22] mforns: seems to be working for me
[19:13:27] mforns: still wrong for you?
[19:13:57] joal, yea... it says: 'is:wip' operator is not supported by change index version
[19:14:10] weeeeeird
[19:14:21] mforns: Have you tried logging off then back on?
[19:14:41] joal, I tried incognito mode and it happens as well
[19:15:28] mforns: When I add "is:wip" in search bar and search, it gives me the same message
[19:15:37] ha...
[19:15:58] However clicking on "My --> Changes" works for me (nothing in search bar)
[19:16:28] hmmmm
[19:17:10] joal, I can use reviewer:self and it works
[19:17:16] weeeirdddd
[19:17:17] k mforns
[19:17:22] weeeird indeed !!!
[19:17:25] thanks :]
[19:17:30] np :)
[19:17:50] mforns: Let's ask our opsy-masters next week :)
[19:17:56] yep
[22:51:39] Analytics: Confusing results in Turnilo - https://phabricator.wikimedia.org/T196785#4268396 (MMiller_WMF)