[07:15:00] Analytics, EventBus, Services (next), User-Elukey, Wikimedia-Incident: Clean up leftover topics - https://phabricator.wikimedia.org/T199510 (elukey) Just to be sure, here's the len of all the topics: ``` cat topics | while read line; do echo $line" "${#line}; done | sort -n -k 2 eqiad.change...
[07:18:05] Analytics, Operations, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345 (elukey) @RobH If there are no more blockers I'd proceed with the quote request (no rush, just wanted to avoid this task to stall).
[07:19:15] Analytics, Operations, hardware-requests, User-Elukey: eqiad | (3) Labs Data Lake hardware - https://phabricator.wikimedia.org/T199674 (elukey)
[07:19:17] Analytics, Operations, hardware-requests, User-Elukey: eqiad | (14 + 6) hadoop hardware refresh and expansion - https://phabricator.wikimedia.org/T199673 (elukey)
[07:46:33] Analytics, EventBus, Operations, Services: Set a proper max open files limit for Kafka clusters - https://phabricator.wikimedia.org/T200177 (elukey) p:Triage>High
[07:55:14] Analytics, EventBus, Services (watching): Remove `kafka-mirror` unit from main kafka cluster - https://phabricator.wikimedia.org/T199443 (elukey) So this `kafka-mirror` instance is defined as follows: ``` elukey@kafka1001:~$ sudo systemctl cat kafka-mirror # /lib/systemd/system/kafka-mirror.service...
[08:30:32] (CR) Joal: "Thanks for the method-names change, looks a lot better :)" (4 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600) (owner: Fdans)
[08:36:43] (PS1) Joal: Update pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/447381
[08:38:28] (CR) Joal: "I have provided a patch including this change with the date update (see https://gerrit.wikimedia.org/r/c/analytics/refinery/+/447381)" [analytics/refinery] - https://gerrit.wikimedia.org/r/446399 (https://phabricator.wikimedia.org/T188776) (owner: Reedy)
[09:07:59] (PS6) Fdans: Adds empty dir removal to hive partition dropping jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600)
[09:08:29] (CR) Fdans: Adds empty dir removal to hive partition dropping jobs (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600) (owner: Fdans)
[09:15:16] added two interesting graphs at the bottom of https://grafana.wikimedia.org/dashboard/db/kafka
[09:15:25] namely jvm's nio mapped/direct allocations
[09:15:34] very interesting to compare jumbo vs main-eqiad
[09:18:02] elukey: o/ -- I think I'll need help to understand the meaning :)
[09:22:17] joal: morning :)
[09:23:47] so IIUC direct memory areas are the ones that the jvm tries to request to the os and sure as natively as possible without using buffers etc.. I need to figure out how the jvm explicitly requests them, but kafka doesn't use them basically. The interesting part is memory mapped areas, since all the kafka log files are mapped in memory and we have a hard limit of 65k on our hosts now (linux's default)
[09:24:18] it seems that main-eqiad holds ~3k mmap areas per host, but the total size is ~15G (per host)
[09:24:41] jumbo is different - a lot more mmap areas (5x main eqiad) but same size (more or less)
[09:26:07] log files are biffer on main-eqiad afaics in /srv/kafka/data
[09:27:43] *bigger
[09:28:32] so basically everything is fine now but it is better to keep an eye on those
[09:28:37] probably even adding alarming
[09:29:14] joal: for example, this is during the outage
[09:29:15] https://grafana.wikimedia.org/dashboard/db/kafka?panelId=50&fullscreen&orgId=1&from=1531232107246&to=1531445400349
[09:29:52] wow --^
[09:31:24] yeah
[09:41:42] (CR) Joal: "Comments inline, thanks Francisco :)" (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600) (owner: Fdans)
[09:46:26] (CR) Fdans: "Thank you for the reviews Joseph :D" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600) (owner: Fdans)
[09:57:31] the funny thing is also that there is no clear way, afaics, to limit the maximum number of topics in kafka
[09:59:03] Analytics, ChangeProp, EventBus, WMF-JobQueue, Services (designing): Consider disabling automatic topic creation in main-kafka - https://phabricator.wikimedia.org/T199432 (elukey) >>! In T199432#4426780, @fgiunchedi wrote: > I think a good balance between safety and ease of use would be if ka...
[10:32:24] * elukey lunch + errand!
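Editor's note on the mmap discussion above: a minimal sketch of how the number of memory-mapped areas held by a process could be compared with the per-process limit. It assumes a Linux host, where /proc/<pid>/maps lists one mapping per line and vm.max_map_count defaults to 65530 (the "65k" mentioned in the chat). The function names and the 0.8 warning threshold are hypothetical and are not taken from any existing alert or tool.

```python
from pathlib import Path

# Linux default for vm.max_map_count; the chat above rounds it to "65k".
DEFAULT_MAX_MAP_COUNT = 65530


def count_mappings(maps_text: str) -> int:
    """Count mapped areas: /proc/<pid>/maps has one non-empty line per area."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def usage_ratio(mapping_count: int, limit: int = DEFAULT_MAX_MAP_COUNT) -> float:
    """Fraction of the per-process mapping limit already in use."""
    return mapping_count / limit


def read_limit(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the live limit, falling back to the default when unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return DEFAULT_MAX_MAP_COUNT


def check_pid(pid: int, warn_at: float = 0.8) -> str:
    """Summarise mapping usage for one process (needs read access to /proc)."""
    count = count_mappings(Path(f"/proc/{pid}/maps").read_text())
    limit = read_limit()
    ratio = usage_ratio(count, limit)
    status = "WARN" if ratio >= warn_at else "OK"  # threshold is an assumption
    return f"{status}: pid {pid} uses {count}/{limit} mmap areas ({ratio:.1%})"
```

Calling check_pid with a broker's pid (to be looked up on the host; none is given in the log) would print a one-line summary. Taking the figures quoted above at face value, roughly 3k areas on main-eqiad and about five times that on jumbo would correspond to about 5% and 23% of the default limit.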
[11:26:26] (CR) Joal: "Probably the last round :)" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600) (owner: Fdans)
[13:24:36] Analytics, ChangeProp, EventBus, WMF-JobQueue, Services (designing): Consider disabling automatic topic creation in main-kafka - https://phabricator.wikimedia.org/T199432 (fgiunchedi) >>! In T199432#4444762, @elukey wrote: >>>! In T199432#4426780, @fgiunchedi wrote: >> I think a good balance...
[13:26:23] elukey: Hi again
[13:26:46] elukey: Most of our coworkers will be missing from today standup - Do you mind if we move if forward?
[13:28:23] joal: I think we can skip, not a big deal!
[13:28:32] elukey: +1 :)
[13:29:18] elukey: It makes my life easier to be able to care the kids at the time - I'll be working late though :)
[13:29:50] sure! let's resync tomorrow
[13:30:15] elukey: works for me
[13:30:27] elukey: hopefully mforns will see this thread
[13:30:35] elukey: I'll send him an email :)
[13:30:46] Actually, will send an email to internal
[16:38:38] (PS7) Fdans: Adds empty dir removal to hive partition dropping jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600)
[16:39:38] Analytics, Discovery-Search (Current work): Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (EBernhardson)
[16:39:52] (CR) Fdans: Adds empty dir removal to hive partition dropping jobs (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600) (owner: Fdans)
[16:40:52] Analytics, Discovery-Search (Current work): Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (EBernhardson)
[16:40:59] joal: sending this patch flying somewhere over the coast of Somalia :)
[16:41:49] NeilPatelQuinn[m is in this flight too and was so nice to let me borrow his mac charger
[16:48:38] * joal loves exotic patches :)
[17:05:15] (CR) Joal: [C: 1] "LGTM, let's have somebody else opnion :)" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600) (owner: Fdans)
[17:37:38] Analytics, EventBus, Operations, Services (watching), and 2 others: Document the process for hard-deleting topics in kafka - https://phabricator.wikimedia.org/T199441 (elukey) Just created https://wikitech.wikimedia.org/wiki/Kafka/Administration#Delete_a_topic, should be enough!
[18:04:23] Analytics, Contributors-Analysis, Product-Analytics: Decommision edit analysis dashboard - https://phabricator.wikimedia.org/T199340 (Milimetric)
[18:06:38] Analytics, EventBus, Services (next), User-Elukey, Wikimedia-Incident: Clean up leftover topics - https://phabricator.wikimedia.org/T199510 (elukey) Open>Resolved a:elukey All topics deleted!
[18:07:40] Analytics, EventBus, Operations, Services (watching), and 2 others: Document the process for hard-deleting topics in kafka - https://phabricator.wikimedia.org/T199441 (elukey) Open>Resolved
[18:07:53] Analytics: Generate pagecounts-ez data back to 2008 - https://phabricator.wikimedia.org/T188041 (Milimetric) That works, let me know if you need to take them down before I get to copy them, and I'll try to squeeze it in.
[18:08:43] Analytics, EventBus, Services (watching): Remove `kafka-mirror` unit from main kafka cluster - https://phabricator.wikimedia.org/T199443 (elukey) Open>Resolved
[18:11:29] * elukey off!
[18:42:24] Analytics: Generate pagecounts-ez data back to 2008 - https://phabricator.wikimedia.org/T188041 (CristianCantoro) >>! In T188041#4445976, @Milimetric wrote: > That works, let me know if you need to take them down before I get to copy them, and I'll try to squeeze it in. I'm in no particular hurry, nor I hav...
[20:25:20] Analytics, Operations, decommission: Decommission stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T173097 (RobH)
[20:28:05] Analytics, EventBus, Services (done), User-Elukey, Wikimedia-Incident: Clean up leftover topics - https://phabricator.wikimedia.org/T199510 (mobrovac)
[20:28:31] Analytics, Operations, decommission: Decommission stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T173097 (RobH) a:RobH>Cmjohnson
[20:28:57] Analytics, Operations, decommission, ops-eqiad: Decommission stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T173097 (RobH)
[21:37:07] Analytics, Discovery, Wikidata, Wikidata-Query-Service: Data request for logs from SparQL interface at query.wikidata.org - https://phabricator.wikimedia.org/T143819 (Andrawaag)