[00:14:04] (PS1) Stibba: Add Execution time to query table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[00:16:20] (PS2) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[00:17:51] Analytics, Anti-Harassment: Add ipblocks_restrictions table to Data Lake - https://phabricator.wikimedia.org/T209549 (Nuria) FYI that mediawiki_ipblocks is sqooped monthly.
[00:19:04] (PS3) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[02:02:05] Analytics-EventLogging, Analytics-Kanban: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive - https://phabricator.wikimedia.org/T209503 (mforns) @Tbayer > Just to double-check: The information in the documentation that "Sanitization happens right aft...
[03:56:25] (CR) Zhuyifei1999: [C: -1] "Also make sure KeyError won't break the page" (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[06:53:07] Analytics, DC-Ops, Operations, ops-eqiad: Degraded RAID on analytics1039 - https://phabricator.wikimedia.org/T208706 (elukey) This host needs to be decommed relatively soon, there are others that showed this kind of behavior in the range 28-42.
[07:13:45] PROBLEM - Check the last execution of refinery-druid-drop-public-snapshots on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit refinery-druid-drop-public-snapshots
[07:14:32] buuuuu
[07:15:23] system.exit(0)
[07:15:28] NameError: name 'system' is not defined
[07:15:30] whattt
[07:16:42] ah makes sense, we are in the case that no snapshots need to be removed, and it does system.exit(0)
[07:16:49] but of course we don't import it
[07:16:51] fixing
[07:21:11] (PS1) Elukey: refinery-druid-drop-public-snapshots: s/system/sys [analytics/refinery] - https://gerrit.wikimedia.org/r/473674
[07:22:17] (CR) Elukey: [V: +2 C: +2] refinery-druid-drop-public-snapshots: s/system/sys [analytics/refinery] - https://gerrit.wikimedia.org/r/473674 (owner: Elukey)
[07:23:53] RECOVERY - Check the last execution of refinery-druid-drop-public-snapshots on an-coord1001 is OK: OK: Status of the systemd unit refinery-druid-drop-public-snapshots
[07:52:48] so another broken disk, this time on an1039
[07:52:58] the batch of hosts to be decommed is complaining :)
[07:53:29] what I did was apply an override in puppet to avoid the use of the /var/lib/hadoop/data/d dir, since it was mounted as read only (disk completely kaput)
[07:53:38] (or almost completely)
[07:54:38] Analytics, DC-Ops, Operations, ops-eqiad, Patch-For-Review: Degraded RAID on analytics1039 - https://phabricator.wikimedia.org/T208706 (elukey) Open>Resolved a: elukey The host will be decommed so no point in getting a new disk :)
[08:28:19] Analytics, Operations, ops-eqiad, Patch-For-Review, User-Elukey: rack/setup/install an-worker10[78-96].eqiad.wmnet - https://phabricator.wikimedia.org/T207192 (elukey) a: elukey>Cmjohnson
[08:30:07] Thanks elukey for caring for those old beasts of ours. We still need them to get to the checkpoint with new hosts :)
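(Aside on the 07:13-07:23 incident above: a minimal sketch of how a failed unit like this is typically inspected on an-coord1001, plus a quick check of the corrected call. These are standard systemd/Python commands, not necessarily the exact ones that were run.)
  # see why the unit is flagged CRITICAL and read the Python traceback (the NameError above)
  sudo systemctl status refinery-druid-drop-public-snapshots
  sudo journalctl -u refinery-druid-drop-public-snapshots --since today
  # the fix is s/system/sys: sys.exit(0) works once the sys module is imported
  python3 -c 'import sys; sys.exit(0)'; echo "exit code: $?"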
[08:30:39] And thanks as well for the patch on refinery-druid-drop-public-snapshots
[08:30:41] joal: good morning!
[08:30:51] Good morning to you elukey :)
[08:31:05] the new beasts have 10G NICs, looking forward to having them working :)
[08:31:54] Wow, they'll be flying over the network - but our network administrators will hate us ;)
[08:36:24] I feel I need some brass to start the day -- https://www.youtube.com/watch?v=gtSjPUkDVT8
[08:45:51] :D
[09:04:33] Personal message for milimetric - I think I know why the sqoop of logging_compat has failures !
[09:05:22] also elukey: the hdfs-wikitet-importer works like a charm as far as I can see :)
[09:06:50] nice!
[09:27:45] Analytics, Analytics-Kanban: Refactor Mediawiki-Database ingestion - https://phabricator.wikimedia.org/T209178 (JAllemandou)
[09:27:53] elukey: do you mind proof-reading this --^ ?
[09:33:59] Analytics, Analytics-Kanban: Refactor Mediawiki-Database ingestion - https://phabricator.wikimedia.org/T209178 (elukey)
[09:34:48] Analytics, Analytics-Kanban: Refactor Mediawiki-Database ingestion - https://phabricator.wikimedia.org/T209178 (elukey)
[09:35:31] added some links just for clarity :)
[09:35:56] is the plan to switch reading from labsdb to say dbstore?
[09:35:59] elukey: understandable I imagine then :)
[09:37:07] the only thing that is not clear is "The main difficulty is that, since data will be extracted from production-slaves"
[09:37:21] I can see people asking what slaves you have in mind
[09:37:29] Right, will that super-explicit
[09:37:35] +make
[09:38:57] Analytics, Analytics-Kanban: Refactor Mediawiki-Database ingestion - https://phabricator.wikimedia.org/T209178 (JAllemandou)
[09:38:58] Updated
[09:43:12] +1
[10:05:20] Analytics, Operations, Wikimedia-Logstash, Core Platform Team Backlog (Watching / External), and 2 others: Review and make librdkafka-0.11.6 installable from stretch-wikimedia - https://phabricator.wikimedia.org/T209300 (mobrovac) Mental note for @Ottomata @Pchelolo and myself: the current node-r...
[10:13:20] (PS1) Fdans: Add offset and underestimate to uniques table schema [analytics/aqs] - https://gerrit.wikimedia.org/r/473708 (https://phabricator.wikimedia.org/T167539)
[10:14:09] hellooo joal, let's merge this so that we can test load?
[10:15:05] fdans: needs to be merged to be applied to beta, right?
[10:15:20] And, hellLLLoOOO fdans :)
[10:29:59] joal: sorry, ignore me for a while, I got my orders confused :)
[10:30:36] I won't bug you as asked, but I'll never ignore you fdans :)
[10:52:05] joal: I think I'm missing something... if we change the table schema in aqs, restarting aqs should apply the change to cassandra right?
[11:36:11] * elukey lunch!
[12:09:34] hmmm elukey I can't access deployment-tin.deployment-prep.eqiad.wmflabs, does it work for you?
[13:16:46] Analytics, Analytics-Wikistats: links tot wikistats pages in stats.wikimedia.org/index.html fail, show en.wikipeda.org main page instead - https://phabricator.wikimedia.org/T209581 (ezachte)
[13:17:39] Analytics, Analytics-Wikistats: links tot wikistats pages in stats.wikimedia.org/index.html fail, show en.wikipeda.org main page instead - https://phabricator.wikimedia.org/T209581 (ezachte) Open>Resolved a: ezachte I edited index.html and replaced en.wikipedia.org/wikistats/... with stats.wik...
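(On fdans's question at 10:52 about whether restarting AQS applies a schema change to Cassandra: one way to see what actually changed is to dump the table definition from cqlsh before and after the restart. The host and keyspace names below are placeholders, not the real AQS values.)
  cqlsh <cassandra-host> -e 'DESCRIBE TABLE "<uniques_keyspace>".data;'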
[13:22:13] hey team :]
[13:28:26] elukey, I noticed that moving our 1x1 last week didn't affect all other events, I shifter our 1x1 today, and did it from the edit form, so that all subsequent events get shifter
[13:28:30] *shifted
[13:44:30] Analytics, Maps, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Review Grafana Dashboards for search / wdqs / maps - https://phabricator.wikimedia.org/T209584 (Mathew.onipe)
[13:44:47] Analytics, Maps, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Review Grafana Dashboards for search / wdqs / maps - https://phabricator.wikimedia.org/T209584 (Mathew.onipe) p: Triage>Normal
[13:52:18] mforns: ah ok! Can we move it half an hour earlier?
[13:54:10] because otherwise it clashes with another meeting :(
[13:55:32] elukey: 👀🙌🏼❤️
[13:56:11] ah snap I saw only now the notification!
[13:56:34] deployment-tin is dead, deployment-deploy01 is the correct one
[13:56:37] fdans: --^
[13:56:45] ohhhh
[13:56:50] updating dogs
[13:56:55] i mean docs
[13:56:59] thank you elukey
[14:00:13] anybody checked mediacounts ?
[14:01:37] elukey: what's to be checked?
[14:02:48] Fatal Error - Oozie Job mediacounts-load-wf-2018-11-15-8
[14:03:01] but it was 4h ago
[14:03:23] checking
[14:04:34] (PS4) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[14:07:08] ahhhh it was on an1039
[14:07:16] that had troubles this morning
[14:07:21] all right I need to restart it
[14:07:51] !log re-run mediacounts-load-wf-2018-11-15-8 - died due to issues on an1039 (happened this morning, broken disk)
[14:07:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:08:20] omg scap working in beta aqs with a non-master HEAD
[14:08:43] * fdans it's an early christmas miracle
[14:09:51] (PS5) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[14:11:33] (PS6) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[14:15:13] (PS7) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[14:35:31] (PS1) Fdans: [wip] Add offset and underestimate to uniques loading job [analytics/refinery] - https://gerrit.wikimedia.org/r/473746 (https://phabricator.wikimedia.org/T167539)
[14:41:34] elukey, sorry no headphones, yes you can move the meeting whenever is good for you
[14:42:01] (PS2) Fdans: [wip] Add offset and underestimate to uniques loading job [analytics/refinery] - https://gerrit.wikimedia.org/r/473746 (https://phabricator.wikimedia.org/T167539)
[14:43:11] mforns: do you want to reschedule to tomorrow? (for the headphones)
[14:43:37] elukey, no no! I was saying that I didn't get your ping, because my headphones were off
[14:43:45] ahhh
[14:44:22] ok I changed the meeting!
[14:44:58] elukey, hmmm, don't see any change...
[14:45:56] I moved it to 15 mins from now
[14:50:57] Analytics, Analytics-EventLogging, EventBus, Services (watching): Modern Event Platform: Stream Intake Service: Implementation - https://phabricator.wikimedia.org/T206785 (Ottomata) We need a name! Have been brainstorming over on https://etherpad.wikimedia.org/p/event-platform. The current thre...
[14:51:48] !log testing load of new uniques fields in test keyspace in cassandra
[14:51:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:53:39] mforns: https://twitter.com/jimmy_wales/status/1063081673666060288
[14:53:46] this guy shows up at every single event
[14:53:56] let's see
[14:54:07] Analytics, Analytics-Kanban, Research: Cannot connect to Spark with Jupyter notebook on stat1007 - https://phabricator.wikimedia.org/T208896 (bmansurov) Just a follow up that Python kernels won't start on stat1007 for some reason. I've given up trying to fix the issue for now.
[14:54:44] fdans, heh, let's see if he doesn't mess up like Woody Harrelson!
[14:55:50] in one of the first games, he was moving for Fabi, who asked for e4, and Harrelson moved d4! (in the end the refs let them go back and play e4)
[14:57:01] Analytics, Multimedia: Add mediacounts to pageview API - https://phabricator.wikimedia.org/T88775 (BerndFiedlerWMDE) this might just solve a problem we're having over here in DE. There are great content providers hesitating to contribute to wikimedia commons for they cannot implement reporting. Let's fix...
[14:58:49] (CR) Addshore: "So we still want this? If so I'm going to try to add it to my review queue." [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/440133 (https://phabricator.wikimedia.org/T196883) (owner: WMDE-leszek)
[14:58:52] (CR) Addshore: "So we still want this? If so I'm going to try to add it to my review queue." [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/440876 (https://phabricator.wikimedia.org/T196883) (owner: WMDE-leszek)
[15:19:52] joal: load test successful! :D
[15:20:15] checked numbers matching against the dumps
[15:21:13] awesome fdans :)
[15:22:08] joal: if you want we can merge and deploy the aqs patch in order to add the two blank fields in production
[15:22:57] fdans: I assume you've tested AQS behavior on old data with the 2 blank fields
[15:23:00] ?
[15:24:19] hmmm, I've tested it with an empty keyspace, let me test with a keyspace with data on beta to make sure
[15:24:35] (joal ^)
[15:24:36] fdans: it would be bad if it doesn't work :)
[15:24:52] fdans: out of curiosity, have you used the cluster in beta to test loading?
[15:25:43] joal: nope, in beta i only tested the fields creation
[15:26:06] Right - Then you have followed the steps we discussed yesterday, right?
[15:26:20] yessir
[15:26:26] Great :)
[15:28:25] I have a question on commons and other wiki projects - When a file is played from 'upload.wikimedia.org', is it on commons or could it be somewhere else?
[15:36:40] joal: hmmm re: new fields I'm pretty sure that aqs will not add fields automatically - it will only create missing tables
[15:36:55] i think we have to add the fields manually
[15:37:07] hm - this feels bizarre
[15:39:47] joal: tested both adding the field in aqs and restarting the service (which changed nothing) and just dropping the keyspace and restarting (which does create the keyspace with the new field)
[15:40:26] hm - fdans - May I ask another test?
[15:40:33] always joal
[15:41:05] fdans: I'd like to see if trying to manually add data using AQS (as we do for tests for instance) makes aqs update the fields
[15:41:15] Also fdans - You have changed the version number, right?
[15:41:37] 👀
[15:41:41] * fdans no
[15:42:33] This might come into play :)
[15:53:55] Analytics, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Ingest data into druid for readingDepth schema - https://phabricator.wikimedia.org/T205562 (Tbayer) Resolved>Open >>! In T205562#4634049, @mforns wrote: > @Tbayer @Nuria > > I loaded 1 month (Sept 2018) of Re...
[15:58:59] Analytics, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Ingest data aggregate ReadingDepth data into Druid - https://phabricator.wikimedia.org/T205562 (Tbayer)
[15:59:29] Analytics, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Ingest data aggregate ReadingDepth data into Druid - https://phabricator.wikimedia.org/T205562 (Tbayer)
[16:06:28] joal: ok after a round of really frustrating testing, I think that when bumping the version, aqs will add the fields only when there are no rows in the table
[16:07:28] Really ?
[16:07:33] Wow that's uncool !
[16:08:11] fdans: Could it be that adding fields to an existing table takes a very long time?
[16:09:05] joal: hmmm I mean the table I'm testing it with has exactly one row...
[16:09:18] joal: I could try and see how long an ALTER takes
[16:09:35] nah
[16:09:42] one row is not exactly big :)
[16:09:52] Crappy crap !
[16:10:08] We could and possibly should ask the services team
[16:11:10] why is it a problem to add it manually joal?
[16:11:45] fdans: because schema and version and all is managed by AQS, validating through schema hash, etc.
[16:22:25] joal: ok... what is it that we want to ask the services team?
[16:23:02] fdans: if AQS not updating the table with new fields is an expected behavior :)
[16:41:07] joal: Pchelolo agrees that this is not what should be happening
[16:41:20] hmm, where is the logic that applies these schema changes?
[16:42:01] fdans: sooo... https://github.com/wikimedia/restbase-mod-table-cassandra/blob/master/lib/schemaMigration.js#L18
[16:42:24] but we actually have changed quite a bit there, which version of the package are you using?
[16:43:20] I'm pretty sure we upgraded not long ago..
[16:43:23] lemme check
[16:44:34] Analytics, Analytics-Kanban: AQS unique devices api should report offset/underestimate separately - https://phabricator.wikimedia.org/T164201 (Nuria) p: Low>Normal
[16:46:35] Analytics, Analytics-Kanban: AQS unique devices api should report offset/underestimate separately - https://phabricator.wikimedia.org/T164201 (Nuria)
[16:46:38] Pchelolo: ohhh I deployed to beta with an old version, this might have to do with it, thanks for the tip
[16:46:38] Analytics, Analytics-Kanban, Patch-For-Review: Final steps to expose project family unique devices data - https://phabricator.wikimedia.org/T167539 (Nuria)
[16:47:39] fdans: it might still not work with the new version, if so - ping me
[16:47:50] thank you :)
[16:49:19] Pchelolo: o/
[16:49:26] hi elukey
[16:49:37] about https://phabricator.wikimedia.org/T207994 - is 'once an hour' always at the same time?
[16:50:11] if you want I can start tcpdump on all kafka20* with some pcap log rotation etc..
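(Back to the Cassandra schema discussion above: the manual change fdans mentions at 15:37 and 16:09 would look roughly like this from cqlsh. The host, keyspace, "data" table name and column types are assumptions for illustration, not the actual AQS definitions.)
  cqlsh <cassandra-host> -e 'ALTER TABLE "<uniques_keyspace>".data ADD offset bigint;'
  cqlsh <cassandra-host> -e 'ALTER TABLE "<uniques_keyspace>".data ADD underestimate bigint;'
The open question in the chat is whether such a manual ALTER conflicts with the schema hash that AQS / restbase-mod-table-cassandra keeps, which is why the services team was consulted.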
[16:50:35] elukey: no, not at the same time
[16:50:40] super randomly
[16:50:42] so we store only X amount of data maximum, and then we stop it when you have another repro case
[16:50:47] and we analyze the pcaps
[16:51:00] not sure if you already had a chat with other SREs for this
[16:51:07] in case I'll leave the work to more expert people
[16:51:13] but if not, I can offer my help :)
[16:53:34] elukey: thank you :) but I have a meeting in 5 mins
[16:53:42] let me know in case :)
[16:53:45] it's not gonna be a long one, will ping you after it ok?
[16:53:50] sure
[17:01:26] ping fdans
[17:20:16] Analytics, Analytics-EventLogging, Performance-Team (Radar), Readers-Web-Backlog (Tracking): Make it easier to enable EventLogging's debug mode - https://phabricator.wikimedia.org/T188640 (Nuria)
[17:20:47] Analytics, Analytics-EventLogging, Performance-Team (Radar), Readers-Web-Backlog (Tracking), cloud-services-team (Kanban): Make it easier to enable EventLogging's debug mode - https://phabricator.wikimedia.org/T188640 (Nuria)
[17:21:10] Analytics, Analytics-EventLogging, Analytics-Kanban, Performance-Team (Radar), Readers-Web-Backlog (Tracking): Make it easier to enable EventLogging's debug mode - https://phabricator.wikimedia.org/T188640 (Nuria)
[17:23:53] Analytics, Anti-Harassment (AHT Sprint 33): 👩‍👧 Track how often blocked user attempt to edit - https://phabricator.wikimedia.org/T189724 (dbarratt)
[17:37:55] elukey: so I'm done with meetings, but I really have 0 time to work on that thing right now
[17:39:24] I guess running a tcpdump for eventbus on one of the codfw nodes and writing the source ip into a file for a day might be risky cause the file will grow pretty quickly
[17:40:18] If you don't mind I will have time later today when you sleep to think about all the IPs that should write into eventbus in codfw and make a list for you to exclude from writing into a file
[17:40:33] then we can make some script to run
[17:41:12] actually, the known IPs are pretty simple - it's all restbase and all scb hosts in codfw
[17:51:33] Pchelolo: (in a meeting, should be ready in ~40 mins)
[17:55:03] Analytics: XML dumps imports into hadoop need to adapt to latest changes done in xml schema for MCR - https://phabricator.wikimedia.org/T209545 (fdans) p: Triage>Normal
[17:56:24] Analytics, Analytics-Kanban: Hive query fails with local join - https://phabricator.wikimedia.org/T209536 (fdans) a: JAllemandou
[17:59:33] Analytics, Maps, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Review Grafana Dashboards for search / wdqs / maps - https://phabricator.wikimedia.org/T209584 (fdans) p: Normal>Triage
[17:59:38] Analytics, Maps, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Review Grafana Dashboards for search / wdqs / maps - https://phabricator.wikimedia.org/T209584 (fdans) p: Triage>Normal
[18:00:01] Analytics, Analytics-Kanban: Hive query fails with local join - https://phabricator.wikimedia.org/T209536 (JAllemandou) Ping @elukey, @Ottomata : I think we should apply `hive.auto.convert.join = false` in hive-site.xml so that map-side joins are never done automatically.
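(Regarding joal's suggestion in T209536 just above: until the default is changed in hive-site.xml, the same setting can be disabled per session as a workaround. A sketch, with the failing join query left as a placeholder:)
  hive -e "SET hive.auto.convert.join=false; <your join query here>;"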
[18:06:16] Analytics, Analytics-EventLogging: Refine: Use Spark SQL instead of Hive JDBC - https://phabricator.wikimedia.org/T209453 (fdans) p: Triage>Normal
[18:07:28] Pchelolo: about the size - I can force tcpdump to create a fixed amount of pcap files of size X
[18:07:51] so say we state that we want max 10G allocated at any given time, we can do it (disk allocation I mean)
[18:08:02] I did the same for the objectcache
[18:08:27] Analytics, Anti-Harassment: Add ipblocks_restrictions table to Data Lake - https://phabricator.wikimedia.org/T209549 (Milimetric) @TBolliger thanks for paying such close attention to our work! As part of that task, we've decided to just import all tables from the production replicas. I'm adapting our s...
[18:08:29] the main problem is that it will be a "moving window" of pcap traffic, so we'll need to stop it as soon as we find the weird log
[18:08:30] kafka machines have a lot of disk, so I guess that will work
[18:09:33] Pchelolo: ah even better, we can use httpry!
[18:09:41] that logs http requests only
[18:09:46] so we remove a ton of garbage
[18:10:16] ah but it doesn't auto-rotate pcaps
[18:10:23] we mostly need the source IP
[18:10:57] yeah
[18:11:08] I guess it's too late for you now cause we'd need to run it for quite a while before being able to do any analysis
[18:11:10] well we can do some tests tomorrow if you want
[18:11:42] if you could start it your morning tomorrow I guess when I wake up we will have enough data to look at
[18:11:58] is there an easy way for me to find the anomalies when they are enqueued in kafka? Like a signature etc.. so I can start tomorrow eu morning
[18:12:16] and stop if I find anything
[18:12:39] but I guess that anything not restbase or mw eqiad will be one possible lead
[18:12:45] anomalies in kafka? all the messages in revision-create topic in codfw are anomalies
[18:12:57] ah okok
[18:13:33] as for the IPs - anything not restbase or scb
[18:13:35] so I can tail the topic, and when I see a new even I stop capturing traffic
[18:13:44] *event
[18:13:47] elukey: ye, makes sense
[18:13:51] ack
[18:14:18] cool, thanks for helping with this
[18:14:20] I am going to update the task with the plan
[18:14:37] got nerd sniped by Alex
[18:14:39] :D
[18:28:59] Analytics, Operations, ops-eqiad, User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (RobH) p: Triage>High
[18:30:14] Analytics, Operations, ops-eqiad, User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (RobH) @elukey: Can you confirm the racking and hostname details? Hostname Proposal: dbstore100[3-5].eqiad.wmnet, since this is a dbstore1002 replacement or...
[18:38:22] Analytics-EventLogging, Analytics-Kanban: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive - https://phabricator.wikimedia.org/T209503 (Neil_P._Quinn_WMF) >>! In T209503#4748908, @mforns wrote: > @Tbayer >> Just to double-check: The information in t...
[18:38:47] Analytics, EventBus, Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Later), Services (later): revision-create events are sometimes emitted in a secondary DC - https://phabricator.wikimedia.org/T207994 (elukey) I had a chat with Petr a...
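(For reference, the rotating capture elukey describes at 18:07-18:08 maps onto tcpdump's ring-buffer options. The interface, port filter, broker hostname and topic name below are placeholders; the ~10G cap matches the figure mentioned in the chat.)
  # on a kafka20xx broker: keep at most 10 files of ~1000 MB each (~10G total), overwriting the oldest
  sudo tcpdump -i eth0 -s 0 -w /srv/eventbus-capture.pcap -C 1000 -W 10 'tcp port <eventbus-port>'
  # tail the codfw revision-create topic; every message there is an anomaly, so stop the capture when one appears
  kafkacat -C -b <codfw-kafka-broker> -t <codfw-revision-create-topic>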
[18:39:02] Analytics, EventBus, Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Later), and 2 others: revision-create events are sometimes emitted in a secondary DC - https://phabricator.wikimedia.org/T207994 (elukey)
[18:51:07] (CR) Zhuyifei1999: "@Framawiki, what do you think about adding a @property to decode the json in models/queryrun.py?" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[18:56:12] elukey, let me know if you want to pair for deletion, we can do that tomorrow if you're leaving
[19:34:38] hey a-team: I'm trying to use kafkacat to monitor messages for a schema we just deployed, but don't seem to be able to consume anything
[19:35:41] Nettrom: what schema?
[19:35:48] milimetric: EditorJourney
[19:35:55] checking
[19:36:23] Grafana shows some events, but I was hoping to use kafkacat to see what's happening since we're trying to debug some issues
[19:39:30] Nettrom: the client validation update went live today, do you know how to enable it now and see whether there are validation errors?
[19:39:50] (grafana doesn't show any errors, so I don't think this is the problem, but just checking since that just went live)
[19:40:14] milimetric: no, I'm not familiar with that
[19:40:35] Nettrom: do this:
[19:40:36] https://www.irccloud.com/pastebin/ST2ObGCy/
[19:40:44] in the JS console in your browser
[19:41:00] milimetric: the code is logging server side
[19:41:08] ah ok, never mind then :)
[19:44:21] Nettrom: send me your kafkacat command, and in the meantime here's another way to get data:
[19:44:22] hdfs dfs -text /wmf/data/raw/eventlogging/eventlogging_EditorJourney/hourly/2018/11/15/19/eventlogging_EditorJourney.1005.0.3.390.1542308400000
[19:44:37] (from stat1004 or 1007)
[19:44:48] milimetric: kafkacat -C -b kafka1012 -t eventlogging_EditorJourney
[19:45:37] which I'm running on stat1007
[19:51:15] Analytics, EventBus, Growth-Team, MediaWiki-Watchlist, and 6 others: Clear watchlist on enwiki only removes 50 items at a time - https://phabricator.wikimedia.org/T207329 (kostajh) @Pchelolo any update on this? The "Clear watchlist" link on enwiki is still not working for me. Only one job executes.
[19:52:33] Analytics, EventBus, Growth-Team, MediaWiki-Watchlist, and 6 others: Clear watchlist on enwiki only removes 50 items at a time - https://phabricator.wikimedia.org/T207329 (Pchelolo) @Ottomata Let's try again? https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/EventBus/+/472189/
[19:55:27] Nettrom: ok, I don't understand why kafkacat won't work, but were you able to look in Hive/hdfs for your events?
[19:55:48] Nettrom: for example select * from editorjourney where year=2018 limit 10;
[19:56:06] (in the event database, figure you're familiar with it, but not sure)
[19:57:04] milimetric: yeah, I'm puzzled as to why it doesn't work as well, I don't see any messages for the EditorJourney schema, nor for other schemas that I know have data flowing in (e.g. EditAttemptStep)
[19:57:26] Nettrom: yeah, I also tried like NavigationTiming which always has some data flowing
[19:57:28] milimetric: HDFS doesn't seem to yet have the events I'm interested in since it only contains events up to 19:05 UTC
[19:57:52] yeah, makes sense, grr... will try and see what's going on
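(For completeness, alongside the hdfs dfs -text command milimetric gives above, listing the raw import directory shows which hours have landed so far. This is plain HDFS CLI usage on the same path.)
  hdfs dfs -ls /wmf/data/raw/eventlogging/eventlogging_EditorJourney/hourly/2018/11/15/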
[19:58:07] I'll go read some webpages to see if I can figure this out
[19:58:23] oddly I see two topics for EditorJourney, one is Eventlogging_ and one is eventlogging_ (different caps)
[19:58:47] Nettrom: I don't think you're making any mistakes, must be a change in kafka I'm not aware of
[19:58:59] (not usually the part of our world I work on, sorry, other folks are out today)
[19:59:27] milimetric: yeah, I noticed that it seems that I can introduce new topics by misspelling things in parameters to kafkacat, sorry ;)
[19:59:47] Nettrom: oh! ok, that at least explains that :)
[20:08:35] Nettrom: argh, sorry, they switched clusters and the docs are out of date
[20:08:36] kafkacat -C -b kafka-jumbo1001.eqiad.wmnet -t eventlogging_EditorJourney
[20:08:41] try that ^
[20:08:45] I'll go around cleaning up docs
[20:09:31] milimetric: that works much better, thanks for figuring that out! :)
[20:16:42] Analytics: Update EventLogging kafkacat examples to use jumbo - https://phabricator.wikimedia.org/T209635 (Milimetric)
[20:17:02] Analytics, Analytics-Kanban: Update EventLogging kafkacat examples to use jumbo - https://phabricator.wikimedia.org/T209635 (Milimetric) p: Triage>High a: Milimetric
[20:36:00] Analytics, Analytics-Kanban: Update EventLogging kafkacat examples to use jumbo - https://phabricator.wikimedia.org/T209635 (nettrom_WMF) I updated the "How do I" section of [[ https://wikitech.wikimedia.org/wiki/Kafka | Kafka ]] on Wikitech as well, as it was mentioned there.
[20:42:14] Analytics, Multimedia: Add mediacounts to pageview API - https://phabricator.wikimedia.org/T88775 (JAllemandou) Did some analysis in terms of data size and storage: - Our Cassandra instances (2 per host) each have 2.9Tb usable space. - We currently use ~720Gb per instance (this accounts for all keyspace...
[20:53:25] joal: not sure if you're around, I was going to write an update on the sqoop task, let me know if you want to vet it first
[20:53:52] Hey milimetric - I can read it once sent :)
[21:19:06] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (Milimetric) It seems filtered_tables.txt is not updated to work with the new columns, I asked about it here: rOPUP188a50fa82a...
[22:04:20] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (bd808) >>! In T209031#4751986, @Milimetric wrote: > - Step 1: use filtered_tables.txt to generate sqoop selects, selecting a...
[22:16:13] Analytics, Research: Copy Wikidata dumps to HDFs - https://phabricator.wikimedia.org/T209655 (bmansurov)
[22:49:07] Analytics, Analytics-Kanban, New-Readers, Patch-For-Review: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (atgo) Thanks @Nuria. It sounds like @Prtksxna isn't sure if he's instrumented correctly. >>! In T202592#4741120, @Prtksxna wrote: > Same as @atgo, I am able to se...
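(A note on the two similarly named topics Nettrom saw: kafkacat can also list a cluster's metadata, which is a quick way to check which topics actually exist before consuming. A sketch using the jumbo broker from the working command above:)
  kafkacat -L -b kafka-jumbo1001.eqiad.wmnet | grep -i editorjourney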
[23:03:39] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (Bstorm) I'm aiming to write tests for this script shortly because it is too complex to not have them. Overloading the scripts...
[23:17:06] (CR) Nuria: [C: -1] "Ping on this." [analytics/refinery] - https://gerrit.wikimedia.org/r/471038 (https://phabricator.wikimedia.org/T208332) (owner: Nettrom)
[23:19:06] (CR) Nuria: Add bin/refinery-drop-older-than (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/471279 (https://phabricator.wikimedia.org/T199836) (owner: Mforns)
[23:23:08] (PS8) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[23:25:47] (CR) Nettrom: "> Ping on this." [analytics/refinery] - https://gerrit.wikimedia.org/r/471038 (https://phabricator.wikimedia.org/T208332) (owner: Nettrom)
[23:25:51] (CR) Stibba: "> @Framawiki, what do you think about adding a @property to decode" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[23:31:59] so I have a question (again): what's the expected lag of the Data Lake? https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging#Hadoop_&_Hive says "New data should be available for querying at most every hour." while it looks like several event tables are 3+ hours behind. Is there a place to go to know more about this when these things happen?
[23:32:38] (apart from asking about it here, that is)
[23:41:44] (PS8) Mforns: Add bin/refinery-drop-older-than [analytics/refinery] - https://gerrit.wikimedia.org/r/471279 (https://phabricator.wikimedia.org/T199836)
[23:42:17] (CR) Mforns: Add bin/refinery-drop-older-than (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/471279 (https://phabricator.wikimedia.org/T199836) (owner: Mforns)
[23:45:17] Nettrom: new data appears every hour (true) but it is not 1 hour behind real time
[23:45:37] Nettrom: depending on the cluster it might be 2/3
[23:46:38] Nettrom: this is THE PLACE to ask
[23:47:17] Nettrom: you saw your events in kafka but they are not in events yet, is that the concern?
[23:47:46] nuria: happy to hear I've come to the right place :) and yes, the events are in kafka (and on disk in Hadoop), but not in the events table yet
[23:49:11] (CR) Zhuyifei1999: "Yes, as a @property." [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[23:49:35] Nettrom: ok, that makes sense, they should appear in the next hour if they do not have a problem in being refined, if they were to have a problem the eventerror table in refine
[23:49:41] Nettrom: will store the errors
[23:51:51] nuria: sounds good! I'll make a mental note to be patient. I did check the eventerror table earlier today, and will make sure I check that in case I suspect something is up
[23:51:56] nuria: thanks for the help! :)
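(A sketch of the checks nuria describes at 23:45-23:49: how far behind the refined table is for the day, and whether refinement recorded errors. The partition layout and the eventerror table name are assumptions based on the examples earlier in the log.)
  hive -e "USE event; SELECT max(hour) FROM editorjourney WHERE year=2018 AND month=11 AND day=15;"
  hive -e "USE event; SELECT * FROM eventerror WHERE year=2018 AND month=11 AND day=15 LIMIT 10;"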