[00:14:04] (PS1) Stibba: Add Execution time to query table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[00:16:20] (PS2) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[00:17:51] Analytics, Anti-Harassment: Add ipblocks_restrictions table to Data Lake - https://phabricator.wikimedia.org/T209549 (Nuria) FYI that mediawiki_ipblocks is sqooped monthly.
[00:19:04] (PS3) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[02:02:05] Analytics-EventLogging, Analytics-Kanban: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive - https://phabricator.wikimedia.org/T209503 (mforns) @Tbayer > Just to double-check: The information in the documentation that "Sanitization happens right aft...
[03:56:25] (CR) Zhuyifei1999: [C: -1] "Also make sure KeyError won't break the page" (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[06:53:07] Analytics, DC-Ops, Operations, ops-eqiad: Degraded RAID on analytics1039 - https://phabricator.wikimedia.org/T208706 (elukey) This host needs to be decommed relatively soon, there are others that showed this kind of behavior in the range 28-42.
[07:13:45] PROBLEM - Check the last execution of refinery-druid-drop-public-snapshots on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit refinery-druid-drop-public-snapshots
[07:14:32] buuuuu
[07:15:23] system.exit(0)
[07:15:28] NameError: name 'system' is not defined
[07:15:30] whattt
[07:16:42] ah makes sense, we are in the case that no snapshots need to be removed, and it does system.exit(0)
[07:16:49] but of course we don't import it
[07:16:51] fixing
[07:21:11] (PS1) Elukey: refinery-druid-drop-public-snapshots: s/system/sys [analytics/refinery] - https://gerrit.wikimedia.org/r/473674
[07:22:17] (CR) Elukey: [V: +2 C: +2] refinery-druid-drop-public-snapshots: s/system/sys [analytics/refinery] - https://gerrit.wikimedia.org/r/473674 (owner: Elukey)
[07:23:53] RECOVERY - Check the last execution of refinery-druid-drop-public-snapshots on an-coord1001 is OK: OK: Status of the systemd unit refinery-druid-drop-public-snapshots
[07:52:48] so another broken disk, this time on an1039
[07:52:58] the batch of hosts to be decommed is complaining :)
[07:53:29] what I did was apply an override in puppet to avoid the use of the /var/lib/hadoop/data/d dir, since it was mounted as read only (disk completely kaput)
[07:53:38] (or almost completely)
[07:54:38] Analytics, DC-Ops, Operations, ops-eqiad, Patch-For-Review: Degraded RAID on analytics1039 - https://phabricator.wikimedia.org/T208706 (elukey) Open>Resolved a: elukey The host will be decommed so no point in getting a new disk :)
[08:28:19] Analytics, Operations, ops-eqiad, Patch-For-Review, User-Elukey: rack/setup/install an-worker10[78-96].eqiad.wmnet - https://phabricator.wikimedia.org/T207192 (elukey) a: elukey>Cmjohnson
[08:30:07] Thanks elukey for caring for those old beasts of ours. We still need them to get to the checkpoint with new hosts :)
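(Aside on the 07:13-07:23 incident above: a minimal sketch of how a failed unit like this is typically inspected on an-coord1001, plus a quick check of the corrected call. These are standard systemd/Python commands, not necessarily the exact ones that were run.)
  # see why the unit is flagged CRITICAL and read the Python traceback (the NameError above)
  sudo systemctl status refinery-druid-drop-public-snapshots
  sudo journalctl -u refinery-druid-drop-public-snapshots --since today
  # the fix is s/system/sys: sys.exit(0) works once the sys module is imported
  python3 -c 'import sys; sys.exit(0)'; echo "exit code: $?"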
[08:30:39] And thanks as well for the patch on refinery-druid-drop-public-snapshots
[08:30:41] joal: good morning!
[08:30:51] Good morning to you elukey :)
[08:31:05] the new beasts have 10G NICs, looking forward to having them working :)
[08:31:54] Wow, they'll be flying over the network - but our network administrators will hate us ;)
[08:36:24] I feel I need some brass to start the day -- https://www.youtube.com/watch?v=gtSjPUkDVT8
[08:45:51] :D
[09:04:33] Personal message for milimetric - I think I know why the sqoop of logging_compat has failures !
[09:05:22] also elukey: the hdfs-wikitet-importer works like a charm as far as I can see :)
[09:06:50] nice!
[09:27:45] Analytics, Analytics-Kanban: Refactor Mediawiki-Database ingestion - https://phabricator.wikimedia.org/T209178 (JAllemandou)
[09:27:53] elukey: do you mind proof-reading this --^ ?
[09:33:59] Analytics, Analytics-Kanban: Refactor Mediawiki-Database ingestion - https://phabricator.wikimedia.org/T209178 (elukey)
[09:34:48] Analytics, Analytics-Kanban: Refactor Mediawiki-Database ingestion - https://phabricator.wikimedia.org/T209178 (elukey)
[09:35:31] added some links just for clarity :)
[09:35:56] is the plan to switch reading from labsdb to say dbstore?
[09:35:59] elukey: understandable I imagine then :)
[09:37:07] the only thing that is not clear is "The main difficulty is that, since data will be extracted from production-slaves"
[09:37:21] I can see people asking what slaves you have in mind
[09:37:29] Right, will that super-explicit
[09:37:35] +make
[09:38:57] Analytics, Analytics-Kanban: Refactor Mediawiki-Database ingestion - https://phabricator.wikimedia.org/T209178 (JAllemandou)
[09:38:58] Updated
[09:43:12] +1
[10:05:20] Analytics, Operations, Wikimedia-Logstash, Core Platform Team Backlog (Watching / External), and 2 others: Review and make librdkafka-0.11.6 installable from stretch-wikimedia - https://phabricator.wikimedia.org/T209300 (mobrovac) Mental note for @Ottomata @Pchelolo and myself: the current node-r...
[10:13:20] (PS1) Fdans: Add offset and underestimate to uniques table schema [analytics/aqs] - https://gerrit.wikimedia.org/r/473708 (https://phabricator.wikimedia.org/T167539)
[10:14:09] hellooo joal, let's merge this so that we can test load?
[10:15:05] fdans: needs to be merged to be applied to beta, right?
[10:15:20] And, hellLLLoOOO fdans :)
[10:29:59] joal: sorry, ignore me for a while, I got my orders confused :)
[10:30:36] I won't bug you as asked, but I'll never ignore you fdans :)
[10:52:05] joal: I think I'm missing something... if we change the table schema in aqs, restarting aqs should apply the change to cassandra right?
[11:36:11] * elukey lunch!
[12:09:34] hmmm elukey I can't access deployment-tin.deployment-prep.eqiad.wmflabs, does it work for you?
[13:16:46] Analytics, Analytics-Wikistats: links tot wikistats pages in stats.wikimedia.org/index.html fail, show en.wikipeda.org main page instead - https://phabricator.wikimedia.org/T209581 (ezachte)
[13:17:39] Analytics, Analytics-Wikistats: links tot wikistats pages in stats.wikimedia.org/index.html fail, show en.wikipeda.org main page instead - https://phabricator.wikimedia.org/T209581 (ezachte) Open>Resolved a: ezachte I edited index.html and replaced en.wikipedia.org/wikistats/... with stats.wik...
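(On fdans's question at 10:52 about whether restarting AQS applies a schema change to Cassandra: one way to see what actually changed is to dump the table definition from cqlsh before and after the restart. The host and keyspace names below are placeholders, not the real AQS values.)
  cqlsh <cassandra-host> -e 'DESCRIBE TABLE "<uniques_keyspace>".data;'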
[13:22:13] hey team :]
[13:28:26] elukey, I noticed that moving our 1x1 last week didn't affect all other events, I shifter our 1x1 today, and did it from the edit form, so that all subsequent events get shifter
[13:28:30] *shifted
[13:44:30] Analytics, Maps, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Review Grafana Dashboards for search / wdqs / maps - https://phabricator.wikimedia.org/T209584 (Mathew.onipe)
[13:44:47] Analytics, Maps, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Review Grafana Dashboards for search / wdqs / maps - https://phabricator.wikimedia.org/T209584 (Mathew.onipe) p: Triage>Normal
[13:52:18] mforns: ah ok! Can we move it half an hour earlier?
[13:54:10] because otherwise it clashes with another meeting :(
[13:55:32] elukey: 👀🙌🏼❤️
[13:56:11] ah snap I saw only now the notification!
[13:56:34] deployment-tin is dead, deployment-deploy01 is the correct one
[13:56:37] fdans: --^
[13:56:45] ohhhh
[13:56:50] updating dogs
[13:56:55] i mean docs
[13:56:59] thank you elukey
[14:00:13] anybody checked mediacounts ?
[14:01:37] elukey: what's to be checked?
[14:02:48] Fatal Error - Oozie Job mediacounts-load-wf-2018-11-15-8
[14:03:01] but it was 4h ago
[14:03:23] checking
[14:04:34] (PS4) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[14:07:08] ahhhh it was on an1039
[14:07:16] that had troubles this morning
[14:07:21] all right I need to restart it
[14:07:51] !log re-run mediacounts-load-wf-2018-11-15-8 - died due to issues on an1039 (happened this morning, broken disk)
[14:07:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:08:20] omg scap working in beta aqs with a non-master HEAD
[14:08:43] * fdans it's an early christmas miracle
[14:09:51] (PS5) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[14:11:33] (PS6) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[14:15:13] (PS7) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[14:35:31] (PS1) Fdans: [wip] Add offset and underestimate to uniques loading job [analytics/refinery] - https://gerrit.wikimedia.org/r/473746 (https://phabricator.wikimedia.org/T167539)
[14:41:34] elukey, sorry no headphones, yes you can move the meeting whenever is good for you
[14:42:01] (PS2) Fdans: [wip] Add offset and underestimate to uniques loading job [analytics/refinery] - https://gerrit.wikimedia.org/r/473746 (https://phabricator.wikimedia.org/T167539)
[14:43:11] mforns: do you want to reschedule to tomorrow? (for the headphones)
[14:43:37] elukey, no no! I was saying that I didn't get your ping, because my headphones were off
[14:43:45] ahhh
[14:44:22] ok I changed the meeting!
[14:44:58] elukey, hmmm, don't see any change...
[14:45:56] I moved it to 15 mins from now
[14:50:57] Analytics, Analytics-EventLogging, EventBus, Services (watching): Modern Event Platform: Stream Intake Service: Implementation - https://phabricator.wikimedia.org/T206785 (Ottomata) We need a name! Have been brainstorming over on https://etherpad.wikimedia.org/p/event-platform. The current thre...
[14:51:48] !log testing load of new uniques fields in test keyspace in cassandra
[14:51:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:53:39] mforns: https://twitter.com/jimmy_wales/status/1063081673666060288
[14:53:46] this guy shows up at every single event
[14:53:56] let's see
[14:54:07] Analytics, Analytics-Kanban, Research: Cannot connect to Spark with Jupyter notebook on stat1007 - https://phabricator.wikimedia.org/T208896 (bmansurov) Just a follow up that Python kernels won't start on stat1007 for some reason. I've given up trying to fix the issue for now.
[14:54:44] fdans, heh, let's see if he doesn't mess up like Woody Harrelson!
[14:55:50] in one of the first games, he was moving for Fabi, who asked for e4, and Harrelson moved d4! (in the end the refs let them go back and play e4)
[14:57:01] Analytics, Multimedia: Add mediacounts to pageview API - https://phabricator.wikimedia.org/T88775 (BerndFiedlerWMDE) this might just solve a problem we're having over here in DE. There are great content providers hesitating to contribute to wikimedia commons for they cannot implement reporting. Let's fix...
[14:58:49] (CR) Addshore: "So we still want this? If so I'm going to try to add it to my review queue." [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/440133 (https://phabricator.wikimedia.org/T196883) (owner: WMDE-leszek)
[14:58:52] (CR) Addshore: "So we still want this? If so I'm going to try to add it to my review queue." [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/440876 (https://phabricator.wikimedia.org/T196883) (owner: WMDE-leszek)
[15:19:52] joal: load test successful! :D
[15:20:15] checked numbers matching against the dumps
[15:21:13] awesome fdans :)
[15:22:08] joal: if you want we can merge and deploy the aqs patch in order to add the two blank fields in production
[15:22:57] fdans: I assume you've tested AQS behavior on old data with the 2 blank fields
[15:23:00] ?
[15:24:19] hmmm, I've tested it with an empty keyspace, let me test with a keyspace with data on beta to make sure
[15:24:35] (joal ^)
[15:24:36] fdans: it would be bad if it doesn't work :)
[15:24:52] fdans: out of curiosity, have you used the cluster in beta to test loading?
[15:25:43] joal: nope, in beta i only tested the fields creation
[15:26:06] Right - Then you have followed the steps we discussed yesterday, right?
[15:26:20] yessir
[15:26:26] Great :)
[15:28:25] I have a question on commons and other wiki projects - When a file is played from 'upload.wikimedia.org', is it on commons or could it be somewhere else?
[15:36:40] joal: hmmm re: new fields I'm pretty sure that aqs will not add fields automatically - it will only create missing tables
[15:36:55] i think we have to add the fields manually
[15:37:07] hm - this feels bizarre
[15:39:47] joal: tested both adding the field in aqs and restarting the service (which changed nothing) and just dropping the keyspace and restarting (which does create the keyspace with the new field)
[15:40:26] hm - fdans - May I ask another test?
[15:40:33] always joal
[15:41:05] fdans: I'd like to see if trying to manually add data using AQS (as we do for tests for instance) makes aqs update the fields
[15:41:15] Also fdans - You have changed the version number, right?
[15:41:37] 👀
[15:41:41] * fdans no
[15:42:33] This might come into play :)
[15:53:55] Analytics, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Ingest data into druid for readingDepth schema - https://phabricator.wikimedia.org/T205562 (Tbayer) Resolved>Open >>! In T205562#4634049, @mforns wrote: > @Tbayer @Nuria > > I loaded 1 month (Sept 2018) of Re...
[15:58:59] Analytics, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Ingest data aggregate ReadingDepth data into Druid - https://phabricator.wikimedia.org/T205562 (Tbayer)
[15:59:29] Analytics, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Ingest data aggregate ReadingDepth data into Druid - https://phabricator.wikimedia.org/T205562 (Tbayer)
[16:06:28] joal: ok after a round of really frustrating testing, I think that when bumping the version, aqs will add the fields only when there are no rows in the table
[16:07:28] Really ?
[16:07:33] Wow that's uncool !
[16:08:11] fdans: Could it be that adding fields to an existing table takes a very long time?
[16:09:05] joal: hmmm I mean the table I'm testing it with has exactly one row...
[16:09:18] joal: I could try and see how long an ALTER takes
[16:09:35] nah
[16:09:42] one row is not exactly big :)
[16:09:52] Crappy crap !
[16:10:08] We could and possibly should ask the services team
[16:11:10] why is it a problem to add it manually joal?
[16:11:45] fdans: because schema and version and all is managed by AQS, validating through schema hash, etc.
[16:22:25] joal: ok... what is it that we want to ask the services team?
[16:23:02] fdans: if AQS not updating the table with new fields is an expected behavior :)
[16:41:07] joal: Pchelolo agrees that this is not what should be happening
[16:41:20] hmm, where is the logic that applies these schema changes?
[16:42:01] fdans: sooo... https://github.com/wikimedia/restbase-mod-table-cassandra/blob/master/lib/schemaMigration.js#L18
[16:42:24] but we actually have changed quite a bit there, which version of the package are you using?
[16:43:20] I'm pretty sure we upgraded not long ago..
[16:43:23] lemme check
[16:44:34] Analytics, Analytics-Kanban: AQS unique devices api should report offset/underestimate separately - https://phabricator.wikimedia.org/T164201 (Nuria) p: Low>Normal
[16:46:35] Analytics, Analytics-Kanban: AQS unique devices api should report offset/underestimate separately - https://phabricator.wikimedia.org/T164201 (Nuria)
[16:46:38] Pchelolo: ohhh I deployed to beta with an old version, this might have to do with it, thanks for the tip
[16:46:38] Analytics, Analytics-Kanban, Patch-For-Review: Final steps to expose project family unique devices data - https://phabricator.wikimedia.org/T167539 (Nuria)
[16:47:39] fdans: it might still not work with the new version, if so - ping me
[16:47:50] thank you :)
[16:49:19] Pchelolo: o/
[16:49:26] hi elukey
[16:49:37] about https://phabricator.wikimedia.org/T207994 - is 'once an hour' always at the same time?
[16:50:11] if you want I can start tcpdump on all kafka20* with some pcap log rotation etc..
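(Back to the Cassandra schema discussion above: the manual change fdans mentions at 15:37 and 16:09 would look roughly like this from cqlsh. The host, keyspace, "data" table name and column types are assumptions for illustration, not the actual AQS definitions.)
  cqlsh <cassandra-host> -e 'ALTER TABLE "<uniques_keyspace>".data ADD offset bigint;'
  cqlsh <cassandra-host> -e 'ALTER TABLE "<uniques_keyspace>".data ADD underestimate bigint;'
The open question in the chat is whether such a manual ALTER conflicts with the schema hash that AQS / restbase-mod-table-cassandra keeps, which is why the services team was consulted.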
[16:50:35] elukey: no, not at the same time
[16:50:40] super randomly
[16:50:42] so we store only X amount of data maximum, and then we stop it when you have another repro case
[16:50:47] and we analyze the pcaps
[16:51:00] not sure if you already had a chat with other SREs for this
[16:51:07] in case I'll leave the work to more expert people
[16:51:13] but if not, I can offer my help :)
[16:53:34] elukey: thank you :) but I have a meeting in 5 mins
[16:53:42] let me know in case :)
[16:53:45] it's not gonna be a long one, will ping you after it ok?
[16:53:50] sure
[17:01:26] ping fdans
[17:20:16] Analytics, Analytics-EventLogging, Performance-Team (Radar), Readers-Web-Backlog (Tracking): Make it easier to enable EventLogging's debug mode - https://phabricator.wikimedia.org/T188640 (Nuria)
[17:20:47] Analytics, Analytics-EventLogging, Performance-Team (Radar), Readers-Web-Backlog (Tracking), cloud-services-team (Kanban): Make it easier to enable EventLogging's debug mode - https://phabricator.wikimedia.org/T188640 (Nuria)
[17:21:10] Analytics, Analytics-EventLogging, Analytics-Kanban, Performance-Team (Radar), Readers-Web-Backlog (Tracking): Make it easier to enable EventLogging's debug mode - https://phabricator.wikimedia.org/T188640 (Nuria)
[17:23:53] Analytics, Anti-Harassment (AHT Sprint 33): 👩‍👧 Track how often blocked user attempt to edit - https://phabricator.wikimedia.org/T189724 (dbarratt)
[17:37:55] elukey: so I'm done with meetings, but I really have 0 time to work on that thing right now
[17:39:24] I guess running a tcpdump for eventbus on one of the codfw nodes and writing the source ip into a file for a day might be risky cause the file will grow pretty quickly
[17:40:18] If you don't mind I will have time later today when you sleep to think about all the IPs that should write into eventbus in codfw and make a list for you to exclude from writing into a file
[17:40:33] then we can make some script to run
[17:41:12] actually, the known IPs are pretty simple - it's all restbase and all scb hosts in codfw
[17:51:33] Pchelolo: (in a meeting, should be ready in ~40 mins)
[17:55:03] Analytics: XML dumps imports into hadoop need to adapt to latest changes done in xml schema for MCR - https://phabricator.wikimedia.org/T209545 (fdans) p: Triage>Normal
[17:56:24] Analytics, Analytics-Kanban: Hive query fails with local join - https://phabricator.wikimedia.org/T209536 (fdans) a: JAllemandou
[17:59:33] Analytics, Maps, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Review Grafana Dashboards for search / wdqs / maps - https://phabricator.wikimedia.org/T209584 (fdans) p: Normal>Triage
[17:59:38] Analytics, Maps, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Review Grafana Dashboards for search / wdqs / maps - https://phabricator.wikimedia.org/T209584 (fdans) p: Triage>Normal
[18:00:01] Analytics, Analytics-Kanban: Hive query fails with local join - https://phabricator.wikimedia.org/T209536 (JAllemandou) Ping @elukey, @Ottomata : I think we should apply `hive.auto.convert.join = false` in hive-site.xml so that map-side joins are never done automatically.
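(Regarding joal's suggestion in T209536 just above: until the default is changed in hive-site.xml, the same setting can be disabled per session as a workaround. A sketch, with the failing join query left as a placeholder:)
  hive -e "SET hive.auto.convert.join=false; <your join query here>;"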
[18:06:16] Analytics, Analytics-EventLogging: Refine: Use Spark SQL instead of Hive JDBC - https://phabricator.wikimedia.org/T209453 (fdans) p: Triage>Normal
[18:07:28] Pchelolo: about the size - I can force tcpdump to create a fixed amount of pcap files of size X
[18:07:51] so say we state that we want max 10G allocated at any given time, we can do it (disk allocation I mean)
[18:08:02] I did the same for the objectcache
[18:08:27] Analytics, Anti-Harassment: Add ipblocks_restrictions table to Data Lake - https://phabricator.wikimedia.org/T209549 (Milimetric) @TBolliger thanks for paying such close attention to our work! As part of that task, we've decided to just import all tables from the production replicas. I'm adapting our s...
[18:08:29] the main problem is that it will be a "moving window" of pcap traffic, so we'll need to stop it as soon as we find the weird log
[18:08:30] kafka machines have a lot of disk, so I guess that will work
[18:09:33] Pchelolo: ah even better, we can use httpry!
[18:09:41] that logs http requests only
[18:09:46] so we remove a ton of garbage
[18:10:16] ah but it doesn't auto-rotate pcaps
[18:10:23] we mostly need the source IP
[18:10:57] yeah
[18:11:08] I guess it's too late for you now cause we'd need to run it for quite a while before being able to do any analysis
[18:11:10] well we can do some tests tomorrow if you want
[18:11:42] if you could start it your morning tomorrow I guess when I wake up we will have enough data to look at
[18:11:58] is there an easy way for me to find the anomalies when they are enqueued in kafka? Like a signature etc.. so I can start tomorrow eu morning
[18:12:16] and stop if I find anything
[18:12:39] but I guess that anything not restbase or mw eqiad will be one possible lead
[18:12:45] anomalies in kafka? all the messages in revision-create topic in codfw are anomalies
[18:12:57] ah okok
[18:13:33] as for the IPs - anything not restbase or scb
[18:13:35] so I can tail the topic, and when I see a new even I stop capturing traffic
[18:13:44] *event
[18:13:47] elukey: ye, makes sense
[18:13:51] ack
[18:14:18] cool, thanks for helping with this
[18:14:20] I am going to update the task with the plan
[18:14:37] got nerd sniped by Alex
[18:14:39] :D
[18:28:59] Analytics, Operations, ops-eqiad, User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (RobH) p: Triage>High
[18:30:14] Analytics, Operations, ops-eqiad, User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (RobH) @elukey: Can you confirm the racking and hostname details? Hostname Proposal: dbstore100[3-5].eqiad.wmnet, since this is a dbstore1002 replacement or...
[18:38:22] Analytics-EventLogging, Analytics-Kanban: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive - https://phabricator.wikimedia.org/T209503 (Neil_P._Quinn_WMF) >>! In T209503#4748908, @mforns wrote: > @Tbayer >> Just to double-check: The information in t...
[18:38:47] Analytics, EventBus, Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Later), Services (later): revision-create events are sometimes emitted in a secondary DC - https://phabricator.wikimedia.org/T207994 (elukey) I had a chat with Petr a...
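(For reference, the rotating capture elukey describes at 18:07-18:08 maps onto tcpdump's ring-buffer options. The interface, port filter, broker hostname and topic name below are placeholders; the ~10G cap matches the figure mentioned in the chat.)
  # on a kafka20xx broker: keep at most 10 files of ~1000 MB each (~10G total), overwriting the oldest
  sudo tcpdump -i eth0 -s 0 -w /srv/eventbus-capture.pcap -C 1000 -W 10 'tcp port <eventbus-port>'
  # tail the codfw revision-create topic; every message there is an anomaly, so stop the capture when one appears
  kafkacat -C -b <codfw-kafka-broker> -t <codfw-revision-create-topic>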
[18:39:02] Analytics, EventBus, Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Later), and 2 others: revision-create events are sometimes emitted in a secondary DC - https://phabricator.wikimedia.org/T207994 (elukey)
[18:51:07] (CR) Zhuyifei1999: "@Framawiki, what do you think about adding a @property to decode the json in models/queryrun.py?" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[18:56:12] elukey, let me know if you want to pair for deletion, we can do that tomorrow if you're leaving
[19:34:38] hey a-team: I'm trying to use kafkacat to monitor messages for a schema we just deployed, but don't seem to be able to consume anything
[19:35:41] Nettrom: what schema?
[19:35:48] milimetric: EditorJourney
[19:35:55] checking
[19:36:23] Grafana shows some events, but I was hoping to use kafkacat to see what's happening since we're trying to debug some issues
[19:39:30] Nettrom: the client validation update went live today, do you know how to enable it now and see whether there are validation errors?
[19:39:50] (grafana doesn't show any errors, so I don't think this is the problem, but just checking since that just went live)
[19:40:14] milimetric: no, I'm not familiar with that
[19:40:35] Nettrom: do this:
[19:40:36] https://www.irccloud.com/pastebin/ST2ObGCy/
[19:40:44] in the JS console in your browser
[19:41:00] milimetric: the code is logging server side
[19:41:08] ah ok, never mind then :)
[19:44:21] Nettrom: send me your kafkacat command, and in the meantime here's another way to get data:
[19:44:22] hdfs dfs -text /wmf/data/raw/eventlogging/eventlogging_EditorJourney/hourly/2018/11/15/19/eventlogging_EditorJourney.1005.0.3.390.1542308400000
[19:44:37] (from stat1004 or 1007)
[19:44:48] milimetric: kafkacat -C -b kafka1012 -t eventlogging_EditorJourney
[19:45:37] which I'm running on stat1007
[19:51:15] Analytics, EventBus, Growth-Team, MediaWiki-Watchlist, and 6 others: Clear watchlist on enwiki only removes 50 items at a time - https://phabricator.wikimedia.org/T207329 (kostajh) @Pchelolo any update on this? The "Clear watchlist" link on enwiki is still not working for me. Only one job executes.
[19:52:33] Analytics, EventBus, Growth-Team, MediaWiki-Watchlist, and 6 others: Clear watchlist on enwiki only removes 50 items at a time - https://phabricator.wikimedia.org/T207329 (Pchelolo) @Ottomata Let's try again? https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/EventBus/+/472189/
[19:55:27] Nettrom: ok, I don't understand why kafkacat won't work, but were you able to look in Hive/hdfs for your events?
[19:55:48] Nettrom: for example select * from editorjourney where year=2018 limit 10;
[19:56:06] (in the event database, figure you're familiar with it, but not sure)
[19:57:04] milimetric: yeah, I'm puzzled as to why it doesn't work as well, I don't see any messages for the EditorJourney schema, nor for other schemas that I know have data flowing in (e.g. EditAttemptStep)
[19:57:26] Nettrom: yeah, I also tried like NavigationTiming which always has some data flowing
[19:57:28] milimetric: HDFS doesn't seem to yet have the events I'm interested in since it only contains events up to 19:05 UTC
[19:57:52] yeah, makes sense, grr... will try and see what's going on
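(For completeness, alongside the hdfs dfs -text command milimetric gives above, listing the raw import directory shows which hours have landed so far. This is plain HDFS CLI usage on the same path.)
  hdfs dfs -ls /wmf/data/raw/eventlogging/eventlogging_EditorJourney/hourly/2018/11/15/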
[19:58:07] I'll go read some webpages to see if I can figure this out
[19:58:23] oddly I see two topics for EditorJourney, one is Eventlogging_ and one is eventlogging_ (different caps)
[19:58:47] Nettrom: I don't think you're making any mistakes, must be a change in kafka I'm not aware of
[19:58:59] (not usually the part of our world I work on, sorry, other folks are out today)
[19:59:27] milimetric: yeah, I noticed that it seems that I can introduce new topics by misspelling things in parameters to kafkacat, sorry ;)
[19:59:47] Nettrom: oh! ok, that at least explains that :)
[20:08:35] Nettrom: argh, sorry, they switched clusters and the docs are out of date
[20:08:36] kafkacat -C -b kafka-jumbo1001.eqiad.wmnet -t eventlogging_EditorJourney
[20:08:41] try that ^
[20:08:45] I'll go around cleaning up docs
[20:09:31] milimetric: that works much better, thanks for figuring that out! :)
[20:16:42] Analytics: Update EventLogging kafkacat examples to use jumbo - https://phabricator.wikimedia.org/T209635 (Milimetric)
[20:17:02] Analytics, Analytics-Kanban: Update EventLogging kafkacat examples to use jumbo - https://phabricator.wikimedia.org/T209635 (Milimetric) p: Triage>High a: Milimetric
[20:36:00] Analytics, Analytics-Kanban: Update EventLogging kafkacat examples to use jumbo - https://phabricator.wikimedia.org/T209635 (nettrom_WMF) I updated the "How do I" section of [[ https://wikitech.wikimedia.org/wiki/Kafka | Kafka ]] on Wikitech as well, as it was mentioned there.
[20:42:14] Analytics, Multimedia: Add mediacounts to pageview API - https://phabricator.wikimedia.org/T88775 (JAllemandou) Did some analysis in terms of data size and storage: - Our Cassandra instances (2 per host) each have 2.9Tb usable space. - We currently use ~720Gb per instance (this accounts for all keyspace...
[20:53:25] joal: not sure if you're around, I was going to write an update on the sqoop task, let me know if you want to vet it first
[20:53:52] Hey milimetric - I can read it once sent :)
[21:19:06] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (Milimetric) It seems filtered_tables.txt is not updated to work with the new columns, I asked about it here: rOPUP188a50fa82a...
[22:04:20] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (bd808) >>! In T209031#4751986, @Milimetric wrote: > - Step 1: use filtered_tables.txt to generate sqoop selects, selecting a...
[22:16:13] Analytics, Research: Copy Wikidata dumps to HDFs - https://phabricator.wikimedia.org/T209655 (bmansurov)
[22:49:07] Analytics, Analytics-Kanban, New-Readers, Patch-For-Review: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (atgo) Thanks @Nuria. It sounds like @Prtksxna isn't sure if he's instrumented correctly. >>! In T202592#4741120, @Prtksxna wrote: > Same as @atgo, I am able to se...
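(A note on the two similarly named topics Nettrom saw: kafkacat can also list a cluster's metadata, which is a quick way to check which topics actually exist before consuming. A sketch using the jumbo broker from the working command above:)
  kafkacat -L -b kafka-jumbo1001.eqiad.wmnet | grep -i editorjourney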
[23:03:39] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (Bstorm) I'm aiming to write tests for this script shortly because it is too complex to not have them. Overloading the scripts...
[23:17:06] (CR) Nuria: [C: -1] "Ping on this." [analytics/refinery] - https://gerrit.wikimedia.org/r/471038 (https://phabricator.wikimedia.org/T208332) (owner: Nettrom)
[23:19:06] (CR) Nuria: Add bin/refinery-drop-older-than (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/471279 (https://phabricator.wikimedia.org/T199836) (owner: Mforns)
[23:23:08] (PS8) Stibba: Add Execution time to recent queries table of Quarry [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264)
[23:25:47] (CR) Nettrom: "> Ping on this." [analytics/refinery] - https://gerrit.wikimedia.org/r/471038 (https://phabricator.wikimedia.org/T208332) (owner: Nettrom)
[23:25:51] (CR) Stibba: "> @Framawiki, what do you think about adding a @property to decode" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[23:31:59] so I have a question (again): what's the expected lag of the Data Lake? https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging#Hadoop_&_Hive says "New data should be available for querying at most every hour." while it looks like several event tables are 3+ hours behind. Is there a place to go to know more about this when these things happen?
[23:32:38] (apart from asking about it here, that is)
[23:41:44] (PS8) Mforns: Add bin/refinery-drop-older-than [analytics/refinery] - https://gerrit.wikimedia.org/r/471279 (https://phabricator.wikimedia.org/T199836)
[23:42:17] (CR) Mforns: Add bin/refinery-drop-older-than (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/471279 (https://phabricator.wikimedia.org/T199836) (owner: Mforns)
[23:45:17] Nettrom: new data appears every hour (true) but it is not 1 hour behind real time
[23:45:37] Nettrom: depending on the cluster it might be 2/3
[23:46:38] Nettrom: this is THE PLACE to ask
[23:47:17] Nettrom: you saw your events in kafka but they are not in events yet, is that the concern?
[23:47:46] nuria: happy to hear I've come to the right place :) and yes, the events are in kafka (and on disk in Hadoop), but not in the events table yet
[23:49:11] (CR) Zhuyifei1999: "Yes, as a @property." [analytics/quarry/web] - https://gerrit.wikimedia.org/r/473644 (https://phabricator.wikimedia.org/T71264) (owner: Stibba)
[23:49:35] Nettrom: ok, that makes sense, they should appear in the next hour if they do not have a problem in being refined, if they were to have a problem the eventerror table in refine
[23:49:41] Nettrom: will store the errors
[23:51:51] nuria: sounds good! I'll make a mental note to be patient. I did check the eventerror table earlier today, and will make sure I check that in case I suspect something is up
[23:51:56] nuria: thanks for the help! :)
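(A sketch of the checks nuria describes at 23:45-23:49: how far behind the refined table is for the day, and whether refinement recorded errors. The partition layout and the eventerror table name are assumptions based on the examples earlier in the log.)
  hive -e "USE event; SELECT max(hour) FROM editorjourney WHERE year=2018 AND month=11 AND day=15;"
  hive -e "USE event; SELECT * FROM eventerror WHERE year=2018 AND month=11 AND day=15 LIMIT 10;"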