[00:18:58] 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services, 10Core Platform Team Backlog (Watching / External): Not able to sqoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10CCicalese_WMF) [00:19:16] 10Analytics, 10EventBus, 10Operations, 10WMF-JobQueue, and 5 others: Kafka eqiad.mediawiki.page-delete topic is empty - https://phabricator.wikimedia.org/T210451 (10Pchelolo) Not to be worried. We have all the failed events stored since 2018-04-18. If needed, I will fetch all the missing page deletes tomor... [00:21:49] 10Analytics, 10EventBus, 10Operations, 10WMF-JobQueue, and 5 others: Kafka eqiad.mediawiki.page-delete topic is empty - https://phabricator.wikimedia.org/T210451 (10Ottomata) Oo, I just did the same, or, at least I copied the relevant files. They are on stat1004:/home/otto/eventbus-validation-logs0. Stas... [00:23:27] 10Analytics, 10EventBus, 10Operations, 10WMF-JobQueue, and 5 others: Kafka eqiad.mediawiki.page-delete topic is empty - https://phabricator.wikimedia.org/T210451 (10Smalyshev) I think I've extracted all I need from the DB tables for now, but I'll double-check and if anything is still missing I check the ex... [02:51:40] 10Analytics, 10Analytics-Data-Quality, 10Product-Analytics: mediawiki_history datasets have null user_text for IP edits - https://phabricator.wikimedia.org/T206883 (10Neil_P._Quinn_WMF) >>! In T206883#4757850, @JAllemandou wrote: > I hear your point and it makes a lot of sense. I think our views differ in th... [07:38:31] 10Analytics, 10Analytics-Kanban, 10DBA: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10elukey) p:05Triage>03High [07:39:28] hic sunt leones --^ [08:10:08] Hi elukey - Good hunting --^ [08:17:30] :D [08:17:35] bonjour! [08:17:42] Bonjour :) [08:18:21] so joal I was thinking about the Banner impression "real time" ingestion, and maintaining a spark streaming job only to add normalization [08:18:58] elukey: I think it'd do a bit more, like expanding maps possibly, but maybe I'm wrong :) [08:19:19] joal: yeah but what kind of maps? They don't need any at the moment [08:19:37] oh right - I thought we wanted to add geoloc for instance [08:22:43] my current thought is - what is the minimum amount of config that we need to make everything work since it is the 27th of Nov :D [08:22:46] ? [08:23:14] right - If we go without normalization, we can try an ingestor job [08:23:46] exactly, very quick and possibly ready with a couple of hours of work and some swearing :D [08:24:49] possibly a lot more swearing than anything else :) [08:25:39] elukey: do you want me to give it a try? [08:26:53] joal: if you have time yes, otherwise I was planning to spend some time on it later on today [08:35:01] joal: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/475964/1/modules/druid/templates/log4j2.xml.erb [08:35:39] elukey: meh ? [08:36:00] my fault, minor thing but now the logs are not split in two [08:36:45] last log in every -metrics.log was 2018-10-25T10:31:16.271Z :P [08:37:10] good to merge? [08:37:14] Ah ok - please go :) [08:37:26] I also have to roll restart all druid daemons for jvm upgrades [08:38:25] ok [08:41:33] that is the first with 0.12.3, hopefully quiet :D [08:44:57] Event [{"feed":"metrics","timestamp":"2018-11-27T08:44:52.517Z" [08:45:01] good now :) [08:45:08] (in -metrics.log) [08:45:17] so I am doing the historicals now on the private cluster [08:46:08] great elukey :) Thanks !
[08:46:39] elukey: have you deleted the central-notice impression data? [08:47:37] joal: I did yes [08:47:41] (5 mins ago) [08:48:01] elukey: I have seen that - I was using it as a baseline for realtime config [08:48:38] joal: ah sorry! It was a test without the right dimensions to use, so I cleaned it up [08:48:44] I can give you the list [08:48:56] Is it documented in the task elukey ? [08:49:04] it should but I am not sure [08:49:27] we can make the final list in there [08:53:57] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Return to real time banner impressions in Druid - https://phabricator.wikimedia.org/T203669 (10elukey) List of dimensions that we are going to grab from the Eventlogging event: ` event_campaign,event_banner,event_project,event_uselang,event_bucket,event_anonym... [08:54:01] joal: --^ [08:54:37] elukey: Noted - Will use that :) Thanks ! [08:55:46] elukey: I'm assuming we're gonna keep wiki or webHost, and possibly some userAgent info? [08:56:41] elukey: or maybe event_db ? [08:56:58] joal: I don't think so, those are not in https://turnilo.wikimedia.org/#banner_activity_minutely right ? [08:57:58] elukey: true, but I'm assuming it's a mistake [08:58:17] project is project-family for us - and db has the lang part [08:59:16] you know that I am ignorant about this part, I am only reporting what I am seeing in the current data that we have to replicate :) [08:59:24] :) [09:00:42] elukey: I'm gonna make the list of what seems relevant in the event, and we'll discuss with them I assume [09:01:26] sure [09:01:44] if more complex fields are needed (like maps etc..) then a spark job makes sense [09:01:59] I am reluctant to maintain a job if not needed [09:02:08] the last one (with tranquillity) was a bit of a pain [09:02:13] elukey: I think we should be ok [09:02:46] "don't worry, about a thing, everything is gonna be all right.." [09:04:20] :D [09:14:37] druid private restart done [09:14:41] moving to druid public [09:27:44] so I am reading 0.13 release notes (still WIP) https://github.com/apache/incubator-druid/issues/6442 and they seem to be in the direction of making zookeeper optional [09:27:53] looks good :) [09:33:28] elukey: https://turnilo.wikimedia.org/#test_kafka_event_centralnoticeimpression [09:35:10] woooooooooooooooooooooooooooooooooooowwwwwwwwwwwwwwwwwwwwwwwwww [09:35:52] elukey: monitoring currently to see how the thing behaves in terms of tasks and all [09:35:53] even with UA map? [09:36:17] elukey: UA is json, I use a flattener spec, so I get what I want [09:36:27] very nice [09:36:41] elukey: there is redundancy in fields, but I put almost everything [09:36:57] for instance: wiki and event_db [09:37:26] and it only needs an indexation job right? [09:37:31] I put everything cause I think it could help sometimes, for debugging purposes for instance (recvFrom for instance) [09:38:07] elukey: supervision job - This means Druid manages by itself getting stuff from kafka (I think it uses tranquility behind the scenes) [09:38:41] yeah but a supervision job is basically an indexation job with small batches no?
[09:39:11] well it depends what you mean by indexation job [09:40:17] elukey: what we usually call indexation job happens in hadoop, with data being read from HDFS [09:40:39] elukey: in supervisor case the job is a realtime-indexation task, reading from kafka [09:41:03] elukey: you can tunnel to overlord (druid1002:8090) and look at the tasks :) [09:41:17] what I mean is turning data into segments, that eventually will be handed off to historicals [09:41:37] elukey: overlord makes no difference between indexation tasks, but I think we should always make a difference between batch and realtime ones [09:41:50] elukey: historicals have some segments now [09:41:54] sure I'll keep it in mind [09:42:33] joal: yes IIUC they get the segments after the realtime indexation reaches a certain threshold [09:42:55] elukey: I configured realtime-tasks to last 10 minutes, that means we'll have segments of 10 minutes in historical, but it's no big deal as we shall overwrite them with Marcel's job [09:43:06] ack [09:43:23] elukey: segments are handed-off to historical once "finalized", meaning the task is done [09:43:46] elukey: I configured smaller time than 1h to try to prevent overflow tomorrow ;) [09:44:41] joal: the task is done after reaching the 10 minutes collection, then the segments are handed off to historicals [09:44:44] right? [09:45:07] correct ! [09:45:14] now I am wondering if we have metrics for the supervisor [09:45:17] or at least, that's my understanding :) [09:45:18] probably yes [09:45:58] so first one is [09:45:59] Event [{"feed":"metrics","timestamp":"2018-11-27T09:43:48.717Z","service":"druid/overlord","host":"druid1002.eqiad.wmnet:8090","version":"0.12.3","metric":"ingest/kafka/lag","value":2,"dataSource":"test_kafka_event_centralnoticeimpression"}] [09:49:27] probably the druid/peon metrics that are emitted now in the middlemanager-metrics.log [09:50:26] ah! [09:50:27] druid_realtime_ingest_events_processed_count{datasource="test_kafka_event_centralnoticeimpression"} 1667.0 [09:50:31] \o/ [09:52:15] \o/ :) [09:52:17] Awesome :) [09:53:28] https://grafana.wikimedia.org/dashboard/db/druid?refresh=1m&panelId=41&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fanalytics&var-cluster=druid_analytics&var-druid_datasource=All&from=now-3h&to=now [09:53:34] didn't even think about it [09:53:43] 10Analytics, 10Operations, 10Performance-Team, 10Traffic: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (10Gilles) [09:55:13] elukey: Nice :) [09:55:24] elukey: will kill and relaunch supervisor [10:00:26] elukey: would you mind restarting turnilo for me? I have a possibly successful test running :) [10:02:18] joal: you should be able to restart it with sudo etc.. [10:02:37] let's try to see if it works (I don't think that anybody did it after the new perms were deployed) [10:02:52] elukey: no prob ! I need help on the machine though [10:03:04] mmm your user is not on analytics-tool1002 [10:03:52] lemme check, I was convinced otherwise [10:03:54] weird [10:04:13] ahhhhhh [10:04:20] analytics-admins is not deployed to those hosts [10:04:21] tried to connect to analytics-tool1002 - no good yet [10:07:04] creating a patch now [10:09:54] joal: can you try now?
[10:09:58] sure [10:10:55] login ok, but can't sudo for sudo systemctl restart turnilo.service [10:11:03] I also tried sudo -u analytics - no chance [10:11:06] elukey: --^ [10:13:21] ah sorry, sudo -u turnilo systemctl restart turnilo [10:13:53] nope :( [10:14:06] not on analytics-tool1002? [10:14:17] The name org.freedesktop.PolicyKit1 was not provided by any .service files [10:15:27] ??? [10:15:45] oh yes I can repro [10:15:50] status does not return that [10:16:28] Right - status works for me as well [10:16:42] elukey: stop / stat? [10:16:46] elukey: stop / start? [10:16:49] or try-restart? [10:17:07] I think it is a perm issue [10:17:18] probably the sudoers rule is not the right one [10:17:34] :/ [10:17:54] because you have %analytics-admins ALL = (turnilo) NOPASSWD: ALL [10:18:09] and I thought that it was sufficient to restart a service [10:18:12] but maybe not [10:18:22] hm - /me is not sudoer-fluent :/ [10:19:10] joal: restarted turnilo in the meantime, so you are unblocked [10:19:13] really annoying :( [10:20:16] elukey: Have a look at https://turnilo.wikimedia.org/#test_kafka_event_centralnoticeimpression metrics [10:20:23] \o/ !!!! [10:20:33] * joal sings and dances [10:20:48] sure [10:21:36] elukey: http://druid.io/docs/latest/ingestion/transform-spec.html [10:21:38] what??? normalized count? [10:21:45] :D [10:21:49] wow!! [10:21:58] this is really awesome [10:22:06] * joal is happy to have read the docs again [10:22:59] joal: can you add it to the task so everybody can check it (and possibly also start looking at metrics) [10:23:18] yes elukey - Will do [10:25:58] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Return to real time banner impressions in Druid - https://phabricator.wikimedia.org/T203669 (10JAllemandou) I have launched a realtime job indexing values flowing in kafka. Data can be seen here (please notice the event normalized count metric :) : https://tu... [10:26:01] elukey: --^ [10:28:32] We should ping mforns as well on that - There is a "bucket extraction function" defined in http://druid.io/docs/latest/querying/dimensionspecs.html [10:28:36] super! [10:28:41] elukey: --^ as well :) [10:29:28] elukey: maybe there is not even the need to modify spark code for EL2Druid to get buckets, inverts and all [10:29:38] mforns: --^ [10:31:54] but it needs to be added to the indexing specs right? So probably Marcel will need to modify EL2Druid anyway [10:33:12] joal: I filed https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/475984/1/modules/admin/data/data.yaml for the sudoers rule [10:33:28] you guys need to be able to restart stuff in case of an emergency [10:35:47] the new stream is so awesome, great work! [10:35:48] !! [10:43:11] elukey: supervisor is really great :) [10:45:28] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Return to real time banner impressions in Druid - https://phabricator.wikimedia.org/T203669 (10JAllemandou) For reference, here is the request sent to druid for realtime ingestion: ` curl -L -X POST -H 'Content-Type: application/json' -d '{ "type": "kafka",... [10:59:20] heya team :] [10:59:41] joal, elukey reading [11:05:17] elukey, how did you manage to add the normalized count to turnilo?? [11:07:04] mforns: joal did it with http://druid.io/docs/latest/ingestion/transform-spec.html [11:07:17] but that is the real time ingestion from kafka [11:07:24] elukey, I see!
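The supervisor request quoted in the phab comment above is truncated. For illustration only, a minimal sketch of what a Druid 0.12 Kafka supervisor spec with a flattenSpec and a transformSpec might look like; the datasource and topic names are taken from the log, but the field names, the sample-rate expression, and the broker host are assumptions, not the actual submitted spec:

```json
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "test_kafka_event_centralnoticeimpression",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": { "column": "dt", "format": "auto" },
        "flattenSpec": {
          "fields": [
            { "type": "path", "name": "ua_browser_family", "expr": "$.userAgent.browser_family" }
          ]
        },
        "dimensionsSpec": {
          "dimensions": ["event_campaign", "event_banner", "event_project", "event_uselang"]
        }
      }
    },
    "transformSpec": {
      "transforms": [
        { "type": "expression", "name": "normalized_request", "expression": "1 / event_sampleRate" }
      ]
    },
    "metricsSpec": [
      { "type": "count", "name": "request_count" },
      { "type": "doubleSum", "name": "normalized_request_count", "fieldName": "normalized_request" }
    ],
    "granularitySpec": { "type": "uniform", "segmentGranularity": "HOUR", "queryGranularity": "MINUTE" }
  },
  "ioConfig": {
    "topic": "eventlogging_CentralNoticeImpression",
    "consumerProperties": { "bootstrap.servers": "kafka-jumbo1001.eqiad.wmnet:9092" },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT10M"
  }
}
```

The "taskDuration": "PT10M" matches the 10-minute realtime tasks joal describes earlier, and the transformSpec-derived metric is how a "normalized count" can be computed at ingestion time without a separate Spark streaming job.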
[11:07:27] so druid reading directly from the topic [11:12:43] joal, the bucket extraction function looks useful, but for the case of time measures, we've been using an "exponential" bucketing (similar to orders of magnitude), this seems not supported by the extraction function. However, the time buckets seem not to be of real value to analysts, so... [12:19:18] 10Analytics, 10Analytics-Cluster: Upgrade Hive to ≥ 2.0 - https://phabricator.wikimedia.org/T203498 (10elukey) Keeping the task updated - in https://issues.apache.org/jira/browse/BIGTOP-3074 the BigTop Apache distribution removed the oozie packaging since it seems not compatible (yet) with Hive 2.x. The CDH6 d... [12:31:10] 10Analytics, 10Analytics-Wikimetrics, 10Patch-For-Review, 10WorkType-Maintenance: flake8 errors on wikimetrics - https://phabricator.wikimedia.org/T210320 (10rafidaslam) Okay, no problem. Just a note, if we wanted to fix both `W504 line break after binary operator` and `W503 line break before binary opera... [12:41:59] 10Analytics, 10EventBus, 10Operations, 10WMF-JobQueue, and 5 others: Kafka eqiad.mediawiki.page-delete topic is empty - https://phabricator.wikimedia.org/T210451 (10mobrovac) The fix has been deployed, delete events should start flowing again, so resolving. Let's reopen the ticket if that does not occur. [12:42:06] 10Analytics, 10EventBus, 10Operations, 10WMF-JobQueue, and 5 others: Kafka eqiad.mediawiki.page-delete topic is empty - https://phabricator.wikimedia.org/T210451 (10mobrovac) 05Open>03Resolved a:03mobrovac [13:01:23] hallo [13:01:30] about https://gerrit.wikimedia.org/r/#/c/analytics/limn-language-data/+/475618/1/cx/config.yaml [13:01:46] (03CR) 10Amire80: Add a scheduled job for daily CX abuse filters statistics (032 comments) [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/475618 (https://phabricator.wikimedia.org/T189475) (owner: 10Amire80) [13:02:13] mforns' first comment is easy. [13:02:32] the second is about a line that I just copied from another config file, which milimetric had written :) [13:02:44] Maybe it can be simply removed. [13:06:47] (03CR) 10Amire80: Add a scheduled job for daily CX abuse filters statistics (031 comment) [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/475618 (https://phabricator.wikimedia.org/T189475) (owner: 10Amire80) [13:08:05] (03PS2) 10Amire80: Add a scheduled job for daily CX abuse filters statistics [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/475618 (https://phabricator.wikimedia.org/T189475) [13:30:17] hi aharoni! thanks for the fixes [13:32:27] I think having a lag specified (second comment) was a good idea, but 86400 seconds (1 day) was probably too much. I would use 1 hour, so -> lag: 3600 # wait 1 hour to compute last day [13:34:07] mforns: ack, I'll update the patch [13:34:24] aharoni, thanks, I also added the comment in the patch [13:34:40] (03CR) 10Mforns: Add a scheduled job for daily CX abuse filters statistics (031 comment) [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/475618 (https://phabricator.wikimedia.org/T189475) (owner: 10Amire80) [13:35:28] aharoni, I think we are able now to test this. Do you want to pair via Hangouts some time? [13:39:10] Yes, in the next few minutes. Let me just update the patch. [13:39:29] (Oh, and I have to reconnect to IRC, wait just a minute.)
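On the bucket extraction function mforns mentions above: per the linked Druid dimensionspecs docs, it only produces fixed-size buckets, which is why the "exponential" (order-of-magnitude) bucketing is not expressible with it. A sketch of how it might appear in a query dimensionSpec, with hypothetical dimension names:

```json
{
  "type": "extraction",
  "dimension": "event_timeToFirstByte",
  "outputName": "time_bucket",
  "extractionFn": { "type": "bucket", "size": 100, "offset": 0 }
}
```

With size 100, a value of 1234 would land in the bucket starting at 1200; there is no way to make the bucket width grow with the value.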
[13:40:35] back [13:47:35] um [13:47:37] weird [13:47:41] reconnecting again [14:03:13] (03PS3) 10Amire80: Add a scheduled job for daily CX abuse filters statistics [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/475618 (https://phabricator.wikimedia.org/T189475) [14:05:01] mforns: I updated the patch. Hangout whenever you're ready [14:34:04] mforns: beep [14:48:55] aharoni, sorry I was having a quick lunch, now back! [14:49:22] do you want to meet now? [15:01:10] mforns: now is good [15:01:21] aharoni, great [15:02:01] aharoni, To join the video meeting, click this link: https://meet.google.com/wey-upgj-qeg [15:02:01] Otherwise, to join by phone, dial +1 914-359-6313 and enter this PIN: 450 962 864# [15:02:24] wow, sorry for the noise, I thought it would paste the url only [15:10:04] (03PS1) 10Fdans: Expose offset and underestimate numbers in uniques [analytics/aqs] - 10https://gerrit.wikimedia.org/r/476033 (https://phabricator.wikimedia.org/T164201) [15:11:44] ottomata: o/ sorry i missed yer ping yesterday [15:13:07] phuedx: hiya np! [15:13:12] shall we look now? [15:13:34] ottomata: suresies! as long as it's alright with you [15:14:43] ottomata: actually, gimme 5-10 to get coffee and a snack [15:14:51] ya for sure! [15:14:59] ok phuedx i'm going to run home real quick then too, be back in about the same... [15:18:10] (03PS4) 10Amire80: Add a scheduled job for daily CX abuse filters statistics [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/475618 (https://phabricator.wikimedia.org/T189475) [15:20:50] ok here whenever [15:42:59] joal, yt? [15:43:44] ottomata: hey, sorry [15:43:49] hiya [15:44:16] so what are we looking for phuedx? [15:44:21] my eldest son is ill and he's just woken up :) [15:44:24] (only been half paying attention to that ticket) [15:44:27] oh ok! no worries! [15:44:33] ottomata: might be easier to jump in a hangout [15:44:33] i'm just starting my day so i'll be on for a while [15:44:43] we have a standup starting in 15 tho. [15:44:45] sure, now is ok then? [15:44:57] nah! he's laying on a sofa colouring in at the moment [15:44:59] sure [15:45:02] ok cool [15:45:05] come on into the batcave!~ [15:45:06] https://hangouts.google.com/hangouts/_/wikimedia.org/a-batcave [15:55:40] Hey ottomata - let's talk about the refinement-schema issue post standup if ok? [15:55:45] yuppers [15:55:55] joal: catch up on T209178 pre-standup? [15:55:55] T209178: Refactor Mediawiki-Database ingestion - https://phabricator.wikimedia.org/T209178 [15:56:01] i think i have a working fix, but i have some qs about what to do in some other array cases [15:56:09] yes milimetric ! OMW [15:56:42] bc-2 milimetric ? [16:01:14] ping joal milimetric mforns [16:01:18] standddupppp [16:01:21] sorry coming! [16:02:04] phuedx: heyyyaa https://logstash.wikimedia.org/app/kibana#/dashboard/default?_g=h@7bf0c26&_a=h@ecf0ee1 [16:02:06] i think its working? [16:02:11] ottomata: i see it working! [16:02:22] sweet! [16:03:01] thanks :) [16:03:25] ottomata: any recommendations for a "logstash person"? [16:05:05] hahhhh hmmmm [16:05:13] maybe godog (filippo)? [16:05:17] or at least he'd know who to ask i think [16:13:07] phuedx: godog and herron [16:13:18] moritzm: ty! [16:13:38] yw :-) [16:38:55] mforns: when you're ready :) [16:39:13] aharoni, in around 20 mins [16:39:15] is that ok [16:39:16] ?
[16:39:47] yes [16:54:25] (03PS2) 10Michael Große: Update metric's items and properties automatically [analytics/wmde/toolkit-analyzer] - 10https://gerrit.wikimedia.org/r/475807 (https://phabricator.wikimedia.org/T209399) [16:58:43] (03PS3) 10Michael Große: Update metric's items and properties automatically [analytics/wmde/toolkit-analyzer] - 10https://gerrit.wikimedia.org/r/475807 (https://phabricator.wikimedia.org/T209399) [17:00:24] (03CR) 10Michael Große: "This change is ready for review." [analytics/wmde/toolkit-analyzer] - 10https://gerrit.wikimedia.org/r/475807 (https://phabricator.wikimedia.org/T209399) (owner: 10Michael Große) [17:02:40] aharoni, just 3 more minutes [17:04:44] I'm here [17:05:27] aharoni, ok, me too [17:05:40] should we use the same hangouts link as before? [17:05:45] let me try... [17:06:21] aharoni, https://meet.google.com/wey-upgj-qeg [17:07:07] folks logging off to run a bit, my brain is fried after today's tests :D [17:07:15] * elukey off! [17:07:26] (will read later if anybody needs me!) [17:19:31] HaeB which analyst email list should I use? [17:19:43] produce-analytics or data-analysts ? [17:19:55] product-analytics* [17:20:21] ah found your email and answered myself, sorry for ping! [17:20:32] mforns: looks like you disconnected [17:24:02] 10Analytics, 10Analytics-Wikimetrics, 10Patch-For-Review, 10WorkType-Maintenance: flake8 errors on wikimetrics - https://phabricator.wikimedia.org/T210320 (10Nuria) We have noted this on the patches but this project is on life-support so you might not get timely responses to these patches or questions. Ple... [17:26:59] 10Analytics, 10Analytics-EventLogging, 10Patch-For-Review: Resurrect eventlogging_EventError logging in logstash - https://phabricator.wikimedia.org/T205437 (10Ottomata) https://logstash.wikimedia.org/goto/bda91f37481ae4970ee21e11810d49d3 [17:27:09] 10Analytics, 10Analytics-Wikimetrics, 10Patch-For-Review, 10WorkType-Maintenance: flake8 errors on wikimetrics - https://phabricator.wikimedia.org/T210320 (10rafidaslam) @Nuria no problem. This isn't a big issue anyway, this can be fixed very easily anytime. We only need the answer to this question: ` When I... [17:27:30] 10Analytics, 10Analytics-EventLogging, 10Patch-For-Review: Resurrect eventlogging_EventError logging in logstash - https://phabricator.wikimedia.org/T205437 (10phuedx) >>! In T205437#4778368, @Ottomata wrote: > https://logstash.wikimedia.org/goto/bda91f37481ae4970ee21e11810d49d3 https://logstash.wikimedi... [17:27:34] 10Analytics, 10Analytics-EventLogging, 10Patch-For-Review: Resurrect eventlogging_EventError logging in logstash - https://phabricator.wikimedia.org/T205437 (10phuedx) 05Open>03Resolved a:03phuedx Great to see this working! Thanks for all of your help @Ottomata and @fgiunchedi.
[17:30:52] 10Analytics, 10Analytics-Kanban: Refactor Sqoop, join actor and comment from analytics replicas - https://phabricator.wikimedia.org/T210522 (10Milimetric) p:05Triage>03High [17:59:21] (03PS5) 10Mforns: Add a scheduled job for daily CX abuse filters statistics [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/475618 (https://phabricator.wikimedia.org/T189475) (owner: 10Amire80) [18:00:38] (03PS6) 10Mforns: Add a scheduled job for daily CX abuse filters statistics [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/475618 (https://phabricator.wikimedia.org/T189475) (owner: 10Amire80) [18:04:50] 10Analytics, 10EventBus, 10Operations, 10WMF-JobQueue, and 6 others: Kafka eqiad.mediawiki.page-delete topic is empty - https://phabricator.wikimedia.org/T210451 (10Smalyshev) Yep, seeing the events in grafana now, so I think it's all good now. Thanks! [18:06:31] elukey, ottomata: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/475984/ can probably also be merged without SRE meeting approval, it's just some fine-tuning of existing permissions, if you need it earlier than next Monday, simply ping Mark or Faidon for a quick sign-off and merge away [18:06:31] 10Analytics, 10EventBus, 10Operations, 10WMF-JobQueue, and 6 others: Kafka eqiad.mediawiki.page-delete topic is empty - https://phabricator.wikimedia.org/T210451 (10Pchelolo) Do you need the events for the last month to be replayed? [18:08:02] (03PS7) 10Mforns: Add a scheduled job for daily CX abuse filters statistics [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/475618 (https://phabricator.wikimedia.org/T189475) (owner: 10Amire80) [18:10:35] (03CR) 10Mforns: [C: 031] "LGTM! And is tested. Please, feel free to merge." [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/475618 (https://phabricator.wikimedia.org/T189475) (owner: 10Amire80) [18:14:16] 10Analytics, 10EventBus, 10Operations, 10WMF-JobQueue, and 6 others: Kafka eqiad.mediawiki.page-delete topic is empty - https://phabricator.wikimedia.org/T210451 (10Smalyshev) @Pchelolo No I already updated the affected items manually. [18:18:16] (03CR) 10Amire80: [C: 032] Add a scheduled job for daily CX abuse filters statistics [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/475618 (https://phabricator.wikimedia.org/T189475) (owner: 10Amire80) [18:18:22] (03Merged) 10jenkins-bot: Add a scheduled job for daily CX abuse filters statistics [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/475618 (https://phabricator.wikimedia.org/T189475) (owner: 10Amire80) [18:26:32] hey, does anyone know how do i get access to piwik? i have wikitech login but it doesn't work. it works for turnilo [18:26:38] cc @elukey [18:27:31] nzr_: i did just find https://wikitech.wikimedia.org/wiki/Analytics/Systems/Piwik#Access [18:27:41] not sure if that is helpful, i assume you are in those groups [18:27:46] ? [18:27:50] maybe mfournier do you know more? [18:27:52] oops [18:27:54] sorry wrong ping [18:28:01] nuria: do you know? [18:40:44] nzr_: what's your wikitech/ldap username? [18:40:50] nirzar [18:40:55] ottomata: [18:41:27] ottomata: IIRC we had to create users on piwik itself no? [18:43:30] we definitely need some docs at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Piwik#Access [18:44:11] nzr_: are you getting denied at the http auth level?
[18:44:44] elukey: just verified, nzr is in the wmf ldap group [18:44:48] https://usercontent.irccloud-cdn.com/file/otYzkuRd/image.png [18:45:02] yeah but that gets him passing the LDAP auth, not the piwik login [18:45:30] right nzr_ ? You are able to login with ldap when the user/pass menu pops out, but then not in piwik itself [18:45:30] ah ok, so ya he needs an account created, i think I do too (or do I?) [18:45:39] elukey, can you do a quick review of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/476081/ ? It's super simple [18:45:39] ya elukey that's right based on screen shot he just sent [18:45:52] > right nzr_ ? You are able to login with ldap when the user/pass menu pops out, but then not in piwik itself [18:45:52] neither [18:45:56] mforns_: i can do that [18:45:58] ottomata: there's a piwik-wmf-admin in pwstore, I am now in the admin panel of piwik (sooo slow) [18:46:00] not even on the pop up [18:46:09] k! [18:46:13] nzr: there is a piwik user that was created for design [18:46:21] cc elukey [18:46:21] ah! [18:46:38] thanks otto!! [18:47:07] 10Quarry, 10Patch-For-Review: Quarry should refuse to save results that are way too large - https://phabricator.wikimedia.org/T188564 (10zhuyifei1999) p:05High>03Unbreak! This is getting [[https://tools.wmflabs.org/nagf/?project=quarry|ridiculously bad]] with queries like https://quarry.wmflabs.org/query/... [18:47:16] nzr_: what site are you trying to look at? [18:47:48] nuria: where are the pass for the users stored? [18:47:58] elukey: in piwik itself [18:48:06] wikimediafoundation.org [18:48:11] sure sure, I mean how can I retrieve them : [18:48:13] :) [18:49:02] the admin panel is horribly slow for me [18:49:12] elukey: ya, it always is [18:49:40] fyi: my same login works for turnilo [18:50:46] nzr_: right, cause every web app that we host will require ldap [18:53:24] nuria: try now the admin panel [18:53:28] should be way quicker [18:53:31] ya, better [18:53:39] (03PS2) 10Milimetric: [WIP] working on understanding and testing page history and quality [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468678 [18:53:48] Error in Matomo: curl_exec: Failed to connect to plugins.matomo.org port 443: Connection timed out. Hostname requested was: plugins.matomo.org [18:54:02] that was a 10s wait before proceeding with the rest [18:54:20] elukey: what did you do? [18:54:31] https://matomo.org/faq/troubleshooting/faq_16646/ [18:54:35] need to puppetize it now [18:55:13] 10Quarry, 10Patch-For-Review: Quarry should refuse to save results that are way too large - https://phabricator.wikimedia.org/T188564 (10zhuyifei1999) quarry-worker-02 was practically dead. [18:55:20] mmm still very slow [18:55:30] elukey: enable_internet_features = 0? [18:55:35] I need to tune it a bit more tomorrow [18:55:37] yeah!
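The fix elukey is applying here, per the Matomo FAQ linked above, is presumably a one-line change in Matomo's config file, which stops the admin panel from blocking on outbound calls to plugins.matomo.org. A sketch, with the file path assumed from a standard Matomo install:

```ini
; config/config.ini.php (path assumed; see the linked FAQ)
[General]
; Stop Matomo from calling out to plugins.matomo.org etc. on page load
enable_internet_features = 0
```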
[18:55:37] elukey: it is OK now really [18:58:14] nuria: really sorry I didn't notice it before :( [18:58:30] ottomata: Heya - Back from dinner [18:58:34] elukey: i did notice it and DID NOT SEARCH for thsi solution [18:58:36] *this [18:58:42] elukey: totally my bad [18:58:46] (03CR) 10jerkins-bot: [V: 04-1] [WIP] working on understanding and testing page history and quality [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/468678 (owner: 10Milimetric) [19:00:21] elukey: documented now: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Piwik#Access [19:00:40] wonderful thanks :) [19:01:50] I also disabled the marketplace, which seemed not needed [19:04:10] * elukey dinner :) [19:18:08] joal heya just started trying a bit ago [19:18:11] looking good [19:18:20] i think i can even 'merge' primitive map keys... maybe! [19:18:26] ottomata: I have added stuff in unit test, but the conversion seems to work ok [19:18:33] still trying though [19:18:35] awesome [19:33:59] ottomata: last test tells me that for structs in Maps, the struct must not change in terms of structure, but primitive types can [19:34:48] Meaning if you use a struct as key or value, inner-primitive types can be evolved, but not the structure of the inner objects [19:36:26] (03PS1) 10Joal: Update the unit-test for Dataframe conversion [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/476093 (https://phabricator.wikimedia.org/T210465) [19:36:30] ottomata: --^ [19:36:42] oh interesting [19:37:05] hm [19:37:08] ottomata: My claim is not documented as unit-test, just the working cases [19:37:14] right [19:37:16] interesting [19:37:24] Do you want me to add a test? [19:37:25] that's a bit harder then, its another special case [19:37:29] right [19:38:07] ottomata: I actually realize I have not tested struct-change for arrays [19:38:22] doing now [19:38:41] failed ! [19:38:43] Similar issue [19:39:07] same thing as map values? [19:39:11] wait [19:39:15] same [19:39:27] oh hm but that's only if a cast happens, right? [19:39:30] cannot cast array<struct<…>> to array<struct<…>> [19:39:32] if we alter the table beforehand via merge [19:39:36] yes, that makes sense [19:39:42] but, if we alter the table before [19:39:44] it won't have to cast [19:39:48] because datatypes will be the same [19:39:56] should be the same for map values then, ya? [19:41:00] ottomata: Concern is: if the dataframe has no example of "error" subobject for a given hour, cast is needed no? [19:43:40] i don't think so [19:43:45] ok [19:43:47] because, the table will be altered before the convert happens [19:44:23] ottomata: I see that case - What about possibly incomplete data? [19:44:26] in DataFrameToHive [19:44:30] first [19:44:34] prepareHiveTable [19:44:37] which does schema merging [19:44:42] now hive table has schema with error field [19:44:49] then [19:45:10] ottomata: the error field will only be present if there are errors - If none occur during an hour, no data has the error field, it needs to be added --> cast?
[19:45:19] get compatible schema between hive table (as is after alter) and the incoming schema (another merge) [19:45:24] and convert to compatible schema [19:45:33] hMMmMMMm [19:45:51] ah [19:45:53] HMM [19:46:04] ok i see a new case [19:46:06] table has error [19:46:09] but df doesn't [19:46:11] inputDf [19:46:15] right [19:46:21] right huh [19:46:34] in that case scores elementType will not match [19:46:38] table will have struct with error field [19:46:40] inputDf will not [19:46:41] also ottomata, our limitation means no change in scores array :( [19:46:43] which will cause cast [19:47:02] and cast always fails if struct changes? [19:47:07] even if casting to smaller schema? [19:47:17] My tests say so :( [19:47:24] rats [19:47:36] heh, this is why we need to create tables based on JSONSchemas!!!! [19:47:42] one day one day! [19:47:46] wow ! [19:47:53] I might have a way [19:47:57] testing ! [19:47:59] oh? [19:48:17] nope actually :( [19:50:40] no good? [19:50:57] nope, I wanted to cast inside the array, but it doesn't work [19:51:59] 10Analytics, 10Operations, 10Performance-Team, 10Traffic: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (10Krinkle) See also T194814, which this task could resolve. > x-analytics As I understand this, this field mainly exists to transmit data... [19:52:02] ah hm [19:52:06] hm hm hm [19:52:20] is the merge code i'm working on now not worth implementing then? [19:53:15] ottomata: so far I think it's not worth it [19:53:36] ottomata: I can't think of a good solution [19:54:26] is Refine not going to work on arrays then? [19:54:30] arrays of struct? [19:54:41] we can get petr to always set error to empty? [19:55:02] ottomata: it will as long as the arrays of structs never change struct, which is a pretty severe limitation [19:55:16] right, but it will def happen with this revision-score case [19:55:20] always having error empty will work [19:55:26] as error will usually not be present [19:55:34] i guess i still need code to do merge [19:55:37] to pick up table alters, right? [19:55:46] it will work in the case when adding fields [19:55:57] just won't work in the case where input dataset is missing fields [19:56:43] ...joal...shouldn't convertToSchema just put nulls in? [19:56:56] ottomata: We should test altering a table of array<struct> to a new struct - I'm not sure how the thing behaves in backward compatibility! [19:57:21] i think it worked, i have a hive table in my db hang on... [19:57:45] In my view, if the inner type of an array or map changes, well this is a new field actually [19:58:26] :( [20:01:32] ottomata: seems to be a known issue: http://mail-archives.apache.org/mod_mbox/spark-user/201701.mbox/%3C77c83e66-fcd2-cb80-fa8e-74b00a089376@mixmax.com%3E [20:02:28] hm [20:03:08] joal [20:03:08] alter table mediawiki_revision_score01 change column scores scores [20:03:08] > array<struct<…,probability:array<…>,error:struct<…>>> [20:03:08] > ; [20:03:08] OK [20:03:30] ok - and we can still access old data? [20:05:17] yes [20:05:21] "error":null [20:05:35] great! [20:06:22] Ok, so we should expand schemas, and make sure data is present - That's a shame though :( [20:07:10] yeah.... [20:07:13] and actually, we should make sure fields are present even if null, not that data is present [20:07:22] yes [20:07:24] mwarf :( [20:07:28] joal wondering... can we force the nulls in the data? [20:07:36] in convertToSchema ? [20:09:00] wait no joal i'm confused again [20:09:02] ottomata: I don't know how - How would you state: CAST(array<…> TO array<…>)?
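To make the failure mode concrete, a spark-shell (Spark 2.x) sketch of the cast being discussed, using a hypothetical minimal schema rather than the real revision-score one; the same limitation applies to structs used as map keys or values:

```scala
import org.apache.spark.sql.functions.col

// Source data: an array of structs with a single field f1
val df = spark.sql("SELECT array(named_struct('f1', 'hi')) AS a1")

// Adding a field to the element struct forces a cast, which fails at analysis time:
df.select(col("a1").cast("array<struct<f1:string,f2:string>>"))
// org.apache.spark.sql.AnalysisException: ... cannot cast
//   array<struct<f1:string>> to array<struct<f1:string,f2:string>>
```

Spark's Cast only accepts struct-to-struct casts with the same number of fields, so any element-type change inside an array or map cannot be patched with a simple cast.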
[20:09:15] in the case of missing error struct in scores array in new input Df [20:09:44] ahh right sorry ok back on board [20:09:46] :p [20:09:47] nm [20:10:51] I think the only way to cast here is to do it manually, meaning back to RDDs of rows and schema generation [20:10:58] yeah [20:11:00] :/ [20:11:01] And this is very error-prone [20:11:04] yeah [20:11:05] ok [20:11:07] let's not do it [20:11:10] We used CAST to prevent having to do that [20:11:15] right [20:11:35] ottomata: Couldn't we send one score event per model? [20:11:45] joal? [20:11:59] oh instead of an array [20:12:03] instead of an array of score for a rev [20:12:07] i think we already went down that road and decided not to, no? [20:12:20] right - Not easily usable - Very true [20:12:43] q: do you think it is worth having the complex merge code to alter schema? [20:12:53] Maaan - Only viable solution here would be to go back to your initial idea of having an explicit schema, but I hate that as well :( [20:12:57] i'm close to having that work [20:13:13] but, we'll still need to get the events to always have error: null [20:13:14] ? [20:13:29] or [20:13:32] i could just manually alter the table [20:13:33] ottomata: It's good to have it, to make alter table work - And we'll need explicit data [20:13:37] and assume that events always were like that [20:13:45] and not support complex array type changes [20:13:56] i guess if i make it work [20:13:58] ottomata: Could be a good trade-off [20:13:59] adding a field will work [20:14:05] as long as you always include it in future data [20:14:08] it'll be a required field [20:14:08] correct [20:14:16] ok, will keep working on this then [20:14:22] will post on ticket too [20:14:29] to get petr to set data [20:14:31] Actually ottomata - Any field in an array of struct is mandatory [20:14:40] Even if set from the beginning [20:14:57] because not having it would imply a cast [20:15:41] And to be fair, the field is not mandatory on every row, but it's mandatory on at least one row of the parsed data [20:15:45] oh [20:15:46] 10Analytics, 10Analytics-Kanban: Update sqoop to work with the new schema - https://phabricator.wikimedia.org/T210541 (10Milimetric) p:05Triage>03High [20:16:02] That means the double-parsing trick could work here possibly [20:16:06] ottomata: --^ [20:16:39] 10Analytics, 10Analytics-Kanban: Update datasets definitions and oozie jobs - https://phabricator.wikimedia.org/T210542 (10Milimetric) p:05Triage>03High [20:16:45] oh true [20:16:47] but joal hm [20:16:50] what if there isn't a cast here [20:16:52] what happens? [20:16:58] ? [20:17:08] I don't understand "there isn't a cast" [20:17:14] if (srcSchema(idx).dataType == dstField.dataType) { [20:17:14] namedValue(prefixedFieldName) [20:17:23] 10Analytics, 10Analytics-Kanban: New refinery-source job to join labsdb with actor and comment - https://phabricator.wikimedia.org/T210543 (10Milimetric) p:05Triage>03High [20:17:37] ottomata: well it works :) [20:17:38] if dataType.isArrayOfStruct [20:17:42] namedValue(prefixedFieldName) [20:17:56] just assume that the struct is insertable as is [20:17:57] without casting [20:18:13] ottomata: equality of datatypes inspects inner-types [20:18:34] ya but, maybe it isn't necessary, going to try... [20:18:40] I don't get it :) [20:18:47] (03PS1) 10Milimetric: Update sqoop selects for new mediawiki schema [analytics/refinery] - 10https://gerrit.wikimedia.org/r/476100 (https://phabricator.wikimedia.org/T210541) [20:19:04] Oh !
Not even checking datatypes when pushing arrays of struct [20:19:14] (03PS1) 10Mforns: Correct credentials file in cx config file [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/476101 (https://phabricator.wikimedia.org/T189475) [20:19:19] wow - this shouldn't work :) [20:19:26] joseph: made this for you: https://phabricator.wikimedia.org/T210542 [20:19:46] I'm done with sqoop, will add you to the review and start the refinery-source job [20:20:05] (03CR) 10Mforns: [V: 032 C: 032] "Merging to unbreak job" [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/476101 (https://phabricator.wikimedia.org/T189475) (owner: 10Mforns) [20:20:06] Thanks yo milimetric :) [20:20:23] (03Merged) 10jenkins-bot: Correct credentials file in cx config file [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/476101 (https://phabricator.wikimedia.org/T189475) (owner: 10Mforns) [20:20:33] milimetric: currently trying to help Andrew with spark-refine issue - I probably won't get to it before tomorrow evening, after kids [20:21:22] hah, how the heck do you insert into an array in hive..? [20:21:50] I think the syntax is : INSERT INTO TABLE blah array(1, 2 ,3) as bloh; [20:21:58] ah ha [20:21:58] milimetric: no problem for me, if I finish early I'll take it and we can pair on it [20:21:58] array(NAMED_STRUCT('f1', "hi")) as a1; [20:22:11] pff :( [20:22:14] nopers [20:22:14] FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'arr2': Cannot convert column 0 from array<struct<f1:string>> to array<struct<f1:string,f2:string>>. [20:22:18] Thanks again milimetric [20:22:40] makes sense ottomata [20:22:54] ottomata: Do we try the double-reading trick? [20:22:54] was hoping it'd be smart enough to figure that out [20:23:02] joal then we have to change everything, no? [20:23:11] or, could we catch that at the DataFrameToHive level as a special case? [20:23:14] ottomata: I need to think about it again [20:23:33] we'd have to say [20:23:50] if compatibleSchema contains complex array or map type change...do double read [20:23:52] ottomata: My idea would be to catch the thing when checking the types [20:23:56] yup [20:23:57] ah but we can't do double read in DataFrameToHive [20:24:03] unless we provide a string path to read again [20:24:10] we changed it so it works with any generic dataframe [20:24:46] ottomata: Could be a special trick of partitioned-dataFrame? readwithSchema ? [20:25:48] joal this would be a problem for e.g. eventlogging -> eventlogging sanitized too, right? [20:25:56] it does schema migrations there too [20:27:31] hm - wouldn't sanitization nullify anything not said to be kept (nullify, not drop) [20:27:34] ? [20:30:15] ottomata: About the double-reading trick: we could do it in refine, if an error occurs in DataFrameToHive, but it's not nice [20:31:03] joal for data ya, but it still needs to alter the schema [20:31:08] DataFrameToHive should support it [20:31:09] yar ok [20:31:20] ottomata: Given Spark or Hive don't let us change structs in arrays, we need to make things fail if changes happen [20:31:27] i really don't like double read, not even just for the performance hit [20:31:33] just because it makes the whole thing harder to use [20:31:40] Yeah, I agree [20:31:48] it will fail, right? [20:31:52] you mean, we should fail more nicely?
[20:31:58] As said just now - we could just say: no changes in struct in arrays [20:32:06] well, it will work though [20:32:11] in the case where you add a field and always populate it [20:32:32] actually joal. [20:32:32] in the case ALL fields are always populated at least once per refined dataframe [20:32:36] it's worse than that [20:32:37] right [20:32:38] per hour [20:32:57] which means really all struct fields inside of arrays or maps must always be set. [20:33:05] yes [20:33:13] we could make them not nullable? [20:33:29] hm, no, i think that wouldn't matter [20:33:31] for spark I don't think it changes anything [20:33:33] yeah [20:35:08] So it means always populate fields in array, or double-read [20:36:03] ottomata: Should we have another "refine" job doing double-read that we try in case of failure of simple-read? [20:36:06] It's ugy :( [20:36:13] s/ugy/ugly [20:37:03] yuck [20:37:05] hmmm [20:38:54] ottomata: I'm gonna stop for tonight, except if you need me to continue to brainstorm [20:40:26] trying things [20:42:00] joal strange error made me think [20:42:03] i think not good [20:42:03] but [20:42:04] 10Analytics, 10ORES, 10Scoring-platform-team: Choose HDFS paths and partitioning for ORES scores - https://phabricator.wikimedia.org/T209731 (10JAllemandou) Another comment about folders that I hadn't thought before having read your update in the description: I actually think that the chosen is not the most... [20:42:04] i just tried [20:42:10] insert into table arr2 select array(NAMED_STRUCT('f1', "hi", 'f2', null)) as a1; [20:42:15] and got [20:42:24] Cannot convert column 0 from array<struct<f1:string,f2:void>> to array<struct<f1:string,f2:string>> [20:42:26] void ?? [20:42:48] CAST(NULL AS STRING) [20:44:16] could we use that to get missing fields in df as null? [20:44:21] when converting? [20:44:45] ottomata: We'd need to explode the array, no? [20:44:59] hmmmm [20:45:03] Here your array only has a single row [20:45:11] a single element sorry [20:45:33] We'd be willing to apply what you're saying to every element of the array - Meaning explode then recombine [20:46:02] yeah hm, hard to do in sql [20:46:38] ottomata: very expensive at recombine stage: explode is relatively easy, but to recombine you need to group by every other field [20:46:58] And I say very expensive, well it's a big group by :) [20:47:08] Doable though, but complex [20:47:41] And must involve an explode-recombine stage for every array to patch [20:48:10] right [20:48:11] ok [20:48:27] ok joal and [20:48:32] I really think the only viable solution here if we want so is double reading :( [20:48:34] if error is always null, it will be fine [20:48:41] yuuuuck i say! [20:48:42] :) [20:48:54] not error only, the other fields as well :) [20:49:09] 10Analytics, 10Patch-For-Review: Refinery Spark HiveExtensions schema merge should support merging of arrays with struct elements - https://phabricator.wikimedia.org/T210465 (10Ottomata) Hmm ok this is complicated. Complex type changes are always hard, but it seems they are extra hard when they are complex in... [20:49:14] every field must be populated at least once :) [20:49:18] ah yes [20:49:25] when there is error, the other fields need to be null [20:49:29] For the json schema to be correct [20:49:36] actually, no [20:49:45] they can be something, doesn't matter [20:50:10] just probability [20:50:13] right? [20:50:19] oh null is not valid [20:50:20] yargh [20:50:23] i think [20:50:47] really? We don't accept nulls?
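Following the CAST(NULL AS STRING) hint above, the Hive-side insert should presumably go through once the null is explicitly typed, so the inferred struct matches the table's element type (same hypothetical arr2 table as in the earlier messages):

```sql
-- fails: 'f2' is inferred as void, so the array element types don't match
INSERT INTO TABLE arr2 SELECT array(NAMED_STRUCT('f1', 'hi', 'f2', null)) AS a1;

-- assumed fix: type the null explicitly so the element type is struct<f1:string,f2:string>
INSERT INTO TABLE arr2
SELECT array(NAMED_STRUCT('f1', 'hi', 'f2', CAST(NULL AS STRING))) AS a1;
```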
[20:50:51] will check [20:50:53] wow then there is an issue :( [20:50:55] i think we'd need to make the type [20:50:57] no we can make it [20:51:14] will check though [20:51:15] might be wrong [20:51:42] nope [20:51:42] Invalid type. Expected String but got Null. [20:51:46] ottomata: so, for your struct in array: every field of that struct needs to be populated at least once (even with null value) for a given hour [20:51:57] crap [20:52:16] it can be empty object? [20:52:18] hm [20:52:31] "error": {} validates, because fields inside are not required [20:52:53] can we handle that [20:52:54] ? [20:52:55] no [20:53:06] and you use: "error": { "message": "blah"} for real usage [20:53:07] because the inferred elementType will be different [20:53:19] it will try to cast [20:53:21] true [20:53:28] gross [20:53:29] so [20:53:31] we'd need [20:53:41] error: { message: "", type: "" } [20:53:45] very gross [20:53:54] error: "" [20:54:08] no its an object [20:54:16] we can make the type be anyOf [20:54:20] with null i think [20:54:22] let me see [20:54:25] the schema gets nasty then... [20:54:35] Oh my [20:55:07] yeah now instead of [20:55:20] error: [20:55:20] type: object [20:55:20] properties: { } [20:55:21] it becomes [20:56:21] hmm maybe [20:56:22] looking [20:56:49] 10Analytics, 10Operations, 10Performance-Team, 10Traffic: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (10TheDJ) what about ?debug=true ? We already vary on that right ? might as well vary which set of headers is let true... [20:58:15] AH [20:58:17] its not so bad joal [20:58:35] just need to add [20:58:36] "type": ["object", "null"], [20:58:38] that's the only change [20:58:39] don't need anyOf [20:58:43] ... i think that's ok. [20:59:11] ottomata: and error must be set in messages - We add some extra payload :( [20:59:21] ya but that's not a big deal [20:59:37] ok - Let's test before saying we won ;) [21:00:41] ook how to test....... [21:00:52] 10Analytics, 10Patch-For-Review: Refinery Spark HiveExtensions schema merge should support merging of arrays with struct elements - https://phabricator.wikimedia.org/T210465 (10Ottomata) Ah, rats, and in order to even support setting those to null, we need to change the schema to allow nulls. ` error:... [21:01:01] actually joal, i'm going to finish implementing the merge thing [21:01:08] we can sync up again tomorrow [21:01:21] need to talk to petr too now [21:01:35] ottomata: ok - Will drop then :) Tomorrow is kids day, I'll be there at standup and later [21:01:40] k cool, ttyt [21:01:42] thanks for the help [21:01:52] np, sorry for the mess :( [21:02:07] completely unforeseen [21:03:14] so random problem, it seems when i upgraded pyspark jobs from 1.6 to 2.3.1 i was able to remove SPARK_HOME from the oozie workflow.xml and things worked. It looks like since the cdh5.15 upgrade on the 8th i have to put SPARK_HOME back in for the jobs to start [21:03:23] i've put it back in, so no big deal, but curious [21:10:02] 10Quarry, 10Patch-For-Review: Quarry should refuse to save results that are way too large - https://phabricator.wikimedia.org/T188564 (10Halfak) Yikes! I wonder if we could use the query optimization output to decide to not even start some queries. Does quarry use celery timeouts to kill queries? I've foun...
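Putting the schema discussion above together, the change being settled on might look like this for the error field; a sketch, with the property names taken from the conversation rather than the merged schema:

```json
"error": {
  "type": ["object", "null"],
  "properties": {
    "message": { "type": "string" },
    "type": { "type": "string" }
  }
}
```

Allowing "null" as a type means producers can always emit the error field (as null when nothing went wrong), so every struct field inside the scores array is present at least once per refined hour and no array-of-struct cast is ever needed.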
[21:24:28] 10Quarry, 10Patch-For-Review: Quarry should refuse to save results that are way too large - https://phabricator.wikimedia.org/T188564 (10zhuyifei1999) It currently only kills queries that mariadb knows that has been executing on the database for longer than 30 mins, how long it takes to store the query results... [21:54:08] 10Quarry, 10Patch-For-Review: Quarry should refuse to save results that are way too large - https://phabricator.wikimedia.org/T188564 (10zhuyifei1999) p:05Unbreak!>03High (Lowered because the offending processes have been killed) [21:54:32] ebernhardson: i truly have no idea why that might be but we found we had to provide some "older" jars to spark to deal with some hive context workarounds we were doing [21:54:57] ebernhardson: if you are using hivecontext in any of your spark jobs that might be related. [21:56:52] hmm, i think we always use HiveContext so possibly. [22:01:20] nuria: that will be unrelated for sure [22:01:37] that's only if you are using spark 2 and instantiating a Hive JDBC connection directly from within the executor [22:01:45] rather than using the spark.sql API [22:01:54] which really no sane person should do (we are insane) [22:01:56] ottomata: because our usage of hive context and the extra jars is not something that ebernhardson does? or cause spark_home would not include the jars we added? [22:06:18] because its not something he does [22:06:25] dunno why SPARK_HOME is needed now tho [22:06:54] nuria: we are manually including hive jars for our very specific case [22:07:00] where we need to alter tables from a spark job [22:07:09] and we are only doing so to work around a bug [22:07:27] ottomata: right, right [23:46:21] 10Analytics, 10ORES, 10Scoring-platform-team: Choose HDFS paths and partitioning for ORES scores - https://phabricator.wikimedia.org/T209731 (10awight) >>! In T209731#4779180, @JAllemandou wrote: > Another comment about folders that I hadn't thought before having read your update in the description: I actual... [23:52:22] (03PS1) 10Ottomata: [WIP] HiveExtensions StructType .merge supports Arrays and Maps with complex types [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/476179 (https://phabricator.wikimedia.org/T210465) [23:55:36] (03CR) 10jerkins-bot: [V: 04-1] [WIP] HiveExtensions StructType .merge supports Arrays and Maps with complex types [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/476179 (https://phabricator.wikimedia.org/T210465) (owner: 10Ottomata)
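The WIP patch above is about making the HiveExtensions StructType merge recurse into complex element types so that table alters pick up new struct fields inside arrays and maps. A simplified sketch of that idea, not the actual refinery-source implementation (which also handles primitive type widening and field-name normalization):

```scala
import org.apache.spark.sql.types._

// Recursively merge two Spark DataTypes so that struct fields added inside
// arrays and maps show up in the merged (table) schema.
def mergeType(a: DataType, b: DataType): DataType = (a, b) match {
  case (ArrayType(ae, n1), ArrayType(be, n2)) =>
    ArrayType(mergeType(ae, be), n1 || n2)
  case (MapType(ak, av, n1), MapType(bk, bv, n2)) =>
    MapType(mergeType(ak, bk), mergeType(av, bv), n1 || n2)
  case (sa: StructType, sb: StructType) =>
    mergeStructs(sa, sb)
  // Primitive widening elided in this sketch: keep the existing (table) type
  case _ => a
}

def mergeStructs(a: StructType, b: StructType): StructType = {
  // Fields present in both sides are merged recursively; table fields win otherwise
  val merged = a.fields.map { f =>
    b.fields.find(_.name == f.name)
      .map(bf => f.copy(dataType = mergeType(f.dataType, bf.dataType)))
      .getOrElse(f)
  }
  // Fields only in the incoming schema are appended
  val aNames = a.fieldNames.toSet
  val added = b.fields.filterNot(f => aNames.contains(f.name))
  StructType(merged ++ added)
}
```

As the conversation above makes clear, a merge like this only fixes the ALTER TABLE side; the incoming data still has to populate every struct field inside arrays and maps at least once per refined hour, because Spark cannot cast between differing array-of-struct element types.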