[00:12:39] (PS1) Milimetric: Update changelog.md for v0.0.136 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/626518
[00:12:59] (CR) Milimetric: [C: +2] Update changelog.md for v0.0.136 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/626518 (owner: Milimetric)
[00:13:27] (CR) Milimetric: [V: +2 C: +2] Update changelog.md for v0.0.136 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/626518 (owner: Milimetric)
[00:13:44] Starting build #59 for job analytics-refinery-maven-release-docker
[00:22:58] Project analytics-refinery-maven-release-docker build #59: SUCCESS in 9 min 14 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/59/
[00:24:55] Starting build #26 for job analytics-refinery-update-jars-docker
[00:25:11] (PS1) Maven-release-user: Add refinery-source jars for v0.0.136 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/626520
[00:25:12] Project analytics-refinery-update-jars-docker build #26: SUCCESS in 16 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/26/
[00:25:48] (CR) Milimetric: [V: +2 C: +2] Add refinery-source jars for v0.0.136 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/626520 (owner: Maven-release-user)
[00:40:16] (CR) Milimetric: "hm, the deploy etherpad comment says to restart only a couple of the coordinators, but it looks like both hourly and daily bundles changed" [analytics/refinery] - https://gerrit.wikimedia.org/r/623456 (https://phabricator.wikimedia.org/T257691) (owner: Nuria)
[00:46:53] !log deployed refinery source 0.0.136, refinery, and synced to HDFS
[00:46:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[01:20:48] (PS1) Milimetric: Fix hql for loading editors_bycountry [analytics/refinery] - https://gerrit.wikimedia.org/r/626524
[01:22:57] (CR) Milimetric: [V: +2 C: +2] "(just FYI 
Jo)" [analytics/refinery] - https://gerrit.wikimedia.org/r/626524 (owner: Milimetric)
[01:32:51] !log deployed small fix for hql of editors_bycountry load job
[01:32:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:14:51] (CR) Joal: "Oh sorry about that :( I tested without inserting obviously." [analytics/refinery] - https://gerrit.wikimedia.org/r/626524 (owner: Milimetric)
[06:15:10] Good morning team - I'm AFK for 2h for a doctor appointment
[06:27:03] o/
[06:53:49] Analytics-Clusters, Discovery, Discovery-Search (Current work), Patch-For-Review: mjolnir-kafka-msearch-daemon dropping produced messages after move to search-loader[12]001 - https://phabricator.wikimedia.org/T260305 (elukey) >>! In T260305#6449812, @elukey wrote: > I noticed that kafka-python==1...
[07:31:29] Analytics-Radar, Performance-Team, MW-1.36-notes (1.36.0-wmf.8; 2020-09-08): Invalid navigation timing events - https://phabricator.wikimedia.org/T254606 (Gilles) Open→Resolved
[07:33:17] joal: "High Performance Computing with Apache Spark and Parquet on Mission Critical Tasks" -- keynote by Edmon Begoli, Director of Scalable Protected Data Facilities at Oak Ridge National Laboratory
[07:44:42] just registered to apachecon (+donation)
[07:46:46] TIL https://www.datastax.com/products/datastax-astra
[08:14:31] I am doing an experiment in test for https://gerrit.wikimedia.org/r/c/operations/puppet/+/626595
[08:14:35] to enforce the oozie admin list
[08:18:46] elukey: registered to ApacheCon as well :)
[08:25:54] joal: :) did you see my patch aboce?
[08:25:57] *above
[08:26:07] looking
[08:28:55] elukey: if it works that'd be great!!
[08:29:56] so I just restarted one job for 'analytics' in test
[08:30:44] info in https://oozie.apache.org/docs/4.2.0/AG_Install.html#User_Authorization_Configuration
[08:31:51] and I can also kill it
[08:32:09] (from hue I mean)
[08:32:35] now if I remove myself from the adminlist I shouldn't be able
[08:35:39] E0509: User [elukey] not authorized for Coord job [0000003-200811104837120-oozie-oozi-C]
[08:35:42] niceeee
[08:35:58] \o/
[08:37:02] the only use case that I have in mind is people running jobs as analytics-search/product/etc..
[08:37:47] there is a "Users have write access to jobs based on an Access Control List (list of users and groups)"
[08:38:59] but I am not sure how to configure it
[08:41:22] there seems to be something required to do on the job xml level
[08:41:50] http://oozie.apache.org/docs/5.2.0/AG_Install.html#Defining_Access_Control_Lists (this is 5.2's doc though)
[08:42:11] err properties
[08:42:37] group.name=etc..,etc.. and then the user will be matched via HDFS check
[08:42:53] that seems nice
[08:43:18] so all our jobs will be locked down, and people launching their own coords/bundles will have to add the group.name
[08:43:24] joal: how does it sound?
[08:43:26] (if it works)
[08:43:49] elukey: it sounds a lot better than having anyone being able to do whatever with the jobs :)
[08:44:37] elukey: I think the SLA errors of tonight are due to burst usage of the cluster - I wonder how we could get historical view of who uses what :S
[08:45:18] joal: but Marcel did re-run failed coords IIUC no?
[08:45:31] elukey: that's my understanding as well
[08:45:47] but I didn't get any email about those
[08:46:40] like, why webrequest-load-coord-text 17UTC failed?
[08:49:25] I am not sure in hue how to find the failed coord to check logs etc.
[08:51:14] mforns: o/ when you are online, I have question
[08:51:42] * elukey coffee
[08:52:28] elukey: I found some info
[09:17:52] "Good" morning from me and my massive headache :-S
[09:23:05] klausman: My best wishes to you, and not your headache :S
[09:31:41] Thanks. I have tracked down some Ibuprofen. Hopefully, it'll help soon
[09:53:07] klausman: rest and take it easy!
[10:37:16] Analytics-Clusters: Review and improve Oozie authorization permissions - https://phabricator.wikimedia.org/T262660 (elukey)
[10:37:28] there you go --^
[10:39:46] going afk for lunch! bb in ~2h
[11:01:08] * addshore needs to add this to some docs....
[11:01:20] What is the best way to see what snapshots are available in wmf.wikidata_entity ?
[11:09:46] addshore: show partitions wmf.wikidata_entity
[11:11:09] ty!
[11:11:20] :)
[11:24:36] joal: and what is the "fastest" way to run SHOW PARTITIONS wmf.wikidata_entity as a one off in a command line? "hive -e "SHOW PARTITIONS wmf.wikidata_entity"" ?
[11:24:53] addshore: seems correct :)
[11:25:38] addshore: another way is from hdfs: hdfs dfs -ls /wmf/data/wmf/wikidata/entity
[11:26:04] nice, that is slightly faster
[11:35:57] https://www.irccloud.com/pastebin/TA6jEQYZ/
[11:36:24] joal: ^^ any idea why my partition bit complains? :D Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
[11:36:34] I have a hunce you may tell me to stop using hive
[11:36:37] *hunch
[11:37:03] addshore: you need to give your inserted partition a value
[11:37:28] snapshot = snapshot? 
:/
[11:37:54] addshore: Or, if you wish the snapshot partition to be extracted from the snapshot-column of the read data, you need to put it as the last column of your selected data
[11:38:54] And you might need to set a specific param for hive to accept (dynamic partitioning is the name of what you're doing - adding a partition based on computed values) - It's off by default on our cluster cause it can generate trouble if plenty of partitions are created by mistake
[11:39:13] addshore: Our usual way to add single partition into a table is to manually set the value
[11:39:43] addshore: for instance: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/pageview/hourly/pageview_hourly.hql#L32
[11:40:38] addshore: makes sense?
[11:40:40] so, when selecting the value and the last value I get the same error :/
[11:40:41] https://www.irccloud.com/pastebin/rNPbYEBN/
[11:41:03] addshore: SET hive.exec.dynamic.partition.mode=nonstrict;
[11:41:09] it should work
[11:41:26] okay! :)
[11:41:32] and "PARTITION(snapshot = snapshot)"
[11:41:44] and last value selected being partition
[12:42:42] hi elukey I'm here :]
[12:47:25] mforns: hello! Already self answered my qs with Joseph, all good :)
[13:07:28] mforns: also let me know if what I wrote makes sense!
[13:14:07] elukey: related to the failed jobs? yes makes sense!
[13:14:47] yep!
[13:14:53] thanks :)
[13:23:53] joal: you there?
[13:23:58] I am elukey
[13:24:16] I found something interesting while debugging a problem with Hue
[13:24:23] ?
[13:24:26] if you have 5 mins
[13:24:28] not super urgent
[13:24:38] Da cave?
[13:24:47] in here is better, need to copy paste
[13:25:02] sure
[13:25:10] so the "Graph" panel of the webrequest_load workflow is not rendered
[13:25:17] (Even in our version of hue)
[13:25:48] with debug on, I see in the logs
[13:25:54] dashboard ERROR Graph data could not be generated from Workflow 0002679-200811104837120-oozie-oozi-W: Expecting , delimiter: line 1 column 1070 (char 1069)
[13:26:31] I dumped the variable that python tries to json.load
[13:26:35] and this is the issue
[13:26:35] MEH
[13:26:36] "subworkflow":{"app-path":"${replaceAll(wf:appPath(),"/[^/]*$","")}/check_sequence_statistics_workflow.xml"}}
[13:26:58] after "app-path": we open a ", but the others are not escaped
[13:27:18] I mean, the ones right after appPath
[13:27:50] weird :S
[13:27:59] if I do something like "subworkflow":{"app-path":"${replaceAll(wf:appPath(),\"/[^/]*$\",\"\")}/check_sequence_statistics_workflow.xml"} it renders
[13:28:09] and it makes sense in my opinion
[13:29:32] elukey: if the execution works - we can correct! It'll be great to see the graph!
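(Editor's note) The parsing failure described above can be reproduced outside Hue with Python's standard `json` module. This is a minimal sketch, assuming the fragment quoted in the log is wrapped in braces to form a complete JSON object; the variable names and the `parses` helper are illustrative and are not taken from Hue's code. It only shows which forms are valid JSON. Whether Oozie itself accepts the escaped or single-quoted EL expression is a separate question this sketch does not answer.

```python
import json

# Fragment as quoted in the log: the EL expression carries bare double
# quotes inside a JSON string value, so the string ends too early.
broken = '{"subworkflow":{"app-path":"${replaceAll(wf:appPath(),"/[^/]*$","")}/check_sequence_statistics_workflow.xml"}}'

# Same fragment with the inner double quotes backslash-escaped (raw string,
# so the backslashes reach the JSON parser unchanged).
escaped = r'{"subworkflow":{"app-path":"${replaceAll(wf:appPath(),\"/[^/]*$\",\"\")}/check_sequence_statistics_workflow.xml"}}'

# Same fragment with single quotes inside the EL expression instead.
single_quoted = '{"subworkflow":{"app-path":"${replaceAll(wf:appPath(),\'/[^/]*$\',\'\')}/check_sequence_statistics_workflow.xml"}}'


def parses(text):
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


for name, text in [("broken", broken), ("escaped", escaped), ("single_quoted", single_quoted)]:
    print(name, parses(text))
```

Only the first form fails to parse; the other two differ in what the decoded `app-path` string contains (double versus single quotes around the regex arguments).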
[13:30:08] joal: this is where I needed your opinion, not sure how to test it to have a good answer
[13:30:25] maybe in hadoop test I can change the file on hdfs and see how it goes
[13:30:37] elukey: I was about to suggest that
[13:30:45] elukey: I can't recall if we run stats on test
[13:31:10] we do but the limit is 100% :D
[13:31:24] something like "subworkflow":{"app-path":"${replaceAll(wf:appPath(),'/[^/]*$','')}/check_sequence_statistics_workflow.xml" is also less invasive
[13:33:52] elukey: test happens here: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/webrequest/load/workflow.xml#L267
[13:34:16] elukey: I have no idea if oozie likes escaping or single-quotes though :S
[13:34:38] joal: oozie doesn't like anything :D
[13:34:45] true dat
[13:35:36] testing
[13:38:21] No partition predicate found for Alias "x:wikidata_map_item_relations" Table "wikidata_map_item_pixels"
[13:38:27] joal: any ideas? :D
[13:38:34] https://www.irccloud.com/pastebin/8PuwIqhC/
[13:38:39] im a partition newbie
[13:39:03] addshore: you have to provide at least one partition value
[13:39:13] even if your table only has one
[13:39:30] for instance: SELECT * FROM wmf.webrequest; will fail
[13:39:33] so, this has to be in the sub query somewhere?
[13:39:47] oh wait, I need a where in the outer query for snapshot?
[13:39:49] for instance: SELECT * FROM wmf.webrequest where webrequest_source = 'text'; will succeed (too big however)
[13:41:36] addshore: the joins need to be restrained by snapshots I think
[13:41:51] hmmm okay *thinks and tries*
[13:42:58] JOIN addshore.wikidata_map_item_pixels a ON (a.id = x.fromId) AND a.snapshot=x.snapshot
[13:43:15] also fails :/
[13:45:19] https://www.irccloud.com/pastebin/WeMq42O7/
[13:45:38] addshore: line 16 - b.snapshot
[13:45:42] dammit!
[13:45:46] yes, that was it! 
:)
[13:53:26] (PS1) Elukey: oozie: Fix webrequest load workflow string [analytics/refinery] - https://gerrit.wikimedia.org/r/626691
[13:54:05] 720613501702297
[13:55:40] elukey: this--^ was the used number of bytes on the cluster for image I analyse :)
[13:55:50] * joal has aggregated sizes now :D
[13:56:12] wait sorry what is that number?
[13:56:49] elukey: 'hdfs dfs -du -s /' for the fs_image I have taken as an example
[13:57:00] ah ok!
[13:59:58] (PS2) Elukey: oozie: Fix webrequest load workflow string [analytics/refinery] - https://gerrit.wikimedia.org/r/626691
[14:39:53] no more coffee in my home! /o\
[14:39:57] what a mess
[14:40:06] going to buy some before the weekend :)
[14:52:47] Analytics: install mwparserfromhell on spark for efficient usage of wikitext-dump in hive - https://phabricator.wikimedia.org/T262044 (MGerlach) @Ottomata thanks, this works. I tried jupyterhub with the anaconda base-env on stat1008 and was able to use mwparserfromhell with spark to parse the wikitext-table...
[14:58:18] mforns: if you're around maybe you can help me decide what to do on the data quality jobs?
[15:04:05] milimetric: we can look at data quality jobs later (or monday)
[15:08:49] Analytics, Event-Platform, Privacy Engineering: Remove http.client_ip from EventGate default schema (again) - https://phabricator.wikimedia.org/T262626 (Krinkle) As a first step, we can start plugging this from the end of the pipeline and work our way back. For Logstash, the intake process supports...
[15:09:35] Analytics, Event-Platform, Privacy Engineering, observability: Remove http.client_ip from EventGate default schema (again) - https://phabricator.wikimedia.org/T262626 (Krinkle)
[15:34:25] Analytics, Event-Platform, Privacy Engineering, observability: Remove http.client_ip from EventGate default schema (again) - https://phabricator.wikimedia.org/T262626 (Nuria) cc @ottomata Should we go for the filter 1st? 
seems good practice to have one for sensitive data that might, by mistake,...
[15:40:20] (CR) Milimetric: "Discussed and restarted both hourly and daily bundles with new deployment:" [analytics/refinery] - https://gerrit.wikimedia.org/r/623456 (https://phabricator.wikimedia.org/T257691) (owner: Nuria)
[15:41:10] !log restarted data quality stats bundles
[15:41:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:47:09] Analytics-Radar, DC-Ops, Operations, ops-eqiad, Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (elukey) I went into system config and found `Physical Disk 00:01:12: SSD, SATA, 446.625GB, Ready, (512B)`, but in theory we have...
[15:59:56] (PS1) Paul Kernfeld: Add a link in the footer to translate Wikistats [analytics/wikistats2] - https://gerrit.wikimedia.org/r/626711
[16:00:31] (PS2) Paul Kernfeld: Add a link in the footer to translate Wikistats [analytics/wikistats2] - https://gerrit.wikimedia.org/r/626711 (https://phabricator.wikimedia.org/T261502)
[16:01:57] Analytics-Radar, Better Use Of Data, Product-Analytics: Bug: 'Include Time' option in table visualization produces "0NaN-NaN-NaN NaN:NaN:NaN" - https://phabricator.wikimedia.org/T256136 (mpopov)
[16:02:00] Analytics-Clusters: Upgrade to Superset 0.37.x - https://phabricator.wikimedia.org/T262162 (mpopov)
[16:02:55] Analytics-Radar, Better Use Of Data, Product-Analytics: Bug: 'Include Time' option in table visualization produces "0NaN-NaN-NaN NaN:NaN:NaN" - https://phabricator.wikimedia.org/T256136 (mpopov) According to the replies to https://github.com/apache/incubator-superset/issues/10203 this is fixed in >=0...
[16:03:30] milimetric: I haz chartz!
[16:03:38] Analytics, Analytics-Wikistats, I18n, Patch-For-Review, good first task: Add link to translatewiki.net in wikistats footer - https://phabricator.wikimedia.org/T261502 (paulkernfeld) a:paulkernfeld While getting started on this change I got a bit stuck because I was trying to use Node 14....
[16:17:19] Analytics-Radar, DC-Ops, Operations, ops-eqiad, Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (Papaul) here is what the Controller is showing {F32270656}
[16:18:13] we might have a problem with 16 new hadoop workers sigh --^
[16:30:17] Analytics-Radar, DC-Ops, Operations, ops-eqiad: an-worker11[02-17] have only one SSD in the flexbay - https://phabricator.wikimedia.org/T262690 (elukey)
[16:31:11] Analytics-Radar, DC-Ops, Operations, ops-eqiad, Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (elukey) Open→Stalled Pending T262690
[16:31:17] nuria: --^
[16:32:11] TL;DR: the 16 hadoop worker nodes (refresh for the past fiscal) cannot be put into service yet, the other 24 for this fiscal are yet to be delivered (but we ordered them)
[16:32:16] Analytics-Radar, DC-Ops, Operations, ops-eqiad: an-worker11[02-17] have only one SSD in the flexbay - https://phabricator.wikimedia.org/T262690 (RobH) Confirmed this is indeed the case. It appears that on the ordering task T246784, packing slips were not included in the shipment, and instead had...
[16:32:16] (the gpu ones are ok)
[16:39:45] milimetric: you mean the alert received today? I responded in the thread, we can discuss with the team in standup maybe?
[17:07:26] mforns: no worries, sorted it out, not about the alert, was just double-checking what I should restart with the deploy
[17:08:24] joal: I love chartz :)
[17:08:38] milimetric: o/
[17:08:43] howdy
[17:08:46] what chartz!
[17:08:53] milimetric: quick show off in cave?
[17:08:58] omw
[17:22:05] elukey: is the problem described on any ticket?
[17:22:19] nuria: https://phabricator.wikimedia.org/T262690
[17:23:53] not a big deal but I pinged you as FYI
[17:24:02] milimetric: added restart commands to train etherpad for reference
[17:24:44] elukey: got it
[17:27:42] mforns: I think it makes sense to keep (as monitored metric) the "combined" metric
[17:28:21] going afk for the weekend folks o/
[17:29:00] ciao elukey
[17:34:13] Analytics, Analytics-Wikistats, I18n, Patch-For-Review, good first task: Add link to translatewiki.net in wikistats footer - https://phabricator.wikimedia.org/T261502 (Nuria) @paulkernfeld updating README sounds great
[17:36:43] (PS3) Paul Kernfeld: Add a link in the footer to translate Wikistats [analytics/wikistats2] - https://gerrit.wikimedia.org/r/626711 (https://phabricator.wikimedia.org/T261502)
[17:37:32] Analytics, Analytics-Wikistats, I18n, Patch-For-Review, good first task: Add link to translatewiki.net in wikistats footer - https://phabricator.wikimedia.org/T261502 (paulkernfeld) 👍 just updated the README in the same patch
[17:59:27] Analytics, Analytics-Kanban, Product-Analytics: Add editors per country data to AQS API (geoeditors) - https://phabricator.wikimedia.org/T238365 (Milimetric) quick status update (tl;dr; we can announce it Monday): * data loaded from 2018-01 to 2020-08, using Joseph's new ordering technique (descend...
[18:00:54] Analytics, Analytics-Kanban, Privacy Engineering, Product-Analytics, and 3 others: Drop data from Prefupdate schema that is older than 90 days - https://phabricator.wikimedia.org/T250049 (Milimetric) Oof, good point, I hadn't thought of the Druid version of this. I suppose I'll have to wipe and...
[18:21:17] 👋 i'm a community member with some background in python and "data engineering." i'm interested in trying to help out with python tasks.
[18:26:45] Analytics, Analytics-Kanban, Privacy Engineering, Product-Analytics, and 3 others: Drop data from Prefupdate schema that is older than 90 days - https://phabricator.wikimedia.org/T250049 (Nuria) {F32271218} I do not think the druid data has anything that needs deletion, does it?
[18:28:56] Analytics, Analytics-Wikistats, I18n, Patch-For-Review, good first task: Add link to translatewiki.net in wikistats footer - https://phabricator.wikimedia.org/T261502 (Nuria) @paulkernfeld dist files are not needed , just source files
[18:31:54] (CR) Nuria: [C: +2] "Seems like this is tested so +2" [analytics/refinery] - https://gerrit.wikimedia.org/r/626691 (owner: Elukey)
[18:32:53] (CR) Nuria: [V: +2 C: +2] oozie: Fix webrequest load workflow string [analytics/refinery] - https://gerrit.wikimedia.org/r/626691 (owner: Elukey)
[18:33:14] (CR) Nuria: "Submitted and added to train etherpad" [analytics/refinery] - https://gerrit.wikimedia.org/r/626691 (owner: Elukey)
[19:02:12] (PS4) Paul Kernfeld: Add a link in the footer to translate Wikistats [analytics/wikistats2] - https://gerrit.wikimedia.org/r/626711 (https://phabricator.wikimedia.org/T261502)
[19:32:37] hi paulkernfeld, thanks so much! We have a bit of python but a lot more of "data engineering". If you'd like we can talk a bit and figure out what your level of interest is, match you up with something appropriate?
[19:40:09] Analytics-Radar, Event-Platform, MW-1.36-notes (1.36.0-wmf.9; 2020-09-15), Platform Team Workboards (Clinic Duty Team), Wikimedia-production-error: PHP Notice: Array to string conversion (from EventBus.php) - https://phabricator.wikimedia.org/T262462 (Krinkle)
[20:05:15] Analytics, Patch-For-Review: Add urlshortener button to Turnilo - https://phabricator.wikimedia.org/T233336 (Milimetric) https://github.com/allegro/turnilo/pull/657
[20:29:56] Analytics, Analytics-Kanban, Privacy Engineering, Product-Analytics, and 3 others: Drop data from Prefupdate schema that is older than 90 days - https://phabricator.wikimedia.org/T250049 (Nuria) {F32272016}
[21:00:53] Analytics, Analytics-Kanban, Privacy Engineering, Product-Analytics, and 3 others: Drop data from Prefupdate schema that is older than 90 days - https://phabricator.wikimedia.org/T250049 (Nuria) Correction: while druid only stores counts of occurrence of some properties, since https://gerrit.wiki...
[21:02:33] (CR) Nuria: Add a link in the footer to translate Wikistats (1 comment) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/626711 (https://phabricator.wikimedia.org/T261502) (owner: Paul Kernfeld)
[21:28:41] paulkernfeld: hello, are you open to more javascript work?
[22:40:59] Analytics, Analytics-Wikistats: "Active editors" panel keeps flashing on stats.wikimedia.org - https://phabricator.wikimedia.org/T262725 (Krinkle)