[06:42:56] PROBLEM - cache_text: Varnishkafka Webrequest Delivery Errors per second -codfw- on icinga1001 is CRITICAL: bad_data: parse error at char 97: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka https://grafana.wikimedia.org/d/000000253/varnishkafka?panelId=20&fullscreen&orgId=1&var-datasource=codfw&prometheus/ops&var-source=statsv&var-cp_cluster=cache_text&var-instance=All
[06:45:06] PROBLEM - cache_text: Varnishkafka Webrequest Delivery Errors per second -eqiad- on icinga1001 is CRITICAL: bad_data: parse error at char 97: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka https://grafana.wikimedia.org/d/000000253/varnishkafka?panelId=20&fullscreen&orgId=1&var-datasource=eqiad&var-source=statsv&var-cp_cluster=cache_text&var-instance=All
[06:45:15] these are false alarms
[06:45:27] unclosed left parenthesis sigh
[06:47:16] PROBLEM - cache_text: Varnishkafka Webrequest Delivery Errors per second -eqsin- on icinga1001 is CRITICAL: bad_data: parse error at char 97: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka https://grafana.wikimedia.org/d/000000253/varnishkafka?panelId=20&fullscreen&orgId=1&var-datasource=eqsin&var-source=statsv&var-cp_cluster=cache_text&var-instance=All
[06:49:18] PROBLEM - cache_text: Varnishkafka Webrequest Delivery Errors per second -esams- on icinga1001 is CRITICAL: bad_data: parse error at char 97: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka https://grafana.wikimedia.org/d/000000253/varnishkafka?panelId=20&fullscreen&orgId=1&var-datasource=esams&prometheus/ops&var-source=statsv&var-cp_cluster=cache_text&var-instance=All
[06:51:18] PROBLEM - cache_text: Varnishkafka Webrequest Delivery Errors per second -ulsfo- on icinga1001 is CRITICAL: bad_data: parse error at char 97: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka https://grafana.wikimedia.org/d/000000253/varnishkafka?panelId=20&fullscreen&orgId=1&var-datasource=ulsfo&var-source=statsv&var-cp_cluster=cache_text&var-instance=All
[06:53:20] RECOVERY - cache_text: Varnishkafka Webrequest Delivery Errors per second -esams- on icinga1001 is OK: (C)5 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka https://grafana.wikimedia.org/d/000000253/varnishkafka?panelId=20&fullscreen&orgId=1&var-datasource=esams&prometheus/ops&var-source=statsv&var-cp_cluster=cache_text&var-instance=All
[06:53:32] RECOVERY - cache_text: Varnishkafka Webrequest Delivery Errors per second -codfw- on icinga1001 is OK: (C)5 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka https://grafana.wikimedia.org/d/000000253/varnishkafka?panelId=20&fullscreen&orgId=1&var-datasource=codfw&prometheus/ops&var-source=statsv&var-cp_cluster=cache_text&var-instance=All
[06:54:02] RECOVERY - cache_text: Varnishkafka Webrequest Delivery Errors per second -ulsfo- on icinga1001 is OK: (C)5 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka https://grafana.wikimedia.org/d/000000253/varnishkafka?panelId=20&fullscreen&orgId=1&var-datasource=ulsfo&var-source=statsv&var-cp_cluster=cache_text&var-instance=All
[06:54:10] RECOVERY - cache_text: Varnishkafka Webrequest Delivery Errors per second -eqsin- on icinga1001 is OK: (C)5 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka https://grafana.wikimedia.org/d/000000253/varnishkafka?panelId=20&fullscreen&orgId=1&var-datasource=eqsin&var-source=statsv&var-cp_cluster=cache_text&var-instance=All
[07:00:46] RECOVERY - cache_text: Varnishkafka Webrequest Delivery Errors per second -eqiad- on icinga1001 is OK: (C)5 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka https://grafana.wikimedia.org/d/000000253/varnishkafka?panelId=20&fullscreen&orgId=1&var-datasource=eqiad&var-source=statsv&var-cp_cluster=cache_text&var-instance=All
[07:07:24] brb
[07:55:03] Analytics, Analytics-Kanban, Operations, Traffic, and 2 others: TLS certificates for Analytics origin servers - https://phabricator.wikimedia.org/T227860 (ema) Open→Resolved Thank you so much @elukey! ATS is now using TLS only for connections to #analytics origins.
[08:01:40] fdans: o/ - left some notes to https://etherpad.wikimedia.org/p/analytics-weekly-train about stuff for this deployment
[08:02:25] elukey: reading
[08:03:08] if you are ok I'll add a couple more oozie coords to restart for hive2 actions today
[08:03:27] elukey: would you prefer to do the snapshot deployment now, in the morning?
[08:03:57] whenever you want!
[08:24:57] (PS1) Elukey: mobile_apps: move uniques daily/monthly oozie coords to hive2 actions [analytics/refinery] - https://gerrit.wikimedia.org/r/528708 (https://phabricator.wikimedia.org/T227257)
[08:41:32] (PS1) Elukey: pageview: move druid oozie coordinators to hive2 actions [analytics/refinery] - https://gerrit.wikimedia.org/r/528714 (https://phabricator.wikimedia.org/T227257)
[08:44:35] (PS2) Elukey: pageview: move druid oozie coordinators to hive2 actions [analytics/refinery] - https://gerrit.wikimedia.org/r/528714 (https://phabricator.wikimedia.org/T227257)
[08:45:40] ok hive2 code reviews done for this week :)
[09:00:22] Analytics, Operations, hardware-requests, User-Elukey: eqiad: 1 misc node for the Kerberos KDC service - https://phabricator.wikimedia.org/T227288 (elukey) Looks good to me (followed up only on the codfw task). Can we get them repurposed?
[09:06:36] Analytics, WMDE-Analytics-Engineering, Wikidata, Wikidata-Campsite: Track WDQS updater UA in wikidata-special-entitydata grafana dashboard - https://phabricator.wikimedia.org/T218998 (Ladsgroup) Note: If {T199219} gets done (which IMO, it should) we can't use hadoop to track UA. The WDQS itself s...
[09:41:11] Analytics, User-Elukey: Make the Kerberos infrastructure production ready - https://phabricator.wikimedia.org/T226089 (elukey) Very interesting reading: https://www.tldp.org/HOWTO/Kerberos-Infrastructure-HOWTO/server-replication.html My understanding is that: * `kdb5_util dump` could be used to periodi...
[10:28:17] Analytics, User-Elukey: Make the Kerberos infrastructure production ready - https://phabricator.wikimedia.org/T226089 (elukey) I tried to use `kdb5_util dump` on kerberos1001, the resulting file was 24K. It might be worth to avoid Bacula and have a simple rsync on the KDC slave that copies dumps periodic...
[11:06:47] * elukey lunch!
[11:16:11] anyone else having issues logging into swap? my tunnel's up, localhost:8000 loads fine in the browser. a bad pw gives an error warning. but a good pw just redirects back to the login page, even though a jupyter-hub-token is set.
[11:16:49] this happens starting from a completely fresh browsing session
[11:25:03] btw, i had used the jupyter 'close'/'stop' (i forget the button name now?) button not the jupyter interface 'logout' button (nor just closing the browser) when i wrapped jupyter work earlier today, prior to rebooting the mac for updates. the symptoms i'm presently seeing happen in both safari and firefox. i mention this use of the button in case it has any influence on session handshaking, and i mention mac reboot just in case (although i don't think the security upgrade has any bearing here)
[11:27:10] (PS1) Ladsgroup: New build [analytics/wmde/toolkit-analyzer-build] - https://gerrit.wikimedia.org/r/528767 (https://phabricator.wikimedia.org/T214894)
[12:11:07] /away
[12:11:44] dr0ptp4kt: o/
[12:11:52] what notebook are you using?
[12:18:26] assuming notebook1003, I don't see anything weird in your unit's logs
[12:22:57] dr0ptp4kt: just logged in, I can see my page
[12:24:27] (brb)
[12:29:06] elukey: ayayay, I didn't check out the latest branch i fetched
[12:37:56] :)
[12:39:09] fdans: also, one nit - the "::common::" part is not needed
[12:39:17] basically it is related to hiera
[12:39:24] (because you have common, eqiad, etc..)
[12:39:33] the puppet role is role::aqs
[12:40:16] elukey: excuse me mr luca I totally copypasted your commit message
[12:42:07] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Make the Kerberos infrastructure production ready - https://phabricator.wikimedia.org/T226089 (elukey)
[12:44:38] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Allow all Analytics tools to work with Kerberos auth - https://phabricator.wikimedia.org/T226698 (elukey) After some tests to make Refine work with Kerberos in T228291 we decided to leave RPC encryption and authentication disabled for Sp...
[12:45:57] elukey: ok just rebased and changed commit msg
[12:46:21] it's good that I'm not in the ops channel cos there must be a bunch of notifications there right now
[12:48:42] it is always very chatty :)
[12:49:29] fdans: I am running puppet on the aqs nodes
[12:49:37] do you want to proceed with the testing?
[12:52:31] elukey: yes
[12:52:49] elukey: this is the first time I do this, what do yall usually look for?
[12:54:01] fdans: I have never checked the urls, but Joseph checked that the endpoint on aqs1004 with queries for new data
[12:54:04] to make sure that it was good
[12:54:34] I don't think that he has ever wrote down a list
[12:55:06] but I can access his bash history
[12:55:10] and give you some commands
[12:55:12] if you want
[12:55:38] elukey: that would be helpful, I'll update the docs accordingly
[12:55:57] one example is
[12:55:58] curl -X GET --header 'Accept: application/json; charset=utf-8' 'http://aqs1004.eqiad.wmnet:7232/analytics.wikimedia.org/v1/edits/aggregate/fr.wikipedia/all-editor-types/content/daily/20181201/20190401'
[12:56:17] with the new interval of course
[12:57:13] elukey: I refuse to query frwiki, but let me check
[12:57:37] ahahhah
[12:57:41] fdans: check /home/fdans/test_aqs_endpoint on aqs1004
[12:57:53] when you are ready, I'll depool and restart aqs on it
[12:59:37] elukey: i don't have permission to run the file
[13:00:38] elukey: I am ready though :)
[13:03:19] fdans: you need to read it and see the queries first, then you can add the exec perm :)
[13:03:59] those are some queries, pick the ones that you think make sense and test the endpoint before/after
[13:04:32] elukey: have you restarted aqs yet?
[13:04:43] nope
[13:05:24] I think that we should first get a list of test queries to make, and verify the actual result
[13:05:45] that should of course not show results for July
[13:05:52] then depool/restart and re-check
[13:06:01] sounds good elukey
[13:10:11] elukey: ok I got a bunch of queries for july that right now are not returning any results
[13:11:48] fdans: all right depool+restart
[13:12:29] done!
[13:12:32] fdans: you can re-check now
[13:12:35] testing
[13:15:28] elukey: data looks good
[13:20:01] fdans: ack just repooled
[13:20:08] (I was writing a task sorry)
[13:20:29] triple checking for a couple of mins that no weird HTTP return codes are returned
[13:20:32] then I'll apply to all
[13:20:33] fdans: ok?
[13:20:34] elukey: no problem! anything else we have to do? I'll check wikistats now to make sure everything is fine
[13:21:07] yeah when I am finished if you could triple check in there it would be great
[13:26:39] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (Ottomata) (+Luca) jupyterhub-singleuser is the Jupyterhub login process that will spawn Jupyter notebooks for individual users. I'll stop it, and it shouldn't start up again...
[13:28:01] ottomata: o/
[13:28:03] didn't get --^ :D
[13:28:09] heya teamm
[13:29:05] o/
[13:30:12] fdans: ready to test!
[13:32:12] elukey: looking good
[13:33:21] then we are done :)
[13:33:28] let's document this procedure in the on-call page
[13:44:11] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on stat1004 is CRITICAL: connect to address 10.64.5.104 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[13:47:23] just forced the remount
[13:47:26] oom killer had a party
[14:02:11] Analytics, Analytics-Kanban: Add more dimensions to netflow's druid ingestion specs - https://phabricator.wikimedia.org/T229682 (mforns) > - ip_src - ip address cardinality > - ip_dst - ip address cardinality I think the cardinality of these fields is too big for Druid. Each one of these dimensions woul...
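Recapping the AQS smoke-test procedure from the exchange above as a single hedged sketch: query an endpoint directly on the target node before the depool/restart (the new month should return no items) and again after (it should return data). The host and port come from elukey's curl example; the 20190701/20190801 interval is an assumed stand-in for the newly loaded snapshot, and this must be run from a host that can reach aqs1004.

```shell
# Query one AQS endpoint directly on the node (bypassing LVS), as in the
# curl example above. The date interval below is an assumption standing in
# for the newly loaded month; adjust per deployment.
host=aqs1004.eqiad.wmnet
url="http://${host}:7232/analytics.wikimedia.org/v1/edits/aggregate/fr.wikipedia/all-editor-types/content/daily/20190701/20190801"
# Before depool+restart: expect no items for the new month.
# After the restart: expect data, and no weird HTTP return codes.
curl -s -X GET --header 'Accept: application/json; charset=utf-8' "$url"
```

Repeating the same query set before and after the restart (and checking the HTTP status with `curl -o /dev/null -w '%{http_code}\n'`) is what "test the endpoint before/after" amounts to.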
[14:05:22] mforns: can you explain me --^
[14:05:31] I am a bit ignorant about this part of druid
[14:08:06] elukey, sure
[14:08:23] if we add a column with the IP of the src
[14:08:51] it means we'll have to split all current rows by IP
[14:10:06] if before we had: as_dst=blah, as_path=foo, peer_as_dst=bar, count=100
[14:10:14] now we'll have:
[14:10:36] as_dst=blah, as_path=foo, peer_as_dst=bar, count=100, IP=984567
[14:10:38] as_dst=blah, as_path=foo, peer_as_dst=bar, count=100, IP=5678
[14:10:41] sorry
[14:10:56] now we'll have:
[14:11:01] as_dst=blah, as_path=foo, peer_as_dst=bar, count=2, IP=984567
[14:11:06] as_dst=blah, as_path=foo, peer_as_dst=bar, count=3, IP=569087
[14:11:12] as_dst=blah, as_path=foo, peer_as_dst=bar, count=1, IP=945678
[14:11:17] ...
[14:11:47] ah ok so the data will be grouped in a more granular way
[14:11:55] and it will cause segments to get bigger
[14:12:00] adding a column multiplies the data set by the cardinality of that new column (or slightly less, depending on current partitioning)
[14:12:27] adding two columns of such high cardinality as IP is too much I think
[14:12:40] but we do it for one IP column for webrequest right?
[14:12:58] in Druid?
[14:13:29] oh yea...
[14:13:29] https://turnilo.wikimedia.org/#webrequest_sampled_128
[14:14:49] RECOVERY - Check if the Hadoop HDFS Fuse mountpoint is readable on stat1004 is OK: OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[14:15:12] elukey, ok you convinced me
[14:15:17] :]
[14:15:24] nono wait I am trying to understand :D
[14:15:27] hehe
[14:15:43] well, maybe it's ok, it will be small enough, dunno, we have to try
[14:16:05] you know how many netflow rows we have per hour?
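The row explosion mforns walks through above can be reproduced on a toy dataset: aggregating the same flows with and without a high-cardinality column shows how one pre-aggregated row splits into one row per distinct value. The flow values below are made up for illustration.

```shell
# Toy illustration (made-up values) of the Druid cardinality discussion above:
# the same three flow records, aggregated without and then with an IP dimension.
printf 'as_dst=blah as_path=foo\nas_dst=blah as_path=foo\nas_dst=blah as_path=foo\n' \
  | sort | uniq -c
# -> a single aggregated row with count 3
printf 'as_dst=blah ip=984567\nas_dst=blah ip=5678\nas_dst=blah ip=984567\n' \
  | sort | uniq -c
# -> two rows: the high-cardinality column splits the aggregate
```

With real netflow data the multiplier is bounded by the distinct IPs seen per time bucket, which is why the thread settles on loading a day of data and measuring the segments.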
[14:16:08] lookin
[14:19:28] in theory few now, but they'll increase in the future
[14:19:47] I am trying to find a good way to understand if we need more capacity or not
[14:20:23] the last hours have about 324735 rows
[14:20:47] those are already exploded by all new fields
[14:21:34] seems one order of magnitude smaller than webrequest_sampled_128
[14:21:41] so yea.. I think you're right
[14:22:03] 324735 is in hive?
[14:22:59] yes
[14:24:06] unless... the hits metric in the webrequest_sampled_128 is aggregated... hmmm
[14:24:47] I'm not sure! I think the best is we try and load 1 day of data or so, and see how big it is!
[14:32:44] makes sense!
[14:37:25] (CR) WMDE-leszek: [C: +2] New build [analytics/wmde/toolkit-analyzer-build] - https://gerrit.wikimedia.org/r/528767 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[14:37:35] (Merged) jenkins-bot: New build [analytics/wmde/toolkit-analyzer-build] - https://gerrit.wikimedia.org/r/528767 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[14:56:17] afk for a bit
[15:03:07] elukey: yeah, it's notebook1003 i'm trying on. able to do a hangouts meet for a little bit to troubleshoot?
[15:14:03] elukey: o/ sorry missed your ping too!
[15:14:21] dr0ptp4kt: what's your troubles?
[15:14:37] I am back sorry :)
[15:15:13] ottomata: there is a description of the problem in the chan if you have backscroll
[15:15:35] earliest i have today is 2 hrs ago
[15:18:04] (PS1) Ladsgroup: New build [analytics/wmde/toolkit-analyzer-build] - https://gerrit.wikimedia.org/r/528871 (https://phabricator.wikimedia.org/T214894)
[15:18:09] (CR) Ladsgroup: [C: +2] New build [analytics/wmde/toolkit-analyzer-build] - https://gerrit.wikimedia.org/r/528871 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[15:18:19] (Merged) jenkins-bot: New build [analytics/wmde/toolkit-analyzer-build] - https://gerrit.wikimedia.org/r/528871 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[15:20:17] ottomata: ah ok, lemme paste
[15:20:59] anyone else having issues logging into swap? my tunnel's up, localhost:8000 loads fine in the browser. a bad pw gives an error
[15:21:02] warning. but a good pw just redirects back to the login page, even though a jupyter-hub-token is set.
[15:21:05] this happens starting from a completely fresh browsing session
[15:21:08] btw, i had used the jupyter 'close'/'stop' (i forget the button name now?) button not the jupyter interface 'logout' button (nor
[15:21:11] just closing the browser) when i wrapped jupyter work earlier today, prior to rebooting the mac for updates. the symptoms i'm
[15:21:14] presently seeing happen in both safari and firefox. i mention this use of the button in case it has any influence on session
[15:21:17] handshaking, and i mention mac reboot just in case (although
[15:21:20] i don't think the security upgrade has any bearing here)
[15:21:23] paste is horrible sorry
[15:22:14] hmm seems to work for me.
[15:22:38] dr0ptp4kt: i'm going to restart your jupyterhub instance
[15:23:17] checked as well, it was working for me, didn't find any logs in dr0ptp4kt's unit
[15:23:27] dr0ptp4kt: try logging in again?
[15:28:10] (Abandoned) Fdans: [wip]Add file extension and media classification to mediacounts job [analytics/refinery] - https://gerrit.wikimedia.org/r/522390 (https://phabricator.wikimedia.org/T225911) (owner: Fdans)
[15:32:21] ottomata: meeting?
[15:32:49] Oh coming
[15:34:26] I'm just realizing, milimetric and mforns, that we have a similar exploding problem with total_bytes
[15:34:41] as in, we can't backfill that data
[15:34:42] fdans, ?
[15:34:54] oh, like counts you mean
[15:35:25] mforns: well, it's a sum of all bytes for all the requests in the row
[15:35:34] aha
[15:35:54] my brain is broken I still don’t get it
[15:36:07] milimetric: mforns bc a second?
[15:36:12] sure
[15:36:13] k
[15:36:39] milimetric: what's the batcave 2 url?
[15:36:42] batcave2!
[15:36:49] hm...
[15:36:52] good question
[15:36:54] let's make one
[15:36:55] right...
[15:39:57] milimetric: mforns https://meet.google.com/uob-roya-joe ?
[15:40:10] omw
[15:40:38] http://bit.ly/a-batcave-2
[15:40:47] mforns / fdans ^ official and permanent
[15:42:26] Analytics, Better Use Of Data, EventBus, Reading-Infrastructure-Team-Backlog: Create client side error schema - https://phabricator.wikimedia.org/T229442 (LGoto) @Tgr what's the priority of this task?
[15:53:20] Analytics-EventLogging, Analytics-Kanban, EventBus, ORES, and 4 others: Fix "Must provide the 'topic' parameter" in ORES /precache endpoint - https://phabricator.wikimedia.org/T228689 (Halfak) Open→Resolved
[15:53:28] Analytics-EventLogging, Analytics-Kanban, EventBus, CPT Initiatives (Modern Event Platform (TEC2)), Services (watching): Fix revision-score event production in change-prop after migration of revision-create to eventgate-main - https://phabricator.wikimedia.org/T228688 (Halfak)
[16:01:11] a-team standup!
[16:01:52] leila, I added a couple points that we discussed during the meeting in the doc
[16:20:23] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (kzimmerman) Thanks Andrew! Kate -- Kate Zimmerman (she/they) Head of Product Analytics Wikimedia Foundation
[16:31:31] ottomata elukey i'm in. it would seem the jupyter instance restart did the trick. for future reference, is there an action i can take at a shell to self service that next time? i had considered process killing the process but was worried it would break something
[16:43:05] if you can kill, that will probably work :)
[16:43:15] not sure if you can self restart
[16:45:43] elukey: do you know, what is the proper wikimedia dist name if a package can be used in both stretch and buster?
[16:45:58] or, do we build it twice with different dists?
[16:46:02] and upload both to apt?
[16:46:11] i don't think so? the package will be the same
[16:46:16] maybe moritzm ^ knows?
[16:48:09] depends on the package, if it's a binary package it must be built for each distro separately
[16:48:19] sometimes there are exceptions where we simply copy the deb around
[16:48:30] like for go-based stuff which are just a statically linked blob
[16:48:52] even for things like Python it makes sense to rebuild separately
[16:49:06] as there are differences in the integration of the interpreter e.g.
[16:49:16] when in doubt, build it separately :-)
[16:50:29] moritzm: in this case; i'm making the package support both python3.5 and python3.7
[16:50:44] we need to be able to do this in order to handle a heterogeneous cluster during an upgrade
[16:50:54] so, in this case, the same package really does work on both stretch and buster
[16:52:04] ok, in that case import it once and then copy it using https://wikitech.wikimedia.org/wiki/Reprepro#Copying_between_distributions
[16:52:39] ok, so i can e.g. leave dist as buster-wikimedia
[16:52:44] and then just copy into stretch
[16:53:01] note the insane argument parsing in that command, unlike everything else in the world reprepro wants DEST SOURCE...
[16:53:08] ottomata: ack
[16:53:17] ha ok, dest first
[16:54:43] (PS1) Milimetric: Note the hidden page_artificial_id column [analytics/refinery] - https://gerrit.wikimedia.org/r/528891
[16:54:51] mforns: ^
[16:55:35] Analytics, Readers-Web-Backlog (Needs Product Owner Decisions): % of "none" referers seems too high - https://phabricator.wikimedia.org/T195880 (MBinder_WMF)
[16:56:02] (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/528891 (owner: Milimetric)
[16:56:11] thanks milimetric!
[16:57:00] Analytics, WMDE-Analytics-Engineering, Wikidata, Wikidata-Campsite: Track WDQS updater UA in wikidata-special-entitydata grafana dashboard - https://phabricator.wikimedia.org/T218998 (Addshore) >>! In T218998#5398700, @Ladsgroup wrote: > Note: If {T199219} gets done (which IMO, it should) we can'...
[16:59:39] Analytics, Product-Analytics: Add page protection status to MediaWiki history tables - https://phabricator.wikimedia.org/T230044 (nettrom_WMF)
[17:03:02] elukey: https://gerrit.wikimedia.org/r/c/operations/debs/spark2/+/528894
[17:06:45] Analytics, Analytics-Kanban, Research-Backlog, Patch-For-Review: Release edit data lake data as a public json dump /mysql dump, other? - https://phabricator.wikimedia.org/T208612 (nettrom_WMF) +1 to @leila's rough draft. Could add something to the last sentence to emphasize that these datasets r...
[17:08:50] ottomata: sorry I was afk
[17:08:52] reading
[17:11:57] it seems good, but I'll read the changes carefully tomorrow morning if it is ok
[17:12:09] don't consider me as blocker though, overall it seems a good work :)
[17:13:07] ok! yeah want to merge and put in apt and install on stat1005.
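The argument order moritzm warns about above is easy to trip over: `reprepro copy` takes the destination distribution first, then the source. A sketch for the package discussed here, to be run on the apt server inside the repository's base directory; the dist names follow the wikimedia convention mentioned in the conversation.

```shell
# reprepro copy takes DESTINATION first, then SOURCE, then the package name(s).
# Copy the already-imported buster build of spark2 into the stretch dist:
reprepro copy stretch-wikimedia buster-wikimedia spark2

# Verify the package now shows up in both distributions:
reprepro list stretch-wikimedia spark2
reprepro list buster-wikimedia spark2
```

This only works cleanly because, as noted above, the same arch-independent .deb is valid on both distros; a binary package built against one distro's libraries should be rebuilt instead.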
[17:13:15] but we also need to install on stretch nodes too
[17:13:16] for it to work
[17:15:56] ottomata: one thing - I realize now that the postinst of the debian package will not work when kerberos is enabled
[17:16:37] but we can amend it later, not a blocker
[17:16:47] need to add it to my notes
[17:17:16] +1ed
[17:17:17] ohhh righth
[17:17:20] hm
[17:17:41] maybe we should move that to puppet; it is kind of already there
[17:17:43] if the spark-assembly is deployed somewhere on the filesystem we can use puppet for the upload
[17:17:44] too
[17:17:46] yeah
[17:17:54] there is also the oozie sharelib install
[17:18:11] that is already kerberized IIRC
[17:19:38] will check later if you need me, going afk for a couple of hours :)
[17:19:39] o/
[17:21:02] a-team, gotta run an errand, will be back in 90 mins max
[17:21:28] elukey: i'm going to remove the postinstall assembly stuff then.
[17:21:30] from this deb
[17:21:34] let's do it in puppet.
[17:25:25] mforns_brb: let me know when you're back and if you have time to talk about data lake release format. I just had a chat with dsaez about it and he had some good points to consider as we're thinking about how to gather feedback. I'd rather tell you those over voice and have a chance to discuss it. :)
[17:47:53] Analytics, Product-Analytics, Readers-Web-Backlog (Needs Product Owner Decisions): Reading_depth remove eventlogging instrumentation? - https://phabricator.wikimedia.org/T229042 (Jdlrobson)
[17:49:04] Analytics, Analytics-Kanban: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (Ottomata) Ok! I've uploaded spark2_2.3.1-bin-hadoop2.6-4_all.deb to apt for both buster and stretch, and installed on stat1005 (buster). This seems to be working great there. It won't work in YAR...
[17:50:53] Analytics-Kanban, ChangeProp, EventBus, MediaWiki-JobQueue, and 2 others: Change-Prop partitioner fails with eventuate event - https://phabricator.wikimedia.org/T230048 (Pchelolo)
[17:51:07] Analytics-Kanban, ChangeProp, EventBus, MediaWiki-JobQueue, and 2 others: Change-Prop partitioner fails with enevtgate event - https://phabricator.wikimedia.org/T230048 (Pchelolo)
[17:54:05] Analytics-EventLogging, Analytics-Kanban, EventBus, MediaWiki-JobQueue, and 2 others: Delayed jobs fail validation in eventgate - https://phabricator.wikimedia.org/T230049 (Pchelolo)
[18:19:40] Analytics-EventLogging, Analytics-Kanban, EventBus, CPT Initiatives (Modern Event Platform (TEC2)), and 2 others: Fix revision-score event production in change-prop after migration of revision-create to eventgate-main - https://phabricator.wikimedia.org/T228688 (Pchelolo)
[18:26:35] Analytics-EventLogging, Analytics-Kanban, EventBus, CPT Initiatives (Modern Event Platform (TEC2)), and 2 others: Fix revision-score event production in change-prop after migration of revision-create to eventgate-main - https://phabricator.wikimedia.org/T228688 (Pchelolo) PR: https://github.com/w...
[18:43:29] leila, can we meet in 15 mins to discuss data lake release?
[18:47:12] Analytics-Kanban, ChangeProp, EventBus, MediaWiki-JobQueue, and 2 others: Change-Prop partitioner fails with enevtgate event - https://phabricator.wikimedia.org/T230048 (Pchelolo) PR: https://github.com/wikimedia/change-propagation/pull/327
[18:47:23] Analytics-Kanban, ChangeProp, EventBus, MediaWiki-JobQueue, and 2 others: Change-Prop partitioner fails with eventgate event - https://phabricator.wikimedia.org/T230048 (Pchelolo)
[18:47:52] mforns: I have a meeting then, but that may end earlier than the initial 40-min allocated. I'll ping you after the meeting and if you're still here we can talk.
[18:49:21] ottomata: still there?
[18:49:32] elukey: ya
[18:49:36] I am ok to test the new spark version, but only if I have a version to rollback to :)
[18:49:47] or a commit or whatever :)
[18:49:52] yeah, hmmm
[18:50:16] elukey: since it is the same version, it was overwritten in apt
[18:50:18] but
[18:50:26] the .deb is in /var/cache/apt/archives
[18:50:42] spark2_2.3.1-bin-hadoop2.6-3~stretch1_all.deb
[18:50:51] so, to rollback, you could cumin dpkg -i it
[18:51:06] all right thanks for the trick, that is neat
[18:51:19] I guess I can test it tomorrow morning and see how it goes :)
[18:51:38] thanks!
[18:51:38] leila, ok, cool!
[18:53:40] ok!
[18:59:24] Analytics-EventLogging, Analytics-Kanban, EventBus, ORES, and 4 others: Fix "Must provide the 'topic' parameter" in ORES /precache endpoint - https://phabricator.wikimedia.org/T228689 (Halfak) This change has been deployed. Should be good to remove the work-around.
[19:07:21] (PS3) Addshore: Create script tracking number of slots on wikibase repos [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/523938 (https://phabricator.wikimedia.org/T68025)
[19:32:47] (CR) Mforns: [C: +1] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/528708 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[19:35:50] (CR) Mforns: [C: +1] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/528714 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[19:36:57] mforns you around to look at a wikistats patch? I wanted to deploy to fix the bug
[19:38:12] Analytics, Analytics-Kanban: MapChart zoom behavior broken - https://phabricator.wikimedia.org/T230062 (Milimetric)
[19:39:29] (PS1) Milimetric: Fix broken map chart zoom [analytics/wikistats2] - https://gerrit.wikimedia.org/r/528913 (https://phabricator.wikimedia.org/T230062)
[19:39:41] mforns: if you're still around, I'm available now. otherwise, tomorrow?
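The rollback trick elukey and ottomata agree on above relies on apt keeping previously downloaded .debs in its cache, so the older build can be reinstalled directly with dpkg. A sketch using the exact filename from the conversation; on a single host it looks like this, and across a fleet the same `dpkg -i` can be driven via cumin as mentioned.

```shell
# Previously downloaded .debs usually remain in apt's local cache:
ls /var/cache/apt/archives/spark2_*.deb

# Roll back by reinstalling the older build directly with dpkg
# (filename taken from the conversation above):
sudo dpkg -i /var/cache/apt/archives/spark2_2.3.1-bin-hadoop2.6-3~stretch1_all.deb

# Confirm which build is installed afterwards:
dpkg -l spark2
```

Note the caveat from the chat: since the new build reused the same version string and was overwritten in apt, the cached file is the only copy to roll back to, so it is worth checking it exists before upgrading.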
[19:39:52] Analytics, Analytics-Kanban, Patch-For-Review: MapChart zoom behavior broken - https://phabricator.wikimedia.org/T230062 (Milimetric) p: Triage→High
[19:40:07] leila, yep
[19:40:22] oh, nvm, mforns, I can merge/deploy, but can I join you and leila to listen in?
[19:40:24] leila, wanna meet in the batcave :] ?
[19:40:37] of course!
[19:40:39] sure. link, please. (I don't see it in the subject :/)
[19:40:59] leila, https://meet.google.com/rxb-bjxn-nip
[19:41:30] (CR) Milimetric: [V: +2 C: +2] Fix broken map chart zoom [analytics/wikistats2] - https://gerrit.wikimedia.org/r/528913 (https://phabricator.wikimedia.org/T230062) (owner: Milimetric)
[19:52:20] (PS1) Milimetric: Release 2.6.6 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/528916
[19:52:38] (CR) Milimetric: [C: +2] Release 2.6.6 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/528916 (owner: Milimetric)
[19:54:58] (Merged) jenkins-bot: Release 2.6.6 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/528916 (owner: Milimetric)
[21:18:26] Analytics, Analytics-Kanban: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (Ottomata)
[21:32:15] bye a-team i will miss you!
[21:32:23] have so much fun
[21:32:25] :]]]]
[21:32:34] yea, have fuuun!!
[21:51:45] Analytics-EventLogging, Analytics-Kanban, EventBus, MediaWiki-JobQueue, and 3 others: Delayed jobs fail validation in eventgate - https://phabricator.wikimedia.org/T230049 (Pchelolo)
[23:45:12] Analytics, Analytics-Kanban: Add more dimensions to netflow's druid ingestion specs - https://phabricator.wikimedia.org/T229682 (ayounsi) >>! In T229682#5396083, @elukey wrote: > * `event_type` (is this needed @ayounsi ?) Not needed. > * `tag2` - it should be a int holding few values > * `as_src` - shou...