[01:42:52] AH sorry djellel i got a phone call right then, ping me or luca tomorrow
[05:03:22] Pchelolo: o/ - we used to do https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Administration#Reseting_burrow_consumer_group_monitoring for Eventlogging, maybe something similar is fine as well. Can you open a task with the crazy mappings that need to be removed?
[05:03:49] djellel: hi! What kind of rsync error?
[05:13:20] Analytics, Analytics-Kanban, Patch-For-Review: Upgrade Druid to Debian Buster - https://phabricator.wikimedia.org/T253980 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` druid1003.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimag...
[05:13:29] !log reimage druid1003 to Buster
[05:13:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[05:21:44] goood morning people
[05:21:48] there is something that I don't get
[05:22:04] https://yarn.wikimedia.org/proxy/application_1589903254658_75721 is hive2druid for netflow, but it seems to be taking ages
[05:22:17] and from the state of the executors I don't see why
[05:22:26] there seems to be only the driver still running
[05:23:06] ah ok https://yarn.wikimedia.org/cluster/app/application_1589903254658_75731
[05:23:09] this is taking ages
[05:26:58] it is running on 1002, and I see a lot of
[05:26:59] WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(com.google.protobuf.InvalidProtocolBufferException): Protocol message contained an invalid tag (zero).
[05:36:15] !log restart druid middlemanager on druid1002 - strange protobuf warnings, netflow hive2druid indexation job stuck for hours
[05:36:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[05:37:30] Analytics, Analytics-Kanban, Patch-For-Review: Upgrade Druid to Debian Buster - https://phabricator.wikimedia.org/T253980 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['druid1003.eqiad.wmnet'] ` Of which those **FAILED**: ` ['druid1003.eqiad.wmnet'] `
[05:42:29] Analytics, Analytics-Kanban, Patch-For-Review: Upgrade Druid to Debian Buster - https://phabricator.wikimedia.org/T253980 (elukey) ` 05:33:10 | druid1003.eqiad.wmnet | WARNING: unable to verify that BIOS boot parameters are back to normal, got: Boot parameter version: 1 Boot parameter 5 is valid/unlo...
[05:43:09] Analytics, Analytics-Kanban, Patch-For-Review: Upgrade Druid to Debian Buster - https://phabricator.wikimedia.org/T253980 (elukey) Interesting: the last bit of reimage failed for: ` 05:33:42 | cumin1001.eqiad.wmnet | Puppet run completed 05:33:42 | druid1003.eqiad.wmnet | Rebooted host 05:36:20 | dr...
[05:50:23] still seeing the same problem on druid1002 after restarting everything
[05:53:13] elukey: will do tomorrow, thank you!
[05:53:26] ack!
[06:01:05] Analytics: Investigate why netflow hive_to_druid job is so slow - https://phabricator.wikimedia.org/T254383 (elukey) I have restarted the daily job since yesterday I killed it to reboot an-launcher1001 (new memory settings), and it still showed hours and hours of running time. What I found was that the spark...
[06:13:07] !log kill application_1589903254658_75731 (druid indexation for netflow still running since 12h ago)
[06:13:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:04:16] Analytics: Investigate why netflow hive_to_druid job is so slow - https://phabricator.wikimedia.org/T254383 (elukey) There is something weird going on, the last daily segment for wmf_netflow that I see from the coordinator console is `2020-03-05`, after that it is all hourly segments.
[07:04:25] so the last daily segment for wmf_netflow is 2020-03-05T
[07:05:14] elukey: morning! no error, but I was not able to rsync files between the two machines 1006 and 1007.
[07:06:44] djellel: from 1007 to 1006? What command did you use?
[07:10:33] dedcode@stat1006:~$ scp stat1007:path_to_file .
[07:11:20] Analytics, Operations, Traffic: missing wmf_netflow data, 18:30-19:00 May 31 - https://phabricator.wikimedia.org/T254161 (elukey) ` scala> spark.sql("select count(*) from wmf.netflow where year=2020 and month=05 and day=31 and hour=18").show(); 20/06/04 07:09:37 WARN Utils: Truncated the string repre...
[07:12:34] djellel: that is not rsync, but scp :)
[07:13:08] see https://wikitech.wikimedia.org/wiki/Analytics/Systems/Clients#Rsync_between_clients
[07:18:45] elukey: argh, I tried to save 2 characters :)
[07:23:38] elukey: it works, thanks ^_^
[07:24:27] it was the missing ::home
[07:24:46] nice!
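The working form of the transfer discussed above (an rsync daemon module with `::home`, per the wikitech page linked in the chat) can be sketched as follows. The hostname, username, and file path are illustrative placeholders, and the command is only echoed here rather than executed:

```shell
# Sketch of the rsync form that worked above ("the missing ::home").
# Host, user directory, and file path are illustrative placeholders.
cmd="rsync -av stat1007.eqiad.wmnet::home/dedcode/path_to_file ./"
echo "$cmd"
```

The `host::module/path` double-colon syntax addresses an rsync daemon module rather than a remote shell path, which is why plain `scp host:path` did not work between these hosts.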
[07:31:17] !log stop netflow hive2druid timers to do some experiments
[07:31:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:50:52] joal: for the wmf.* table, I've thought about it and I don't think it makes sense to add another name like pagecounts_ez
[07:51:23] joal: how about wmf.pagecount_hourly or wmf.legacy_pageview_hourly
[08:23:16] Analytics: Investigate why netflow hive_to_druid job is so slow - https://phabricator.wikimedia.org/T254383 (elukey) From the coordinator console I missed something very clear, namely that after `2020-03-05` we got a huge increase in the segment sizes. The dimensions moved from 9 to 14. {F31853264}
[08:24:36] mforns: hola! ping me when you are online :)
[08:56:24] Analytics, Analytics-Kanban, Operations: Create a profile to standardize the deployment of JVM packages and configurations - https://phabricator.wikimedia.org/T253553 (elukey) https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/602009/
[09:14:22] Analytics, Operations, netops: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (Dzahn) p:Triage→Medium
[09:24:16] Analytics, Operations, Traffic, Readers-Web-Backlog (Tracking): Mobile redirects drop provenance parameters - https://phabricator.wikimedia.org/T252227 (Dzahn) p:Triage→Medium
[09:53:44] Analytics, Analytics-Kanban: Move the Analytics infrastructure to Debian Buster - https://phabricator.wikimedia.org/T234629 (elukey) @Ottomata today Miriam asked me for some info about why pyspark on stat100[5,8] were yielding version issues (3.7 on driver vs 3.5 on workers) and I found https://gerrit.wik...
[10:56:40] Analytics, Analytics-Kanban, Patch-For-Review: Upgrade Druid to Debian Buster - https://phabricator.wikimedia.org/T253980 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` druid1004.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimag...
[10:56:53] !log depooled and reimage druid1004 to Debian Buster (Druid public cluster)
[10:56:54] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:24:44] Analytics, Analytics-Kanban: Upgrade Druid to Debian Buster - https://phabricator.wikimedia.org/T253980 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['druid1004.eqiad.wmnet'] ` Of which those **FAILED**: ` ['druid1004.eqiad.wmnet'] `
[11:29:51] hey elukey! sup
[11:30:06] mforns: hola! something weird is happening for netflow..
[11:30:23] I saw your comments on-task
[11:30:36] it's huuuge
[11:30:53] I was also trying to kick off a manual run for a day in the past but --since and --until seem not picked up
[11:30:56] not sure why
[11:31:06] aha
[11:31:13] maybe I am missing something stupid, surely it
[11:31:28] anyway, I think that netflow daily has been failing for a long time
[11:31:29] I will look into it today
[11:31:39] yes, sigh
[11:31:39] but we didn't get alarms since the spark job runs in cluster mode
[11:31:49] oh
[11:31:49] so we don't get the non-zero exit code :(
[11:32:24] Analytics, Analytics-Kanban, Patch-For-Review: Upgrade Druid to Debian Buster - https://phabricator.wikimedia.org/T253980 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` druid1004.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimag...
[11:32:58] mforns: I see that for daily there is only one reducer in the druid indexation map-reduce job, is it expected?
[11:33:08] because hourly seems to work
[11:33:18] but probably daily hits some timeout or similar
[11:33:47] I imagine it needs a single reducer for the druid aggregation
[11:34:35] elukey: is it ok if I look into this this afternoon?
[11:34:47] I'm finishing the vetting of the mediawiki history dumps
[11:35:46] mforns: of course, np! Ping me when you are good so we can go through it together, I am curious
[11:35:56] ok ok
[11:42:15] Analytics: Investigate why netflow hive_to_druid job is so slow - https://phabricator.wikimedia.org/T254383 (elukey) The jump on the 5/6 of March seems to be due to adding new dimensions, like: https://github.com/wikimedia/puppet/commit/f69dba4781141e7451c1b7cf9a026b47f95bff7d#diff-2c46a961dda1e3365e5b83b822...
[11:47:49] Analytics: Investigate why netflow hive_to_druid job is so slow - https://phabricator.wikimedia.org/T254383 (JAllemandou) Good catch @elukey! I think we probably agreed to keep netflow data indefinitely in its original size (~300Mb / day). Now that it's ~12Gb / day, we need to discuss retention :) Storage is...
[11:49:57] (PS1) Mforns: Sort mediawiki history dumps by timestamp [analytics/refinery/source] - https://gerrit.wikimedia.org/r/602343 (https://phabricator.wikimedia.org/T254233)
[11:52:23] hi team
[11:52:26] o/
[11:52:37] elukey: let me know if you want me to investigate with you folks on netflow :)
[11:52:57] joal: if you have time yes!
[11:53:03] ack elukey
[11:53:34] I stopped the hive2druid timers for the time being, so we could do tests etc..
[11:53:51] elukey: I can't imagine it's related to all hive2druid
[11:53:53] hourly seems to be pushing segments, daily compaction doesn't work since early march
[11:54:02] elukey: do we have a single timer for all, or 1 per datasource?
[11:54:08] the latter
[11:54:16] only netflow is giving issues
[11:54:29] so you stopped netflow only I guess
[11:54:36] exactly
[11:54:39] perfect :)
[11:54:50] elukey: it runs from an-launcher, right?
[11:56:21] yep!
[11:56:42] super elukey - looking into that
[11:56:54] Analytics, Analytics-Kanban, Patch-For-Review: Order mediawiki_history dumps by event_timestamp - https://phabricator.wikimedia.org/T254233 (mforns) I tested the change and it works, data looks good. But most interestingly: ordered data is about 20% smaller in size after compression! Which makes sen...
[11:56:57] one thing that I noticed is that the spark job is followed by a map-reduce job from the druid user (to index to druid I suppose), and that one has only one reducer, that takes ages to move ahead
[11:56:58] Analytics: Investigate why netflow hive_to_druid job is so slow - https://phabricator.wikimedia.org/T254383 (ayounsi) See T229682#5402701, we don't need all that data past x months it's totally fine to anonymize it, drop some dimensions, and reduce the granularity.
[11:59:24] in the meantime, druid1004 runs buster
[11:59:32] still depooled, recovering segments now
[12:00:06] Analytics: Investigate why netflow hive_to_druid job is so slow - https://phabricator.wikimedia.org/T254383 (JAllemandou) Ack @ayounsi :) Data anonymization/schema-change (dropping columns) means reindexation. It's not very complicated but means we need to setup another job. @ayounsi Can you please devise spe...
[12:00:10] great elukey
[12:00:41] elukey: I wonder if the depool is of any help, as having joined the cluster, historicals are reachable from other nodes
[12:01:07] elukey: I assume the broker knows that it is still loading and therefore doesn't query it, but I'm not sure
[12:02:43] joal: depool is needed to prevent AQS from querying the broker
[12:02:56] just as a precaution, but we could lift it now
[12:02:56] elukey: question for you about netflow hive2druid timers
[12:03:33] no problem elukey about it being depooled :)
[12:03:53] elukey: looks like we have 3 timers for druid-netflow: hourly, daily and daily-sanitization
[12:05:04] Ah elukey! I found the magic `--all` :)
[12:07:02] okey - definitely the number of shards for the daily jobs is incorrect
[12:07:56] no idea what that is :D
[12:08:06] is it related to the reducers?
[12:08:46] (need to run a quick errand, be back in ~30 mins!)
[12:09:03] elukey: when doing indexation, you tell druid about segment-size via segment-granularity (1 segment = 1 granularity-period - for netflow-daily we use day)
[12:09:18] elukey: writing to you for later read :)
[12:10:56] elukey: A segment however is not necessarily a single file - it is split into shards (to care about size) - Druid advises shards to be between 300Mb and 700Mb - 1 day of hourly data being 4G in march, and now 12Gb, 1 shard is definitely too small!
[12:12:17] Analytics, Analytics-Kanban: Upgrade Druid to Debian Buster - https://phabricator.wikimedia.org/T253980 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['druid1004.eqiad.wmnet'] ` and were **ALL** successful.
[12:37:13] joal: aahhh TIL about shards, didn't know!
[12:37:36] so shards should be ~12 ?
[12:37:44] elukey_: still checking stuff :)
[12:38:29] sure, let me know if you need any help
[12:39:02] elukey: there is something I don't understand in puppet
[12:39:29] elukey: in druid-load, we use 'hourly_hours_until => 3,' for netflow
[12:39:41] elukey: this means we try to always reindex 3 hours!
[12:40:04] elukey: And there is no comment around that, so I don't understand :)
[12:41:12] joal: me neither
[12:41:17] joal: no idea either, probably it was done initially when the data was super small
[12:41:22] let's change it
[12:42:01] elukey: I'm also adding parameters for hourly shards, and daily+hourly reduce-memory in the puppet eventlogging_to_druid job, to facilitate configuration
[12:47:45] elukey: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/602354/
[12:50:08] joal: take into account that we already have a sanitization job for netflow
[12:50:21] after 90 days some of the fields are nullified in Druid
[12:50:30] I'm writing in the task
[12:50:42] mforns: yup I have seen that
[12:50:45] thanks mforns
[12:51:59] joal: merged
[12:52:37] ack elukey
[12:57:00] (CR) Elukey: [V: +2 C: +2] Add gunicorn[gevent] dependency. [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/599295 (https://phabricator.wikimedia.org/T253545) (owner: Elukey)
[12:59:12] !log move Superset to gunicorn gevent config - T253545
[12:59:19] elukey: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/602356/
[12:59:54] \o/ elukey - a gunicorn in the family!
[13:03:38] joal: hope that it will allow a little bit more parallelism in superset, let's see
[13:05:43] joal: mmm you added the parameters to job_config, is it intended?
[13:06:06] meh?
[13:07:43] Analytics: Investigate why netflow hive_to_druid job is so slow - https://phabricator.wikimedia.org/T254383 (mforns) @JAllemandou and @ayounsi We already have a netflow druid sanitization job set up that drops some fields, see: https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/an...
[13:09:03] elukey: Ah I get it! will update
[13:10:32] updated elukey
[13:10:34] Analytics, Analytics-Kanban, Patch-For-Review: Test superset running on gunicorn + gevent - https://phabricator.wikimedia.org/T253545 (elukey) Deployed, let's leave this running for some days before closing.
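A quick sanity check of the shard sizing joal describes above (shards ideally 300–700 MB each, daily netflow data now around 12 GB): dividing the daily volume by the target shard size gives the right order of magnitude for the shard count. A minimal sketch using integer arithmetic (the 12 GB figure is from the chat; everything else follows from it):

```shell
# Back-of-the-envelope shard count for ~12 GB/day of netflow data,
# given Druid's advised shard size of 300-700 MB (numbers from the chat).
daily_mb=$((12 * 1024))                      # ~12 GB expressed in MB
echo "shards at 700 MB each: $((daily_mb / 700))"
echo "shards at 300 MB each: $((daily_mb / 300))"
```

That lands somewhere between roughly 17 and 40 shards, consistent with elukey's "~12?" guess later in the thread (which divides by about 1 GB per shard), and far from the single shard the daily job was producing.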
[13:11:20] again a patch about alignment elukey meh - sorry for that
[13:11:25] joal: puppet lint from jenkins might complain about timestamp_column =>
[13:11:28] ah ok :D
[13:11:51] it's puppet, people shouldn't really say sorry
[13:21:47] joal: still not aligned :(
[13:28:12] should be fixed now
[13:33:39] !log re-enable netflow hive2druid jobs after https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/602356/
[13:33:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:35:02] ok timers restarted
[13:35:21] if it all works fine we'll have to re-run daily from march onwards
[13:36:51] (PS1) Elukey: Upgrade to upstream version 1.24.0 [analytics/turnilo/deploy] - https://gerrit.wikimedia.org/r/602367 (https://phabricator.wikimedia.org/T253294)
[13:38:00] new turnilo version ready for testing/deploy
[13:38:07] we don't really have a test instance
[13:38:29] but I could add one to an-tool1005
[13:40:18] excuse me elukey - I'm in a meeting and didn't notice the ping :S
[13:41:29] joal: nono all good, already merged :)
[13:43:49] elukey: from 31 of march no?
[13:44:26] mforns: daily is not happening since March 6th
[13:44:32] from what I can see
[13:44:55] not sure how it works with sanitization
[13:45:01] since it is ~3 months ago
[13:45:18] (maybe I am missing something, we can bc if you have time)
[13:50:11] elukey: yes
[13:50:17] omw
[13:57:44] Wow - https://blogs.apache.org/foundation/entry/the-apache-software-foundation-announces64
[13:57:58] ottomata, milimetric, elukey --^
[13:58:52] elukey: out-of-meeting - How may I help with netflow now?
[13:59:02] joal: we are in bc
[13:59:09] Ah! joining
[14:16:46] Analytics, Analytics-Kanban: Move the Analytics infrastructure to Debian Buster - https://phabricator.wikimedia.org/T234629 (Ottomata) Ah yes! https://phabricator.wikimedia.org/T229347#5439259 PYSPARK_PYTHON=python3.7 pyspark2 --master yarn I added https://wikitech.wikimedia.org/wiki/Analytics/Syst...
[14:26:32] Analytics: Mediawiki History dumps unique editors feature request - https://phabricator.wikimedia.org/T254234 (Milimetric)
[14:34:20] Analytics, Analytics-Kanban: Move the Analytics infrastructure to Debian Buster - https://phabricator.wikimedia.org/T234629 (Miriam) Oh this is great, thanks so much @ottomata !
[14:42:28] elukey, mforns: I confirm the daily job has dropped hourly segments for the newly aggregated day
[14:43:34] nice!
[14:49:17] Analytics, Operations, Traffic: missing wmf_netflow data, 18:30-19:00 May 31 - https://phabricator.wikimedia.org/T254161 (elukey) The hole is now gone, but we discovered a major problem in T254383 :(
[14:49:36] Analytics, Operations, Traffic: missing wmf_netflow data, 18:30-19:00 May 31 - https://phabricator.wikimedia.org/T254161 (elukey) Open→Resolved
[15:00:36] ping ottomata , fdans
[15:02:29] ottomata: if you are changing that area of code, these are notes and tests i wrote around my own code to make it work with that: https://github.com/wikimedia/wikimedia-discovery-analytics/blob/master/airflow/tests/test_dag_structure.py#L66-L99
[15:02:49] elukey: task started!
[15:02:53] I have no clue why!
[15:02:57] :O
[15:03:47] djellel: you're using more than half the cluster - can you please downscale?
[15:04:49] djellel: looks like you've not set the maxExecutors configuration parameter
[15:05:15] elukey: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/602391/
[15:05:20] for after standup :)
[15:05:25] joal: will do it after the interview ok?
[15:05:31] sure elukey - no problem
[15:06:09] elukey: bad me would say that you can surely interview somebody AND restart AQS in a safer way than any of us - But I'll be good :)
[15:07:04] joal: "let's debug this problem that I am having in production, shall we?"
[15:07:11] :D
[15:09:25] djellel: I also think you need to change the spark default partition settings to something higher - You are trying to process 1.5T from 200 tasks - I think you could use 1024
[15:20:07] Analytics: Investigate why netflow hive_to_druid job is so slow - https://phabricator.wikimedia.org/T254383 (JAllemandou) Ack @mforns - No work needed from you @ayounsi then :)
[15:26:11] (PS1) Milimetric: Fix time range decay problem [analytics/wikistats2] - https://gerrit.wikimedia.org/r/602396 (https://phabricator.wikimedia.org/T253861)
[15:27:00] (CR) Milimetric: [V: +2 C: +2] Fix time range decay problem [analytics/wikistats2] - https://gerrit.wikimedia.org/r/602396 (https://phabricator.wikimedia.org/T253861) (owner: Milimetric)
[15:35:29] Analytics: Alarm when druid indexation fails - https://phabricator.wikimedia.org/T254493 (Nuria)
[15:45:36] Analytics: Alarm when druid indexation fails - https://phabricator.wikimedia.org/T254493 (Nuria) If these jobs are scheduled through a scheduler like oozie we know the status of the job because oozie will get it from yarn. Now, when they are scheduled through systemd timers we lose the ability to query yarn f...
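joal's two suggestions to djellel above (set a `maxExecutors` cap, and raise the shuffle partition count from the 200 default towards 1024 for a ~1.5 TB input) would translate to something like the following on the command line. The script name and the executor cap are illustrative placeholders, and the command is echoed rather than actually submitted:

```shell
# Sketch of the Spark tuning suggested above for a large (~1.5 TB) job.
# 'my_job.py' and the maxExecutors value are illustrative placeholders.
cmd="spark2-submit \
  --conf spark.dynamicAllocation.maxExecutors=64 \
  --conf spark.sql.shuffle.partitions=1024 \
  my_job.py"
echo "$cmd"
```

`spark.sql.shuffle.partitions` controls how many tasks each shuffle stage gets (the "200 tasks" joal observed is its default), while the dynamic-allocation cap keeps one user's job from taking over more than half the cluster.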
[15:45:57] Analytics: Alarm when druid indexation fails - https://phabricator.wikimedia.org/T254493 (Milimetric) p:Triage→High
[15:46:31] Analytics: Investigate why netflow hive_to_druid job is so slow - https://phabricator.wikimedia.org/T254383 (Milimetric) p:Triage→High a:elukey
[15:46:57] Analytics, Analytics-Kanban, Patch-For-Review: Temporarily remove hourly traffic alarms from analytics-alerts - https://phabricator.wikimedia.org/T254256 (Milimetric) p:Triage→High
[15:50:24] Analytics, MediaWiki-API: mostviewed generator not returning any results - https://phabricator.wikimedia.org/T254211 (Milimetric) Open→Invalid Looks like it's returning stuff to me, I see top pages with their pageviews for the past days, though I'm not super familiar with how this is queried from...
[15:50:36] Analytics, MediaWiki-API: mostviewed generator not returning any results - https://phabricator.wikimedia.org/T254211 (Milimetric) also, Hi Baha!
[15:58:37] Analytics: Mediawiki History dumps unique editors feature request - https://phabricator.wikimedia.org/T254234 (Milimetric) p:Triage→Medium We went over this in our prioritization and it's a bit more complicated. We'll focus on making this dataset available in a queryable fashion on our public APIs,...
[15:59:41] Analytics, Analytics-Kanban, Patch-For-Review: Order mediawiki_history dumps by event_timestamp - https://phabricator.wikimedia.org/T254233 (Milimetric) p:Triage→High
[16:00:22] Analytics, Analytics-Kanban: Table wmf_raw.mediawiki_imagelinks seems to be missing data - https://phabricator.wikimedia.org/T254188 (Milimetric) p:Triage→High
[16:03:40] Analytics, Analytics-Kanban, Analytics-Wikistats: permanent links in wikistats don't (always) work - https://phabricator.wikimedia.org/T254076 (Milimetric) p:Triage→High a:Milimetric Cool, lots of bugs here, thanks. The main problem seems to be the permalink doesn't pick up the graph typ...
[16:06:59] Analytics, Analytics-EventLogging, Beta-Cluster-Infrastructure, Wikimedia-production-error: [beta] EventLogging trying to fetch wrong Schema title - https://phabricator.wikimedia.org/T254058 (Milimetric) p:Triage→Low
[16:07:19] Analytics, Analytics-EventLogging, Analytics-Kanban, Beta-Cluster-Infrastructure, Wikimedia-production-error: [beta] EventLogging trying to fetch wrong Schema title - https://phabricator.wikimedia.org/T254058 (Milimetric)
[16:09:23] Analytics: Resetting Kerberos access for sguebo - https://phabricator.wikimedia.org/T254035 (Milimetric) p:Triage→High a:Ottomata Andrew's got your back
[16:10:27] Analytics: reset of burrow metrics for consumer group - https://phabricator.wikimedia.org/T254498 (hnowlan)
[16:11:47] Analytics: Fix TLS certificate location and expire for Hadoop/Presto/etc.. and add alarms on TLS cert expiry - https://phabricator.wikimedia.org/T253957 (Milimetric) p:Triage→High
[16:11:59] Analytics: Fix TLS certificate location and expire for Hadoop/Presto/etc.. and add alarms on TLS cert expiry - https://phabricator.wikimedia.org/T253957 (Milimetric) a:elukey
[16:13:10] Analytics, Analytics-Kanban, Operations: Increase memory available for an-launcher1001 - https://phabricator.wikimedia.org/T254125 (Milimetric) Can we re-enable reportupdater on the machine now?
[16:18:30] Analytics, Operations, netops: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (Milimetric) p:Medium→High
[16:30:13] Analytics: Mediawiki History dumps unique editors feature request - https://phabricator.wikimedia.org/T254234 (marcmiquel) Oh, I see. But these seem two different things. The API will increase the use of some data contained in this dataset. But making it queryable does not mean that it will become easy to ob...
[16:47:59] Analytics, Analytics-Kanban: Spike, see how easy/hard is to scoop all tables from Eventlogging log database - https://phabricator.wikimedia.org/T250709 (Milimetric) Ok, victory is ours! All tables with data (more than 0 rows) from the `log` database have been sqooped to hdfs here: `hdfs dfs -ls /wmf/da...
[16:57:32] joal: aqs1004 ready to test!
[16:58:09] Analytics, Analytics-Kanban: Spike, see how easy/hard is to scoop all tables from Eventlogging log database - https://phabricator.wikimedia.org/T250709 (elukey) WWWOOOOOWWWW!!!
[17:10:57] (CR) Nuria: "I think there is a typo that needs to be removed" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/602343 (https://phabricator.wikimedia.org/T254233) (owner: Mforns)
[17:12:16] (CR) Mforns: Sort mediawiki history dumps by timestamp (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/602343 (https://phabricator.wikimedia.org/T254233) (owner: Mforns)
[17:19:09] Analytics, Analytics-Kanban: Spike, see how easy/hard is to scoop all tables from Eventlogging log database - https://phabricator.wikimedia.org/T250709 (Nuria) I do not understand where the Edit_10676603 schema is being read from to create the table?
[17:26:55] milimetric: I'm having problems when npm install wikistats. Just wondering if it happens to you as well, if you rm -r node_modules semantic ?
[17:27:31] mmmm, I think it's non-related to changes... just me
[17:27:53] mforns: if you remove semantic you have to build it
[17:28:00] yea yea
[17:28:16] but are you having other problems?
[17:28:21] (CR) Nuria: [C: +1] Sort mediawiki history dumps by timestamp (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/602343 (https://phabricator.wikimedia.org/T254233) (owner: Mforns)
[17:29:11] milimetric: yes, probably npm version hell, don't worry :]
[17:29:56] mforns: could be very real, keep me in the loop if it doesn't go away
[17:30:05] sure!
[17:48:30] Analytics: Resetting Kerberos access for sguebo - https://phabricator.wikimedia.org/T254035 (Ottomata) a:Ottomata→elukey Actually, I was going to do this but I'm not exactly sure how! Assigning to luca
[17:54:56] ottomata: I usually just delete the principal and re-create it
[17:55:03] so the user gets the tmp pass etc..
[18:04:18] joal: will check later on :)
[18:11:02] excuse me elukey I was gone for dinner
[18:11:06] testing aqs right now
[18:11:54] Ok all good for me elukey - green light when you want (can even be tomorrow)
[18:35:25] (PS1) Mforns: Release 2.7.5 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/602449
[18:36:33] milimetric: I solved my problems with npm, had to upgrade npm, upgrade gulp and downgrade node O.o
[18:36:47] woah
[18:36:53] milimetric: anyway, I created the release patch and am about to merge
[18:37:00] thx!
[18:37:33] (CR) Mforns: [C: +2] Release 2.7.5 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/602449 (owner: Mforns)
[18:38:46] (Merged) jenkins-bot: Release 2.7.5 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/602449 (owner: Mforns)
[18:39:22] !log deployed wikistats2 2.7.5
[18:39:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:39:46] joal: if you are there , do you know if i need more than: os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /srv/deployment/analytics/refinery/artifacts/refinery-job.jar'
[18:39:55] joal: to read avro from the jupyter notebook?
[18:40:33] nuria: I don't know for python - I've done it for spark by creating a different kernel
[18:41:11] if it works for python then it's ok :)
[18:41:20] joal: i see, creating a kernel to which the jars are passed upon startup?
[18:41:25] correct nuria
[18:44:13] nuria: if needed I can help with setting that up
[18:44:55] joal: no need to spend time on that , i will try for a bit and otherwise things work fine on pyspark
[18:45:02] joal: on command line
[18:52:59] ack nuria
[18:54:01] Analytics, Analytics-Kanban: Spike, see how easy/hard is to scoop all tables from Eventlogging log database - https://phabricator.wikimedia.org/T250709 (Nuria) ok, note to self, i forgot that in avro, the schema and data are together, duh
[18:58:49] (PS1) Ottomata: DataFrameToHive - drop partition before writing output data [analytics/refinery/source] - https://gerrit.wikimedia.org/r/602463
[18:59:14] (PS3) Ottomata: Refine - Make event transform functions smarter about choosing which possible column to use [analytics/refinery/source] - https://gerrit.wikimedia.org/r/601865
[18:59:36] (CR) Ottomata: "ok Joal, tested and ready for review!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/601865 (owner: Ottomata)
[19:00:38] (CR) Ottomata: "I encountered this while testing Refine on my own tables...one of them wasn't EXTERNAL and while Refine was successful, I didn't have any " [analytics/refinery/source] - https://gerrit.wikimedia.org/r/602463 (owner: Ottomata)
[19:03:35] (CR) Joal: [C: +1] "Look good to me! Merge when you want. Thanks mforns" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/602343 (https://phabricator.wikimedia.org/T254233) (owner: Mforns)
[19:12:27] !log roll restart of aqs to pick up new druid settings
[19:12:28] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:15:10] joal: done!
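On nuria's `PYSPARK_SUBMIT_ARGS` question above: one likely culprit (an assumption here, since the chat does not show her full setup) is that when the JVM is launched from a plain Python process such as a notebook kernel, the variable must end with the literal `pyspark-shell` token, or the extra `--jars` are ignored. The jar path below is the one quoted in the chat:

```shell
# Assumption: the notebook kernel starts the JVM itself via
# PYSPARK_SUBMIT_ARGS; the trailing 'pyspark-shell' token is then
# required for the --jars option to be picked up.
export PYSPARK_SUBMIT_ARGS='--jars /srv/deployment/analytics/refinery/artifacts/refinery-job.jar pyspark-shell'
echo "$PYSPARK_SUBMIT_ARGS"
```

Setting this before the kernel starts (or before `os.environ[...]` is read by the first SparkSession) is equivalent to joal's suggestion of a dedicated kernel to which the jars are passed on startup.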
[19:15:32] \o/ thanks elukey - checking UI
[19:16:35] looks good
[19:16:43] gooood
[19:17:43] thanks elukey :)
[19:19:06] np! afk again :)
[19:37:03] milimetric: wikistats2 2.7.5 is in prod, can you check the fix worked, please?
[20:01:29] (PS1) Ottomata: Add EvolveHiveTable CLI tool to manually evolve Hive tables from JSONSchemas [analytics/refinery/source] - https://gerrit.wikimedia.org/r/602475 (https://phabricator.wikimedia.org/T238230)
[20:05:26] joal: ok
[20:05:35] all my refinery source stuff is ready for review!
[20:05:36] https://gerrit.wikimedia.org/r/q/owner:ottomata+project:analytics%252Frefinery%252Fsource+status:open
[20:07:10] ack ottomata :)
[20:07:28] ottomata: will review that tomorrow first thing (early afternoon normally)
[20:07:47] thank you!
[20:08:31] i've got some WIP in java code too for the canary events and stream config integration, probably will have that ready next week sometime
[20:08:38] so many moving pieces!
[20:09:21] indeed! Particularly when you wish everything to fall into place like a clock :)
[20:24:57] (CR) Mforns: [C: +2] Sort mediawiki history dumps by timestamp [analytics/refinery/source] - https://gerrit.wikimedia.org/r/602343 (https://phabricator.wikimedia.org/T254233) (owner: Mforns)
[20:26:37] looks great mforns, tested on mobile and desktop as well as I could and it all seems ok
[20:28:19] great milimetric thanks!
[20:30:47] (Merged) jenkins-bot: Sort mediawiki history dumps by timestamp [analytics/refinery/source] - https://gerrit.wikimedia.org/r/602343 (https://phabricator.wikimedia.org/T254233) (owner: Mforns)
[21:15:59] (PS1) Milimetric: Fix permalink logic [analytics/wikistats2] - https://gerrit.wikimedia.org/r/602497 (https://phabricator.wikimedia.org/T254076)
[21:17:36] Analytics, Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: permanent links in wikistats don't (always) work - https://phabricator.wikimedia.org/T254076 (Milimetric) @fdans / @mforns / @Nuria: review whenever you have time, seems solid. I still think we need a bigger refactor to make sur...
[23:36:29] Analytics, MediaWiki-API: mostviewed generator not returning any results - https://phabricator.wikimedia.org/T254211 (bmansurov) o/ @Milimetric Thanks for looking into it. It's working for me too.
[23:38:03] Analytics, MediaWiki-API: mostviewed generator not returning any results - https://phabricator.wikimedia.org/T254211 (bmansurov) Hmm, it's still not working on French Wikipedia: https://fr.wikipedia.org/w/api.php?action=query&generator=mostviewed&prop=pageviews This is what I get: ` { "batchcomplet...