[00:44:13] PROBLEM - Check the last execution of refinery-import-page-history-dumps on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:45:25] PROBLEM - Check the last execution of archive-maxmind-geoip-database on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:45:49] PROBLEM - Check the last execution of reportupdater-browser on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:54:47] RECOVERY - Check the last execution of refinery-import-page-history-dumps on stat1007 is OK: OK: Status of the systemd unit refinery-import-page-history-dumps https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:55:57] RECOVERY - Check the last execution of archive-maxmind-geoip-database on stat1007 is OK: OK: Status of the systemd unit archive-maxmind-geoip-database https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:56:19] RECOVERY - Check the last execution of reportupdater-browser on stat1007 is OK: OK: Status of the systemd unit reportupdater-browser https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:12:04] morning!
[06:18:20] o/
[06:57:28] !log drop wmf_netflow from Analytics druid and restart the job with more dimensions
[06:57:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:58:01] the backfill is probably not needed in my opinion, since we had partial data up to a couple of days ago (some routers were not configured correctly)
[06:58:45] one thing that I noticed is that the hourly job calculates data with the range of -6h -> -5h before the current date
[06:59:22] but it could be different? see https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/531046/
[06:59:26] I'll wait for mforns :)
[06:59:35] (I am probably saying something silly)
[06:59:49] the current "issue" is that data in Turnilo takes a while to show up
[06:59:54] (recent data I mean)
[07:00:20] the good solution is to add real-time indexation :)
[07:00:35] buuut for the time being maybe having less lag is better too?
[07:03:12] brb
[07:42:00] Good morning :)
[07:44:48] o/
[07:45:17] elukey: how shall we start with Kerb?
[07:46:02] joal: in theory from https://wikitech.wikimedia.org/wiki/User:Elukey/Analytics/Hadoop_testing_cluster
[07:46:12] then basically coming up with things to test etc..
[07:46:22] I added notes about all the problems that we found
[07:47:19] ok elukey - will read and try to come up with whatever I think could be useful (possibly just a test plan :)
[07:47:30] ah, and possibly getting an account
[07:47:44] I don't think I have sent you the email with the tmp password for the account
[07:48:00] elukey: I don't know if accounts have been renewed but I had one before I left
[07:48:16] ah good
[07:48:25] just checked, yes
[07:48:34] we now have a script that sends an email with the tmp pass
[07:48:39] that you have to change upon first login
[07:48:43] so everything is automated
[07:48:52] ok
[07:49:16] ideally before enabling kerberos we'd need to roll out spark2.4
[07:49:30] so we could start from that?
[07:50:03] roll out meaning making sure everything works with it?
[07:50:18] no sorry, backtrack: spark2.3 with buster compatibility first
[07:50:19] https://phabricator.wikimedia.org/T229347
[07:50:43] stat1005 is basically ready to go, with buster
[07:50:54] Andrew left a package to roll out
[07:51:05] that we could do together, I feel a bit more comfortable :)
[07:51:10] (we can roll back if anything happens)
[07:51:22] no problem for me :)
[07:52:22] there is also https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/AMD_GPU that is interesting
[07:52:25] if you didn't see it
[07:52:28] now everything works :)
[07:53:18] joal: need to go out for an errand, 10/15 mins, will be back and roll out
[07:53:24] elukey: please :)
[07:53:30] should we drain spark jobs first? Not sure if needed
[07:53:31] I'll probably have questions for you
[07:54:41] (brb)
[08:06:19] Analytics, Analytics-Wikistats, Performance-Team: Piwik JS isn't cached - https://phabricator.wikimedia.org/T230772 (Gilles)
[08:06:32] Analytics, Analytics-Wikistats, Performance-Team: Piwik JS isn't cached - https://phabricator.wikimedia.org/T230772 (Gilles)
[08:09:40] back :)
[08:10:18] Hey :)
[08:10:32] elukey: I have a bunch of questions - shall we talk in da cave?
[08:10:36] sure!
[08:55:58] joal: does it look reasonable? https://etherpad.wikimedia.org/p/elukey-netflow
[08:59:05] Analytics, Analytics-Wikistats, Performance-Team: Piwik JS isn't cached - https://phabricator.wikimedia.org/T230772 (elukey) Nice catch, even if I was convinced that the static contents were cached by Varnish 24h if no Cache-Control (or similar) headers were found.. Maybe we have a specific pass pol...
[09:01:46] Analytics, Analytics-Wikistats, Performance-Team: Piwik JS isn't cached - https://phabricator.wikimedia.org/T230772 (elukey) @ema hi :) Are response headers like Cache-Control used by Varnish in case `caching: 'pass'` is configured?
[09:14:16] https://turnilo.wikimedia.org/#test_wmf_netflow
[09:14:18] \o/
[09:15:51] wow it is amazing
[09:17:57] so I guess that if it keeps working, I can just
[09:18:02] 1) kill the realtime indexation
[09:18:11] 2) restart it with the datasource 'wmf_netflow'
[09:18:22] since hourly/daily indexations will override the realtime data, right?
[09:19:20] (brb)
[09:21:28] seems very correct elukey :)
[09:28:18] Analytics, Product-Analytics, Reading Depth, Readers-Web-Backlog (Needs Product Owner Decisions): Reading_depth remove eventlogging instrumentation? - https://phabricator.wikimedia.org/T229042 (phuedx)
[09:38:20] joal: \o/ proceeding!
[09:38:28] elukey:
[09:38:41] elukey: I'm thinking of task time in relation to segment size
[09:39:08] elukey: can we leave the test job running for some time before proceeding, so that I get a better understanding?
[09:39:19] joal: ah sure, of course!
[09:42:04] elukey: I wonder about task duration being PT10M and segment granularity being 1H
[09:42:52] elukey: I have also found new information on segment optimization: we should take row count into account, not size
[09:43:59] finally elukey, the current Druid docs show an awesome UI - do we have plans to upgrade one of these days?
[09:44:04] * joal runs fast
[09:48:13] joal: we can definitely schedule one for next quarter!
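
For the record, the kill-and-restart plan above maps onto the overlord's supervisor API roughly as follows - a sketch only, run against the overlord on localhost:8090 as in the curl commands later in this log; the spec file name is hypothetical, and whether the stop endpoint is terminate or shutdown depends on the Druid version (as it turns out just below):

    # list the currently running supervisors
    curl localhost:8090/druid/indexer/v1/supervisor
    # stop the test supervisor (older Druid versions only have /shutdown)
    curl -X POST localhost:8090/druid/indexer/v1/supervisor/test_wmf_netflow/terminate
    # resubmit the spec, now pointing at the prod datasource
    curl -X POST -H 'Content-Type: application/json' \
         -d @wmf_netflow_supervisor.json \
         localhost:8090/druid/indexer/v1/supervisor
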
[09:48:51] elukey: I think having task duration lasting longer than segment granularity is better - having it last 10M makes no sense, and I think they actually last 1H
[09:49:13] ah sure, we can kill/change it
[09:49:51] elukey: Let's use PT6H for task duration - the only advantage of having small tasks is in case of error
[09:49:58] ack
[09:56:33] mmm so curl localhost:8090/druid/indexer/v1/supervisor/test_wmf_netflow/terminate -X POST -i doesn't seem to be working
[09:56:50] elukey: I think you need to wait for the task to finish
[09:57:38] not sure, since it tells me 404
[09:57:45] Ah - indeed
[09:57:59] hm
[09:58:18] https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html is maybe for the latest upstream
[10:02:25] elukey: Worked for me with shutdown instead of terminate
[10:02:26] ah no, now it is gone
[10:02:29] ah!
[10:02:40] where did you find shutdown?
[10:02:41] elukey: probably not as clean though
[10:02:49] later https://druid.apache.org/docs/latest/operations/api-reference.html#supervisors
[10:03:09] ahahhah
[10:03:10] okok
[10:03:18] shall I restart it with PT6H?
[10:03:18] elukey: shutdown is being deprecated, so it's probably our thing :)
[10:03:47] elukey: please - you can actually restart with the prod datasource if you wish
[10:04:10] nice
[10:04:34] of course I didn't do it since I am stupid
[10:04:38] anyway, will kill again
[10:04:42] huhu
[10:05:28] ok done :)
[10:11:24] interesting, https://turnilo.wikimedia.org/#wmf_netflow seems zero
[10:13:38] but events are processed - https://grafana.wikimedia.org/d/000000538/druid?refresh=1m&orgId=1&var-datasource=eqiad%20prometheus%2Fanalytics&var-cluster=druid_analytics&var-druid_datasource=wmf_netflow&from=now-1h&to=now&panelId=41&fullscreen
[10:14:43] even more interesting
[10:14:44] ssh -L 9091:an-tool1007.eqiad.wmnet:80 an-tool1007.eqiad.wmnet
[10:14:51] it is different with the new version of Turnilo :D
[10:15:37] ahhh wait, count is zero
[10:15:53] okok now I can see the issue
[10:16:01] with realtime indexation, only the count measure is there
[10:16:04] for some reason
[10:16:17] meanwhile in the new version of turnilo all of them are there
[10:16:23] but count is still zero
[10:16:57] and in fact I haven't added the count measure to realtime indexation
[10:19:03] restarted
[10:19:33] working!
[10:19:41] but all the measures are only in the new turnilo
[10:23:31] elukey: how come they are only in the new one?
[10:27:02] elukey: I can see data in the new one
[10:27:08] the old one I mean, sorry
[10:29:53] Analytics, Analytics-Kanban: Wikistats: month on dashboard changes on any redraw - https://phabricator.wikimedia.org/T230514 (fdans) a: Milimetric→fdans
[10:32:27] joal: there is data but I can only see the 'count' measure in the old one
[10:32:39] elukey: ah yes - so can I
[10:32:59] didn't we have to specify the measures in turnilo for realtime banner impressions?
[10:33:03] (PS1) Fdans: Transition data rows to using time ranges instead of timestamps [analytics/wikistats2] - https://gerrit.wikimedia.org/r/531148 (https://phabricator.wikimedia.org/T230514)
[10:33:06] elukey: do we expect other measures?
[10:33:26] yeah, if you check in the new turnilo there is 'packets' and 'bytes'
[10:33:35] hm
[10:33:50] elukey: turnilo restart needed, or manual config update?
[10:34:18] joal: I think that we need to upgrade turnilo :D
[10:34:27] huhuhu :)
[10:34:38] ok :)
[10:34:50] gone errand, will be back in a bit
[10:34:54] o/
[10:36:09] lunch for me!
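
For reference, the knobs discussed above (task duration, segment granularity, and the count/bytes/packets measures) all live in the supervisor spec. A minimal sketch of what the hypothetical wmf_netflow_supervisor.json could contain, following the field layout of the Kafka ingestion docs linked above - the topic name, query granularity, and the netflow field names behind 'bytes' and 'packets' are assumptions, not taken from the real spec:

    {
      "type": "kafka",
      "dataSchema": {
        "dataSource": "wmf_netflow",
        "metricsSpec": [
          {"type": "count",   "name": "count"},
          {"type": "longSum", "name": "bytes",   "fieldName": "bytes"},
          {"type": "longSum", "name": "packets", "fieldName": "packets"}
        ],
        "granularitySpec": {"segmentGranularity": "HOUR", "queryGranularity": "MINUTE"}
      },
      "ioConfig": {"topic": "netflow", "taskDuration": "PT6H"}
    }
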
[12:08:35] hm - something is wrong with the netflow supervisor elukey - let's check when you get back
[12:11:52] I am here
[12:12:56] joal: what is wrong?
[12:14:01] empty segments for hours 11 and 12 elukey :(
[12:14:57] more info?
[12:15:00] :)
[12:15:18] nope, didn't look more
[12:15:39] I think the restart of the task (or the kill of the previous one) didn't succeed
[12:16:11] BUT - there is something I don't understand
[12:16:18] turnilo shows data!!
[12:17:32] It must be related to tasks not handing off segments as fast as previously - so segments don't show up, but the data is here
[12:17:39] Let's wait for more
[12:17:42] ack
[12:17:44] :)
[12:17:44] sorry for the ping eluke
[12:17:48] nono please!
[12:17:54] I was trying to understand :)
[12:18:06] I am trying to add jupyterhub to an-tool1006
[12:18:13] but it fails in weird ways
[12:18:15] lovely
[12:18:15] :D
[12:18:24] :S
[12:25:23] something must have changed since we first set it up on the notebooks
[12:25:42] elukey: buster?
[12:26:41] no no, it is stretch - I think I found the issue
[12:26:55] there is a symbolic link that should be in puppet (probably)
[12:27:23] working!
[12:27:24] :)
[12:27:31] ssh -N an-tool1006.eqiad.wmnet -L 8000:127.0.0.1:8000
[12:28:04] sort of, cannot login :
[12:28:05] :P
[12:28:29] User Elukey not in allowed groups (analytics-admins)
[12:28:31] looool
[12:28:34] joal: --^
[12:29:57] but it should work for you
[12:30:21] will try elukey
[12:30:47] elukey: another question - no more webrequest after august 16th - is it expected?
[12:31:27] nope, something broke
[12:31:32] Arf
[12:31:40] I think it was the last deployment
[12:31:47] hm
[12:32:10] notebook ok for me elukey (meaning, I'm in, will test later)
[12:32:21] super
[12:34:09] JA008: File does not exist: hdfs://analytics-test-hadoop/user/oozie/share/lib/lib_20190627073559/hive2/libfb303-0.9.3.jar
[12:34:20] wow
[12:34:55] when oozie complains about its own lib, I'm afraid
[12:35:06] there is /user/oozie/share/lib/lib_20190809093929
[12:35:33] but it is strange, it seems like a pruning of some sort
[12:37:02] so I think that it happened when Andrew installed spark2
[12:37:14] hm
[12:37:29] you mean spark2 for buster?
[12:37:42] no no, we don't have buster in there
[12:37:47] ah sorry
[12:37:49] the new version that works with both
[12:37:53] right
[12:38:52] probably not, the new version is 2.3.1-bin-hadoop2.6-4
[12:38:55] and I don't see it
[12:39:56] we have a thing called /usr/local/bin/spark2_oozie_sharelib_install
[12:40:04] that puppet sometimes executes
[12:40:52] but only if hdfs://analytics-test-hadoop/user/oozie/share/lib/lib_etc.. is not present
[12:41:01] so what I am wondering is if during a test it was removed
[12:41:04] and puppet re-created it
[12:42:12] but in this case it was
[12:42:12] drwxr-xr-x - oozie hadoop 0 2019-08-09 09:40 /user/oozie/share/lib/lib_20190809093929
[12:42:39] weird
[12:42:57] it feels as if the lib had not been registered by oozie
[12:45:10] I have restarted oozie and re-run the last failed hour, it seems to be passing the add_partition step
[12:46:51] joal: I think it was due to Andrew's testing, seems to be a one-off weird testing issue, never happened before
[12:46:59] possible
[12:47:12] I'll restart all the failed hours when we log off so it will not use all the cluster's resources :)
[12:47:25] (I meant now, so you can test :)
[12:47:49] makes sense - thanks elukey :)
[12:47:53] thank you!
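
On the sharelib issue above: the mismatch between what sits on HDFS and what Oozie has actually registered can be inspected, and usually fixed without a full restart, roughly like this - a sketch only, and the Oozie server URL here is an assumption:

    # compare the lib_<timestamp> dirs on HDFS with what Oozie has registered
    hdfs dfs -ls /user/oozie/share/lib
    oozie admin -oozie http://localhost:11000/oozie -shareliblist hive2
    # re-register the newest lib_<timestamp> directory without restarting Oozie
    oozie admin -oozie http://localhost:11000/oozie -sharelibupdate
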
[12:50:46] elukey: something else I have noticed: the spark2 folder in our cluster doesn't contain the examples anymore :(
[12:51:05] While it's no big deal in itself, it's useful for running tests
[12:51:48] wasn't aware of it, can you give me the path?
[12:51:54] I can check if it should be there or not
[12:52:34] on analytics1031, there is /usr/lib/spark for old spark1.6 and /usr/lib/spark2 for new spark - the former contains an examples folder
[12:53:21] elukey: --^
[12:53:22] helllooo team europeee
[12:53:28] holaaaa
[12:53:41] Hey! Good mornfternoon nuria :)
[12:53:50] holaaa joal !!!
[12:54:17] good to read nuria :) I hope you enjoyed the holidays (I certainly did :)
[12:55:46] Analytics-Kanban, Product-Analytics, Patch-For-Review: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (Nuria) >Unless I'm very much mistaken, the described system will make it possible to determine the country of specific editors, in...
[12:58:47] Analytics-Kanban, Product-Analytics, Patch-For-Review: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (Nuria) @Milimetric I think for our first release we can remove all countries mentioned on the surveillance report, we can work on...
[13:21:24] Analytics, Operations: Access to HUE for Mayakpwiki - https://phabricator.wikimedia.org/T229143 (Nuria) a: JAllemandou
[13:30:02] Analytics: Unable to access SWAP notebooks using LDAP - https://phabricator.wikimedia.org/T230627 (elukey) @cchen I'd suggest at this point to try to reset your password to something else, so we will see if it helps or not. From the logs' point of view it seems that the wrong password is inserted, but then not...
[13:33:46] a-team: if anybody of you has time please test superset/turnilo during these days :)
[13:34:07] (brb)
[13:34:24] hm - just noticed - we need to restart the webrequest-load bundle - the last restart has been done on the default queue instead of the production one
[13:42:41] elukey: i can test!
[13:42:49] elukey: on the staging host?
[13:46:32] joal: ah snap, didn't check - it was part of the hive2 actions move
[13:46:43] nuria: there's info in the emails, but yes :)
[13:46:59] an-tool1005 for superset, an-tool1007 for turnilo
[13:53:54] Analytics, Reading Depth: Publish aggregated reading time dataset - https://phabricator.wikimedia.org/T230642 (Nuria) >Hi Nuria. I'm proposing to start with a one-off release that I can handle easily. Sounds good, a one-off release is just a file with data on a public folder, no pipeline of any sort is...
[13:58:30] Analytics, Reading Depth: Publish aggregated reading time dataset - https://phabricator.wikimedia.org/T230642 (Nuria) Documenting any caveats with the data is important - for example, in a multi-tab browsing situation is this data of quality? If I open two tabs with wikipedia content, does the data take into...
[13:59:53] joal: thanks again for helping with DataGrip yesterday! I've documented the setup process here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Access#DataGrip
[14:07:12] Analytics, Product-Analytics, Reading Depth, Readers-Web-Backlog (Needs Product Owner Decisions): Reading_depth remove eventlogging instrumentation? - https://phabricator.wikimedia.org/T229042 (Nuria) @phuedx I vote also for disabling the instrumentation, can we use this ticket for this purpose?
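
Regarding the missing spark2 examples mentioned at 12:50 above: if the examples folder were shipped alongside /usr/lib/spark2, a typical smoke test would look roughly like this - a sketch only, the exact jar path is an assumption based on the layout described in the conversation:

    # assuming the examples jar shipped under /usr/lib/spark2/examples/jars/
    spark2-submit --master yarn \
        --class org.apache.spark.examples.SparkPi \
        /usr/lib/spark2/examples/jars/spark-examples*.jar 100
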
[14:22:38] Analytics, Analytics-Kanban, Patch-For-Review: Tune Wikistats 2 Varnish caching - https://phabricator.wikimedia.org/T230136 (Nuria) Open→Resolved
[14:22:52] hey alllll :]
[14:24:36] holaaa mforns
[14:25:03] hey nuria! welcome back
[14:25:15] grasias mforns
[14:30:47] Analytics-Kanban: Upgrade superset to 0.34 - https://phabricator.wikimedia.org/T230416 (Nuria) @nuria to test superset
[14:35:13] Analytics, Product-Analytics: Streamline Superset signup and authentication - https://phabricator.wikimedia.org/T203132 (Nuria) >caveat is that the email of the user created will be $uid@email.notfound I do not think that is a problem, I really cannot think of a superset feature that requires a true e-m...
[14:46:08] welcome back nuria!
[14:46:23] gracias bearloga
[14:55:39] (CR) Nuria: Add mediarequests hourly oozie job (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/529911 (https://phabricator.wikimedia.org/T229817) (owner: Fdans)
[14:58:11] nuria: thanks for taking a look - that file is just for the record, about how to backfill from the old mediacounts dataset, and I've changed the query substantially. I probably shouldn't have added it with this change, will post a change deleting it or updating it once I've backfilled from mediacounts
[14:59:08] hola fdans, I see - having a record of how the backfilling was done is helpful, so updating sounds good
[15:13:25] Analytics, Operations: Access to HUE for Mayakpwiki - https://phabricator.wikimedia.org/T229143 (Nuria) Assigning to @joal who has ops duty this week https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Access#Admin_Instructions_to_sync_a_Hue_account
[15:26:17] Analytics, Performance-Team, Research, Security-Team, WMF-Legal: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (Nuria) Let's see, this dataset has no page info, nor timestamps - is that correct?
[15:38:50] Analytics, Performance-Team, Research, Security-Team, WMF-Legal: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (Gilles) My bad. We probably want timestamp as well, but it can be very coarse (rounded to the hour is fine)...
[15:40:23] Analytics, Performance-Team, Research, Security-Team, WMF-Legal: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (Gilles)
[15:49:45] Analytics, Performance-Team, Research, Security-Team, WMF-Legal: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (Nuria) @Gilles it would be good to shift timestamps so this data cannot be linked (or rather, obviously lin...
[15:53:09] Analytics, Performance-Team, Research, Security-Team, WMF-Legal: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (Gilles) Sure, we can shift the timestamps by an arbitrary amount. It would still prove lack of temporal cor...
[15:53:51] Thanks for the great doc bearloga !
[15:54:18] s mforns
[15:54:30] oops - wrong paste mforns
[15:54:36] heh
[15:55:16] s mforns
[15:55:21] xD
[15:55:32] Mwarf - I need to get back to using computers I guess
[15:58:11] hehehe
[16:09:39] where can i query kafka APIs from?
I can't seem to find any hosts that have things like `kafka-consumer-groups` from the kafka release installed
[16:09:57] only kafkacat, which while nice is missing things (like reporting the offset of a consumer group)
[16:11:13] ebernhardson: usually those are on the kafka hosts themselves, which we limit to ops
[16:11:15] hi ebernhardson - we're in standup; I'd say kafka-jumbo100X but elukey should confirm
[16:11:24] yes :)
[16:11:33] elukey: hmm, isn't that a bit of fake security? They just talk to the kafka APIs that anyone else can
[16:11:47] i mean, i could go download the appropriate deb, unpack it, copy the binaries anywhere, and they will talk to kafka
[16:11:54] ebernhardson: nono, I mean access to the hosts themselves, not talking about api security :)
[16:11:55] (but i won't, because we agreed not to do that :P)
[16:12:29] elukey: ahh, so we just don't install those anywhere else?
[16:12:37] stuff like kafka-consumer-groups is contained in the confluent kafka package IIRC, so we don't install it everywhere
[16:12:41] yes exactly
[16:12:54] but if you use stuff like kafka-python etc.. you can easily get that info
[16:12:59] but you'll need to code a bit :(
[16:13:07] if you need a one-off I can grab data for you
[16:14:23] elukey: i'm trying to see if jumbo-eqiad is correctly tracking the cirrussearch_updates_eqiad consumer group. I added a consumer, it says it joined the group, produced a message and .. the daemon doesn't report receiving anything
[16:14:55] it's for TopicPartition(topic='eqiad.swift.search_glent.upload-complete', partition=0)
[16:15:16] but the grafana dashboard for consumer lag reports no consumers for that topic, which makes me suspicious...
[16:15:39] ah I was about to check it
[16:17:07] ebernhardson: just to triple check, if you use kafkacat you can see the message sent, right?
[16:17:14] elukey: yup
[16:18:31] this daemon was recently re-written so maybe i mucked something up, but the logging from the python kafka consumer looks like it thinks it's talking to the right cluster and listening to the right partitions so... it's odd
[16:19:27] ebernhardson: kafka consumer-groups --list from kafka-jumbo1001 shows cirrussearch_updates_eqiad
[16:20:01] lemme see if I can find its status
[16:20:04] elukey: and the committed offset?
[16:20:08] should be either 0, 1 or 2
[16:22:07] so kafka consumer-groups --describe --group cirrussearch_updates_eqiad doesn't show me anything :D
[16:22:13] empty
[16:23:19] hmm, so indeed something weird going on somewhere :) The daemon also thinks it hasn't seen any messages, but i produced a new message at 15:57, after starting up the daemon and seeing it connect in the logs at about 15:55
[16:23:23] :S
[16:25:08] it's gotta be on my daemon's side somewhere... will look into what it's doing and add more logging
[16:27:02] ack, let me know if I can hel
[16:27:04] *help
[16:42:27] Analytics: Unable to access SWAP notebooks using LDAP - https://phabricator.wikimedia.org/T230627 (cchen) @elukey i updated the password and it's working! i am able to log into SWAP now. Thanks again!
[16:44:13] mayakpwiki: Hi! Would you mind trying to log in to hue.wikimedia.org?
[16:46:20] Analytics: Unable to access SWAP notebooks using LDAP - https://phabricator.wikimedia.org/T230627 (elukey) Open→Resolved a: elukey Good!
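
As mentioned above, kafka-python can fetch the same info with a bit of code, but for one-off checks the CLI tools on the brokers are quicker. A sketch of the checks performed in this debugging session - assuming the usual jumbo broker and port, and that the confluent package's kafka-consumer-groups is on the PATH (on the brokers it is wrapped as `kafka consumer-groups`):

    # list groups and inspect committed offsets (run on a jumbo broker)
    kafka-consumer-groups --bootstrap-server kafka-jumbo1001.eqiad.wmnet:9092 --list
    kafka-consumer-groups --bootstrap-server kafka-jumbo1001.eqiad.wmnet:9092 \
        --describe --group cirrussearch_updates_eqiad
    # kafkacat confirms messages are on the topic, but can't report group offsets
    kafkacat -C -b kafka-jumbo1001.eqiad.wmnet:9092 \
        -t eqiad.swift.search_glent.upload-complete -o beginning -e
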
[17:22:40] Analytics, Performance-Team, Research, Security-Team, WMF-Legal: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (Nuria) >Another possibility is to only keep the ruwiki data, which has by far the largest traffic during th...
[17:23:31] Analytics, Performance-Team, Research, Security-Team, WMF-Legal: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (Nuria) FYI that released one-off datasets get documented in meta like, for example: https://meta.wikimedia....
[17:24:37] Analytics, Performance-Team, Research, Security-Team, WMF-Legal: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (Gilles)
[17:46:24] Analytics, Product-Analytics, Reading Depth, Patch-For-Review, Readers-Web-Backlog (Needs Product Owner Decisions): Reading_depth: remove eventlogging instrumentation - https://phabricator.wikimedia.org/T229042 (Jdlrobson)
[17:49:36] Analytics, Product-Analytics, Reading Depth, Patch-For-Review, Readers-Web-Backlog (Needs Product Owner Decisions): Reading_depth: remove eventlogging instrumentation - https://phabricator.wikimedia.org/T229042 (ovasileva) +1 on disabling for now and keeping the dataset.
[19:01:24] Analytics, Operations: Access to HUE for Mayakpwiki - https://phabricator.wikimedia.org/T229143 (JAllemandou) Action has been taken that should have granted access to shell username `Mayakpwiki`. @Mayakp.wiki can you test please? :)
[19:28:57] Analytics, Operations: Access to HUE for Mayakpwiki - https://phabricator.wikimedia.org/T229143 (Mayakp.wiki) Checked connection and ran queries against mediawiki history. Access is working as expected. Thanks @JAllemandou and @Nuria for your help !
[19:38:16] Analytics, Operations: Access to HUE for Mayakpwiki - https://phabricator.wikimedia.org/T229143 (JAllemandou) Open→Resolved
[20:04:39] (PS4) Mforns: [WIP] Add Oozie job for mediawiki history dumps [analytics/refinery] - https://gerrit.wikimedia.org/r/530002 (https://phabricator.wikimedia.org/T208612)
[20:05:24] (CR) Mforns: [C: -2] "Finally, this seems to work! But still need to write the README." [analytics/refinery] - https://gerrit.wikimedia.org/r/530002 (https://phabricator.wikimedia.org/T208612) (owner: Mforns)
[20:15:44] (PS5) Mforns: [WIP] Add Oozie job for mediawiki history dumps [analytics/refinery] - https://gerrit.wikimedia.org/r/530002 (https://phabricator.wikimedia.org/T208612)
[20:20:46] (PS6) Mforns: [WIP] Add Oozie job for mediawiki history dumps [analytics/refinery] - https://gerrit.wikimedia.org/r/530002 (https://phabricator.wikimedia.org/T208612)
[20:30:13] (PS3) Mforns: [WIP] Add spark job to create mediawiki history dumps [analytics/refinery/source] - https://gerrit.wikimedia.org/r/528504 (https://phabricator.wikimedia.org/T208612)
[20:31:22] (CR) Mforns: [C: -2] "This seems to work now! But we still have to agree on a final dumps format (splits) and add some detailed docs."
[analytics/refinery/source] - https://gerrit.wikimedia.org/r/528504 (https://phabricator.wikimedia.org/T208612) (owner: Mforns)
[20:44:43] Analytics, Fundraising-Backlog, Fundraising Sprint Q 2019: Identify source of discrepancy between HUE query in Count of event.impression and druid queries via turnilo/superset - https://phabricator.wikimedia.org/T204396 (DStrine)
[20:58:01] milimetric: re T224459, I just reviewed it and it's ready to go. Thanks for your edits. 2 points: I'll send the email to wiki-research-l and we will do some more pushes via our personal contacts and twitter. I'll have to give them 2 weeks' time, so changing the deadline to September 3, ok?
[20:58:01] T224459: Recommend the best format to release public data lake as a dump - https://phabricator.wikimedia.org/T224459
[21:07:31] Analytics, Research: Recommend the best format to release public data lake as a dump - https://phabricator.wikimedia.org/T224459 (leila) @Milimetric thanks! I added two questions at the end: field of research and email address. I changed the deadline to 2019-09-03 and started advertising for it now.
[22:47:43] Analytics, Product-Analytics, Reading Depth, Patch-For-Review, Readers-Web-Backlog (Needs Product Owner Decisions): Reading_depth: remove eventlogging instrumentation - https://phabricator.wikimedia.org/T229042 (kzimmerman) @Groceryheist here's the proposal for SessionLength, which we want t...
[23:22:50] Analytics, Product-Analytics, Reading Depth, Patch-For-Review, Readers-Web-Backlog (Readers-Web-Kanbanana-2019-20-Q1): Reading_depth: remove eventlogging instrumentation - https://phabricator.wikimedia.org/T229042 (Jdlrobson)
[23:34:14] Analytics, Product-Analytics, Reading Depth, Patch-For-Review, Readers-Web-Backlog (Readers-Web-Kanbanana-2019-20-Q1): Reading_depth: remove eventlogging instrumentation - https://phabricator.wikimedia.org/T229042 (Jdlrobson) It's off: https://grafana.wikimedia.org/d/000000566/overview?panel...