[06:55:57] 10Analytics, 10Analytics-Kanban, 10Better Use Of Data, 10Performance-Team, and 2 others: Switch mw.user.sessionId back to session-cookie persistence - https://phabricator.wikimedia.org/T223931 (10Nuria) Unassigning @Milimetric and lowering priority a bit in the light of other (completely unrelated work) we... [06:56:12] 10Analytics, 10Better Use Of Data, 10Performance-Team, 10Product-Infrastructure-Team-Backlog, 10Product-Analytics (Kanban): Switch mw.user.sessionId back to session-cookie persistence - https://phabricator.wikimedia.org/T223931 (10Nuria) a:05Milimetric→03None [07:00:38] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10SDC General, 10Wikidata: Create reportupdater reports that execute SDC requests - https://phabricator.wikimedia.org/T239565 (10Nuria) [07:33:39] 10Analytics, 10Analytics-EventLogging, 10QuickSurveys, 10MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), and 2 others: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (10AndyRussG) Just to note, we have the same problem for the new CentralNotice data pipeline, w... [07:45:25] 10Analytics: Check home leftovers of dfoy - https://phabricator.wikimedia.org/T239571 (10MoritzMuehlenhoff) [07:50:12] 10Analytics, 10DBA, 10Patch-For-Review: Repurpose db1107 as a generic database - https://phabricator.wikimedia.org/T238113 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1107.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimag... [08:05:54] nuria: I may not be available for the analytics standup later today, hope that's not a problem - updated docs, checked outlier file, and will follow up on ticket later today [08:17:21] good morning team [08:33:18] bonjour! [08:39:34] elukey: druid new snapshot loading is still an issue :( (or maybe it's deleting?)
[08:41:55] Also: No need to restart the failed pageview-druid-hourly job - It's been covered by the daily one [08:44:51] just answered to alerts@ about the AQS failures [08:49:44] 10Analytics, 10DBA: Repurpose db1107 as a generic database - https://phabricator.wikimedia.org/T238113 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1107.eqiad.wmnet'] ` and were **ALL** successful. [08:49:58] elukey: I thought i had pinpointed the issue with GeoCode function last friday but in fact no :( I'm still stuck :( [08:52:03] /o\ [08:52:12] not a great start of the week :( [08:52:24] No :( [08:55:35] I'm planning to restart cassandra loading bundle after the backfilling period is done today - Please let me know of any concern [08:59:47] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/553160 (https://phabricator.wikimedia.org/T239127) (owner: 10Joal) [08:59:57] joal: nope all good [09:00:32] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/553698 (https://phabricator.wikimedia.org/T239471) (owner: 10Addshore) [09:01:53] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/553727 (https://phabricator.wikimedia.org/T239471) (owner: 10Addshore) [09:03:27] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/553405 (https://phabricator.wikimedia.org/T239127) (owner: 10Joal) [09:04:15] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/552510 (https://phabricator.wikimedia.org/T238855) (owner: 10Joal) [09:04:52] Ok - deploying refinery now (no refinery-source needed) [09:22:29] !log Deploy refinery using scap [09:22:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:41:58] elukey: very normally, 'thin' env deploy takes is extremely fast :) 
[09:43:13] !log Deploying refinery onto HDFS [09:43:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:46:36] 10Analytics, 10Datasets-Archiving, 10Research-Backlog: Make HTML dumps available - https://phabricator.wikimedia.org/T182351 (10ArielGlenn) This is great news! We would be happy to link to it and host a copy once it's ready to be announced. What is the cumulative size of the files for download? [09:48:52] joal: \o/ [10:21:53] !log Create new tables for newly sqooped data in hive wmf_raw database [10:21:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [10:30:25] !log Manually sqoop tables not yet done because of late deploy (content_models, content, slots, slot_roles, wbt_entity_usage) [10:30:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [10:32:58] 10Analytics, 10Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (10jijiki) @elukey Hey luca, I think I will need one too :) Thank you very much [10:36:29] 10Analytics, 10Research-management: Test GPUs with an end-to-end training task (Photo vs Graphics image classifier) - https://phabricator.wikimedia.org/T221761 (10Miriam) Update after changing learning rate and modifying few parameters in the training: * we reach 91% accuracy on the validation set (2% improvem... [10:38:45] 10Analytics: Change sqoop project list config so that content sqoop doesn't fail - https://phabricator.wikimedia.org/T239589 (10JAllemandou) [10:40:38] 10Analytics, 10Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (10elukey) >>! In T237605#5704521, @jijiki wrote: > @elukey Hey luca, I think I will need one too :) Thank you very much ` elukey@krb1001:~$ sudo manage_principals.py create jiji --email_address=... 
[10:43:13] 10Analytics: Update mediawiki-history to use new Multi-Content-Revision tables - https://phabricator.wikimedia.org/T239591 (10JAllemandou) [10:45:13] 10Analytics: Update mediawiki-history to use new Multi-Content-Revision tables - https://phabricator.wikimedia.org/T239591 (10JAllemandou) [11:11:44] 10Analytics, 10Analytics-Kanban, 10Operations, 10netops, 10ops-eqiad: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (10Volans) I've updated the mgmt interface's DNS names on Netbox that were still reporting the old names `cloud... [11:23:53] * elukey lunch! [11:35:07] !log Kill mediawiki-geoeditors-monthly-coord before updating the jobn [11:35:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [11:39:18] !log Drop wmf.geoeditors_daily table and create wmf/editors_daily, moving underlying data and recreating partitions [11:39:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [11:40:13] !log Restart mediawiki-geoeditors-monthly-coord [11:40:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:03:12] 10Analytics, 10Datasets-Archiving, 10Research-Backlog: Make HTML dumps available - https://phabricator.wikimedia.org/T182351 (10tizianopiccardi) It's 7TB compressed in gzip. The articles are partitioned in JSON files with 1000 revisions each (one per line) and we will share an index (+ download script) to kn... [13:04:35] @chiborg showed me something mysterious: a banner which has been sending eventlogging for a few days, we see the outgoing beacon, and a handful of EL processing errors in logstash. But then, one of the message types lands in Hadoop successfully and the other one is dropped without a trace. 
[13:04:48] Here's an example of the payload that seems to be disappearing: [13:04:54] {"event":{"bannerName":"org-mob02-191120-2-ctrl","bannerAction":"mobile-mini-banner-expanded","eventRate":1,"slidesShown":5,"finalSlide":5},"revision":18437830,"schema":"WMDEBannerEvents","webHost":"de.wikipedia.org","wiki":"dewiki"} [14:12:32] heyall! [14:14:42] weird, awight, I gotta catch up and then I'll take a look [14:15:00] milimetric: o. [14:15:02] o/ [14:15:12] hey elukey :) [14:15:22] how are things? [14:16:12] amazingly we managed to move into the new house and host Thanksgiving and Steph and I are both still alive :) [14:16:27] just barely, but alive [14:16:28] \o/ [14:17:14] how you doin? [14:20:15] all good! [14:20:53] a bit sad since it should have been today the Kerberos enable day :( [14:26:24] 10Analytics, 10DBA: Repurpose db1107 as a generic database - https://phabricator.wikimedia.org/T238113 (10Marostegui) 05Open→03Resolved a:03Marostegui db1107 has been reimaged into buster and placed on test-s1 with MariaDB 10.3.20 with replicating from enwiki master and being an intermediate master for d... [14:26:27] 10Analytics-EventLogging, 10Analytics-Kanban: Sunset MySQL data store for eventlogging - https://phabricator.wikimedia.org/T159170 (10Marostegui) [14:26:29] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (10Marostegui) [14:30:50] elukey: o/ [14:30:54] did you see this one? [14:30:54] https://phabricator.wikimedia.org/T236180#5698303 [14:31:12] i can look but didn't know if you already had done something with grants for that [14:32:28] ottomata: o/ I didn't no, I didn't add grants for the ipv6 IP for sure, fixing those in a sec [14:32:51] ok ty!
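The payload pasted at 13:04:54 can be inspected offline before chasing it through the pipeline. What follows is a minimal sketch using only the Python standard library. The set of expected top-level keys is an assumption taken from the pasted payload itself, not from the WMDEBannerEvents schema definition, and `summarize_capsule` is a hypothetical helper name. It only shows that the message parses as JSON and what types the event fields carry; it does not reproduce EventLogging's own validation.

```python
import json

# Payload copied verbatim from the 13:04:54 message.
PAYLOAD = (
    '{"event":{"bannerName":"org-mob02-191120-2-ctrl",'
    '"bannerAction":"mobile-mini-banner-expanded","eventRate":1,'
    '"slidesShown":5,"finalSlide":5},"revision":18437830,'
    '"schema":"WMDEBannerEvents","webHost":"de.wikipedia.org","wiki":"dewiki"}'
)

# Assumption: keys observed in the pasted payload, not the authoritative schema.
EXPECTED_TOP_LEVEL = {"event", "revision", "schema", "webHost", "wiki"}


def summarize_capsule(raw):
    """Parse a payload; report missing top-level keys and event field types."""
    capsule = json.loads(raw)
    missing = sorted(EXPECTED_TOP_LEVEL - set(capsule))
    field_types = {
        name: type(value).__name__
        for name, value in capsule.get("event", {}).items()
    }
    return {
        "schema": capsule.get("schema"),
        "revision": capsule.get("revision"),
        "missing": missing,
        "field_types": field_types,
    }


if __name__ == "__main__":
    print(summarize_capsule(PAYLOAD))
```

Comparing the reported field types against the schema revision named in the payload would be the natural next step, since a type mismatch is one possible reason for an event to be dropped during processing.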
[14:38:46] elukey: regarding your scap change, the -e thin only deploys to notebooks and labstore, and normal deploy no longer deploys to those hosts, so can I remove the section from the docs that says to deploy to notebooks separately? [14:39:53] (this one: "scap deploy -e notebook") [14:41:51] milimetric: yes yes I thought I had fixed that [14:42:05] np, I just changed it to -e thin [14:42:23] (just making sure I understood the scap change, which is great, and thanks for that!) [14:42:40] credits to ottomata for all the work, I only added a small nit :) [14:56:57] 10Analytics, 10User-ArielGlenn: Spike [2019-2020 work] Oozie Replacement. Airflow Study / Argo Study - https://phabricator.wikimedia.org/T217059 (10Ottomata) https://medium.com/flyr-labs-blog/why-were-switching-off-airflow-sort-of-780c4f58a660 [15:02:18] (03PS7) 10Mforns: Add data quality metric: traffic variations per country [analytics/refinery] - 10https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484) [15:09:33] hiya joal, just checking regarding https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/553698/6..7/python/refinery/sqoop.py [15:09:47] should that change have been included in the 2019-10 snapshot? / did it get re run? [15:10:20] I see lots of nulls in that field in hadoop which is unexpected [15:28:53] 10Analytics, 10Analytics-Kanban, 10Wikidata, 10User-Addshore, 10Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): Sqoop wikidata terms tables into hadoop - https://phabricator.wikimedia.org/T239471 (10Addshore) Ping for @JAllemandou > 4:09 PM hiya joal, just checking regarding https://gerri...
[15:41:42] 10Analytics, 10Research: Improve quality of external referer data - https://phabricator.wikimedia.org/T239625 (10Isaac) [15:49:37] 10Analytics, 10Datasets-Archiving, 10Research-Backlog: Make HTML dumps available - https://phabricator.wikimedia.org/T182351 (10ArielGlenn) Let me add @Bstorm to make sure she knows I've volunteered us to host a copy and to make sure that there's 7T spare around, since that's more than I expected. [15:52:06] elukey: I saw that you removed python2 packages from stats machines https://github.com/wikimedia/puppet/commit/c84afc652e167008a2591734538b2af40dfc4790 and it looks like that may have broken reportupdater (see stat1007:/srv/discovery/log/golden-daily.log) [15:52:18] https://www.irccloud.com/pastebin/BnPWrXbx/ [15:55:17] bearloga: hi! report updater was moved to py3 before we removed py2 packages [15:57:24] elukey: great! thank you! phew! :) [15:57:59] bearloga: did you see anything weird in the data generated? [15:59:04] elukey: aye, isaac noticed we had missing data on the external traffic dashboard since september https://discovery.wmflabs.org/external/ [16:00:04] elukey: I just updated the reportupdater submodule we use to the py3-compatible version [16:00:36] elukey: reportupdater's gonna have a lot of data to backfill on its next run O_O [16:00:51] bearloga: really sorry, I tried to announce it as broadly as possible :( [16:01:40] ping ottomata [16:02:12] bearloga: don't have a lot of context, does the job pulls from hadoop? we might want to move the puppet code to systemd timers [16:02:15] elukey: it's okay, I remember seeing that email and going "oh hell yeah it's about time we dropped py2" [16:02:19] so when it fails we or you get an alert [16:02:26] :) [16:04:02] elukey: not realizing that I had a whole codebase that relied on a py2-only version of reportupdater (/me slams head on desk). 
but yes, it's almost entirely hive queries https://github.com/wikimedia/wikimedia-discovery-golden/ [16:05:36] bearloga: ok then I have confused your crons with other ones that run as 'analytics-search' when we discussed about kerberos [16:05:43] will need to check how to kerberize them as well [16:05:51] elukey: this is the script that's scheduled to run via cron in statistics::discovery.pp https://github.com/wikimedia/wikimedia-discovery-golden/blob/master/main.sh [16:06:03] it's that cron, you have it right [16:06:25] addshore: in standup now but will check after - normally field has been sqooped correctly - I'll nonetheless rerun a sqoop today or tomorrow with prod tables. [16:09:20] elukey: and since it's running as analytics-search on stat1007 (https://github.com/wikimedia/puppet/blob/production/modules/statistics/manifests/discovery.pp#L65-L78), it should be fine, right? you mentioned you've taken care of kerberos stuff for system users [16:10:10] bearloga: there are other ones that are running with that user but not as crons, as kerberos::exec (so systemd timers) [16:10:21] ohhh [16:10:22] I'll need to upgrade your crons as well [16:10:35] but it should take a puppet change [16:10:37] nothing heavy [16:11:06] good thing is that we'll have proper alarms when the script returns non zero (as opposed to cron that simply forwards stdout/err if any) [16:11:07] elukey: aye, I also have a cron on notebook1004 (monthly reports for android app team) [16:11:56] bearloga: as your user right? [16:12:10] if so you should be able to move to https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos/UserGuide#Run_a_recurrent_job_via_Cron_or_similar_without_kinit_every_day [16:12:19] if not, I'll need to check the cron [16:12:35] elukey: yep, on notebook1004 it's my user so I'll follow that guide, thank you! 
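The point made at 16:11:06 (alarm on a non-zero exit status, rather than rely on cron forwarding whatever stdout/stderr happens to exist) can be illustrated with a small wrapper. This is only a sketch of the idea, not how the puppet-managed systemd timers are implemented; `run_and_check` and the example commands are hypothetical.

```python
import subprocess
import sys


def run_and_check(argv):
    """Run a command and judge success by its exit status only."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        return False, f"job failed with exit status {proc.returncode}"
    return True, "job succeeded"


if __name__ == "__main__":
    # A job that fails silently: it prints nothing, so an output-based
    # notification would stay quiet, but the exit status is non-zero.
    silent_failure = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_and_check(silent_failure))
```

A job that exits non-zero without printing anything is exactly the case where output-based mail stays silent while an exit-status check still fires.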
[16:12:42] super :) [16:13:51] elukey: by the way I'd love to see your puppet patch to upgrade the crons running under analytics-search on stat1007 when you upload it [16:20:34] joal: okay :) [16:21:00] bearloga: sure! I was planning to add you as reviewer before proceeding, since we don't have a real rush [16:21:08] (needs to be done before the 16th) [16:41:32] yes please! :D [16:43:29] !log Restarting cassandra bundle after deploy [16:43:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:44:50] joal: when you have time, can we re-review https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/491791/ and see if it can be merged? (webrequest-load for the test cluster) [16:45:06] Yes elukey [16:45:17] I want to standardize a bit more refinery in the test cluster [16:45:33] for some reason I am seeing failure of spark refine in the test cluster after your deployment [16:45:41] :( [16:45:50] that are related to kerberos auth, super generic of course [16:47:44] 10Analytics, 10Multimedia, 10Tool-Pageviews: Mediaviewer views should be reworked to be an eventlogging event - https://phabricator.wikimedia.org/T239630 (10Nuria) [16:47:59] elukey: currently trying to fix cassandra restart :( after tasking/ops meeting? 
[16:49:21] oh yes even tomorrow [16:49:30] no rush [16:50:07] 10Analytics, 10Multimedia, 10Tool-Pageviews: Mediaviewer views should be reworked to be an eventlogging event - https://phabricator.wikimedia.org/T239630 (10Nuria) a:03dr0ptp4kt [16:50:46] 10Analytics, 10Multimedia, 10Tool-Pageviews: Mediaviewer views should be reworked to be an eventlogging event - https://phabricator.wikimedia.org/T239630 (10Nuria) Assigning this to @dr0ptp4kt so he can assign to the pertinent team cc @Tgr that used to work on this [16:54:20] 10Analytics, 10Event-Platform, 10Multimedia, 10Tool-Pageviews: Mediaviewer views should be reworked to be an eventlogging event - https://phabricator.wikimedia.org/T239630 (10mforns) [16:54:41] 10Analytics, 10Event-Platform, 10Multimedia, 10Tool-Pageviews: Mediaviewer views should be reworked to be an eventlogging event - https://phabricator.wikimedia.org/T239630 (10mforns) This looks like a good candidate for an EventGate first implementation. [16:55:48] 10Analytics, 10Event-Platform, 10Multimedia, 10Tool-Pageviews: Mediaviewer views should be reworked to be an eventlogging event - https://phabricator.wikimedia.org/T239630 (10mforns) p:05Triage→03Normal [16:56:42] 10Analytics, 10Event-Platform, 10Multimedia, 10Tool-Pageviews: Mediaviewer views should be reworked to be an eventlogging event - https://phabricator.wikimedia.org/T239630 (10dr0ptp4kt) a:05dr0ptp4kt→03MarkTraceur Kicking over to Mark as manager of the Structured Data engineering team, where maintenanc... [16:58:25] 10Analytics, 10Research: Improve quality of external referer data - https://phabricator.wikimedia.org/T239625 (10mforns) p:05Triage→03High [17:02:51] 10Analytics, 10Operations, 10ops-eqiad: Degraded RAID on dbstore1003 - https://phabricator.wikimedia.org/T239217 (10Marostegui) Any update on this? Thanks! [17:09:45] 10Analytics: Update mediawiki-history to use new Multi-Content-Revision tables - https://phabricator.wikimedia.org/T239591 (10mforns) @WDoranWMF Hi!
We are trying to prioritize this task, do you know when the changes to the revision table (move fields to content table through slots) are going to take place? Thanks! [17:11:13] elukey: curious on thoughts...turns out airflow requires a mysqld global variable set, "explicit_defaults_for_timestamp". I'm wary of changing that out from under the other apps running on an-coord, so thinking best bet is setup mysqld on an-airflow? [17:14:58] ebernhardson: need to document myself about the setting, having a mysqld instance on ganeti is doable but not the best perf wise.. [17:15:07] but probably the best choice for the moment :( [17:16:12] elukey: ack [17:17:53] ebernhardson: I am in a meeting now but will review the option later on [17:59:14] oh joal we could even create a completely separate hdfs-rsync package and deploy it separately from refinery [17:59:22] like a debian package [17:59:27] milimetric: very feasible as well! [17:59:30] in its own repository [17:59:42] yup [18:00:07] elukey: ping me when your meeting is finished, I need your opinion on hdfs-rsync please [18:00:28] sure! [18:01:29] hm - cassandra just got killed - is it from any of us a-team? [18:01:50] cassandra oozie bundle sorry - I should have been precise --^ [18:09:23] not me [18:10:21] joal: hue lists it as DONEWITHERROR [18:10:26] never seen it before [18:10:27] yup, saw that [18:10:46] E1002: Invalid coordinator application URI [hdfs://analytics-hadoop/user/joal/oozie/mediarequest/datasets.xml], path not existed : /user/joal/oozie/mediarequest/datasets.xml [18:11:01] Annnnh [18:11:11] thanks a lot elukey - this helps :) [18:11:29] I am also available to chat, but can wait if you want to debug this further [18:11:43] elukey: let's chat [18:11:50] elukey: cave? 
[18:12:46] sure [18:26:07] (03PS1) 10Joal: Fixes after deploy [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554120 [18:26:19] milimetric: --^ I have tested both :) [18:27:57] ok joal, I should deploy refinery and start that bundle or what's the status? [18:28:29] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Fixes after deploy [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554120 (owner: 10Joal) [18:28:39] milimetric: exactly - I can go for it, wanted to have your approal for merging :) [18:29:01] merged - and ok, you can deploy [18:29:15] ack milimetric :) Thanks! [18:39:24] !log Deploy refinery using scap for fixes [18:39:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:40:17] !log Deploy refinery onto hdfs [18:40:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:42:32] 10Analytics, 10Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (10Ladsgroup) Hello, My name is Amir and I'm an alcoholic, I mean software engineer. I work as a developer at WMDE in wikidata team and I use hadoop data on daily basis from `stat100*`. My shell na... [18:48:36] joal: some of my kerberos assumptions just vanished :D [18:48:57] :) [18:49:01] elukey: what's up? [18:49:12] Refine works now (in hadoop test) only if I explicitly pass the --keytab and --principal arguments to spark2-submit [18:49:16] thing that I have never done [18:49:41] from the logs I can see that an ALTER TABLE was issued by a hadoop worker node, as part of refine [18:49:53] and that was rejected since no kerberos auth was provided [18:50:12] no hive-related kerb I assume [18:50:28] is it possible that up to now it has been working fine since no real alters or similar commands needed to run? 
[18:51:34] elukey: I don't know how it's been tested, but when running spark actions with oozie, I give hive credentials (in order for spark to be able to access metastore) [18:52:00] joal: simply running navigation timing's refine, nothing more [18:52:16] elukey: on existing tables I guess [18:52:22] but I was convinced that spark2 got delegation tokens for hive upon start [18:52:36] elukey: not in oozie, for sure [18:52:40] and that it needed a keytab passed to the AM only for to renew the kerb ticket [18:54:08] but what about spark.sql in yarn mode? [18:54:25] I thought we had it tested, and it didn't need the --keytab parameter [18:55:45] elukey: not documented in my test list - I might have forgotten :( [18:56:23] elukey: testing now [18:56:47] joal: I can see a pyspark2 sql test in your report [18:57:22] Ah! [18:58:48] 19/12/02 18:44:04 INFO AMCredentialRenewer: Attempting to login to KDC using principal: analytics/analytics1030.eqiad.wmnet@WIKIMEDIA [18:58:52] 19/12/02 18:44:05 INFO AMCredentialRenewer: Successfully logged into KDC. [18:59:00] this is from a task on analytics1037's logs [18:59:16] I confirm it works elukey - just tested [18:59:26] Without credentials [18:59:43] Now, for refine, we use a trick to talk to hive: we do it rhough JDBC [19:00:01] obviously this can't work as is with kerb!! [19:00:11] pfff - I should have thought about that :( [19:01:07] joal: we added a change for that, do you remember andrew's patch to read hive-site.xml? [19:01:23] Ah, right! 
[19:01:32] elukey: please excuse me being so slow :( [19:01:47] nono please I am trying to review all that we have done with you :) [19:02:34] I am doing a little test now [19:02:57] since I successfully executed the alter with --keytab --principal, maybe now refine will work without them [19:03:39] hmm ok :) [19:03:40] no not really [19:03:41] it fails [19:03:44] ah [19:04:01] ah no it is trying to do the alter [19:04:39] ok no brain fault now [19:04:54] I'll stop and start tomorrow with a fresh mind [19:05:04] it is not a big deal to add the --keytab --principal [19:05:05] back joal , milimetric , mforns , sorry for the very abrupt interrruption [19:05:12] but I want to understand how it works [19:05:26] thanks for the brainbounce joal! [19:05:37] * elukey off! [19:05:44] elukey: those passed params, are they for spark or hive? [19:05:47] joal: and all the cassandra jobs failing? [19:05:52] Bye elukey - Talk tomorrow :) [19:06:22] nuria: I found the bug, patched, milimetric merged, deploy is almost done and I'll restart the job in minutes [19:06:48] joal: for spark [19:07:44] !log restart cassandra bundle after redeployed patch [19:07:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:07:53] joal: I see it ya, cause the bundle is teh same for all jobs [19:08:42] 10Analytics, 10Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (10elukey) >>! In T237605#5706229, @Ladsgroup wrote: > Hello, My name is Amir and I'm an alcoholic, I mean software engineer. I work as a developer at WMDE in wikidata team and I use hadoop data on... 
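The workaround discussed between 18:49 and 19:05 (passing --keytab and --principal explicitly to spark2-submit) could look roughly like the sketch below. `--principal` and `--keytab` are standard spark-submit options when running on YARN. Everything else here is an assumption for illustration: the principal, keytab path, class name and jar path are placeholders, not the values used on the Hadoop test cluster, and `build_refine_command` is a hypothetical helper.

```python
import shlex


def build_refine_command(principal, keytab, main_class, jar, extra_args=()):
    """Build a spark2-submit argv that passes Kerberos credentials explicitly."""
    return [
        "spark2-submit",
        "--master", "yarn",        # assumption: the job runs on YARN
        "--principal", principal,  # Kerberos principal to log in as
        "--keytab", keytab,        # keytab file holding that principal's key
        "--class", main_class,
        jar,
        *extra_args,
    ]


if __name__ == "__main__":
    cmd = build_refine_command(
        principal="analytics/host.example@EXAMPLE.REALM",  # placeholder
        keytab="/path/to/example.keytab",                  # placeholder
        main_class="org.example.Refine",                   # placeholder
        jar="/path/to/example.jar",                        # placeholder
    )
    # Print a shell-safe rendering of the command for review.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Building the argv as a list and quoting it only for display keeps the principal and paths from being mangled by the shell, which matters because principals contain `/` and `@`.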
[19:20:53] !log Manually kill cassandra-coord-mediarequest-per-referer-hourly from bundle as it shouldn't exist [19:20:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:23:59] (03CR) 10Milimetric: "(agree with mforns's comments too)" (035 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/552943 (https://phabricator.wikimedia.org/T238360) (owner: 10Nuria) [19:24:16] milimetric: will look, super thanks for reviewing [19:24:54] ofc, I'm glad to be back and relatively stable so I can actually help [19:27:15] other errors in cassandra - will continue to fix [19:35:51] 10Analytics, 10Editing-team, 10observability, 10Performance-Team (Radar): VE edit data stopped at 2019-11-24Z00:57 and again at 2019-12-01Z22:45 - https://phabricator.wikimedia.org/T239121 (10Jdforrester-WMF) [19:36:00] 10Analytics, 10Editing-team, 10observability, 10Performance-Team (Radar): VE edit data stopped at 2019-11-24Z00:57 and again at 2019-12-01Z22:45 - https://phabricator.wikimedia.org/T239121 (10Jdforrester-WMF) It's re-broken itself. 
[19:38:20] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Hourly Feature extraction for bot detection from webrequest - https://phabricator.wikimedia.org/T238360 (10Milimetric) doc looks great, copy-edited a bit as I went through it [19:40:35] (03CR) 10Nuria: [C: 04-1] Fix GetGeoDataUDF and underlying function (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/553726 (https://phabricator.wikimedia.org/T238432) (owner: 10Joal) [19:50:12] (03PS1) 10Joal: Last patches for today deploy [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554134 [19:50:40] 10Analytics, 10Editing-team, 10observability, 10Performance-Team (Radar): VE edit data stopped at 2019-11-24Z00:57 and again at 2019-12-01Z22:45 - https://phabricator.wikimedia.org/T239121 (10Nuria) Pinging @Ottomata in case he has any ideas [19:51:00] milimetric: if you don't mind --^ [19:51:11] last patch I think for cassandra [19:52:46] 10Analytics, 10Multimedia, 10Tool-Pageviews: Mediaviewer preloads should be marked as such via x-analytics tag - https://phabricator.wikimedia.org/T239655 (10Nuria) [19:53:18] 10Analytics, 10Multimedia, 10Tool-Pageviews: Mediaviewer preloads should be marked as such via x-analytics tag - https://phabricator.wikimedia.org/T239655 (10Nuria) a:03MarkTraceur [19:53:23] looking at VE edit data thing [19:54:02] plenty of data in statsv topic [19:55:37] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554134 (owner: 10Joal) [19:55:41] joal: i do not understand how is that working on the loading we are doing for images [19:55:43] 10Analytics, 10Multimedia, 10Tool-Pageviews: Mediaviewer preloads should be marked as such via x-analytics tag - https://phabricator.wikimedia.org/T239655 (10Nuria) Assigning to @MarkTraceur for him to reroute. The earlier these changes can be done the earlier we can incorporate them to the apis that return... [19:55:53] nuria: ? 
[19:56:06] joal: sorry, mumbling [19:56:25] joal: your prior change seems like it would have broken the loading that happens on the cron [19:56:27] nuria: talking about oozie-cassandra? [19:56:29] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Better Use Of Data, and 9 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (10WDoranWMF) [19:56:31] joal: ya [19:56:41] joal: and fran's loading of images via cron [19:56:42] Ah I get it [19:57:39] I don't think so nuria - My changes impact bundle.xml only [19:57:57] joal: and his are running just a workflow? [19:58:04] francisco uses coordinaor definition to backfill, not bundler [19:58:10] -r sorry [19:58:20] joal: k, i see [19:58:40] nuria: also, I'm sorry I didn't get to review your patch on bots fast :( [19:58:47] joal: np [19:59:03] joal: will give it another pass after dan's and marcel comments [19:59:17] I have quicklu read their comments and they make sense [19:59:21] ottomata: how do we find out what is teh statsv topic? [20:00:16] !log Deploy refinery using scap to fix today deploy (last) [20:00:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [20:01:18] it is [20:01:19] 'statsv [20:01:19] ' [20:01:20] hehh [20:01:21] nuria [20:01:29] ottomata: yessir [20:01:30] i see ve.mwTarget messages in that topic [20:01:31] plenty of them [20:01:37] so, the data is there [20:01:43] ottomata: is the topic called statsv? [20:01:44] so it must be somethign with the statsv processor [20:01:47] nuria: yes [20:01:54] ottomata: plain out , ok [20:02:10] ottomata: is there any docs as to where does the statsv processor run? [20:02:29] i think performance took over the statsv processor [20:02:30] ottomata: cause it runs on a machine that was part of perf team [20:02:30] not that i know of, just puppet? [20:02:34] yeah i guess a few of them now? 
[20:02:34] ottomata: k [20:02:35] trying to read puppet [20:03:57] oh ok only webperf[12]001 [20:06:38] 10Analytics, 10Editing-team, 10observability, 10Performance-Team (Radar): VE edit data stopped at 2019-11-24Z00:57 and again at 2019-12-01Z22:45 - https://phabricator.wikimedia.org/T239121 (10Ottomata) In statsv service logs on webperf1001: ` Dec 01 22:45:46 webperf1001 python[13187]: Process Process-2: D... [20:06:59] 10Analytics, 10Editing-team, 10observability, 10Performance-Team (Radar): VE edit data stopped at 2019-11-24Z00:57 and again at 2019-12-01Z22:45 - https://phabricator.wikimedia.org/T239121 (10Ottomata) Also possibly relevant: ` Dec 01 12:21:31 webperf1001 python[13187]: [357B blob data] Dec 01 12:21:31 we... [20:07:57] 10Analytics, 10Editing-team, 10observability, 10Performance-Team (Radar): VE edit data stopped at 2019-11-24Z00:57 and again at 2019-12-01Z22:45 - https://phabricator.wikimedia.org/T239121 (10Nuria) An error like that one would affect all metrics sent to statsv from the frontend seems like. [20:12:11] 10Analytics, 10Editing-team, 10observability, 10Performance-Team (Radar): VE edit data stopped at 2019-11-24Z00:57 and again at 2019-12-01Z22:45 - https://phabricator.wikimedia.org/T239121 (10Nuria) But other graphs seem unaffected (just looked at navigation timing ones). It is sure not caused by new chnag... [20:17:47] (03PS1) 10Mstyles: update user agent [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/554142 (https://phabricator.wikimedia.org/T238106) [20:17:54] !log Deploying refinery to hdfs - Last for today! 
[20:17:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [20:26:16] 10Analytics, 10Performance-Team, 10Research, 10Security-Team, and 2 others: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (10JFishback_WMF) [20:27:21] Ok this time I think we're good [20:27:28] !log restart cassandra bundle [20:27:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [20:31:23] Ok team - Gone for diner [20:32:59] 10Analytics, 10Product-Analytics, 10SDC General, 10Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (10Milimetric) I noticed that three folks used `rev_deleted = 0` to mean "revision not deleted", but this field mea... [20:39:20] (03CR) 10Mforns: "LGTM overall! Left a comment on the naming convention." (031 comment) [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/551941 (https://phabricator.wikimedia.org/T236941) (owner: 10Milimetric) [20:39:34] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10SDC General, 10Wikidata: Create reportupdater reports that execute SDC requests - https://phabricator.wikimedia.org/T239565 (10Milimetric) Yay, I get to work with @mpopov :) Ok, questions: * how often should this report be updated? * is it exactly... [20:40:40] (03CR) 10Mforns: "Hi Srishti, is this the final code after Joseph's modifications?" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/551690 (https://phabricator.wikimedia.org/T232671) (owner: 10Srishakatux) [20:43:24] nuria: Should I add a risk assessment section to the caching docs before Wednesday? 
[20:43:35] Analytics-Kanban, Better Use Of Data, Event-Platform, Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (Ottomata) @akosiaris I merged and applied https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/551610 in staging....
[21:22:06] (PS4) Srishakatux: Modify WMCS queries [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/551690 (https://phabricator.wikimedia.org/T232671)
[21:29:08] lexnasser: we will let james do it
[21:29:37] Analytics, Operations, ops-eqiad: Degraded RAID on dbstore1003 - https://phabricator.wikimedia.org/T239217 (Jclark-ctr) just received drive from warehouse
[21:30:14] (CR) Nuria: [C: +2] update user agent [analytics/refinery/source] - https://gerrit.wikimedia.org/r/554142 (https://phabricator.wikimedia.org/T238106) (owner: Mstyles)
[21:34:39] (Merged) jenkins-bot: update user agent [analytics/refinery/source] - https://gerrit.wikimedia.org/r/554142 (https://phabricator.wikimedia.org/T238106) (owner: Mstyles)
[21:34:44] ebernhardson: do you have couple mins for a fast request?
[21:35:01] nuria: sure
[21:35:51] ebernhardson: can you send me a copy of the document on your search drive where we wrote some rationales for the work of search this year?
i no longer have access to it
[21:36:11] hmm, lemme see if i can find that
[21:36:56] ebernhardson: ok
[21:37:20] nuria: does this sound right: Analytics/Search/ Information Retrieval Annual planning template FY19-20 -
[21:37:28] ebernhardson: ya
[21:37:54] ebernhardson: cause i am going to rescue some of stash writing around wdqs to try to make a point about metrics
[21:38:22] nuria: hmm, wdqs isn't mentioned in this :S lemme look some more
[21:39:49] nuria: ok i think this is right one, shared: Annual Plan - Search and Information Retrieval - 2019
[21:39:58] (CR) Ladsgroup: "Since the analytics part is merged but the patch that actually changes it is not merged or deployed I feel our system and analytics will b" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/554142 (https://phabricator.wikimedia.org/T238106) (owner: Mstyles)
[21:40:09] (CR) Ladsgroup: "I will make a follow up." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/554142 (https://phabricator.wikimedia.org/T238106) (owner: Mstyles)
[21:40:32] ebernhardson: all right super thanks
[21:42:58] (PS1) Ladsgroup: Fix user agent for WDQS updater counter [analytics/refinery/source] - https://gerrit.wikimedia.org/r/554162 (https://phabricator.wikimedia.org/T238106)
[22:08:11] (CR) Nuria: Fix user agent for WDQS updater counter (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/554162 (https://phabricator.wikimedia.org/T238106) (owner: Ladsgroup)
[22:13:25] (PS2) Ladsgroup: Fix user agent for WDQS updater counter [analytics/refinery/source] - https://gerrit.wikimedia.org/r/554162 (https://phabricator.wikimedia.org/T238106)
[22:13:30] (CR) Ladsgroup: Fix user agent for WDQS updater counter (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/554162 (https://phabricator.wikimedia.org/T238106) (owner: Ladsgroup)
[22:28:06] Analytics, Analytics-Kanban: Request for a large request data
set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (lexnasser) @Danielsberger Updated Wikitech ([[ https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Caching | LINK ]]) once again with a description...
[22:29:53] (CR) Nuria: [C: +2] Fix user agent for WDQS updater counter [analytics/refinery/source] - https://gerrit.wikimedia.org/r/554162 (https://phabricator.wikimedia.org/T238106) (owner: Ladsgroup)
[22:34:24] (Merged) jenkins-bot: Fix user agent for WDQS updater counter [analytics/refinery/source] - https://gerrit.wikimedia.org/r/554162 (https://phabricator.wikimedia.org/T238106) (owner: Ladsgroup)
[23:53:08] (PS3) Milimetric: Add grouped option to datasets api [analytics/dashiki] - https://gerrit.wikimedia.org/r/551941 (https://phabricator.wikimedia.org/T236941)
[23:53:48] (CR) Milimetric: "Thanks! I added a note that testing is broken, but I did add a couple of tests for when we get around to fixing it." (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/551941 (https://phabricator.wikimedia.org/T236941) (owner: Milimetric)