[00:01:53] Analytics, FR-Tech-Analytics, Fundraising-Backlog: Whitelist Portal and WikipediaApp event data for (sanitized) long-term storage - https://phabricator.wikimedia.org/T273246 (EYener) Thank you @mforns and apologies that it took me a while to return to this. I've submitted three individual reviews for...
[00:10:58] Analytics, Analytics-Visualization: Automate creation of monthly report - https://phabricator.wikimedia.org/T66689 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki](https://wikitech.wikimedia.org/wik...
[00:11:00] Analytics, Analytics-Visualization: Namespace selection impacts metrics - https://phabricator.wikimedia.org/T74973 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki](https://wikitech.wikimedia.org/wik...
[00:11:02] Analytics, Analytics-Visualization: Static part of Geowiki bot definition is outdated - https://phabricator.wikimedia.org/T56692 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki](https://wikitech.wik...
[00:11:06] Analytics, Analytics-Visualization: Respond with 404 for missing pages - https://phabricator.wikimedia.org/T55419 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki](https://wikitech.wikimedia.org/wiki...
[00:11:08] Analytics, Analytics-Visualization: The cache-busting does not work on Remote datafiles - https://phabricator.wikimedia.org/T55234 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki](https://wikitech.w...
[00:11:11] Analytics, Analytics-Visualization: Limn reportcard "sign in" OAuth redirect to localhost - https://phabricator.wikimedia.org/T55096 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki](https://wikitech...
[00:11:13] Analytics, Analytics-Visualization: Negative values break log-scale graphs - https://phabricator.wikimedia.org/T55052 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki](https://wikitech.wikimedia.org/...
[00:11:15] Analytics, Analytics-Visualization: Date in callout does not match max date on x-axis - https://phabricator.wikimedia.org/T55232 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki](https://wikitech.wik...
[00:11:18] Analytics, Analytics-Visualization: displaying single points of data as part of path - https://phabricator.wikimedia.org/T55049 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki](https://wikitech.wiki...
[00:11:20] Analytics, Analytics-Visualization: When Graph definitions are broken, other graphs on dashboard might break - https://phabricator.wikimedia.org/T55051 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashik...
[00:11:22] Analytics, Analytics-Visualization: Bug with sorting in some pages due to string sorting instead of numerical sorting - https://phabricator.wikimedia.org/T147749 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded...
[00:11:24] Analytics, Analytics-Visualization: Annotations get messed up on resize - https://phabricator.wikimedia.org/T55047 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki](https://wikitech.wikimedia.org/wik...
[00:11:26] Analytics, Analytics-Visualization, Story: Story: EEVSUser selects time range - https://phabricator.wikimedia.org/T70470 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki](https://wikitech.wikimed...
[00:11:28] Analytics, Analytics-Visualization: Graph types ("Core" and "Secondary") are unclear on Wikimedia Report Card - https://phabricator.wikimedia.org/T41120 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashi...
[00:11:30] Analytics, Analytics-Visualization: Wikimedia Report Card should graph pages containing at least one image - https://phabricator.wikimedia.org/T40394 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki]...
[00:11:32] Analytics, Analytics-Visualization: Support stylizing of individual lines - https://phabricator.wikimedia.org/T37732 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki](https://wikitech.wikimedia.org/w...
[00:11:34] Analytics, Analytics-Visualization: Clarify in graphs that users might be in both global north, and global south - https://phabricator.wikimedia.org/T56649 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Da...
[00:11:37] Analytics, Analytics-Visualization: GeoIP updates can users to jump to new country in geowiki files - https://phabricator.wikimedia.org/T56650 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki](https:...
[00:11:39] Analytics, Analytics-Visualization: Add keyboard shortcuts - https://phabricator.wikimedia.org/T37596 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki](https://wikitech.wikimedia.org/wiki/Analytics/S...
[00:11:42] Analytics, Analytics-Visualization: country data identified by row number rather than actual country name (or ISO code) - https://phabricator.wikimedia.org/T56359 (Aklapper)
[00:11:43] Analytics, Analytics-Visualization: Column changes in global-dev/dashboard-data files breaks user-generated graphs - https://phabricator.wikimedia.org/T56612 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [...
[00:11:46] Analytics, Analytics-Visualization: Column changes in geowiki-data files breaks user-generated graphs - https://phabricator.wikimedia.org/T56611 (Aklapper) Open→Declined Declining this #Analytics-Visualization task, as this project tag was used for Limn. Limn has been superseded by [Dashiki](http...
[00:11:48] Analytics, Analytics-Visualization: country data identified by row number rather than actual country name (or ISO code) - https://phabricator.wikimedia.org/T56359 (Aklapper)
[00:12:10] Analytics, Analytics-Visualization, Project-Admins: Archive #Analytics-Visualization (which seems to be about Limn)? - https://phabricator.wikimedia.org/T274647 (Aklapper) Open→Resolved Ah, thanks. :) * Updated project description of #Analytics-Visualization at https://phabricator.wikimedia.o...
[00:45:44] Analytics, Analytics-Kanban, Growth-Scaling, Growth-Team, Product-Analytics: Growth: update welcome survey aggregation schedule - https://phabricator.wikimedia.org/T275172 (nettrom_WMF)
[00:45:47] Analytics, Analytics-Kanban, Growth-Scaling, Growth-Team, Product-Analytics: Growth: shorten welcome survey retention to 90 days - https://phabricator.wikimedia.org/T275171 (nettrom_WMF)
[00:45:50] Analytics, Growth-Scaling, Growth-Team, Product-Analytics: Growth: End wider data purge window - https://phabricator.wikimedia.org/T273815 (nettrom_WMF)
[00:51:44] Analytics-Radar, Growth-Scaling, Growth-Team, Product-Analytics: Growth: update welcome survey aggregation schedule - https://phabricator.wikimedia.org/T275172 (nettrom_WMF) p:High→Medium a:mforns→nettrom_WMF Moving this to the Analytics radar and reassigning to me. The Welcome Su...
[00:54:10] Analytics-Radar, Growth-Scaling, Growth-Team, Product-Analytics: Growth: shorten welcome survey retention to 90 days - https://phabricator.wikimedia.org/T275171 (nettrom_WMF) p:High→Medium a:mforns→None Moving this to the Analytics Radar, for the same reasons as I just moved T2751...
[01:04:02] Analytics-Radar, Growth-Scaling, Growth-Team, Product-Analytics: Growth: shorten welcome survey retention to 90 days - https://phabricator.wikimedia.org/T275171 (nettrom_WMF)
[01:04:05] Analytics, Growth-Scaling, Growth-Team, Product-Analytics: Growth: End wider data purge window - https://phabricator.wikimedia.org/T273815 (nettrom_WMF)
[01:05:31] Oh, the parent task (T273815) was just about the changes to the EventLogging data, the Welcome Survey work is separate. I changed the parent task of the latter to reflect that.
Sorry about the confusion, shouldn't be any work for your team here
[01:05:31] T273815: Growth: End wider data purge window - https://phabricator.wikimedia.org/T273815
[03:43:10] Analytics, Analytics-Wikistats: Wikistats Bug - https://phabricator.wikimedia.org/T275466 (Liz)
[06:58:56] good morning
[07:24:30] Analytics-Radar, Datasets-General-or-Unknown, Dumps-Generation, Product-Analytics, Structured-Data-Backlog (Current Work): Set up generation of JSON dumps for Wikimedia Commons - https://phabricator.wikimedia.org/T259067 (ArielGlenn) I have renamed everything and it's all rsynced out to the p...
[07:56:51] Analytics-Radar, Datasets-General-or-Unknown, Dumps-Generation, Product-Analytics, Structured-Data-Backlog (Current Work): Set up generation of JSON dumps for Wikimedia Commons - https://phabricator.wikimedia.org/T259067 (Miriam) Thanks SO much @ArielGlenn, I am also downloading those on our...
[08:31:50] (CR) Thiemo Kreuz (WMDE): [C: +1] Segment CodeMirror metrics by user edit count [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/656210 (https://phabricator.wikimedia.org/T273471) (owner: Awight)
[08:32:15] (CR) Thiemo Kreuz (WMDE): [C: +1] Use edit count bucket sent by TemplateWizard [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/657634 (https://phabricator.wikimedia.org/T273475) (owner: Awight)
[08:33:05] (CR) Thiemo Kreuz (WMDE): [C: +1] Add TemplateDataEditor schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664801 (https://phabricator.wikimedia.org/T275012) (owner: Awight)
[08:33:28] (CR) Thiemo Kreuz (WMDE): [C: +1] Add TemplateDataApi schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664799 (https://phabricator.wikimedia.org/T275011) (owner: Awight)
[08:44:57] (CR) Thiemo Kreuz (WMDE): [C: +1] Restore templatewizard queries [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/666203 (owner: Awight)
[08:48:45] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (elukey) Updated script after uid/gid reservation (the script can be refactored in 100 ways but I prefer to keep it simple and clear): ` #!/bin/bash set -ex ## hdfs UID=$(id -u hdfs) GID=...
[09:16:03] Analytics, Data-Persistence-Backup: Matomo database backup size doubled, we should check this is normal operation - https://phabricator.wikimedia.org/T272344 (jcrespo) FYI: ` [dbbackups]> select section, start_date, total_size, REPEAT('▄', total_size/20000000) as graph FROM backups where section='matomo'...
[09:18:10] Analytics, Data-Persistence-Backup: Matomo database backup size doubled, we should check this is normal operation - https://phabricator.wikimedia.org/T272344 (elukey) @razzi can you check? :)
[10:22:08] find + chown is really really slow :(
[10:24:15] TIL chown --from
[10:47:28] was the first attempt find -print0|xargs -0?
[10:47:47] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (elukey) I tried to apply the above on an-test-worker1003 (already on Buster) doing the following: 1) Stop all hadoop daemons + puppet disabled 2) run the script 3) enable + run puppet Some...
[10:48:46] if so and you're on spinning rust, I recommend `find -print0 > file; < file xargs -0 chown` to avoid chown and find walking over each other's feet on I/O
[10:48:52] klausman: morning! Nope I didn't do it, just tried find -exec chown.. I can try find -print + xargs indeed!
[10:49:22] my first test was https://phabricator.wikimedia.org/T231067#6851675
[10:50:03] Ah.
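The staged pipeline suggested at 10:48:46 (write `find ... -print0` output to a file, then feed it to `xargs -0 chown`) leans on two properties of xargs: NUL-delimited input survives filenames with spaces or newlines, and arguments are packed into as few command lines as the system limit allows, so chown starts a handful of times instead of once per file. (With find alone, `-exec chown ... {} \;` runs one chown per file, while `-exec chown ... {} +` batches much like xargs.) A minimal Python sketch of both properties, assuming a simplified size model that ignores the environment; `read_nul_list` and `pack_batches` are hypothetical helpers, and `os.sysconf("SC_ARG_MAX")` is the same value `getconf ARG_MAX` reports:

```python
import os
from typing import Iterable, Iterator, List


def read_nul_list(data: bytes) -> List[bytes]:
    """Split the output of `find ... -print0` into individual paths."""
    # Paths may contain spaces or even newlines, but never a NUL byte,
    # which is why -print0 / xargs -0 is the safe pairing.
    return [p for p in data.split(b"\0") if p]


def pack_batches(cmd: List[bytes], paths: Iterable[bytes],
                 limit: int) -> Iterator[List[bytes]]:
    """Yield argv lists (cmd + as many paths as fit) under a byte budget.

    Simplified model of xargs: each argument costs len(arg) + 1 for its NUL
    terminator; the environment, which also counts against the real limit,
    is ignored here.
    """
    base = sum(len(a) + 1 for a in cmd)
    batch: List[bytes] = []
    size = base
    for p in paths:
        need = len(p) + 1
        if base + need > limit:
            raise ValueError(f"argument too long for one command line: {p!r}")
        if batch and size + need > limit:
            yield cmd + batch  # flush: this becomes one chown invocation
            batch, size = [], base
        batch.append(p)
        size += need
    if batch:
        yield cmd + batch


if __name__ == "__main__":
    listing = b"/srv/a\0/srv/b c\0/srv/odd\nname\0"  # as written by find -print0
    limit = os.sysconf("SC_ARG_MAX")  # argv + environment limit on this host
    # "1234" is a placeholder numeric UID, not a real reserved ID.
    for argv in pack_batches([b"chown", b"1234"], read_nul_list(listing), limit):
        print(len(argv) - 2, "paths in one chown call")
```

Staging the list in a file, as suggested, also means the directory walk finishes before any chown starts, so the two never compete for disk seeks.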
[10:50:17] I will sheepishly admit that I'd written a Go program :D
[10:50:53] ahahahaah yes this was my second thought but I wanted to see if regular tools were good enough
[10:51:07] I mean it is fine to wait some minutes, but not like 20
[10:51:26] Yeah. At least you can do machines in parallel
[10:52:19] find -print is very fast, it is the chown part that is horribly slow
[10:52:30] (in the find/exec model I mean)
[10:54:35] Oh
[10:54:55] Why are you using the name of the user instead of the UID? With the UID, you save on getent() calls
[10:55:27] I mean the $2 in the chown call
[10:55:52] this is a very good point
[10:56:01] Also, with xargs you would bundle chown invocations, saving on chown startup cost
[10:57:13] * elukey agrees and takes notes
[10:58:30] Analytics-Radar, Datasets-General-or-Unknown, Dumps-Generation, Product-Analytics, Structured-Data-Backlog (Current Work): Set up generation of JSON dumps for Wikimedia Commons - https://phabricator.wikimedia.org/T259067 (Miriam) @ArielGlenn thanks a lot again for this! One question: would...
[11:00:46] klausman: for the bundling part, do you mean with the xargs -n arg?
[11:01:02] I admit my ignorance about xargs
[11:01:46] Just don't specify an -n
[11:02:05] xargs will build command lines that are as long as the system's limit for command lines allows
[11:02:49] nice, okok
[11:03:10] (xargs also allows for parallelism with -P, but I don't think that's useful here)
[11:04:20] so for example
[11:04:21] find / \( -path /proc -o -path /mnt -o -path /sys -o -path /dev -o -path /media \) -prune -false -o -user $OLD_UID -print0 | xargs -0 chown $1
[11:05:04] Yes. One note though
[11:05:20] How many different real filesystems do you have? Like / and /home?
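The find command at 11:04:21 selects files by numeric owner (`-user` accepts a numeric ID, and `-uid` takes one directly) and keeps out of pseudo filesystems by pruning their paths; find's `-xdev` option instead refuses to descend into other filesystems at all. A rough Python analogue of `find ROOTS -xdev -uid UID`, offered as an illustration of the idea rather than of how find is implemented; `owned_by` is a hypothetical helper:

```python
import os
from typing import Iterator, Sequence


def owned_by(roots: Sequence[str], uid: int) -> Iterator[str]:
    """Yield paths under `roots` owned by numeric `uid`, staying on each root's filesystem.

    Ownership comes from lstat().st_uid, so no user-name lookups happen per file.
    """
    for root in roots:
        root_dev = os.lstat(root).st_dev
        for dirpath, dirnames, filenames in os.walk(root):
            # Drop subdirectories on another device (mount points) so the walk
            # never descends into them; os.walk honours in-place pruning of
            # dirnames. Symlinks to directories are not inspected here.
            dirnames[:] = [
                d for d in dirnames
                if not os.path.islink(os.path.join(dirpath, d))
                and os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
            ]
            if os.lstat(dirpath).st_uid == uid:
                yield dirpath
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.lstat(path).st_uid == uid:
                    yield path
```

Listing the real mount points as roots (`/` plus each datanode partition) gives the behaviour suggested next in the log, without having to enumerate every pseudo filesystem to prune.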
[11:06:00] it depends, for hadoop worker nodes we have root + 12/24 partitions for datanode data
[11:06:45] Because it might be easier to just enumerate them (and /) and use -xdev
[11:07:43] like find / [other mountpoints] -xev [rest of find conditions]
[11:07:46] xdev*
[11:08:01] That way you don't have to make sure you haven't forgotten any of the pseudo filesystems
[11:09:02] ah you mean instead of pruning them
[11:09:17] Yes
[11:09:39] Very unrelatedly, I am not in the analytics POSIX group on stat1008, which means I can't read the Varnish webrequest files :D
[11:10:18] if you use spark, you can do something like
[11:10:36] sudo -u analytics kerberos-run-command analytics spark2-shell etc..
[11:10:46] Oh neat
[11:11:03] The user keytab that you are trying to use (/etc/security/keytabs/analytics/analytics.keytab) doesn't exist or it isn't readable from your user, aborting...
[11:11:05] and then you should be set with credentials etc..
[11:11:06] Foiled again!
[11:11:21] ah right! That is only for us, go on an-launcher1002
[11:11:27] and you'll find it
[11:12:15] keep an eye on memory consumption for the spark shell if possible, we have plenty of memory in theory but there are running jobs
[11:12:26] (I mean mostly kicking off hadoop jobs)
[11:12:34] Roger. I am only looking at an hour of data at a time
[11:12:38] perfect
[11:12:49] thanks a lot for the suggestions klausman!
[11:12:54] np :)
[11:22:01] klausman: your solution is like 100 times faster :D
[11:27:17] \o/
[11:29:43] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (elukey) Tobias reviewed my horrible code and suggested some changes: * avoid using the name of the user/group in find to avoid unnecessary calls to `getent` * use find -print0 | xargs -0 to...
[11:29:56] credits deserved --^
[11:36:06] *bows*
[11:36:55] Say, do you happen to know what exactly I have access to on the map part of the sequenceFile load?
Loading 1h of data is taking forever.
[11:37:33] And since I only need TLS requests from cp3050, I am now loading/indexing a shit-ton of stuff I won't need.
[11:39:43] klausman: yep, https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&refresh=5m&var-server=an-launcher1002&var-datasource=eqiad%20prometheus%2Fops&var-cluster=analytics let's stop for the moment :D
[11:40:11] Memory was fine, CPU much less so.
[11:40:37] were you using --master yarn?
[11:41:16] No
[11:41:20] Should I have?
[11:42:26] yes let's try it, it basically transfers all the work to the hadoop cluster
[11:42:32] Will do, sec
[11:42:39] so that cpu usage should improve a lot
[11:43:43] Yeah, that is *way* faster. Thanks!
[11:43:47] Now we're even :)
[11:44:04] \o/
[11:44:17] for the columns I would check in the json records
[11:45:15] from kafka for example, to get a quick glimpse
[11:45:25] Yeah, I presumed the fields would be visible, but I was unsure how to access them. With this now being much faster it's easier to iterate on things
[11:45:34] perfect :)
[11:45:59] We usually keep an eye on https://yarn.wikimedia.org/cluster/scheduler
[11:46:18] if the cluster is too busy people usually get pings from Joseph :D
[11:46:42] but it happens when people mistakenly load say a day of webrequest
[11:49:17] I am very carefully looking at the smallest clearly-delineated part of the data during development. If at a later date we want to do this for whole days, we probably need to productionize this a bit more
[11:49:40] definitely
[11:49:53] and the more we progress towards a final solution the more we'll productionize it
[11:50:06] so it should be a thing that happens naturally (hopefully)
[11:50:48] I imagine that we'll refine both vk and atskafka streams for a bit, to then drop the varnishkafka one at some point
[11:51:08] and the atskafka stream will ramp up gracefully over time, with more hosts etc..
[11:51:43] going to have lunch, ping me if needed! :)
[11:52:26] Will do
[12:09:29] Analytics-Radar, Datasets-General-or-Unknown, Dumps-Generation, Product-Analytics, Structured-Data-Backlog (Current Work): Set up generation of JSON dumps for Wikimedia Commons - https://phabricator.wikimedia.org/T259067 (ArielGlenn) >>! In T259067#6851921, @Miriam wrote: --snip-- > One quest...
[12:34:48] Ok, so I now have tables with the appropriate requests from vrn and ats.
[12:36:45] For the hour of 2021/02/22/01, I have 5746235 requests via Varnish, and 5711633 requests via ATS. The discrepancy is 34602 requests, or 0.6% missing. What exactly those requests are, I don't know yet.
[12:37:16] (I also don't know if the remainder of requests are all the same)
[12:37:34] And now, lunch!
[12:48:59] nice!
[14:01:44] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` an-test-worker1002.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2...
[14:23:10] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (elukey) I am testing something on an-test-worker1002, but the next step is to merge the change to enforce uid/gid for Buster nodes and see how the reimages go. Before that, we need to manual...
[14:41:56] elukey: 1.2T ./christinedk ?
[14:42:28] ottomata: morning!
Yes I was about to send an email, Tiziano is also using half a terabyte :(
[14:42:39] aye ok cool
[14:43:13] but I am also looking elsewhere, we are using 6TB
[14:45:11] klausman: o/
[14:45:20] Analytics, Analytics-Kanban, Anti-Harassment, Event-Platform, and 2 others: Migrate Anti-Harassment EventLogging schemas to Event Platform - https://phabricator.wikimedia.org/T268517 (Ottomata)
[14:45:39] there is a dir on stat1008 called "backup-1007", which IIRC was used to back up stat1007's data before the reimage
[14:45:43] elukey: \o
[14:45:55] do I recall correctly? If so, can we drop?
[14:46:05] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-test-worker1002.eqiad.wmnet'] ` and were **ALL** successful.
[14:46:32] elukey: what do you mean?
[14:46:52] Oh, now I saw
[14:46:54] Yes. drop
[14:47:02] super thanks!
[14:47:42] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[14:48:21] Analytics, Event-Platform, Language-analytics, MW-1.36-notes (1.36.0-wmf.27; 2021-01-19), Patch-For-Review: UniversalLanguageSelector Event Platform Migration - https://phabricator.wikimedia.org/T267352 (Ottomata) Open→Resolved a:Ottomata
[14:49:00] ottomata: to follow our guidelines about dropping data - ack to drop /srv/backup-1007 on stat1008?
[14:49:59] elukey: what was that for? a stat1007 reimage?
[14:50:07] ottomata: So if I want to use spark-submit to run a pyspark program, what do I need to import to get the actual spark library? In the shell it's imported automagically.
[14:50:23] ottomata: yes, that was a backup in case /srv got wiped during reimage, and I forgot to clean it up
[14:50:35] +1 elukey wipe away
[14:51:39] klausman: i've actually not done that too much, but findspark could help ya, which is available if you use a conda env
[14:51:40] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Anaconda
[14:51:49] !log drop /srv/backup-1007 on stat1008 to free space
[14:51:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:52:00] https://github.com/minrk/findspark
[14:52:03] thanks, will have a look-see
[14:52:16] findspark.init('/usr/lib/spark2')
[14:52:33] here's a real example of it being used for wmf
[14:52:33] https://github.com/wikimedia/wmfdata-python/blob/master/wmfdata/spark.py
[14:54:09] WOW https://github.com/RadeonOpenCompute/ROCm/issues/1391#issuecomment-784184306
[14:54:12] \o/
[14:55:37] ottomata: /dev/mapper/vg0-srv 7.2T 2.3T 4.5T 34% /srv
[14:55:39] :P
[14:55:46] I guess that we can avoid the email
[14:55:52] ok
[14:55:52] I am entirely lost with conda.
[14:55:57] klausman: ok
[14:56:10] it's like a replacement for venv that does a little more
[14:56:13] So for one thing I need to sudo to have access to the data
[14:56:18] but basically you can think of it as a venv and pip
[14:56:22] oh
[14:56:24] ....
[14:56:28] why?
[14:56:37] oh because you are querying webrequest raw?
[14:56:42] Because I'm not in the analytics group, which owns the wr files
[14:57:12] klausman: yeah but
[14:57:12] https://phabricator.wikimedia.org/T275396
[14:57:14] that is a mistake
[14:57:29] we realized that when you also couldn't read the atskafka stuff either
[14:57:32] i chgrped it
[14:57:41] we should do it for all the data in raw
[14:58:30] ls -l /mnt/hdfs//wmf/data/raw/webrequest/webrequest_text/hourly/2021/02/22/01/|head -n1
[14:58:33] ls: cannot access '/mnt/hdfs//wmf/data/raw/webrequest/webrequest_text/hourly/2021/02/22/01/': Permission denied
[14:58:44] right
[14:58:48] we should fix that
[14:59:17] it should be simple to just chgrp the right stuff, but i don't want to accidentally break something
[14:59:17] so
[14:59:28] you should just chgrp analytics-privatedata-users whatever it is you need to query
[14:59:34] that is the proper group for that data
[14:59:49] Okay, will do so for /mnt/hdfs//wmf/data/raw/webrequest/webrequest_text/hourly/2021/02/22/01/* for now
[14:59:56] cool
[15:00:05] So the splash damage area is contained to that hour for now
[15:00:09] perfect
[15:00:57] ottomata: did we chgrp the original camus dirs? Otherwise new files will be created with incorrect perms
[15:01:18] exactly
[15:01:21] https://phabricator.wikimedia.org/T275396
[15:01:26] we did not elukey
[15:01:32] ah I see okok :)
[15:01:45] we need to chgrp both /wmf/camus and /wmf/raw
[15:01:48] Um. an-launcher1002 does not know about the privatedata group?
[15:01:56] hm no?
[15:01:59] oh actually that makes sense
[15:02:02] chgrp: invalid group: ‘analytics-privatedata-users’
[15:02:07] hm
[15:02:15] do it from an-master1001 ? i think?
[15:02:44] klausman: what command are you using?
[15:03:00] chgrp
[15:03:12] I mean if you are using the hdfs dfs etc..
[15:03:23] sudo -u hdfs hdfs dfs -chgrp ...
[15:03:34] sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp etc..
[15:03:51] But what path?
there is no /mnt/hdfs on the master
[15:04:01] no, that is the fuse mountpoint
[15:04:07] just remove the /mnt/hdfs part
[15:04:08] the rest is the path
[15:04:10] exactly yes
[15:04:46] also please !log actions on hdfs
[15:04:51] klausman: --^
[15:04:55] so we can keep track of them
[15:04:59] Analytics, Event-Platform, MW-1.36-notes (1.36.0-wmf.28; 2021-01-26), Patch-For-Review: QuickSurveyInitiation Event Platform Migration - https://phabricator.wikimedia.org/T271165 (Ottomata)
[15:05:08] !log an-master1001 ~ $ sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp analytics-privatedata-users /wmf/data/raw/webrequest/webrequest_text/hourly/2021/02/22/01/webrequest*
[15:05:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:05:14] (same thing for other people :P :P :P)
[15:05:34] Question now is: what machine should I use spark-submit and anaconda on?
[15:05:40] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 3 others: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (Ottomata)
[15:06:07] Analytics, Event-Platform, Patch-For-Review: QuickSurveysResponses Event Platform Migration - https://phabricator.wikimedia.org/T271166 (Ottomata)
[15:06:14] Analytics, Event-Platform, Patch-For-Review: QuickSurveysResponses Event Platform Migration - https://phabricator.wikimedia.org/T271166 (Ottomata) Open→Resolved
[15:06:17] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:06:23] Analytics, Event-Platform, MW-1.36-notes (1.36.0-wmf.28; 2021-01-26), Patch-For-Review: QuickSurveyInitiation Event Platform Migration - https://phabricator.wikimedia.org/T271165 (Ottomata) Open→Resolved
[15:06:24] klausman: if all data is owned by analytics-privatedata-users it is ok on any stat100x
[15:06:25] I mean, since an-launcher doesn't know the POSIX group, should I use an hdfs:// path instead of /mnt/hdfs? Or move to a different machine?
[15:06:25] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:06:33] klausman: yeah use a stat box
[15:06:38] Alright, moving to a statbox
[15:07:01] klausman: the /mnt/hdfs mountpoint is really just a convenience
[15:07:11] for easier browsing of files
[15:07:22] haha, really mostly for tab complete
[15:07:48] hello teammmm
[15:07:52] so klausman the only reason I suggested conda is that findspark is installed and that makes it a little easier to instantiate a SparkSession
[15:08:17] so on a stat box you should just do
[15:08:18] conda-create-stacked
[15:08:22] source conda-activate-stacked
[15:08:32] then you can run python scripts that import findspark, etc.
[15:08:48] actually if you don't plan on installing any new packages
[15:08:56] you can skip the conda-create-stacked bit
[15:09:04] source /usr/lib/anaconda-wmf/bin/activate
[15:09:19] that'll just use the read-only base anaconda env, which has findspark
[15:09:56] and the pyspark it finds has no sparkContext?
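The 15:03 to 15:04 exchange comes down to one rule: /mnt/hdfs is a FUSE view of HDFS, so dropping that prefix yields the path `hdfs dfs` expects. A small sketch that performs the translation (also collapsing the `//` seen in the 14:58 commands) and assembles the command from the 15:05:08 `!log` entry; `fuse_to_hdfs` and `chgrp_argv` are hypothetical helpers, and the command prefix is copied from that log line rather than from any documented interface:

```python
import posixpath
from typing import List

FUSE_MOUNT = "/mnt/hdfs"  # the convenience mount discussed above


def fuse_to_hdfs(path: str, mount: str = FUSE_MOUNT) -> str:
    """Translate a path under the FUSE mount into the equivalent HDFS path."""
    norm = posixpath.normpath(path)  # collapses '//' as in /mnt/hdfs//wmf/...
    if norm != mount and not norm.startswith(mount + "/"):
        raise ValueError(f"{path!r} is not under {mount}")
    return norm[len(mount):] or "/"


def chgrp_argv(group: str, fuse_paths: List[str]) -> List[str]:
    """Build the chgrp command used in the 15:05:08 !log entry."""
    return (["sudo", "-u", "hdfs", "kerberos-run-command", "hdfs",
             "hdfs", "dfs", "-chgrp", group]
            + [fuse_to_hdfs(p) for p in fuse_paths])
```

Passing the resulting list to `subprocess.run(argv, check=True)` on an-master1001 would execute it; as requested at 15:04:46, the action should also be recorded with `!log`.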
[15:10:30] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 3 others: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (Ottomata) Open→Resolved
[15:10:32] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:10:34] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:10:51] klausman yes findspark just helps put the spark bits in the PYTHONPATH
[15:11:10] example: https://github.com/wikimedia/wmfdata-python/blob/master/wmfdata/spark.py
[15:11:16] after findspark
[15:11:16] from pyspark.sql import SparkSession
[15:11:18] Yes, but even after importing pyspark, there is no pyspark.sparkContext
[15:11:24] then
[15:11:39] https://github.com/wikimedia/wmfdata-python/blob/master/wmfdata/spark.py#L82-L98
[15:12:26] https://spark.apache.org/docs/latest/api/python/pyspark.sql.html
[15:20:33] Ok, making a session and using that, but it seems to just hang
[15:20:53] I copied the get_custom_session() function and am using it
[15:21:51] ottomata: firewall change deployed for mw api, if you want to test it
[15:22:52] !log deploy new uid/gid scheme for yarn/mapred/analytics/hdfs/druid on an-airflow1001, an-test* buster nodes
[15:22:54] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:23:00] (CR) Awight: [C: +1] "Ready to deploy." [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/659227 (https://phabricator.wikimedia.org/T272569) (owner: Andrew-WMDE)
[15:23:57] !log deploy new uid/gid scheme for yarn/mapred/analytics/hdfs/druid on an-tool100[8,9]
[15:24:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:24:33] --
[15:24:59] If nobody opposes I am going to stop all timers etc.. on an-launcher1002, to apply the new uid/gid changes and reboot for new kernel
[15:28:35] !log stop timers on an-launcher1002 to change gid/uid for yarn/hdfs/mapred/analytics/druid and to reboot for kernel updates
[15:28:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:29:25] joal: if you are around - can I kill your process on an-launcher1002?
[15:29:26] klausman: it just hangs?
[15:29:36] (thanks elukey ! )
[15:32:58] ottomata: there is a refine job called otto_test_refine_eventlogging_otto_2 on an-launcher1002, can it be killed?
[15:39:35] ok I assume yes since it was started at the beginning of feb
[15:42:00] yes
[15:42:09] (PS2) Awight: [WIP] Update event bucketing for visualeditor events [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/657635 (https://phabricator.wikimedia.org/T273474)
[15:42:18] Analytics-EventLogging, Analytics-Radar, Better Use Of Data, Event-Platform, and 3 others: OperationError: The operation failed for an operation-specific reason in generateRandomSessionId - https://phabricator.wikimedia.org/T263041 (Mholloway) The behavior of getRandomValues is specified here: h...
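The questions at 14:50:07 and 15:11:18 are about where the SparkContext comes from outside the shell: the pyspark shell pre-creates `spark` and `sc`, while a script run with spark-submit or plain python has to build a SparkSession itself and can then use `spark.sparkContext`. A minimal sketch of that builder pattern, defaulting to `master("yarn")` as suggested at 11:40:37 so the work runs on the Hadoop cluster rather than on the launch host. `build_session` is a hypothetical helper, not wmfdata's `get_custom_session()`; the app name and config value in the usage comment are placeholders. The builder is passed in so the function can be exercised without a cluster:

```python
from typing import Any, Mapping, Optional


def build_session(builder: Any, app_name: str, master: str = "yarn",
                  conf: Optional[Mapping[str, str]] = None) -> Any:
    """Configure a SparkSession builder and return the (possibly existing) session.

    `builder` is normally `pyspark.sql.SparkSession.builder`; master(), appName(),
    config() and getOrCreate() are the builder methods used here.
    """
    b = builder.master(master).appName(app_name)
    for key, value in (conf or {}).items():
        b = b.config(key, value)
    return b.getOrCreate()


# Typical use on a stat host (not run here; needs Spark and cluster credentials):
#   import findspark
#   findspark.init("/usr/lib/spark2")
#   from pyspark.sql import SparkSession
#   spark = build_session(SparkSession.builder, "vk-vs-ats-compare",
#                         conf={"spark.executor.memory": "4g"})  # placeholder value
#   sc = spark.sparkContext  # the object the shell exposes as `sc`
```

Keeping the session settings in one helper makes it easy to keep `master` pointed at YARN, which the 11:43:43 message found much faster than running on an-launcher1002 itself.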
[15:42:47] Analytics-Radar, WMDE-Templates-FocusArea, MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), Patch-For-Review, WMDE-TechWish (Sprint-2021-01-20): Adjust edit count bucketing for VisualEditor, segment all metrics - https://phabricator.wikimedia.org/T273474 (awight)
[15:45:49] Analytics, Analytics-Kanban, Growth-Scaling, Growth-Team, Product-Analytics: Growth: remove deletion timers for Growth's sanitized EL tables - https://phabricator.wikimedia.org/T274297 (mforns) I created 2 patches to solve this, they are +2'd, we're waiting for deployment. By mistake I assign...
[15:53:33] (PS3) Awight: [WIP] Update event bucketing for visualeditor events [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/657635 (https://phabricator.wikimedia.org/T273474)
[15:55:16] Analytics-Radar, Growth-Scaling, Growth-Team, Product-Analytics: Growth: update welcome survey aggregation schedule - https://phabricator.wikimedia.org/T275172 (mforns) Oh, we groomed this task and assigned it to me by mistake, thanks for fixing :]
[15:56:21] Analytics-Radar, Growth-Scaling, Growth-Team, Product-Analytics: Growth: shorten welcome survey retention to 90 days - https://phabricator.wikimedia.org/T275171 (mforns) Same, sorry for the misunderstanding.
[15:57:50] !log an-launcher1002's timers restored
[15:57:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:58:42] elukey: please do! I even didn't know there was one
[15:58:52] joal: yep done!
[15:58:57] Thanks :)
[15:59:00] sorry for the ping :)
[15:59:02] * joal gone again :)
[15:59:29] (PS4) Awight: [WIP] Update event bucketing for visualeditor events [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/657635 (https://phabricator.wikimedia.org/T273474)
[16:01:49] (PS5) Awight: Update event bucketing for visualeditor events [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/657635 (https://phabricator.wikimedia.org/T273474)
[16:02:27] Analytics, Event-Platform, Research, Patch-For-Review: TranslationRecommendation* Schemas Event Platform Migration - https://phabricator.wikimedia.org/T271163 (Ottomata)
[16:03:21] Analytics-Radar, WMDE-Templates-FocusArea, MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), Patch-For-Review, and 3 others: Adjust edit count bucketing for VisualEditor, segment all metrics - https://phabricator.wikimedia.org/T273474 (awight) Was blocking a sprint task, so I've pulled in and finished t...
[16:03:38] (CR) Awight: "Smoke tests okay." [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/657635 (https://phabricator.wikimedia.org/T273474) (owner: Awight)
[16:04:33] Analytics, SRE, ops-eqiad: Degraded RAID on an-worker1097 - https://phabricator.wikimedia.org/T274819 (Cmjohnson) Created a Dell ticket for a new disk SR1052417309
[16:05:11] the test cluster is temporarily down, my bad, fixing :)
[16:08:23] Analytics, Event-Platform, Research, Patch-For-Review: TranslationRecommendation* Schemas Event Platform Migration - https://phabricator.wikimedia.org/T271163 (Ottomata)
[16:09:44] razzi: ping? :)
[16:10:07] Analytics, Event-Platform, Research, Patch-For-Review: TranslationRecommendation* Schemas Event Platform Migration - https://phabricator.wikimedia.org/T271163 (Ottomata) @bmansurov you should be able to produce these events now using your code. This should work in both beta and production. See...
[16:10:17] Analytics, SRE, ops-eqiad: an-worker1112 reports I/O errors for a disk - https://phabricator.wikimedia.org/T274981 (Cmjohnson) The disk is showing healthy and does not have a predictive failure rate. Can you try writing to the disk, I really need it to fail or start failing to ask for a new disk fro...
[16:10:33] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[16:11:27] Analytics, Editing-team, Event-Platform: EditAttemptStep Event Platform Migration - https://phabricator.wikimedia.org/T267343 (Ottomata) I'm delaying this migration until later in the list; we've recently lot an addition of slightly easier schemas to migrate, so we will do those first.
[16:11:30] Analytics, Editing-team, Event-Platform: VisualEditorFeatureUse Event Platform Migration - https://phabricator.wikimedia.org/T267353 (Ottomata) I'm delaying this migration until later in the list; we've recently lot an addition of slightly easier schemas to migrate, so we will do those first.
[16:12:04] awight: o/ yt?
[16:15:25] (CR) Mforns: [V: +2 C: +2] "LGTM! Merging :]" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/656210 (https://phabricator.wikimedia.org/T273471) (owner: Awight)
[16:17:32] ottomata: hi!
[16:17:49] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[16:17:54] (CR) Mforns: [V: +2 C: +2] "LGTM! Merging :]" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/657634 (https://phabricator.wikimedia.org/T273475) (owner: Awight)
[16:17:58] hello! i finally have time to work on el migration for a bit!
[16:18:03] since you've already got all these schema patches
[16:18:05] let's do yours
[16:18:06] q:
[16:18:14] \o/
[16:18:22] are there any of these that do not use EventLogging extension to send the events?
[16:18:23] e.g.
[16:18:34] No custom client code.
[16:18:38] ok great,
[16:18:40] are they all JS?
[16:18:42] or are some PHP?
[16:19:14] We do have some PHP client code. How should I proceed?
[16:19:30] they use EventLogging::logEvent, right?
[16:19:46] the process should be the same, we just need to do our testwiki different
[16:19:55] (CR) Mforns: [V: +2 C: +2] "LGTM! Merging" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/666203 (owner: Awight)
[16:19:55] unless you know how to actually trigger each of these events via instrumentation
[16:20:02] i usually manually send an event by calling eventlogging
[16:20:04] to test
[16:20:54] ottomata: kk yes I can cause events when needed
[16:21:13] ok awesome
[16:21:34] I'll just... try to make a list of which schemas come from the backend.
[16:21:38] i'm going to do the first steps now to get everything set on testwiki, including merging your schema patches and edit protecting the old ones on metawiki
[16:21:44] oh that would be awesome awight
[16:21:48] in the audit spreadsheet
[16:21:57] ah kk
[16:22:00] there is a column for Client Software
[16:22:15] https://docs.google.com/spreadsheets/d/1WXbGPyuu2S6TYvrb-DvWWmrEx_K7TJ5rYPkjhvgWjoI/edit#gid=1715982822
[16:26:02] (PS3) Ottomata: Add TwoColConflictExit schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664803 (https://phabricator.wikimedia.org/T275014) (owner: Awight)
[16:26:50] (PS3) Ottomata: Add TwoColConflictConflict schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664802 (https://phabricator.wikimedia.org/T275013) (owner: Awight)
[16:27:24] (PS3) Ottomata: Add ReferencePreviewsPopups schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664798 (https://phabricator.wikimedia.org/T275009) (owner: Awight)
[16:27:48] (PS4) Ottomata: Add ReferencePreviewsBaseline schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664795 (https://phabricator.wikimedia.org/T275007) (owner: Awight)
[16:28:22] (PS4) Ottomata: Add ReferencePreviewsCite schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664796 (https://phabricator.wikimedia.org/T275008) (owner: Awight)
[16:28:28] milimetric: I'd like to include this in our deployment train today https://gerrit.wikimedia.org/r/c/analytics/refinery/+/659306/, Jo-seph asked for some comments, and I modified the code accordingly (for one of them, reasoned the other one), could you please review and merge if appropriate? :]
[16:28:42] (PS2) Ottomata: Add CodeMirrorUsage schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664792 (https://phabricator.wikimedia.org/T275005) (owner: Awight)
[16:28:46] will do mforns
[16:28:56] thanks milimetric!
[16:29:19] hello folks!
[16:29:30] Analytics-Radar, Better Use Of Data, Product-Analytics, Product-Data-Infrastructure, and 2 others: prefUpdate schema contains multiple identical events for the same preference update - https://phabricator.wikimedia.org/T218835 (phuedx) a:phuedx→polishdeveloper
[16:29:32] who is doing the train? Wikistats 2 is still to deploy
[16:29:44] milimetric: btw, let me know if you want to pair on deployment train, or any other ops things these days
[16:29:52] (PS4) Ottomata: Add TemplateDataEditor schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664801 (https://phabricator.wikimedia.org/T275012) (owner: Awight)
[16:29:52] ottomata: Updated in the doc. It was just two schemas, TemplateDataApi and TwoColConflictConflict.
[16:29:59] great thank you
[16:30:03] hi elukey :]
[16:30:14] (PS4) Milimetric: Factor out traffic anomaly countries into a Hive table [analytics/refinery] - https://gerrit.wikimedia.org/r/659306 (https://phabricator.wikimedia.org/T272052) (owner: Mforns)
[16:30:16] o/
[16:30:19] (CR) Milimetric: [V: +2 C: +2] Factor out traffic anomaly countries into a Hive table [analytics/refinery] - https://gerrit.wikimedia.org/r/659306 (https://phabricator.wikimedia.org/T272052) (owner: Mforns)
[16:30:31] thanks milimetric :]
[16:30:38] I'll update the etherpad
[16:30:42] thanks!
[16:31:13] no worries, I can take care of it, today's my last day of cleanup tasks before I lock myself in a dungeon and come out only when I have a gobblin
[16:32:12] O_O
[16:32:15] (CR) Ottomata: [C: +2] Add TemplateDataApi schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664799 (https://phabricator.wikimedia.org/T275011) (owner: Awight)
[16:36:02] (CR) Thiemo Kreuz (WMDE): [C: +1] Update event bucketing for visualeditor events [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/657635 (https://phabricator.wikimedia.org/T273474) (owner: Awight)
[16:36:26] (PS1) Ottomata: Rematerialize templatedataapi with numeric bounds [schemas/event/secondary] - https://gerrit.wikimedia.org/r/666399 (https://phabricator.wikimedia.org/T275011)
[16:37:39] (CR) Ottomata: [C: +2] Rematerialize templatedataapi with numeric bounds [schemas/event/secondary] - https://gerrit.wikimedia.org/r/666399 (https://phabricator.wikimedia.org/T275011) (owner: Ottomata)
[16:39:06] Analytics, WMDE-Analytics-Engineering, Patch-For-Review, User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 - https://phabricator.wikimedia.org/T274866 (elukey) @GoranSMilovanovic I applied a more permanent fix that seems working from my tests, please let me know if it doesn't when...
[16:39:21] (PS5) Ottomata: Add TemplateDataEditor schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664801 (https://phabricator.wikimedia.org/T275012) (owner: Awight)
[16:39:53] * elukey bbiab
[16:41:02] (CR) Ottomata: [C: +2] Add TemplateDataEditor schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664801 (https://phabricator.wikimedia.org/T275012) (owner: Awight)
[16:41:15] (CR) Ottomata: [C: +2] Add VisualEditorTemplateDialogUse schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664804 (owner: Awight)
[16:41:45] (CR) Ottomata: [C: +2] Add CodeMirrorUsage schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664792 (https://phabricator.wikimedia.org/T275005) (owner: Awight)
[16:43:13] Analytics-Radar, Growth-Scaling, Growth-Team, Product-Analytics: Growth: update welcome survey aggregation schedule - https://phabricator.wikimedia.org/T275172 (nettrom_WMF) @mforns : Sure thing! Sorry about the confusion, once I noticed that the parent task was set to the one that I created to s...
[16:43:25] (CR) Ottomata: [C: +2] Add ReferencePreviewsCite schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664796 (https://phabricator.wikimedia.org/T275008) (owner: Awight)
[16:44:10] (CR) Ottomata: [C: +2] Add ReferencePreviewsBaseline schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664795 (https://phabricator.wikimedia.org/T275007) (owner: Awight)
[16:45:27] While it doesn't break anything for me, is this normal: https://phabricator.wikimedia.org/P14450 ?
[16:46:04] Analytics-Data-Quality, VisualEditor, WMDE-TechWish: Investigate missing dialog close events - https://phabricator.wikimedia.org/T272020 (ppelberg) @awight – we're glad you flagged this. Do you have a sense for when you're going to start an analysis that will depend on these events? //We're trying t...
[16:46:36] Analytics-Data-Quality, VisualEditor, WMDE-TechWish, Editing-team (Tracking): Investigate missing dialog close events - https://phabricator.wikimedia.org/T272020 (ppelberg)
[16:48:34] (PS1) Ottomata: Rematerialize searchsatisfaction/1.4.0 for enforced numeric bounds and for examples $id [schemas/event/secondary] - https://gerrit.wikimedia.org/r/666406 (https://phabricator.wikimedia.org/T272991)
[16:48:45] (PS2) Ottomata: Rematerialize searchsatisfaction/1.4.0 for enforced numeric bounds and for examples $id [schemas/event/secondary] - https://gerrit.wikimedia.org/r/666406 (https://phabricator.wikimedia.org/T272991)
[16:49:59] Analytics-Data-Quality, VisualEditor, WMDE-TechWish, Editing-team (Tracking): Investigate missing dialog close events - https://phabricator.wikimedia.org/T272020 (awight) >>! In T272020#6853288, @ppelberg wrote: > @awight – we're glad you flagged this. Do you have a sense for when you're going to...
[16:50:06] a-team desiree scheduled a meeting about the el migration at the same time as standup and staff, so if you are ok I will go to that. I think mforns may also be joining, not sure
[16:50:25] (CR) Ottomata: [C: +2] Rematerialize searchsatisfaction/1.4.0 for enforced numeric bounds and for examples $id [schemas/event/secondary] - https://gerrit.wikimedia.org/r/666406 (https://phabricator.wikimedia.org/T272991) (owner: Ottomata)
[16:50:30] ottomata: sounds good!
[16:51:27] (PS4) Ottomata: Add ReferencePreviewsPopups schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664798 (https://phabricator.wikimedia.org/T275009) (owner: Awight)
[16:52:00] (PS4) Ottomata: Add TwoColConflictConflict schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664802 (https://phabricator.wikimedia.org/T275013) (owner: Awight)
[16:53:00] awight: in twocolconflictexit
[16:53:05] you have examples start_time_ts_ms: 20210101003013000
[16:53:08] is this intended?
[16:53:21] that looks like a datetime string, not an integer ms
[16:53:33] that is not a unix epoch
[16:55:03] that number is greater than JS Number.MAX_SAFE_INTEGER
[16:55:21] uh oh
[16:56:36] I botched that one terribly, then. Will have to create a new field with the correct format.
[16:56:53] i think we can make it work, just have to set an explicit maximum greater than max safe integer
[16:57:01] it looks like it is currently working
[16:57:05] i see those values in hive
[16:57:14] for now i'll just add a larger maximum to the schema
[16:57:44] I would love to conform with the guidelines, but yeah the bad data is already out there, and doesn't affect our queries (we only use that column to join with other broken schemas :-)
[16:58:26] yes, ottomata and a-team, I will also miss standup because of migration meeting
[16:59:09] ^ just checked, the joined table is TwoColConflictConflict but that schema's *Time fields are just MW timestamps with 1s units.
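The problem ottomata spots with `start_time_ts_ms: 20210101003013000` can be checked in a few lines. This sketch assumes the value is a MediaWiki-style `YYYYMMDDHHMMSS` timestamp (UTC) with three millisecond digits appended, which is how it reads; that layout is an assumption, not something taken from the TwoColConflict code. It shows that the value is above JavaScript's `Number.MAX_SAFE_INTEGER` (2^53 - 1), that neighbouring integers at that magnitude collapse to the same double, and what the equivalent Unix epoch milliseconds would be.

```python
from datetime import datetime, timezone

# JavaScript's Number.MAX_SAFE_INTEGER: the largest n such that n and n + 1
# are both exactly representable as IEEE-754 doubles.
MAX_SAFE_INTEGER = 2**53 - 1  # 9007199254740991


def is_safe_integer(n: int) -> bool:
    """Python mirror of JS Number.isSafeInteger for ints."""
    return -MAX_SAFE_INTEGER <= n <= MAX_SAFE_INTEGER


def datetime_digits_to_epoch_ms(value: int) -> int:
    """Convert a YYYYMMDDHHMMSSmmm integer (assumed UTC) to Unix epoch ms."""
    digits = str(value)
    if len(digits) != 17:
        raise ValueError(f"expected 17 digits, got {len(digits)}: {digits}")
    dt = datetime.strptime(digits[:14], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000 + int(digits[14:])


bad = 20210101003013000
print(is_safe_integer(bad))              # False
print(float(bad) == float(bad + 1))      # True: two distinct ints, one double
print(datetime_digits_to_epoch_ms(bad))  # 1609461013000
```

The converted value (about 1.6e12) sits far below the safe bound, which is why epoch milliseconds are the format the guidelines expect.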
[17:00:36] * ottomata hm tests don't like it AssertionError [ERR_ASSERTION]: field start_time_ts_ms has a maximum value higher than enforcedNumericBounds minimum 9007199254740991
[17:00:51] well, it's a robustness test, which is skipped for legacy schemas
[17:00:58] ¯\_(ツ)_/¯
[17:01:43] (PS4) Ottomata: Add TwoColConflictExit schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664803 (https://phabricator.wikimedia.org/T275014) (owner: Awight)
[17:01:48] (CR) Ottomata: [C: +2] Add ReferencePreviewsPopups schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664798 (https://phabricator.wikimedia.org/T275009) (owner: Awight)
[17:03:25] ottomata: What's the migration path away from an illegal field like that? We can never remove it?
[17:05:47] Analytics, SRE, ops-eqiad: Degraded RAID on an-worker1097 - https://phabricator.wikimedia.org/T274819 (Cmjohnson) a:Cmjohnson
[17:06:05] Analytics, SRE, ops-eqiad: an-worker1112 reports I/O errors for a disk - https://phabricator.wikimedia.org/T274981 (Cmjohnson) a:Cmjohnson
[17:06:16] We can also not worry about it :-D
[17:16:27] (CR) Neil P. Quinn-WMF: [C: +1] "Dan, are you investigating, or should I go ahead and merge this?" (2 comments) [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/665406 (owner: Milimetric)
[17:47:08] !log change uid/gid for yarn/mapred/analytics/hdfs/druid on stat100x, an-presto100x
[17:47:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:47:31] * elukey bbiab
[17:50:17] Analytics-Radar, wmfdata-python, Epic, Product-Analytics (Kanban): Analysts cannot reliably use wmfdata to run SQL queries against Hive databases - https://phabricator.wikimedia.org/T245891 (nshahquinn-wmf)
[17:58:54] Analytics, Editing-team, Event-Platform: EditAttemptStep Event Platform Migration - https://phabricator.wikimedia.org/T267343 (ppelberg) >>! In T267343#6853148, @Ottomata wrote: > I'm delaying this migration until later in the list; we've recently lot an addition of slightly easier schemas to migrate...
[18:11:17] awight: yeah you can't really remove it
[18:11:57] i had some thoughts maybe recently of how field removal MAYYBE could safely be done, but it is tricky, because others might later be tempted to re-add the field with a different datatype, which will definitely break things
[18:13:43] (CR) Ottomata: [C: +2] Add TwoColConflictConflict schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664802 (https://phabricator.wikimedia.org/T275013) (owner: Awight)
[18:13:55] (CR) Ottomata: [C: +2] Add TwoColConflictExit schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/664803 (https://phabricator.wikimedia.org/T275014) (owner: Awight)
[18:17:02] awight: do any of these schemas need client_ip and/or geocoded_data?
[18:20:25] ottomata: Not required, ty
[18:20:36] gr8
[18:31:58] Analytics, SRE, ops-eqiad: an-worker1112 reports I/O errors for a disk - https://phabricator.wikimedia.org/T274981 (elukey) Ok so I am doing the following in a tmux session: `sudo dd if=/dev/random of=/dev/sdl bs=64k`
[18:57:19] Analytics, WMDE-Analytics-Engineering, User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 - https://phabricator.wikimedia.org/T274866 (GoranSMilovanovic) @elukey Do you mind if we wait for the next regular update of our Wikidata Analytics and see then? It takes place in less than a week.
[18:59:54] Analytics, WMDE-Analytics-Engineering, User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 - https://phabricator.wikimedia.org/T274866 (elukey) Yep no problem!
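The assertion ottomata hit at 17:00:36 comes from a robustness test that flags numeric fields whose `maximum` is above the enforced bound of 9007199254740991; per the chat, that test is skipped for legacy schemas, which is why the override can stand. Below is a much simplified, hypothetical version of such a check over a JSON-Schema-like dict. It is not the test code from schemas/event/secondary, and the example schema and its maximum are illustrative only.

```python
from typing import Any, Dict, Iterator, Tuple

MAX_SAFE_INTEGER = 2**53 - 1


def numeric_bound_violations(
    schema: Dict[str, Any], path: str = ""
) -> Iterator[Tuple[str, int]]:
    """Yield (field_path, maximum) for numeric fields whose maximum exceeds
    MAX_SAFE_INTEGER. Only walks nested "properties"; a simplification."""
    for name, field in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if field.get("type") in ("integer", "number"):
            maximum = field.get("maximum")
            if maximum is not None and maximum > MAX_SAFE_INTEGER:
                yield field_path, maximum
        # Recurse into nested object fields (no-op for scalar fields).
        yield from numeric_bound_violations(field, field_path)


# Illustrative schema: one overridden field, one field within the safe bound.
example = {
    "type": "object",
    "properties": {
        "start_time_ts_ms": {"type": "integer", "maximum": 99999999999999999},
        "page_namespace": {
            "type": "integer",
            "minimum": -MAX_SAFE_INTEGER,
            "maximum": MAX_SAFE_INTEGER,
        },
    },
}

print(list(numeric_bound_violations(example)))
# [('start_time_ts_ms', 99999999999999999)]
```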
[19:31:02] !log roll out new uid/gid for mapred/druid/analytics/yarn/hdfs for all buster nodes (no op for stretch)
[19:31:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:35:12] milimetric: sorry omw
[19:38:54] awight: ok should be migrated on testwiki!
[19:39:22] razzi/ottomata: just completed the uid/gid changes for all the buster nodes, no op on stretch.. tomorrow I'll do druid and then we should be good to reimage!
[19:39:28] (the worker nodes I mean)
[19:39:36] let me know later on if you see anything weird
[19:40:37] * elukey afk!
[19:40:44] elukey: nice!
[19:40:48] ok
[19:48:38] awight: you should be able to trigger the events for these on testwiki and have them produced via eventgate, etc.
[19:49:50] Analytics, Analytics-EventLogging, Community-Tech, Event-Platform, and 3 others: CodeMirrorUsage Event Platform Migration - https://phabricator.wikimedia.org/T275005 (Ottomata)
[19:49:57] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: ReferencePreviewsBaseline Event Platform Migration - https://phabricator.wikimedia.org/T275007 (Ottomata)
[19:50:02] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: ReferencePreviewsCite Event Platform Migration - https://phabricator.wikimedia.org/T275008 (Ottomata)
[19:50:08] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: ReferencePreviewsPopups Event Platform Migration - https://phabricator.wikimedia.org/T275009 (Ottomata)
[19:50:13] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: TemplateDataApi Event Platform Migration - https://phabricator.wikimedia.org/T275011 (Ottomata)
[19:50:18] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: TemplateDataEditor Event Platform Migration - https://phabricator.wikimedia.org/T275012 (Ottomata)
[19:50:24] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: TwoColConflictConflict Event Platform Migration - https://phabricator.wikimedia.org/T275013 (Ottomata)
[19:50:27] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: TwoColConflictExit Event Platform Migration - https://phabricator.wikimedia.org/T275014 (Ottomata)
[19:50:30] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: VisualEditorTemplateDialogUse Event Platform Migration - https://phabricator.wikimedia.org/T275015 (Ottomata)
[19:52:27] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[19:55:32] Analytics: Kubeflow on stat machines - https://phabricator.wikimedia.org/T275551 (fkaelin)
[19:55:46] hi razzi :), do you think https://gerrit.wikimedia.org/r/c/operations/puppet/+/665326 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/665328 can be merged?
[19:56:10] mforns: yeah, thanks for the reminder!
[19:57:08] :D thanks!
[19:58:05] Analytics, Analytics-Kanban, Anti-Harassment, Event-Platform, and 2 others: Migrate Anti-Harassment EventLogging schemas to Event Platform - https://phabricator.wikimedia.org/T268517 (Ottomata)
[19:59:07] razzi: FYI i just ran puppet on an-launcher1002 and i see those purge dropts being removed
[20:03:21] PROBLEM - Check the last execution of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:25:14] ottomata: Where do I monitor eventgate? The same kafkacat command as for legacy logging, and I just look at the metadata to see that it's using the new schema version?
[20:35:12] (nvm, I see instructions in the task)
[20:36:04] yup that would work
[20:36:08] or also awight
[20:36:15] https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To#In_production
[20:36:20] An internal only EventStreams instance exists in production
[20:36:24] so you can get a GUI too
[20:41:37] ottomata: I see my events are POSTing to intake-analytics.wmo, but no sign of them in Kafka nor validation errors in logstash. I'll try patience.
[20:41:54] (PS1) Ottomata: Fix examples for twocolconflictconflict [schemas/event/secondary] - https://gerrit.wikimedia.org/r/666436 (https://phabricator.wikimedia.org/T275013)
[20:42:18] awight: interesting which events are you testing?
[20:42:36] (CR) Awight: [C: +2] "Thanks!" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/666436 (https://phabricator.wikimedia.org/T275013) (owner: Ottomata)
[20:43:02] ottomata: Just CodeMirrorUsage, so far. I've sent directly from the console, and triggered through the app workflow.
[20:43:15] hm
[20:43:21] that is a JS one?
[20:43:34] can you paste me your example event?
[20:44:07] mw.eventLog.logEvent('CodeMirrorUsage', {editor: "wikitext-2017", enabled: true, toggled: false, session_token: "1234", user_id: 0, edit_start_ms: 1585854680000})
[20:46:21] RECOVERY - Check the last execution of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:48:56] awight: here's the validation error
[20:48:56] https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-2021.02.23?id=RkWl0HcBjr5R1RLCiZZf
[20:48:59] trying to find out why
[20:49:19] OHHH same issue as the other one
[20:49:37] I... misspelled the field, anyway
[20:50:07] oh yes
[20:50:09] hm
[20:50:47] yeah we need to update the maximum
[20:51:04] doing
[20:51:44] awight: are there others we missed
[20:51:50] btw, for non legacy schemas, this would be caught by CI
[20:51:53] ookay thanks, with the spelling corrected I see events in Kafka. It's one of those errors :-)
[20:53:17] (PS1) Ottomata: Override maximum for edit_start_ts_ms in codemirrorusage [schemas/event/secondary] - https://gerrit.wikimedia.org/r/666445 (https://phabricator.wikimedia.org/T275005)
[20:53:43] i think i'll have to restart eventgate to pick that up and clear schema caches. it will have cached the previous 1.0.0 schema with the smaller maximum
[20:54:30] Analytics, Analytics-EventLogging, Community-Tech, Event-Platform, and 3 others: CodeMirrorUsage Event Platform Migration - https://phabricator.wikimedia.org/T275005 (awight) Events successfully received from testwiki.
[20:55:44] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: TemplateDataApi Event Platform Migration - https://phabricator.wikimedia.org/T275011 (awight) Events successfully received from testwiki.
[20:56:13] Analytics, Event-Platform, WMDE-TechWish: ReferencePreviewsBaseline Event Platform Migration - https://phabricator.wikimedia.org/T275007 (awight) Events successfully received from testwiki.
[20:57:29] Analytics, Event-Platform, WMDE-TechWish: ReferencePreviewsCite Event Platform Migration - https://phabricator.wikimedia.org/T275008 (awight) Events successfully received from testwiki.
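awight's CodeMirrorUsage test event carried a misspelled field name (`edit_start_ms`, where ottomata's patch names `edit_start_ts_ms`). A cheap local guard before calling `mw.eventLog.logEvent` is to compare the event's keys against the schema's declared properties and suggest near matches. The sketch below uses Python's `difflib` for the suggestions. The property list is a hypothetical subset of CodeMirrorUsage inferred from the event pasted above, not the full schema.

```python
import difflib
from typing import Dict, Iterable, List


def unknown_fields(event: Dict[str, object], known: Iterable[str]) -> Dict[str, List[str]]:
    """Map each event key that is not a known schema property to its closest match."""
    known_list = list(known)
    return {
        key: difflib.get_close_matches(key, known_list, n=1, cutoff=0.6)
        for key in event
        if key not in known_list
    }


# Hypothetical subset of CodeMirrorUsage properties, inferred from the chat.
CODEMIRROR_FIELDS = ["editor", "enabled", "toggled", "session_token", "user_id", "edit_start_ts_ms"]

event = {
    "editor": "wikitext-2017",
    "enabled": True,
    "toggled": False,
    "session_token": "1234",
    "user_id": 0,
    "edit_start_ms": 1585854680000,
}
print(unknown_fields(event, CODEMIRROR_FIELDS))
# {'edit_start_ms': ['edit_start_ts_ms']}
```

Whether EventGate rejects unknown properties outright depends on the schema (for example its `additionalProperties` setting), which the chat does not say; this check only surfaces likely typos before an event is sent.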
[20:57:35] (PS2) Ottomata: Override maximum for edit_start_ts_ms in codemirrorusage and in twocolconflictexit [schemas/event/secondary] - https://gerrit.wikimedia.org/r/666445 (https://phabricator.wikimedia.org/T275005)
[20:58:38] (CR) Ottomata: [C: +2] Override maximum for edit_start_ts_ms in codemirrorusage and in twocolconflictexit [schemas/event/secondary] - https://gerrit.wikimedia.org/r/666445 (https://phabricator.wikimedia.org/T275005) (owner: Ottomata)
[20:59:57] Analytics, Event-Platform, WMDE-TechWish: ReferencePreviewsPopups Event Platform Migration - https://phabricator.wikimedia.org/T275009 (awight) Events successfully received from testwiki.
[21:06:45] PROBLEM - Hadoop NodeManager on an-worker1112 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[21:06:58] ottomata: I'm looking at the right logstash dashboard now. But TemplateDataEditor events don't seem to land yet. Here's the example event, mw.eventLog.logEvent('TemplateDataEditor', {action: 'dialog-open-edit', page_id: 123, page_namespace: 1, page_title: 'Template:Test_page', rev_id: 456, user_edit_count: 3, user_edit_count_bucket: '1-4 edits', user_id: 1000})
[21:10:19] ottomata: There's a typo in the config, I've commented on the patch.
[21:12:04] ahhh good catch thank you, am in meeting now, feel free to submit fix i can deploy later!
[21:14:37] Analytics-Data-Quality, VisualEditor, WMDE-TechWish, Editing-team (Tracking): Investigate missing dialog close events - https://phabricator.wikimedia.org/T272020 (matmarex) This seems to affect not just the transclusion dialog, but all of the dialogs. Query for
!log rebalance kafka partitions for webrequest_upload partition 2
[21:15:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:15:45] i beat you to a patch :)
[21:15:56] oof so you did!
[21:16:04] razzi I looked at the NodeManager alert, as indicated in https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#Yarn_Nodemanager_process, but could not find any OOMs
[21:16:18] not sure what to do there...
[21:16:52] it says ping an SRE :]
[21:17:04] :)
[21:17:26] mforns: https://phabricator.wikimedia.org/T274981
[21:18:05] aaaahhhh, sorry for the ping then razzi O.o
[21:18:07] hmm
[21:18:34] no worries mforns
[21:19:17] Analytics, Event-Platform, WMDE-TechWish: VisualEditorTemplateDialogUse Event Platform Migration - https://phabricator.wikimedia.org/T275015 (awight) Events successfully received from testwiki.
[21:20:01] RECOVERY - Hadoop NodeManager on an-worker1112 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[21:20:04] !log started nodemanager on an-worker1112
[21:20:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:20:13] Analytics-Data-Quality, VisualEditor, WMDE-TechWish, Editing-team (Tracking): Investigate missing dialog close events - https://phabricator.wikimedia.org/T272020 (matmarex) One possible explanation would be that 17% of users who open the transclusion dialog immediately recoil in horror and close...
[21:20:26] Analytics-Data-Quality, VisualEditor, WMDE-TechWish, Editing-team (Tracking): Investigate missing dialog close events - https://phabricator.wikimedia.org/T272020 (DLynch) Getting out without a `dialog-whatever` event requires that the dialog be closed in such a way that the `getTeardownProcess` m...
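ottomata's remark at 20:53:43, that eventgate had to be restarted to pick up the new maximum, follows from caching schemas by URI: if a versioned schema such as 1.0.0 is treated as immutable, the cached copy is never refetched, so an in-place edit to that version stays invisible until the cache is cleared. The class below is a hypothetical illustration of that behaviour in Python, not EventGate's implementation; the URI and maximum values are illustrative.

```python
from typing import Callable, Dict


class SchemaCache:
    """Memoize schemas by URI, on the assumption that a versioned URI never changes."""

    def __init__(self, loader: Callable[[str], dict]) -> None:
        self._loader = loader
        self._cache: Dict[str, dict] = {}

    def get(self, uri: str) -> dict:
        if uri not in self._cache:
            self._cache[uri] = self._loader(uri)  # fetched once, then reused
        return self._cache[uri]

    def clear(self) -> None:
        """The effect of a restart: forget every cached schema."""
        self._cache.clear()


# Stand-in for the schema repository; URI and maximums are illustrative.
repo = {"/analytics/legacy/codemirrorusage/1.0.0": {"maximum": 2**53 - 1}}
cache = SchemaCache(lambda uri: dict(repo[uri]))

uri = "/analytics/legacy/codemirrorusage/1.0.0"
print(cache.get(uri)["maximum"])  # 9007199254740991
repo[uri] = {"maximum": 2**60}    # schema edited in place, same version
print(cache.get(uri)["maximum"])  # 9007199254740991 (stale cached copy)
cache.clear()
print(cache.get(uri)["maximum"])  # 1152921504606846976
```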
[21:24:19] Analytics-Data-Quality, VisualEditor, WMDE-TechWish, Editing-team (Tracking): Investigate missing dialog close events - https://phabricator.wikimedia.org/T272020 (DLynch) > I think we should try to piece together complete sessions from these events, and see what events (if any) happen after the "...
[21:28:52] Also no traffic from this schema, mw.eventLog.logEvent('TwoColConflictExit', {action: 'save', start_time_ts_ms: 20210101003013000, page_namespace: 1, page_title: 'Test Page', base_rev_id: 1000, latest_rev_id: 2000, selections: "v1:c|o>y|c|o?y|c", session_token: '1234567890ABCDEF'})
[21:40:37] heh awight more typos maybe?
[21:40:38] :p
[21:40:48] i looked over that big config change like 3 times and didn't catch them
[21:44:02] awight: i get TwoColConflictExit events
[21:58:15] Analytics, Analytics-Kanban, Event-Platform: Rematerialize all event schemas with enforceNumericBounds - https://phabricator.wikimedia.org/T273069 (Ottomata)
[22:01:31] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: TemplateDataEditor Event Platform Migration - https://phabricator.wikimedia.org/T275012 (awight) Events successfully received from testwiki.
[22:01:43] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: TwoColConflictExit Event Platform Migration - https://phabricator.wikimedia.org/T275014 (awight) Events successfully received from testwiki.
[22:04:59] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: TemplateDataEditor Event Platform Migration - https://phabricator.wikimedia.org/T275012 (awight) Events successfully received from testwiki.
[22:12:07] Analytics, Event-Platform, WMDE-TechWish: TwoColConflictConflict Event Platform Migration - https://phabricator.wikimedia.org/T275013 (awight) Events successfully received from testwiki.
[22:12:40] ottomata: I've seen events land in all topics now, great work!
[22:16:02] ottomata: Is there a sort order in the event migration doc? Maybe I should group the schemas my team "owns"?
[22:38:21] Analytics-Radar, WMDE-Templates-FocusArea, MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), Patch-For-Review, and 2 others: Adjust edit count bucketing for CodeMirror - https://phabricator.wikimedia.org/T273471 (awight) All new segmentation is broken in various ways: * New preferences are not written t...
[22:43:10] Analytics: Add superset-next.wikimedia.org domain for superset staging - https://phabricator.wikimedia.org/T275575 (razzi)
[22:45:33] Analytics, Product-Data-Infrastructure, Wikimedia-Logstash, observability: Create a separate logstash ElasticSearch index for schemaed events - https://phabricator.wikimedia.org/T265938 (colewhite)
[22:46:40] Analytics, Product-Data-Infrastructure, Wikimedia-Logstash, observability: Create a separate logstash ElasticSearch index for schemaed events - https://phabricator.wikimedia.org/T265938 (colewhite) Open→Resolved Please feel free to reach out if more information is needed.
[23:12:08] Analytics-Data-Quality, VisualEditor, WMDE-TechWish, Editing-team (Tracking): Investigate missing dialog close events - https://phabricator.wikimedia.org/T272020 (matmarex) Almost all of the missing close events happen on mobile. `lang=sql select *, case when opened != 0 then 1.000*(opene...
[23:36:05] Analytics-Data-Quality, VisualEditor, WMDE-TechWish, Editing-team (Tracking): Investigate missing dialog close events - https://phabricator.wikimedia.org/T272020 (DLynch) Mobile is the most likely situation for us not getting that final `abort` if they abandon the tab -- if they leave the browser...