[07:55:33] good morning :) [07:55:45] o/ [07:55:56] elukey: as usual - A timer failed, I need your help :) [07:56:38] elukey: let me do it please, I'll ask questions and update docs :) [07:56:58] So elukey, I looked at logs, and kinda understand the error [07:57:25] elukey: Now I'd like to manually start a run (ensure it works), then reset alert [07:57:41] resetting alert, I have it: sudo systemctl reset-failed [07:58:06] About restarting, I think I have it as well: sudo systemctl start [07:58:24] Now my question is: If I do start, do I need to do reset-failed? [08:00:45] joal: all correct, if you start and the unit ends up in a correct return code then icinga will be pleased :) [08:00:55] it basically checks the last execution [08:01:00] if it is zero, all good [08:01:12] so a successful run makes the alert go away [08:01:30] reset-failed is more when we want to manually clear the alert even if the last execution was broken [08:02:25] ack elukey - So that means there are two ways of resetting a failed job - One is rerunning the job manually (assuming it doesn't fail), the second is using the reset-failed flag (probably useful if the job to be run needs to be run only at timers-time) [08:03:02] elukey: if that sentence --^ is correct, I'll add an ops-procedure in the https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers page :) [08:05:30] joal: yep! [08:06:25] Thanks again elukey - I think having written some docs will help me not ask again :) [08:07:02] I am happy that timers are less confusing now :) [08:07:28] elukey: less and less every time I need to fix one :) [08:08:42] (03CR) 10Joal: Add hdfs-rsync script based on Hdfs python lib (034 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/550536 (https://phabricator.wikimedia.org/T234229) (owner: 10Joal) [08:09:10] (03PS2) 10Joal: Add hdfs-rsync script based on Hdfs python lib [analytics/refinery] - 10https://gerrit.wikimedia.org/r/550536 (https://phabricator.wikimedia.org/T234229) [08:09:21] joal: it is also true that they are way less confusing than crons! [08:10:06] elukey: I wouldn't say less confusing - cron is a lot simpler, so easier to understand (but also providing a lot fewer features) [08:10:55] joal: I find looking at a crontab vs systemctl list-timers way different :D [08:11:09] of course yes :) [08:11:23] also checking return codes and errors [08:11:33] it is true that you need to be used to systemd units [08:33:53] joal: hi!
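A minimal sketch of the procedure discussed above, assuming a failed unit named hdfs-cleaner. The systemctl properties and subcommands are standard, but this is an illustration of the logic elukey describes, not the actual icinga check:

```python
import subprocess

def last_exec_status(unit):
    # What the alert effectively checks: the exit code of the last execution.
    out = subprocess.check_output(
        ['systemctl', 'show', '-p', 'ExecMainStatus', '--value', unit])
    return int(out.decode().strip())

def clear_alert(unit, rerun=True):
    if rerun:
        # Option 1: re-run the job; a successful (zero) exit clears the alert.
        subprocess.check_call(['sudo', 'systemctl', 'start', unit])
    else:
        # Option 2: clear the failed state without re-running, useful when
        # the job must only run at its timer-scheduled time.
        subprocess.check_call(['sudo', 'systemctl', 'reset-failed', unit])

if last_exec_status('hdfs-cleaner') != 0:
    clear_alert('hdfs-cleaner')
```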
as you probably have seen I started today's backfilling late too, but now I'm sure I've fixed the problem [08:34:15] basically, the cron doesn't know about the $OOZIE_URL, and that's why it was freaking out [08:35:54] fdans: sorry didn't follow before, you are right we have /etc/profile.d/oozie.sh, executed when you log in [08:36:32] elukey: it's all good, I specified the path explicitly, and now everything should be peachy [08:37:09] indeed fdans - on that regard, it's also true when we sudo - We need to explicitly add '--oozie $OOZIE_URL' to all our restarting commands [08:37:59] !log initiating backfilling of daily top mediarequests from the mediacounts database - May 2018 to May 2019 [08:38:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:38:22] *mediacounts dataset but whatever [08:47:50] !log starting hdfs-cleaner manually after a failure earlier this night [08:47:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:55:16] RECOVERY - Check the last execution of hdfs-cleaner on an-coord1001 is OK: OK: Status of the systemd unit hdfs-cleaner https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [08:55:41] okey :) [08:55:50] We need to make hdfs-cleaner more resilient :) [09:03:02] :( [09:22:48] elukey: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers#Failed_service_procedure_(ops_week) [09:23:54] elukey: correcting formatting (sorry) [09:24:33] better :) [09:26:17] nice! [09:26:50] joal: I just filed https://github.com/elukey/snakebite-py3/commit/242c19432d1613fe6d3c1a4bd427e996b49484fd [09:26:57] to internet archive as a pull request [09:27:07] seems to work in both non-encrypted and encrypted [09:27:47] \o/ - You rock elukey :) [09:28:17] joal: still need to add support for the datanode protocol, that is way harder :( [09:28:40] 10Analytics: Make hdfs-cleaner resilient to in-flight files deletion - https://phabricator.wikimedia.org/T238304 (10JAllemandou) [09:28:43] indeed elukey [09:28:56] also elukey, just created the above task --^ [09:29:35] ahh makes sense yes! [09:38:04] 10Analytics, 10Analytics-Kanban, 10Operations, 10SRE-Access-Requests: Add system user analytics-privatedata to the anaytics-privatedata-users group - https://phabricator.wikimedia.org/T238306 (10elukey) [10:45:59] joal: I found this in puppet [10:46:10] # Each of these jobs have a readme.html file rendered by dumps::web::html. [10:46:13] # We need to make sure the rsync --delete does not delete these files [10:46:16] # which are put in place on the local destination host by puppet. [10:46:19] Dumps::Web::Fetches::Job { [10:46:21] exclude => 'readme.html' [10:46:24] } [10:46:25] I am looking into the changes to use your rsync script on labstore [10:46:47] elukey: I think for the rsync we shouldn't use '--delete'; [10:47:24] but then we'd need to clean up manually on labstore nodes no? [10:47:30] The flag exists cause I didn't want to provide single-feature one way or another [10:47:48] are there too many files on labstore we'd need to clean? [10:48:10] elukey: replacing files should be done using same names, no? [10:48:49] So keeping existing files shouldn't matter - Am I missing or misunderstanding something? [10:48:52] joal: in most cases yes, but if we upload stuff by mistake etc.. then it may become a burden to remember that we have to clean up manually [10:49:29] hm [10:50:24] elukey: this means I should add a --exclude parameter to hdfs_rsync I guess?
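A sketch of how such an --exclude could gate the deletion pass, mirroring how puppet protects readme.html from rsync --delete. Function and parameter names here are hypothetical, not the eventual hdfs-rsync implementation:

```python
import fnmatch
import os

def delete_extraneous(dst_dir, src_names, excludes=('readme.html',)):
    """Remove destination files missing from the source, but never touch
    files matching an exclude pattern (e.g. puppet-managed readme.html)."""
    for name in os.listdir(dst_dir):
        if any(fnmatch.fnmatch(name, pattern) for pattern in excludes):
            continue  # excluded files survive the sync
        if name not in src_names:
            os.remove(os.path.join(dst_dir, name))
```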
[10:50:52] joal: if possible yes, even to the lib I guess [10:51:08] it needs to be in the lib [10:51:10] but we can think about doing it later [10:51:21] as second step, and just not delete for now [10:51:33] it is probably better/safer [10:52:38] what do you think? [10:53:04] sounds good - Let's create a task to add the feature to hdfs_rsync, and use a no-delete for now [10:55:56] ack [11:05:36] joal: also another thing (sorry then I'll stop) [11:05:37] bash -c '/usr/bin/rsync -rt ${delete_option}${exclude_option} --chmod=go-w ${source}/ ${destination}/ [11:05:41] this is the command [11:05:51] delete/exclude option are not really needed [11:06:03] but two things should be highlighted [11:06:17] 1) -t preserves modification times (not sure if needed, probably not) [11:06:35] 2) --chmod=go-w removes write perms for group and other [11:07:58] for 2) I think that we can simply execute a bash command after our rsync [11:10:33] agreed for 2 - as for 1, I don't have ideas :( [11:12:57] it may be useful to see when files were created, as opposed to when they are copied to dumps [11:13:07] but I believe that it shouldn't matter a lot [11:13:41] elukey: I agree it feels not so important, but I also agree it's good if we can replicate '-t' [11:43:18] elukey: do you want me to add an optional '-p' as in hdfs copy commands (preserve creation, modification and rights)? [11:43:49] elukey: downside is that it does more than just times - owner/group + perms are also copied, meaning we need to handle them manually [11:48:12] 10Analytics, 10Inuka-Team (Kanban): Add KaiOS to the list of OS query options for pageviews in Turnilo - https://phabricator.wikimedia.org/T231998 (10Jpita) @SBisson anything i can check here before moving it to done? [11:55:08] joal: nah what I would add is the chmod option, I checked the hdfs lib and it should be easy to add, ok if I file a code review? [11:55:21] so we'll not chmod all the files under a dir tree [11:55:58] elukey: From my understanding, we'd copy the file and then chmod, correct? [11:56:43] joal: yes exactly [11:57:08] if we copy and chmod is needed, then we execute [11:57:22] makes sense - It doubles the number of hdfs calls, but eh, I don't think we can do better [11:58:20] in our case it wouldn't no? [11:58:31] we copy from hdfs to fs and then chmod [11:58:39] TRUE! [11:58:46] the other sense, it would :) [11:59:07] elukey: Thanks a lot for adding - I can also do it if you prefer [11:59:40] joal: I didn't want to add more things to do on your plate, but if you are already on it and have time please do :) [12:00:12] elukey: how would you like to work? same syntax as rsync I guess? [12:01:50] actually elukey, I think we could go for -p and chmod params while we are at it - With chmod being applied on top of the copied perms from -p [12:02:23] 10Analytics, 10Analytics-Kanban, 10Inuka-Team: Update ua parser on analytics stack - https://phabricator.wikimedia.org/T237743 (10SBisson) @Neil_P._Quinn_WMF Can you verify that KaiOS is now being recognized? [12:08:32] elukey: syntax-wise, the only diff I can see from rsync-chmod and hdfs one is the use of F or D to apply to files or dirs only [12:08:43] elukey: Is that ok if we go without? [12:09:23] sure [12:09:46] what should -p be?
[12:10:03] like -t to preserve, ah ok [12:10:26] elukey: it preserves everything (times, perms, and also user+group) [12:11:20] joal: this is something that we don't need though, since the dumpsgen user will own those files [12:11:29] so if we pack all together it might be an issue [12:11:37] let's skip it for the moment [12:11:38] ok, not doing then :) [12:12:15] for chmod syntax, we can do rsync-like, but not super strict [12:12:21] it can be different [12:12:36] I am creating a new define in puppet for the cron that will use our rsync [12:12:40] so we can customize it [12:12:41] I think hdfs is a subset of rsync (without F and D) [12:13:05] ?? [12:13:52] rsync chmod allows to change based on files only (F) or dirs only (D) [12:14:04] hdfs doesn't do that by default [12:15:22] ah the hdfs command you meant, vs rsync [12:15:27] yessir [12:15:37] sorry for the shortcuts :S [12:15:38] okok sorry too many acronyms etc.. [12:15:40] :D [12:15:43] I think it is fine [12:15:47] for F and D [12:16:01] we need something simple to replace what we have on labstores [12:16:09] so if we have chmod, then it is a win [12:16:21] ack [12:16:42] it would be also fine to manually chmod etc.. via bash, but if the dir gets huge then it is a waste of resources [12:16:46] so people will complain etc.. [12:16:53] you know the drill :) [12:17:05] yes [12:19:31] elukey: for the simplicity of parsing and arg-checking, should we just accept octal perms in hdfs-rsync? [12:20:05] yes yes something to pass straight to chmod [12:20:10] nothing fancy [12:21:36] elukey: I'm inclined not to arg-check formats like: ([ugo][+-][rwx],)*[ugo][+-][rwx] [12:21:39] :) [12:22:23] nono please :D [12:22:38] Thanks for that ;) [12:25:05] elukey: something else - the chmod only applies to newly copied files, or should it also modify already existing files not having correct perms? [12:25:44] good question [12:25:59] I would say only on the new files as a starter [12:28:30] If the file already exists and has the same size, don't touch it - If the file doesn't exist or has a different size, copy (overwrite) and change perms [12:28:33] elukey: --^ [12:32:32] joal: yeah +1 [12:32:51] elukey: I'm actually checking something else that might be problematic [12:33:52] here we are [12:33:54] hm [12:35:03] if it becomes a big problem we can do it later on [12:35:10] I thought it was quick [12:35:36] actually elukey the problem comes from my rsync not being recursive inside folders [12:36:20] elukey: if we rsync files only, it works as expected, but if we rsync folders only and the content of folders changes, the change will not be replicated [12:37:35] joal: ah ok so in theory we should look into dirs to see if the file size changed? [12:38:05] yes elukey, we should apply rsync recursively inside folders [12:38:15] hm [12:45:56] elukey: batcave for a minute? [12:50:31] joal: need 5 min and then I'll join! [12:50:41] sure elukey :) [13:25:31] 10Analytics, 10Analytics-Kanban: Make hdfs-rsync process sub-folders recursively - https://phabricator.wikimedia.org/T238326 (10JAllemandou) [13:27:40] afk for a bit! [14:32:35] (03PS1) 10Ottomata: Rename notebook env to 'thin', add labstore hosts [analytics/refinery/scap] - 10https://gerrit.wikimedia.org/r/550841 (https://phabricator.wikimedia.org/T234229) [14:38:40] elukey: you are right, the | gzip is working fine; just seems very slow!
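Putting the rule agreed above into a sketch: skip when the size matches, otherwise copy, chmod with octal perms, and mimic rsync's -t by copying the modification time over. It assumes a client object from the python hdfs library (status() and download() are its WebHDFS calls); an illustration only, not the final hdfs-rsync code:

```python
import os

def sync_file(client, hdfs_path, local_path, perms=0o644):
    status = client.status(hdfs_path)  # WebHDFS FileStatus: length, modificationTime, ...
    if os.path.exists(local_path) and os.path.getsize(local_path) == status['length']:
        return  # same size: don't touch, existing perms stay as they are
    client.download(hdfs_path, local_path, overwrite=True)
    os.chmod(local_path, perms)  # the extra call, only paid for new copies
    mtime = status['modificationTime'] / 1000.0  # HDFS reports milliseconds
    os.utime(local_path, (mtime, mtime))  # poor man's rsync -t
```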
:) [14:38:49] but it is going and space usage is fine [14:40:23] ottomata: I am curious about where it buffers stuff [14:40:31] me too [14:40:41] i mean in memory probably, but i guess it does it in chunks [14:41:01] a while ago Manuel suggested to use mydumper, that should be quicker, just remembered sorry [14:42:20] https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&refresh=5m&var-server=db1107&var-datasource=eqiad%20prometheus%2Fops&var-cluster=mysql&from=now-24h&to=now [14:42:31] not memory, but I can see a steady increase in file usage [14:42:47] it must save it as a tmp file on disk then, as stdout comes out [14:42:57] aye, it must just have some buffer in mem, fill up, compress, and then move on [14:43:33] on disk i guess it could, but that graph also is probably showing the output gzip file taking up space too [14:43:39] 10Analytics, 10Analytics-Kanban, 10Inuka-Team: Update ua parser on analytics stack - https://phabricator.wikimedia.org/T237743 (10jlinehan) >>! In T237743#5661488, @Ottomata wrote: > Oh hey, as far as I can tell, this is already done! @JAllemandou updated uap-java with the 0.6.9 version of uap-core on Sept... [14:43:41] yeah makes sense [14:43:43] it's up to 650G now [14:43:46] so gzip is smart [14:47:17] compressing by chunk :) [14:48:16] The whole point of gzip is to make big files smaller - If you need to put all the "big" in memory, that's not much of a help I guess :) [14:49:38] I am super ignorant about compression! [14:52:06] elukey: shall I use a single principal in oozie for both hcat and hive2 creds, or do you prefer 2 params (the value is the same, that's why I ask) [14:54:20] I'm thinking to use 2, in case they sometimes differ - but your opinion is very welcome :) [14:55:02] yes it is always the hive2 principal that needs to be stated [14:55:11] so we can use one [14:55:27] elukey: except for spark actions, we need the metastore principal [14:55:46] spark interacts directly with the metastore, not with hive2 [14:56:51] joal: yep yep but they use the same principal [14:56:53] so we are ok [14:57:13] hm - Shall we call it hive_principal then? [14:57:18] elukey: --^ [14:59:32] We'll see that later - dropping for kids, back at standup [14:59:47] joal: the principal is hive/an-coord1001.eqiad.wmnet@WIKIMEDIA, stored in a keytab that both hive2-server and metastore read.. but I am not sure if we are talking about the same name [15:00:20] yessir we are [15:00:49] 10Analytics, 10Analytics-Kanban, 10Inuka-Team: Update ua parser on analytics stack - https://phabricator.wikimedia.org/T237743 (10Ottomata) Hm, the versions Joal added were 0.6.9. Let's ask him! [15:00:52] ahahaah okok [15:00:53] heya joal ^ [15:01:23] I was wondering if the principal was the same for both metastore and server2 - If they are, I suggest using hive_principal as the oozie parameter name, and using it for both [15:01:27] elukey: --^ [15:01:52] ah yes ok +1 :) [15:01:55] Hi ottomata - I'm leaving for kids in like 1min - I shorter than short, maybe? [15:02:16] ack elukey! Thanks :) [15:03:12] joal: about uap-core [15:03:32] why is version 0.6.9 in uap-java, but 0.6.9 was only released a week ago? [15:03:58] i guess we use the version in master even if it hasn't actually been released?
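Back on the gzip question above: gzip compresses a bounded buffer at a time and never holds the whole input in memory, which is why memory stays flat while the output file grows. A toy Python equivalent of `mysqldump | gzip > dump.sql.gz`:

```python
import gzip
import shutil
import sys

# Stream stdin into a compressed file in fixed-size chunks: at no point is
# more than one 64 KiB buffer held in memory, regardless of the input size.
with gzip.open('dump.sql.gz', 'wb') as out:
    shutil.copyfileobj(sys.stdin.buffer, out, length=64 * 1024)
```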
[15:05:08] correct ottomata - we used the not yet released version because the previously released one had a bug making it incompatible with uap-java :( [15:05:36] ahhhhh k, so we need to release but with 0.6.0~2 [15:05:39] 0.6.9 [15:05:44] pfff :( [15:05:50] sorry for that ottomata :( [15:05:56] need to go NOW :) [15:06:08] k bye ty! [15:06:13] 10Analytics, 10Analytics-Kanban, 10Inuka-Team: Update ua parser on analytics stack - https://phabricator.wikimedia.org/T237743 (10Ottomata) Ok, from Joal: > we used the not yet released version because the previously released one had a bug making it incompatible with uap-java :( Ok, so I will need to build... [15:07:44] ottomata: I added https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos/UserGuide#Run_a_recurrent_job_via_Cron_or_similar_without_kinit_every_day [15:07:56] I hope it makes sense [15:08:15] it is very verbose and should explain to people the purpose of the system user [15:09:03] great stuff elukey [15:09:17] joal: i'm going to release based on uap-core sha in master instead of version, will be clearer [15:09:19] super thanks :) [15:11:48] hmm no version is ok, i'll just note in commit which sha the submodule is at [15:13:23] (03PS1) 10Ottomata: Update pom.xml to release v1.4.4-core0.6.10~1-wmf [analytics/ua-parser/uap-java] (wmf) - 10https://gerrit.wikimedia.org/r/550853 (https://phabricator.wikimedia.org/T237743) [15:13:53] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Update pom.xml to release v1.4.4-core0.6.10~1-wmf [analytics/ua-parser/uap-java] (wmf) - 10https://gerrit.wikimedia.org/r/550853 (https://phabricator.wikimedia.org/T237743) (owner: 10Ottomata) [15:20:11] (03CR) 10Elukey: [V: 03+2 C: 03+2] Rename notebook env to 'thin', add labstore hosts [analytics/refinery/scap] - 10https://gerrit.wikimedia.org/r/550841 (https://phabricator.wikimedia.org/T234229) (owner: 10Ottomata) [15:21:17] (03CR) 10Elukey: [V: 03+2 C: 03+2] "Cannot merge it, maybe there is a conflict with the patch series?" [analytics/refinery/scap] - 10https://gerrit.wikimedia.org/r/550841 (https://phabricator.wikimedia.org/T234229) (owner: 10Ottomata) [15:22:11] (03PS2) 10Ottomata: Rename notebook env to 'thin', add labstore hosts [analytics/refinery/scap] - 10https://gerrit.wikimedia.org/r/550841 (https://phabricator.wikimedia.org/T234229) [15:22:20] (03CR) 10Ottomata: [V: 03+2] Rename notebook env to 'thin', add labstore hosts [analytics/refinery/scap] - 10https://gerrit.wikimedia.org/r/550841 (https://phabricator.wikimedia.org/T234229) (owner: 10Ottomata) [15:22:43] ah probably a simple rebase, stupid me [15:22:46] :) [15:25:03] (03PS1) 10Ottomata: Add maven-release-plugin to allow for mvn deploy to archiva [analytics/ua-parser/uap-java] (wmf) - 10https://gerrit.wikimedia.org/r/550859 [15:25:55] (03CR) 10Ottomata: "Joal, this allows one to do" [analytics/ua-parser/uap-java] (wmf) - 10https://gerrit.wikimedia.org/r/550859 (owner: 10Ottomata) [15:25:59] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Add maven-release-plugin to allow for mvn deploy to archiva [analytics/ua-parser/uap-java] (wmf) - 10https://gerrit.wikimedia.org/r/550859 (owner: 10Ottomata) [15:26:15] has anyone reported any kind of hiccup in the eventlogging pipeline? I'm missing a lot of data I expect to be in event.mobilewikiappsuggestededits.
I've reached out to android engineers to see if it's client related but wanted to check with you folks too [15:26:20] https://grafana.wikimedia.org/d/000000018/eventlogging-schema?orgId=1&var-schema=MobileWikiAppSuggestedEdits&from=1571152656401&to=1573744656401 [15:27:28] (03PS1) 10Ottomata: Update uap-java to 1.4.4-core0.6.10~1-wmf [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/550860 (https://phabricator.wikimedia.org/T237743) [15:27:32] 10Analytics, 10Analytics-Kanban, 10Research, 10Patch-For-Review: Add data quality metric: traffic variations per country - https://phabricator.wikimedia.org/T234484 (10mforns) @ssingh I'm trying to match a first draft of the traffic_per_country metric with the outage data that you put together. Also, it wo... [15:27:32] a lot of missing events between oct 31st and, like, nov 11-12th [15:29:25] * elukey errand for a bit [15:29:32] hm, bearloga not that i know of, but certainly possible, is it just that schema? [15:30:16] ottomata: as far as I know yes but I'll check a couple others [15:31:47] ottomata: yeah, seems to be isolated to that schema [15:34:21] 10Analytics, 10Analytics-Kanban, 10Research, 10Patch-For-Review: Add data quality metric: traffic variations per country - https://phabricator.wikimedia.org/T234484 (10ssingh) >>! In T234484#5663940, @mforns wrote: > @ssingh > I'm trying to match a first draft of the traffic_per_country metric with the out... [15:35:59] (03CR) 10Mforns: Add hdfs-rsync script based on Hdfs python lib (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/550536 (https://phabricator.wikimedia.org/T234229) (owner: 10Joal) [15:36:21] (03CR) 10Mforns: [C: 03+1] "LGTM!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/550536 (https://phabricator.wikimedia.org/T234229) (owner: 10Joal) [15:43:36] (03CR) 10Ottomata: [C: 03+2] "user_agent:" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/550860 (https://phabricator.wikimedia.org/T237743) (owner: 10Ottomata) [15:48:41] (03Merged) 10jenkins-bot: Update uap-java to 1.4.4-core0.6.10~1-wmf [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/550860 (https://phabricator.wikimedia.org/T237743) (owner: 10Ottomata) [15:53:40] 10Analytics, 10Analytics-Kanban, 10Inuka-Team, 10Patch-For-Review: Update ua parser on analytics stack - https://phabricator.wikimedia.org/T237743 (10Ottomata) Ok merged. This will go out with the next (analytics) deployment, and then we need to update pageview jobs to use new refinery-hive version. @JAl... [15:57:42] bearloga: i don't see any processing or validation errors for that schema [15:58:02] ottomata: thank you so much for checking! [16:01:15] bearloga: there are pretty regular events, just not a lot of them [16:01:41] i'm pretty sure the data that is being sent is rolling in, so probably a client-side problem [16:02:14] hm ya i do see that there are some days with no events [16:02:17] e.g. 11-06 [16:09:53] !log roll restart presto-server on an-presto* to pick up new openjdk [16:09:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:14:34] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10Event-Platform, and 6 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (10Ottomata) Heya @phuedx, I'm looking for review help on https://gerrit.wikimedia.org/r/c/mediawiki/extension...
[16:15:40] 10Analytics, 10Analytics-Kanban, 10Inuka-Team, 10Patch-For-Review: Update ua parser on analytics stack - https://phabricator.wikimedia.org/T237743 (10Nuria) Since there was only one change to deploy we postponed this to next week. ping to @mforns as this might change our UA stats and thus is a good case fo... [16:16:05] 10Analytics, 10Tool-Pageviews: Topviews Analysis of the Hungarian Wikipedia is flooded with spam - https://phabricator.wikimedia.org/T237282 (10Nuria) [16:16:32] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10Event-Platform, and 6 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (10Ottomata) [16:17:51] hip: o/ [16:17:59] i'd love to work on getting all this stuff up in mw vagrant soon! [16:18:06] patches on my end are wip but work [16:18:13] oh great [16:18:21] if we can get your EL stuff up in a patch, then we can pull them all down and set them up together [16:18:23] and iterate from there [16:18:36] that sounds good, I'll work on that today after this meeting I'm in [16:18:48] :) yeehaw [16:24:50] 10Analytics, 10Analytics-Kanban: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (10Nuria) [17:01:44] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10Event-Platform, and 6 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (10dr0ptp4kt) Hi @Ottomata, @phuedx is unavailable at the moment. What were the parts of the patchset where yo... [17:07:10] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10Event-Platform, and 6 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (10Ottomata) It is a new extension and MW API endpoint. I've never created a new extension or made an API end... [17:35:53] 10Analytics, 10Analytics-Kanban: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (10Nuria) ping @elukey on task as we would need the couple possible options we have about this [17:47:04] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Prepare the Hadoop Analytics cluster for Kerberos - https://phabricator.wikimedia.org/T237269 (10elukey) [18:04:01] 10Analytics, 10Analytics-EventLogging, 10QuickSurveys, 10MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), and 2 others: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (10ovasileva) a:05phuedx→03ovasileva [18:24:47] joal: snakebite pr merged \o/ [18:25:07] this is GREAT elukey :) [18:26:07] 10Analytics: Archive /home/ezachte data on stat1007 - https://phabricator.wikimedia.org/T238243 (10Ottomata) [18:26:10] 10Analytics, 10Analytics-Kanban: Make hdfs-rsync process sub-folders recursively - https://phabricator.wikimedia.org/T238326 (10fdans) p:05Triage→03High [18:26:12] 10Analytics, 10Analytics-Kanban: Make hdfs-rsync process sub-folders recursively - https://phabricator.wikimedia.org/T238326 (10fdans) a:03JAllemandou [18:26:18] 10Analytics, 10Analytics-EventLogging, 10Event-Platform: Decommission EventLogging backend components by migrating to MEP - https://phabricator.wikimedia.org/T238230 (10Ottomata) p:05Triage→03High [18:26:20] elukey: does that mean we can use snakebite-py3 for refinery?
[18:26:29] 10Analytics, 10Analytics-EventLogging, 10Event-Platform: Decommission EventLogging backend components by migrating to MEP - https://phabricator.wikimedia.org/T238230 (10Ottomata) p:05High→03Normal [18:26:39] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Team-Backlog, 10Epic: Vertical: Virtualpageview datastream on MEP - https://phabricator.wikimedia.org/T238138 (10Ottomata) p:05Triage→03Normal [18:27:57] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10Event-Platform, and 6 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (10dr0ptp4kt) Thanks @Ottomata. I see the note from @Krinkle on the patch about the other gating, where you're... [18:29:28] 10Analytics: Make hdfs-cleaner resilient to in-flight files deletion - https://phabricator.wikimedia.org/T238304 (10fdans) p:05Triage→03High [18:31:13] joal: not yet, I need to add the support for the datanode protocol, next on my list.. if we need to create files etc.. it will not work :( [18:31:28] Ah makes sense [18:31:36] of course, I had forgotten that [18:31:48] But for file existence check, it does the trick :) [18:32:34] yes it should! [18:33:10] 10Analytics: Archive /home/ezachte data on stat1007 - https://phabricator.wikimedia.org/T238243 (10fdans) p:05Triage→03Normal Let's archive this in HDFS [18:34:35] * elukey off! [18:34:41] mforns: I have a question for you if you have a minute [18:37:22] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10Event-Platform, and 6 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (10Ottomata) Thanks @dr0ptp4kt! [18:41:09] 10Analytics, 10Inuka-Team, 10Language-strategy, 10Tool-Pageviews: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (10SBisson) This data available in an API would be very useful to the #inuka-team for the #kaios-wikipedia-app to show locally-relevant trendin... [18:44:46] just checked when we started working on kerberos.. june 2018 /o\ [18:59:10] 10Analytics-Kanban, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Team-Backlog, and 3 others: Create new eventgate-logging deployment in k8s with helmfile - https://phabricator.wikimedia.org/T236386 (10akosiaris) Namespaces and tokens have been created and populated. @ottomata, you are c... [19:11:44] 10Analytics-Kanban, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Team-Backlog, and 3 others: Create new eventgate-logging deployment in k8s with helmfile - https://phabricator.wikimedia.org/T236386 (10akosiaris) p:05Triage→03Normal [19:12:34] 10Analytics, 10Inuka-Team, 10Language-strategy, 10Tool-Pageviews: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (10Nuria) >I understand the concerns around data quality but for our specific use case, there's no way the page views per country per language... [19:13:59] joal, hey, yes?
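On the snakebite-py3 point above: an existence check is a namenode-only RPC, so it already works without the datanode protocol. A sketch assuming snakebite's classic Client API, with made-up host and path:

```python
from snakebite.client import Client

# Only talks to the namenode, so no datanode-protocol support is needed;
# host, port and path below are illustrative values.
client = Client('namenode.example.org', 8020)
if client.test('/wmf/data/archive/example_dump', exists=True):
    print('already on HDFS, skipping copy')
```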
[19:20:06] 10Analytics, 10Research-Backlog: Label high volume bot spikes in pageview data as automated traffic - https://phabricator.wikimedia.org/T238357 (10Nuria) [19:23:51] 10Analytics: Deploy high volume bot spike detector to hungarian wikipedia - https://phabricator.wikimedia.org/T238358 (10Nuria) [19:29:10] 10Analytics: Hourly Feature extraction for bot detection from webrequest - https://phabricator.wikimedia.org/T238360 (10Nuria) [19:32:50] 10Analytics, 10Inuka-Team, 10Language-strategy, 10Tool-Pageviews: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (10SBisson) >>! In T207171#5664793, @Nuria wrote: >>I understand the concerns around data quality but for our specific use case, there's no way... [19:35:03] 10Analytics: Hourly labeling of "automated" traffic before loading of pageviews into pageview_hourly - https://phabricator.wikimedia.org/T238361 (10Nuria) [19:35:25] joal: let me know what you think of these as tasks: https://phabricator.wikimedia.org/T238361 [19:39:49] cc mforns , let me know what you think of tasks here: https://phabricator.wikimedia.org/T238361 [19:40:16] 10Analytics-Kanban, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Team-Backlog, and 3 others: Create new eventgate-logging deployment in k8s with helmfile - https://phabricator.wikimedia.org/T236386 (10Ottomata) Yeah. Public endpoint! Which begs the question @fgiunchedi...what should t... [19:40:54] 10Analytics: Vet high volume bot spike detection in hungarian wikipedia - https://phabricator.wikimedia.org/T238363 (10Nuria) [19:41:44] nuria, you mean related tasks? [19:42:15] mforns: ya, do they make sense? [19:42:42] nuria, I can only see parent tasks to that one you pasted [19:43:19] https://usercontent.irccloud-cdn.com/file/e1yrtwkg/Screen%20Shot%202019-11-14%20at%2011.43.02%20AM.png [19:43:29] ^ mforns can you not see these in https://phabricator.wikimedia.org/T238361? [19:43:52] oh wow, I still don't see them in my browser O.o! [19:43:57] no! [19:44:16] oh wait [19:44:19] 10Analytics-Kanban, 10Product-Analytics: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (10Nuria) 05Open→03Resolved [19:44:38] I think I got the wrong link [19:45:11] nuria, the one you took the screenshot of is https://phabricator.wikimedia.org/T238357 right?
not https://phabricator.wikimedia.org/T238361 [19:46:22] 10Analytics: Add editors per country data to AQS API - https://phabricator.wikimedia.org/T238365 (10Nuria) [19:46:45] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Rerun sanitization before archiving eventlogging mysql data - https://phabricator.wikimedia.org/T236818 (10Nuria) 05Open→03Resolved [19:46:47] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (10Nuria) [19:49:50] 10Analytics-Kanban, 10Analytics-Wikistats: Strip www from project in URL if it's included - https://phabricator.wikimedia.org/T237520 (10Nuria) 05Open→03Resolved [19:50:04] 10Analytics, 10Analytics-Kanban: Add notice to Wikistats 1 about the move to Wikistats 2 - https://phabricator.wikimedia.org/T237999 (10Nuria) [19:50:31] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10Patch-For-Review: Correct namespace zero editor counts on geoeditors_monthly table on hive and druid - https://phabricator.wikimedia.org/T237072 (10Nuria) [19:50:33] nuria, is https://phabricator.wikimedia.org/T238361 about labeling the automated traffic in pageview_hourly already or in a separate table still? [19:50:59] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Version analytics meta mysql database backup - https://phabricator.wikimedia.org/T231208 (10Nuria) 05Open→03Resolved [19:51:23] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10Patch-For-Review: Correct namespace zero editor counts on geoeditors_monthly table on hive and druid - https://phabricator.wikimedia.org/T237072 (10Nuria) 05Open→03Resolved [19:51:27] 10Analytics-Kanban, 10Product-Analytics: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (10Nuria) [19:51:31] if the latter, we might want to have another task for joining the labeled data when generating (refining) pageview_hourly [19:51:39] no?
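If the label lives in a separate table, that refine-time join could look roughly like this PySpark sketch; the labels table, its is_automated column, and the join keys are all hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

pageviews = spark.table('wmf.pageview_hourly')
labels = spark.table('wmf.automated_traffic_labels')  # hypothetical table

# Re-tag pageviews whose (project, hour) was flagged as a bot spike;
# unmatched rows keep their original agent_type.
labeled = (pageviews
    .join(labels, ['project', 'year', 'month', 'day', 'hour'], 'left')
    .withColumn('agent_type',
                F.when(F.col('is_automated'), F.lit('automated'))
                 .otherwise(F.col('agent_type'))))
```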
[19:52:40] 10Analytics, 10Analytics-Kanban, 10Research-Backlog: Create labeled dataset for bot identification - https://phabricator.wikimedia.org/T206267 (10Nuria) [19:53:00] 10Analytics, 10Analytics-Kanban, 10Research-Backlog: Create labeled dataset for bot identification - https://phabricator.wikimedia.org/T206267 (10Nuria) 05Open→03Resolved [19:53:04] 10Analytics, 10Analytics-Kanban: POC More efficient Bot filtering on pageview data - https://phabricator.wikimedia.org/T211359 (10Nuria) [19:53:46] 10Analytics, 10Analytics-Kanban: POC More efficient Bot filtering on pageview data - https://phabricator.wikimedia.org/T211359 (10Nuria) 05Open→03Resolved [19:53:48] 10Analytics, 10Research-Backlog: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207 (10Nuria) [19:54:14] 10Analytics, 10Analytics-Kanban, 10Tool-Pageviews: Load media requests data into cassandra - https://phabricator.wikimedia.org/T228149 (10Nuria) [19:54:28] 10Analytics, 10Analytics-Kanban, 10Tool-Pageviews: Load media requests data into cassandra - https://phabricator.wikimedia.org/T228149 (10Nuria) [19:54:56] 10Analytics, 10Analytics-Kanban: Import siteinfo dumps onto HDFS - https://phabricator.wikimedia.org/T234333 (10Nuria) ping @JAllemandou to see if any docs need to be corrected/added [19:55:17] 10Analytics, 10Analytics-Kanban, 10Tool-Pageviews: Load media requests data into cassandra - https://phabricator.wikimedia.org/T228149 (10Nuria) 05Open→03Resolved [19:55:22] 10Analytics, 10Patch-For-Review, 10Services (watching): Add mediacounts data to AQS and, from there, Restbase - https://phabricator.wikimedia.org/T207208 (10Nuria) [19:55:59] 10Analytics, 10Analytics-Kanban, 10Multimedia, 10Tool-Pageviews: Make job to backfill data from mediacounts into mediarequests tables in cassandra so as to have historical mediarequest data - https://phabricator.wikimedia.org/T234591 (10Nuria) [19:56:03] 10Analytics, 10Analytics-Kanban, 10Multimedia, 10Tool-Pageviews: Create script that returns oozie time intervals every time a coordinator is started from a cron job - https://phabricator.wikimedia.org/T237119 (10Nuria) 05Open→03Resolved [19:56:09] 10Analytics, 10Analytics-Kanban: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (10Nuria) [19:56:11] 10Analytics, 10Analytics-Kanban: Add notice to Wikistats 1 about the move to Wikistats 2 - https://phabricator.wikimedia.org/T237999 (10Nuria) 05Open→03Resolved [19:56:25] Hi mforns - Was gone for diner sorry [19:56:31] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban: 500k files in hdfs /tmp - https://phabricator.wikimedia.org/T234954 (10Nuria) [19:56:39] no problemo, I was as well when you asked [19:56:54] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 2 others: Figure out how to $ref common schema across schema repositories - https://phabricator.wikimedia.org/T233432 (10Nuria) 05Open→03Resolved [19:56:57] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 2 others: Modern Event Platform: Schema Registry: Implementation - https://phabricator.wikimedia.org/T206789 (10Nuria) [19:57:25] mforns: k [19:57:32] nuria: looking at the tasks now [19:58:13] nuria: If the 'smallest' one is the labelling of pageview, work is done, no? [19:58:59] nuria: Also, 'bot-spike detector' vs 'automated traffic tag'? [19:59:58] joal: we will be labelling sessions rather than pageviews right? 
at the time of computing pageview_hourly we will need to check for every pageview the label given to the entity [20:00:04] joal: makes sense? [20:01:10] nuria: I think I understand your point - but the task says: Hourly labeling of "automated" traffic before loading of pageviews into pageview_hourly - This means pageview-hourly has the tag (whether as a separate one, or a replacement for user) [20:01:47] joal: ah i see, let me rephrase [20:02:14] nuria: however I see no point deploying on specific wikis first [20:02:26] joal: i do, to vet data [20:02:51] 10Analytics-Kanban, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Team-Backlog, and 4 others: Create new eventgate-logging deployment in k8s with helmfile - https://phabricator.wikimedia.org/T236386 (10Ottomata) Ok, patches for LVS and discovery ready for review: LVS: - DNS: https://ger... [20:03:00] nuria: I understand - We can add a temporary field? or even duplicate a few hours of pageview hourly? [20:03:01] joal: it is easier to see bugs in code that way [20:03:44] joal: or just enable it (via where project="blah") de facto for 1 wiki [20:03:55] joal: without duplicating pageview_hourly [20:04:23] joal: given the state of huwiki now it is hard to see how it wouldn't be an improvement [20:04:38] nuria: hm - In any case if data is polluted (buggy), we'll probably want to correct it [20:05:24] nuria: For data vetting, I'd go for a few hours (even days) of duplication (not that big), and then switch on all wikis [20:05:45] joal: it will only impact the label, not the data itself right? [20:05:52] ? [20:06:00] nuria: let's batcave if you have a minute :) [20:06:23] it would reduce the pageview count for queries that use agent_type="user" [20:06:34] yes [20:06:54] joal, mforns on bc [20:50:59] 10Analytics, 10Wikimedia-Stream: EventStreams butcher up some Unicode characters - https://phabricator.wikimedia.org/T198994 (10Ottomata) Hey sorry, just saw this! Hm, If I understand correctly, the diff you linked to has the same characters incorrect? '�ир!'. Is that right? If so, this is a problem with... [21:27:18] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (10Ottomata) db1107: ` real 1258m49.602s user 1875m25.784s sys 54m34.796s 696G Nov 14 15:50 mysqldump-db1107-2019-11-13... [21:39:27] mforns: Any progress on getting the refresh of the recent pingback statistics scheduled? I would really like to see the "other" lines fixed [21:39:28] e.g. https://pingback.wmflabs.org/#media-wiki-version and https://pingback.wmflabs.org/#php-version/php-version-media-wiki-1-34-timeseries. [21:43:51] (03PS1) 10Joal: Update hive and spark oozie jobs for kerberos [analytics/refinery] - 10https://gerrit.wikimedia.org/r/550945 (https://phabricator.wikimedia.org/T237269) [21:45:20] Gone for tonight team - see you tomorrow [21:52:41] 10Quarry: quarry-web-01 leaks files in /tmp - https://phabricator.wikimedia.org/T238375 (10zhuyifei1999) [22:01:34] 10Quarry: quarry-web-01 leaks files in /tmp - https://phabricator.wikimedia.org/T238375 (10zhuyifei1999) They seem to be uncompressed xlsx worksheets. [22:11:11] 10Quarry: quarry-web-01 leaks files in /tmp - https://phabricator.wikimedia.org/T238375 (10zhuyifei1999) xlsxwriter creates temp files at two places, * https://github.com/jmcnamara/XlsxWriter/blob/38f1fcb4567ec779502dd4c9ae2d06ad360e028f/xlsxwriter/packager.py#L163 * https://github.com/jmcnamara/XlsxWriter/blob/...
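On the xlsxwriter side, the library's documented in_memory option avoids temp files entirely (and its tmpdir option can redirect them to a directory that gets cleaned up), which is one possible direction for the Quarry leak; a sketch with illustrative paths, not a committed fix:

```python
import xlsxwriter

# 'in_memory': True keeps worksheet data in memory instead of /tmp files;
# alternatively {'tmpdir': '/srv/quarry/tmp'} would just relocate them.
workbook = xlsxwriter.Workbook('results.xlsx', {'in_memory': True})
worksheet = workbook.add_worksheet()
worksheet.write(0, 0, 'hello')
workbook.close()
```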
[22:49:22] CindyCicaleseWMF: is there a ticket for that work? [22:51:27] 10Analytics: Vet high volume bot spike detection code - https://phabricator.wikimedia.org/T238363 (10Nuria) [23:11:07] nuria: It was discussed on https://gerrit.wikimedia.org/r/c/analytics/reportupdater-queries/+/545917 and https://gerrit.wikimedia.org/r/c/analytics/reportupdater-queries/+/548306. The work was triggered by https://phabricator.wikimedia.org/T223414 but should really be its own task since it was in solving that task that I noticed the missing data. [23:11:29] PROBLEM - Check the last execution of hdfs-cleaner on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit hdfs-cleaner https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [23:12:16] CindyCicaleseWMF: I see, I think if you file a task we can better see what needs doing there, off the top of my head i do not remember [23:12:48] Will do. [23:23:08] 10Analytics: Rerun pingback reports to categorize software versions correctly. - https://phabricator.wikimedia.org/T238389 (10CCicalese_WMF) [23:23:29] nuria: done! [23:23:35] 10Analytics, 10Analytics-Kanban: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (10Danielsberger) Thank you, @Nuria . Having the upload data set without a save flag makes perfect sense and is great! I did not know that there are only 200 edits per... [23:28:01] 10Analytics, 10Analytics-Kanban: logging level of cassandra should be warning or error but not debug - https://phabricator.wikimedia.org/T236698 (10Nuria) To be clear, it is not that mapreduce.reduce.log.level: ERROR is not taking effect, it is because I see it listed on task logs: ./attempt_1571142484661_60235_m_...