[00:50:17] PROBLEM - Check the last execution of monitor_refine_eventlogging_legacy_failure_flags on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_legacy_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:51:28] (03PS1) 10QChris: Add .gitreview [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/618657
[05:51:30] (03CR) 10QChris: [V: 03+2 C: 03+2] Add .gitreview [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/618657 (owner: 10QChris)
[05:53:36] 10Analytics-Radar, 10Product-Analytics, 10Release-Engineering-Team, 10Repository-Admins: Create a repository and user for Product Analytics Oozie jobs - https://phabricator.wikimedia.org/T230743 (10QChris) >>! In T230743#6363028, @mpopov wrote: > Requested `analytics/wmf-product/jobs` Gerrit repo Done. Cr...
[06:12:12] RECOVERY - Check the last execution of monitor_refine_eventlogging_legacy_failure_flags on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_legacy_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:02:42] 10Analytics-Clusters, 10Discovery, 10Discovery-Search, 10Patch-For-Review: Move mjolnir kafka daemon from ES to search-loader VMs - https://phabricator.wikimedia.org/T258245 (10elukey) Remaining things to do: 1) evaluate correctness and performance of mjolnir on search-loader VMs (currently in progress -...
[08:07:08] !log roll restart druid-brokers (on both clusters) to pick up new monitoring changes
[08:07:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:15:31] elukey: o/ since today I am not able to run jupyter notebooks on the stat machines anymore. I can log in to jupyterhub but starting the server times out (didn't respond in 30 seconds). might be something only related to me (reached out to diego and it works for him). any idea what could be the problem?
[10:16:44] mgerlach: o/ I saw an icinga alert related to your notebook, I think that the venv in your home is messed up. Can I try to delete/re-create it?
[10:17:30] yes, please
[10:17:52] mgerlach: done, can you retry now?
[10:18:01] let me check
[10:18:09] (I did it on stat1005)
[10:20:11] ok, I can launch the notebooks now in jupyterhub
[10:20:35] any idea why this happened?
[10:21:37] I see now the venvs are gone (is this what was messed up?)
[10:22:18] elukey: and thanks for fixing
[10:22:25] so the venv for some reason was missing the pip packages needed to launch jupyter notebooks; they are created upon first login by the jupyterhub service
[10:22:36] sometimes we have to do it, still unclear why :(
[10:22:46] did you have the same problem on all stat boxes?
[10:22:58] I tried on several, yes
[10:23:10] ah ok this is not great then
[10:25:14] mgerlach: can you tell me another stat box that you tried on so I can investigate?
[10:25:18] (check logs etc..)
[10:25:30] on stat1007 it also didn't (doesn't) work
[10:25:51] 500: internal server error
[10:26:07] basically it says Aug 06 10:25:36 stat1007 bash[113040]: /bin/bash: line 0: exec: jupyterhub-singleuser: not found
[10:27:27] jupyterhub-singleuser should in theory be under /home/mgerlach/venv/bin
[10:27:47] (it is now on stat1005)
[10:27:53] but it is not on stat1007
[10:27:57] elukey@stat1007:~$ ls /home/mgerlach/venv/bin
[10:27:57] ls: cannot access '/home/mgerlach/venv/bin': No such file or directory
[10:28:13] so not sure why, but the bin dir under your venv was removed
[10:28:39] to fix it, it should in theory be sufficient to remove the venv and retry
[10:29:02] not sure if related: I have my different virtual environments under ~/venv/, e.g. ~/venv/venv_for_project_1/ and so on (they are now missing on stat1005)
[10:29:45] ah yes I had to remove venv, it is where jupyterhub deploys the venv
[10:30:16] should I put my venvs somewhere else?
[10:30:32] yep I think so, venv is used by jupyter
[10:30:58] I am sorry if I dropped some of your work on 1005, didn't realize that you were using venv in that way :(
[10:31:08] I see, didn't know. is that a recent change? (since it worked until yesterday)
[10:31:47] no worries, venv should not be too difficult to rebuild
[10:31:49] I think that your notebooks were stopped/killed for some reason
[10:32:03] and when you tried to access them, they tried to restart via systemd
[10:32:09] but the bin dir was not there anymore
[10:32:22] makes sense
[10:32:41] what we can try to do is create a different directory name, like "jupyter-notebook-venv" or similar
[10:32:48] would it be less confusing?
[10:34:39] I think either way is fine. but potentially good to avoid similar issues (maybe I am the only one to put the venvs in that folder : )
[10:38:01] you are probably not the only one, I had some other reports but I haven't dug much into them (always chose the quick turn-off/on-again fix :D)
[10:38:15] will do in the future, thanks for pointing it out
[10:38:58] thanks elukey
[10:39:26] mgerlach: there is also https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter#Resetting_user_virtualenvs
[10:39:37] if you want to self-fix stat1007 and the others
[11:08:31] 10Analytics, 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to production shell for Denny Vrandecic - https://phabricator.wikimedia.org/T259388 (10akosiaris) a:03DVrandecic
[11:12:00] 10Analytics, 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to production shell for Denny Vrandecic - https://phabricator.wikimedia.org/T259388 (10akosiaris) p:05Triage→03Medium
[13:03:08] hey teammm
[13:05:10] o/
[13:10:08] mgerlach: may I recommend putting the different environments either in ~/venvs/ or in the individual project directories?
[13:12:51] Either approach has its benefits
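
The reset procedure linked above boils down to what elukey did by hand: delete ~/venv and let jupyterhub recreate it on next login. A minimal sketch of the same fix using only the Python standard library — the ~/venv path matches the conversation, but the package list is an assumption, not the exact provisioning the jupyterhub service performs:

```python
# Sketch: recreate a user's JupyterHub venv by hand.
# Assumes ~/venv is the directory the hub expects; the exact package set
# installed by the real jupyterhub service may differ.
import shutil
import subprocess
import venv
from pathlib import Path

venv_dir = Path.home() / "venv"

# Remove the broken venv. This is destructive: anything stored under
# ~/venv (e.g. nested per-project venvs, as in mgerlach's case) is lost.
shutil.rmtree(venv_dir, ignore_errors=True)

# Recreate it with pip available.
venv.create(venv_dir, with_pip=True)

# Install the single-user server so systemd can exec jupyterhub-singleuser.
subprocess.run(
    [str(venv_dir / "bin" / "pip"), "install", "jupyterhub", "notebook"],
    check=True,
)
```

This also shows why the "500: internal server error" happened: systemd tried to exec jupyterhub-singleuser out of ~/venv/bin, which no longer existed.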
[13:55:21] hey elukey, does cassandra have the ability to inspect the ingested data and create the corresponding table, like druid does? Or should I create the table beforehand?
[13:55:33] you know?
[13:55:47] mforns: I think that you need to create the schema first
[13:55:55] ok
[13:56:00] thanks elukey :]
[13:56:10] np, lemme know if I can help :)
[13:56:25] I am currently having "fun" trying to build hue
[13:56:30] so any distraction is appreciated
[13:57:46] (03PS1) 10Bearloga: Update README.md [analytics/wmf-product] - 10https://gerrit.wikimedia.org/r/618751
[13:58:04] (03CR) 10Bearloga: [V: 03+2 C: 03+2] Update README.md [analytics/wmf-product] - 10https://gerrit.wikimedia.org/r/618751 (owner: 10Bearloga)
[13:58:35] (03PS1) 10Bearloga: Add README.md [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/618752
[13:58:46] (03CR) 10Bearloga: [V: 03+2 C: 03+2] Add README.md [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/618752 (owner: 10Bearloga)
[13:59:36] 10Analytics-Radar, 10Product-Analytics, 10Release-Engineering-Team, 10Repository-Admins: Create a repository and user for Product Analytics Oozie jobs - https://phabricator.wikimedia.org/T230743 (10mpopov)
[13:59:50] 10Analytics-Radar, 10Product-Analytics, 10Release-Engineering-Team, 10Repository-Admins: Create a repository and user for Product Analytics Oozie jobs - https://phabricator.wikimedia.org/T230743 (10mpopov)
[14:00:19] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Set up environment for Product Analytics system user - https://phabricator.wikimedia.org/T258970 (10mpopov)
[14:00:21] 10Analytics-Radar, 10Product-Analytics, 10Release-Engineering-Team, 10Repository-Admins: Create a repository and user for Product Analytics Oozie jobs - https://phabricator.wikimedia.org/T230743 (10mpopov) 05Open→03Resolved Thank you @QChris!
[14:08:51] (03PS1) 10Bearloga: Update README.md [analytics/wmf-product] - 10https://gerrit.wikimedia.org/r/618754
[14:08:53] (03PS1) 10Bearloga: Update README.md [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/618755
[14:09:06] (03CR) 10Bearloga: [V: 03+2 C: 03+2] Update README.md [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/618755 (owner: 10Bearloga)
[14:09:10] (03CR) 10Bearloga: [V: 03+2 C: 03+2] Update README.md [analytics/wmf-product] - 10https://gerrit.wikimedia.org/r/618754 (owner: 10Bearloga)
[14:11:06] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Set up environment for Product Analytics system user - https://phabricator.wikimedia.org/T258970 (10mpopov)
[14:13:08] elukey: hehe
[14:14:22] 10Analytics, 10Voice & Tone: Rename geoeditors_blacklist_country - https://phabricator.wikimedia.org/T259804 (10Isaac)
[14:15:41] of course, npm on my VM works, meanwhile building on debian yields an error when running webpack
[14:31:31] 10Analytics-Radar, 10Better Use Of Data, 10Product-Infrastructure-Data, 10Wikimedia-Logstash, and 4 others: Documentation of client side error logging capabilities on mediawiki - https://phabricator.wikimedia.org/T248884 (10jlinehan)
[14:31:36] 10Analytics-Radar, 10Better Use Of Data, 10Desktop Improvements, 10Product-Infrastructure-Team-Backlog, and 7 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (10jlinehan)
[14:36:36] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Set up environment for Product Analytics system user - https://phabricator.wikimedia.org/T258970 (10mpopov) > when you send a coordinator/workflow job to oozie, there is only XML to send, it is not needed to have specific deps on the stat100x host that is...
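
As elukey notes in the exchange above, Cassandra does not infer a table from ingested data the way Druid does; the keyspace and table have to exist before loading. A minimal sketch with the DataStax Python driver — the keyspace, table, and column names here are made up for illustration (loosely themed on the session-length work mentioned later), not the real AQS schema:

```python
# Sketch: create a Cassandra schema up front, since Cassandra (unlike
# Druid) does not derive the table from the ingested data.
# Keyspace/table/columns are hypothetical.
from cassandra.cluster import Cluster

cluster = Cluster(["localhost"])  # assumption: a locally reachable node
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS example_metrics
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")

session.execute("""
    CREATE TABLE IF NOT EXISTS example_metrics.session_length (
        wiki     text,
        dt       timestamp,
        bucket   text,
        sessions bigint,
        PRIMARY KEY ((wiki), dt, bucket)
    )
""")
```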
[14:43:24] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Set up environment for Product Analytics system user - https://phabricator.wikimedia.org/T258970 (10elukey) Yes I agree, oozifying complex workloads is really a pain.. we could start with a one-off cron, and then add a puppet config later on in case it...
[14:44:50] a-team deploying refinery for the two train changes
[14:45:01] fdans: how dare you
[14:45:14] lol
[14:45:20] elukey: my cluster my rules
[14:45:39] fdans: we'll see, I set up traps everywhere
[14:45:56] elukey: i'll make sure to break everything
[14:47:13] !log deploying refinery
[14:47:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:00:27] y'all going to staff?
[15:02:54] ping elukey
[15:03:02] ah snap sorry
[15:03:06] got lost in debian building
[15:22:53] nuria: whenever you have time, we can talk about session length
[15:25:58] mforns: ya, now?
[15:26:12] nuria: sure
[15:26:12] mforns: bc?
[15:26:16] ok, omw
[15:40:15] 10Analytics, 10MediaWiki-REST-API, 10Platform Team Sprints Board (Sprint 0), 10Platform Team Workboards (Green), 10Story: System administrator reviews API usage by client - https://phabricator.wikimedia.org/T251812 (10Pchelolo) @Joe @Ottomata I would really appreciate your view on the general approach to...
[15:46:28] 10Analytics, 10MediaWiki-REST-API, 10Platform Team Sprints Board (Sprint 0), 10Platform Team Workboards (Green), 10Story: System administrator reviews API usage by client - https://phabricator.wikimedia.org/T251812 (10Ottomata) I kind of like the idea of a simple `stdin/file > http_poster https://ev...
[16:22:54] bearloga: thanks for the suggestion. going with the former, i.e. putting all the environments in a different folder such as ~/venvs/, works well
[16:28:40] 10Analytics, 10MediaWiki-REST-API, 10Platform Team Sprints Board (Sprint 0), 10Platform Team Workboards (Green), 10Story: System administrator reviews API usage by client - https://phabricator.wikimedia.org/T251812 (10Nuria) Not sure if you dismissed this idea (or if it is really what you called above...
[16:29:16] ottomata: see my comment on T251812
[16:29:17] T251812: System administrator reviews API usage by client - https://phabricator.wikimedia.org/T251812
[16:29:38] ottomata: is this something you thought about that seemed not viable?
[16:31:47] nuria: the envoy log can be formatted however we like
[16:32:04] so that data can be produced to eventgate or to kafka directly, either way
[16:32:10] there's no transformation needed
[16:33:11] ottomata: thanks for this https://wikitech.wikimedia.org/wiki/Analytics/Systems/Anaconda
[16:33:20] mgerlach: it sounds like you're an avid user of python at the command line for your research projects, which makes you a really good candidate for checking out the experimental https://wikitech.wikimedia.org/wiki/Analytics/Systems/Anaconda that ottomata has been working on
[16:33:23] ottomata: i think that if we have eventgate as a gateway for events it would be wise not to produce to kafka directly, cause otherwise any measures we might add to eventgate, like throttling,
[16:33:26] oh! hahaha!
[16:33:31] we will not get with direct production
[16:33:54] mgerlach: i heard you have some questions about the actor table?
[16:34:23] ottomata: let me add this point to the ticket
[16:35:06] ottomata: just played with this and found it very useful to use additional packages such as https://graph-tool.skewed.de/
[16:36:05] nuria: yes I find it quite useful. joseph mentioned at some point that it is an intermediate table so was wondering where it would go (and if we can give any input)
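
Ottomata's `stdin/file > http_poster` idea from T251812 above is essentially a tiny sidecar that pipes access-log lines into EventGate over HTTP. A minimal sketch, assuming newline-delimited JSON on stdin; the intake URL, port, and batch size are assumptions for illustration, not the real EventGate deployment details:

```python
#!/usr/bin/env python3
# Sketch of a stdin -> HTTP POST sidecar: read newline-delimited JSON
# log lines (e.g. piped from an envoy access log) and POST them in small
# batches to an EventGate-style intake endpoint. URL is hypothetical.
import json
import sys
import urllib.request

EVENTGATE_URL = "http://localhost:8192/v1/events"  # assumed endpoint


def post_events(events):
    body = json.dumps(events).encode("utf-8")
    req = urllib.request.Request(
        EVENTGATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def main():
    batch = []
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        batch.append(json.loads(line))
        if len(batch) >= 100:  # small batches bound memory and latency
            post_events(batch)
            batch = []
    if batch:
        post_events(batch)


if __name__ == "__main__":
    main()
```

Wired up as something like `tail -F access.log | ./http_poster.py`, which keeps the producing service itself free of any Kafka or EventGate dependency.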
[16:36:12] 10Analytics, 10MediaWiki-REST-API, 10Platform Team Sprints Board (Sprint 0), 10Platform Team Workboards (Green), 10Story: System administrator reviews API usage by client - https://phabricator.wikimedia.org/T251812 (10Nuria) I think we will implement measures such as throttling (and blocking) in event g...
[16:36:29] mgerlach: it is an intermediate table we created to be able to do the bots work
[16:36:46] nuria: webrequest goes to kafka
[16:36:58] there is no requirement for internal services to produce to eventgate
[16:37:04] it is a useful tool, but not for all use cases
[16:37:11] e.g. rsyslog logging to logstash
[16:37:15] but, i mean, it is fine to do so!
[16:37:18] in this case it might make sense
[16:37:39] my preference so far is to just make a very simple stdin -> http post sidecar process
[16:37:46] to send the logs to eventgate
[16:39:21] ottomata: i see, have in mind that webrequest is throttled by varnish
[16:40:01] ottomata: streams not throttled at all, like netflow, that are produced to kafka directly by internal services are harder to monitor for throughput
[16:40:24] ottomata: and schema correctness
[16:40:35] nuria: these will be api request logs, which come through varnish
[16:40:56] ottomata: netflow just broke recently for these two reasons, i think (need to check on correctness but definitely throughput)
[16:42:58] mgerlach: feel free to pass along input
[16:43:05] mgerlach: about the actor table, sorry
[16:48:47] nuria: btw kafka has throttling
[16:49:17] ottomata: per topic?
[16:49:26] https://docs.confluent.io/current/kafka/post-deployment.html#enforcing-client-quotas
[16:50:07] i think per clientId
[16:50:18] ottomata: per client id ya
[16:50:45] and you can restrict which clients can produce to certain topics via ACLs
[16:54:22] ottomata: i am not sure that would work for the situation i am thinking of. right now the clients in the varnish webrequest pipeline are the varnishes, and the throughput they produce is bounded by the webrequest topic; thus, if we wanted to have throttling limits for a specific "analytics" topic, say pagepreviews, we wouldn't be able to do that, right?
[16:54:44] ottomata: now, if we have a gateway like eventgate those limits are much easier to enforce
[16:54:54] i mean, eventgate can't enforce any global limits
[16:55:07] but those requests go through varnish anyway
[16:55:13] the ones from external clients
[16:55:52] nuria: not all events will come via eventgate; e.g. if we are doing stream processing to produce hydrated or joined events
[16:56:05] we aren't going to make the stream processing POST to eventgate, esp if they do high volume
[16:56:12] eventgate is def slower than producing to kafka
[16:56:21] external clients for sure, eventgate is needed
[16:56:29] internal ones it is mostly recommended
[16:56:31] but it won't fit all cases
[16:56:35] aham
[16:56:43] that said, i think it should fit the use case at hand, api request logs
[16:56:54] we already do it for the action api ones, we should do it for the new rest api ones too
[16:57:18] also kafka producer clients are much more featureful
[16:57:23] exactly-once transactions
[16:57:32] custom keys
[16:57:34] etc.
[16:59:13] mgerlach: ping on the actor topic again
[16:59:49] nuria: sorry, in a meeting. will write in a few minutes
[17:03:48] mgerlach: ok
[17:07:40] * elukey afk!
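
Per the Confluent docs linked above, Kafka enforces produce/fetch quotas per client.id (or authenticated principal), not per topic — so a broker-side quota only bites if the producer identifies itself with a stable client id. A minimal kafka-python sketch; the broker address and client id are made up (the topic name echoes nuria's pagepreviews example), and the quota itself would be configured separately with Kafka's admin tooling:

```python
# Sketch: Kafka quotas apply per client.id (or principal), not per topic,
# so the producer sets an explicit, stable client_id that a broker-side
# quota (and topic ACLs) can be keyed on. Names here are hypothetical.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka-broker:9092"],  # assumed broker address
    client_id="pagepreviews-producer",        # the quota/ACL identity
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("pagepreviews", {"page": "Foo", "dt": "2020-08-06T16:50:00Z"})
producer.flush()
```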
[17:12:20] (03CR) 10Milimetric: [C: 04-1] Requests with app user agents should not be evaluated as app pageviews (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/618635 (https://phabricator.wikimedia.org/T257860) (owner: 10Nuria)
[17:13:26] (03CR) 10Milimetric: "This makes sense as TDD but if you split the test and the fix over two gerrit changes we won't be able to commit. Maybe squish these two?" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/618157 (https://phabricator.wikimedia.org/T257860) (owner: 10Nuria)
[17:38:32] (03CR) 10Nuria: "> Patch Set 1:" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/618157 (https://phabricator.wikimedia.org/T257860) (owner: 10Nuria)
[17:38:37] nuria: coming back to the actors table. when getting reading sessions from webrequest, we usually filter out sessions which contain one or more edit attempts (by looking at non-pageview uri_query), as we did for e.g. the covid sessions. since the actor table only contains pageviews, we could not apply that filter anymore using that table. my question was whether you had any idea if there would be an easy workaround without going back to webrequest
[17:48:35] mgerlach: i see
[17:50:23] mgerlach: the actor table is called actor cause it does actors, not sessions; have in mind that an actor with two requests 10 apart (which would not qualify as a session under a traditional definition) will be counted here as 1 pseudo-session
[17:52:06] nuria: yes, understood
[17:56:27] mgerlach: that being said, your broader question is whether you can "weed out" those sessions
[17:57:10] mgerlach: the only way to do that is calculating the signature for all "edit" requests on webrequest and using that to filter
[17:57:30] mgerlach: so you have to compute a small actors_that_edit table
[17:58:10] mgerlach: using the signature udf https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/GetActorSignatureUDF.java
[17:58:25] mgerlach: for all requests that have an 'edit'
[17:58:44] nuria: possibly. or, instead of a filter, having a column which indicates yes/no. but am afraid that it is not that trivial, and one might just use the webrequest table in the first place.
[17:59:03] mgerlach: that should be a small table that you can use to do a map-side join
[17:59:46] mgerlach: let me look one sec at how i calculated actors
[18:01:01] 10Analytics: page_id is null where it shouldn't be in mediawiki history - https://phabricator.wikimedia.org/T259823 (10Milimetric)
[18:01:20] mgerlach: having another feature that is "session_includes_edit" is not unthinkable
[18:03:02] nuria: if possible, great, but not high priority
[18:03:11] mgerlach: but computationally it is not trivial, cause the way actors are computed now is done only with pageview data
[18:03:37] mgerlach: and edit requests are not pageviews, which means they are not included in the pseudo-session
[18:04:00] mgerlach: it will probably require an intermediate table where all edit requests end up
[18:04:14] mgerlach: for which actors are calculated
[18:04:20] mgerlach: so it is an involved change
[18:04:40] mgerlach: we basically would need to do what i suggested earlier
[18:05:06] nuria: I didn't want to open a can of worms : ) I will think about your suggestion.
[18:05:12] mgerlach: create an actor_who_has_edited table
[18:05:45] mgerlach: the code to create such a table is a modification of this one: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/learning/features/actor/hourly/calculate_features_actor_hourly.hql
[18:06:14] nuria: thanks
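
A sketch of nuria's suggestion above, written in PySpark rather than as a modification of the Hive job she links: build a small table of actor signatures that issued edit attempts, then anti-join it against the pageview-based actor data. The hash expression is a simplified stand-in for refinery's GetActorSignatureUDF, the edit heuristic is deliberately crude, and the actor table name is hypothetical — illustration only, not the production job:

```python
# Sketch (PySpark): compute a small "actors that edit" table from webrequest
# and use it to drop edit-containing pseudo-sessions from actor data.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("actors-that-edit-sketch").getOrCreate()

webrequest = spark.table("wmf.webrequest").where(
    "year = 2020 AND month = 8 AND day = 6 AND webrequest_source = 'text'"
)

# Simplified stand-in for GetActorSignatureUDF: hash request fields that
# identify an actor (the real UDF's inputs may differ).
signature = F.sha2(
    F.concat_ws("|", "client_ip", "user_agent", "accept_language"), 256
)

# Small table: signatures with at least one edit attempt (crude heuristic).
actors_that_edit = (
    webrequest
    .where(F.col("uri_query").contains("action=edit"))
    .select(signature.alias("actor_signature"))
    .distinct()
)

# Drop pseudo-sessions whose actor also edited; broadcasting the small
# table gives the map-side join nuria mentions.
actor = spark.table("wmf.actor_hourly")  # hypothetical actor table name
reading_only = actor.join(
    F.broadcast(actors_that_edit), on="actor_signature", how="left_anti"
)
```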
[18:22:06] (03PS2) 10Nuria: Requests with app user agents should not be evaluated as web pageviews [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/618635 (https://phabricator.wikimedia.org/T257860)
[18:22:33] (03Abandoned) 10Nuria: Test case for pageviews marked as such that should not be so [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/618157 (https://phabricator.wikimedia.org/T257860) (owner: 10Nuria)
[18:23:25] (03PS3) 10Nuria: Requests with app user agents are not evaluated as web pageviews [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/618635 (https://phabricator.wikimedia.org/T257860)
[18:24:19] (03PS4) 10Nuria: Requests with app user agents are not evaluated as web pageviews [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/618635 (https://phabricator.wikimedia.org/T257860)
[18:25:35] (03CR) 10Nuria: Requests with app user agents are not evaluated as web pageviews (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/618635 (https://phabricator.wikimedia.org/T257860) (owner: 10Nuria)
[18:44:06] (03PS5) 10Nuria: Exclude requests with app user agents from web pageviews [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/618635 (https://phabricator.wikimedia.org/T257860)
[18:44:45] (03CR) 10Milimetric: [C: 03+2] Exclude requests with app user agents from web pageviews [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/618635 (https://phabricator.wikimedia.org/T257860) (owner: 10Nuria)
[18:45:29] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Exclude requests with app user agents from web pageviews [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/618635 (https://phabricator.wikimedia.org/T257860) (owner: 10Nuria)
[18:45:57] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Exclude requests with app user agents from web pageviews (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/618635 (https://phabricator.wikimedia.org/T257860) (owner: 10Nuria)
[18:53:55] 10Analytics: page_id is null where it shouldn't be in mediawiki history - https://phabricator.wikimedia.org/T259823 (10Milimetric)
[18:55:57] 10Analytics, 10Analytics-Data-Quality: page_id is null where it shouldn't be in mediawiki history - https://phabricator.wikimedia.org/T259823 (10Nuria)
[19:18:25] 10Analytics, 10Analytics-Data-Quality: page_id is null where it shouldn't be in mediawiki history - https://phabricator.wikimedia.org/T259823 (10Milimetric)
[19:22:15] 10Analytics-Radar, 10Operations, 10Traffic: Spammy events coming our way for sites such as https://ru.wikipedia.kim - https://phabricator.wikimedia.org/T190843 (10Nathan708) is there any legal caution meted out for such an act; because it looks as if these guys mirror wikipedia. But they can only mirror wikipedi...
[19:44:27] Hey, quick question about AQS: for individual media files (https://wikitech.wikimedia.org/wiki/Analytics/AQS/Mediarequests /per-file/), would the number count seeing the thumbnail of a video as a request, or only plays of the video?
[19:44:50] I'm using the API in this tool https://mvc.toolforge.org/index.php?category=Videos_by_Terra_X&timespan=now-30&limit=200
[21:02:01] Amir1: there is no info about plays at all
[21:02:10] Amir1: it is just thumbnail loading
[23:57:18] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Update PageviewDefinition to only include /api/rest_v1/page/mobile-html requests with X-Analytics: pageview=1 in pageviews - https://phabricator.wikimedia.org/T257860 (10Nuria) Known problems list updated with correction: https://wikitech.wikimedia.org/wik...
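
For reference, a minimal sketch of the per-file mediarequests query that tool relies on, against the public AQS REST API. The file path is a made-up Commons video, and the route/field names follow the documented mediarequests format as I understand it; per the answer above, the counts include thumbnail loads, not plays:

```python
# Sketch: query AQS mediarequests per-file counts for one media file.
# The file path is hypothetical; it must be percent-encoded, including
# the leading slash. Route and response fields per the AQS docs.
import json
import urllib.parse
import urllib.request

BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file"

file_path = urllib.parse.quote(
    "/wikipedia/commons/0/00/Example_video.webm", safe=""
)
url = f"{BASE}/all-referers/all-agents/{file_path}/daily/20200701/20200731"

req = urllib.request.Request(url, headers={"User-Agent": "example-tool/0.1"})
with urllib.request.urlopen(req) as resp:
    data = json.load(resp)

for item in data["items"]:
    print(item["timestamp"], item["requests"])
```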