[03:11:00] PROBLEM - Check the last execution of refinery-import-page-history-dumps on stat1007 is CRITICAL: CRITICAL: Status of the systemd unit refinery-import-page-history-dumps
[06:11:30] morning!
[06:17:21] refinery-import-page-history-dumps[11980]: NameError: name 'hdfs' is not defined
[06:17:34] s/hdfs.touchz/Hdfs.touchz afaics
[06:17:48] I have re-run the script with the above change in refinery on stat1007
[06:17:58] if it works I'll send a patch (not sure if there is one already)
[06:26:36] RECOVERY - Check the last execution of refinery-import-page-history-dumps on stat1007 is OK: OK: Status of the systemd unit refinery-import-page-history-dumps
[06:31:48] (PS1) Elukey: import-mediawiki-dumps: use Hdfs instead of hdfs [analytics/refinery] - https://gerrit.wikimedia.org/r/508756
[06:41:35] !log chown /tmp/mobile_apps to analytics:analytics
[06:41:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:41:55] on hdfs, forgot to mention :D
[08:35:59] (PS1) Elukey: oozie/unique_devices: move coordinators to the analytics user [analytics/refinery] - https://gerrit.wikimedia.org/r/508783 (https://phabricator.wikimedia.org/T220971)
[08:36:16] ok next step for me is all the unique devices coord
[08:36:18] (s)
[08:36:22] loooong list
[08:45:55] (CR) Elukey: [V: +2 C: +2] "Merging before the weekly train." [analytics/refinery] - https://gerrit.wikimedia.org/r/508283 (https://phabricator.wikimedia.org/T220971) (owner: Elukey)
[08:46:00] (CR) Elukey: [V: +2 C: +2] "Merging before the weekly train." [analytics/refinery] - https://gerrit.wikimedia.org/r/508307 (https://phabricator.wikimedia.org/T220971) (owner: Elukey)
[08:46:18] (CR) Elukey: [V: +2 C: +2] "Merging before the weekly train." [analytics/refinery] - https://gerrit.wikimedia.org/r/508314 (https://phabricator.wikimedia.org/T220971) (owner: Elukey)
[08:46:32] (CR) Elukey: [V: +2 C: +2] "Merging before the weekly train."
[analytics/refinery] - https://gerrit.wikimedia.org/r/508330 (https://phabricator.wikimedia.org/T220971) (owner: Elukey)
[08:46:52] (CR) Elukey: [V: +2 C: +2] "Merging before the weekly train." [analytics/refinery] - https://gerrit.wikimedia.org/r/508525 (https://phabricator.wikimedia.org/T220971) (owner: Elukey)
[08:47:06] (CR) Elukey: [V: +2 C: +2] "Merging before the weekly train." [analytics/refinery] - https://gerrit.wikimedia.org/r/508541 (https://phabricator.wikimedia.org/T220971) (owner: Elukey)
[08:47:20] (CR) Elukey: [V: +2 C: +2] "Merging before the weekly train." [analytics/refinery] - https://gerrit.wikimedia.org/r/508585 (https://phabricator.wikimedia.org/T220971) (owner: Elukey)
[08:48:20] Analytics, EventBus, Operations, observability, and 3 others: Upgrade statsd_exporter to 0.9 - https://phabricator.wikimedia.org/T220709 (fgiunchedi) >>! In T220709#5164240, @Ottomata wrote: > Great! I guess it just needs to go into the WMF base docker image somehow? Indeed, I'm not sure about...
[09:00:54] !log kill unique_devices coords (10 in total) + chown analytics:analytics /wmf/data/wmf/unique_devices + restart of 10 coords with user analytics
[09:01:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:09:12] done!
[09:38:54] hi
[09:39:09] where can I find event logging data on hive?
[09:39:43] aharoni: should be in the 'event' database
[09:39:46] (hi!)
[09:42:37] elukey: thanks.
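As background: the `s/hdfs.touchz/Hdfs.touchz` fix above is a plain Python scoping bug, where the script called `touchz` on a lowercase name that was never defined while the imported helper class is capitalized. A minimal sketch of the pattern; the `Hdfs` class body here is hypothetical and only echoes the call, whereas the real refinery helper shells out to HDFS:

```python
# Minimal sketch of the NameError fixed in https://gerrit.wikimedia.org/r/508756.
# The Hdfs class body is hypothetical; in refinery it wraps `hdfs dfs` commands.

class Hdfs:
    @staticmethod
    def touchz(path):
        # Real code would run `hdfs dfs -touchz <path>`; here we just echo the call.
        return "touchz " + path

def write_success_flag_buggy(directory):
    # Buggy version: `hdfs` (lowercase) was never defined, so this raises NameError.
    return hdfs.touchz(directory + "/_SUCCESS")  # noqa: F821

def write_success_flag_fixed(directory):
    # Fixed version (s/hdfs.touchz/Hdfs.touchz): call the method on the class.
    return Hdfs.touchz(directory + "/_SUCCESS")
```

Because the buggy branch only fails at call time, the unit ran fine until it reached the `touchz` step, which matches the systemd alert firing mid-import.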
If I try to do something simple like "select * from event.contenttranslation limit 5;", I get: "FAILED: SemanticException [Error 10041]: No partition predicate found for Alias "contenttranslation" Table "contenttranslation"
[09:47:34] aharoni: the query is missing partitions, some info in https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Queries#Use_partitions
[09:48:22] (PS2) Elukey: oozie/unique_devices: move coordinators to the analytics user [analytics/refinery] - https://gerrit.wikimedia.org/r/508783 (https://phabricator.wikimedia.org/T220971)
[09:48:24] (PS1) Elukey: oozie/virtualpageview: move coordinators to the analytics user [analytics/refinery] - https://gerrit.wikimedia.org/r/508791 (https://phabricator.wikimedia.org/T220971)
[09:50:24] !log kill virtualpageviews coords (3 in total) + chown analytics:analytics /wmf/data/wmf/virtualpageview + restart of the coords with user analytics
[09:50:28] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:27:17] (PS1) Elukey: oozie/projectview: move coordinators to the analytics user [analytics/refinery] - https://gerrit.wikimedia.org/r/508800 (https://phabricator.wikimedia.org/T220971)
[10:28:47] elukey: thanks, adding `where year = 2019 and month = 5` helped.
[10:32:55] elukey: so, next question: I need to read regularly from this event logging table, and plot the results on a Dashiki chart.
[10:33:20] I once did something like this by creating my own staging table, reading from event logging, and writing to the persistent staging table, but it was in MySQL on one of the mwmaint servers,
[10:33:51] and as far as I understand, this is no longer possible in mysql, and has to be done in hive.
[10:34:18] So the question is: How can I create my own staging table in hive?
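For context on the SemanticException discussed above: event tables are partitioned (by year/month/day/hour), and Hive's strict partition mode rejects any query that does not constrain at least one partition column, because without a predicate it cannot prune directories and would scan everything. A toy Python sketch of that behavior, with an invented partition layout:

```python
# Toy sketch of Hive strict-mode partition pruning. The partition layout is
# invented for illustration; real event tables partition by year/month/day/hour.

PARTITIONS = {
    (2019, 4): ["row-a", "row-b"],
    (2019, 5): ["row-c"],
}

def select(predicate=None):
    if predicate is None:
        # Mirrors "SemanticException [Error 10041]: No partition predicate found"
        raise ValueError("No partition predicate found")
    # Only partitions matching the predicate are scanned (pruning).
    return [row for key, rows in PARTITIONS.items() if predicate(*key) for row in rows]

rows = select(predicate=lambda year, month: year == 2019 and month == 5)
```

In HiveQL the fix is exactly what worked in the chat: appending `where year = 2019 and month = 5` to the query.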
[10:41:07] aharoni: I'd suggest to open a phab task to get a proper suggestion from my team, creating the hive table is fine but then using dashiki to plot it probably not
[10:41:24] maybe it could be a use case for our Superset dashboarding UI
[10:41:41] I am a bit ignorant about this part so I cannot help much
[10:41:43] Huh? That's a bit odd. Doesn't dashiki connect to hive?
[10:41:52] I have no idea :D
[10:41:56] never used dashiki
[10:42:15] in theory it should but I am not sure
[10:42:15] I'll try asking milimetric ... :)
[10:43:35] anyway, you can create your own database
[10:43:36] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Queries#Scratch_space
[10:43:48] and use it as you prefer
[10:44:03] but hive is a bit different from SQL, so I'd suggest to check the docs first
[10:44:14] about how to properly create the tables etc..
[10:44:26] this is why I suggested the task :)
[10:44:33] (going afk for lunch, brb)
[11:15:42] !log kill projectview coords (2 in total) + chown analytics:analytics /wmf/data/wmf/projectview and /wmf/data/archive/projectview + restart coords with the analytics user
[11:15:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:25:18] heyaaa :]
[11:25:30] o/
[11:26:06] lookin at alarms
[11:27:45] will be back in 10 mins!
[11:37:46] mforns: hi!
[11:37:58] I'll repeat what I wrote earlier when you weren't around...
[11:38:26] I need to read regularly from an event logging table, and plot the results on a Dashiki chart. I once did something like this by creating my own staging table, reading from event logging, and writing to the persistent staging table, but it was in MySQL on one of the mwmaint servers.
[11:38:38] and as far as I understand, this is no longer possible in mysql, and has to be done in hive.
[11:38:47] So the question is: How can I create my own staging table in hive?
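For reference, the scratch-space approach pointed at above boils down to three HiveQL statements: create a personal database, create a staging table in it, and insert aggregated rows from the partitioned event table. A hedged sketch that just composes those statements; all names here (the `aharoni` database, the `cx_daily` table and its columns) are hypothetical, and the statements would be run with `hive -e` or beeline on a stat host:

```python
# Rough sketch of the Hive scratch-space steps. Database/table/column names are
# hypothetical; run the resulting statements via `hive -e` on a stat machine.

def scratch_ddl(user, table):
    return [
        "CREATE DATABASE IF NOT EXISTS {}".format(user),
        "CREATE TABLE IF NOT EXISTS {}.{} (day STRING, translations BIGINT) "
        "STORED AS PARQUET".format(user, table),
        # The partition predicate (year/month) is required, as seen earlier in the log.
        "INSERT INTO {}.{} SELECT CONCAT(year, '-', month, '-', day), COUNT(*) "
        "FROM event.contenttranslation WHERE year = 2019 AND month = 5 "
        "GROUP BY year, month, day".format(user, table),
    ]

statements = scratch_ddl("aharoni", "cx_daily")
```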
[11:38:55] elukey wasn't sure how to do it earlier
[11:44:32] aharoni: not completely correct - I pointed to some documentation about that, I wasn't sure about dashiki
[11:44:45] namely the scratch space etc..
[11:44:55] it is more or less
[11:44:59] 1) create your own database
[11:45:19] 2) create tables (external or not depends on you)
[11:45:27] 3) create insert statements
[11:45:44] but since hive is different than mysql, it needs some reading first (this is what I suggested)
[11:52:09] chelsyx: o/ - I noticed that the oozie coord mobile_apps-uniques-by_country-monthly-coord (running under your username) seems suspended, is it meant to be in that way?
[11:55:48] aharoni, do you need to have the data in a table? or would a reportupdater report file be sufficient?
[11:55:58] you can use RU to create reports by querying hive
[11:56:12] the output format of RU is readable by dashiki
[11:56:48] even better yes :)
[11:58:19] elukey, your idea is good, depends on whether aharoni needs the output data in hive, or just for dashiki
[11:59:07] aharoni, are we talking about the CX abuse filter job that we discussed? or is it something else?
[12:02:52] mforns: I suggested to open a task to verify with us the best solution, since I wasn't aware of RU or Dashiki magic :)
[12:07:37] elukey, cool, a task will be good!
[12:26:57] mforns_: now that I think of it, a report updater file is good
[12:27:04] and a table is not needed
[12:27:13] aharoni, ok
[12:29:45] then you can write a RU job that queries Hive, like in this example: https://github.com/wikimedia/analytics-reportupdater-queries/blob/master/browser/all_sites_by_os_family_and_major_percent
[12:30:26] aharoni, just as an example of how using hive -e "" to query hive
[12:34:48] mforns_: thanks. How will it know on which server to run?
[12:38:09] aharoni, you can test it from an-coord1001.eqiad.wmnet, by checking out your query repository there
[12:38:32] aharoni, you already have one right?
called limn-language-data or something like that?
[12:39:46] aharoni, when the job is tested, we Analytics will productionize it by adding a couple lines to puppet, that will tell reportupdater to run your report in an-coord1001
[12:49:50] aharoni, sorry! replace all references to an-coord1001 with stat1007.
[12:50:02] it's stat1007 the one who executes reportupdater for hive
[12:50:30] actually, it has already other jobs configured for the limn-language-data repo
[12:50:51] so you can check out limn-language-data in your home folder in stat1007
[12:51:01] and add the new query and config there
[12:51:38] you can run reportupdater there like: /srv/reportupdater/reportupdater/update_reports.py ...
[12:51:44] for testing
[13:06:11] (PS1) Mforns: Update changelog.md for v0.0.89 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/508819
[13:06:50] (CR) Mforns: [V: +2 C: +2] "Self-merging for deployment" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/508819 (owner: Mforns)
[13:09:17] !log started deployment train
[13:09:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:09:46] mforns_: there are some patches to review/merge before
[13:09:48] that I added
[13:09:59] elukey, in refinery source???
[13:10:09] nope in refinery
[13:10:17] oh, yes yes
[13:10:24] I saw that you logged the start deployment train so I thought to mention it sorry :)
[13:10:25] doin it step by step
[13:10:30] no problemo
[13:10:37] will look into refinery after source
[13:20:20] o/ elukey looking at your email now :)
[13:20:23] fdans: <3
[13:20:54] mforns_: I have a doctor appointment in ~20 mins, so I'll log off for ~1h. Let me know if you have any doubts etc.. before I leave
[13:21:16] elukey, no problemo
[13:23:01] ack!
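As background for the reportupdater/Dashiki discussion above: Dashiki can read reportupdater output because each report is a plain TSV file with a header row and one line per date. A hedged sketch of what such a report file looks like; the `translations` metric column is invented for illustration:

```python
import csv
import io

# Sketch of a reportupdater-style report file: tab-separated, header row,
# one line per date. The metric column name is hypothetical.

def render_report(rows):
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["date", "translations"])
    writer.writerows(rows)
    return buf.getvalue()

report = render_report([("2019-05-01", 120), ("2019-05-02", 134)])
```

This is why a separate Hive staging table is unnecessary for plotting: the report file itself is the persistent, append-only output that Dashiki reads.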
[13:23:11] * elukey errand for ~1h
[13:23:26] elukey: with joal's permission I will restart the cassandra bundle
[13:24:18] and kill those two coordinators under my username
[13:24:24] fdans: let's do it with my hdfs->analytics user move, ok?
[13:24:36] or better, coupled with the new username deployment
[13:24:39] so we'll do it once
[13:24:47] elukey: sounds good
[13:24:57] it needs some chown on hdfs between kill and restart
[13:25:19] got it
[13:30:24] mforns_: when you're around, see https://phabricator.wikimedia.org/T209868#5167077
[13:31:09] fdans: cassandra bundle needs to be restarted on 1st of the month
[13:31:22] fdans: set a reminder, we'll do it together at the offsite :)
[13:31:52] joal: setting reminder on gcal and iphone
[13:32:30] fdans: I'll set up one for me as well - normally it should be done :D
[14:05:07] !log deployed refinery-source up to ad74c41b05d5f838df6febb379e883855abb203d
[14:05:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:29:47] Analytics, User-Elukey: Restoring the daily traffic anomaly reports - https://phabricator.wikimedia.org/T215379 (Dzahn) After seeing a related ticket i would still highly recommend to puppetize this cron job. That will make sure next time the server is moved there will have to be no reminder emails, no c...
[14:51:45] (CR) Nuria: [C: +2] import-mediawiki-dumps: use Hdfs instead of hdfs [analytics/refinery] - https://gerrit.wikimedia.org/r/508756 (owner: Elukey)
[14:52:45] back!
[14:53:38] (PS2) Elukey: import-mediawiki-dumps: use Hdfs instead of hdfs [analytics/refinery] - https://gerrit.wikimedia.org/r/508756
[14:53:45] (CR) Elukey: [V: +2] import-mediawiki-dumps: use Hdfs instead of hdfs [analytics/refinery] - https://gerrit.wikimedia.org/r/508756 (owner: Elukey)
[14:55:09] !log kill last_access_uniques-daily-asiacell-coord from hue (coord not used anymore)
[14:55:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:01:25] ping ottomata
[15:01:48] ottomata: swift meeting?
[15:02:13] OH my
[15:02:13] coming
[15:09:05] (CR) Elukey: [V: +2 C: +2] "Self merge before deploy, trivial change. Coordinators already restarted with the new user." [analytics/refinery] - https://gerrit.wikimedia.org/r/508783 (https://phabricator.wikimedia.org/T220971) (owner: Elukey)
[15:09:11] (PS2) Elukey: oozie/virtualpageview: move coordinators to the analytics user [analytics/refinery] - https://gerrit.wikimedia.org/r/508791 (https://phabricator.wikimedia.org/T220971)
[15:09:21] (CR) Elukey: [V: +2 C: +2] "Self merge before deploy, trivial change. Coordinators already restarted with the new user." [analytics/refinery] - https://gerrit.wikimedia.org/r/508791 (https://phabricator.wikimedia.org/T220971) (owner: Elukey)
[15:09:32] (PS2) Elukey: oozie/projectview: move coordinators to the analytics user [analytics/refinery] - https://gerrit.wikimedia.org/r/508800 (https://phabricator.wikimedia.org/T220971)
[15:09:40] (CR) Elukey: [V: +2 C: +2] "Self merge before deploy, trivial change. Coordinators already restarted with the new user."
[analytics/refinery] - https://gerrit.wikimedia.org/r/508800 (https://phabricator.wikimedia.org/T220971) (owner: Elukey)
[15:14:36] elukey, thanks for merging, I reviewed them while deploying ref-source but didn't merge
[15:21:11] (CR) Mforns: [V: +2] Remove wikipedia-zero as program is over [analytics/refinery] - https://gerrit.wikimedia.org/r/507933 (https://phabricator.wikimedia.org/T213770) (owner: Joal)
[15:21:35] (CR) Milimetric: [C: -1] "Few more loose ends. Good luck with the tests." (7 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/499968 (https://phabricator.wikimedia.org/T219112) (owner: Fdans)
[15:22:51] mforns_: went ahead since they are super trivial and already running in the coords
[15:22:59] sure
[15:23:11] I'm listing now the jobs affected by refinery changes
[15:23:56] mine are already restarted so you can skip them
[15:24:00] (as FYI)
[15:28:05] Analytics, Discovery, Operations, Research: Make hadoop cluster able to push to swift - https://phabricator.wikimedia.org/T219544 (CDanis) Some quick notes from today's meeting: - elukey has a cloud-vps hadoop cluster for testing changes like this (although it is kind of flakey / needs poking/re...
[15:29:28] (CR) Milimetric: [C: -1] "UX review: it's pretty gosh darn tootin' close to perfect, Fran.
The start date on the selector is not included on the results, I feel li" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/499968 (https://phabricator.wikimedia.org/T219112) (owner: Fdans)
[15:31:34] (PS1) Mforns: Bump up jar versions after refinery-source deployment 0.0.89 [analytics/refinery] - https://gerrit.wikimedia.org/r/508848
[15:32:38] (CR) Mforns: [V: +2 C: +2] "Self-merging for deployment" [analytics/refinery] - https://gerrit.wikimedia.org/r/508848 (owner: Mforns)
[15:33:17] milimetric: my solution for small periods of time would be to be able to click the text in the handles and write your own date with autocomplete
[15:33:52] there's also not showing the whole span of the metric if you're selecting less than x months
[15:33:57] like a zoom
[15:34:23] (CR) Mforns: [V: +2 C: +2] "OK, merging this after changes and +1 from Otto, to include it in the deployment train." [analytics/refinery] - https://gerrit.wikimedia.org/r/484250 (https://phabricator.wikimedia.org/T212014) (owner: Mforns)
[15:36:24] (Abandoned) Mforns: Hide overlay when the cursor is no longer on top of graph [analytics/wikistats2] - https://gerrit.wikimedia.org/r/448852 (https://phabricator.wikimedia.org/T192416) (owner: Sahil505)
[15:39:18] !log deploying refinery up to 698f2137aa965b07548ae7565aafaa784628b13c and together with refinery-source 0.0.89
[15:39:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:54:43] I still have on my calendar retro for today, I guess we'll cancel ?
[16:00:34] ping joal
[16:27:28] going off a bit earlier, will check in ~1h if anything is needed!
[16:49:23] elukey: Yes, I suspended the oozie coord mobile_apps-uniques-by_country-monthly-coord for now, because I need to move it to later days in the month (as requested by joal), but I haven't had a chance to do that yet :)
[16:53:51] Analytics, EventBus, Services: Make EventBus extension support configurable per-event/stream EventServiceNmae - https://phabricator.wikimedia.org/T222822 (Ottomata)
[16:54:33] Analytics, EventBus, Services: Make EventBus extension support configurable per-event/stream EventServiceName - https://phabricator.wikimedia.org/T222822 (Ottomata)
[17:05:54] laters all!
[17:50:33] hi mforns_brb . About that thing we discussed earlier: I think that for now I can do it with mysql and later migrate everything to hive (if the analytics wants)
[17:51:46] analytics ^team^, I mean
[17:52:55] and I guess that to do it, I need to add the log database to https://github.com/wikimedia/analytics-limn-language-data/blob/master/published_cx2_translations/config.yaml
[17:54:01] milimetric usually recommends to reuse existing dashboards and configurations, and that's what the task wants in any case ( https://phabricator.wikimedia.org/T209868 )
[17:54:27] Now this file says:
[17:54:34] defaults:
[17:54:34] db: wikishared
[17:55:26] aharoni: I recommend working with chelsyx on this, what do you think? She's better versed in your data and we can support if there are issues with the platform
[17:55:45] And I guess that I need to remove the "defaults" section, add "db: wikishared" to the existing section, and add a "log" section
[17:56:07] oh, I didn't know... is it new? I was away for a few weeks for parental leave
[18:48:12] milimetric, will you be here for some more time?
[18:48:40] mforns definitely
[18:49:01] I was going to ask earlier, do you want to pair up on any deploys?
[18:49:10] milimetric, I made a list of the oozie jobs that need restart and the corresponding user, can you... yes :]
[18:49:32] milimetric, when is it good for you?
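A hedged sketch of the config.yaml change aharoni describes above: moving `db` out of `defaults` so each report section can name its own database. Only `wikishared`, `log`, `defaults:`, and the `published_cx2_translations` repo path come from the chat; every other key and section name here is an assumption, not verified against reportupdater's actual config schema:

```yaml
# Hypothetical sketch, NOT verified against reportupdater's config format.
defaults:
    # db: wikishared   <- removed from defaults, set per report instead
reports:
    published_cx2_translations:
        db: wikishared
    cx_abuse_filter:        # hypothetical new report reading EventLogging data
        db: log
```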
[18:49:50] give me a few minutes and cave?
[18:49:56] sure!
[18:58:50] mforns: omw cave
[19:11:39] Analytics, Growth-Team, Product-Analytics: Update ServerSideAccountCreation schema whitelist - https://phabricator.wikimedia.org/T222101 (JTannerWMF) Hey when are you planning to work on this @nettrom_WMF ?
[19:15:15] Analytics, Growth-Team, Product-Analytics: Update ServerSideAccountCreation schema whitelist - https://phabricator.wikimedia.org/T222101 (nettrom_WMF) [[ https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/508626/ | The patch ]] was submitted for review yesterday morning (Pacific time). I'll upda...
[19:15:36] Analytics, Product-Analytics, Growth-Team (Current Sprint): Update ServerSideAccountCreation schema whitelist - https://phabricator.wikimedia.org/T222101 (nettrom_WMF)
[20:19:15] (PS1) Milimetric: Improve restart examples after deploy this week [analytics/refinery] - https://gerrit.wikimedia.org/r/508927
[21:27:46] (PS4) MarcoAurelio: whitelist: add hi.wikisource [analytics/refinery] - https://gerrit.wikimedia.org/r/499462 (https://phabricator.wikimedia.org/T218155)
[21:40:48] (PS11) Fdans: Replace time range selector [analytics/wikistats2] - https://gerrit.wikimedia.org/r/499968 (https://phabricator.wikimedia.org/T219112)
[21:43:43] (CR) jerkins-bot: [V: -1] Replace time range selector [analytics/wikistats2] - https://gerrit.wikimedia.org/r/499968 (https://phabricator.wikimedia.org/T219112) (owner: Fdans)
[21:55:42] (CR) Nuria: Improve restart examples after deploy this week (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/508927 (owner: Milimetric)
[21:56:14] (CR) Fdans: Replace time range selector (7 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/499968 (https://phabricator.wikimedia.org/T219112) (owner: Fdans)
[23:23:55] PROBLEM - HDFS Namenode RPC 8020 call queue length on an-master1001 is CRITICAL: 262 ge 20
https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen
[23:26:36] RECOVERY - HDFS Namenode RPC 8020 call queue length on an-master1001 is OK: (C)20 ge (W)10 ge 0 https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen