[00:21:29] (03CR) 10Nuria: [C: 03+1] Update subnet lists for IpUtil (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/538607 (https://phabricator.wikimedia.org/T233504) (owner: 10Joal)
[00:22:58] (03CR) 10Nuria: "Looks good. Have we tried the job? If so +1" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538613 (https://phabricator.wikimedia.org/T233504) (owner: 10Joal)
[01:36:06] 10Analytics, 10Fundraising-Backlog, 10Operations, 10SRE-Access-Requests: Banner History and page view data access for fundraising analysts - Jerrie and Erin - https://phabricator.wikimedia.org/T233636 (10Nuria) @jrobell Are the two analysts full timer or contractors? If contractors they would need an NDA o...
[01:39:47] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Performance-Team: Drop Navigationtiming data entirely from mysql storage? - https://phabricator.wikimedia.org/T233891 (10Nuria)
[01:43:25] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Drop page - https://phabricator.wikimedia.org/T233892 (10Nuria)
[01:44:19] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Drop page create event data on mysql - https://phabricator.wikimedia.org/T233892 (10Nuria)
[01:44:28] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Drop page create event data on mysql - https://phabricator.wikimedia.org/T233892 (10Nuria)
[01:44:30] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Sunset MySQL data store for eventlogging - https://phabricator.wikimedia.org/T159170 (10Nuria)
[01:45:09] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (10Nuria)
[01:45:11] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Drop page create event data on mysql - https://phabricator.wikimedia.org/T233892 (10Nuria)
[01:48:59] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: drop CitatitionUsage data on mysql - https://phabricator.wikimedia.org/T233893 (10Nuria)
[01:51:16] (03CR) 10Nuria: Correct parameters in mediarequest cassandra jobs (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/537936 (owner: 10Fdans)
[02:20:29] (03CR) 10Nuria: "Can we add README to this dir on how do you submit this ingestion spec to druid?" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538916 (https://phabricator.wikimedia.org/T229682) (owner: 10Elukey)
[02:27:26] (03CR) 10Nuria: [V: 03+2 C: 03+2] Improve README [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/538722 (owner: 10Srishakatux)
[05:46:06] (03PS1) 10Elukey: Add nqo.wikipedia to the pageview whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/539225
[05:47:38] !log upload the new version of the pageview whitelist - https://gerrit.wikimedia.org/r/539225
[05:47:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:48:34] stat1005 officially back!
[07:48:37] \o/
[07:48:49] * elukey dances
[07:54:43] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Decouple analytics zookeeper cluster from kafka zookeeper cluster [2019-2020] - https://phabricator.wikimedia.org/T217057 (10elukey) I tried to deploy openjdk-8 on one node and ended up in an error similar to https://github.com/plasma-umass/doppio/issues/497 (l...
[08:11:12] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: drop CitatitionUsage data on mysql - https://phabricator.wikimedia.org/T233893 (10elukey) Space that can be potentially recovered: ` elukey@db1107:~$ du -hsc /srv/sqldata/_log_CitationUsage_* 32M /srv/sqldata/_log_CitationUsage_18051472_key_ix_Citat...
[08:12:15] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Drop page create event data on mysql - https://phabricator.wikimedia.org/T233892 (10elukey) Space that can be potentially recovered: ` elukey@db1107:~$ du -hsc /srv/sqldata/_log_PageCreation_7481635_* 40K /srv/sqldata/_log_PageCreation_7481635_key_i...
[08:13:41] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Performance-Team: Drop Navigationtiming data entirely from mysql storage? - https://phabricator.wikimedia.org/T233891 (10elukey) Space that can be potentially recovered: ` elukey@db1107:~$ du -hsc /srv/sqldata/_log_NavigationTiming_1* | grep tot...
[09:01:19] (03CR) 10Fdans: "@nuria your comments were in a previous patch set. Check out the latest one, all your comments have been applied." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/537936 (owner: 10Fdans)
[09:02:48] (03PS5) 10Fdans: Correct parameters in mediarequest cassandra jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/537936
[09:14:43] (03CR) 10MarcoAurelio: [V: 03+1 C: 03+1] Add nqo.wikipedia to the pageview whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/539225 (owner: 10Elukey)
[09:17:08] tjamls "_
[09:17:11] hahaha
[09:17:13] thanks :)
[09:55:12] (03CR) 10Mforns: [V: 03+2 C: 03+2] "LGTM! Will deploy today." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/539225 (owner: 10Elukey)
[09:58:39] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Decouple analytics zookeeper cluster from kafka zookeeper cluster [2019-2020] - https://phabricator.wikimedia.org/T217057 (10elukey) Next steps: - Fix with DCops the serial console issue - T227025 - Test if the Hadoop Test cluster is working well with ZK on Ja...
[10:07:15] mforns: o/ I am planning to go afk for a couple of hours in ~30 mins, is it a problem? (for the deployment I mean)
[10:09:35] no, no I think there is not a lot to deploy refinery/refinery-source-wise
[10:09:43] thanks elukey :]
[10:10:29] elukey, can you comment maybe in the drop-older-than patch? this way I can change it and maybe merge?
[10:10:39] https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/539146/2/bin/refinery-drop-older-than
[10:17:34] (03CR) 10Elukey: Improve refinery-drop-older-than after python3 migration (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/539146 (https://phabricator.wikimedia.org/T204735) (owner: 10Mforns)
[10:18:18] cool elukey, I'm not sure which paths are the most important ones though
[10:18:31] do you have an idea?
[10:18:39] I will add a comment
[10:18:42] backup for sure :)
[10:18:55] yea :]
[10:19:10] maybe let's add them all for the moment, with the comment. Then we can refine later on the list after checking with the team
[10:19:15] so this will not block you
[10:19:18] how does it sound?
[10:19:28] well, I can deploy refinery after standup
[10:19:37] sure, I can do that as well
[10:20:09] yes I mean blocking you today :)
[10:20:18] elukey, BTW, do you know if this change demands any oozie job restart? https://gerrit.wikimedia.org/r/#/c/analytics/refinery/source/+/538607/
[10:20:59] no idea :(
[10:21:57] elukey, and the pageview whitelist one? I guess not, but to make sure
[10:22:56] nope I think that one is good, it is only a check
[10:23:02] I already updated the list on hdfs
[10:23:25] oh ok
[10:24:34] elukey, how about this one? https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/538744/ Do you know if we already fix the mwh wikitext coord?
[10:24:41] *fixed
[10:26:16] will restart it anyway
[10:28:29] yes it needs a restart, but joseph already fixed manually the one for this month
[10:28:49] the spark job completed, so but the fix partitions in hive subworflow I think failed
[10:28:54] so he fixed it manually
[10:29:16] keep it in mind when restarting since we can avoid the huge spark job for this month
[10:29:24] gtg now, see you in ~2h :)
[10:42:59] (03PS3) 10Mforns: Improve refinery-drop-older-than after python3 migration [analytics/refinery] - 10https://gerrit.wikimedia.org/r/539146 (https://phabricator.wikimedia.org/T204735)
[10:54:10] (03PS1) 10Mforns: Update changelog for 0.0.101 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/539302
[10:54:52] (03CR) 10Mforns: [V: 03+2 C: 03+2] "Self-merging for deployment" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/539302 (owner: 10Mforns)
[11:13:09] !log deployed analytics-refinery-source v0.0.101 using Jenkins
[11:13:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:01:52] this is massive - https://pypi.org/project/apache-superset/
[13:02:04] 0.34 is on pypi, they restarted to publish apparently
[13:04:24] !log removing python2 packages from the analytics hosts (not from eventlog1002)
[13:04:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:25:41] not everything though, scap is still python2-only sigh
[13:34:44] mgerlach: o/
[13:34:56] apologies for the brief email response yesterday, did my answer make sense?
[13:35:16] ottomata: yes, that makes total sense. the pointer to hdfs clarified
[13:35:22] great!
[13:36:29] ottomata: still curious about that memory issue though -- I implemented all the suggestions you and joseph gave on the querying, io, etc; and still get the same error
[13:37:02] hmmm
[13:37:16] ottomata: o/
[13:37:17] hm joal is not around
[13:37:18] hello!
[13:37:30] qq - do you care about the snakebite package?
[13:37:32] mgerlach: let's ask joal when he is back around, i think he can help much quicker with that stuff than I can.
[13:37:35] elukey, nope!
[13:37:39] it is python2 only and I am about to remove it
[13:37:40] ack :)
[13:37:41] i wish we could use it or something like it
[13:37:42] but yes
[13:37:43] remove!
[13:37:48] i don't think we use it anywhere
[13:37:51] mgerlach: also btw
[13:37:53] http://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html#saving-to-persistent-tables
[13:37:55] I am cleaning up now, looks good so far
[13:38:08] you can (and probably should) go ahead and create a hive database with your username
[13:38:14] create database mgerlach;
[13:38:20] then you can save working tables there
[13:38:25] and can do it as parquet too
[13:38:35] it does the same thing that writing parquet files does
[13:38:39] except you also get a hive table on top
[13:38:49] and can see it in e.g. use mgerlach; show tables;
[13:38:58] ottomata: great, I will check it out
[13:38:59] http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#specifying-storage-format-for-hive-tables
[13:38:59] hellooo mforns you're deploying aqs today?
[13:39:19] I can do it if you have too much to do
[13:40:08] fdans, hey! yes, I planned to deploy AQS
[13:40:30] I'd like to do it, because I never did it before...
[13:40:44] but......... :D If you pair with me, I'd appreciate it
[13:41:55] I'll also help if needed
[13:43:30] mgerlach: also useful to understand:
[13:43:31] https://cwiki.apache.org/confluence/display/Hive/Managed+vs.+External+Tables
[13:43:53] by default hive create's managed tables, which is fine. it just means it will pick where to keep the actuaal data files that back the table.
[13:43:54] mforns: sure, we can do it together
[13:44:04] but you can make hive tables and also control where and how the files are stored
[13:44:21] for adhoc purposes, managed is probably fine and all you need.
[13:44:34] but sometimes you want hive to know about other data, and you can!
[13:44:36] just FYI! :)
[13:45:21] fdans, elukey, thanks :]
[13:45:58] elukey, do I have your +1 for this? https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/539146/
[13:51:44] (03CR) 10Elukey: [C: 03+1] Improve refinery-drop-older-than after python3 migration [analytics/refinery] - 10https://gerrit.wikimedia.org/r/539146 (https://phabricator.wikimedia.org/T204735) (owner: 10Mforns)
[13:54:51] :D
[13:55:58] on the notebook1003/4, after a first cleanup, there are a ton of python2 packages
[13:56:07] not sure where all those comes from
[13:56:35] like python-napalm-base
[13:56:38] it is not in puppet
[13:56:51] and apt-cache rdepends doesn't show anything useful
[13:57:02] maybe we have a lot of hand installed pkgs?
[13:57:51] maybe
[13:59:18] anyway, good progress for the moment
[13:59:29] eventlogging will be a tough one I think to port
[13:59:31] and test
[14:00:21] ooof
[14:00:23] yea
[14:04:14] those are probably indirect dependencies of removed packages, you can run "apt-get autoremove" to doublecheck
[14:07:53] moritzm: good point
[14:07:55] will try
[14:11:55] fdans, when is it a good time for you to help me with aqs?
[14:12:30] mforns: now is good if you want. I got the convo with lex in 45 min
[14:12:47] fdans, ok, but i need to install docker first
[14:14:29] coooool
[14:28:07] ok fdans I'm good with docker, I'm cloning the deploy repo, but if you're ok, we can pair
[14:28:20] mforns: omw
[14:28:28] k
[14:42:18] only 237 more phab notifications to go, should be operational soon!
[14:42:42] milimetric: o/ o/ o/ o/
[14:42:51] * milimetric hugs luca
[14:57:38] (03PS1) 10Mforns: Update aqs to 592ff5f [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/539344
[14:58:38] ottomata: lets doooo this
[14:58:45] (03CR) 10Mforns: [V: 03+2 C: 03+2] "Merging for deployment" [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/539344 (owner: 10Mforns)
[14:58:50] :)
[15:01:39] !log deploying analytics/aqs using scap
[15:01:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:29:18] all good with aqs?
[15:35:30] 10Analytics, 10Research: Parse wikidumps and extract redirect information for 1 small wiki, romanian - https://phabricator.wikimedia.org/T232123 (10leila) @JAllemandou Martin and I had a chat now about this task. Can you give an update on what is left for Martin to do on this task? (I'm aware of the memory iss...
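To make the Spark/Hive advice in the 13:37–13:44 exchange above concrete, here is a minimal PySpark sketch of creating a per-user Hive database and saving a working table as Parquet so it is also queryable from Hive. The database name, table name, and example DataFrame are illustrative placeholders; only the general pattern (personal database + managed Parquet-backed table via saveAsTable) comes from the discussion:

    from pyspark.sql import SparkSession

    # enableHiveSupport() is needed so saveAsTable registers the table in the Hive metastore.
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # One-time setup: a personal database named after your shell user (placeholder name here).
    spark.sql("CREATE DATABASE IF NOT EXISTS mgerlach")

    # Stand-in DataFrame for real intermediate results.
    df = spark.range(10).withColumnRenamed("id", "example_id")

    # Save as a managed, Parquet-backed Hive table; Hive decides where the files live,
    # which is usually fine for ad-hoc work (see the managed vs. external tables link above).
    df.write.format("parquet").mode("overwrite").saveAsTable("mgerlach.example_table")

    # The table now shows up in Hive as well, e.g. `use mgerlach; show tables;`
    spark.sql("SHOW TABLES IN mgerlach").show()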
[15:41:28] (03Abandoned) 10Milimetric: [HOTFIX] [do not merge] add logging_with_comment [analytics/refinery] - 10https://gerrit.wikimedia.org/r/472508 (owner: 10Milimetric)
[15:42:21] (03Abandoned) 10Milimetric: [WIP] Don't merge this [analytics/refinery] - 10https://gerrit.wikimedia.org/r/370322 (owner: 10Milimetric)
[15:45:12] 10Analytics, 10Operations, 10User-Elukey: setup/install codfw kerbos node WMF6577 - https://phabricator.wikimedia.org/T233142 (10elukey) @RobH any chance to get this done by today/tomorrow? Really sorry to press you but it would help a lot in trying to make a quarterly goal.. If you are busy no problem!
[15:46:23] 10Analytics, 10Operations, 10User-Elukey: setup/install krb1001/WMF5173 - https://phabricator.wikimedia.org/T233141 (10elukey) 05Open→03Resolved
[15:46:26] 10Analytics, 10Operations, 10hardware-requests, 10User-Elukey: eqiad: 1 misc node for the Kerberos KDC service - https://phabricator.wikimedia.org/T227288 (10elukey)
[15:49:34] (03CR) 10Milimetric: "looks good to me, just found a comment nit" (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538880 (https://phabricator.wikimedia.org/T233717) (owner: 10Fdans)
[15:49:45] (03CR) 10Milimetric: [C: 03+1] Add oozie job to load top mediarequests data [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538880 (https://phabricator.wikimedia.org/T233717) (owner: 10Fdans)
[15:53:14] perhaps silly question...can i log to logstash from spark driver or executors? I know the main bulk of logs goes to `yarn logs`, but there are more targeted (and significantly less verbose) things i'd like to report while jobs are running, particularly things like hyperparameter exploration progress (which runs for like half a day..)
[15:53:32] 10Analytics, 10Operations, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) p:05Triage→03High
[15:54:11] ebernhardson: probably manually sure, i don't know how logstash accepts stuff...
[15:54:18] would probably have to configure that side of tthigns
[15:54:24] ottomata: there is a kafka cluster for logstash afaik?
[15:54:27] i think they prefer using kafka loggers these days
[15:54:29] hm
[15:54:34] yes
[15:54:35] there is
[15:54:53] ok, so nothing prexisting to look at, but probably plausible
[15:55:00] a-team: need to stay with my home with my son today, see you all tomorrow
[15:55:04] ok nuria !
[15:56:56] ottomata: i suppose as long as i'm bothering you, thoughts on an oozie bundle with 20 per-wiki coordinators? The problem is doing all 20 together leads to individual jobs like this hyperparameter exploration taking 12+ hours. Then error handling has to be all figured out. As independent jobs they just rerun
[15:57:13] ottomata: i can write it easily enough, but no clue if that is a "bad idea" in oozie terms or some such
[15:57:16] 10Analytics, 10Operations, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) a:05RobH→03Papaul Unfortunately, it appears the switch port for this system is not labeled on asw-d8-codfw, so we'll need @papaul to trace it out and update this task with the port...
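One way the "kafka loggers" suggestion at 15:54 could be acted on, sketched purely as an illustration: produce small structured progress events to a Kafka topic that Logstash consumes. This assumes such a Logstash-facing topic exists and that the kafka-python library is available to the driver/executors; the broker address, topic name, and helper function below are made up for the example, not anything that exists in the log:

    import json
    import socket
    from kafka import KafkaProducer  # kafka-python; assumed available in the job's environment

    # Placeholder broker and topic; a real setup would target whatever topic Logstash reads.
    producer = KafkaProducer(
        bootstrap_servers=["kafka-broker:9092"],
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def report_progress(message, **fields):
        """Send one structured log event, e.g. hyperparameter-search progress."""
        event = {"host": socket.gethostname(), "message": message}
        event.update(fields)
        producer.send("logstash-input-topic", event)
        producer.flush()

    # Example use inside a long-running driver loop:
    report_progress("hyperparameter trial finished", trial=17, auc=0.83)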
[15:58:19] ebernhardson: sounds fine i think
[15:58:23] i can't see why it would be a problem
[15:58:30] ottomata: ok, sounds good
[15:58:43] we've never done one with that many, but 20 doesn't sound like that many
[15:58:52] and it doesn't really result in any more jobs than it would if you did it without ab undel
[15:59:05] well, before i would do one spark job that does 20 wikis and runs for 12+ hours
[15:59:10] (and reruns the whole thing if it fails
[15:59:18] this will run the script 20 times with a seprate argument
[15:59:19] ah yeah, i see, yeah bundle + coordinators make sense
[15:59:20] aye
[16:00:16] a-team, need a couple minutes before joining standup, sorry :/
[16:00:41] 10Analytics, 10EventBus, 10Product-Analytics: Review draft Modern Event Platform schema guidelines - https://phabricator.wikimedia.org/T233329 (10Neil_P._Quinn_WMF) 05Resolved→03Open This is clearly still open—maybe I shouldn't assume that my recommendations are immaculate and will be implemented without...
[16:00:44] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Better Use Of Data, and 6 others: Modern Event Platform: Schema Guidelines and Conventions - https://phabricator.wikimedia.org/T214093 (10Neil_P._Quinn_WMF)
[16:01:01] ebernhardson: q for you!
[16:01:19] do you know how to enable log4j debug logging for yarn container jobs?
[16:01:26] (this is for the camus problem)
[16:01:49] i can get debug logs enabled in local output, but not in the worker processes
[16:01:51] ottomata: sadly no :( elastic has magic log4j integration where i set log levels through cluster apis
[16:02:27] i am very angry at camus righ tnow
[16:02:31] i have no idea what is going on.
[16:02:35] ottomata: for the worker my best guess would be some muckery with settings the cli args for the jvm, maybe env variables. hmm.
[16:02:38] and finding out what is going on is not easy!
[16:02:45] yeah....
[16:02:48] trying stuff like that
[16:02:49] no luck yet
[16:02:50] will keep trying
[16:03:23] heh, i might just recompile it with log.info statements...
[16:03:59] good luck! Doesn't sound fun...i could still do the thing to copy all of partition 0 into the other partitions for processing. Not sure if you could just set the camus position forward and hope it works?
[16:04:37] might be for naught if the problem is unrelated though
[16:04:48] yeah if i caan figure out how to do that i might
[16:05:20] a-team standup!
[16:05:25] fdans: mforns
[16:19:07] 10Analytics, 10Operations, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH)
[16:19:13] 10Analytics, 10Operations, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) a:05Papaul→03RobH
[16:30:38] ottomata, are we doing grooming?
[16:33:01] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: High volume mediawiki analytics events camus import is lagging - https://phabricator.wikimedia.org/T233718 (10Ottomata) p:05High→03Unbreak!
[16:35:12] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 2 others: Figure out how to $ref common schema across schema repositories - https://phabricator.wikimedia.org/T233432 (10Milimetric) I agree the npm and submodule ideas are the best two. I prefer the submodule idea, after working t...
[16:36:03] gonna run home, will take me an hour or so, be back online then
[16:36:15] hi milimetric! :]
[16:36:21] see you then
[16:59:48] 10Analytics, 10Operations, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH)
[17:00:56] 10Analytics, 10Operations, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) a:05RobH→03Papaul ` ge-8/0/3 down down krb2001 ` Everything is ready for this to install, but it doesn't see any network attachment on its primary interface when trying to...
[17:08:08] 10Analytics, 10Operations, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) a:05Papaul→03RobH it was in disabled and i missed it, papaul pointed it out, fixed.
[17:10:24] ottomata, I'm here now if you wanna merge the thaing
[17:10:43] k
[17:10:46] do you need me, so that you can blame me if everything breaks? :]
[17:11:05] ha yes
[17:11:36] heh
[17:13:47] ottomata, do you know how long it will take to rsync the files?
[17:15:51] mforns: https://dumps.wikimedia.org/other/mediawiki_history/readme.html
[17:16:02] hm
[17:16:20] yay
[17:16:27] it looks like it is configured to run once a day at 5am
[17:17:07] mforns: i can run a manual rsync now
[17:17:41] ottomata, don't worry
[17:17:50] it's not yet announced anywhere
[17:18:08] its ok it'll be good to see
[17:18:13] ottomata, unless you wanna be there when it happens
[17:18:16] make sure it works
[17:18:16] ok!
[17:18:41] hmmmm
[17:18:49] mforns is your source correct?
[17:18:57] /mediawiki/history/dumps/
[17:18:58] i see
[17:19:05] oooh...
[17:19:13] /wmf/data/archive/mediawiki/history/2019-08
[17:19:20] no it's not.. :[
[17:19:26] will change now
[17:20:53] 10Analytics, 10Operations, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH)
[17:23:22] ottomata, https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/539374/
[17:23:56] 10Analytics, 10Operations, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10Papaul)
[17:28:33] also ottomata if you don't mind, can you have a look at https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/539146/
[17:28:46] maybe I can deploy it in today's train
[17:29:31] I'd like to have your input specially on whether the list of undeletable paths is accurate
[17:30:14] mforns: am super duper busy, but if elukey +1ed ithink we can merge
[17:30:29] ok, will self-merge
[17:30:40] no worries
[17:31:01] (03CR) 10Mforns: [V: 03+2 C: 03+2] "Self merging for deployment" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/539146 (https://phabricator.wikimedia.org/T204735) (owner: 10Mforns)
[17:33:50] !log run apt-get autoremove on stat* and notebook* to clean up old python2 deps
[17:33:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:33:53] ottomata: --^
[17:36:29] nice
[17:49:57] * elukey ofF!
[18:14:01] (03PS1) 10Mforns: Bump up refinery_jar_version for v0.0.101 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/539384
[18:14:50] (03CR) 10Mforns: [V: 03+2 C: 03+2] "Self-merging for deployment" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/539384 (owner: 10Mforns)
[18:24:31] a-team FYI we deployed the revision-score schema change today, so I expecct to see a refine failure for it soon
[18:24:40] k
[18:24:42] once I do I will remove old table and data
[18:25:26] there is one for mediawiki_job_MessageGroupStatsRebuildJob
[18:25:35] ya saw that
[18:25:44] no time to check now, but also probably is not important
[18:25:46] will look into that when I finish deploy
[18:25:49] oh ok
[18:25:58] if you have time, but no worries, job tables are best effort :)
[18:26:02] ok
[18:27:45] !log deploying refinery using scap (together with refinery-source 0.0.101)
[18:27:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:42:08] !log finished deploying refinery using scap (together with refinery-source 0.0.101)
[18:42:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:49:07] 10Analytics, 10Operations, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) grub is failing to install on sda, regardless of distro. I think this may be a hardware failure, perhaps we shoudl swap them around and see if the error follows the disk.
[18:54:06] 10Analytics, 10Operations, 10User-Elukey: setup/install krb2001/WMF6577 - https://phabricator.wikimedia.org/T233142 (10RobH) a:05RobH→03Papaul @papaul, Since this is failing grub on sda, I'd like to see if it is a disk issue (most likely), bay issue (moderately likely if the backplane is bad), or softwa...
[20:06:17] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: High volume mediawiki analytics events camus import is lagging - https://phabricator.wikimedia.org/T233718 (10Ottomata) I've spent all day on this so far and am not much closer to understanding what is happening. As far as I can tell, the iterator returne...
[20:15:43] ebernhardson: i might try and do the replay thing you mentioned.
[20:15:51] replay all data from theh offending offset to now into the htopic
[20:15:54] doesn't matter what position it is in
[20:16:10] camus will consume them as new messagess and bucket them appropriately
[20:17:56] objections to trying that?
[20:17:57] ottomata: should be easy enough with a quick script if you have the two offsets
[20:18:00] ya
[20:18:03] ottomata: go for it
[20:18:06] exactly, i'm just going to kafkacat it in
[20:18:36] think camus will still put them in the right days? Ok if it doesn't, but curious
[20:18:47] i think the events all have a dt, so should iirc
[20:19:01] it should yess, it should examine the camus.message.timestamp.field=meta.dt
[20:26:03] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: High volume mediawiki analytics events camus import is lagging - https://phabricator.wikimedia.org/T233718 (10Ottomata) I'm replaying the contents of partition 0 starting at offset 23165504624 until 23858962742 (an offset stored for a recent successful Cam...
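The replay described above was done with kafkacat; for reference, here is an equivalent sketch in Python using kafka-python, consuming partition 0 between the two offsets mentioned in the task comment and re-producing the messages so Camus picks them up as new data and re-buckets them by meta.dt. The broker address is a placeholder and the topic name is an assumption (taken from the Grafana link further down), so treat this as illustrative rather than the exact command that was run:

    from kafka import KafkaConsumer, KafkaProducer, TopicPartition

    TOPIC = "eqiad.mediawiki.cirrussearch-request"  # assumed topic, not stated explicitly in the log
    START, END = 23165504624, 23858962742  # offsets from the task comment above

    consumer = KafkaConsumer(bootstrap_servers=["kafka-broker:9092"], enable_auto_commit=False)
    producer = KafkaProducer(bootstrap_servers=["kafka-broker:9092"])

    # Read only partition 0, starting from the offset where the Camus import got stuck.
    tp = TopicPartition(TOPIC, 0)
    consumer.assign([tp])
    consumer.seek(tp, START)

    for msg in consumer:
        if msg.offset >= END:
            break
        # Re-produce the raw bytes; the destination partition does not matter because
        # Camus buckets by the event's meta.dt (camus.message.timestamp.field=meta.dt).
        producer.send(TOPIC, msg.value)

    producer.flush()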
[20:34:12] heh https://grafana.wikimedia.org/d/000000234/kafka-by-topic?refresh=5m&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=All&var-topic=eqiad.mediawiki.cirrussearch-request&from=1569526447898&to=1569530047898
[20:34:36] ottomata: i could do that, but people asked me to use rate lmiting :P
[21:23:14] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: High volume mediawiki analytics events camus import is lagging - https://phabricator.wikimedia.org/T233718 (10Ottomata) @elukey FYI, I'm running this kafkacat process on stat1007. I'm seeing [[ https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&fro...
[22:34:21] (03CR) 10Nuria: "One question and if @joal could take a look just to see if we are missing something it will be great" (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538880 (https://phabricator.wikimedia.org/T233717) (owner: 10Fdans)
[22:48:23] (03CR) 10Nuria: Improve refinery-drop-older-than after python3 migration (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/539146 (https://phabricator.wikimedia.org/T204735) (owner: 10Mforns)
[22:52:05] (03CR) 10Nuria: [C: 03+2] "Looks good." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/537936 (owner: 10Fdans)