[00:43:24] (03PS1) 10Milimetric: Add api.wikimedia to the pageview whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/624371 [00:43:41] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Add api.wikimedia to the pageview whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/624371 (owner: 10Milimetric) [01:52:21] !log aborted aqs deploy due to cassandra error [01:52:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [01:53:49] I looked through the logs and I couldn't find anything, but when I query the new endpoint on aqs1004 I get a generic error from hyperswitch, confirmed on other hosts that the new endpoint is just a 404 there, so something's wrong with either the data or aqs code. Querying from cqlsh shows the schema and data look fine. [03:03:49] (03CR) 10Ladsgroup: "> Patch Set 2: Code-Review+1" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/623141 (https://phabricator.wikimedia.org/T253439) (owner: 10Gerrit maintenance bot) [06:00:45] isaacj: exactly that is the file! lemme know if there are issues etc.. [06:01:07] basically if 90% of the ram on the host gets used (leaving the 10% for the OS) the OOM killer intervenes [06:01:54] very soon stat1005 and stat1008 will reach the whopping 1.5TB of RAM (!) so I'll have to make that limit a little different [06:02:17] (if needed, not sure, need to think about it:) [06:07:18] joal: bonjour! The update to the mediawiki-history-drop-snapshot triggered a refresh of the service unit, that ran..
and failed, you can check on an-launcher1002 with "sudo journalctl -u mediawiki-history-drop-snapshot | grep ERROR" [06:08:03] !log reset-failed mediawiki-history-drop-snapshot on an-launcher1002 to clear icinga errors [06:08:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [06:15:13] RECOVERY - Check the last execution of mediawiki-history-drop-snapshot on an-launcher1002 is OK: OK: Status of the systemd unit mediawiki-history-drop-snapshot https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [06:19:27] 10Analytics-Clusters, 10Discovery, 10Discovery-Search (Current work), 10Patch-For-Review: Move mjolnir kafka daemon from ES to search-loader VMs - https://phabricator.wikimedia.org/T258245 (10elukey) 05Open→03Stalled We are currently blocked on T260305, next week we should be able to deploy a new puppe... [06:27:18] good morning [06:27:44] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Create new mailing list for analytics systems users - https://phabricator.wikimedia.org/T260849 (10elukey) >>! In T260849#6433891, @Nuria wrote: > We need to update the protocol for data access so people (or SRE?) subscribe users with analytics-private da... [06:28:53] elukey: checking error now [06:33:24] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Create new mailing list for analytics systems users - https://phabricator.wikimedia.org/T260849 (10elukey) Just sent an email to the mailing lists that I usually use for maintenance announcements, I asked to subscribe to the new mailing list and spread th... [06:33:38] elukey: I know what the problem is - will fix that and rerun manually to be sure [06:35:48] ok! [06:46:13] elukey: manual check run in dry-run mode, may I restart a manual launch of the timer? [06:46:46] sure!
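The 90%-of-RAM OOM rule mentioned earlier has a simple consequence worth spelling out: with a percentage-based limit, the absolute amount reserved for the OS grows with total RAM, which is why the limit may need adjusting on the ~1.5TB hosts. A minimal sketch (not the actual puppet code; host sizes are illustrative):

```python
# With a fractional limit, the OS reserve scales with total RAM.
GIB = 1024 ** 3

def user_memory_limit(total_bytes, user_fraction=0.9):
    """Bytes user processes may use before the OOM killer steps in."""
    return int(total_bytes * user_fraction)

for total_gib in (512, 1536):  # a smaller stat host vs the upcoming ~1.5TB ones
    total = total_gib * GIB
    reserved = total - user_memory_limit(total)
    print(f"{total_gib} GiB host: OS keeps about {reserved / GIB:.0f} GiB")
```

On the bigger hosts the same 10% rule reserves roughly 150 GiB for the OS, far more than needed, hence the idea of "making that limit a little different".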
[06:48:19] elukey: just to be sure (I'm always in doubt with timers/services): sudo systemctl start mediawiki-history-drop-snapshot [06:52:56] yes or "restart" [06:53:06] Ack - using restart [06:54:41] !log Manually restart mediawiki-history-drop-snapshot after hive-partitions/hdfs-folders mismatch fix [06:54:43] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:01:38] Sep 04 06:59:31 an-launcher1002 systemd[1]: mediawiki-history-drop-snapshot.service: Succeeded. [07:01:41] \o/ [07:11:22] good! [07:13:45] elukey: so that you understand what happened - the script deletes folders and hive partitions - In default mode (strict) it fails if there is a mismatch between the partitions to delete and the folders to delete [07:14:08] elukey: the script has a non-strict mode allowing to clean-up those differences and delete even in case of differences [07:19:07] joal: did you run the script in non-strict mode manually? [07:20:15] elukey: I triple-checked in logs that the error was of that kind (it was) - then manually checked the difference in partitions/folders for the failing table, deleted the partitions with no counter-part folders manually and reran the script in strict and dry-run mode [07:20:30] okok [07:20:40] elukey: When I tested before merge I think I have forgotten to remove the non-strict option (my bad) [07:20:58] elukey: manually running the script in non-strict mode would have given the same result [07:21:17] joal: okok, does puppet need to be updated to run in non-strict mode? [07:21:21] (if worth it) [07:21:52] I think it's better to keep it strict - it means we need to check if there is a discrepancy [07:22:33] elukey: completely different topic - Is there a way for me to look back in AQS logs to try to see what happened to Dan yesterday at deploy? [07:23:37] joal: suuuure!
[07:23:52] joal: mmmm no idea [07:23:55] * joal has forgotten how to look at AQS logs :( [07:23:56] (brB) [07:40:49] joal: here I am sorry [07:40:56] did you find a way? Maybe logstash? [07:40:57] np elukey [07:41:14] elukey: I have not looked further, got derailed into something else [07:41:19] Will try logstash [07:41:31] ah ok the error came from hyperswitch [07:41:45] elukey: how did you find that? [07:41:49] elukey: logstash? [07:42:01] nono Dan mentioned it earlier on IRC [07:42:33] yes I think everything is on logstash [07:42:46] will check [07:57:55] elukey: I can't look at syslog on aqs1004 :( [07:58:13] joal: which ones? [07:58:24] the ones under /srv/log/aqs don't contain much [07:58:27] elukey: /var/log/syslog.1 [07:58:40] yes but I don't think anything is there [07:58:43] I can check [07:58:53] the application logs are shipped to logstash [07:59:04] elukey: /etc/aqs/config.yaml tells me logs are sent to syslog local [07:59:10] see logging section [07:59:46] or at least that's what I understand [08:00:15] nono that is rsyslog [08:00:22] see "port" [08:00:39] so logs from AQS are sent to rsyslog [08:01:21] IIRC that config was indicating logstash [08:02:36] yep in puppet logstash_syslog_port: 10514 [08:02:38] elukey: I think we stopped using logstash at some time, possibly cause we couldn't use it [08:02:47] ohhhhhh [08:03:04] So actually we log syslog format to a syslog-listener being logstash [08:03:17] I think so yes [08:03:27] Makes sense [08:04:13] Yay elukey, I finally found some in logstash!!! [08:04:23] sorry for the noise :( [08:06:12] what filter did you use? [08:06:18] we should probably create a dashboard [08:06:31] elukey: I used host:aqs* in query-field [08:07:13] yep just found it as well, gooood [08:07:17] Annnnnnd - we still have hash mismatch :) [08:07:34] that task we closed the other day is still not-fixed [08:11:15] did you filter for type:node or similar?
[08:12:11] I hate lucene syntax [08:12:12] nope [08:12:35] Just with host:aqs* I manage to see what happened yesterday [08:13:05] can you give me the link of one log in logstash if you have it handy? [08:13:14] related to aqs node I mean [08:13:25] so deploy went fine, the restart and repool showed no error (warning with schema hash-issue but no error) [08:13:28] sure 1 min [08:14:01] elukey: https://gist.github.com/jobar/73f1ebf16ddd4b08fcab95aa0da9cafb [08:14:55] elukey: I looked for rows using the query-bar, and added the interesting columns for us [08:15:43] At time 01:53:17.882 we can see the service restarted on aqs1004 (deploy I assume) [08:15:47] ahh type is "aqs" [08:16:06] Then a bunch of hash-warning [08:17:20] Then scap says it's happy (Port 7232 up) [08:18:09] But, just after, scap rolls back [08:18:18] Now, what happened in between ??? [08:19:05] Ohhhh - something else - the order is most-recent first!!!! [08:19:10] Meh [08:19:27] * joal needs to spend more time using kibana - not yet used to it [08:27:28] 10Analytics: Create a kibana dashboard for AQS hyperswitch's logs - https://phabricator.wikimedia.org/T262012 (10elukey) [08:27:48] ok created --^, life is too short to battle with Kibana on a froday [08:27:51] *friday [08:29:09] joal: ok if I start the roll restart of the hadoop workers? [08:29:18] yessir [08:29:51] elukey: I finally nailed it :) [08:29:54] elukey: AQS sorry [08:30:22] elukey: I updated the gist above with the new link [08:31:59] Undefined name activity-level in selection clause [08:32:00] ahhh [08:32:12] joal: can you add this insight to the above task so we have a reference?
(later on when you have time) [08:32:36] elukey: the cassandra table has activity_level field [08:32:39] :( [08:33:37] I am more sad that it took 3 people to figure out that a typo ruined a deploy, ideally Dan should have had a good dashboard to check right away [08:34:23] since aqs "works" we don't dedicate much time in improving dev experience on it [08:34:35] no bueno [08:34:36] correct elukey [08:34:49] elukey: actually AQS `usually` works [08:35:25] joal: I recall when it didn't and there was a daily mixture of french and italian swear words :D [08:35:35] :) [08:35:51] * elukey brace yourself, cassandra 3 is coming [08:38:32] 10Analytics: Create a kibana dashboard for AQS hyperswitch's logs - https://phabricator.wikimedia.org/T262012 (10JAllemandou) My 2 cents from having found my way through Kibana to debug an issue: - We should filter for `host:aqs*` but not for `type` as more type than `AQS` have proven useful (for instance `scap... [09:11:21] started the roll restart of the prod hadoop workers [09:11:26] the test cluster went fine [09:13:44] ack elukey [09:14:28] elukey: data in cassandra for editors-bycountry is very small - I suggest dropping-recreating the table with the correct field-name and update/relaunch indexation [09:14:37] +1 [09:15:22] elukey: the other possible solution is to use a different field-name in hyperswitch (external with a -, internal with a _) - Not a big change, but very error-prone IMO [09:16:04] ok, will create a task describing the issue, suggest solutions and we'll discuss that as a team on monday [09:17:26] nono recreating seems good [09:20:01] 10Analytics, 10Analytics-Kanban: Fix cassandra/hyperswitch geoeditors field miscmatch - https://phabricator.wikimedia.org/T262017 (10JAllemandou) [09:20:04] elukey: --^ [09:31:16] super [09:31:43] 10Analytics, 10Analytics-Kanban: Fix cassandra/hyperswitch geoeditors field miscmatch - https://phabricator.wikimedia.org/T262017 (10elukey) +1 on option 1.
ssh an-tool1009.eqiad.wmnet -L 8080:an-tool1009.eqiad.wmnet:80 [09:35:58] hue seems running (with live patches) [10:02:54] but CI tests are still not working grrr [10:34:54] roll restart of the hadoop workers completed! [10:35:34] all metrics look good :) [10:35:42] * elukey lunch! bb in ~2h [11:33:51] (03PS1) 10Joal: Add BetweenTagsInputFormat to refinery-spark [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 [11:35:27] (03CR) 10jerkins-bot: [V: 04-1] Add BetweenTagsInputFormat to refinery-spark [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 (owner: 10Joal) [11:43:27] (03PS2) 10Joal: Add BetweenTagsInputFormat to refinery-spark [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 [12:16:33] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Review current usage of HDFS and establish what/if data can be dropped periodically - https://phabricator.wikimedia.org/T261283 (10JAllemandou) [12:20:29] elukey: looks like archiva is having weird issues again (might be maven - building has a step waiting for quite some time) [12:22:37] elukey: another thing - when using a new spark-shell in scala from stat1008, my first execution to read data always fails with java.util.ServiceConfigurationError: org.apache.hadoop.security.token.TokenIdentifier: Provider org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier not found [12:22:44] And then it works [12:22:49] for the second try [12:38:11] joal: ok let's start with archiva, but I need more info [12:38:40] is it from a stat100x or from your laptop? What step takes time?
[12:40:36] elukey: I'm from stat1008 [12:41:28] elukey: and, mvn wants to download a jar and wait until timeout (or at least it was a few minutes ago) [12:41:33] elukey: checking now [12:41:43] elukey: [12:41:44] Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/commons-codec/commons-codec/maven-metadata.xml [12:41:47] Downloading from apache.snapshots: https://repository.apache.org/snapshots/commons-codec/commons-codec/maven-metadata.xml [12:42:21] elukey: it feels like the same issue we were having before going for explicit repos [12:43:43] but it doesn't make sense, it worked up to now [12:43:49] does it happen the same on other stat boxes? [12:43:55] I have not checked [12:44:26] joal: can you make it hang again? I'd like to check netstat in the meantime [12:44:28] also elukey I have declined the meeting this afternoon, need to care for the kids as Melissa is in meetings [12:44:51] joal: np, I added you because you asked, all things that you know a lot more than me :) [12:45:01] building elukey, not yet hanging [12:46:23] hanging [12:46:38] one thing that I don't get is why it tries to pull from oss.sonatype [12:47:51] it timed out (I guess), then proceeds [12:50:03] joal: the main issue is the fact that we don't download from archiva, but from other places [12:50:28] so without a proper proxy, it hangs for sure [12:50:30] indeed elukey [12:50:50] have you set the https proxy settings in your m2 dir? [12:50:58] nope [12:51:24] didn't we do it last time? I am confused now [12:51:30] Should I have?
[12:51:38] I'm confused as well :S [12:51:45] Can't recall [12:53:02] joal: if we don't pull artifacts from archiva (that is whitelisted in the analytics vlan firewall) then there is no chance to pull from other domains without the https proxy [12:53:16] and I recall me and you checking some settings related to it the last time [12:53:22] but in theory this shouldn't be needed [12:53:25] I know elukey - and by default we don't want to pull from other places IIRC [12:53:29] yeah [12:54:41] * joal cries in a corner [12:59:39] joal: let's open a task, this is probably some dependency issue to fix [13:00:58] the other error on stat1008 is very weird [13:01:01] org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier -> hbase?? [13:01:17] yup :( [13:02:06] so the spark client does try to fetch tokens at the beginning, but only if it finds traces of config indicating that it is needed.. [13:02:15] does it happen on stat1004 for example? [13:02:22] (sorry asking to test multiple times) [13:04:02] (brb( [13:08:28] what time does otto usually log on? [13:08:39] just now AFAICS :) [13:08:54] hello! [13:09:03] cdanis: i saw your patch, want to PTAL :p [13:09:21] ottomata: btw, the numeric bounds commits to jsonschema-tools broke a bunch of the tests in event-primary (because the currently-materialized versions don't have the bounds) [13:09:33] OH INTERESTING! [13:09:52] we think we want to re-materialize them with bounds, but aren't sure if we want to make such a change all at once [13:09:59] FYI fdans ^ (is fdans on vaca?) 
for now I'm continuing work with a local version that's pre-bounds w/ a cherrypick of my fix [13:10:30] cdanis you can also temporarily disable the numeric bounds checking [13:10:38] by setting that option to null i guess [13:10:46] or false maybe better [13:10:59] we should probably do that until we are ready to re-materialize the schemas with those bounds [13:11:03] yeah [13:11:12] elukey: no error on stat1004 [13:13:31] joal: what about on 1005? (trying to see if it is buster related) [13:13:45] trying from 1008 again, without hudi jar [13:14:00] seems hudi-jar related [13:14:03] will confirm [13:14:13] cdanis: i'm going to push a patch to disable the bounds check in both schema repos [13:14:26] we should have thought of that before we merged that patch (and I made schema repos install @latest :p ) [13:15:08] elukey: sorry didn't ping you on previous answers [13:15:22] nono I saw them, it would make sense! [13:15:28] hbase errors are very weird [13:15:31] never seen them [13:15:44] Indeed - seems hudi related (when I add the hudi jar error comes back) [13:15:48] Weird [13:16:50] oh, hm cdanis we haven't yet published a new version of jsonschema-tools to npm [13:17:06] i guess you are only seeing this because you pulled down master to make a patch [13:17:13] elukey: at least we kinda have an idea of the issue [13:20:18] joal: maybe hudi's defaults have hbase settings? [13:20:29] I need to check that elukey [13:20:56] ottomata: yeah indeed :) [13:21:02] was testing my own patch that way [13:21:33] just thinking out loud, don't wanna interrupt yall unless you're curious and have time: I tried to deploy aqs last night, and I got a cassandra error. I tried looking through system-a, system-b, debug-a, and debug-b logs but there was nothing [13:21:43] I'm gonna look closer at the aqs logs, maybe hyperswitch logs something [13:21:56] milimetric: joseph already found the issue [13:22:02] ah! whaa...
I thought I read backscroll [13:22:10] we opened one task to make a proper dashboard in logstash [13:22:31] and another one to fix the issue (basically a typo in a table attribute IIUC) [13:23:13] milimetric: https://phabricator.wikimedia.org/T262017 and https://phabricator.wikimedia.org/T262012 [13:26:00] oh wow, yeah, that would've been great. The error is super generic, but I was too tired to realize I should look in aqs logs, that would've been fine, don't go to too much trouble [13:26:04] I wasn't like stumped, more tired [13:27:03] heh, there's no way to just rename a column? [13:27:31] milimetric: not if it's part of the primary-key [13:28:25] ok, I'll drop and recreate [13:37:41] (03CR) 10Ottomata: Add BetweenTagsInputFormat to refinery-spark (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 (owner: 10Joal) [13:39:36] joal: wait, cassandra convention so far has been snake case, activity_level, so I think the table is correct. The definition in AQS is following druid convention, because it's in the druid file, but I don't think that's right. So I propose to change that instead of the table [13:39:54] (for example, see media_type in the mediarequests table) [13:40:07] milimetric: triple checking [13:42:13] milimetric: I think the only case where it's an _ is mediarequest (unique-devices and others have -, and the non-parameters are all '-' as in per-article for instance) [13:42:50] those are druid, no?
unique-devices aren't [13:45:12] right, sorry, ugh [13:45:52] I vote for _ as the convention in cassandra, that's what it uses for its own fields like _tid and _del, and it doesn't have to be escaped [13:46:24] (there are also more _ fields in tables with more data, so it'd be harder to change if we wanted to) [13:46:34] since uniques data is relatively small [13:47:04] milimetric: _ in cassandra has to be escaped [13:47:37] We have '-' everywhere else except mediarequest - I vote for keeping it this way [13:48:33] joal: it turns out ... no! :) [13:48:34] select media_type from "local_group_default_T_mediarequest_per_referer".data limit 10; [13:49:13] no, the tally is like this: [13:49:14] milimetric: fields starting with _ need to [13:52:01] https://www.irccloud.com/pastebin/6Vu2OiRK/ [13:52:21] so 3 use _ and 2 use -, if you don't count the ones that start with _ [13:52:48] my point was that _ is convention in Cassandra in general, and the tables with - can be more easily changed [13:53:39] milimetric: I don't buy that we should make our conventions based on the backend technology - almost all our URLs use '-' as separator [13:53:47] 10Analytics-Clusters: install mwparserfromhell on spark for efficient usage of wikitext-dump in hive - https://phabricator.wikimedia.org/T262044 (10MGerlach) [13:55:33] this is pretty standard, that's how people have used sql databases since forever. And we already do in these 3 cases: https://github.com/wikimedia/analytics-aqs/blob/5e188136866b31c0228ba16ad4826b871e43fe3f/sys/mediarequests.js#L189 [13:55:48] ew........ but there we use _ on the front-end .... oh man...
this is the WORST [13:55:58] ahahahah [13:56:14] I'm gonna use ~ [13:56:22] 10Analytics-Clusters: install mwparserfromhell on spark for efficient usage of wikitext-dump in hive - https://phabricator.wikimedia.org/T262044 (10MoritzMuehlenhoff) JFTR, it's packaged in Debian as well: https://packages.qa.debian.org/m/mwparserfromhell.html [13:56:36] (jk, but jeez... like... consistency matters) [13:57:35] full support milimetric --^ What a mess :S [13:57:42] I've no idea... a-team: quick summary below, I need other opinions [13:58:09] for all AQS endpoints that hit druid on the backend we use - (dash) [13:58:45] for 2 AQS endpoints that hit Cassandra, we use - (dash) on the frontend and the backend [13:59:03] for 3 AQS endpoints that hit Cassandra, we use _ (underscore) on the frontend and the backend [13:59:21] so the question is, what should we do for a new AQS endpoint that hits Cassandra [13:59:22] 10Analytics-Clusters: install mwparserfromhell on spark for efficient usage of wikitext-dump in hive - https://phabricator.wikimedia.org/T262044 (10Ottomata) Oh cool! I'd like to try our Anaconda-wmf approach for this if we can; as it will be the same approach we use for other packages like this. @elukey would... 
[14:00:14] (03PS3) 10Joal: Add BetweenTagsInputFormat to refinery-spark [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 [14:00:20] milimetric: I'd say that we decide an option (say, -) and then we document the use cases that don't comply [14:00:52] Dropping for kids folks - back around 6pm [14:00:57] if fixing them is an option it will go into tech debt reduction backlog, otherwise we'll live with them [14:00:57] (in 2h) [14:02:19] (03CR) 10jerkins-bot: [V: 04-1] Add BetweenTagsInputFormat to refinery-spark [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 (owner: 10Joal) [14:03:26] (03CR) 10Joal: Add BetweenTagsInputFormat to refinery-spark (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 (owner: 10Joal) [14:13:15] (03CR) 10Ottomata: [C: 03+1] "Ok! one more nit, but +1 from me after that, feel free to merge." (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 (owner: 10Joal) [14:14:27] elukey: the tables that hit Cassandra with - columns are smaller so we could fix that maybe, the others we would never take the time to reload [14:15:33] milimetric: oh yes I wasn't advocating for '-', I took it as an example [14:15:37] whatever is the best [14:16:15] yep, me neither, I agree with your approach, just needed more votes [14:16:26] (Jo and I are tied atm :)) [14:55:32] ottomata: about constants in scala- https://stackoverflow.com/questions/9745488/naming-convention-for-scala-constants [14:55:45] ottomata: I'm ok following any convention, we just need to pick one:) [14:58:33] joal: reading that...it seems ThisIsConstant [14:58:35] is the one we should go with [14:58:46] that pattern matching subtlety is crazy [14:58:51] ottomata: that one (ThisIsConstant) is the one suggested by scala [14:58:55] yeah [14:59:01] we can use the one we prefer [14:59:13] i don't love it, but i also don't love CamelCase no matter what :p [14:59:19] let's go with scala convention
[14:59:21] what do you think? [14:59:31] either that or THIS_IS_CONSTANT (to be consistent with java) [14:59:34] ThisIsConstant is matching type convention (uppercase camel-case), so pattern matching looks the same [14:59:36] but def not thisIsConstant [14:59:42] Makes sense [14:59:46] i wonder how THIS_IS_CONSTANT pattern matches :p [14:59:56] joal: i'm fine with either of those, whatever you prefer [15:00:29] Ok, going for scala way as it is scala [15:00:40] also ottomata, I'm gonna move the file to wikihadoop [15:00:52] where other file-formats belong [15:05:47] joal: ok [15:07:30] Pchelolo: did you have any other comments or thoughts on the reportingapi schema? [15:07:45] cdanis: no, I have been satisfied by your answers [15:07:53] I can have another pass if you want [15:07:54] ok! thanks for taking a good look :) [15:08:07] nah, I think not necessary, unless you want [15:08:18] cool. great schema :) [15:12:51] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: jsonschema-tools should have option to materialize schemas with default max/min validation for e.g. max long, max double, etc. - https://phabricator.wikimedia.org/T258659 (10Nuria) Making note so @fdans can work on adding bounds to schemas that need it whe... 
[15:14:11] ♥️ [15:21:46] (03CR) 10Nuria: [C: 04-1] "> Patch Set 1:" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/623456 (https://phabricator.wikimedia.org/T257691) (owner: 10Nuria) [15:22:14] (03PS2) 10Nuria: Removing seasonality cycle as it is fixed once granularity is set [analytics/refinery] - 10https://gerrit.wikimedia.org/r/623456 (https://phabricator.wikimedia.org/T257691) [15:22:58] (03PS5) 10Nuria: Chopping timeseries for noise detection [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/612454 (https://phabricator.wikimedia.org/T257691) [15:24:03] (03CR) 10Joal: Add BetweenTagsInputFormat to refinery-spark (032 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 (owner: 10Joal) [15:27:59] cdanis: just pushed another commit to your version PR [15:28:04] woot [15:28:13] yeah feel free, I don't really know what I'm doing in Node :) [15:28:18] if you like i will merge [15:28:42] ah that's much better! [15:28:45] please do merge ottomata [15:28:57] merged... [15:29:02] i'm going to set up some CI now [15:29:03] :p [15:31:12] um... 
ok joal I'll change the underscore for this to - [15:31:33] my logic is then it'll be 3 - and 3 _, so whoever has to add the next field will have a really hard time and I can laugh at them [15:32:00] ^ this is what happens when you let me make decisions :P [15:32:22] * joal cries of laugh and sadness :) [15:44:55] (03PS1) 10Milimetric: Fix typo in cassandra fields [analytics/refinery] - 10https://gerrit.wikimedia.org/r/624743 (https://phabricator.wikimedia.org/T238365) [15:45:49] (03PS1) 10Joal: Add BetweenTagsInputFormat to refinery-spark [analytics/wikihadoop] - 10https://gerrit.wikimedia.org/r/624745 [15:46:03] (03Abandoned) 10Joal: Add BetweenTagsInputFormat to refinery-spark [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 (owner: 10Joal) [15:47:03] ottomata: added you to the new patch to wikihadoop - comments added to constants and constants renamed [15:53:43] joal: I'm getting this error when trying to drop table "local_group_default_T_editors_bycountry".data: [15:53:44] Warning: schema version mismatch detected, which might be caused by DOWN nodes; if this is not the case, check the schema versions of your nodes in system.local and system.peers. [15:53:57] but I did nodetool-a describecluster and nodetool-b too, and the schema versions look the same [15:54:01] any idea? [15:54:44] nope milimetric - looks uncool - could be on different machines - ping elukey on that --^ [15:58:01] similarly, select schema_version from local and select schema_version from peers both return the same [15:59:38] this is weird :( [16:00:13] (03CR) 10Ottomata: [C: 03+1] Add BetweenTagsInputFormat to refinery-spark [analytics/wikihadoop] - 10https://gerrit.wikimedia.org/r/624745 (owner: 10Joal) [16:00:58] oh... weird joal it deleted it, that was just a warning... [16:01:00] uh... [16:01:00] mmm as far as I know there are no down nodes [16:01:24] ok - table gone, problem solved :) [16:01:35] well... 
I guess I'll create it and cross my fingers and sacrifice some goats [16:01:53] milimetric: while you are at dropping tables and keyspaces, could you please drop the TEST table as well? [16:02:01] k [16:02:10] actually, the TEST keyspace altogether [16:02:16] thanks a lot :) [16:03:01] k, done, and looks like the new table's fine, updated the coordinator.properties, launching population coord now, we'll seeee [16:03:59] \o/ [16:04:03] you rock milimetric [16:11:18] not so fast... load failed... [16:11:23] https://hue.wikimedia.org/jobbrowser/jobs/job_1596639839773_148513/single_logs [16:11:54] I'm not sure about anyone else but I spend like 90% of my ops week trying to figure out how to find logs [16:12:07] this seems like a solved problem [16:13:41] milimetric: you have not updated the field-name in the loading job config [16:14:18] didn't this do that? https://gerrit.wikimedia.org/r/c/analytics/refinery/+/624743/1/oozie/cassandra/coord_editors_bycountry_monthly.properties [16:14:31] (that's the config I used) [16:14:53] milimetric: I looked at the job conf in hue [16:15:06] yeah, I'm looking too, it has activity-level... [16:15:24] the hive fields are activity_level, I didn't see where that was used... [16:15:39] hm you're absolutely right! [16:17:43] milimetric: you need to change the hive-field as well [16:18:03] I'm looking at the workflow right now, these are really cassandra-input-fields not hive_fields [16:18:08] milimetric: names are completely misleading - hive field is the name you give to a column, and it is reused by cassandra-field [16:18:16] correct milimetric [16:18:20] k, got it [16:18:43] past me had not done a good job on naming [16:19:29] (03PS2) 10Milimetric: Fix typo in cassandra fields [analytics/refinery] - 10https://gerrit.wikimedia.org/r/624743 (https://phabricator.wikimedia.org/T238365) [16:22:02] joal: what do you think about... "cassandra_input_from_hive_fields"? Too long? [16:22:18] cassandra_from_hive_fields?
I can push a patch for that and another change I wanted to make [16:22:41] milimetric: thinking [16:22:47] ok for 2 patches [16:22:51] for sure [16:22:56] about the name hm - [16:23:20] those fields are the name we give to tabular data [16:23:34] generated by hive [16:23:39] (the other change is I was going to try and see if oozie can send emails with analytics-alerts as the "to" instead of the oozie.eqiad.wmnet address, that way replying is easier) [16:23:50] hive_fields_for_cassandra? [16:24:14] would be nice milimetric (the email one) [16:24:52] fields_as_generated_by_hive ? [16:25:06] fields_from_hive [16:25:19] milimetric: --^ ? [16:25:24] (03CR) 10Milimetric: [V: 03+2 C: 03+2] "tested, works, launched in prod, merging" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/624743 (https://phabricator.wikimedia.org/T238365) (owner: 10Milimetric) [16:25:30] \o/ [16:25:55] joal: I feel like it needs "cassandra" so you know to make it match the cassandra_fields [16:26:32] milimetric: actually then names should always be the cassandra ones, as they are reused in CQL [16:26:44] or are they? [16:26:53] Did we need to do this rename? [16:28:15] 10Analytics, 10Analytics-Kanban: Fix cassandra/hyperswitch geoeditors field miscmatch - https://phabricator.wikimedia.org/T262017 (10Milimetric) p:05Triage→03High a:03Milimetric We did option 1 [16:29:06] joal: well, otherwise we'd have to change the mapping in AQS, from the request parameters [16:30:13] or you mean to rename hive_fields? [16:30:19] milimetric: yes you're right - I triple checked that [16:30:21] (no, we don't need to do that, it's just confusing) [16:30:24] milimetric: impromptu batcave? [16:30:27] sure!
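The `activity-level` vs `activity_level` incident above is a whole class of bug a mechanical check could catch before a load job runs. A hypothetical sanity check (not part of refinery; only `activity_level` comes from the incident, the other column names are illustrative):

```python
# Compare the field names in a cassandra loading config against the target
# table's actual columns, so a dash/underscore typo fails fast and loudly.
def field_mismatches(config_fields, table_columns):
    """Return config fields that have no matching cassandra column."""
    columns = set(table_columns)
    return [f for f in config_fields if f not in columns]

table = {"country", "activity_level", "editors_ceil"}      # illustrative schema
config = ["country", "activity-level", "editors_ceil"]     # the typo'd config
print(field_mismatches(config, table))  # -> ['activity-level']
```

Run against a real schema (e.g. pulled from `system_schema.columns` via cqlsh), this would have flagged the typo at config-review time instead of after three people dug through logstash.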
[16:38:25] k phew :) [16:38:27] :) [16:46:35] * elukey hates Hue [16:47:21] * elukey bbiab [16:51:43] (03PS1) 10Milimetric: Rename hive_fields to be more descriptive [analytics/refinery] - 10https://gerrit.wikimedia.org/r/624779 [16:52:56] ottomata: I think all the issues with the deploy are solved, so I was gonna deploy aqs slowly after lunch, and I can roll back if something goes wrong. But it's up to you, I'll wait for Monday if you think it's too risky [16:56:57] milimetric: if you do the deploy, let me know when; I'd like to follow along [16:57:46] def! Oh yeah, let’s do it then, you’re ops, you can have my back [17:01:25] :) [17:14:19] gone for tonight! [17:15:05] me too! [17:37:17] milimetric: i'm here for the afternoon, if you think its safe go ahead [17:42:57] k razzi, like... 10 minutes? [17:43:08] cool [17:52:57] ok, to the batcave! [18:05:52] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: jsonschema-tools should have option to materialize schemas with default max/min validation for e.g. max long, max double, etc. - https://phabricator.wikimedia.org/T258659 (10Ottomata) FYI had to add a few of fixes: - https://github.com/wikimedia/jsonschem... [18:06:22] cdanis: FYI added a patch to your schema and rebased with newer jsonschema-tools [18:11:46] !log aqs deploy went well! Geoeditors endpoint is live internally, data load job was successful, will submit pull request for public endpoint. [18:11:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:12:46] ottomata: ah thanks! 
didn't think to $ref the first example, clever [18:13:39] ottomata: re: network-error vs network_error -- I had kinda wanted to have the schema names match the strings used in the report `type` field, but that's not really necessary, it just seemed nice [18:14:08] I think I should add a test to avoid hyphens, the schema names should probably have the same rules as field names (excepting /) [18:32:32] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Add editors per country data to AQS API (geoeditors) - https://phabricator.wikimedia.org/T238365 (10Milimetric) pull request for public endpoints is here: https://github.com/wikimedia/restbase/pull/1273 [18:34:56] ottomata: anything left to do on that patch? shall you/I +2 and merge? [18:37:22] cdanis: i think its good! how ready are you to use it? past experience makes me lean towards not merging until just before usage; sometimes dev use will inform schema changes [18:38:35] ottomata: hm, I'll do one last local test of wikimedia-eventgate-dev.js receiving reports from actual Chrome [18:39:37] ok cool [18:41:59] POST /v1/events 201 All 5 out of 5 events were accepted. [18:42:01] :) [18:44:13] so I think we're good to go [18:44:52] there's future work where I'd like to talk about maybe doing some stream processing to add fields like geoIP country, AS number, the timestamp at which the event occurred (`meta.dt` minus `age` milliseconds, basically) but I don't think we need any of that immediately [18:46:48] because you added the http fragment and got the http.client_ip field, geocoding will be done for you in hive :o [18:46:51] (but not in the stream data :/ ) [18:47:19] ok cdanis merging [18:47:40] oh interesting [18:47:59] dumb q, how real-time is Hive? [18:48:06] lags a couple of hours usually [18:48:10] maybe 3 or 4 max [18:48:20] (in normal operations) [18:48:24] mmm [18:48:35] I think you had mentioned at some point that eventgate-logging-external would soon go to both logstash and Hive?
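The `meta.dt` minus `age` arithmetic mentioned at 18:44:52 (deriving when a NEL report's error actually occurred from when it was received) can be sketched as follows; `occurred_at` is a hypothetical helper for illustration, not part of any deployed code:

```python
from datetime import datetime, timedelta

def occurred_at(meta_dt: str, age_ms: int) -> datetime:
    """Approximate the time a report's event occurred: the receive
    timestamp (meta.dt, ISO 8601) minus the report's `age` field,
    which NEL expresses in milliseconds."""
    received = datetime.fromisoformat(meta_dt.replace("Z", "+00:00"))
    return received - timedelta(milliseconds=age_ms)

# a report received at 18:44:52 UTC with age=5000 occurred ~5s earlier
print(occurred_at("2020-09-04T18:44:52Z", 5000).isoformat())
```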
[18:48:45] if that's right, how soon? :) [18:48:48] ah, we could make it do that if we need to [18:48:59] petr was thinking about making the api request logging go to eventgate-logging external [18:49:02] right [18:49:12] but he decided not to, and it goes to eventgate-analytics (like the mw api logging) [18:49:16] ah I see [18:49:22] so, we don't have a plan to do it [18:49:25] but we could [18:49:34] we just need to mirror maker from kafka logging cluster to kafka jumbo cluster [18:49:35] it'd be nice to have realtime here in addition to aggregated data [18:49:58] we have done some simple realtime in druid stuff [18:50:25] druid has kafka integration and consumes realtime data, and then the historical batch job comes around later and fills in the history from hive [18:50:27] like for netflow? [18:50:29] (since streaming is less reliable) [18:50:29] right [18:50:30] yes exactly [18:50:37] we have to manually set that part up [18:50:40] but it is possible [18:50:45] is there a good way to look at individual events in druid? [18:50:53] hm no [18:50:54] not really [18:51:18] actually, we are just now for the first time discussing what to do about stream processing at wmf with the search team [18:51:25] since they are building the wdqs updater in flink [18:51:32] they'll need to have tier one support for it [18:51:37] and are even thinking about hiring for it [18:51:42] but, more generally [18:51:48] we're discussing what we want to do [18:51:53] and trying to look at real use cases [18:52:04] some are like theirs and will be prod data jobs [18:52:06] right [18:52:14] others might be more like yours, where we want to do streaming monitoring and alerting [18:52:20] i think SRE might have a lot of use cases like that [18:52:27] yeah, I think that's likely :) [18:53:08] cdanis: if you can think of some [18:53:11] please add to [18:53:11] https://phabricator.wikimedia.org/T185233#use-case-collection [18:54:06] thanks!
I'll take a look [18:54:35] and probably add a link to the NEL task w/ a short description [18:57:06] nice thank you [18:57:42] so ottomata I'm going on vacation for all of next week, but, before enabling this we need to enable the cors flag on whichever eventgate we're using (let's assume eventgate-logging-external) [18:58:04] would you have time for pushing a new build (with the patches up to the schema) and flipping that next week? [18:58:43] Sure, the cors settings can be set in the helm templates [18:58:49] can you tell me what you want them set to? [19:00:29] I just need an eventgate >= 1.3.2 that thinks app.conf.cors !== false [19:00:56] https://github.com/wikimedia/eventgate/blob/master/app.js#L104 [19:03:27] I'm guessing we just need some values flipped in the various eventgate-logging-external files under helmfiles.d ? [19:03:43] ya [19:04:13] anything in main_app.conf will be passed to service runner conf [19:06:03] ok! I'll write up a task [19:20:10] k danke [19:21:53] 10Analytics, 10Operations: Deploy an updated eventgate-logging-external with NEL patches - https://phabricator.wikimedia.org/T262087 (10Ottomata) [19:22:19] oh cdanis the stuff in values overrides the stuff in the template [19:22:23] so it is parameterized [19:22:29] ...pretty sure... [19:22:31] but ya [19:24:34] ahhh okay [19:24:38] well I'm happy to be wrong :D [19:26:00] 10Analytics, 10Operations: Deploy an updated eventgate-logging-external with NEL patches - https://phabricator.wikimedia.org/T262087 (10CDanis) [19:26:05] cdanis: what do you need for cors tho? [19:27:06] i can't say i'm well versed on good values for this [19:27:10] ottomata: I had just planned on '*', if that's what you mean [19:27:12] i think our default false will allow any?
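A hypothetical sketch of the kind of helmfile value flip being discussed, assuming `main_app.conf` is passed straight through to service-runner as stated at 19:04:13; the exact key layout in the eventgate chart values is an assumption:

```yaml
# hypothetical values override for eventgate-logging-external
main_app:
  conf:
    cors: '*'   # anything under main_app.conf reaches service-runner conf
```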
[19:27:19] does false not == '*' hmmm [19:27:31] uhh let me check again [19:27:44] https://github.com/wikimedia/service-template-node/blob/master/config.prod.yaml [19:27:45] says [19:27:51] # to disable use: [19:27:51] # cors: false [19:27:59] so maybe not? [19:28:01] interesting. [19:28:13] bearloga: yt? [19:28:28] have you been able to send events to eventgate from mobile apps before? [19:29:08] yeah so, on my local eventgate-wikimedia, I wasn't modifying config.dev.yaml at all -- so it was undefined and '*' [19:29:31] https://github.com/wikimedia/service-template-node/blob/b3b59baff9354b5701ba48b7c124106153fbb405/app.js#L98 [19:29:38] so then we sent `access-control-allow-origin: *` plus the allow-headers, expose-headers, and allow-methods, which Chrome was happy with [19:29:57] I don't really see harm in configuring eventgate-logging-external that way, but I could be missing something [19:30:09] (and am in fact kind of wondering how it works for client side JS logging as-is?) [19:30:36] if those cors headers don't get set, do browsers deny sending? [19:30:37] ohhh I didn't think about this interacting with service-runner heh [19:30:48] ottomata: yeah, Chrome will send an OPTIONS preflight and then not send the POST [19:30:51] i guess client side is currently all from mw client side [19:30:54] so same origin already [19:31:08] bearloga: is working on mobile app integration though [19:31:17] so this wouldn't work for them unless we set cors properly for eventgate-analytics-external i guess [19:31:19] well even on the MW side, it's different domains [19:31:25] that's true! [19:31:26] right [19:31:26] like en.wikipedia.org vs intake.wikimedia.org [19:31:30] huh.
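The semantics being worked out above (cors left undefined defaults to `'*'`, `cors: false` disables the header entirely, anything else is used verbatim) can be paraphrased roughly as follows; this is a sketch of the discussed behavior, not the actual service-template-node implementation:

```python
def cors_allow_origin(conf_cors):
    """Rough paraphrase of the discussed service-template-node logic:
    return the Access-Control-Allow-Origin value to send, or None if
    CORS is disabled and no header should be sent at all."""
    if conf_cors is False:
        return None      # cors: false -> CORS headers disabled
    if conf_cors is None:
        return "*"       # undefined -> allow any origin
    return conf_cors     # explicit value used as-is
```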
[19:32:07] and I don't see the CORS headers returned by `curl -v -X OPTIONS -H 'Origin: https://en.wikipedia.org' -H 'Access-Control-Request-Method: POST' -H 'Access-Control-Request-Headers: content-type' https://intake-logging.wikimedia.org/v1/events` [19:32:22] which models the preflight request that Chrome sends [19:33:00] heh was just trying to construct the same curl command ty [19:33:30] here, I'll dump full request/responses out of my ngrok that's still running [19:35:06] https://phabricator.wikimedia.org/P12494 [19:36:33] cdanis: maybe the eventlogging stuff is working now because it is using sendBeacon? [19:37:03] ohhhh hm [19:37:07] that does seem likely [19:37:15] https://fetch.spec.whatwg.org/#cors-safelisted-request-header [19:38:03] If mimeType’s essence is not "application/x-www-form-urlencoded", "multipart/form-data", or "text/plain", then return false. [19:38:35] pretty sure sendBeacon is using text/plain [19:39:06] that makes sense [19:39:09] 10Analytics, 10Operations: Deploy an updated eventgate-logging-external with NEL patches - https://phabricator.wikimedia.org/T262087 (10CDanis) Example request/responses of both preflight and actual request are in NDA'd paste P12494 (has my own PII in it) Chrome sends an OPTIONS request to the endpoint URL wi... [19:39:45] hah, it didn't use to stop you https://medium.com/@longtermsec/chrome-just-hardened-the-navigator-beacon-api-against-cross-site-request-forgery-csrf-690239ccccf [19:43:55] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1102.eqiad.w...
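The fetch-spec rule quoted at 19:38:03 explains why sendBeacon's text/plain payloads avoid the preflight that a cross-origin JSON POST triggers: only three MIME essences are CORS-safelisted for the Content-Type header. A minimal sketch of just that content-type check (a real preflight decision also weighs the method and any other non-safelisted headers):

```python
# MIME essences that are CORS-safelisted per the quoted fetch spec rule;
# any other Content-Type on a cross-origin request forces a preflight.
SAFELISTED = {"application/x-www-form-urlencoded", "multipart/form-data", "text/plain"}

def content_type_needs_preflight(content_type: str) -> bool:
    """True if this Content-Type alone would trigger an OPTIONS preflight."""
    # The "essence" is type/subtype with parameters (e.g. charset) stripped.
    essence = content_type.split(";")[0].strip().lower()
    return essence not in SAFELISTED

print(content_type_needs_preflight("text/plain;charset=UTF-8"))  # sendBeacon-style payloads
print(content_type_needs_preflight("application/json"))          # fetch POST of JSON
```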
[19:44:37] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1103.eqiad.w... [19:45:20] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1104.eqiad.w... [19:45:59] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1105.eqiad.w... [19:46:40] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1106.eqiad.w... [19:47:39] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1107.eqiad.w... [19:57:45] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1104.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-worker1... [20:02:35] nuria: have you seen this task? 
it is so cool [20:02:35] https://phabricator.wikimedia.org/T257527 [20:03:51] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1104.eqiad.w... [20:08:08] ottomata: INDEED [20:09:04] ottomata: "sampling fractions for each of failures and successes" nice [20:16:16] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1104.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-worker1... [20:17:23] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1113.eqiad.w... [20:17:26] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1112.eqiad.w... [20:17:34] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1111.eqiad.w... 
[20:17:38] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1110.eqiad.w... [20:17:43] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1109.eqiad.w... [20:17:47] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1108.eqiad.w... [20:20:49] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1102.eqiad.wmnet'] ` and were **ALL** successful. [20:23:24] 😊 [20:23:28] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1104.eqiad.w... [20:24:35] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1114.eqiad.w... 
[20:25:06] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1115.eqiad.w... [20:25:44] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1116.eqiad.w... [20:26:52] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1117.eqiad.w... [20:26:56] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1107.eqiad.wmnet'] ` and were **ALL** successful. [20:35:03] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1106.eqiad.wmnet'] ` and were **ALL** successful. [20:35:16] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1103.eqiad.wmnet'] ` and were **ALL** successful. 
[20:35:52] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1105.eqiad.wmnet'] ` and were **ALL** successful. [20:55:02] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1112.eqiad.wmnet'] ` and were **ALL** successful. [20:55:31] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1110.eqiad.wmnet'] ` and were **ALL** successful. [20:55:35] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1108.eqiad.wmnet'] ` and were **ALL** successful. [20:55:37] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1109.eqiad.wmnet'] ` and were **ALL** successful. [21:00:39] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1104.eqiad.wmnet'] ` and were **ALL** successful. 
[21:04:58] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1115.eqiad.wmnet'] ` and were **ALL** successful. [21:05:31] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1116.eqiad.wmnet'] ` and were **ALL** successful. [21:17:38] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1113.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-worker1... [21:18:22] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1111.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-worker1... [21:25:16] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1114.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-worker1... [21:32:10] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1117.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-worker1... 
[21:44:03] ottomata: i had loads of thoughts that i added to the task, hopefully cdanis does not mind [23:30:17] nuria: thanks so much!! very appreciated :) I'm on vacation all next week but great to hear from you, probably won't have a detailed reply until I'm back [23:30:51] cdanis: sounds good, you let us know [23:40:37] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Add editors per country data to AQS API (geoeditors) - https://phabricator.wikimedia.org/T238365 (10Nuria) the typo means we need to backfill correct?