[00:43:24] (03PS1) 10Milimetric: Add api.wikimedia to the pageview whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/624371 [00:43:41] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Add api.wikimedia to the pageview whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/624371 (owner: 10Milimetric) [01:52:21] !log aborted aqs deploy due to cassandra error [01:52:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [01:53:49] I looked through the logs and I couldn't find anything, but when I query the new endpoint on aqs1004 I get a generic error from hyperswitch, confirmed on other hosts that the new endpoint is just a 404 there, so something's wrong with either the data or aqs code. Querying from cqlsh shows the schema and data look fine. [03:03:49] (03CR) 10Ladsgroup: "> Patch Set 2: Code-Review+1" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/623141 (https://phabricator.wikimedia.org/T253439) (owner: 10Gerrit maintenance bot) [06:00:45] isaacj: exactly that is the file! lemme know if there are issues etc.. [06:01:07] basically if 90% of the ram on the host gets used (leaving the 10% for the OS) the OOM killer intervenes [06:01:54] very soon stat1005 and stat1008 will reach the whopping 1.5TB of RAM (!) so I'll have to make that limit a little different [06:02:17] (if needed, not sure, need to think about it:) [06:07:18] joal: bonjour! The update to the mediawiki-history-drop-snapshot triggered a refresh of the service unit, that ran..
and failed, you can check on an-launcher1002 with "sudo journalctl -u mediawiki-history-drop-snapshot | grep ERROR" [06:08:03] !log reset-failed mediawiki-history-drop-snapshot on an-launcher1002 to clear icinga errors [06:08:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [06:15:13] RECOVERY - Check the last execution of mediawiki-history-drop-snapshot on an-launcher1002 is OK: OK: Status of the systemd unit mediawiki-history-drop-snapshot https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [06:19:27] 10Analytics-Clusters, 10Discovery, 10Discovery-Search (Current work), 10Patch-For-Review: Move mjolnir kafka daemon from ES to search-loader VMs - https://phabricator.wikimedia.org/T258245 (10elukey) 05Open→03Stalled We are currently blocked on T260305, next week we should be able to deploy a new puppe... [06:27:18] good morning [06:27:44] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Create new mailing list for analytics systems users - https://phabricator.wikimedia.org/T260849 (10elukey) >>! In T260849#6433891, @Nuria wrote: > We need to update the protocol for data access so people (or SRE?) subscribe users with analytics-private da... [06:28:53] elukey: checking error now [06:33:24] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Create new mailing list for analytics systems users - https://phabricator.wikimedia.org/T260849 (10elukey) Just sent an email to the mailing lists that I usually use for maintenance announcements, I asked to subscribe to the new mailing list and spread th... [06:33:38] elukey: I know what the problem is - will fix that and rerun manually to be sure [06:35:48] ok! [06:46:13] elukey: manual check run in dry-run mode, may I restart a manual launch of the timer? [06:46:46] sure!
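The 90%-of-RAM OOM rule mentioned earlier has a simple consequence worth spelling out: with a percentage-based limit, the absolute amount reserved for the OS grows with total RAM, which is why the limit may need adjusting on the ~1.5TB hosts. A minimal sketch (not the actual puppet code; host sizes are illustrative):

```python
# With a fractional limit, the OS reserve scales with total RAM.
GIB = 1024 ** 3

def user_memory_limit(total_bytes, user_fraction=0.9):
    """Bytes user processes may use before the OOM killer steps in."""
    return int(total_bytes * user_fraction)

for total_gib in (512, 1536):  # a smaller stat host vs the upcoming ~1.5TB ones
    total = total_gib * GIB
    reserved = total - user_memory_limit(total)
    print(f"{total_gib} GiB host: OS keeps about {reserved / GIB:.0f} GiB")
```

On the bigger hosts the same 10% rule reserves roughly 150 GiB for the OS, far more than needed, hence the idea of "making that limit a little different".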
[06:48:19] elukey: just to be sure (I'm always in doubt with timers/services): sudo systemctl start mediawiki-history-drop-snapshot [06:52:56] yes or "restart" [06:53:06] Ack - using restart [06:54:41] !log Manually restart mediawiki-history-drop-snapshot after hive-partitions/hdfs-folders mismatch fix [06:54:43] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:01:38] Sep 04 06:59:31 an-launcher1002 systemd[1]: mediawiki-history-drop-snapshot.service: Succeeded. [07:01:41] \o/ [07:11:22] good! [07:13:45] elukey: so that you understand what happened - the script deletes folders and hive partitions - In default mode (strict) it fails if there is a mismatch between the partitions to delete and the folders to delete [07:14:08] elukey: the script has a non-strict mode allowing to clean-up those differences and delete even in case of differences [07:19:07] joal: did you run the script in non-strict mode manually? [07:20:15] elukey: I triple-checked in logs that the error was of that kind (it was) - then manually checked the difference in partitions/folders for the failing table, deleted the partitions with no counter-part folders manually and reran the script in strict and dry-run mode [07:20:30] okok [07:20:40] elukey: When I tested before merge I think I have forgotten to remove the non-strict option (my bad) [07:20:58] elukey: manually running the script in non-strict mode would have given the same result [07:21:17] joal: okok, does puppet need to be updated to run in non-strict mode? [07:21:21] (if worth it) [07:21:52] I think it's better to keep it strict - it means we need to check if there is a discrepancy [07:22:33] elukey: completely different topic - Is there a way for me to look back in AQS logs to try to see what happened to Dan yesterday at deploy? [07:23:37] joal: suuuure!
[07:23:52] joal: mmmm no idea [07:23:55] * joal has forgotten how to look at AQS logs :( [07:23:56] (brB) [07:40:49] joal: here I am sorry [07:40:56] did you find a way? Maybe logstash? [07:40:57] np elukey [07:41:14] elukey: I have not looked further, got derailed into something else [07:41:19] Will try logstash [07:41:31] ah ok the error came from hyperswitch [07:41:45] elukey: how did you find that? [07:41:49] elukey: logstash? [07:42:01] nono Dan mentioned it earlier on IRC [07:42:33] yes I think everything is on logstash [07:42:46] will check [07:57:55] elukey: I can't look at syslog on aqs1004 :( [07:58:13] joal: which ones? [07:58:24] the ones under /srv/log/aqs don't contain much [07:58:27] elukey: /var/log/syslog.1 [07:58:40] yes but I don't think anything is there [07:58:43] I can check [07:58:53] the application logs are shipped to logstash [07:59:04] elukey: /etc/aqs/config.yaml tells me logs are sent to syslog local [07:59:10] see logging section [07:59:46] or at least that's what I understand [08:00:15] nono that is rsyslog [08:00:22] see "port" [08:00:39] so logs from AQS are sent to rsyslog [08:01:21] IIRC that config was indicating logstash [08:02:36] yep in puppet logstash_syslog_port: 10514 [08:02:38] elukey: I think we stopped using logstash at some time, possibly cause we couldn't use it [08:02:47] ohhhhhh [08:03:04] So actually we log syslog format to a syslog-listener being logstash [08:03:17] I think so yes [08:03:27] Makes sense [08:04:13] Yay elukey, I finally found some in logstash!!! [08:04:23] sorry for the noise :( [08:06:12] what filter did you use? [08:06:18] we should probably create a dashboard [08:06:31] elukey: I used host:aqs* in query-field [08:07:13] yep just found it as well, gooood [08:07:17] Annnnnnd - we still have hash mismatch :) [08:07:34] that task we closed the other day is still not-fixed [08:11:15] did you filter for type:node or similar?
[08:12:11] I hate lucene syntax [08:12:12] nope [08:12:35] Just with host:aqs* I manage to see what happened yesterday [08:13:05] can you give me the link of one log in logstash if you have it handy? [08:13:14] related to aqs node I mean [08:13:25] so deploy went fine, the restart and repool showed no error (warning with schema hash-issue but no error) [08:13:28] sure 1 min [08:14:01] elukey: https://gist.github.com/jobar/73f1ebf16ddd4b08fcab95aa0da9cafb [08:14:55] elukey: I looked for rows using the query-bar, and added the interesting columns for us [08:15:43] At time 01:53:17.882 we can see the service restarted on aqs1004 (deploy I assume) [08:15:47] ahh type is "aqs" [08:16:06] Then a bunch of hash-warning [08:17:20] Then scap says it's happy (Port 7232 up) [08:18:09] But, just after, scap rolls back [08:18:18] Now, what happened in between ??? [08:19:05] Ohhhh - something else - the order is most-recent first!!!! [08:19:10] Meh [08:19:27] * joal needs to spend more time using kibana - not yet used to it [08:27:28] 10Analytics: Create a kibana dashboard for AQS hyperswitch's logs - https://phabricator.wikimedia.org/T262012 (10elukey) [08:27:48] ok created --^, life is too short to battle with Kibana on a froday [08:27:51] *friday [08:29:09] joal: ok if I start the roll restart of the hadoop workers? [08:29:18] yessir [08:29:51] elukey: I finally nailed it :) [08:29:54] elukey: AQS sorry [08:30:22] elukey: I updated the gist above with the new link [08:31:59] Undefined name activity-level in selection clause [08:32:00] ahhh [08:32:12] joal: can you add this insight to the above task so we have a reference?
(later on when you have time) [08:32:36] elukey: the cassandra table has activity_level field [08:32:39] :( [08:33:37] I am more sad that it took 3 people to figure out that a typo ruined a deploy, ideally Dan should have had a good dashboard to check right away [08:34:23] since aqs "works" we don't dedicate much time in improving dev experience on it [08:34:35] no bueno [08:34:36] correct elukey [08:34:49] elukey: actually AQS `usually` works [08:35:25] joal: I recall when it didn't and there was a daily mixture of french and italian swear words :D [08:35:35] :) [08:35:51] * elukey brace yourself, cassandra 3 is coming [08:38:32] 10Analytics: Create a kibana dashboard for AQS hyperswitch's logs - https://phabricator.wikimedia.org/T262012 (10JAllemandou) My 2 cents from having found my way through Kibana to debug an issue: - We should filter for `host:aqs*` but not for `type` as more type than `AQS` have proven useful (for instance `scap... [09:11:21] started the roll restart of the prod hadoop workers [09:11:26] the test cluster went fine [09:13:44] ack elukey [09:14:28] elukey: data in cassandra for editors-bycountry is very small - I suggest dropping-recreating the table with the correct field-name and update/relaunch indexation [09:14:37] +1 [09:15:22] elukey: the other possible solution is to use a different field-name in hyperswitch (external with a -, internal with a _) - Not a big change, but very error-prone IMO [09:16:04] ok, will create a task describing the issue, suggest solutions and we'll discuss that as a team on monday [09:17:26] nono recreating seems good [09:20:01] 10Analytics, 10Analytics-Kanban: Fix cassandra/hyperswitch geoeditors field miscmatch - https://phabricator.wikimedia.org/T262017 (10JAllemandou) [09:20:04] elukey: --^ [09:31:16] super [09:31:43] 10Analytics, 10Analytics-Kanban: Fix cassandra/hyperswitch geoeditors field miscmatch - https://phabricator.wikimedia.org/T262017 (10elukey) +1 on option 1.
ssh an-tool1009.eqiad.wmnet -L 8080:an-tool1009.eqiad.wmnet:80 [09:35:58] hue seems running (with live patches) [10:02:54] but CI tests are still not working grrr [10:34:54] roll restart of the hadoop workers completed! [10:35:34] all metrics look good :) [10:35:42] * elukey lunch! bb in ~2h [11:33:51] (03PS1) 10Joal: Add BetweenTagsInputFormat to refinery-spark [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 [11:35:27] (03CR) 10jerkins-bot: [V: 04-1] Add BetweenTagsInputFormat to refinery-spark [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 (owner: 10Joal) [11:43:27] (03PS2) 10Joal: Add BetweenTagsInputFormat to refinery-spark [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 [12:16:33] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Review current usage of HDFS and establish what/if data can be dropped periodically - https://phabricator.wikimedia.org/T261283 (10JAllemandou) [12:20:29] elukey: looks like archiva is having weird issues again (might be maven - building has a step waiting for quite some time) [12:22:37] elukey: another thing - when using a new spark-shell in scala from stat1008, my first execution to read data always fails with java.util.ServiceConfigurationError: org.apache.hadoop.security.token.TokenIdentifier: Provider org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier not found [12:22:44] And then it works [12:22:49] for the second try [12:38:11] joal: ok let's start with archiva, but I need more info [12:38:40] is it from a stat100x or from your laptop? What step takes time?
[12:40:36] elukey: I'm from stat1008 [12:41:28] elukey: and, mvn wants to download a jar and wait until timeout (or at least it was a few minutes ago) [12:41:33] elukey: checking now [12:41:43] elukey: [12:41:44] Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/commons-codec/commons-codec/maven-metadata.xml [12:41:47] Downloading from apache.snapshots: https://repository.apache.org/snapshots/commons-codec/commons-codec/maven-metadata.xml [12:42:21] elukey: it feels like the same issue we were having before going for explicit repos [12:43:43] but it doesn't make sense, it worked up to now [12:43:49] does it happen the same on other stat boxes? [12:43:55] I have not checked [12:44:26] joal: can you make it hang again? I'd like to check netstat in the meantime [12:44:28] also elukey I have declined the meeting this afternoon, need to care for the kids as Melissa is in meetings [12:44:51] joal: np, I added you because you asked, all things that you know a lot more than me :) [12:45:01] building elukey, not yet hanging [12:46:23] hanging [12:46:38] one thing that I don't get is why it tries to pull from oss.sonatype [12:47:51] it timed out (I guess), then proceeds [12:50:03] joal: the main issue is the fact that we don't download from archiva, but from other places [12:50:28] so without a proper proxy, it hangs for sure [12:50:30] indeed elukey [12:50:50] have you set the https proxy settings in your m2 dir? [12:50:58] nope [12:51:24] didn't we do it last time? I am confused now [12:51:30] Should I have?
[12:51:38] I'm confused as well :S [12:51:45] Can't recall [12:53:02] joal: if we don't pull artifacts from archiva (that is whitelisted in the analytics vlan firewall) then there is no chance to pull from other domains without the https proxy [12:53:16] and I recall me and you checking some settings related to it the last time [12:53:22] but in theory this shouldn't be needed [12:53:25] I know elukey - and by default we don't want to pull from other places IIRC [12:53:29] yeah [12:54:41] * joal cries in a corner [12:59:39] joal: let's open a task, this is probably some dependency issue to fix [13:00:58] the other error on stat1008 is very weird [13:01:01] org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier -> hbase?? [13:01:17] yup :( [13:02:06] so the spark client does try to fetch tokens at the beginning, but only if it finds traces of config indicating that it is needed.. [13:02:15] does it happen on stat1004 for example? [13:02:22] (sorry asking to test multiple times) [13:04:02] (brb( [13:08:28] what time does otto usually log on? [13:08:39] just now AFAICS :) [13:08:54] hello! [13:09:03] cdanis: i saw your patch, want to PTAL :p [13:09:21] ottomata: btw, the numeric bounds commits to jsonschema-tools broke a bunch of the tests in event-primary (because the currently-materialized versions don't have the bounds) [13:09:33] OH INTERESTING! [13:09:52] we think we want to re-materialize them with bounds, but aren't sure if we want to make such a change all at once [13:09:59] FYI fdans ^ (is fdans on vaca?) 
for now I'm continuing work with a local version that's pre-bounds w/ a cherrypick of my fix [13:10:30] cdanis you can also temporarily disable the numeric bounds checking [13:10:38] by setting that option to null i guess [13:10:46] or false maybe better [13:10:59] we should probably do that until we are ready to re-materialize the schemas with those bounds [13:11:03] yeah [13:11:12] elukey: no error on stat1004 [13:13:31] joal: what about on 1005? (trying to see if it is buster related) [13:13:45] trying from 1008 again, without hudi jar [13:14:00] seems hudi-jar related [13:14:03] will confirm [13:14:13] cdanis: i'm going to push a patch to disable the bounds check in both schema repos [13:14:26] we should have thought of that before we merged that patch (and I made schema repos install @latest :p ) [13:15:08] elukey: sorry didn't ping you on previous answers [13:15:22] nono I saw them, it would make sense! [13:15:28] hbase errors are very weird [13:15:31] never seen them [13:15:44] Indeed - seems hudi related (when I add the hudi jar error comes back) [13:15:48] Weird [13:16:50] oh, hm cdanis we haven't yet published a new version of jsonschema-tools to npm [13:17:06] i guess you are only seeing this because you pulled down master to make a patch [13:17:13] elukey: at least we kinda have an idea of the issue [13:20:18] joal: maybe hudi's defaults have hbase settings? [13:20:29] I need to check that elukey [13:20:56] ottomata: yeah indeed :) [13:21:02] was testing my own patch that way [13:21:33] just thinking out loud, don't wanna interrupt yall unless you're curious and have time: I tried to deploy aqs last night, and I got a cassandra error. I tried looking through system-a, system-b, debug-a, and debug-b logs but there was nothing [13:21:43] I'm gonna look closer at the aqs logs, maybe hyperswitch logs something [13:21:56] milimetric: joseph already found the issue [13:22:02] ah! whaa...
I thought I read backscroll [13:22:10] we opened one task to make a proper dashboard in logstash [13:22:31] and another one to fix the issue (basically a typo in a table attribute IIUC) [13:23:13] milimetric: https://phabricator.wikimedia.org/T262017 and https://phabricator.wikimedia.org/T262012 [13:26:00] oh wow, yeah, that would've been great. The error is super generic, but I was too tired to realize I should look in aqs logs, that would've been fine, don't go to too much trouble [13:26:04] I wasn't like stumped, more tired [13:27:03] heh, there's no way to just rename a column? [13:27:31] milimetric: not if it's part of the primary-key [13:28:25] ok, I'll drop and recreate [13:37:41] (03CR) 10Ottomata: Add BetweenTagsInputFormat to refinery-spark (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 (owner: 10Joal) [13:39:36] joal: wait, cassandra convention so far has been snake case, activity_level, so I think the table is correct. The definition in AQS is following druid convention, because it's in the druid file, but I don't think that's right. So I propose to change that instead of the table [13:39:54] (for example, see media_type in the mediarequests table) [13:40:07] milimetric: triple checking [13:42:13] milimetric: I think the only case where it's an _ is mediarequest (unique-devices and others have -, and the non-parameters are all '-' as in per-article for instance) [13:42:50] those are druid, no?
unique-devices aren't [13:45:12] right, sorry, ugh [13:45:52] I vote for _ as the convention in cassandra, that's what it uses for its own fields like _tid and _del, and it doesn't have to be escaped [13:46:24] (there are also more _ fields in tables with more data, so it'd be harder to change if we wanted to) [13:46:34] since uniques data is relatively small [13:47:04] milimetric: _ in cassandra has to be escaped [13:47:37] We have '-' everywhere else except mediarequest - I vote for keeping it this way [13:48:33] joal: it turns out ... no! :) [13:48:34] select media_type from "local_group_default_T_mediarequest_per_referer".data limit 10; [13:49:13] no, the tally is like this: [13:49:14] milimetric: fields starting with _ need to [13:52:01] https://www.irccloud.com/pastebin/6Vu2OiRK/ [13:52:21] so 3 use _ and 2 use -, if you don't count the ones that start with _ [13:52:48] my point was that _ is convention in Cassandra in general, and the tables with - can be more easily changed [13:53:39] milimetric: I don't buy that we should make our conventions based on the backend technology - almost all our URLs use '-' as separator [13:53:47] 10Analytics-Clusters: install mwparserfromhell on spark for efficient usage of wikitext-dump in hive - https://phabricator.wikimedia.org/T262044 (10MGerlach) [13:55:33] this is pretty standard, that's how people have used sql databases since forever. And we already do in these 3 cases: https://github.com/wikimedia/analytics-aqs/blob/5e188136866b31c0228ba16ad4826b871e43fe3f/sys/mediarequests.js#L189 [13:55:48] ew........ but there we use _ on the front-end .... oh man...
this is the WORST [13:55:58] ahahahah [13:56:14] I'm gonna use ~ [13:56:22] 10Analytics-Clusters: install mwparserfromhell on spark for efficient usage of wikitext-dump in hive - https://phabricator.wikimedia.org/T262044 (10MoritzMuehlenhoff) JFTR, it's packaged in Debian as well: https://packages.qa.debian.org/m/mwparserfromhell.html [13:56:36] (jk, but jeez... like... consistency matters) [13:57:35] full support milimetric --^ What a mess :S [13:57:42] I've no idea... a-team: quick summary below, I need other opinions [13:58:09] for all AQS endpoints that hit druid on the backend we use - (dash) [13:58:45] for 2 AQS endpoints that hit Cassandra, we use - (dash) on the frontend and the backend [13:59:03] for 3 AQS endpoints that hit Cassandra, we use _ (underscore) on the frontend and the backend [13:59:21] so the question is, what should we do for a new AQS endpoint that hits Cassandra [13:59:22] 10Analytics-Clusters: install mwparserfromhell on spark for efficient usage of wikitext-dump in hive - https://phabricator.wikimedia.org/T262044 (10Ottomata) Oh cool! I'd like to try our Anaconda-wmf approach for this if we can; as it will be the same approach we use for other packages like this. @elukey would... 
[14:00:14] (03PS3) 10Joal: Add BetweenTagsInputFormat to refinery-spark [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 [14:00:20] milimetric: I'd say that we decide an option (say, -) and then we document the use cases that don't comply [14:00:52] Dropping for kids folks - back around 6pm [14:00:57] if fixing them is an option it will go into tech debt reduction backlog, otherwise we'll live with them [14:00:57] (in 2h) [14:02:19] (03CR) 10jerkins-bot: [V: 04-1] Add BetweenTagsInputFormat to refinery-spark [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 (owner: 10Joal) [14:03:26] (03CR) 10Joal: Add BetweenTagsInputFormat to refinery-spark (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 (owner: 10Joal) [14:13:15] (03CR) 10Ottomata: [C: 03+1] "Ok! one more nit, but +1 from me after that, feel free to merge." (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 (owner: 10Joal) [14:14:27] elukey: the tables that hit Cassandra with - columns are smaller so we could fix that maybe, the others we would never take the time to reload [14:15:33] milimetric: oh yes I wasn't advocating for '-', I took it as an example [14:15:37] whatever is the best [14:16:15] yep, me neither, I agree with your approach, just needed more votes [14:16:26] (Jo and I are tied atm :)) [14:55:32] ottomata: about constants in scala- https://stackoverflow.com/questions/9745488/naming-convention-for-scala-constants [14:55:45] ottomata: I'm ok following any convention, we just need to pick one:) [14:58:33] joal: reading that...it seems ThisIsConstant [14:58:35] is the one we should go with [14:58:46] that pattern matching subtlety is crazy [14:58:51] ottomata: that one (ThisIsConstant) is the one suggested by scala [14:58:55] yeah [14:59:01] we can use the one we prefer [14:59:13] i don't love it, but i also don't love CamelCase no matter what :p [14:59:19] let's go with scala convention
[14:59:21] what do you think? [14:59:31] either that or THIS_IS_CONSTANT (to be consistent with java) [14:59:34] ThisIsConstant is matching type convention (uppercase camel-case), so pattern matching looks the same [14:59:36] but def not thisIsConstant [14:59:42] Makes sense [14:59:46] i wonder how THIS_IS_CONSTANT pattern matches :p [14:59:56] joal: i'm fine with either of those, whatever you prefer [15:00:29] Ok, going for scala way as it is scala [15:00:40] also ottomata, I'm gonna move the file to wikihadoop [15:00:52] where other file-formats belong [15:05:47] joal: ok [15:07:30] Pchelolo: did you have any other comments or thoughts on the reportingapi schema? [15:07:45] cdanis: no, I have been satisfied by your answers [15:07:53] I can have another pass if you want [15:07:54] ok! thanks for taking a good look :) [15:08:07] nah, I think not necessary, unless you want [15:08:18] cool. great schema :) [15:12:51] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: jsonschema-tools should have option to materialize schemas with default max/min validation for e.g. max long, max double, etc. - https://phabricator.wikimedia.org/T258659 (10Nuria) Making note so @fdans can work on adding bounds to schemas that need it whe... 
[15:14:11] ♥️ [15:21:46] (03CR) 10Nuria: [C: 04-1] "> Patch Set 1:" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/623456 (https://phabricator.wikimedia.org/T257691) (owner: 10Nuria) [15:22:14] (03PS2) 10Nuria: Removing seasonality cycle as it is fixed once granularity is set [analytics/refinery] - 10https://gerrit.wikimedia.org/r/623456 (https://phabricator.wikimedia.org/T257691) [15:22:58] (03PS5) 10Nuria: Chopping timeseries for noise detection [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/612454 (https://phabricator.wikimedia.org/T257691) [15:24:03] (03CR) 10Joal: Add BetweenTagsInputFormat to refinery-spark (032 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 (owner: 10Joal) [15:27:59] cdanis: just pushed another commit to your version PR [15:28:04] woot [15:28:13] yeah feel free, I don't really know what I'm doing in Node :) [15:28:18] if you like i will merge [15:28:42] ah that's much better! [15:28:45] please do merge ottomata [15:28:57] merged... [15:29:02] i'm going to set up some CI now [15:29:03] :p [15:31:12] um... 
ok joal I'll change the underscore for this to - [15:31:33] my logic is then it'll be 3 - and 3 _, so whoever has to add the next field will have a really hard time and I can laugh at them [15:32:00] ^ this is what happens when you let me make decisions :P [15:32:22] * joal cries of laugh and sadness :) [15:44:55] (03PS1) 10Milimetric: Fix typo in cassandra fields [analytics/refinery] - 10https://gerrit.wikimedia.org/r/624743 (https://phabricator.wikimedia.org/T238365) [15:45:49] (03PS1) 10Joal: Add BetweenTagsInputFormat to refinery-spark [analytics/wikihadoop] - 10https://gerrit.wikimedia.org/r/624745 [15:46:03] (03Abandoned) 10Joal: Add BetweenTagsInputFormat to refinery-spark [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/624645 (owner: 10Joal) [15:47:03] ottomata: added you to the new patch to wikihadoop - comments added to constants and constants renamed [15:53:43] joal: I'm getting this error when trying to drop table "local_group_default_T_editors_bycountry".data: [15:53:44] Warning: schema version mismatch detected, which might be caused by DOWN nodes; if this is not the case, check the schema versions of your nodes in system.local and system.peers. [15:53:57] but I did nodetool-a describecluster and nodetool-b too, and the schema versions look the same [15:54:01] any idea? [15:54:44] nope milimetric - looks uncool - could be on different machines - ping elukey on that --^ [15:58:01] similarly, select schema_version from local and select schema_version from peers both return the same [15:59:38] this is weird :( [16:00:13] (03CR) 10Ottomata: [C: 03+1] Add BetweenTagsInputFormat to refinery-spark [analytics/wikihadoop] - 10https://gerrit.wikimedia.org/r/624745 (owner: 10Joal) [16:00:58] oh... weird joal it deleted it, that was just a warning... [16:01:00] uh... [16:01:00] mmm as far as I know there are no down nodes [16:01:24] ok - table gone, problem solved :) [16:01:35] well... 
I guess I'll create it and cross my fingers and sacrifice some goats [16:01:53] milimetric: while you are at dropping tables and keyspaces, could you please drop the TEST table as well? [16:02:01] k [16:02:10] actually, the TEST keyspace altogether [16:02:16] thanks a lot :) [16:03:01] k, done, and looks like the new table's fine, updated the coordinator.properties, launching population coord now, we'll seeee [16:03:59] \o/ [16:04:03] you rock milimetric [16:11:18] not so fast... load failed... [16:11:23] https://hue.wikimedia.org/jobbrowser/jobs/job_1596639839773_148513/single_logs [16:11:54] I'm not sure about anyone else but I spend like 90% of my ops week trying to figure out how to find logs [16:12:07] this seems like a solved problem [16:13:41] milimetric: you have not updated the field-name in the loading job config [16:14:18] didn't this do that? https://gerrit.wikimedia.org/r/c/analytics/refinery/+/624743/1/oozie/cassandra/coord_editors_bycountry_monthly.properties [16:14:31] (that's the config I used) [16:14:53] milimetric: I looked at the job conf in hue [16:15:06] yeah, I'm looking too, it has activity-level... [16:15:24] the hive fields are activity_level, I didn't see where that was used... [16:15:39] hm you're absolutely right! [16:17:43] milimetric: you need to change the hive-field as well [16:18:03] I'm looking at the workflow right now, these are really cassandra-input-fields not hive_fields [16:18:08] milimetric: names are completely misleading - hive field is the name you give to a column, and it is reused by cassandra-field [16:18:16] correct milimetric [16:18:20] k, got it [16:18:43] past me had not done a good job on naming [16:19:29] (03PS2) 10Milimetric: Fix typo in cassandra fields [analytics/refinery] - 10https://gerrit.wikimedia.org/r/624743 (https://phabricator.wikimedia.org/T238365) [16:22:02] joal: what do you think about... "cassandra_input_from_hive_fields"? Too long? [16:22:18] cassandra_from_hive_fields?
I can push a patch for that and another change I wanted to make [16:22:41] milimetric: thinking [16:22:47] ok for 2 patches [16:22:51] for sure [16:22:56] about the name hm - [16:23:20] those fields are the name we give to tabular data [16:23:34] generated by hive [16:23:39] (the other change is I was going to try and see if oozie can send emails with analytics-alerts as the "to" instead of the oozie.eqiad.wmnet address, that way replying is easier) [16:23:50] hive_fields_for_cassandra? [16:24:14] would be nice milimetric (the email one) [16:24:52] fields_as_generated_by_hive ? [16:25:06] fields_from_hive [16:25:19] milimetric: --^ ? [16:25:24] (03CR) 10Milimetric: [V: 03+2 C: 03+2] "tested, works, launched in prod, merging" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/624743 (https://phabricator.wikimedia.org/T238365) (owner: 10Milimetric) [16:25:30] \o/ [16:25:55] joal: I feel like it needs "cassandra" so you know to make it match the cassandra_fields [16:26:32] milimetric: actually then names should always be the cassandra ones, as they are reused in CQL [16:26:44] or are they? [16:26:53] Did we need to do this rename? [16:28:15] 10Analytics, 10Analytics-Kanban: Fix cassandra/hyperswitch geoeditors field miscmatch - https://phabricator.wikimedia.org/T262017 (10Milimetric) p:05Triage→03High a:03Milimetric We did option 1 [16:29:06] joal: well, otherwise we'd have to change the mapping in AQS, from the request parameters [16:30:13] or you mean to rename hive_fields? [16:30:19] milimetric: yes you're right - I triple checked that [16:30:21] (no, we don't need to do that, it's just confusing) [16:30:24] milimetric: impromptu batcave? [16:30:27] sure!
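The `activity-level` vs `activity_level` incident above is a whole class of bug a mechanical check could catch before a load job runs. A hypothetical sanity check (not part of refinery; only `activity_level` comes from the incident, the other column names are illustrative):

```python
# Compare the field names in a cassandra loading config against the target
# table's actual columns, so a dash/underscore typo fails fast and loudly.
def field_mismatches(config_fields, table_columns):
    """Return config fields that have no matching cassandra column."""
    columns = set(table_columns)
    return [f for f in config_fields if f not in columns]

table = {"country", "activity_level", "editors_ceil"}      # illustrative schema
config = ["country", "activity-level", "editors_ceil"]     # the typo'd config
print(field_mismatches(config, table))  # -> ['activity-level']
```

Run against a real schema (e.g. pulled from `system_schema.columns` via cqlsh), this would have flagged the typo at config-review time instead of after three people dug through logstash.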
[16:38:25] k phew :) [16:38:27] :) [16:46:35] * elukey hates Hue [16:47:21] * elukey bbiab [16:51:43] (03PS1) 10Milimetric: Rename hive_fields to be more descriptive [analytics/refinery] - 10https://gerrit.wikimedia.org/r/624779 [16:52:56] ottomata: I think all the issues with the deploy are solved, so I was gonna deploy aqs slowly after lunch, and I can roll back if something goes wrong. But it's up to you, I'll wait for Monday if you think it's too risky [16:56:57] milimetric: if you do the deploy, let me know when; I'd like to follow along [16:57:46] def! Oh yeah, let’s do it then, you’re ops, you can have my back [17:01:25] :) [17:14:19] gone for tonight! [17:15:05] me too! [17:37:17] milimetric: i'm here for the afternoon, if you think its safe go ahead [17:42:57] k razzi, like... 10 minutes? [17:43:08] cool [17:52:57] ok, to the batcave! [18:05:52] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: jsonschema-tools should have option to materialize schemas with default max/min validation for e.g. max long, max double, etc. - https://phabricator.wikimedia.org/T258659 (10Ottomata) FYI had to add a few of fixes: - https://github.com/wikimedia/jsonschem... [18:06:22] cdanis: FYI added a patch to your schema and rebased with newer jsonschema-tools [18:11:46] !log aqs deploy went well! Geoeditors endpoint is live internally, data load job was successful, will submit pull request for public endpoint. [18:11:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:12:46] ottomata: ah thanks! 
didn't think to $ref the first example, clever [18:13:39] ottomata: re: network-error vs network_error -- I had kinda wanted to have the schema names match the strings used in the report `type` field, but that's not really necessary, it just seemed nice [18:14:08] I think I should add a test to avoid hyphens, the schema names should probably have the same rules as field names (excepting /) [18:32:32] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Add editors per country data to AQS API (geoeditors) - https://phabricator.wikimedia.org/T238365 (10Milimetric) pull request for public endpoints is here: https://github.com/wikimedia/restbase/pull/1273 [18:34:56] ottomata: anything left to do on that patch? shall you/I +2 and merge? [18:37:22] cdanis: i think its good! how ready are you to use it? past experience makes me lean towards not merging until just before usage; sometimes dev use will inform schema changes [18:38:35] ottomata: hm, I'll do one last local test of wikimedia-eventgate-dev.js receiving reports from actual Chrome [18:39:37] ok cool [18:41:59] POST /v1/events 201 All 5 out of 5 events were accepted. [18:42:01] :) [18:44:13] so I think we're good to go [18:44:52] there's future work where I'd like to talk about maybe doing some stream processing to add fields like geoIP country, AS number, the timestamp at which the event occurred (`meta.dt` minus `age` milliseconds, basically) but I don't think we need any of that immediately [18:46:48] because you added the http fragment and got the http.client_ip field, geocoding will be done for you in hive :o [18:46:51] (but not in the stream data :/ ) [18:47:19] ok cdanis merging [18:47:40] oh interesting [18:47:59] dumb q, how real-time is Hive? [18:48:06] lags a couple of hours usually [18:48:10] maybe 3 or 4 max [18:48:20] (in normal operations) [18:48:24] mmm [18:48:35] I think you had mentioned at some point that eventgate-logging-external would soon go to both logstash and Hive?
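The `meta.dt` minus `age` arithmetic mentioned at 18:44:52 (deriving when a NEL report's error actually occurred from when it was received) can be sketched as follows; `occurred_at` is a hypothetical helper for illustration, not part of any deployed code:

```python
from datetime import datetime, timedelta

def occurred_at(meta_dt: str, age_ms: int) -> datetime:
    """Approximate the time a report's event occurred: the receive
    timestamp (meta.dt, ISO 8601) minus the report's `age` field,
    which NEL expresses in milliseconds."""
    received = datetime.fromisoformat(meta_dt.replace("Z", "+00:00"))
    return received - timedelta(milliseconds=age_ms)

# a report received at 18:44:52 UTC with age=5000 occurred ~5s earlier
print(occurred_at("2020-09-04T18:44:52Z", 5000).isoformat())
```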
[18:48:45] if that's right, how soon? :) [18:48:48] ah, we could make it do that if we need to [18:48:59] petr was thinking about making the api request logging go to eventgate-logging external [18:49:02] right [18:49:12] but he decided not to, and it goes to eventgate-analytics (like the mw api logging) [18:49:16] ah I see [18:49:22] so, we don't have a plan to do it [18:49:25] but we could [18:49:34] we just need to mirror maker from kafka logging cluster to kafka jumbo cluster [18:49:35] it'd be nice to have realtime here in addition to aggregated data [18:49:58] we have done some simple realtime in druid stuff [18:50:25] druid has kafka integration and consumes realtime data, and then the historical batch job comes around later and fills in the history from hive [18:50:27] like for netflow? [18:50:29] (since streaming is less reliable) [18:50:29] right [18:50:30] yes exactly [18:50:37] we have to manually set that part up [18:50:40] but it is possible [18:50:45] is there a good way to look at individual events in druid? [18:50:53] hm no [18:50:54] not really [18:51:18] actually, we are just now for the first time discussing what to do about stream processing at wmf with the search team [18:51:25] since they are building the wdqs updater in flink [18:51:32] they'll need to have tier one support for it [18:51:37] and are even thinking about hiring for it [18:51:42] but, more generally [18:51:48] we're discussing what we want to do [18:51:53] and trying to look at real use cases [18:52:04] some are like theirs and will be prod data jobs [18:52:06] right [18:52:14] others might be more like yours, where we want to do streaming monitoring and alerting [18:52:20] i think SRE might have a lot of use cases like that [18:52:27] yeah, I think that's likely :) [18:53:08] cdanis: if you can think of some [18:53:11] please add to [18:53:11] https://phabricator.wikimedia.org/T185233#use-case-collection [18:54:06] thanks!
I'll take a look [18:54:35] and probably add a link to the NEL task w/ a short description [18:57:06] nice thank you [18:57:42] so ottomata I'm going on vacation for all of next week, but, before enabling this we need to enable the cors flag on whichever eventgate we're using (let's assume eventgate-logging-external) [18:58:04] would you have time for pushing a new build (with the patches up to the schema) and flipping that next week? [18:58:43] Sure, the cors settings can be set in the helm templates [18:58:49] can you tell me what you want them set to? [19:00:29] I just need an eventgate >= 1.3.2 that thinks app.conf.cors !== false [19:00:56] https://github.com/wikimedia/eventgate/blob/master/app.js#L104 [19:03:27] I'm guessing we just need some values flipped in the various eventgate-logging-external files under helmfiles.d ? [19:03:43] ya [19:04:13] anything in main_app.conf will be passed to service runner conf [19:06:03] ok! I'll write up a task [19:20:10] k danke [19:21:53] 10Analytics, 10Operations: Deploy an updated eventgate-logging-external with NEL patches - https://phabricator.wikimedia.org/T262087 (10Ottomata) [19:22:19] oh cdanis the stuff in values overrides the stuff in the template [19:22:23] so it is parameterized [19:22:29] ...pretty sure... [19:22:31] but ya [19:24:34] ahhh okay [19:24:38] well I'm happy to be wrong :D [19:26:00] 10Analytics, 10Operations: Deploy an updated eventgate-logging-external with NEL patches - https://phabricator.wikimedia.org/T262087 (10CDanis) [19:26:05] cdanis: what do you need for cors tho? [19:27:06] i can't say i'm well versed on good values for this [19:27:10] ottomata: I had just planned on '*', if that's what you mean [19:27:12] i think our default false will allow any?
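A hypothetical sketch of the kind of helmfile value flip being discussed, assuming `main_app.conf` is passed straight through to service-runner as stated at 19:04:13; the exact key layout in the eventgate chart values is an assumption:

```yaml
# hypothetical values override for eventgate-logging-external
main_app:
  conf:
    cors: '*'   # anything under main_app.conf reaches service-runner conf
```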
[19:27:19] does false not == '*' hmmm [19:27:31] uhh let me check again [19:27:44] https://github.com/wikimedia/service-template-node/blob/master/config.prod.yaml [19:27:45] says [19:27:51] # to disable use: [19:27:51] # cors: false [19:27:59] so maybe not? [19:28:01] interesting. [19:28:13] bearloga: yt? [19:28:28] have you been able to send events to eventgate from mobile apps before? [19:29:08] yeah so, on my local eventgate-wikimedia, I wasn't modifying config.dev.yaml at all -- so it was undefined and '*' [19:29:31] https://github.com/wikimedia/service-template-node/blob/b3b59baff9354b5701ba48b7c124106153fbb405/app.js#L98 [19:29:38] so then we sent `access-control-allow-origin: *` plus the allow-headers, expose-headers, and allow-methods, which Chrome was happy with [19:29:57] I don't really see harm in configuring eventgate-logging-external that way, but I could be missing something [19:30:09] (and am in fact kind of wondering how it works for client side JS logging as-is?) [19:30:36] if those cors headers don't get set, do browsers deny sending? [19:30:37] ohhh I didn't think about this interacting with service-runner heh [19:30:48] ottomata: yeah, Chrome will send an OPTIONS preflight and then not send the POST [19:30:51] i guess client side is currently all from mw client side [19:30:54] so same origin already [19:31:08] bearloga: is working on mobile app integration though [19:31:17] so this wouldn't work for them unless we set cors properly for eventgate-analytics-external i guess [19:31:19] well even on the MW side, it's different domains [19:31:25] that's true! [19:31:26] right [19:31:26] like en.wikipedia.org vs intake.wikimedia.org [19:31:30] huh.
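The semantics being worked out above (cors left undefined defaults to `'*'`, `cors: false` disables the header entirely, anything else is used verbatim) can be paraphrased roughly as follows; this is a sketch of the discussed behavior, not the actual service-template-node implementation:

```python
def cors_allow_origin(conf_cors):
    """Rough paraphrase of the discussed service-template-node logic:
    return the Access-Control-Allow-Origin value to send, or None if
    CORS is disabled and no header should be sent at all."""
    if conf_cors is False:
        return None      # cors: false -> CORS headers disabled
    if conf_cors is None:
        return "*"       # undefined -> allow any origin
    return conf_cors     # explicit value used as-is
```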
[19:32:07] and I don't see the CORS headers returned by `curl -v -X OPTIONS -H 'Origin: https://en.wikipedia.org' -H 'Access-Control-Request-Method: POST' -H 'Access-Control-Request-Headers: content-type' https://intake-logging.wikimedia.org/v1/events` [19:32:22] which models the preflight request that Chrome sends [19:33:00] heh was just trying to construct the same curl command ty [19:33:30] here, I'll dump full request/responses out of my ngrok that's still running [19:35:06] https://phabricator.wikimedia.org/P12494 [19:36:33] cdanis: maybe the eventlogging stuff is working now because it is using sendBeacon? [19:37:03] ohhhh hm [19:37:07] that does seem likely [19:37:15] https://fetch.spec.whatwg.org/#cors-safelisted-request-header [19:38:03] If mimeType’s essence is not "application/x-www-form-urlencoded", "multipart/form-data", or "text/plain", then return false. [19:38:35] pretty sure sendBeacon is using text/plain [19:39:06] that makes sense [19:39:09] 10Analytics, 10Operations: Deploy an updated eventgate-logging-external with NEL patches - https://phabricator.wikimedia.org/T262087 (10CDanis) Example request/responses of both preflight and actual request are in NDA'd paste P12494 (has my own PII in it) Chrome sends an OPTIONS request to the endpoint URL wi... [19:39:45] hah, it didn't use to stop you https://medium.com/@longtermsec/chrome-just-hardened-the-navigator-beacon-api-against-cross-site-request-forgery-csrf-690239ccccf [19:43:55] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1102.eqiad.w...
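The fetch-spec rule quoted at 19:38:03 explains why sendBeacon's text/plain payloads avoid the preflight that a cross-origin JSON POST triggers: only three MIME essences are CORS-safelisted for the Content-Type header. A minimal sketch of just that content-type check (a real preflight decision also weighs the method and any other non-safelisted headers):

```python
# MIME essences that are CORS-safelisted per the quoted fetch spec rule;
# any other Content-Type on a cross-origin request forces a preflight.
SAFELISTED = {"application/x-www-form-urlencoded", "multipart/form-data", "text/plain"}

def content_type_needs_preflight(content_type: str) -> bool:
    """True if this Content-Type alone would trigger an OPTIONS preflight."""
    # The "essence" is type/subtype with parameters (e.g. charset) stripped.
    essence = content_type.split(";")[0].strip().lower()
    return essence not in SAFELISTED

print(content_type_needs_preflight("text/plain;charset=UTF-8"))  # sendBeacon-style payloads
print(content_type_needs_preflight("application/json"))          # fetch POST of JSON
```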
[19:44:37] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1103.eqiad.w... [19:45:20] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1104.eqiad.w... [19:45:59] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1105.eqiad.w... [19:46:40] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1106.eqiad.w... [19:47:39] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1107.eqiad.w... [19:57:45] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1104.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-worker1... [20:02:35] nuria: have you seen this task? 
it is so cool [20:02:35] https://phabricator.wikimedia.org/T257527 [20:03:51] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1104.eqiad.w... [20:08:08] ottomata: INDEED [20:09:04] ottomata: "sampling fractions for each of failures and successes" nice [20:16:16] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1104.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-worker1... [20:17:23] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1113.eqiad.w... [20:17:26] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1112.eqiad.w... [20:17:34] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1111.eqiad.w... 
[20:17:38] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1110.eqiad.w... [20:17:43] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1109.eqiad.w... [20:17:47] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1108.eqiad.w... [20:20:49] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1102.eqiad.wmnet'] ` and were **ALL** successful. [20:23:24] 😊 [20:23:28] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1104.eqiad.w... [20:24:35] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1114.eqiad.w... 
[20:25:06] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1115.eqiad.w... [20:25:44] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1116.eqiad.w... [20:26:52] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` an-worker1117.eqiad.w... [20:26:56] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1107.eqiad.wmnet'] ` and were **ALL** successful. [20:35:03] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1106.eqiad.wmnet'] ` and were **ALL** successful. [20:35:16] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1103.eqiad.wmnet'] ` and were **ALL** successful. 
[20:35:52] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1105.eqiad.wmnet'] ` and were **ALL** successful. [20:55:02] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1112.eqiad.wmnet'] ` and were **ALL** successful. [20:55:31] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1110.eqiad.wmnet'] ` and were **ALL** successful. [20:55:35] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1108.eqiad.wmnet'] ` and were **ALL** successful. [20:55:37] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1109.eqiad.wmnet'] ` and were **ALL** successful. [21:00:39] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1104.eqiad.wmnet'] ` and were **ALL** successful. 
[21:04:58] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1115.eqiad.wmnet'] ` and were **ALL** successful. [21:05:31] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1116.eqiad.wmnet'] ` and were **ALL** successful. [21:17:38] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1113.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-worker1... [21:18:22] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1111.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-worker1... [21:25:16] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1114.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-worker1... [21:32:10] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1117.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-worker1... 
[21:44:03] ottomata: i had loads of thoughts that i added to the task, hopefully cdanis does not mind [23:30:17] nuria: thanks so much!! very appreciated :) I'm on vacation all next week but great to hear from you, probably won't have a detailed reply until I'm back [23:30:51] cdanis: sounds good, you let us know [23:40:37] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Add editors per country data to AQS API (geoeditors) - https://phabricator.wikimedia.org/T238365 (10Nuria) the typo means we need to backfill correct?