[05:50:48] good morning [05:51:44] (03CR) 10Elukey: [V: 03+2 C: 03+2] Release upstream version 0.36.0 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/598423 (https://phabricator.wikimedia.org/T249495) (owner: 10Elukey) [05:51:57] let's try again superset :) [05:52:08] !log attempt to upgrade Superset to 0.36 - downtime expected [05:52:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [06:17:50] !log superset upgraded to 0.36 [06:17:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [06:17:52] finally [06:19:03] 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics, 10Patch-For-Review: Upgrade to Superset 0.36.0 - https://phabricator.wikimedia.org/T249495 (10elukey) Superset upgraded! [06:43:40] now is matomo's turn [06:44:28] o/ [06:45:30] for https://gerrit.wikimedia.org/r/c/operations/puppet/+/598681/ I'd need to create /wmf/data/raw/wikidata/dumps/lexemes_ttl in hdfs as the analytics user [06:51:37] dcausse: o/ [06:52:24] dcausse: done [06:53:04] I can merge if you want [06:53:29] elukey: oh thanks! yes that'd be great if you have time! :) [06:56:18] dcausse: done! (and deployed) [06:56:27] thanks! [07:04:28] !log matomo upgraded to 3.13.5 on matomo1001 [07:04:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:05:56] Hi team [07:06:00] 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics, 10Patch-For-Review: Upgrade to Superset 0.36.0 - https://phabricator.wikimedia.org/T249495 (10elukey) [07:06:03] Internet is back, yay [07:06:10] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Investigate sporadic failures in oozie hive actions due to Kerberos auth - https://phabricator.wikimedia.org/T241650 (10elukey) 05Open→03Resolved [07:06:42] joal: bonjour [07:06:54] what's up elukey, what have I missed? 
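The directory creation dcausse asks for above can be expressed as the command a Hadoop admin would run on a client host. Only the path comes from the conversation; the `-p` flag and running via sudo as the `analytics` user are assumptions about how elukey did it, so treat this as a sketch:

```python
# Build (but don't run) the HDFS mkdir command dcausse requested above.
# The -p flag and sudo-as-analytics are assumptions, not the exact commands run.
import subprocess

def hdfs_mkdir_cmd(path, user="analytics"):
    """Return the argv list for creating an HDFS directory as a given user."""
    return ["sudo", "-u", user, "hdfs", "dfs", "-mkdir", "-p", path]

cmd = hdfs_mkdir_cmd("/wmf/data/raw/wikidata/dumps/lexemes_ttl")
# subprocess.run(cmd, check=True)  # uncomment on a Hadoop client host to actually run it
print(" ".join(cmd))
```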
[07:09:15] !log Rerun webrequest-druid-hourly-wf-2020-5-26-17 [07:09:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:10:35] joal: nothing interesting from my point of view, all normal [07:10:56] * joal note thast normal is not interesting to elukey :-P [07:11:01] Mwahaha [07:11:06] cool elukey [07:11:17] anything you'd like me to help with? [07:12:55] joal: nope all good, I just updated matomo and superset [07:13:01] nothing blew up [07:13:05] I've seen that - thanks a lot! [07:13:22] there's also turnilo, I'll work on it soon-ish [07:13:28] ack [07:13:50] do you want to make a coffee-cave for a minute? so that we share talking? [07:17:22] joal: sure, do you mind if I make coffe meanwhile we talk? [07:17:28] please :) [07:28:48] elukey: T251858 [07:28:50] T251858: hdfs-rsync of mediawiki history dumps fails due to source not present (yet) - https://phabricator.wikimedia.org/T251858 [07:58:07] fdans, elukey: if you have some time this morning I'd gladly accept CRs on those two patches before I deploy this afternoon :) Many thanks! 
https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/598706/ and https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/598722/ [08:41:25] 10Analytics, 10Performance-Team: Unable to access Kerberos keytab - https://phabricator.wikimedia.org/T253730 (10Gilles) [08:45:11] 10Analytics, 10Performance-Team: Unable to access Kerberos keytab - https://phabricator.wikimedia.org/T253730 (10Gilles) a:05Gilles→03None [08:47:12] 10Analytics, 10Performance-Team: Unable to access Kerberos keytab - https://phabricator.wikimedia.org/T253730 (10elukey) @Gilles the error is explained in the doc that you linked, you are missing `sudo -u analytics-privatedata` :) [09:08:33] (03CR) 10Elukey: [C: 03+1] "Don't have a lot of context on the table but it looks good :)" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/598706 (https://phabricator.wikimedia.org/T251749) (owner: 10Joal) [09:20:11] (03CR) 10Fdans: "just a question on numerical types, otherwise looks good to me" (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/598706 (https://phabricator.wikimedia.org/T251749) (owner: 10Joal) [09:25:33] (03CR) 10Joal: "Thanks for reviews :)" (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/598706 (https://phabricator.wikimedia.org/T251749) (owner: 10Joal) [09:26:00] (03CR) 10Fdans: [C: 03+1] Add page_restrictions table to sqoop script [analytics/refinery] - 10https://gerrit.wikimedia.org/r/598706 (https://phabricator.wikimedia.org/T251749) (owner: 10Joal) [10:26:56] * elukey lunch! 
[10:27:02] (03CR) 10Hashar: "recheck" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/598722 (https://phabricator.wikimedia.org/T252565) (owner: 10Joal) [10:27:04] (03CR) 10Hashar: "recheck" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/598706 (https://phabricator.wikimedia.org/T251749) (owner: 10Joal) [13:19:16] ottomata: o/ if it's possible to increase retention on jumbo mediawiki.revision.create up to 30days that'll help me to run a first test [13:24:45] ok sure, can you make a ticket and I'll do? just want a bit of a paper trail [13:24:49] dcausse: ^ [13:25:03] ottomata: sure! [13:37:05] 10Analytics, 10Wikidata, 10Wikidata-Query-Service: Increase retention for mediawiki.revision-create on the kafka jumbo cluster - https://phabricator.wikimedia.org/T253753 (10dcausse) [13:37:59] 10Analytics, 10Wikidata, 10Wikidata-Query-Service: Increase retention for mediawiki.revision-create on the kafka jumbo cluster - https://phabricator.wikimedia.org/T253753 (10dcausse) [13:39:35] dcausse: gonna do 31 days just to get full months [13:39:47] ottomata: thanks! [13:42:02] !log increased Kafka topic retention in jumbo-eqiad to 31 days for (eqiad|codfw).mediawiki.revision-create - T253753 [13:42:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:42:05] T253753: Increase retention for mediawiki.revision-create on the kafka jumbo cluster - https://phabricator.wikimedia.org/T253753 [13:42:16] 10Analytics, 10Wikidata, 10Wikidata-Query-Service: Increase retention for mediawiki.revision-create on the kafka jumbo cluster - https://phabricator.wikimedia.org/T253753 (10Ottomata) Did 31 days: ` $ kafka topics --alter --topic eqiad.mediawiki.revision-create --config retention.ms=2678400000 $ kafka topic... [13:43:23] hi team [13:43:47] hi mforns ! [13:44:08] joal: if you want to talk about mediawiki history metric explosions, I'm here [13:44:12] heya :] [13:46:38] mforns: q for ya. 
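As a sanity check on the retention change logged above: Kafka's `retention.ms` setting is in milliseconds, and 31 days (chosen to cover a full month) works out to exactly the value used in the `--alter` command:

```python
# 31 days expressed in milliseconds, the unit Kafka's retention.ms expects.
days = 31
retention_ms = days * 24 * 60 * 60 * 1000
print(retention_ms)  # -> 2678400000, matching the value in the task comment
```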
i've started implementing some topic canary and ingestion stuff like we discussed in java (it sucks but ok!) mostly because i figured it would be more useful there for things like stream processing later (making it easier to map from streams to e.g. spark schemas)...but I could also do it in python, which might make it easier to use in airflow? [13:46:42] dunno [13:46:51] i guess a java CLI could output what airflow might need? [13:50:12] reading [13:51:48] ottomata: what would the code do? generate canary events? [13:51:57] yes, and also identify what topics need to be ingested [13:52:00] via camus or gobblin or whatever [13:52:16] i also figured since those techs are java, it might be useful to be able to use some java funcs to do that for them [13:52:36] ottomata: it feels to me that the "figuring out which topics to import" part would be useful for airflow [13:52:39] yeah [13:52:47] but the canary.. not sure [13:52:55] hmm, it could, actually [13:53:03] it might be nice to schedule the canary events job via airflow [13:53:19] something will at least once an hour find all topics and streams and eventgate instances and figure out canary events and pots them [13:53:22] ok, makes sense [13:53:22] post them [13:53:41] and hm, doing it in java doesn't prevent us from doing that with airflow, right? [13:53:44] that won [13:53:53] the canary job doesn't have a source dataset, just a list of topics [13:54:13] no, we can execute java from airflow, in the worst case, we can use command line [13:54:16] so we don't need airflow there to react to any data presence, just do something periodically [13:54:17] bashoperator [13:54:18] yeah [13:54:36] i guess also if the java tooling just has some nice way to export the stream info it gathers via CLI [13:54:42] to json or something [13:54:45] we can always invoke it via python [13:54:49] and do whatever we need [13:54:58] yes, the "figuring out which topics to import" part would be used by more systems? [13:55:01] or jobs?
[13:55:37] or just the canary-generating one? [13:57:09] there would be two jobs that would use the same code [13:57:44] and why use java? [13:58:24] for canary generator, we need to: [13:58:24] - map kafka topics to stream names and eventgate-isntances [13:58:24] - lookup schema for stream and get examples [13:58:24] - modify examples event to canary event [13:58:24] - POST to canary eventgate [13:58:33] for topics to ingest we need to [13:58:49] - map kafka topics to active stream names (via eventgate instance stream config) [13:58:57] my reasoning for java was [13:59:18] A. camus/gobblin etc. are java and it might be easier to use them if we can implement some class that does what we need [13:59:54] and B. I want to make it easy to map from a stream name to Kafka topics AND their schemas, so that we can automate e.g. spark or flink streaming [13:59:55] so [14:00:19] for example [14:00:21] this kinda stuff [14:00:21] https://gist.github.com/ottomata/ec5cd742fc2d2e894126e31ddc34ebd3 [14:00:35] that's in python, but using java stuff [14:01:37] so in spark I'd like to have a helper that creates a structured data stream from just a stream name [14:01:46] right now you kind of have to manually construct it with a spark schema [14:02:28] hm, it also might make refine to hive nicer too, because we can map from stream name to schema, rather than having to read a single data event and get the schema uri [14:03:01] reading [14:04:13] mforns: also some pros cons on line 81 here [14:04:13] https://etherpad.wikimedia.org/p/analytics-ingestion [14:05:18] ottomata: yea, makes sense in java [14:05:30] def [14:06:10] the only drawback I think: - Using java for monitoring and ingestion outside of analytics stack not easy [14:06:13] ottomata: is eventbus stream config available anywhere for reading? similar to the stream-(secondary) schema pairs in wgEventStreams in wmf-config/InitialiseSettings.php but for primary schemas? 
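The canary-generator steps ottomata lists above could be sketched roughly as below. Every helper name, config shape and field here is a hypothetical illustration of the plan being discussed, not the real implementation:

```python
# Rough sketch of the four canary-generator steps discussed above.
# All names and config shapes are hypothetical.
import copy

def make_canary_event(example_event):
    """Step 3: turn a schema example event into a canary event."""
    canary = copy.deepcopy(example_event)
    canary.setdefault("meta", {})["domain"] = "canary"  # hypothetical marker field
    return canary

def generate_canaries(stream_config, get_schema_example, post_event):
    """Steps 1, 2 and 4: map each stream to its eventgate instance,
    look up a schema example, and POST the derived canary event."""
    for stream, cfg in stream_config.items():
        example = get_schema_example(cfg["schema_title"])
        post_event(cfg["eventgate_instance"], make_canary_event(example))
```

In the plan above this would run at least hourly (e.g. from Airflow), since it has no source dataset to wait on, just a list of topics and streams.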
I'd love to add those streams to https://mep-index.wmflabs.org/ (for helping keep track of which tables in hive are from mep vs legacy) [14:07:05] ottomata: sorry if I've messed up the terminology, hopefully you understand what I mean [14:07:09] wow! [14:07:39] you are a crazy man! you are cloning mw config and parsing it?!? [14:07:55] :) [14:07:59] cool! [14:08:45] hahaha :D thanks! [14:09:04] bearloga: this is something i'm struggling with too [14:09:14] mostly with myself in https://phabricator.wikimedia.org/T251609 [14:09:25] centralizing stream configs is complicated [14:09:32] but that is what i'm trying to do in that ticket [14:09:33] so [14:09:44] we don't have a centralized place of streams anywhere [14:09:55] the best we have is the list of kafka topics in kafka jumbo cluster [14:10:03] since that has all topics from all kafka's mirrored to it [14:10:07] so [14:10:13] what i'm going to do is [14:10:16] get that list of all topics [14:10:27] and query each eventgate endpoint and ask if it can produce that stream via stream config [14:10:33] some eventgates have static stream config [14:10:40] others have dynamic stream config from mw [14:10:48] it'd be nice if we could put all stream config in mw config [14:11:01] that is this ticket [14:11:01] https://phabricator.wikimedia.org/T251935 [14:11:40] maybe we'll do that one day, but it could be difficult [14:15:09] right, I remember you mentioning that all the tier 1 stream configs are hard-coded on eventgate intake for those and that we can't technically have them in mw because of cache invalidation problem + hitting mw api? 
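One alternative to cloning and parsing wmf-config, as discussed above, would be reading the dynamic stream config over HTTP. This sketch assumes the EventStreamConfig extension exposes its config via an `action=streamconfigs` MediaWiki API module; treat the endpoint and parameter names as illustrative rather than authoritative:

```python
# Build a request URL for reading stream config from the MediaWiki API,
# assuming an action=streamconfigs module exists (an assumption here).
from urllib.parse import urlencode

def streamconfigs_url(api="https://meta.wikimedia.org/w/api.php", streams=None):
    params = {"action": "streamconfigs", "format": "json"}
    if streams:
        params["streams"] = "|".join(streams)  # multi-value params are pipe-separated
    return api + "?" + urlencode(params)

print(streamconfigs_url(streams=["mediawiki.revision-create"]))
# The response would map stream names to their wgEventStreams settings.
```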
[14:15:38] i mostly jsut don't want to couple the prod eventgate instances to another remote api call [14:15:56] i could use mw config as the canonical source and somehow generate the static stuff before deploy for those [14:16:07] i'd be ok with that; as long as runtime eg doesn't reach out to get the stream config [14:16:09] but [14:16:13] https://phabricator.wikimedia.org/T251935#6110028 [14:17:16] "But if there is some bug that say causes EventBus to produce to eventgate-analytics" I realize this may be a dumb question [14:17:33] but how could something like that happen? [14:19:43] mediawiki does produce to both eventagte-main and eventgate-analytics, and will also produce to eventgate-analytics-external when we do T253121 [14:19:43] T253121: EventLogging Server Side client should POST to EventGate - https://phabricator.wikimedia.org/T253121 [14:20:07] if there was some misconfiguration and EventBus accidentally had a stream configured to produce to eventgate-analytics somehow [14:20:09] it would [14:24:37] bearloga: o/ - I am going to merge your change if you are around [14:25:16] ottomata: i'll be honest and admit I don't know the full extent of our CI capabilities but I wonder if that's the kind of safety check that could be instrumented into CI? [14:25:18] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/596221/ [14:25:21] elukey: I'm around :) [14:25:35] thank you! [14:25:58] bearloga: merged! [14:27:31] elukey: grazie! [14:28:05] ahahha :D [14:29:06] elukey: does https://gerrit.wikimedia.org/r/595540 need anything else? [14:29:28] used to be https://gerrit.wikimedia.org/r/c/operations/puppet/+/589320 [14:29:53] bearloga: I'll do another pass today, sorry for the lag, if good I'll merge (I'll need to create keytabs and deploy too, but will not take much) [14:32:47] elukey: ah, okay! thanks! I decided to split it up into multiple patches. this one is just a small one that adds the system user. 
i'll need to pair with you on using role::statistics::explorer. i'll make a phab task with details [14:33:43] bearloga: if we did it, i'd like to find a nice way to keep each eventgate enforcing that it only allows certain streams [14:34:06] so, we'd need stream config to somehow specify that... or hmmmm [14:34:14] i guess we could just hardcode that in eventgate config.... [14:34:23] it just has a static list of allowed streams...hmmmm [14:34:26] hm, no [14:34:55] we need eventgate-analytics-external to have dynamic stream names too [14:35:07] so yeah, the streams config would have to somehow say what eventgate instance was allowed [14:35:15] or have someway to indicate that fact [14:37:21] ottomata: i actually like the idea of specifying destination aka intake url for each stream in the stream config [14:37:50] i think that could be useful too, but i think we couldn't use the url for this, it is a different url depending on where you are [14:37:53] e.g. [14:38:24] eventgate-analytics-external.discovery.wmnet, intake-analytics..wikimedia.org, eventgate-analytics-external.svc.eqiad.wmnet [14:38:26] are all the same [14:38:32] but we could use a name [14:38:39] it just feels brittle.... [14:38:44] maybe we could abstract it a bit [14:38:49] and make some tags config entry? [14:39:12] and eventgate could have some logic that says the stream only allowed if the stream has some X value in tags? [14:39:29] so eventgate-main would be configured to only allow streams that have 'eventgate-main' or something in tags [14:39:33] i dunno... [14:39:54] right, we don't need to specify url as long as we can specify something that can be used to lookup a url [14:40:07] dude i like that tagging idea [14:40:13] like, a lot [14:40:17] hm, i mean, a URL is ok for e.g. 
destination [14:40:27] but just not for eventgate enforcing what streams are allowed [14:42:27] the nice thing about the tag idea is that it can be used for destination + eventgate guarding, but can also be used for other things down the line like using tags to enable specific behaviors and properties [14:43:04] and boom, everything can be centralized into a single canonical source of truth [14:43:14] yeah something like that might work [14:43:20] it would def make the stuff i'm doing now much easier... [14:43:21] hmmm [14:43:27] maybe we should revisit the centralizing idea... [14:43:34] esp if it would help you with stuff, and not just me [14:44:28] fewer independently maintained hardcoded pieces of information would be less tech debt and less of a maintenance burden in long term [14:44:43] as you know [14:44:56] yeah [14:44:57] hm [14:45:02] i wonder.... [14:45:16] so i need a bunch of logic to do T251609 [14:45:17] T251609: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 [14:45:24] in addition to all existing stream configs [14:45:38] i wonder if i could/should bake that logic into EventStreamConfig extension [14:45:46] it would make ESG more opinionated [14:45:58] instead of just giving you the stuff from wgEventStreams [14:46:20] but if it had the ability to say, give you the latest schema URI for a stream, or even get the schema for you...
[14:46:21] hm [14:46:37] HMMM [14:46:38] actually [14:46:58] hardcoding the latest schema URI (or maybe just A schema uri) into the stream config would do that [14:47:01] so [14:47:05] add an entry to stream config [14:48:16] schema_uris: [/analytics/legacy/searchsatisfaction/latest], or even, schema_uris: [https://schema.wikimedia.org/secondary/jsonschema/analytics/legacy/searchsatisfaction/latest] [14:48:28] i can map from schema_title to a schema_uri, but that is just due to convention [14:48:45] having explicit URIs would make things easier. [14:48:46] hmMmmmmm [14:54:46] ooooooooh [14:59:27] so, in summary: each stream in wgEventStreams would have three pieces of info: stream name, schema uri, tags. EventStreamConfig then has logic to direct events to the correct intake based on the stream's tagging, EventGate instances use the tagging to accept some streams and reject others, and use the schema uri in this one centralized place to look up the schema. [14:59:33] correct? [15:01:37] joal: standup? [15:04:22] yes sorry [15:17:35] 10Analytics, 10Better Use Of Data, 10Event-Platform: Document in-schema who sets which fields - https://phabricator.wikimedia.org/T253392 (10mpopov) Thank you for the clarification! Maybe we can come back to this later if we see that there's still a lot of confusion around who sets what. [15:20:58] bearloga: kinda, i don't think we'd use eventstreamconfig to do any direction; clients should still configure where to POST the events; they could USE the stream config entries to do that if we support that. [15:21:11] but that would be optional [15:26:28] ottomata: so the instrumentation would have an intake url dictionary at the top mapping tags to destinations and then logEvent uses that to look up where to POST an event?
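The "stream name + schema uri + tags" summary above could look roughly like this as centralized config plus two helpers: one for the EventGate-side "accept or reject by tag" check, one for the client-side intake lookup bearloga describes. All field names, tags and URLs here are illustrative, not the real config:

```python
# Hypothetical centralized stream config with explicit schema URIs and tags.
# URLs are placeholders, not real service endpoints.
INTAKE_URLS = {
    "eventgate-main": "https://eventgate-main.example/v1/events",
    "eventgate-analytics-external": "https://intake-analytics.example/v1/events",
}

stream_config = {
    "mediawiki.revision-create": {
        "schema_uris": ["/mediawiki/revision/create/latest"],
        "tags": ["eventgate-main"],
    },
    "searchsatisfaction": {
        "schema_uris": ["/analytics/legacy/searchsatisfaction/latest"],
        "tags": ["eventgate-analytics-external"],
    },
}

def allowed_streams(config, instance_tag):
    """Streams a given eventgate instance would accept under the tag idea."""
    return [s for s, c in config.items() if instance_tag in c.get("tags", [])]

def intake_for(stream_cfg):
    """Client-side routing: pick the intake URL from the stream's tags."""
    for tag in stream_cfg.get("tags", []):
        if tag in INTAKE_URLS:
            return INTAKE_URLS[tag]
    raise ValueError("no intake configured for this stream's tags")

print(allowed_streams(stream_config, "eventgate-main"))
```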
[15:29:41] (for example) [15:37:28] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Performance-Team (Radar): Convert WikimediaEvents to use ResourceLoader packageFiles - https://phabricator.wikimedia.org/T253634 (10jlinehan) [16:10:15] * elukey afk for a bit! [16:19:27] 10Analytics, 10Performance-Team: Unable to access Kerberos keytab - https://phabricator.wikimedia.org/T253730 (10Gilles) 05Open→03Resolved a:03Gilles Duh, thanks [16:41:44] 10Analytics: Web publication doesn't work - https://phabricator.wikimedia.org/T253661 (10jwang) I did some experiments. Seems that I can publish new files but cannot overwrite the existing files. Also manual sync-up didn't reflect immediately. I am wondering how often the auto sync up is at stat1005. Thanks, J... [17:12:03] Hi a-team, I went to stop my server on nb3 but wasn't able to log into jupyter on nb3. I had this same issue on nb4 where I was going to pull the last file that didn't make it through on my last rsync. [17:17:50] iflorez: hi! what file wasn't rsynced? If it is the venv, you'll have to recreate from scratch on the target host (so that one should not be copied) [17:19:07] bearloga: ya perhaps! I mean, destination url might be ok. but what you say is how eventbus works now. e.g. https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/master/wmf-config/ProductionServices.php#47 [17:19:19] in this case that URL is the local envoy tls proxy [17:19:21] one of my GLOW folders didn't make it through [17:19:25] so ya, it gets weird :p [17:19:46] bearloga: another issue I had forgotten: stream config has regex stream names, which makes figuring out the complete list of streams a little difficult [17:19:58] but i think we might just ignore that for now and not worry about the regexes [17:20:05] i think i'm going to reopen the ticket about centralizing config...
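The regex problem mentioned above can be made concrete: if wgEventStreams keys can be regexes, building the complete stream list means matching concrete (datacenter-prefixed) Kafka topic names against each pattern. The patterns, topics and prefix handling below are toy examples; the real config delimits its regexes differently:

```python
# Toy illustration of resolving regex stream names against concrete topics.
import re

def stream_patterns_matching(topic, patterns, dc_prefixes=("eqiad.", "codfw.")):
    """Strip the datacenter prefix, then full-match against each pattern."""
    stream = topic
    for prefix in dc_prefixes:
        if topic.startswith(prefix):
            stream = topic[len(prefix):]
            break
    return [p for p in patterns if re.fullmatch(p, stream)]

patterns = [r"mediawiki\.job\..+", r"mediawiki\.revision-create"]
print(stream_patterns_matching("eqiad.mediawiki.job.refreshLinks", patterns))
```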
[17:20:06] BTW [17:20:13] bearloga: schema.wikimedia.org is an API [17:20:15] not just a gui [17:20:24] so you don't have to clone the schema repos if you don't want to [17:20:49] https://schema.wikimedia.org/repositories/ [17:20:59] it is a json based file listing api [17:21:21] https://schema.wikimedia.org/repositories/secondary/jsonschema/analytics/legacy/ [17:21:32] you can traverse the tree via the api directly [17:45:40] ottomata: that's how i was originally going to do it but decided that doing dozens of requests on every page load was not satisfactory [17:48:30] ottomata: i do love that it's an api, though! :D [17:48:54] success, thank you elukey with the permissions issue. [17:48:57] bearloga: i'd love to see some version of what you are writing deployed in an official place, maybe even schema.wm.org, who knows? [17:49:27] we also want to do data discovery / governance next year so that would be part of that project I thhikn [17:54:54] ottomata: oh that's a good idea! i think a more fleshed out version of the index would be a great addition to that project [17:56:43] * elukey off! [18:26:24] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935 (10Ottomata) 05Declined→03Open Had a discussion with @mpopov about this in IRC today, and then discussed it more with the Analyti... 
[18:26:29] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, 10Patch-For-Review: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 (10Ottomata) [18:26:31] 10Analytics, 10Analytics-EventLogging, 10Event-Platform, 10CPT Initiatives (Modern Event Platform (TEC2)), and 2 others: Refactor EventBus mediawiki configuration - https://phabricator.wikimedia.org/T229863 (10Ottomata) [18:36:37] here we go, deploy-time :) [18:37:03] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/598706 (https://phabricator.wikimedia.org/T251749) (owner: 10Joal) [18:38:35] (03PS2) 10Joal: Update sqoop table script for schema changes [analytics/refinery] - 10https://gerrit.wikimedia.org/r/598722 (https://phabricator.wikimedia.org/T252565) [18:38:53] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/598722 (https://phabricator.wikimedia.org/T252565) (owner: 10Joal) [18:39:23] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935 (10Ottomata) Hm, an interesting extension of this idea would be to make the eventgate-wikimedia's `schema_title` checking more flexib... [18:41:59] !log Deploying refinery with scap [18:42:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:54:08] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935 (10Pchelolo) > We'd still need to figure out how to make eventgate-wikimedia generate static stream config from EventStreamConfig API... 
[18:56:35] 10Analytics, 10CheckUser, 10Core Platform Team Workboards (Clinic Duty Team), 10Patch-For-Review, 10Schema-change: Schema changes for `cu_changes` and `cu_log` table - https://phabricator.wikimedia.org/T233004 (10dbarratt) [18:57:34] 10Analytics, 10Wikidata, 10Wikidata-Query-Service: Increase retention for mediawiki.revision-create on the kafka jumbo cluster - https://phabricator.wikimedia.org/T253753 (10JAllemandou) An idea: How about sending back to kafka the update stream and make THAT one retention higher? Moving retention to 30 days... [18:59:50] ottomata: heya - I have an issue with deploy using scap :( [19:00:41] hello! [19:00:42] ok [19:00:44] wassup? [19:00:54] ottomata: error on an-launcher1001, git-fat throwing up a lot [19:01:46] ottomata: 5 hosts deployed correctly, 1 failed - shall I rollback or redeploy manually the failed one? [19:02:20] lets manually redeploy [19:02:36] joal: an-launcher disk is full [19:02:49] I would have guessed that [19:02:57] PROBLEM - Check the last execution of reportupdater-wmcs on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-wmcs https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [19:03:08] ok so no rollback [19:03:57] PROBLEM - Check the last execution of reportupdater-browser on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-browser https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [19:04:09] PROBLEM - Check the last execution of reportupdater-reference-previews on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-reference-previews https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [19:04:19] PROBLEM - Check the last execution of reportupdater-flow-beta-features on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-flow-beta-features 
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [19:04:19] PROBLEM - Check the last execution of reportupdater-page-creation on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-page-creation https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [19:04:28] hm [19:04:28] right - obviously an-launcher starts breaking because of disk :S [19:04:31] PROBLEM - Check the last execution of reportupdater-edit-beta-features on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-edit-beta-features https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [19:04:31] hah yeah [19:04:37] PROBLEM - Check the last execution of reportupdater-ee-beta-features on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-ee-beta-features https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [19:04:40] it should recover i just freed some space [19:04:46] ack ottomata [19:04:58] ottomata: shall I redeploy using -l ? [19:05:06] or first wait for stuff to settle down. [19:05:37] ok should be enough space now [19:05:42] go ahead deploy with -l joal [19:05:47] ack ottomata [19:05:52] we will need to fix this though, / is only 100G [19:05:53] PROBLEM - Check the last execution of reportupdater-published_cx2_translations on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-published_cx2_translations https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [19:06:01] PROBLEM - Check the last execution of reportupdater-structured-data on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-structured-data https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [19:06:18] oh 18G in home!
[19:06:25] meh [19:06:34] elukey: 9.1G [19:06:41] PROBLEM - Check the last execution of reportupdater-mt_engines on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-mt_engines https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [19:06:45] otto 7.6 G [19:06:46] tsk tsk tsk [19:06:51] !log Deploy refinery using scap to an-launcher1001 only [19:06:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:09:53] PROBLEM - Check the last execution of reportupdater-interlanguage on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-interlanguage https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [19:11:25] PROBLEM - Check the last execution of reportupdater-language on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-language https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [19:12:01] PROBLEM - Check the last execution of reportupdater-ee on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-ee https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [19:12:01] PROBLEM - Check the last execution of reportupdater-cx on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-cx https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [19:12:10] uou [19:15:10] I confirm deploy succeeded on an-launcher1001 - will check on host for possibly non-downloaded jars (usual error after broken deploy) [19:16:18] looks all good - great :) [19:16:42] Thanks a lot ottomata [19:16:54] ottomata: is there anything special we should do about the alerts? 
[19:22:09] !log restart failed services on an-launcher1001 [19:22:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:23:19] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935 (10Ottomata) > We'd still need to figure out how to make eventgate-wikimedia generate static stream config from EventStreamConfig API... [19:23:43] joal: hm, not sure i'd expect them to just recover? [19:23:47] but restarting them is good oo? [19:24:10] ottomata: I restarted all reportupdater services manually - I hope it will recover [19:24:44] !log Deploy refinery onto hdfs [19:24:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:39:29] PROBLEM - Check the last execution of reportupdater-published_cx2_translations_mysql on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-published_cx2_translations_mysql https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [19:53:54] !log Start pageview-complete dump oozie job after deploy [19:53:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:57:26] 10Analytics, 10Analytics-Kanban: Add page_restrictions table to hive - https://phabricator.wikimedia.org/T253803 (10JAllemandou) [19:57:43] 10Analytics, 10Analytics-Kanban: Add page_restrictions table to hive - https://phabricator.wikimedia.org/T253803 (10JAllemandou) a:03JAllemandou [20:00:03] RECOVERY - Check the last execution of reportupdater-structured-data on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-structured-data https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [20:12:41] PROBLEM - Check the last execution of reportupdater-structured-data on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-structured-data 
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [21:53:42] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935 (10Ottomata) Oh ho, if I do implement this as an API param in EventStreamConfig, then we don't need any extra eventgate configs; we c...