[00:01:36] Analytics, Operations: setup/install eventlog1002.eqiad.wmnet - https://phabricator.wikimedia.org/T185667#3929813 (Ottomata) @Robh, sorry I missed your ping on this, Trusty please! :)
[00:04:48] tgr: I think that the kafka-jumbo beta hosts have disks filled up, so no data is imported by eventlogging02 :(
[00:26:17] Analytics, Operations: setup/install eventlog1002.eqiad.wmnet - https://phabricator.wikimedia.org/T185667#3929857 (faidon) Trusty has about a year left of upstream support, and likely less for our own purposes. Any reason to not switch to somewhere more recent while we're at it?
[00:39:41] Analytics, Analytics-EventLogging, Beta-Cluster-Infrastructure: EventLogging broken in beta - https://phabricator.wikimedia.org/T185952#3929484 (elukey) I think that the kafka-jumbo hosts have their disk filled up, so el is not getting any new events :(
[00:39:42] Analytics, Operations: setup/install eventlog1002.eqiad.wmnet - https://phabricator.wikimedia.org/T185667#3929884 (Ottomata) We are holding for Kubernetes! :) When it is ready, we will move the many individual processes (which are managed and monitored via a custom upstart based eventloggingctl scripts...
[02:05:40] tgr: sorry, just saw this
[02:07:49] nuria_: filed T185952 about it in the meanwhile, it's not that urgent
[02:07:50] T185952: EventLogging broken in beta - https://phabricator.wikimedia.org/T185952
[02:10:34] tgr: we are trying a new setting for kafka to force it to drop old data more frequently, but it is somehow not working as expected
[03:17:45] Analytics, Analytics-EventLogging, Beta-Cluster-Infrastructure: EventLogging broken in beta - https://phabricator.wikimedia.org/T185952#3929484 (Tbayer) The problem seems to be more general: it appears that all the EventLogging tables in Hive (in the new "event" database) are likewise lagging behind,...
[05:09:47] Analytics, Analytics-EventLogging, Beta-Cluster-Infrastructure: EventLogging broken in beta - https://phabricator.wikimedia.org/T185952#3930141 (elukey) So Andrew and I tried to check why the current kafka topic retention policy is not applied: ``` # The minimum age of a log file to be eligible for...
[05:15:53] Analytics, Analytics-EventLogging, Beta-Cluster-Infrastructure: EventLogging broken in beta - https://phabricator.wikimedia.org/T185952#3930142 (elukey) >>! In T185952#3930033, @Tbayer wrote: > The problem seems to be more general: it appears that all the EventLogging tables in Hive (in the new "even...
[05:16:16] Analytics, Analytics-EventLogging, Beta-Cluster-Infrastructure, User-Elukey: EventLogging broken in beta - https://phabricator.wikimedia.org/T185952#3930143 (elukey)
[05:46:45] Analytics, Analytics-EventLogging, Beta-Cluster-Infrastructure, User-Elukey: EventLogging broken in beta - https://phabricator.wikimedia.org/T185952#3930153 (Tbayer) >>! In T185952#3930142, @elukey wrote: >>>! In T185952#3930033, @Tbayer wrote: >> The problem seems to be more general: it appears...
[05:55:15] Analytics, Analytics-EventLogging, Beta-Cluster-Infrastructure, User-Elukey: EventLogging broken in beta - https://phabricator.wikimedia.org/T185952#3930164 (elukey) I don't think the two things are related, but I'll triple check with Andrew tomorrow.
[11:00:06] Analytics-Kanban, EventBus, Pywikibot-core, Patch-For-Review: EventStreams doesnt find any messages anymore - https://phabricator.wikimedia.org/T184713#3930393 (Xqt) sseclient>=0.0.18 is required for py2
[11:35:02] Analytics-Kanban, EventBus, Pywikibot-core, Patch-For-Review: EventStreams doesnt find any messages anymore - https://phabricator.wikimedia.org/T184713#3930437 (Xqt) I guess I have it. sseclient >= 0.0.18 and requests >= 2.9 should solve this problem.
[14:26:15] (CR) Joal: [C: 1] "Comments about using a list of functions instead of a single one, and also a possible unwanted change. Overall looks great :)" (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/405800 (https://phabricator.wikimedia.org/T185237) (owner: Ottomata)
[15:12:13] elukey, ottomata[m]: there's a disk space warning in Icinga for meitnerium, disk space usage is at 94% for the root partition with /var/lib/archiva, with /var/lib/archiva using 32.5 out of 49 GB
[16:00:38] moritzm: ack, will check with Andrew in a bit!
[16:24:48] Thanks a lot for that elukey
[16:45:48] Analytics, Analytics-EventLogging, Beta-Cluster-Infrastructure, User-Elukey: EventLogging broken in beta - https://phabricator.wikimedia.org/T185952#3930959 (Tbayer) Thanks @elukey - yes, this was just a guess, based on the fact that the Hive tables stopped updating on the same day (January 26)....
[16:47:09] Hi elukey - is there something I can help with?
[17:14:06] joal: o/
[17:15:47] Analytics, Analytics-EventLogging, Beta-Cluster-Infrastructure, User-Elukey: EventLogging broken in beta - https://phabricator.wikimedia.org/T185952#3931069 (elukey) I truncated all the eventlogging topic partitions on the deployment-kafka-jumbo hosts and restarted the brokers, let's see if the l...
[17:16:04] joal: we have just tried to fix --^ but nothing super urgent
[17:16:27] all good from your side?
[17:19:29] Analytics, Analytics-EventLogging, Beta-Cluster-Infrastructure, User-Elukey: EventLogging broken in beta - https://phabricator.wikimedia.org/T185952#3931091 (Tgr) Thanks! Event logging in beta seems to work fine now.
[17:21:54] I’m fiddling around with the mediawiki_history table in the Data Lake, and it seems that “revision_deleted_timestamp” for a revision that creates a page is set to the timestamp of the last revision before deletion. Is that by design?
Based on the documentation, I’d think it would reflect the page deletion event in the logging table instead?
[17:22:20] elukey: super :)
[17:23:31] joal: there is something not super urgent though, namely archiva getting to 94% disk usage
[17:23:38] Hi nettrom - The deletion timestamp value on revisions depends on if we have managed to extract the deletion timestamp from the logging table
[17:23:50] elukey: I've seen that ... Crap :(
[17:25:03] joal: it is a ganeti host so we can just probably stop the host, expand the root partition, start again
[17:25:36] joal: I am a bit ignorant about archiva but would it be fine to shutdown it anytime
[17:25:39] ?
[17:25:44] or do we need to do something before?
[17:25:57] joal: hmmm, okay! does that mean that the identification of the deletion event never relies on page ID being the authoritative identifier?
[17:26:05] elukey: I think if archiva is down it means our build system is not operational anymore
[17:26:15] elukey: fine for some time, but maybe not for long
[17:26:23] I’ll dig a little further, because IIRC it’s a problem for both older and more recent pages
[17:26:33] elukey: This issue goes with the task about cleaning jars in refinery
[17:28:07] Nettrom: This is the piece of code that does the thing: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/mediawikihistory/denormalized/DenormalizedRevisionsBuilder.scala#L71
[17:29:34] joal: thanks for the link, I’ll go take a look :)
[17:29:39] joal: yep definitely, but 40G of root partition is too low..
getting to 100G seems reasonable
[17:29:42] opening a task
[17:29:46] Nettrom: If we have managed to correctly match a delete timestamp for the page, we use it - otherwise, we use last-archived revision timestamp, otherwise current revision timestamp
[17:31:13] Nettrom: If it happens broadly, it's probably due to issues building correct deletion/restore history of pages (those are really kinda complicated)
[17:34:15] Analytics-Kanban, Operations, User-Elukey: Expand meitnerium's root partition to 100G - https://phabricator.wikimedia.org/T186020#3931188 (elukey)
[17:34:18] joal: --^
[17:34:57] Thanks elukey - by the way - Have you silenced the alarm?
[17:38:01] joal: only a warning for the moment, should be good
[17:40:47] Analytics-Kanban, Operations, User-Elukey: Expand meitnerium's root partition to 100G - https://phabricator.wikimedia.org/T186020#3931235 (elukey) Looking to https://wikitech.wikimedia.org/w/index.php?title=Ganeti#Resize_a_VM, it might be less painful to create a new disk, format it and then use it a...
[17:54:23] Analytics-Kanban, Operations, User-Elukey: Expand meitnerium's root partition to 100G - https://phabricator.wikimedia.org/T186020#3931334 (MoritzMuehlenhoff) Yeah, it's probably easiest to add a new disk and move /var/lib/archiva to it
[18:03:45] joal, Nettrom : let's please document these findings on data set on some FAQ so we do not have to remember the details
[18:05:11] joal: would here be a good place? https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_page_history#FAQ
[18:05:19] cc Nettrom
[18:14:09] joal & nuria_ : I’m not sure about this. Using page creations from July 2015, where page IDs should work as authoritative identifiers, I find no deletion timestamps in the data lake match the logging table. Even found one where the “revision_deleted_timestamp” is the patrol timestamp from the logging table. I’ll work on documenting this and open a ticket on Phab.
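
Editor's note: the fallback order joal describes at 17:29:46 can be read as the sketch below. It is only an illustration of that one sentence, not the actual logic in DenormalizedRevisionsBuilder.scala; the function name and all three parameter names are hypothetical, and timestamps are assumed to be plain strings with `None` meaning "not available".

```python
from typing import Optional


def pick_revision_deleted_timestamp(
    matched_page_delete_ts: Optional[str],
    last_archived_revision_ts: Optional[str],
    current_revision_ts: Optional[str],
) -> Optional[str]:
    """Return the first available timestamp, in the order described in the chat.

    All names here are hypothetical; this mirrors the chat description only.
    """
    candidates = (
        matched_page_delete_ts,      # 1. delete event matched for the page
        last_archived_revision_ts,   # 2. otherwise, last archived revision
        current_revision_ts,         # 3. otherwise, the revision's own timestamp
    )
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None
```

Under this reading, a revision whose page deletion could not be matched would carry the timestamp of the last archived revision, which is consistent with what Nettrom reports at 17:21:54.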
[18:14:39] Nettrom: super thanks
[18:20:39] ottomata: thoughts about the EL lag on Hive https://phabricator.wikimedia.org/T185952#3930033 ?
[18:24:43] Thanks a lot Nettrom
[19:43:20] Analytics-Kanban, Operations, User-Elukey: Expand meitnerium's root partition to 100G - https://phabricator.wikimedia.org/T186020#3931925 (elukey) So meitnerium seems to be on ganeti1005 that has a ton of disk space free, so in theory the only thing needed to create the new disk would be the followin...
[20:00:49] when I put messages to kafka, where the timestamp comes from? is it the time when kafka received the message or can I set it somehow?
[20:03:08] SMalyshev: IIRC the producer is responsible for the timestamp, so it either sets up when producing or you can set it explicitly
[20:03:14] but I'd need to recheck
[20:04:05] is there a way to make kafkacat set timestamps?
[20:05:13] not that I know, it might be something to add as a feature
[20:05:33] (it is a relatively new feature for kafka producers)
[20:06:22] hmm... is there any other tool that can copy kafka streams maybe?
[20:10:50] kafka-console-producer doesn't have any mentions of timestamps either :(
[20:14:36] Hi SMalyshev - My understanding of kafka timestamp is that it is an insertion-timestamp
[20:15:07] SMalyshev: If you're after a consumption-timestamp, I think you need to handle it by yourself
[20:16:00] joal: what I want is this: I have a set of messages in prod kafka cluster, and I have a tool that accesses them using offsetsForTimes
[20:16:19] I want to test it, which requires recreating the stream on my local kafka
[20:16:34] I can recreate the messages, but I can not recreate the timestamps
[20:17:10] if I just dump it and put it into my kafka, all messages would be in the same second, so I can not test offsetsForTimes functionality
[20:18:03] SMalyshev: Just double checking I understand the offsetsForTimes function - I'm assuming you're willing to consume only from a certain timestamp
[20:18:35] * elukey defers to Joseph's wisdom
[20:19:06] * joal is a bit afraid of elukey's view-points
[20:19:58] SMalyshev: My way to recreate a topic with fake timestamps would be to write a simple producer to kafka that generates timestamps as I wish
[20:21:18] SMalyshev: Would my idea be an acceptable solution?
[20:27:00] joal: but kafkacat cannot do it right now afaik, so it should be a producer written manually by SMalyshev (in python/etc..)
[20:27:40] correct elukey - That's the only way I can think of - Or, second idea, slow down insertion through scripting of kafkacat reading
[20:28:36] SMalyshev: thoughts?
[20:46:25] Gone for tonight lads
[21:47:06] joal: yeah that would probably work but I was hoping to avoid it and use standard tools
[21:47:25] but looks like I have to do it...
thanks for the advice
[22:35:35] (PS4) Fdans: Launch top pageviews by country on dashboard [analytics/wikistats2] - https://gerrit.wikimedia.org/r/405708 (https://phabricator.wikimedia.org/T185510)
[23:28:33] Analytics, Analytics-EventLogging, Beta-Cluster-Infrastructure, User-Elukey: EventLogging broken in beta - https://phabricator.wikimedia.org/T185952#3932886 (elukey) Open>stalled
[23:30:00] Analytics-Kanban, Patch-For-Review: Launch top per country pageviews on UI - https://phabricator.wikimedia.org/T185510#3932891 (Nuria) Another UX issue: i think the string "pageviews by country" needs to be more specific regarding time "Monthly Pageviews by country " might be better.
[23:32:35] Analytics-Kanban, Analytics-Wikistats: UI not querying daily granularity for 3-month and 1-month periods - https://phabricator.wikimedia.org/T186075#3932906 (fdans)
[23:36:08] Analytics-Kanban, Patch-For-Review: Launch top per country pageviews on UI - https://phabricator.wikimedia.org/T185510#3932924 (fdans) @Nuria the granularity is specified both in the dashboard card (Top countries for December), and in the title in the detail page (which is not being shown right now, see...
[23:39:41] Analytics-Kanban, Analytics-Wikistats: UI not querying daily granularity for 3-month and 1-month periods - https://phabricator.wikimedia.org/T186075#3932947 (fdans) p:Triage>Unbreak!
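
Editor's note on the 20:19:58 suggestion (a simple producer that sets timestamps itself): a minimal sketch of what such a replay helper might look like. The helper name, the `(timestamp_ms, payload)` dump format, the topic name, and the broker address are all assumptions made for illustration. The only library detail relied on is that kafka-python's `KafkaProducer.send` accepts a `timestamp_ms` keyword. The helper takes the send function as a parameter so it can be exercised without a broker.

```python
from typing import Callable, Iterable, Tuple


def replay_with_timestamps(
    records: Iterable[Tuple[int, bytes]],
    send: Callable[..., object],
    topic: str,
    shift_ms: int = 0,
) -> int:
    """Re-produce dumped messages, keeping (or shifting) their timestamps.

    records  -- hypothetical dump format: (timestamp in ms since epoch, payload)
    send     -- any callable shaped like kafka-python's KafkaProducer.send
    shift_ms -- optional offset added to every original timestamp
    Returns the number of messages handed to `send`.
    """
    count = 0
    for timestamp_ms, payload in records:
        # Pass the timestamp explicitly instead of letting the client
        # stamp the message with the current wall-clock time.
        send(topic, value=payload, timestamp_ms=timestamp_ms + shift_ms)
        count += 1
    return count


# Possible usage against a local broker (assumes kafka-python is installed
# and a broker listens on localhost:9092; both are assumptions):
#
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="localhost:9092")
#   dump = [(1517256000000, b"first"), (1517256060000, b"second")]
#   replay_with_timestamps(dump, producer.send, "test-topic")
#   producer.flush()
```

For the replayed timestamps to survive, the local topic would need `message.timestamp.type` left at `CreateTime`; with `LogAppendTime` the broker overwrites them with its own clock, and offsetsForTimes would again see every message in the same second.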