[01:57:45] I can't reach github from stat1006 for some reason
[01:57:48] i can reach google though
[01:58:03] wait
[01:58:07] I can't reach google either
[02:03:53] i changed http proxies and it seems better
[06:13:38] Analytics, Operations, ops-eqiad: Degraded RAID on dbstore1003 - https://phabricator.wikimedia.org/T239217 (Marostegui) It looks like disk #4: ` root@dbstore1003:~# megacli -LDPDInfo -aAll Adapter #0 Number of Virtual Disks: 1 Virtual Drive: 0 (Target Id: 0) Name : RAID Level...
[07:56:40] Hi team
[07:59:00] (PS4) Joal: Fix sqoop after new tables added [analytics/refinery] - https://gerrit.wikimedia.org/r/554255 (https://phabricator.wikimedia.org/T239127)
[08:00:06] (CR) Joal: [V: +2 C: +2] "Merge for next week deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/554255 (https://phabricator.wikimedia.org/T239127) (owner: Joal)
[08:09:59] bonjour :)
[08:29:14] Analytics, MediaWiki-Cache, Research: Creating a wikipedia CDN caching trace - https://phabricator.wikimedia.org/T239885 (Addshore)
[08:36:37] groceryheist: o/ you should use https://wikitech.wikimedia.org/wiki/HTTP_proxy, let me know if you still have issues
[08:42:11] joal: o/
[08:42:32] not sure if you have followed https://phabricator.wikimedia.org/T236180
[08:42:49] but airflow needs a special setting for mariadb (more info in the task)
[08:43:00] I have already applied it in test and didn't see anything exploding
[08:43:04] I have not followed, no
[08:43:20] but if you could triple check and tell me your thoughts I would be happier
[08:43:25] before applying it to prod :)
[08:43:55] (that needs a mariadb restart so I'll have to temporarily stop camus etc.. hive/oozie will probably need a restart)
[08:44:15] elukey: no need for camus to stop, but hive/oozie yes
[08:44:35] joal: yes but if camus runs then hive is used, this was my point
[08:44:44] safer to drain a little bit before doing the restarts
[08:46:47] elukey: I don't know enough of how mariadb is used by our systems to provide any value here - A quick read of the doc makes me think this setting is actually good, and the task says this will also become default in next versions, so let's go for it :)
[08:46:59] elukey: got it for camus, makes sense
[08:50:02] joal: yep I am in the same position as you are, but it doesn't seem a problem.. just wanted your opinion :)
[08:50:09] all right so I am going to stop DER CAMUS
[09:05:04] joal: after a chat with Marcel, we found another kerberos thing to fix (low priority though)
[09:05:54] since presto will be kerberized, superset will need to be able to authenticate to it to visualize the data quality dashboards
[09:06:07] but of course there doesn't seem to be support yet
[09:06:13] Ah - makes sense - so superset will have access to a keytab I guess
[09:06:32] in theory yes, we'll have to follow up with superset upstream probably :(
[09:06:50] !log temporarily stop timers on an-coord1001 to ease the restart of mariadb on an-coord1001
[09:06:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:09:29] elukey: you now are part of the superset dev team I guess ;)
[09:10:53] joal: after a long issue on gh explaining in depth the problem and pointing to the line of code that was triggering it, I got two things
[09:11:13] 1) a fix was made but nobody pinged me on the issue, I discovered it just checking related things
[09:11:38] 2) I tested the fix and reported that it worked, got a thumbs up to my message and the issue was closed
[09:11:45] not even a trace of "thank you"
[09:12:03] sometimes it is still frustrating to deal with them, even if they improved
[09:12:07] this is not the apache way
[09:12:09] for sure
[09:17:16] :(
[09:24:44] going to wait for the cassandra load to finish or ok to restart?
[09:30:11] elukey: thinking
[09:30:28] elukey: let me pause it, then we can restart and re-enable if you don't mind
[09:30:58] actually elukey, let's restart without pausing - worst case: one loading fails, I restart it, done
[09:31:50] ack
[09:34:18] !log stop oozie/hive-*; restart mariadb; restart oozie/hive-* on an-coord1001 to pick up explicit_defaults_for_timestamp - T236180
[09:34:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:34:21] T236180: Deploy search platform airflow service - https://phabricator.wikimedia.org/T236180
[09:34:27] joal: all done
[09:34:38] \o/
[09:34:53] elukey: this means there is an available airflow we can play with?
[09:34:59] or if not now, soon?
[09:35:07] !log enable timers on an-coord1001 after maintenance
[09:35:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:35:24] joal: yes check https://phabricator.wikimedia.org/T236180
[09:35:26] we are close
[09:36:06] Erik has to init the db and then it should be ready to go
[09:36:23] This is awesome :)
[09:36:44] Thanks a lot ebernhardson and elukey for setting this up!
[09:39:56] it would be nice to have something to talk about in SF
[09:40:02] next steps experiences etc..
[10:02:20] Analytics, Analytics-Kanban, Wikidata, User-Addshore, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): Sqoop wikidata terms tables into hadoop - https://phabricator.wikimedia.org/T239471 (Addshore) Open→Resolved
[10:42:06] Analytics, Research-Backlog, Wikidata: Copy Wikidata dumps to HDFS - https://phabricator.wikimedia.org/T209655 (GoranSMilovanovic) @JAllemandou Thank you - as ever!
[10:51:59] Analytics, Discovery, Event-Platform, Wikidata, and 2 others: Log Wikidata Query Service queries to the event gate infrastructure - https://phabricator.wikimedia.org/T101013 (dcausse) Open→Resolved
[11:02:20] * elukey bbiab
[11:15:59] joal o/
[11:16:12] I guess I have another query that is having issues due to joins and partitioning ?
[11:16:18] maybe....
[11:16:46] https://www.irccloud.com/pastebin/b0pMmORC/
[11:17:16] ^^ I would expect that to result in around 2 billion rows, as wikibase_wbt_item_terms has around 2 billion, instead I end up with less than 1 billion...?
[11:26:49] Hi addshore - This doesn't seem related to partitioning
[11:27:15] any ideas? :)
[11:29:27] nope, not really
[11:29:46] I guess going for joins one after the other would help, but it's cumbersome
[11:29:47] hmmm :(
[11:30:00] Yes, Maybe I will have to try that
[11:35:35] So, it is in fact the first join that seems to reduce the number of rows
[11:35:40] https://www.irccloud.com/pastebin/2ELCD2er/
[11:44:39] https://github.com/dropbox/PyHive/issues/288
[11:44:40] sigh
[11:45:13] so 0.6.2, that contains support for presto and kerberos, has not been released, no response from upstream in months
[11:45:17] (dropbox)
[11:49:04] joal: my debugging has led me to believe that this will work.... https://www.irccloud.com/pastebin/k4BwK6vD/
[11:49:08] Now to run it and see
[11:50:11] hm - This feels wrong, but maybe :)
[11:56:24] Ah! addshore you're actually right, I get it now :)
[11:56:27] I guess those joins should actually be wmf_raw.wikibase_wbt_text.wiki_db = wmf_raw.wikibase_wbt_item_terms.wiki_db for example
[11:56:39] Analytics, Analytics-Kanban, User-Elukey: Kerberize Superset to allow Presto queries - https://phabricator.wikimedia.org/T239903 (elukey)
[11:57:22] sigh
[11:57:27] addshore: yes, it would be better this way - reason for the issue: you do a left join, therefore many rows will have null right-table values, and the filter applies after the join, therefore removing those nulls
[11:57:40] Tricky!
[11:57:53] Good catch addshore :)
[11:58:00] Yes, that's right! now that I think about it it makes sense, just struggled to notice it while just staring at the query
[11:58:20] disappearing for a bit, lunch + errands
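For context, a minimal Spark sketch of the pitfall joal describes above. The pasted queries are irccloud pastebins and aren't visible here, so the tiny tables below are hypothetical stand-ins, not the real wikibase_wbt_* schemas: with a LEFT JOIN, a WHERE predicate on right-table columns runs after the join and silently drops the rows whose right side came back NULL, while moving the same predicate into the ON clause keeps every left-side row.

```scala
import org.apache.spark.sql.SparkSession

object LeftJoinFilterSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("left-join-filter-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Tiny stand-ins: three "item term" rows, only one of which has a matching "text" row.
    Seq((1L, "enwiki"), (2L, "enwiki"), (3L, "enwiki"))
      .toDF("term_id", "wiki_db").createOrReplaceTempView("item_terms")
    Seq((1L, "enwiki", "some text"))
      .toDF("term_id", "wiki_db", "text").createOrReplaceTempView("text")

    // WHERE filter on the right table: it runs after the join, so the two rows whose
    // right side is NULL are discarded -> 1 row instead of 3 (the shrinking addshore saw).
    spark.sql(
      """SELECT * FROM item_terms t
        |LEFT JOIN text x ON t.term_id = x.term_id
        |WHERE x.wiki_db = 'enwiki'""".stripMargin).show()

    // Same predicate moved into the ON clause (joining on wiki_db, as suggested above):
    // all three left-side rows survive, with NULLs where there is no match.
    spark.sql(
      """SELECT * FROM item_terms t
        |LEFT JOIN text x ON t.term_id = x.term_id AND x.wiki_db = t.wiki_db""".stripMargin).show()

    spark.stop()
  }
}
```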
[15:05:39] hey yall
[15:17:32] o/
[15:28:05] ottomata: the only weird thing about the refine failures is that there's no _REFINE_FAILED flag in those directories, and the data looks the same as other hours. The partitions for that hour aren't on any of the tables, but the data is there
[15:29:45] according to the docs, to rerun I'd have to first write the _REFINE_FAILED flag, so I'm reading some code
[15:30:12] (and none of the hours have a _REFINED flag, I checked a few days - that's also inconsistent with the docs)
[15:30:34] milimetric: there is an ignore flag iirc that will force the execution, but in theory refine should be ok now
[15:30:49] since we didn't get any alarm from the monitor stuff
[15:30:53] like data missing etc..
[15:31:06] refine's ok, it didn't send failures for the other hours and the other hours are added as partitions to the tables (before and after the hour with the meta failure)
[15:31:08] in theory the next execution of refine should have picked up the prev one
[15:31:24] yeah, so there's two inconsistencies
[15:31:37] 1. none of the hours have a _REFINED flag, and the docs suggest they all should
[15:32:03] milimetric: there is or was probably partial data there
[15:32:05] 2. the partition for the failed hour is not on the tables
[15:32:33] if there is no _REFINED or _REFINE_FAILED flag
[15:32:35] hm, doubtful there was partial data since it failed to fetch the schema, I think it does that at the beginning
[15:32:46] then refine will attempt to refine the data
[15:32:51] overwriting whatever is there with what it gets from raw
[15:33:03] hm, yeah depends on the timing i guess
[15:33:03] ottomata: but there's no _REFINED flag on *any* hour
[15:33:10] usually a delay of 2 hours will have refine get everything.
[15:33:13] so yeah maybe not
[15:33:21] and yeah, there would have been a flag
[15:33:22] you are right
[15:33:31] why data but no flag?
[15:33:32] that is weird
[15:33:46] data and partitions are there, but when I say any hour I really mean like going back months
[15:33:47] milimetric: on any of the 'failed' hours you mean?
[15:33:52] no *ANY*
[15:33:53] ????
[15:33:55] ok looking
[15:34:02] is it possible that since it failed while fetching from meta, it just didn't write any REFINE_FAILED and the next run picked it up?
[15:34:12] like hdfs dfs -ls /wmf/data/raw/eventlogging/eventlogging_MobileWikiAppiOSFeed/hourly/2019/10/01/20
[15:34:25] elukey: no, like in September, October, etc.
[15:34:27] no _REFINED flag
[15:34:38] nono I meant for REFINE FAILED, not REFINED
[15:34:54] no idea if it should be there or not
[15:34:55] :)
[15:34:59] I think Andrew's right that there's no _REFINE_FAILED because it probably re-refined it automatically
[15:35:33] lemme check other schemas, maybe it's just this one
[15:35:46] milimetric: there is? i see them
[15:35:48] oh
[15:35:54] milimetric: _REFINED flag goes in the output data di
[15:35:55] dir
[15:35:56] not the source
[15:36:00] not in raw
[15:36:14] sl
[15:36:33] hdfs dfs -ls /wmf/data/event/MobileWikiAppiOSFeed/year=2019/month=12/day=4/hour=20
[15:36:36] has _REFINE_FAILED
[15:36:37] as expected
[15:36:45] so you just need to re-refine with the --ignore_failed_flag=true
[15:36:57] ok, I'll clear up the docs on that
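As an aside, a minimal sketch (not refinery code) of the point Andrew makes here: the _REFINED / _REFINE_FAILED flags are written next to the refined output partition under /wmf/data/event/..., not next to the raw input under /wmf/data/raw/... . The refined path is the one quoted above; the raw path is an illustrative guess following the raw layout shown earlier in the conversation.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object CheckRefineFlags {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())

    // Raw input dir: no flags are expected here (hypothetical path, same layout as above).
    val rawDir = new Path(
      "/wmf/data/raw/eventlogging/eventlogging_MobileWikiAppiOSFeed/hourly/2019/12/04/20")
    // Refined output partition: this is where Refine writes the flags.
    val refinedDir = new Path(
      "/wmf/data/event/MobileWikiAppiOSFeed/year=2019/month=12/day=4/hour=20")

    Seq(rawDir, refinedDir).foreach { dir =>
      val refined = fs.exists(new Path(dir, "_REFINED"))
      val failed  = fs.exists(new Path(dir, "_REFINE_FAILED"))
      println(s"$dir  _REFINED=$refined  _REFINE_FAILED=$failed")
    }
  }
}
```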
[15:38:04] ottomata: if we don't see for some reason the email from refine failure, shouldn't we get also an alarm from the monitor refine timer? Or am I remembering incorrectly?
[15:38:07] it is easy to miss emails
[15:38:26] we did get the email from refine failure
[15:38:45] refine monitor will alert if e.g. a refine job hasn't run by the expected time
[15:38:54] if the job fails, it has run, refine monitor won't check
[15:39:07] refine monitor uses the same logic as refine to find directories that NEED refinement
[15:39:19] if the _REFINE_FAILED flag exists, the directory does not need refinement,
[15:39:43] often _REFINE_FAILED happens because refine is not possible. this happened more often when we inferred schemas from json data instead of from the schemas
[15:39:51] so there is no reason to try and re-refine something that will never refine
[15:42:36] maybe I am misremembering but when an email for refine was sent in the past we had also a corresponding icinga alert for monitor_etc.. firing
[15:42:53] but in this case, as you said, there is nothing wrong from its point of view
[15:43:26] my point is that say we don't receive the email about refine failures for $reasons
[15:43:40] (spam, problems with emails in prod, etc..)
[15:43:54] how can we know that something failed refine?
[15:44:22] heh, our only alerting is email, so if we don't receive the email we won't know
[15:45:32] elukey: i think if RefineMonitor sends an alert, it does exit(1)
[15:45:38] which causes the timer to fail, which ends up in icinga
[15:46:14] that doesn't work for Refine, since it runs in YARN
[15:46:15] Analytics, Performance-Team, Research, Security-Team, and 2 others: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (JFishback_WMF) a:JFishback_WMF→Gilles Assigning back to @Gilles but let me know if there's anything...
[15:46:53] and, we don't really want to alert just because of _REFINE_FAILED
[15:46:59] ideally we'd have an interface like oozie's
[15:47:04] where we could see the status of all the refines
[15:47:23] switching to airflow or whatever would probably give us that
[15:50:44] ottomata: not upon every refine failure, but something that after a day sweeps dirs on hdfs and calls out _REFINE_FAILED might be good as an icinga alert
[15:51:03] it would be a safety net in my opinion
[15:51:31] an icinga alert can't be missed for long, an email definitely can
[15:51:31] that would be good
[15:52:11] elukey: dynamic icinga alerts aren't really possible though, we can't easily have alerts for each day or hour or whatever
[15:52:14] we could have a rolling one though
[15:52:30] like, > 0 _REFINE_FAILED in last hour, day, week, whatever
[15:53:00] yes yes this is my idea as well
[15:53:03] ya that would be cool
[15:53:56] Analytics, Datasets-Archiving, Research-Backlog: Make HTML dumps available - https://phabricator.wikimedia.org/T182351 (ArielGlenn) >>! In T182351#5710173, @ArielGlenn wrote: ... > No, there is a ticket for dumping parsed wikitext as it is stored in RESTBase but that's not a full page view with ski...
[15:58:34] ottomata: could be a good exercise for me with pyspark? what do you think?
[16:33:23] elukey: hey! airflow db initialized correctly now. if you have a moment i typo'd one of the airflow directories in puppet: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/554909/
[16:35:12] heh, i've also been having similar thoughts about failure emails :) An email isn't enough to ensure the failure doesn't get missed
[16:35:32] going to merge it in a bit :)
[16:35:38] thanks!
[16:35:49] re emails: yes we now rely almost exclusively on icinga alerts for systemd timers
[16:36:00] I completely agree
[16:40:14] ebernhardson: merged and ran puppet on an-airflow1001
[16:40:32] elukey: might not really need spark for it..., might be just an hdfs api thing
[16:40:41] you might be able to reuse some of the RefineTarget find logic though
[16:40:58] ottomata: sure it is an clear excuse for me to work with spark :D
[16:41:05] *a clear
[16:48:04] ya but i think it might not really be a spark thing....
[16:48:08] spark won't really help you much
[16:48:18] it'd be more like the tmp cleaner thing i just did
[16:48:28] basically implementing a find in hdfs
[16:48:36] hmmm
[16:48:39] but
[16:48:39] maybe
[16:48:42] using RefineTarget might help
[16:48:46] https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/refine/RefineTarget.scala#L476-L581
[16:49:06] not sure if that is the right approach for this though
[16:49:21] you might just want to look for any _REFINE_FAILED flags found
[16:49:36] RefineTarget considers both input and output
[16:49:53] uff you ruined my happiness about spark! :P
[16:50:05] now I need to find something else
[16:50:06] :D :D :D
[16:50:20] hmm actually
[16:50:21] you could use that
[16:50:29] targets = RefineTarget.find(...)
[16:51:22] failedTargets = targets.filter(_.failureFlagExists())
[16:51:43] yeah luca that would do it
[16:53:09] is that two lines of spark? :D
[16:53:41] anyway joking, opening a task with all the info
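A rough stand-alone sketch of the sweep elukey is proposing: find any _REFINE_FAILED flags written in the last day and exit non-zero so the wrapping systemd timer fails and Icinga notices. Inside refinery-source the same thing would fall out of the RefineTarget.find(...).filter(_.failureFlagExists()) approach Andrew sketches above (arguments elided there and not filled in here); this version only uses the plain Hadoop FileSystem API, and the glob pattern is a hypothetical partition layout to adjust to the real one.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

object FindRecentRefineFailures {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val cutoff = System.currentTimeMillis() - 24L * 60 * 60 * 1000  // last 24 hours

    // Hypothetical glob over hourly refined partitions; adjust to the real layout.
    val flagGlob = new Path("/wmf/data/event/*/year=*/month=*/day=*/hour=*/_REFINE_FAILED")

    val recentFailures = Option(fs.globStatus(flagGlob)).getOrElse(Array.empty[FileStatus])
      .filter(_.getModificationTime >= cutoff)
      .map(_.getPath.getParent)

    recentFailures.foreach(p => println(s"_REFINE_FAILED found: $p"))

    // A non-zero exit makes the wrapping systemd unit fail, which is what surfaces in Icinga.
    if (recentFailures.nonEmpty) sys.exit(1)
  }
}
```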
[16:56:26] elukey: ! wanna see if it works together real quick in hangout?
[16:56:29] before standup?
[16:58:58] ottomata: I am in!
[17:07:34] bearloga: o/ - when you have time, I sent two code reviews for you :)
[17:09:51] elukey: thank you! Looking at them now
[17:10:48] (PS1) Milimetric: Normalize coordinator properties to bundle props [analytics/refinery] - https://gerrit.wikimedia.org/r/554915
[17:15:06] (CR) Milimetric: "moved all the refinery_hive_jar versions to 0.0.100 because one of the coordinators had 0.0.100 and that means the bundle should have that" [analytics/refinery] - https://gerrit.wikimedia.org/r/554915 (owner: Milimetric)
[17:17:19] bearloga: fixed your comment thanks :)
[17:18:33] (main.sh I meant)
[17:19:18] elukey: https://gist.github.com/ottomata/6841ee47f8b4b86081856a1ff636d38f
[17:20:21] ottomata: nice!
[17:28:38] bearloga: also thanks for puppet, pebkac :)
[17:29:41] :)
[17:55:44] Analytics, Analytics-Kanban, User-Elukey: Kerberize Superset to allow Presto queries - https://phabricator.wikimedia.org/T239903 (mforns) We could ping dropbox, to see if they want to upgrade pypi? We could also ping superset if they want to change lib? <-- Maybe better bet.
[17:56:04] Analytics, Analytics-Kanban, User-Elukey: Kerberize Superset to allow Presto queries - https://phabricator.wikimedia.org/T239903 (mforns) p:Triage→High
[17:57:17] Analytics, Editing-team, observability, Performance-Team (Radar): VE edit data stopped at 2019-11-24Z00:57 and again at 2019-12-01Z22:45 - https://phabricator.wikimedia.org/T239121 (Jdforrester-WMF) And again. @Ottomata, can you kick webperf1001 again?
[17:57:57] Analytics, MediaWiki-Cache, Research: Creating a wikipedia CDN caching trace - https://phabricator.wikimedia.org/T239885 (mforns) Does this data set answer your needs? Not sure if you're asking for that. https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Caching
[17:58:42] Analytics, Editing-team, observability, Performance-Team (Radar): VE edit data stopped due to statsv falling over (?) on webperf1001 - https://phabricator.wikimedia.org/T239121 (Jdforrester-WMF)
[17:59:16] Analytics: Add pertinent wdqs_external_sparql_query metrics and wdqs_internal_sparql_query to a superset dashboard - https://phabricator.wikimedia.org/T239852 (mforns) We should ingest that data into druid, so that is queryable from Superset.
[17:59:39] Analytics: Add pertinent wdqs_external_sparql_query metrics and wdqs_internal_sparql_query to a superset dashboard - https://phabricator.wikimedia.org/T239852 (mforns) p:Triage→High
[18:00:18] Analytics, Analytics-Kanban: Delay cassandra mediarequest-per-file daily job one hour so that it doesn't colide with pageview-per-article - https://phabricator.wikimedia.org/T239848 (mforns) p:Triage→High
[18:01:26] Analytics, Analytics-Kanban: Delay cassandra mediarequest-per-file daily job one hour so that it doesn't colide with pageview-per-article - https://phabricator.wikimedia.org/T239848 (mforns) Open→Resolved
[18:01:48] Analytics: Update mediawiki-history to use new Multi-Content-Revision tables - https://phabricator.wikimedia.org/T239591 (mforns) p:Triage→High
[18:02:54] Analytics: Change sqoop project list config so that content sqoop doesn't fail - https://phabricator.wikimedia.org/T239589 (mforns) p:Triage→High
[18:03:47] Analytics: Check home leftovers of dfoy - https://phabricator.wikimedia.org/T239571 (mforns) p:Triage→Normal
[18:04:52] Analytics: Check home leftovers of dfoy - https://phabricator.wikimedia.org/T239571 (mforns) a:Milimetric
[18:05:36] Analytics: Check home leftovers of maxsem - https://phabricator.wikimedia.org/T239047 (elukey) Pinging also @MaxSem :)
[18:06:07] Analytics: Check home leftovers of dfoy - https://phabricator.wikimedia.org/T239571 (elukey) ` ====== stat1004 ====== total 1152 -rw-r--r-- 1 2318 wikidev 12564 Mar 24 2017 angola24.txt -rw-r--r-- 1 2318 wikidev 12836 Mar 27 2017 angola26.txt -rw-r--r-- 1 2318 wikidev 12555 Mar 29 2017 angola28.txt -rw...
[18:06:10] Analytics, Analytics-Kanban, Product-Analytics, SDC General, Wikidata: Create reportupdater reports that execute SDC requests - https://phabricator.wikimedia.org/T239565 (mforns) p:Triage→High
[18:09:26] Analytics, WMDE-Analytics-Engineering, User-GoranSMilovanovic: Public data set review for T237728 - https://phabricator.wikimedia.org/T239393 (mforns) I think @JFishback_WMF can help you with this task.
[18:09:49] Analytics, Core Platform Team: Update mediawiki-history to use new Multi-Content-Revision tables - https://phabricator.wikimedia.org/T239591 (WDoranWMF)
[18:10:17] Analytics, Operations, ops-eqiad: Degraded RAID on an-worker1089 - https://phabricator.wikimedia.org/T239365 (mforns) p:Triage→High
[18:10:46] Analytics: Revise wiki scoop list from labs once a quarter - https://phabricator.wikimedia.org/T239136 (mforns) p:Triage→Low
[18:12:35] Analytics: Superset getting slower as usage increases - https://phabricator.wikimedia.org/T239130 (mforns) It could maybe be Druid as well. Let's troubleshoot and determine the cause.
[18:12:43] Analytics: Superset getting slower as usage increases - https://phabricator.wikimedia.org/T239130 (mforns) a:Nuria
[18:17:18] Analytics, Analytics-Kanban: Import slots/slots_roles and wikibase.wbc_entity_usage through scoop - https://phabricator.wikimedia.org/T239127 (mforns) Open→Resolved
[18:17:24] Analytics, Product-Analytics, SDC General, Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (mforns)
[18:18:05] Analytics, WMDE-Analytics-Engineering, Privacy, User-GoranSMilovanovic: Public data set review for T237728 - https://phabricator.wikimedia.org/T239393 (JFishback_WMF)
[18:19:11] Analytics, WMDE-Analytics-Engineering, Privacy, User-GoranSMilovanovic: Public data set review for T237728 - https://phabricator.wikimedia.org/T239393 (JFishback_WMF) p:Triage→Normal
[18:19:28] Analytics: Test if Hue can run with Python3 - https://phabricator.wikimedia.org/T233073 (mforns) Grosking: We can try to see if it works with CDH5, or we could deprecate Hue, and use sth else?
[18:19:56] Analytics: Test if Hue can run with Python3 - https://phabricator.wikimedia.org/T233073 (mforns) p:Triage→Normal
[18:20:20] Analytics: Test if Hue can run with Python3 - https://phabricator.wikimedia.org/T233073 (mforns) p:Normal→High
[18:21:01] Analytics: Test if Hue can run with Python3 - https://phabricator.wikimedia.org/T233073 (mforns) p:High→Normal
[18:27:57] Analytics, Wikimedia Design Style Guide: Analytics: Some pages/page requests are not reflected in statistics - https://phabricator.wikimedia.org/T239685 (mforns) a:Nuria
[18:28:22] Analytics, Wikimedia Design Style Guide: Analytics: Some pages/page requests are not reflected in statistics - https://phabricator.wikimedia.org/T239685 (mforns) a:Nuria→Milimetric
[18:28:42] Analytics, Editing-team, observability, Performance-Team (Radar): VE edit data stopped due to statsv falling over (?) on webperf1001 - https://phabricator.wikimedia.org/T239121 (Ottomata) Done. BTW same errors were present in recent logs.
[18:29:12] Analytics, Product-Analytics: Many special pages missing from pageview_hourly dataset starting on July 23, 2019 - https://phabricator.wikimedia.org/T239672 (Milimetric) a:Nuria
[18:31:52] Analytics, Product-Analytics: Many special pages missing from pageview_hourly dataset starting on July 23, 2019 - https://phabricator.wikimedia.org/T239672 (Nuria) This is correct, Special:Page pages other than search should have not been included (there was a long standing bug on the pageview definition...
[18:34:03] Analytics, Better Use Of Data, Product-Infrastructure-Team-Backlog, Wikimedia-Logstash, and 3 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (Ottomata)
[18:56:09] Analytics, Core Platform Team: Update mediawiki-history to use new Multi-Content-Revision tables - https://phabricator.wikimedia.org/T239591 (WDoranWMF) @mforns Adding @cicalese the PM for this project as well to make sure she has space to correct me. But the timehorizon for this is quite far, and I beli...
[19:04:38] I am rebooting an-worker1089 for hw maintenance
[19:04:55] if any job fails it might be due to me
[19:11:11] Analytics, Core Platform Team: Update mediawiki-history to use new Multi-Content-Revision tables - https://phabricator.wikimedia.org/T239591 (CCicalese_WMF) My understanding was that the fields may be dropped from the replicas earlier, but they would not be dropped from the master until the switchover.
[19:16:01] Analytics, Core Platform Team: Update mediawiki-history to use new Multi-Content-Revision tables - https://phabricator.wikimedia.org/T239591 (Nuria) Let's move this item to late Q2 given timelines.
[19:18:16] Analytics, Better Use Of Data, Product-Infrastructure-Team-Backlog, Wikimedia-Logstash, and 3 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (Ottomata) Ah, I won't be able to make tomorrow's meeting, I'm not working and I have a grocery coop shift!...
[19:22:45] Analytics-Kanban, Better Use Of Data, Event-Platform, Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (Ottomata) - events-.wikimedia.org/v1/events - -events.wikimedia.org/v1/events - intake-.wikim...
[19:25:24] Analytics-Kanban, Better Use Of Data, Event-Platform, Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (Ottomata) - logging-sink & analytics-sink (or sink-*) ?
[19:25:37] Analytics, Operations, ops-eqiad: Degraded RAID on an-worker1089 - https://phabricator.wikimedia.org/T239365 (elukey) Open→Resolved Disk swapped, raid status ok. Thanks!
[19:27:22] * elukey off!
[19:27:31] (an-worker1089 back working)
[19:50:30] Analytics-Kanban, Better Use Of Data, Event-Platform, Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (mpopov) >>! In T236386#5716552, @Ottomata wrote: > > - events-.wikimedia.org/v1/events > - -events.wiki...
[19:56:07] Analytics, Research: Parse wikidumps and extract redirect information for 1 small wiki, romanian - https://phabricator.wikimedia.org/T232123 (leila) Open→Resolved
[20:02:16] Analytics, Operations, ops-eqiad: Degraded RAID on an-worker1089 - https://phabricator.wikimedia.org/T239365 (Jclark-ctr) Replaced failed drive
[20:25:20] Analytics, Code-Stewardship-Reviews, Operations, Tools, Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319 (Dzahn) Is this really replacing the IRCd from T134271 ?
[20:26:04] Analytics, SDC General, Wikimedia-Stream: Verify that EventStreams work with WikiBase MediaInfo - https://phabricator.wikimedia.org/T210702 (Abbe98) Open→Resolved a:Abbe98
[20:30:51] Analytics-Kanban, Better Use Of Data, Event-Platform, Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (Ottomata) > Is /v1/events plural with the intention that eventually EventGate will support batches of events in the same req...
[21:31:36] Analytics, Better Use Of Data, Performance-Team, Product-Infrastructure-Team-Backlog, Product-Analytics (Kanban): Switch mw.user.sessionId back to session-cookie persistence - https://phabricator.wikimedia.org/T223931 (mpopov) Okay, if `localStorage` is out of the discussion (due to lack of e...
[21:53:19] ottomata: The risk assessment for the dataset has been completed, are you free to move the files from my user dir to the release location?
[22:27:27] (CR) Nuria: "So I understand, * I think* that without these changes the jobs should still work, correct?" [analytics/refinery] - https://gerrit.wikimedia.org/r/554915 (owner: Milimetric)
[22:32:17] lexnasser: I think if you move the data under /srv/published-datasets/caching/2019
[22:32:26] lexnasser: it will automagically appear as released
[22:32:45] nuria: Do i have the proper user permissions to do that?
[22:32:56] lexnasser: i think you should, please do try
[22:43:16] nuria: moved successfully to that location, I'd assume it would appear somewhere in this dir, but doesn't appear yet: https://analytics.wikimedia.org/published/datasets/ . How often does it update?
[22:47:25] nuria: Just appeared. Success! Thanks for your help, will follow up on ticket and update docs right now
[22:48:22] lexnasser: nice
[22:48:30] lexnasser: ACHIEVEMENT UNLOCKED
[22:49:51] Analytics, Analytics-Kanban: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (lexnasser) The data has been released! URL: https://analytics.wikimedia.org/published/datasets/caching/2019/ Wikitech will be updated soon. I hope you find this...
[22:52:42] (CR) Nuria: Fix sqoop after new tables added (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/554255 (https://phabricator.wikimedia.org/T239127) (owner: Joal)