[05:12:35] morning!
[05:12:48] so we need to bounce the processors to apply the whitelist
[05:12:49] sigh
[05:12:57] I didn't know it
[05:20:44] addshore: o/
[05:21:01] whenever you have time - https://phabricator.wikimedia.org/T205846#4694101
[05:55:27] Analytics, Analytics-Kanban, Patch-For-Review: Move users from stat1005 to stat1007 - https://phabricator.wikimedia.org/T205846 (elukey)
[05:55:31] !log reportupdater hadoop migrated to stat1007
[05:55:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:18:05] !log add AAAA DNS records for aqs and matomo1001
[06:18:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:14:56] Hi team - got disconnected yesterday - Anything I missed?
[07:17:25] (PS1) Elukey: Add info to the README about how to build turnilo [analytics/turnilo/deploy] - https://gerrit.wikimedia.org/r/469842
[07:17:30] not that I can see :)
[07:18:05] (CR) Elukey: [V: 2 C: 2] Add info to the README about how to build turnilo [analytics/turnilo/deploy] - https://gerrit.wikimedia.org/r/469842 (owner: Elukey)
[07:18:18] morning elukey :)
[07:18:28] bonjour!
[07:18:53] This morning I'll continue resuming the pile of jobs that were stuck :(
[07:19:18] I will also merge the various patches I have that have been +1ed
[07:20:41] I didn't get what happened yesterday with those jobs
[07:22:24] elukey: I have no clue why the jobs didn't fail for real
[07:22:44] The problem was the /user/hive/hive-site.xml file permissions
[07:23:35] Andrew pushed a patch to have the file copied from stat1004 instead of an-coord1001 - meaning no password inside, so it can be world-readable
[07:24:01] The problematic jobs were under non-hdfs users
[07:24:36] weird
[07:24:44] how so?
[07:25:35] that we didn't notice jobs stuck in that state
[07:25:51] I mean, we carefully checked a ton of things
[07:26:08] it is sad :(
[07:26:36] elukey: Actually it was tricky - The jobs didn't fail, and there is no SLA on those jobs - The only way to notice was to look in the "workflows" tab of Hue
[07:27:09] probably worth adding an SLA by default?
[07:27:14] even a big one
[07:27:16] elukey: I do agree - it is however more difficult to catch errors on stuff that fails silently
[07:27:35] elukey: why not - those jobs are Discovery's - We should ask them
[07:28:32] oh yes, I am not saying that we were sloppy, only that it is sad that other people's jobs fell through the cracks
[07:28:53] yes
[07:29:02] joal: as an FYI, I moved the hdfs reportupdater jobs to stat1007 this morning
[07:29:08] also logged in SAL
[07:29:20] super
[07:29:27] (PS7) Joal: Update DataFrameToHive for dynamic partitions [analytics/refinery/source] - https://gerrit.wikimedia.org/r/465202 (https://phabricator.wikimedia.org/T164020)
[07:29:29] (PS8) Joal: Add webrequest_subset_tags transform function [analytics/refinery/source] - https://gerrit.wikimedia.org/r/465206 (https://phabricator.wikimedia.org/T164020)
[07:29:31] (PS5) Joal: Add WebrequestSubsetPartitioner spark job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/468322 (https://phabricator.wikimedia.org/T164020)
[07:31:52] (PS6) Joal: Add oozie job partitioning webrequest subset [analytics/refinery] - https://gerrit.wikimedia.org/r/357814 (https://phabricator.wikimedia.org/T164020)
[07:32:20] (CR) Joal: [V: 2 C: 2] "Merging to deploy next week" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/465202 (https://phabricator.wikimedia.org/T164020) (owner: Joal)
[07:33:09] (CR) Joal: [C: 2] "Merging for deploy next week" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/465206 (https://phabricator.wikimedia.org/T164020) (owner: Joal)
[07:33:32] I also added a note to the turnilo repo about how to build it
[07:33:43] elukey: I have seen that - many thanks :)
[07:34:40] (CR) Joal: "ping ottomata - Would like to merge this for next week deploy" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/468322 (https://phabricator.wikimedia.org/T164020) (owner: Joal)
[07:36:31] joal: whenever you have time I'd like to talk about those banner impressions, so I can make a plan :)
[07:36:59] I have time elukey :)
[07:37:21] joal: quick batcave?
[07:37:25] sure
[08:05:46] (Merged) jenkins-bot: Update DataFrameToHive for dynamic partitions [analytics/refinery/source] - https://gerrit.wikimedia.org/r/465202 (https://phabricator.wikimedia.org/T164020) (owner: Joal)
[08:13:13] (Merged) jenkins-bot: Add webrequest_subset_tags transform function [analytics/refinery/source] - https://gerrit.wikimedia.org/r/465206 (https://phabricator.wikimedia.org/T164020) (owner: Joal)
[08:36:38] Analytics, Operations, ops-eqiad: Degraded RAID on aqs1006 - https://phabricator.wikimedia.org/T206915 (jijiki)
[09:27:29] Analytics, Operations, ops-eqiad: Degraded RAID on aqs1006 - https://phabricator.wikimedia.org/T206915 (elukey) https://www.thegeekdiary.com/replacing-a-failed-mirror-disk-in-a-software-raid-array-mdadm/ is a good reference about how to swap the disk
[09:29:36] (PS7) Joal: Add spark code for wikidata json dumps parsing [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346726
[09:48:14] Analytics, Analytics-Kanban, Patch-For-Review: Move users from stat1005 to stat1007 - https://phabricator.wikimedia.org/T205846 (elukey)
[09:54:58] Analytics, Analytics-Kanban: Geoip data archive repository cause puppet to run for minutes - https://phabricator.wikimedia.org/T208028 (elukey) p: Triage>High
[09:55:03] sigh --^
[09:56:15] :(
[10:05:15] Analytics, Analytics-Kanban: Geoip data archive repository cause puppet to run for minutes - https://phabricator.wikimedia.org/T208028 (elukey)
[10:19:01] joal: I suppose that all the alarms are not you restarting, right?
[10:19:47] elukey: Ahh! This is me indeed restarting and having weird failures - I forgot I wasn't in my home-oozie anymore - Excuse me for the spam :(
[10:20:22] However the pageview-hourly thing is different
[10:20:29] there's also a camus failure
[10:20:40] I wonder if the failures I have experienced with wikitext aren't part of a broader issue
[10:20:43] Yes
[10:20:51] IO error: /tmp/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/006532.log: Read-only file system
[10:20:51] Ok - saving-the-cluster mode
[10:20:55] wgattttt
[10:21:05] hm - feels like HDFS has entered safemode
[10:25:08] Safe mode is OFF in an-master1001.eqiad.wmnet/10.64.5.26:8020
[10:25:08] Safe mode is OFF in an-master1002.eqiad.wmnet/10.64.21.110:8020
[10:25:16] :(
[10:25:48] that's good no?
[10:26:05] well, it is, but it must mean that something else is going on, no?
[10:26:37] so for camus I suspect that the node manager on which it was running was not ok
[10:27:08] https://yarn.wikimedia.org/cluster/app/application_1539594093291_33569
[10:27:40] must be an1042
[10:28:01] elukey@analytics1042:~$ cat /et-bash: cannot create temp file for here-document: Read-only file system
[10:28:04] yep :D
[10:28:10] okey
[10:28:28] [4892482.602481] EXT4-fs error (device dm-1): ext4_validate_inode_bitmap:96: comm java: Corrupt inode bitmap - block_group = 193, inode_bitmap = 6291473
[10:28:31] [4892482.617607] Aborting journal on device dm-1-8.
[10:28:34] [4892482.623331] EXT4-fs (dm-1): Remounting filesystem read-only
[10:28:43] I'm gonna restart the failed jobs in a minute
[10:28:55] wait a sec, lemme disable the nm first
[10:29:08] sure
[10:32:07] done!
[10:32:39] need to go in a bit but jobs should be unblocked now
[10:32:52] luckily this is the last host of the batch that needs to be refreshed :D
[10:32:54] k - will check failed jobs
[10:35:06] joal: wait a little more
[10:35:10] k
[10:35:21] I've run fsck, corrected some errors and now I am rebooting
[10:36:02] elukey: only 2 pages left of jobs to resume \o/ !
[10:40:14] thanks a lot for this work joal :(
[10:40:30] elukey: no problem - Had to be done :)
[10:40:33] hey all :]
[10:40:41] morning mforns
[10:40:51] hey p
[10:40:52] o/
[10:40:56] :]
[10:41:04] so an1042 seems back up
[10:41:07] with root in rw
[10:41:09] weeeird
[10:41:14] yes
[10:41:17] :(
[10:41:55] ok joal, let's see if it still keeps going
[10:42:06] yessir
[10:42:09] I am inclined to mark this as temporary weirdness
[10:42:18] I can now restart stuff and all, right?
[10:42:22] and it will be decommissioned hopefully soonish
[10:42:26] I do hope it is so
[10:42:30] yep exactly!
[10:42:34] ok, moving
[10:42:36] also an1028 had a failed disk
[10:42:44] I am starting to see a pattern from the same batch :D
[10:42:44] Thanks elukey for unweirding an1042 :)
[10:42:52] gently kicking it :D
[10:42:58] Oh no - please - Don't jinx that
[10:43:05] ;)
[10:43:07] mforns: now I have to go but I'll explain what happened in here
[10:43:26] ok elukey
[10:44:05] going afk for a couple of hours!
[10:44:11] later elukey
[11:49:03] !log Rerun failed oozie jobs (pageview and projectview)
[11:49:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:02:10] joal: all good?
[13:02:26] elukey: still some reruns ongoing, but so far so good
[13:02:29] nothing broke again
[13:02:41] wow I need to send beers to you after today
[13:04:15] joal: do you have a minute for an IRC brainbounce?
[13:04:18] since I feel stupid
[13:04:40] I do have time, and please stop being stupid by thinking you're stupid :)
[13:05:21] ahahah
[13:05:28] so the puppet run on stat1005
[13:05:42] takes 5 minutes
[13:05:43] !!!
[13:06:05] and the bulk of the time is spent in read(9,..
[13:06:17] and 9 is
[13:06:17] elukey@stat1005:~$ sudo file /proc/35849/fd/9
[13:06:18] /proc/35849/fd/9: symbolic link to /usr/share/GeoIP/archive/2014-09-14/GeoIP.dat
[13:06:48] the funny part is that
[13:06:51] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/469855/1/modules/profile/manifests/statistics/private.pp
[13:07:00] the archiver now runs only on stat1007
[13:07:43] and if I use cumin to confirm
[13:07:43] elukey@cumin2001:~$ sudo cumin 'R:file = "/usr/share/GeoIP/archive"' 'ls -l' --dry-run
[13:07:47] 1 hosts will be targeted:
[13:07:48] o/
[13:07:49] stat1007.eqiad.wmnet
[13:07:51] o/
[13:07:55] how are you feeling dan?
[13:08:07] Hi Dan :)
[13:08:26] hey all, I'm ok, both baby and I still have this weird mild cold
[13:08:28] elukey: anything still left in puppet about GeoIP data?
[13:08:32] joal: so I am a little bit confused, is there anything that I am missing?
[13:08:45] it seems so yes
[13:10:55] :S
[13:24:08] Analytics, Analytics-Kanban: Geoip data archive repository cause puppet to run for minutes - https://phabricator.wikimedia.org/T208028 (elukey) As an experiment (and to facilitate the copy of the data over to stat1007 via rsync) I moved /usr/share/GeoIP/archive to /home/elukey/archive, and the time taken...
[13:31:03] joal: very nice - I moved the dir to my home, and now puppet runs in 60s
[13:31:14] so it seems that it tries to read the content of the dir when it ensures it
[13:31:26] but I am still wondering why it keeps doing it on stat1005
[13:31:31] maybe I am missing some class
[13:31:35] this seems bizarre indeed
[13:32:03] elukey: Did puppet recreate the folder after you moved it?
[13:32:26] nope
[13:32:44] then it didn't try to ensure it, right?
[13:33:11] well in theory it shouldn't even have tried before, so it is doing the correct thing now
[13:33:21] for some reason before it was reading the content of the dir
[13:33:37] Meh
[13:35:31] ahhh joal
[13:35:32] elukey@cumin2001:~$ sudo cumin 'R:file = "/usr/share/GeoIP" and stat*' 'ls -l' --dry-run
[13:35:35] 4 hosts will be targeted:
[13:35:36] maybe this is the issue
[13:35:38] stat[1004-1007].eqiad.wmnet
[13:35:58] ?
[13:36:07] /usr/share/GeoIP is managed via puppet
[13:36:19] that is the parent dir of archive
[13:36:32] in all the stats
[13:37:30] Ahhhh :)
[13:37:33] This explains it :)
[13:37:45] ok - oozie seems back in order ... pfff
[13:37:58] gone for kids, will be back for standup
[13:38:42] and now stat1007 does the same
[13:38:43] ahahahh
[13:38:46] I love puppet
[13:40:02] it must be geoip::data::puppet
[13:41:53] Analytics, Analytics-Kanban: Geoip data archive repository cause puppet to run for minutes - https://phabricator.wikimedia.org/T208028 (elukey) Copied over the files to stat1007, same problem. I think it is due to the following: ``` class geoip::data::puppet( # lint:ignore:puppet_url_without_modules...
[13:42:16] Analytics, Analytics-Kanban, Patch-For-Review: Move users from stat1005 to stat1007 - https://phabricator.wikimedia.org/T205846 (elukey)
[13:42:18] Analytics, Analytics-Kanban: Geoip data archive repository cause puppet to run for minutes - https://phabricator.wikimedia.org/T208028 (elukey)
[13:42:25] mforns: o/
[13:42:36] if you have time we can discuss the alarms
[13:42:37] hey elukey :]
[13:42:40] yes!
[13:42:45] here or bc?
[13:42:48] I'd also like to chat with you about Eventlogging2Druid
[13:42:53] bc is fine if you have time!
[13:43:00] sure omw!
[13:49:41] (CR) Milimetric: "few thoughts, and single quotes! :)" (5 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/468964 (https://phabricator.wikimedia.org/T206968) (owner: Fdans)
[14:22:47] Hi elukey you pinged me?
[14:28:16] addshore: yes! Whenever you have time https://phabricator.wikimedia.org/T205846#4694101
[14:30:07] elukey, here's the docs on how to configure datasources in the Turnilo config file
[14:33:52] Analytics, Analytics-Data-Quality, Contributors-Analysis, Product-Analytics, Growth-Team (Current Sprint): Resume refinement of edit events in Data Lake - https://phabricator.wikimedia.org/T202348 (Neil_P._Quinn_WMF) >>! In T202348#4693441, @nettrom_WMF wrote: > Thanks for the feedback, @Nuri...
[14:34:12] elukey, and the turnilo config file is this one: https://github.com/wikimedia/puppet/blob/production/modules/turnilo/templates/config.yaml.erb
[14:35:09] yep yep thanks!
[15:02:47] milimetric: yt? could use a code arch brain bounce if you have a min
[15:04:36] ya ottomata, omw cave
[15:06:21] Analytics, Analytics-Kanban, Patch-For-Review: Geoip data archive repository cause puppet to run for minutes - https://phabricator.wikimedia.org/T208028 (elukey) a:elukey
[15:24:22] Analytics-Kanban, User-Elukey: Q1 2018/19 Analytics procurement - https://phabricator.wikimedia.org/T198694 (elukey)
[15:25:06] Analytics-Kanban, User-Elukey: Q1 2018/19 Analytics procurement - https://phabricator.wikimedia.org/T198694 (elukey)
[15:25:09] Analytics, Analytics-Kanban, Patch-For-Review: Presto cluster online and usable with test data pushed from analytics prod infrastructure accessible by Cloud (labs) users - https://phabricator.wikimedia.org/T204951 (elukey)
[15:25:11] Analytics, Operations, hardware-requests, User-Elukey: eqiad | (3) Labs Data Lake hardware - https://phabricator.wikimedia.org/T199674 (elukey) Open>Resolved
[16:07:53] ottomata: this is the one - https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/468322
[16:08:43] (CR) Ottomata: Add oozie job partitioning webrequest subset (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/357814 (https://phabricator.wikimedia.org/T164020) (owner: Joal)
[16:16:27] (CR) Ottomata: Add WebrequestSubsetPartitioner spark job (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/468322 (https://phabricator.wikimedia.org/T164020) (owner: Joal)
[16:17:15] ottomata: dt format: 17:04:35 < milimetric> ya ottomata, omw cave
[16:17:17] oops
[16:17:20] man
[16:17:24] 2015-10-12 00:00:00
[16:17:26] ottomata: --^
[16:17:46] joal: should that be a _dt field to follow our other conventions?
[16:18:30] ottomata: I think we're back to the same issue we had with a previous wrong name for exactly the same usecase: insertion_ts in pageview_whitelist
[16:18:47] You'd prefer me to have insertion_dt I imagine
[16:19:02] ah
[16:19:02] It's not used anywhere, so I'm super happy to have it this way or the other
[16:19:15] every ISO 8601 field I know of we use _dt for
[16:19:18] we should keep that up
[16:19:26] sounds like pageview_whitelist is just wrong :/
[16:19:33] Actually this one is not ISO - It's SQL
[16:19:39] ?
[16:19:44] maybe SQL is ISO?
[16:19:52] ISO would have a T in between date and time no?
[16:19:55] i think its optional
[16:19:58] Ah ok
[16:20:04] Then it's ISO :)
[16:20:14] checking
[16:20:50] hm it seems not optional!
[16:20:50] https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations
[16:20:59] From what I read on a famous website, it seems not optional
[16:21:05] yup
[16:21:09] ISO 8601 parser implementations seem to accept the space
[16:21:12] which is why i thought that
[16:21:13] hm
[16:21:32] well, still tho, i think i'd rather use _dt here. ts is for unix epoch timestamps
[16:21:38] in our conventions
[16:21:45] ottomata: "It is permitted to omit the 'T' character by mutual agreement."
[16:21:56] np - patching
[16:21:59] ah!
[16:22:00] great
[16:22:17] ottomata: We mutually agree, our version is ISO :)
[16:22:22] haha
[16:22:38] ottomata: Should I update pageview_whitelist while on it?
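A minimal java.time sketch of the 'T' point discussed above (REPL-style, not from the refinery code): the strict ISO 8601 formatter insists on the 'T' separator, while the space-separated SQL-style form needs an explicit pattern.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Strict ISO 8601 combined representation: 'T' between date and time.
val iso = LocalDateTime.parse("2015-10-12T00:00:00", DateTimeFormatter.ISO_LOCAL_DATE_TIME)

// The space-separated SQL-style form is rejected by ISO_LOCAL_DATE_TIME
// (DateTimeParseException), so it needs an explicit pattern instead.
val sqlStyle = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
val spaced = LocalDateTime.parse("2015-10-12 00:00:00", sqlStyle)

println(iso == spaced)  // true - same value, two spellings
```

Either spelling encodes the same value; the part that matters for the naming convention above is the suffix: _dt for these string timestamps, _ts for unix epoch values.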
[16:24:01] (PS7) Joal: Add oozie job partitioning webrequest subset [analytics/refinery] - https://gerrit.wikimedia.org/r/357814 (https://phabricator.wikimedia.org/T164020)
[16:24:50] (PS1) Joal: Update pageview_whitelist fieldname for convention [analytics/refinery] - https://gerrit.wikimedia.org/r/469924
[16:24:53] ottomata: --^
[16:27:26] (CR) Ottomata: [C: 1] Add oozie job partitioning webrequest subset [analytics/refinery] - https://gerrit.wikimedia.org/r/357814 (https://phabricator.wikimedia.org/T164020) (owner: Joal)
[16:27:54] oo pageview whitelist needs alter ya?
[16:27:57] alter rename?
[16:28:08] (CR) Ottomata: [C: 1] "Needs alter rename but great thank you!" [analytics/refinery] - https://gerrit.wikimedia.org/r/469924 (owner: Joal)
[16:28:10] ottomata: will do it next week when we deploy
[16:28:12] coo danke
[16:28:18] np :)
[16:28:22] Thanks for reviews
[16:28:23] no queries use that field i bet, right?
[16:28:29] that's just for debugging purposes?
[16:28:31] Will add some docs in the portion
[16:28:35] danke
[16:28:50] ottomata: used only for logging yes
[16:31:43] ottomata: About the comment for line 174 in the spark code, would: "Preparing the transform function object assigning the webrequest_subset_tags table name through a var"
[16:31:47] be good for you?
[16:32:37] (PS6) Joal: Add WebrequestSubsetPartitioner spark job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/468322 (https://phabricator.wikimedia.org/T164020)
[16:33:38] joal: is that something that needs to be set for the transform function to work?
[16:33:55] maybe instead of making a separate transform function class file
[16:33:55] ottomata: there is a default value, but since I have a parameter, I reassign
[16:34:02] you can just make the function here directly?
[16:34:15] the only reason for having those separate transform function objects is so you can configure them from the cli
[16:34:18] in Refine params
[16:34:27] ottomata: I tried that and didn't manage to have it working for serialization reasons :(
[16:34:29] ?
[16:34:40] you can't do
[16:34:41] hm - I'll try again on monday
[16:34:58] But tried that before, and went through serialization issues
[16:35:28] not sure why it would be different?
[16:35:32] This is the reason for which I have used a var instead of a case class param
[16:35:38] right oh
[16:35:39] Not sure either
[16:35:45] i'm not saying you should do it via the cli
[16:35:48] Will try again and ask for help
[16:36:00] just make the function in this file directly
[16:36:01] right?
[16:36:18] hm - I don't get it
[16:36:39] a TransformFunction is just a DataFrame => DataFrame
[16:36:40] so
[16:37:15] you can make a function in scope here? (or make a function that returns a new function with webrequest_subset_tags_table set)
[16:37:19] if in scope maybe something like
[16:38:15] def partitionWebrequestSubset(df: DataFrame): DataFrame = {
[16:38:16] doStuffToDf(df);
[16:38:16] }
[16:38:16] ?
[16:38:21] then
[16:38:34] val transformFunctions = Seq(partitionWebrequestSubset)
[16:38:34] ?
[16:39:19] I could move the function from the transform-functions file in here, very much
[16:39:30] You'd prefer that?
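A rough sketch of the inline approach ottomata is suggesting, assuming only what the chat states above - that a Refine transform function is just a DataFrame => DataFrame. The function name, table name, and join column below are hypothetical placeholders, not the actual WebrequestSubsetPartitioner code.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical, spark-shell style: build the transform function in scope so the
// webrequest_subset_tags table name is captured as a plain parameter instead of
// being reassigned through a var on a separate transform-function object.
def makeSubsetTagsTransform(subsetTagsTable: String): DataFrame => DataFrame = {
  (df: DataFrame) => {
    // Placeholder body: the real job joins against the tags table to derive the
    // subset partitions; this only shows where the table name would be used.
    val tags = df.sparkSession.table(subsetTagsTable)
    df.join(tags, Seq("webrequest_tags"), "left")
  }
}

val transformFunctions: Seq[DataFrame => DataFrame] =
  Seq(makeSubsetTagsTransform("wmf.webrequest_subset_tags"))
```

The trade-off matches what is said above: an in-scope function like this is not configurable from the Refine CLI, but it avoids both the separate class file and the mutable var.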
[16:44:18] ottomata: will try to have something like that - but on monday - done for today :)
[16:44:24] See you monday a-team :)
[16:44:32] byeee joal :]
[16:55:43] Analytics, Analytics-EventLogging, EventBus, Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Decide whether to use schema references in the schema registry - https://phabricator.wikimedia.org/T206824 (Pchelolo) I think by now we've all reached the agreement to use references. A...
[16:57:42] Analytics, Analytics-Kanban, Patch-For-Review: Geoip data archive repository cause puppet to run for minutes - https://phabricator.wikimedia.org/T208028 (Ottomata) Am fine with /srv/geoip!
[17:02:09] joal: i think i'd prefer that, really the only reason for them to be separate is so they can be configurable
[17:02:15] otherwise you should just make them i think
[17:02:19] in your code
[17:03:06] Analytics, Analytics-EventLogging, EventBus, Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Decide whether to use schema references in the schema registry - https://phabricator.wikimedia.org/T206824 (Ottomata) I think we can close this. How we actually use them hasn't been deci...
[17:19:57] * elukey off!
[17:44:57] Analytics, Analytics-Kanban, Operations, netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (Ottomata) Ping @nuria too
[18:42:51] Analytics, Analytics-Kanban, Operations, netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (Nuria) > This is a bit of a busy week for everyone and especially the security team, but we're going to sync up next week...
[18:46:17] Analytics, Analytics-Data-Quality, Contributors-Analysis, Product-Analytics, Growth-Team (Current Sprint): Resume refinement of edit events in Data Lake - https://phabricator.wikimedia.org/T202348 (Nuria) >I don't object to the addition, but if we start collecting this additional data in the...
[19:22:18] Analytics, Contributors-Analysis, Product-Analytics: Set up automated email to report completion of mediawiki_history snapshot and Druid loading - https://phabricator.wikimedia.org/T206894 (Neil_P._Quinn_WMF) @Milimetric, I remember you offering to set this up for me a while ago. Any chance I could t...
[19:25:14] neilpquinn: do you want to set up an email or the job to compute the metrics? I can show you either thing, and they should both be relatively straightforward
[20:15:06] Analytics, Contributors-Analysis, Product-Analytics: Set up automated email to report completion of mediawiki_history snapshot and Druid loading - https://phabricator.wikimedia.org/T206894 (Milimetric) Sure, no problem. It's probably a good idea to do it together. So, options: A) set up an oozie j...
[21:37:40] Quarry, Cloud-Services, cloud-services-team (Kanban): Migrate 'Quarry' project to eqiad1 - https://phabricator.wikimedia.org/T207677 (Andrew) OK, I'll go first :) How about if we schedule downtime for Monday the 5th?
[22:00:28] Quarry, Cloud-Services, cloud-services-team (Kanban): Migrate 'Quarry' project to eqiad1 - https://phabricator.wikimedia.org/T207677 (zhuyifei1999) That should work, though I will probably be off 10-11 AM, 1-2PM, & 3-5PM Central Time
[22:03:31] Quarry, Cloud-Services, cloud-services-team (Kanban): Migrate 'Quarry' project to eqiad1 - https://phabricator.wikimedia.org/T207677 (Andrew) How about noon CST on that Monday? (that's probably 17:00 UTC although that week is the week-of-timezone-slip so I can't make any promises)
[22:06:50] Quarry, Cloud-Services, cloud-services-team (Kanban): Migrate 'Quarry' project to eqiad1 - https://phabricator.wikimedia.org/T207677 (zhuyifei1999) 11AM-1PM sounds good to me. @Framawiki is it okay for you? (if not, are you fine with me handling it?)
[22:07:54] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on notebook1004 is CRITICAL: Return code of 255 is out of bounds
[22:14:08] I was just working on notebook1004
[22:14:21] seems like I lost contact
[22:25:31] I can't reconnect.
[22:25:44] perhaps notebook1004 needs a restart.
[22:28:10] ok Its back for me now I think
[22:29:36] looks like swp on notebook1004 is nearly full
[23:15:03] Analytics, Contributors-Analysis, Product-Analytics: Set up automated email to report completion of mediawiki_history snapshot and Druid loading - https://phabricator.wikimedia.org/T206894 (Nuria) Let's see, there 2 things here: when data is available in a snapshot and when data is available publicy...