[01:17:31] lzia, nuria_ : actually the pageview_hourly table does have data for May 2015 already. page ids were still missing then, but page names are available
[01:18:29] one can clearly see the drop on May 19, 2015 when zh.wikipedia was blocked in China:
[01:19:16] https://www.irccloud.com/pastebin/oaN4YPdu/daily%20pageview%20for%20Chinese%20Wikipedia%20from%20China%2C%20May%202015
[01:20:23] HaeB: thanks. where can I look at the full table?
[01:21:00] sorry, forgot to include the query:
[01:21:05] SELECT year, month, day,
[01:21:05] SUM(view_count) AS views
[01:21:05] FROM wmf.pageview_hourly
[01:21:05] WHERE year = 2015 AND month = 5
[01:21:05] AND country_code = 'CN' AND agent_type = 'user'
[01:21:06] AND project = 'zh.wikipedia'
[01:21:06] GROUP BY year, month, day ORDER BY year, month, day LIMIT 1000;
[01:21:34] the table is documented at https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly
[01:22:04] feel free to loop me in, happy to help with queries (i also looked into some related matters back then https://meta.wikimedia.org/wiki/Research_talk:HTTPS_Transition_and_Article_Censorship#/media/File:Ratio_of_pageviews_from_Turkey_for_five_reportedly_blocked_articles_-_June_2015.png )
[01:23:49] Thanks, HaeB. I'll dig into the table a little bit more and let you know. Thanks for offering the help, too. If we proceed, I'll let you know. but please make sure to tell me when you don't have time.
[01:25:20] HaeB: I'm going to separate myself from the screen for at least some hours. see you around tomorrow, and thanks for the help.
[01:25:55] ciao everyone.
[01:26:03] lzia: will do. see you!
[02:18:42] (PS1) Ottomata: Fix venv path [analytics/swap/deploy] - https://gerrit.wikimedia.org/r/419653
[02:20:01] (CR) Ottomata: [V: 2 C: 2] Fix venv path [analytics/swap/deploy] - https://gerrit.wikimedia.org/r/419653 (owner: Ottomata)
[07:34:01] Analytics-Cluster, Analytics-Kanban, Operations, Patch-For-Review, User-Elukey: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4052448 (elukey) During the first puppet run I have seen two issues: 1) the /etc/hadoop directory seems not present when the /etc/h...
[08:14:14] (CR) Joal: "@nuria: This change is marked as [WIP], completely not done yet." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/419516 (owner: Joal)
[08:16:29] Investigating the uniques-not-in-cassandra issue
[08:22:28] joal: we have 2PB of total space on hdfs!
[08:23:22] elukey: <3 and 2.8TB RAM -- Thanks so much for that :)
[08:23:52] Analytics-Kanban, Operations, ops-eqiad: DIMM errors for analytics1062 - https://phabricator.wikimedia.org/T187164#4052486 (elukey) Open>Resolved a: elukey Thanks Chris, going to close the task and re-open if 1062 freezes again!
[08:25:34] Analytics: unique devices data for january not in cassandra - https://phabricator.wikimedia.org/T189740#4051542 (JAllemandou) I double checked data: - Monthly uniques are missing for 2018-01 - Daily uniques are not missing Here is what happened: - At the beginning of February I restarted the Cassandra bun...
[08:26:18] joal: still missing 2 nodes, I want to solve a puppet issue before
[08:26:22] so moar space + ram
[08:26:31] MOOOAAAAAR !
[08:27:40] joal: not sure if you've read/heard me saying yesterday that the geoip stuff in webrequest is helping the Traffic team a lot in figuring out valuable data for Singapore
[08:27:48] so great work done by all of you :)
[08:28:03] I've seen that elukey :)
[08:28:09] Thanks for the feedback
[08:28:16] elukey: Is it geoip or ISP ?
[08:29:06] both, they'd need to figure out peering+transit details, and data broken down by macro-area and AS-number (contained in ISP) definitely helps
[08:29:40] like top 5 AS numbers for Asia
[08:29:42] or similar
[08:29:58] awesome elukey
[08:43:32] Analytics: unique devices data for january not in cassandra - https://phabricator.wikimedia.org/T189740#4052519 (JAllemandou) Solved - Data is in cassandra (we need to wait for cache updates before seeing the data in AQS).
[08:44:06] Analytics-Kanban: unique devices data for january not in cassandra - https://phabricator.wikimedia.org/T189740#4052520 (JAllemandou)
[08:44:21] * elukey is pretty sure that --^ involved french swearing
[08:44:27] Analytics-Kanban: unique devices data for january not in cassandra - https://phabricator.wikimedia.org/T189740#4051542 (JAllemandou) a: JAllemandou
[08:50:55] ok I am definitely lost in how puppet evaluates our classes
[09:23:51] (PS1) Joal: Correct cassandra loading monthly jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/419692 (https://phabricator.wikimedia.org/T189740)
[09:57:46] hi joal!
[09:58:07] did you have time to check my code?
[09:58:42] dsaez: I've checked, seems correct - I'm trying to run my own and went through some memory issues indeed
[10:03:17] joal: makes sense. I was trying different approaches and always having the same problem. My last idea was to try to split the dataframe into many parts, but that is slow. Do you know if there is anything similar to an index that is already in the parquet?
[10:05:35] dsaez: I don't understand what "index-like" means in our case - A faster way to find revisions?
[10:07:29] I mean like in a mysql table, you can have one or more columns that are indexed, so searching/querying that column is faster. I know that in HIVE there are partitions, and we already talked about that; my question is if in the parquet dumps that we are already using, there are any columns that are faster to query than the others
[10:08:52] dsaez: partitioning in hive, as discussed yesterday, is about reading less data. On dumps, partitioning is done by wiki (allowing you to read by wiki)
[10:09:05] dsaez: There are no other specific things in the parquet-dumps
[10:09:50] got it
[10:10:27] Analytics-Cluster, Analytics-Kanban, Operations, Patch-For-Review, User-Elukey: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4052652 (elukey) >>! In T188294#4052448, @elukey wrote: > During the first puppet run I have seen two issues: > > 1) the /etc/hadoo...
[10:15:40] dsaez: parquet is column-oriented, which means that if you query a subset of columns instead of the full set, it'll be faster - In your case, you're after the biggest column (wikitext), therefore no real improvement here
[10:16:36] got it
[10:18:50] what works, relatively fast, is to take just the rev_id and page_id columns, and join with HIVE. So, I assume that the memory problem is caused by that big column, wikitext
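A minimal PySpark sketch of the column-pruned join dsaez describes above: project the narrow columns first, join, and only then pull wikitext for the matched rows. The path, the Hive table name, and the column names here are hypothetical placeholders, not the real dataset layout:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("rev-join-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical dump location; the real dumps are partitioned by wiki.
dumps = spark.read.parquet("hdfs:///path/to/parquet-dumps/wiki=eswiki")

# Select only the narrow columns: parquet is column-oriented, so the big
# wikitext column is never read for this dataframe.
narrow = dumps.select("rev_id", "page_id")

# Hypothetical Hive table listing the revisions of interest.
targets = spark.table("some_db.revisions_of_interest")
matched = narrow.join(targets, on="rev_id", how="inner")

# Only now fetch wikitext, and only for the (much smaller) matched set.
result = matched.select("rev_id").join(
    dumps.select("rev_id", "wikitext"), on="rev_id")
```

Joining on the narrow projection first keeps the shuffle small; the expensive column only travels for rows that survive the join.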
[10:20:05] another approach I was considering was to save the dump in HIVE and do the join inside HIVE, but I really don't know if this will make any difference
[10:23:17] dsaez: hive uses mapreduce - It'll be slower than spark
[10:23:26] dsaez: However, it should be more resilient
[10:24:07] dsaez: also, you don't need to "save" the dumps in hive
[10:24:27] hive reads data from hdfs, the way spark does - You'd need to configure a table reading the data
[10:25:15] yes, I know, that's what I meant :)
[10:44:24] Analytics-Cluster, Analytics-Kanban, Operations, Patch-For-Review, User-Elukey: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4052691 (elukey) Also I found this bit interesting: https://puppet.com/docs/puppet/4.8/function.html#include ```
[10:58:52] Analytics-Kanban, User-Elukey: Reduction of stat1005's disk space usage - https://phabricator.wikimedia.org/T186776#4052709 (elukey)
[11:00:09] Analytics-Kanban, User-Elukey: Reduction of stat1005's disk space usage - https://phabricator.wikimedia.org/T186776#3954945 (elukey) ``` elukey@stat1005:~$ df -h Filesystem Size Used Avail Use% Mounted on /dev/mapper/stat1005--vg-data 7.2T 4.2T 2.6T 62% /srv ``` Much better now,...
[11:02:00] Analytics-Kanban, User-Elukey: Eventlogging's forwarder/zmq-legacy leaks memory over time - https://phabricator.wikimedia.org/T186510#4052718 (elukey) Open>Resolved a: elukey closing since we already have a tracking task/code review
[11:02:42] joal: after lunch I'd need to reboot all the druid nodes
[11:02:52] would you be available sanity check that nothing goes on fire?
[11:02:53] :D
[11:02:59] *to sanity check
[11:12:51] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review, User-Elukey: Run eventlogging purging script on beta labs to avoid disk getting full - https://phabricator.wikimedia.org/T171203#4052728 (elukey) The overall situation seems really good now: ``` elukey@deployment-eventlog05:~$ df -h Fil...
[11:14:16] Analytics, User-Elukey: [EL sanitization] Ensure presence of EL YAML whitelist in analytics1003 - https://phabricator.wikimedia.org/T189691#4052732 (elukey)
[11:14:34] Analytics, User-Elukey: [EL sanitization] Ensure presence of EL YAML whitelist in analytics1003 - https://phabricator.wikimedia.org/T189691#4050215 (elukey) Let me know when it is needed, I'll make sure to prioritize this task :)
[11:18:34] Analytics-EventLogging, Analytics-Kanban: Find an alternative query interface for eventlogging on analytics cluster that can replace MariaDB - https://phabricator.wikimedia.org/T189768#4052744 (Neil_P._Quinn_WMF) p: Triage>Normal
[11:18:44] Analytics-EventLogging, Analytics-Kanban: Sunset MySQL data store for eventlogging - https://phabricator.wikimedia.org/T159170#4052755 (Neil_P._Quinn_WMF)
[11:29:25] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Sanitize Hive EventLogging - https://phabricator.wikimedia.org/T181064#3778325 (Neil_P._Quinn_WMF) >>! In T181064#3778626, @Ottomata wrote: > This task is just to purge 90 days. Implementing the intelligent whitelist based refining will be...
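For reference, joal's "configure a table reading the data" suggestion from 10:24 amounts to declaring an external Hive table on top of the existing parquet files. A rough sketch, with a made-up database name, schema, and location:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# The schema and location here are made up for illustration; the table
# just points at the parquet files in place, nothing is copied into Hive.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS some_db.wikitext_dumps (
        page_id  BIGINT,
        rev_id   BIGINT,
        wikitext STRING
    )
    PARTITIONED BY (wiki STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///path/to/parquet-dumps'
""")

# Register the existing wiki=... directories as partitions.
spark.sql("MSCK REPAIR TABLE some_db.wikitext_dumps")
```

Queries against such a table read the same files Spark does; the gain is convenience and partition pruning via the wiki column, not raw speed.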
[11:29:46] Analytics, Analytics-EventLogging, User-Elukey: [EL sanitization] Ensure presence of EL YAML whitelist in analytics1003 - https://phabricator.wikimedia.org/T189691#4052765 (Neil_P._Quinn_WMF)
[11:30:04] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Remove AppInstallIId from EventLogging purging white-list - https://phabricator.wikimedia.org/T178174#4052766 (Neil_P._Quinn_WMF)
[11:42:50] * elukey lunch!
[12:24:48] Analytics, Analytics-EventLogging, Readers-Web-Backlog: Should it be possible for a schema to override DNT in exceptional circumstances? - https://phabricator.wikimedia.org/T187277#4052896 (phuedx) In T187277#3982615, @Tbayer provided clear reasoning as to why the logging and consequent aggregation o...
[13:38:18] Plop elukey - Is now a good moment to look after druid?
[13:38:54] joal: let's do it in a bit, currently changing firewall rules :)
[13:39:02] yups elukey
[13:46:46] Analytics, Analytics-EventLogging, Readers-Web-Backlog: Should it be possible for a schema to override DNT in exceptional circumstances? - https://phabricator.wikimedia.org/T187277#4053113 (Ottomata) > consequent aggregation of VirtualPageviews events should match that of the wmf.webrequest table Per...
[13:50:24] joal: now I am going to reboot kafka1001 and then let's do druid :)
[13:50:30] ok elukey
[13:54:13] hiiioooo
[13:54:22] hullo
[13:54:29] elukey: just curious, does that druid deb policy change for zookeeper select a different zk package?
[13:54:39] or just reorder some stuff that won't actually do anything
[13:55:30] ottomata: yeah afaics it will select the cdh one, meanwhile we are using the debian one
[13:55:46] I wanted to have a chat with you before proceeding :)
[13:55:48] right, so that will be a problem probably
[13:55:53] because we run zookeeperd on those nodes
[13:56:06] and need the debian versions
[13:56:19] oh yes I didn't mean zookeeperd, but zookeeper (the client)
[13:56:53] yaaaa, but...will the zookeeperd package fail because it can't install the client?
[13:56:54] it might...
[13:56:55] dunno
[13:57:12] ah good point, didn't check its dependency
[13:57:41] anyhow, the alternative could be to have a parameter in profile::cdh::apt to apply the pin or not
[13:57:49] default true, and then we can tune it for druid
[13:59:29] so basically things would stay as they are now
[13:59:50] aye
[14:00:00] sounds good, overall this is way cleaner so if it works yeehaw
[14:02:50] \o/
[14:02:57] all right going to amend the cr soon
[14:03:08] I still haven't figured out what puppet is doing on the new workers
[14:06:53] sheesh elukey after kafka restarts: https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?refresh=30s&orgId=1&var-instance=main-eqiad_to_jumbo-eqiad&from=now-1h&to=now
[14:07:03] just on the jumbo ones though!
[14:07:16] !log bouncing kafka jumbo -> eqiad mirrormaker
[14:07:16] * elukey cries
[14:07:17] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:07:31] elukey: i'm going to seriously consider ureplicator i think
[14:07:32] this sucks
[14:07:38] at least try it
[14:07:45] is it the uber one?
[14:07:47] yeah
[14:07:56] i think we shouldn't move farther with the jumbo migration until we feel good about this
[14:08:06] yep I agree 100%
[14:08:08] eventstreams relies on mm
[14:08:17] eventbus in hive does too
[14:08:24] mw monolog doesn't
[14:08:28] but that sorta broke it last time
[14:08:32] i might be willing to try that one again
[14:25:19] joal: time for druid?
[14:25:26] elukey: Here !
[14:27:40] so I'd start with draining druid1003's middlemanager, so we can reboot it in ~45 mins
[14:27:49] and then also do one of the public ones
[14:28:11] elukey: works for me, except that at that time I'll be close to leaving to grab the kids
[14:28:34] joal: super fine
[14:28:34] elukey: Let's start, I should be back almost in time for standup, we'll continue after?
[14:28:42] Arf ... Grskin
[14:28:46] ok, let's go
[14:28:49] ack
[14:29:23] !log disabled druid1003's middlemanager as prep step for reboot
[14:29:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:29:35] now the public overlord
[14:29:58] seems druid1006
[14:30:04] so let's do 1004
[14:31:59] elukey: in any case, we're not indexing soon on public :)
[14:32:36] (PS2) Joal: [WIP] Update mediawiki-history spark job for performance [analytics/refinery/source] - https://gerrit.wikimedia.org/r/419516
[14:33:12] joal: yes I realized only afterwards :P
[14:33:19] no problemo :)
[14:33:25] just sayin'
[14:33:52] depooled 1004 from LVS and then stopping the daemons
[14:34:48] (CR) jerkins-bot: [V: -1] [WIP] Update mediawiki-history spark job for performance [analytics/refinery/source] - https://gerrit.wikimedia.org/r/419516 (owner: Joal)
[14:39:48] druid1004 up
[14:40:05] and repoooled
[14:42:23] elukey: Nothing noticeable from charts
[14:43:28] joal: hey, the clickstream dataset for fawiki has not been published for February: https://dumps.wikimedia.org/other/clickstream/2018-02/ is it intentional?
[14:44:20] hi Amir1 - Nope - This is a deployment bug
[14:45:34] joal: oh, is it tracked down? Can I help?
[14:46:58] Amir1: Since it wasn't actually tracked down (I just merged the patch you submitted, but forgot to create a task - As usual for me), I forgot to restart the job after the deploy containing your patch
[14:47:03] Amir1: Doing now
[14:47:08] joal: coffee and then I'll do druid100[56]
[14:47:13] Yay elukey :)
[14:47:58] joal: oh thanks. So It will happen for March?
[14:48:33] Amir1: If ok for you, it should happen for March (I'm not super happy at the idea of rerunning the Feb full thing)
[14:49:20] yeah, it's fine. I just need to know if it happens or not so I can plan things. That's all
[14:49:22] thank you!
[14:49:46] Amir1: Thank you for having spotted the mistake !!
[14:50:01] Amir1: Job restarted with new config - Confirmed :)
[14:50:12] \o/
[14:50:19] !log Restart clickstream-coord to pick up new config including fawiki
[14:50:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:21:55] rebooting druid1003
[15:41:00] analytics1076 up and running
[15:44:52] (CR) Halfak: [C: 2] worker: change to SIGALRM-based limit instead of row [analytics/quarry/web] - https://gerrit.wikimedia.org/r/419317 (https://phabricator.wikimedia.org/T188564) (owner: Zhuyifei1999)
[15:45:09] (Merged) jenkins-bot: worker: change to SIGALRM-based limit instead of row [analytics/quarry/web] - https://gerrit.wikimedia.org/r/419317 (https://phabricator.wikimedia.org/T188564) (owner: Zhuyifei1999)
[15:59:37] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Remove AppInstallIId from EventLogging purging white-list - https://phabricator.wikimedia.org/T178174#4053449 (mforns) Hi all! 2 weeks ago I wrote in the thread with legal and proposed a modification to the appInstallId field, can the Readin...
[16:01:36] ah
[16:02:16] a-team standup
[16:02:32] STANDUP FOR YOUR RIGHTS
[16:11:36] Analytics-Tech-community-metrics, Gerrit, Upstream: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#4053473 (Paladox) @TerraCodes nope, I spoke with Dave from google in #gerrit about this task who kindly had a look at my proposed change...
[16:26:49] Analytics, Analytics-EventLogging, Readers-Web-Backlog: Should it be possible for a schema to override DNT in exceptional circumstances? - https://phabricator.wikimedia.org/T187277#4053553 (phuedx) >>! In T187277#4053113, @Ottomata wrote: >> To be specific though, I'd word this differently, and say t...
[16:30:22] Analytics, Analytics-Data-Quality, Analytics-Kanban, WMDE-Analytics-Engineering, User-GoranSMilovanovic: Please review: public data sets for the WDCM Biases Dashboard - https://phabricator.wikimedia.org/T189653#4053586 (Ottomata)
[16:31:22] Analytics, Analytics-EventLogging, Analytics-Kanban, User-Elukey: [EL sanitization] Ensure presence of EL YAML whitelist in analytics1003 - https://phabricator.wikimedia.org/T189691#4050215 (Ottomata)
[16:31:33] Analytics, Analytics-Kanban: [EL sanitization] Modify mysql purging script to read from the new YAML whitelist - https://phabricator.wikimedia.org/T189692#4050230 (Ottomata)
[16:35:21] Analytics, Analytics-Wikistats: Add wikistats metric about "pagecounts" - https://phabricator.wikimedia.org/T189619#4053630 (mforns)
[16:35:44] Analytics, Analytics-Wikistats: Intervals for data arround pageviews in wikistats maps - https://phabricator.wikimedia.org/T188928#4053633 (mforns)
[16:35:54] Analytics, Analytics-Cluster, Analytics-Kanban: Alert for Kafka MirrorMaker lag - https://phabricator.wikimedia.org/T189611#4047493 (Ottomata)
[16:35:59] Analytics, Analytics-Wikistats: Add wikistats metric "top-by-edits" - https://phabricator.wikimedia.org/T189620#4053636 (mforns)
[16:36:02] Analytics, Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Fix Mirror Maker erratic behavior when replicating from main-eqiad to jumbo - https://phabricator.wikimedia.org/T189464#4042527 (Ottomata)
[16:36:14] Analytics, Analytics-Wikistats: Check wikistats numbers for agreggations for all wikipedias - https://phabricator.wikimedia.org/T189626#4053639 (mforns)
[16:38:32] Analytics: pyspark2 different versions in Driver and Workers - https://phabricator.wikimedia.org/T189497#4053658 (Ottomata) Open>declined Should be fixed after pending Stretch hadoop cluster upgrade, will reopen if still a problem then.
[16:39:25] Quarry, Patch-For-Review: Quarry should refuse to save results that are way too large - https://phabricator.wikimedia.org/T188564#4053662 (zhuyifei1999) 60 seconds is definitely not enough for the query. Switched to the unlimited version, then ran: https://quarry.wmflabs.org/query/25564 Browser reports a...
[16:41:49] (PS1) Zhuyifei1999: worker: raise the save time limit to 10 mins [analytics/quarry/web] - https://gerrit.wikimedia.org/r/419786 (https://phabricator.wikimedia.org/T188564)
[16:53:12] Analytics-Kanban, Patch-For-Review, User-Elukey: Add the prometheus jmx agent to AQS Cassandra - https://phabricator.wikimedia.org/T184795#4053732 (elukey) @MoritzMuehlenhoff: would it be worth in your opinion to create a cassandra 2.2 component, rather than relying on thirdparty? As far as I can see...
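The Quarry change merged at 15:44 (and bumped to ten minutes at 16:41) swaps a row-count cap for a wall-clock cap on writing results. A minimal sketch of the underlying SIGALRM pattern, with hypothetical names rather than Quarry's actual worker code:

```python
import signal

class SaveTimeout(Exception):
    """Raised when writing the resultset exceeds the time limit."""

def _on_alarm(signum, frame):
    raise SaveTimeout()

def save_with_time_limit(write_results, seconds=600):
    # SIGALRM is delivered after `seconds`; this only works in the
    # main thread of a Unix process.
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        write_results()
    finally:
        signal.alarm(0)                          # disarm the pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

Unlike a row limit, this bounds slow writes regardless of output size, which matches the failure mode described in T188564 (under a megabyte taking minutes on NFS).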
[17:01:22] Analytics-Tech-community-metrics, Gerrit, Upstream: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#4053759 (Paladox) https://gerrit.wikimedia.org/r/#/c/419790/
[17:17:30] Analytics-Kanban, Patch-For-Review, User-Elukey: Add the prometheus jmx agent to AQS Cassandra - https://phabricator.wikimedia.org/T184795#4053849 (MoritzMuehlenhoff) >>! In T184795#4053732, @elukey wrote: > @MoritzMuehlenhoff: would it be worth in your opinion to create a cassandra 2.2 component, ra...
[17:34:38] Analytics, Analytics-EventLogging, Readers-Web-Backlog: Should it be possible for a schema to override DNT in exceptional circumstances? - https://phabricator.wikimedia.org/T187277#3969855 (ovasileva) Sounds like the way forward is to continue with @phuedx's proposal for building something within the...
[17:44:32] elukey: do you have a sec for puppet layout brainbounce?
[17:44:36] for jupyter?
[17:44:55] ottomata: sure!
[17:45:11] bc ?
[17:46:15] ya
[17:46:35] elukey: am there
[18:00:16] * elukey off! byyyeee
[18:33:57] (PS1) Ottomata: create_virutalenv.sh now takes the destination venv path as $1 [analytics/swap/deploy] - https://gerrit.wikimedia.org/r/419821 (https://phabricator.wikimedia.org/T183145)
[18:34:20] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Sanitize Hive EventLogging - https://phabricator.wikimedia.org/T181064#4054183 (mforns) @Neil_P._Quinn_WMF Yes, it does. And that code is already merged. The next steps in this project include: - Translating the current TSV whitelist into t...
[18:34:36] (PS2) Ottomata: create_virutalenv.sh now takes the destination venv path as $1 [analytics/swap/deploy] - https://gerrit.wikimedia.org/r/419821 (https://phabricator.wikimedia.org/T183145)
[18:34:46] (CR) Ottomata: [V: 2 C: 2] create_virutalenv.sh now takes the destination venv path as $1 [analytics/swap/deploy] - https://gerrit.wikimedia.org/r/419821 (https://phabricator.wikimedia.org/T183145) (owner: Ottomata)
[19:10:32] !log bouncing main -> jumbo mirror maker
[19:10:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:24:04] Analytics: Mount dumps on SWAP machines (notebook1001.eqiad.wmnet / notebook1002.eqiad.wmnet) - https://phabricator.wikimedia.org/T176091#4054301 (Ottomata) @elukey, can we also add a hole in the analytics vlan firewall for this? I'd like to mount dataset dumps on the notebook servers as part of T183145.
[20:10:03] Analytics-Kanban, Patch-For-Review: Refresh SWAP notebook hardware - https://phabricator.wikimedia.org/T183145#4054375 (Ottomata) I have rsynced over user home directories from notebook1001 -> notebook1003, and am upgrading the default notebook venv ($HOME/venv) by: ``` wheels_path=/srv/jupyterhub/deplo...
[20:32:09] Analytics-Kanban, Patch-For-Review: Refresh SWAP notebook hardware - https://phabricator.wikimedia.org/T183145#4054437 (Ottomata) IT DID INDEED WORK! AWESOME! Updated JupyterHub with JupyterLab beta installed on notebook1003 and notebook1004. notebook1003 home directories have been copied over. WOoO...
[20:37:25] any chance someone could touch 'hdfs://analytics-hadoop/wmf/data/raw/mediawiki/mediawiki_CirrusSearchRequestSet/hourly/2018/03/08/15/_SUCCESS'? camus didn't seem to leave it behind so the oozie pipeline stalled there
[20:39:10] hmmm
[20:39:16] ebernhardson: other hours are ok?
[20:39:31] i wonder if that is from my attempt at moving the monolog producer to jumbo last week [20:39:50] ottomata: yea, it has some parallelism so only one 'task' is stuck, the other task is still at the current hour [20:40:00] so everything after that was processed [20:40:16] k [20:40:50] 10Quarry, 10Patch-For-Review: Quarry should refuse to save results that are way too large - https://phabricator.wikimedia.org/T188564#4054474 (10Halfak) For the record, the above is crazy. It's writing less than a megabyte and it take 5 minutes! :| This works for now, but getting off of NFS (T178520) seems... [20:40:56] done ebernhardson [20:41:35] ottomata: thanks! [20:42:32] oozie looks to have already picked it up and started processing. great! [20:42:54] 10Quarry: Find somewhere else (not NFS) to store Quarry's resultsets - https://phabricator.wikimedia.org/T178520#3694676 (10Halfak) We've confirmed that there is a bigdisk instance available with 300GB of direct access disk space. We'll need a backup strategy. Here's a couple that seem sane-ish: **Two big di... [20:48:55] great [21:04:30] ottomata: I bless you :) [21:04:44] 10Quarry: Find somewhere else (not NFS) to store Quarry's resultsets - https://phabricator.wikimedia.org/T178520#4054559 (10zhuyifei1999) Any suggestions on how to send results back from the query runners to the bigdisk? [21:04:56] ottomata: new notebooks can access the cluster with Spark [21:05:06] * joal bows low to ottomata :) [21:05:50] YEAHWW [21:07:16] ottomata: I'm gonna continue to ask for scala, but man this is already awesome :D [21:07:24] Thanks a lot ottomata :) [21:07:40] joal: you can do scala on the terminal at least [21:07:43] i'm playing with toree now... [21:07:46] yessir [21:07:48] locally seeig if we can get it [21:08:02] ottomata: If using term, I'll continue with my usuall shh :) [21:08:06] aye [21:08:22] joal: you might be able to get pyspark workign in the notebook [21:10:30] pyspark already works in notebooks :P [21:10:51] ottomata: yes indeed - That's what I meant :) [21:11:00] ebernhardson: True ! [21:11:11] hopefully this is easier, but i used spark on notebook1001 yesterday :P [21:11:14] ebernhardson: with hadoop? [21:11:22] you can launch jobs in hadoop with it? [21:11:25] ottomata: sadly no, but i think thta was only firewalls (i asked about it before) [21:11:30] ya [21:11:35] ebernhardson: now you have it :) [21:11:41] it could talk hdfs, but not to hadoop workers [21:11:48] * ebernhardson doesn't know how that works, ports i guess :P [21:12:44] ebernhardson: The example you showed me with findspark has worked for me on the cluster from notebook1003 [21:14:02] sweet! [21:20:56] 10Quarry: Find somewhere else (not NFS) to store Quarry's resultsets - https://phabricator.wikimedia.org/T178520#4054672 (10Halfak) Good Q. Are there basic file services that we could utilize for this? Maybe something like this: https://medium.com/@keagileageek/paramiko-how-to-ssh-and-file-transfers-with-pyth... [21:34:33] laters yaalll [21:34:47] Bye ottomata [21:34:58] 10Quarry: Find somewhere else (not NFS) to store Quarry's resultsets - https://phabricator.wikimedia.org/T178520#4054721 (10zhuyifei1999) >>! In T178520#4054672, @Halfak wrote: > Good Q. Are there basic file services that we could utilize for this? > > Maybe something like this: > https://medium.com/@keagilea... 
[21:37:21] Gone as well - Bye folks
[22:13:21] !log bounced jumbo mirror makers
[22:13:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[22:16:15] Quarry: Find somewhere else (not NFS) to store Quarry's resultsets - https://phabricator.wikimedia.org/T178520#4054844 (zhuyifei1999) Another thing: gotta make sure whatever method we use to store the results remotely is faster than NFS or we'll do a lot of work and end up worse...
[22:42:52] Quarry: Find somewhere else (not NFS) to store Quarry's resultsets - https://phabricator.wikimedia.org/T178520#4054966 (Halfak) I think it'll be hard to beat NFS at that game ;)
[23:01:12] (PS1) EBernhardson: Support hourly or daily partition dropping [analytics/refinery] - https://gerrit.wikimedia.org/r/419949
[23:02:35] (CR) EBernhardson: "this, at a minimum, leaves us with an odd naming problem. There are lots of refinery-drop--partitions scripts. With the change this n" [analytics/refinery] - https://gerrit.wikimedia.org/r/419949 (owner: EBernhardson)
[23:56:26] (PS2) EBernhardson: Support hourly or daily partition dropping [analytics/refinery] - https://gerrit.wikimedia.org/r/419949 (https://phabricator.wikimedia.org/T189845)
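To close with the gist of that last refinery patch: "hourly or daily partition dropping" refers to which partition keys a retention job matches when it expires old data. An illustrative sketch only, with made-up table names and dates, and not the script's actual implementation:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Hourly-partitioned table: the partition spec includes hour.
spark.sql("""
    ALTER TABLE some_db.events_hourly DROP IF EXISTS
    PARTITION (year=2018, month=3, day=8, hour=15)
""")

# Daily-partitioned table: the same retention logic, one level up.
spark.sql("""
    ALTER TABLE some_db.events_daily DROP IF EXISTS
    PARTITION (year=2018, month=3, day=8)
""")
```

The refinery scripts also remove the corresponding HDFS directories; dropping the Hive partition alone does not delete data for external tables.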