[00:26:53] Analytics, Operations, Security-Team, WMF-Legal, Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (Legoktm) >>! In T210667#4787049, @Jrogers-WMF wrote: > Hi all, commenting on this from WMF Legal. > > As I understand the question and conte...
[00:30:38] tgr|away: want to host with me the irc meeting on the 5th?
[01:59:40] (CR) Milimetric: [C: -1] "Update: this has been tested and found to sqoop correctly. Still haven't tested the new jobs with it, so -1 until we can merge everything" [analytics/refinery] - https://gerrit.wikimedia.org/r/476100 (https://phabricator.wikimedia.org/T210541) (owner: Milimetric)
[04:26:39] nuria: sure
[04:32:54] Analytics, Analytics-EventLogging, EventBus, Services (watching): Modern Event Platform: Stream Intake Service: Implementation - https://phabricator.wikimedia.org/T206785 (Pchelolo) my 2 cents: EventGate is awesome, GH is the best place for this piece of code.
[06:24:43] Analytics, Analytics-Kanban, DBA, User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (Marostegui) We shouldn't go for multi-source, we specifically discussed and bought hardware to avoid going for multi-source (T159423#4271324...
[08:21:11] Analytics, Operations, Security-Team, WMF-Legal, Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (Legoktm) >>! In T210667#4784225, @Joe wrote: >>>! In T210667#4783582, @Legoktm wrote: >> If it's not re-distributable by us, then it doesn't m...
[08:42:42] Analytics, Analytics-Kanban, Patch-For-Review: Set up CI system on AQS - https://phabricator.wikimedia.org/T209711 (hashar)
[08:42:44] Analytics, Continuous-Integration-Config: Add CI to all analytics/* repositories and archive obsolete ones - https://phabricator.wikimedia.org/T180301 (hashar)
[09:12:14] (PS5) Fdans: Refactor test runner and fix tests stalling [analytics/aqs] - https://gerrit.wikimedia.org/r/476528 (https://phabricator.wikimedia.org/T209711)
[09:48:25] Analytics, SDC General, Wikimedia-Stream: Verify that EventStreams work with WikiBase MediaInfo - https://phabricator.wikimedia.org/T210702 (Abbe98) @Ottomata I see, I think the current SDC Captions implementations should behave very much as Labels on Wikidata. I'm currently not able to edit captions...
[09:53:07] joal: bonjour!
[09:53:16] Analytics, SDC General, Wikimedia-Stream: Verify that EventStreams work with WikiBase MediaInfo - https://phabricator.wikimedia.org/T210702 (Nirmos)
[11:36:06] * elukey lunch!
[12:18:17] (CR) Lucas Werkmeister (WMDE): Update metric's items and properties automatically (6 comments) [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/475807 (https://phabricator.wikimedia.org/T209399) (owner: Michael Große)
[12:28:13] (PS1) Gilles: Add ServerTiming to EventLogging whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/476841 (https://phabricator.wikimedia.org/T207862)
[12:45:08] !log Update hive wmf_raw mediawiki schemas (namespace bigint -> int)
[12:45:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:46:53] Analytics, Analytics-Kanban, DBA, Data-Services, Core Platform Team Backlog (Watching / External): Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (JAllemandou)
[13:06:33] (CR) Hashar: "recheck" [analytics/aqs] - https://gerrit.wikimedia.org/r/476528 (https://phabricator.wikimedia.org/T209711) (owner: Fdans)
[13:08:28] Analytics, Continuous-Integration-Config: Add CI to all analytics/* repositories and archive obsolete ones - https://phabricator.wikimedia.org/T180301 (hashar)
[13:08:32] Analytics, Analytics-Kanban, Patch-For-Review: Set up CI system on AQS - https://phabricator.wikimedia.org/T209711 (hashar) Open>Resolved Done with `npm test` being run under NodeJS 6 :)
[13:12:21] (CR) Hashar: "CI is enabled :)" (1 comment) [analytics/aqs] - https://gerrit.wikimedia.org/r/476528 (https://phabricator.wikimedia.org/T209711) (owner: Fdans)
[13:13:57] (CR) Fdans: Refactor test runner and fix tests stalling (1 comment) [analytics/aqs] - https://gerrit.wikimedia.org/r/476528 (https://phabricator.wikimedia.org/T209711) (owner: Fdans)
[13:14:30] (PS6) Fdans: Refactor test runner and fix tests stalling [analytics/aqs] - https://gerrit.wikimedia.org/r/476528 (https://phabricator.wikimedia.org/T209711)
[13:27:19] (CR) Joal: [C: +1] "Looks good! Two questions inside, but no show stopper." (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/476100 (https://phabricator.wikimedia.org/T210541) (owner: Milimetric)
[13:31:56] Analytics, Analytics-Kanban, DBA, User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (Banyek) >>! In T210478#4788172, @Marostegui wrote: > We shouldn't go for multi-source, we specifically discussed and bought hardware to avoi...
[13:46:34] Analytics, Analytics-Kanban, DBA, User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (jcrespo) > We can use the new ones We don't have proxies for this purpose, the proxies you worked on are assigned to misc and should stay f...
[13:46:52] Analytics, Analytics-EventLogging, EventBus, Services (watching): Modern Event Platform: Stream Intake Service: Implementation - https://phabricator.wikimedia.org/T206785 (mobrovac) >>! In T206785#4787727, @Nuria wrote: > I tend to agree with Andrew here, this is a very generic piece of code and...
[13:47:14] Analytics, Analytics-EventLogging, EventBus, Core Platform Team Backlog (Watching / External), Services (watching): Modern Event Platform: Stream Intake Service: Implementation - https://phabricator.wikimedia.org/T206785 (mobrovac)
[13:48:45] Analytics, Analytics-Kanban, DBA, User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (Marostegui) On top of what Jaime wrote, which is basically the same thing I was writing...we also have to consider the fact that these hosts...
[13:57:14] Analytics, Analytics-Kanban: Update datasets definitions and oozie jobs for dual-sqoop of comments and actors - https://phabricator.wikimedia.org/T210542 (JAllemandou)
[13:57:31] (PS1) Joal: Update hive and oozie for labs/prod sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/476855 (https://phabricator.wikimedia.org/T210542)
[14:00:26] Analytics, Analytics-Kanban, DBA, User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (Banyek) with "We can use the new ones" I was wondering that we could (not should) use them beside the current role of the new proxies - tha...
[14:00:57] Analytics, Analytics-Kanban, DBA, User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (elukey) Thanks a lot for all the inputs, I'd say that we don't need proxies for the moment, we'll probably just need some automation around...
[14:10:10] (CR) Joal: [C: +1] "Thanks a lot for that patch Fran - Works great on my machine (and thanks for having shown me a nyan :D)" [analytics/aqs] - https://gerrit.wikimedia.org/r/476528 (https://phabricator.wikimedia.org/T209711) (owner: Fdans)
[14:13:38] Analytics, Analytics-Kanban, Patch-For-Review: Set up CI system on AQS - https://phabricator.wikimedia.org/T209711 (elukey) @hashar quick question - we are about to migrate AQS to NodeJS 10, will it be easy to migrate npm test to it when needed?
[14:15:54] fdans: can we move wikistats to https://bcrikko.github.io/NES.css/ ?
[14:21:40] elukey: haaa, it would be cool to do the charts with it
[14:23:36] a-team: I've sent some emails to people holding a big home dir on stat1007 to free some space (if possible), we are again with /srv/ almost filled up :(
[14:23:45] :S
[14:24:06] elukey: o/ - I'm sorry I realized I missed a ping very late :(
[14:26:53] I am deeply offended!
[14:26:54] :D
[14:27:19] I knew you were, so I actually was actually afraid to even apologize ;)
[14:28:32] ahahhaha
[14:28:37] current state on stat1007
[14:28:38] 3.5T home
[14:28:39] 2.5T log
[14:28:45] Maaaaaaan
[14:28:46] :(
[14:28:49] sad_trombone.wav
[14:29:05] how the heck have we managed to have 2.5Tb logs ?
[14:29:17] 1.6T are eventlogging's
[14:29:24] Ah righ
[14:29:45] that in theory sooner or later we'll be able to prune to say a month ago
[14:29:47] (CR) Joal: "A bunch of small comments. Globally looks good :)" (8 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/476220 (https://phabricator.wikimedia.org/T167539) (owner: Fdans)
[14:29:55] rather than 90d
[14:30:29] ah no wait I think it is 60d of retention
[14:30:40] the older file has mtime Sept 1st
[14:30:42] mmmm
[14:31:05] nope /usr/bin/find /srv/log/eventlogging/archive -type f -mtime +90 -exec rm {} \;
[14:31:14] Analytics, Analytics-Kanban, Patch-For-Review: Set up CI system on AQS - https://phabricator.wikimedia.org/T209711 (hashar) >>! In T209711#4788927, @elukey wrote: > @hashar quick question - we are about to migrate AQS to NodeJS 10, will it be easy to migrate npm test to it when needed? The CI conta...
[14:31:21] Analytics, Analytics-EventLogging, EventBus, Core Platform Team Backlog (Watching / External), Services (watching): Modern Event Platform: Stream Intake Service: Implementation - https://phabricator.wikimedia.org/T206785 (CCicalese_WMF) >>! In T206785#4788883, @mobrovac wrote: >>>! In T206785...
[14:31:53] thank you for the CR joal :D
[14:32:04] np fdans - I owed you that :)
[14:33:28] Analytics, Analytics-Kanban, Patch-For-Review: Set up CI system on AQS - https://phabricator.wikimedia.org/T209711 (elukey) @hashar We have a component in Stretch for this (`component/node10`), and @MoritzMuehlenhoff is currently leading an effort to migrate to Nodejs 10 in https://phabricator.wikim...
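The retention cron quoted above (`find /srv/log/eventlogging/archive -type f -mtime +90 -exec rm {} \;`) shows the EventLogging archive keeps 90 days, not 60. Before tightening that to a month, a dry run of the same selection shows what would be pruned. A minimal Python sketch, assuming GNU find's rounding rule (the file age is truncated to whole days, so `-mtime +90` matches files at least 91 full days old); `files_older_than` is a hypothetical helper, not existing refinery or puppet tooling.

```python
import os
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def files_older_than(root: str, days: int, now: Optional[float] = None) -> List[Path]:
    """Dry run of `find ROOT -type f -mtime +DAYS`: list matches, delete nothing.

    find truncates a file's age to whole days before comparing, so `+DAYS`
    selects files whose whole-day age is strictly greater than DAYS.
    """
    now = time.time() if now is None else now
    matches: List[Path] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # -type f matches regular files only, so skip symlinks as find would.
            if path.is_symlink() or not path.is_file():
                continue
            age_days = int((now - path.stat().st_mtime) // SECONDS_PER_DAY)
            if age_days > days:
                matches.append(path)
    return sorted(matches)
```

For example, `files_older_than("/srv/log/eventlogging/archive", 30)` would list what a one-month retention would delete, without touching anything.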
[14:36:33] (PS2) Joal: Update hive and oozie for labs/prod sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/476855 (https://phabricator.wikimedia.org/T210542)
[14:37:23] (CR) Joal: [V: -1 C: -1] "This should be deployed with the sqoop and mediawiki-history patches. -1 until the other patches are ready." [analytics/refinery] - https://gerrit.wikimedia.org/r/476855 (https://phabricator.wikimedia.org/T210542) (owner: Joal)
[14:49:33] (PS1) Joal: Bump hadoop,hive and spark dependency versions [analytics/refinery/source] - https://gerrit.wikimedia.org/r/476866
[14:59:16] Analytics, WMDE-Analytics-Engineering, Wikidata, Patch-For-Review, and 2 others: track number of editors from other Wikimedia projects who also edit on Wikidata over time - https://phabricator.wikimedia.org/T193641 (JAllemandou)
[15:23:00] Analytics, Operations, Security-Team, WMF-Legal, Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (faidon) So I think this task raises a few different issues (and @legoktm correct me if I'm wrong): 1. Legal concerns about using this particul...
[15:25:45] Analytics: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (bmansurov)
[15:26:30] HaeB: o/ - whenever you have time could you please check if zhousquared's home dir on stat1004 can be deleted?
[15:27:21] Analytics, Research: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (bmansurov)
[15:53:22] Analytics, Operations, Security-Team, WMF-Legal, Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (Platonides) Well, I don't think it even needs to be treated in a private task. There was a situation about how to interpret the rules / how mu...
[15:53:43] is there anything in analytics with cmake? In particular i'm trying to compile a custom lightgbm (c++), but no cmake on notebook's and when i compile it locally and ship it over i get some glibc mismatch
[15:57:49] ebernhardson: not that I know, but I think that we could easily add it to the stat/notebook hosts if neeeded (they share the same set of packages)
[15:58:43] elukey: alright, i'll file a patch
[16:05:03] ebernhardson: Hola! if you have time next week i would love a tour of all the search stuff running on cluster and the workflow of how it makes it to prod, i have a very high level idea that is probably most incorrect
[16:05:59] nuria: sure! it'll have to be next week, i'm on holiday after that
[16:06:31] Analytics, Analytics-EventLogging, EventBus, Core Platform Team Backlog (Watching / External), Services (watching): Modern Event Platform: Stream Intake Service: Implementation - https://phabricator.wikimedia.org/T206785 (Nuria) Option 3 is reexamining why we cannot mirror from github to gerr...
[16:06:42] some parts of how it makes to prod are copy model from analytics to laptop, then laptop to deploy, then upload to es :(
[16:06:55] someday i will find a better way to ship 10-50MB files from analytics to prod...
[16:07:00] ebernhardson: HIGHLY sofisticated
[16:07:32] ebernhardson: it can be done, not that different of how do we load data into cassandra or druid from hadoop
[16:07:37] ebernhardson: makes sense?
[16:08:09] nuria: hmm, don't those have to open firewalls?
[16:08:21] nuria: ops had us specifically reverse that, and remove the firewall opening to analytics
[16:08:23] ebernhardson: one more question! IIUC you guys are using Spark 1.6, would it be feasible for next quarter to migrate to Spark 2? (we are trying to move to one version fully supported in the cluster)
[16:08:28] elukey: 2.3.1 :)
[16:08:41] elukey: we moved awhile ago, probably before it was deployed on the cluster
[16:08:44] ebernhardson: how long ago was that?
[16:08:48] niceeeeeee
[16:08:59] ebernhardson: the ops blockage for firewall opening?
[16:09:14] nuria: maybe 3 or 4 months ago i finished up the work to change our data loading to transport over kafka instead of direct connections from analytics-prod
[16:09:32] that was actually requested before alexandros even opened the firewall for us. Basically an agreement that it was temporary
[16:09:42] ebernhardson: ya, that is the easiest choice, use kafka as your bridge
[16:10:12] I think that it was the giant rule in the router's firewall to allow traffic to flow to some es servers
[16:10:22] (IIRC)
[16:10:45] the firewall was also problematic, it didnt auto-update with changes in the cluster. it worked, but was a pain :)
[16:10:54] ebernhardson: i see, i think understanding what is happening is first, but think also that all our wikistats worflows move data from hadoop cluster to public druid cluster
[16:11:15] ebernhardson: elukey can correct me if i am mistaken but public druid cluster is out the analytics vlan
[16:11:44] tbh, for me something nice would be if i could stick binary blobs into something, and then stick messages into kafka that refer to them
[16:11:50] and have the blobs accessible in analytics and prod
[16:12:20] i can't ship the models over the current kafka because they are 10-100MB, which is quite large for kafka messages
[16:12:36] Analytics, EventBus, Services (done): Create alert on EventBus 400 error rate - https://phabricator.wikimedia.org/T210031 (Pchelolo) Open>Resolved a:Pchelolo The alert has been created and tested.
[16:12:41] ebernhardson: and those binary blobs represent what tingies?
[16:12:46] *thingies
[16:12:57] nuria: in this case, trained ranking models
[16:13:03] nuria: basically a forest of decision trees
[16:14:26] ebernhardson: i see, and tress are represented in binary and accessed in analytics as needed by model and what are they used in prod for?
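ebernhardson's wish (blobs stored somewhere reachable from both analytics and prod, with Kafka carrying only small messages that refer to them) is the common "claim check" pattern. It sidesteps the message-size problem: Kafka's default broker limit is about 1 MB per message, far below a 10-100MB model. A minimal sketch under stated assumptions: the shared blob store is simulated with a local directory and the Kafka topic with a Python list; `publish_model`, `fetch_model` and the message fields are hypothetical, not an existing Wikimedia or Kafka API.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def publish_model(blob: bytes, name: str, store_dir: Path, topic: List[str]) -> str:
    """Write the blob to the shared store and append a small reference message."""
    digest = hashlib.sha256(blob).hexdigest()
    # Content-addressed file name: identical bytes always map to the same path.
    (store_dir / digest).write_bytes(blob)
    message = {"model": name, "sha256": digest, "size": len(blob)}
    topic.append(json.dumps(message))  # stand-in for a Kafka produce call
    return digest


def fetch_model(message_json: str, store_dir: Path) -> bytes:
    """Resolve a reference message back to its blob, verifying size and hash."""
    message = json.loads(message_json)
    blob = (store_dir / message["sha256"]).read_bytes()
    if len(blob) != message["size"] or hashlib.sha256(blob).hexdigest() != message["sha256"]:
        raise ValueError("blob does not match its reference message")
    return blob
```

Addressing blobs by their SHA-256 lets the consumer check it received exactly the bytes that were announced, and makes re-publishing the same model idempotent.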
[16:15:15] nuria: the models get uploaded to the model store we built into elasticsearch. Then ~1M times/sec that model is run against results we are considering sending to users
[16:15:34] ~1 million?
[16:15:59] ebernhardson: ah i see, in your case model runs in both sides
[16:16:26] nuria: yes, the model will run up to ~8k documents for each users search. The model basically only needs to run in prod, it trains in analytics
[16:17:32] ebernhardson: and the random forest is used for training and running it? mmmm.. this part i do not understand but i think i just need more background
[16:18:25] nuria: its actualy lambdamart as opposed to random forrest, but the way the model is evaluated is almost the same. Both are represented as a forest of trees
[16:18:52] HaeB: (same thing on stat1007 :)
[16:19:06] ebernhardson: i understand SOME WORDS of sentence above
[16:19:15] ebernhardson: will do some studing and set up meeting
[16:20:09] sure
[16:21:15] ebernhardson: can you just use rsync?
[16:21:36] (CR) Nuria: [C: +2] "bye bye nyan, +2" [analytics/aqs] - https://gerrit.wikimedia.org/r/476528 (https://phabricator.wikimedia.org/T209711) (owner: Fdans)
[16:21:56] :)
[16:22:24] (CR) Fdans: [V: +2] Refactor test runner and fix tests stalling [analytics/aqs] - https://gerrit.wikimedia.org/r/476528 (https://phabricator.wikimedia.org/T209711) (owner: Fdans)
[16:23:32] ottomata: hmm, maybe? Where can i rsync to?
[16:23:35] Analytics, Analytics-EventLogging, EventBus, Core Platform Team Backlog (Watching / External), Services (watching): Modern Event Platform: Stream Intake Service: Implementation - https://phabricator.wikimedia.org/T206785 (Ottomata) @mobrovac can we use Diffusion with Kubernetes? I think ORES...
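On the "forest of decision trees" nuria asked about: in both LambdaMART and a random forest, each tree routes a document's feature vector from the root to a leaf, and the model output combines the leaf values (LambdaMART adds up its boosted trees; a random forest averages its trees). A minimal sketch of that evaluation and of ranking a candidate set with it; the node layout, feature indices and values are made up for illustration and are not the format stored in the Elasticsearch model store mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """Internal nodes split on one feature; leaves (no children) carry a value."""
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def eval_tree(node: Node, features: Sequence[float]) -> float:
    # Walk from the root to a leaf: left if feature <= threshold, else right.
    while not node.is_leaf():
        node = node.left if features[node.feature] <= node.threshold else node.right
    return node.value


def score(forest: List[Node], features: Sequence[float]) -> float:
    # Boosted ensembles such as LambdaMART sum the outputs of all trees.
    return sum(eval_tree(tree, features) for tree in forest)


def rank(forest: List[Node], docs: List[Sequence[float]]) -> List[int]:
    """Document indices ordered from highest to lowest model score."""
    return sorted(range(len(docs)), key=lambda i: score(forest, docs[i]), reverse=True)
```

Training (in analytics) only decides the splits and leaf values; evaluation (in prod) is just these root-to-leaf walks, repeated for every candidate document.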
[16:24:34] well, rsync from hdfs is not really a thing, but i'm looking for something hang on
[16:24:44] 10-100MB is not really that big though
[16:24:48] you might be able to just rsync out of /mnt/hdfs
[16:24:56] we do that for stuff we ship to dumps.wm.org
[16:26:07] ebernhardson: we could set up a rule that allows ES servers to access e.g. stat1007 or stat1004 rsync port and module
[16:26:20] and you could just run rsync on your ES servers to pull data out of /mnt/hdfs
[16:26:41] ( /mnt/hdfs isn't 100% reliable, but it works ok enough for occasional copies of smallish datasets)
[16:26:42] ottomata: that makes me pretty nervous bc it runs as root across all of /srv if not /home too I think?
[16:27:15] chasemp: ya that was something we (I) needed to check on ya?
[16:27:23] but, it is how we get data from hdfs to dumps.wm.org now
[16:27:40] chasemp: there needs to be some way of shipping data between machines in prod
[16:27:45] does dumps pull or push in that relationship?
[16:27:45] whatever you think is best :)
[16:27:50] dumps pulls
[16:28:02] real time security review :D
[16:29:12] I'm not the pope so I'm not trying to get up in your biz unnecessarily here but if rsync runs as root and is as broad as I think it is widening teh scope of access is troubled waters
[16:29:13] that's all :)
[16:29:43] I'd love if you could keep doing it chasemp, it is really appreciated :)
[16:29:55] I was just kidding :)
[16:30:10] I just wanted to clarify I'm not the pope for posterity really
[16:30:24] I may wear big hats but that's a coincidence
[16:30:46] milimetric: https://phabricator.wikimedia.org/T210844
[16:30:47] hahahah
[16:49:28] bmansurov: https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/analytics/refinery/job/spark_job.pp
[16:57:57] Analytics, Analytics-Wikistats: X-axis is at odds with stated period in header of trend charts for 'total articles' for a wiki - https://phabricator.wikimedia.org/T180118 (Dvorapa) @Erik_Zachte: Broken: - Czech - Hindi - Simple English - Vietnamese - Indonesian - Swedish - Spanish - Russian Problematic...
[17:00:18] Analytics, Analytics-Wikistats: X-axis is at odds with stated period in header of trend charts for 'total articles' for a wiki - https://phabricator.wikimedia.org/T180118 (Dvorapa) Some change since 2015 must break the plot generation as the plot from 2015 seems to be correct: https://commons.wikimedia.o...
[17:05:37] (CR) Milimetric: "verbal +1, this looks good to test, will have to start testing now that all three changes are generally coordinated" [analytics/refinery] - https://gerrit.wikimedia.org/r/476855 (https://phabricator.wikimedia.org/T210542) (owner: Joal)
[17:05:49] (CR) Milimetric: [C: -1] Update sqoop selects for new mediawiki schema (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/476100 (https://phabricator.wikimedia.org/T210541) (owner: Milimetric)
[17:07:42] Analytics, Research: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (Nuria) @bmansurov Let's take a step back and document how will data be created and loaded into mysql, deploying a service in production is quite a rigorous process. Having an...
[17:10:12] Analytics, Analytics-Kanban, Operations, ops-eqiad, and 2 others: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (Ottomata)
[17:15:21] Analytics, Research: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (Ottomata) > how is data produced (sounds like now this is pyspark, I think if we want to run this on cluster it probably needs to be migrated to scala spark, not sure) pyspa...
[17:17:55] Analytics, Analytics-EventLogging, EventBus, Core Platform Team Backlog (Watching / External), Services (watching): Modern Event Platform: Stream Intake Service: Implementation - https://phabricator.wikimedia.org/T206785 (Ottomata) I'm not familiar with how deploys with Kubernetes work, but I...
[17:23:14] Analytics, Analytics-Wikistats: X-axis is at odds with stated period in header of trend charts for 'total articles' for a wiki - https://phabricator.wikimedia.org/T180118 (Dvorapa) Per web.archive.org it seems it started to be broken between January 2016 and November 2017
[17:25:52] Analytics, Operations, Security-Team, WMF-Legal, Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (chasemp) Thanks @faidon for weighing in, I think you got right to the heart of it. Not responding to you necessarily but I'm going to steal...
[17:34:06] elukey: https://phabricator.wikimedia.org/T207194#4787093 i'm thikning we should do raid 10, not raid 0, eh?
[17:34:08] what do you think?
[17:36:40] Analytics, Operations, Security-Team, WMF-Legal, Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (faidon) >>! In T210667#4789588, @chasemp wrote: > In this case specifically, my thinking was that I had agreement and understanding with anoth...
[17:40:04] Analytics, Operations, Security-Team, WMF-Legal, Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (chasemp) >>! In T210667#4789604, @faidon wrote: >>>! In T210667#4789588, @chasemp wrote: >> In this case specifically, my thinking was that I...
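The RAID 10 versus RAID 0 choice is a trade of usable space for resilience. A back-of-the-envelope sketch: RAID 0 stripes across every drive (all raw capacity usable, but any single drive failure loses the array), while RAID 10 stripes across mirrored pairs (half the raw capacity; it always survives one failure, and up to one failure per pair as long as no pair loses both drives). The drive count and size in the example are hypothetical, not the actual cloudvirtan100[1-5] hardware.

```python
from typing import NamedTuple


class RaidLayout(NamedTuple):
    usable_tb: float
    min_failures_tolerated: int  # survivable no matter which drives fail
    max_failures_tolerated: int  # best case: failures land in different mirror pairs


def raid_layout(level: str, drives: int, drive_tb: float) -> RaidLayout:
    if level == "raid0":
        # Pure striping: every TB is usable, but losing any drive loses the array.
        return RaidLayout(drives * drive_tb, 0, 0)
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("raid10 needs an even number of drives, at least 4")
        pairs = drives // 2
        # Striping over mirrored pairs: half the raw capacity; the array survives
        # as long as no mirror pair loses both of its drives.
        return RaidLayout(pairs * drive_tb, 1, pairs)
    raise ValueError(f"unsupported level: {level}")
```

For a hypothetical 8 x 2 TB host, `raid_layout("raid10", 8, 2.0)` gives 8 TB usable against 16 TB for RAID 0: the price of not losing a whole node to one bad drive.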
[17:48:39] ottomata: from a quick look I'd say that raid10 is the less painful way, raid0 is really annoying
[17:49:27] ottomata: but it depends if we want to trade resilience for disk space
[17:49:49] i think in this case we can deal with this loss of space
[17:50:02] then I'd say raid10
[17:50:55] Analytics, Analytics-Kanban, Operations, ops-eqiad, and 2 others: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (Ottomata) Hm. They are cattle, but it would probably be nice if the whole node doesn't go down if we lose a drive, and we can deal wit...
[17:51:14] agree
[17:52:23] joal: re checkpoint - yes we do have it done by the standby regularly, IIRC once every hour by hadoop default's (sorry I just seen the message in the chat)
[17:52:26] milimetric: actually I'm gonna have diner soon, so I'll test later on tonight, probably with ou I guess
[17:53:02] joal: in my (labs) case the standby was trying to start the checkpoint via a HTTP call, failing due to firewall rules
[17:53:12] and the primary was not getting the new fs image
[17:53:27] so upon restart, it was using the last one and requesting a ton of edit logs from journalnodes
[17:53:39] (taking up to 15mins to exit safe mode)
[17:53:44] of course
[17:54:03] that's the point of having a regular checkpointer for HDFS: prevent the long restart
[17:54:24] yeah, but I didn't know that this was happening via HTTP (sharing the port with the UI...)
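On the checkpoint question: with HA, the standby NameNode periodically merges the edit log into a new fsimage and uploads it to the active NameNode over HTTP, which is why a firewalled HTTP port broke it. Stock Hadoop triggers a checkpoint when `dfs.namenode.checkpoint.period` seconds have passed (default 3600) or `dfs.namenode.checkpoint.txns` uncheckpointed transactions have piled up (default 1,000,000), checking every `dfs.namenode.checkpoint.check.period` seconds (default 60). The toy model below only illustrates that rule and why a failing upload leaves the active NameNode replaying a growing backlog of edits on restart; it checks once per simulated hour, uses made-up traffic numbers, and is not Hadoop code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CheckpointConfig:
    period_s: int = 3600     # dfs.namenode.checkpoint.period
    txns: int = 1_000_000    # dfs.namenode.checkpoint.txns


def checkpoint_due(cfg: CheckpointConfig, seconds_since_last: int, uncheckpointed_txns: int) -> bool:
    # Either threshold is enough to trigger a checkpoint on the standby.
    return seconds_since_last >= cfg.period_s or uncheckpointed_txns >= cfg.txns


def active_replay_backlog(hourly_txns: List[int], upload_ok: bool,
                          cfg: CheckpointConfig = CheckpointConfig()) -> int:
    """Toy hour-by-hour model: edits the active NameNode would replay on restart.

    The standby saves a new fsimage whenever a checkpoint is due; the active
    only benefits (shorter replay) if the HTTP upload of that image succeeds.
    """
    elapsed_s, standby_txns, backlog = 0, 0, 0
    for txns in hourly_txns:
        elapsed_s += 3600
        standby_txns += txns
        backlog += txns
        if checkpoint_due(cfg, elapsed_s, standby_txns):
            elapsed_s, standby_txns = 0, 0
            if upload_ok:
                backlog = 0
    return backlog
```

With uploads failing, the backlog grows with every edit since the last image the active received, matching the slow exit from safe mode seen in labs.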
[17:54:43] this is actually why there is a secondary-namenode if you don't have HA
[17:55:01] yep
[17:55:03] elukey: I didn't know either - I know the dynamic and the whys, not the detailed how
[17:55:23] Thanks for caring that as usual elukey <3
[17:55:27] <3
[17:55:49] brace yourself for when you'll have to review all the settings :P
[17:56:25] I'm gonna exercise my eyes reading some oozie ;)
[18:05:54] milimetric: pushing the change for private folder in your patch if you're ok
[18:08:42] joal: just in case I’m not back, my snapshots in my home folder are something like 2018_10_join_test and same for 09, you can see them in my home on 1004/sqoop-log
[18:09:09] ok milimetric - will do that once I'm back from diner - pushing the patch
[18:09:24] (PS3) Joal: Join to new actor and comment tables [analytics/refinery/source] - https://gerrit.wikimedia.org/r/476553 (https://phabricator.wikimedia.org/T210543) (owner: Milimetric)
[18:15:52] Analytics, Cloud-VPS, DBA, User-Elukey: Hardware for cloud db replicas for analytics usage - https://phabricator.wikimedia.org/T210749 (elukey) A possible solution, instead of ordering new hardware, would be to reuse one/two of the new Hadoop nodes racked in T207192 for this use case: they have 1...
[18:18:03] Analytics, DBA, Data-Services, User-Elukey: Hardware for cloud db replicas for analytics usage - https://phabricator.wikimedia.org/T210749 (bd808)
[18:24:03] Analytics, Operations, Security-Team, WMF-Legal, Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (elukey) Reading the backlog only now, this was good learning lesson for me too (I was aware of what Chase did as mentioned, and didn't think t...
[18:24:33] Analytics, DBA, Data-Services, User-Elukey: Hardware for cloud db replicas for analytics usage - https://phabricator.wikimedia.org/T210749 (Marostegui) >>! In T210749#4789691, @elukey wrote: > A possible solution, instead of ordering new hardware, would be to reuse one/two of the new Hadoop nodes...
[18:28:42] Analytics, DBA, Data-Services, User-Elukey: Hardware for cloud db replicas for analytics usage - https://phabricator.wikimedia.org/T210749 (elukey) >>! In T210749#4789731, @Marostegui wrote: >>>! In T210749#4789691, @elukey wrote: >> A possible solution, instead of ordering new hardware, would be...
[18:30:26] * elukey off!
[18:30:58] Analytics, DBA, Data-Services, User-Elukey: Hardware for cloud db replicas for analytics usage - https://phabricator.wikimedia.org/T210749 (Marostegui) 128GB + non SSDs disks will definitely not work for heavy queries :-(
[18:35:38] (CR) Nuria: Bump hadoop,hive and spark dependency versions (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/476866 (owner: Joal)
[18:39:20] (CR) Nuria: "Looks good." [analytics/refinery] - https://gerrit.wikimedia.org/r/476855 (https://phabricator.wikimedia.org/T210542) (owner: Joal)
[18:44:58] (CR) Nuria: "+1 to joseph's comments to make explicit what is not being used." [analytics/refinery] - https://gerrit.wikimedia.org/r/476220 (https://phabricator.wikimedia.org/T167539) (owner: Fdans)
[19:14:34] Hey milimetric - Starting the tests :)
[19:14:47] or more precisely - Starting to setup the tests :)
[19:54:09] nuria: on your comment on my patch for version numbers - Do you wish me to update the commit message or to add a comment in the code ?
[19:54:26] Analytics, Operations, Security-Team, WMF-Legal, Software-Licensing: Can exfat be used in WMF production? - https://phabricator.wikimedia.org/T210667 (Legoktm) >>! In T210667#4789103, @faidon wrote: > So I think this task raises a few different issues (and @legoktm correct me if I'm wrong): >...
[20:00:23] (PS4) Joal: Join to new actor and comment tables [analytics/refinery/source] - https://gerrit.wikimedia.org/r/476553 (https://phabricator.wikimedia.org/T210543) (owner: Milimetric)
[20:02:29] joal: I'm back now, catching up, thanks for the patch 3, that's where I was thinking when you mentioned it in standup
[20:03:05] np milimetric - currently building the jar after having copied the data in my folder and reogranised a bit to match production style
[20:03:17] milimetric: Like if I break stuff, data is till there :)
[20:03:32] psh, I'm not worried, sqoops fast for just those 3 wikis
[20:03:41] ok, can I help?
[20:04:11] milimetric: I'm gonna launch a manual job once the jar is created - If it succeeds, I'll test the oozie job
[20:04:23] if not, we'll see :)
[20:04:47] joal, qq: when you add druid transforms to a druid spec, you still have to add the resulting field names to the dimension and metric specs?
[20:05:22] mforns: depending on what you transform, you can use the field in dimensions a
[20:05:33] and in metric-aggregations yes :)
[20:06:03] but, do you need to add the transformed field name to the list of dimensions and or metrics?
[20:06:10] mforns: transform acts as an input-modifier to the json, it creates a new json row to ingest with for instance new fields you can use
[20:06:22] or the sole fact of adding it as a transform will add it as a member of the datasource?
[20:06:44] you need to use it somewhere - druid wouldn't now what to do with it b default - only YOU know :)
[20:06:58] right, so yes, need to specify somewhere, cooool, thanks!
[20:07:57] milimetric: does the plan sound correct to you?
[20:08:23] yes! was keeping quite while you talked to marcel
[20:08:30] :)
[20:08:54] joal: I'm not sure what I can do in the meantime... maybe I can run the other partition?
[20:09:01] are you doing 09 or 10?
[20:11:10] milimetric: I'll start with 09
[20:11:31] cool, and I can do 10 with the same jar, then?
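joal's answer about Druid transforms, sketched: an `expression` transform in `transformSpec` only adds a new field to each input row at ingestion time; to end up in the datasource it still has to be listed in `dimensionsSpec` or be the `fieldName` of a metric. The snippet builds a minimal `dataSchema` fragment as a Python dict and reports transforms nothing uses. Assumptions: the datasource, field and expression names are hypothetical; the fragment omits the parser, timestamp and granularity sections a real spec needs (older Druid versions also nest `dimensionsSpec` under the parser's `parseSpec`); and the check assumes an explicit dimension list rather than schemaless discovery.

```python
from typing import Any, Dict, List, Union

Dimension = Union[str, Dict[str, str]]


def data_schema(transforms: List[Dict[str, str]], dimensions: List[Dimension],
                metrics: List[Dict[str, str]]) -> Dict[str, Any]:
    """Minimal dataSchema fragment (no parser, timestampSpec or granularitySpec)."""
    return {
        "dataSource": "example_events",  # hypothetical datasource name
        "dimensionsSpec": {"dimensions": dimensions},
        "metricsSpec": metrics,
        "transformSpec": {"transforms": transforms},
    }


def unused_transforms(schema: Dict[str, Any]) -> List[str]:
    """Transform outputs that no dimension or metric refers to."""
    # Dimensions may be plain names or objects such as {"type": "long", "name": ...}.
    dims = {d if isinstance(d, str) else d["name"]
            for d in schema["dimensionsSpec"]["dimensions"]}
    metric_inputs = {m["fieldName"] for m in schema["metricsSpec"] if "fieldName" in m}
    used = dims | metric_inputs
    return [t["name"] for t in schema["transformSpec"]["transforms"] if t["name"] not in used]
```

Running `unused_transforms` on a spec before submitting it catches the "added a transform but never used it" case mforns asked about.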
[20:12:08] if you want :) I spotted a bug, checking if we can bypass it
[20:14:39] We can bypass the bug by specifying explicit parameters (I forgot to change the 1-letter option for the new rivate parameter)
[20:16:01] Ah - another small glitch milimetric: let's use - and not _ in snapshot names ;)
[20:16:34] oh, right, bad date format, sorry
[20:16:44] I did that last time too and forgot
[20:16:55] np milimetric - patching the folder names :)
[20:17:05] I'll change them on my copy
[20:18:53] milimetric: here the command I used - https://gist.github.com/jobar/2060af82cf0e1889778744f14524ec1a
[20:19:16] milimetric: failure in user due to sql-name ambiguity
[20:19:35] ?
[20:19:48] sql-name?
[20:19:50] joining between tables both containing wiki_db field
[20:20:00] oh! but I prefixed them, no?
[20:20:07] not everywhere :)
[20:20:10] apparently
[20:29:05] (PS5) Joal: Join to new actor and comment tables [analytics/refinery/source] - https://gerrit.wikimedia.org/r/476553 (https://phabricator.wikimedia.org/T210543) (owner: Milimetric)
[20:29:11] Analytics: Investigate lowering "per-article" resolution data in AQS - https://phabricator.wikimedia.org/T144837 (jeblad) There are several cases where a daily article resolution for pageviews could make sense, or even hourly or per minute. This is not so much for the usual article, but for special marker ar...
[20:39:25] (PS6) Joal: Join to new actor and comment tables [analytics/refinery/source] - https://gerrit.wikimedia.org/r/476553 (https://phabricator.wikimedia.org/T210543) (owner: Milimetric)
[20:40:21] joal spark_sql needs the "as" keyword?
[20:40:24] that's weird...
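The `wiki_db` ambiguity above is the kind of mistake a tiny local query test catches early, and milimetric floats "some sqlite unit test for those queries" just below. A minimal sketch with Python's built-in `sqlite3`: load a few rows into in-memory tables and run the query, so an unprefixed `wiki_db` in a join fails immediately. The tables and columns only loosely mirror the sqooped MediaWiki `actor`/`revision` tables and are hypothetical. SQLite's dialect is not Spark SQL's; it is more permissive in places (for example it accepts bare non-grouped columns in aggregate queries), so this catches reference errors like ambiguous columns rather than proving the production query correct.

```python
import sqlite3

SCHEMA = """
CREATE TABLE actor (wiki_db TEXT, actor_id INTEGER, actor_name TEXT);
CREATE TABLE revision (wiki_db TEXT, rev_id INTEGER, rev_actor INTEGER);
INSERT INTO actor VALUES ('enwiki', 1, 'Alice'), ('frwiki', 1, 'Bob');
INSERT INTO revision VALUES ('enwiki', 10, 1), ('frwiki', 11, 1);
"""

# Bug: both tables have wiki_db, and the SELECT list does not say which one.
AMBIGUOUS = """
SELECT wiki_db, rev_id, actor_name
FROM revision JOIN actor
  ON rev_actor = actor_id AND revision.wiki_db = actor.wiki_db
"""

# Fix: prefix every column with its table alias.
FIXED = """
SELECT r.wiki_db AS wiki_db, r.rev_id AS rev_id, a.actor_name AS actor_name
FROM revision r
JOIN actor a ON r.rev_actor = a.actor_id AND r.wiki_db = a.wiki_db
ORDER BY r.rev_id
"""


def run(query: str):
    """Run a query against a fresh in-memory fixture database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

Joining on `wiki_db` as well as the id also matters: both wikis use `actor_id` 1, and without the `wiki_db` condition each revision would match two actors.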
[20:41:07] hive doesn't: "select 1 blue" and "select 1 as blue"
[20:41:50] milimetric: I'm really not sure if it needs it or not - I always put it for fields
[20:42:01] oh ok :) thought you had another error
[20:42:21] milimetric: there have been, but for something different :)
[20:42:25] following along and trying to spot errors, but you're finding them faster
[20:42:42] a forgotten update in a group by
[20:42:47] very tricky
[20:42:55] yeah, noticed that, the coalesce and the table prefix
[20:43:10] well, I usually fix these things up on my own, but you're faster
[20:43:56] it'd be nice if there was some sqlite unit test for those queries
[20:44:06] Yeah !
[21:00:58] joal: maybe both?
[21:01:10] feasible :)
[21:03:51] Analytics, Product-Analytics: Bug: can't make a YoY time series chart in Superset - https://phabricator.wikimedia.org/T210687 (Nuria) Errors in log: 12]: Traceback (most recent call last): Nov 30 07:23:12 analytics-tool1003 superset[7912]: File "/srv/deployment/analytics/superset/venv/lib/python3.5/si...
[21:04:28] milimetric: I have an issue with data format :(
[21:04:44] joal: can I help more real-time? cave?
[21:04:58] milimetric: OMW
[21:10:44] wow joal check this one out:
[21:10:45] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/temporal_tables.html
[21:10:56] could use that for things like page title history!
[21:11:21] WAT?
[21:11:42] * joal will spend the weekend trying to understand how this thing works
[21:13:04] wow joal and with https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/joins.html#join-with-a-temporal-table
[21:14:16] could use a revision's create time directly to get the page title
[21:14:40] select page_title from page_title_history(revision.rev_timestamp)
[21:14:59] "Each record from the probe side will be joined with the version of the build side table at the time of the correlated time attribute of the probe side record."
[21:19:27] That's super awesome ottomata :)
[21:19:48] ottomata: I've loved flink for a long time, I hope one of these days we'll work with it :)
[21:21:05] (PS5) Milimetric: Update sqoop selects for new mediawiki schema [analytics/refinery] - https://gerrit.wikimedia.org/r/476100 (https://phabricator.wikimedia.org/T210541)
[21:21:29] Analytics, WMDE-Analytics-Engineering, Wikidata, Patch-For-Review, and 2 others: track number of editors from other Wikimedia projects who also edit on Wikidata over time - https://phabricator.wikimedia.org/T193641 (JAllemandou) Info backfilled since beggining of time: https://grafana.wikimedia.o...
[21:32:03] (PS6) Milimetric: Update sqoop selects for new mediawiki schema [analytics/refinery] - https://gerrit.wikimedia.org/r/476100 (https://phabricator.wikimedia.org/T210541)
[21:32:25] more for you joal: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/match_recognize.html and https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html
[21:32:40] api for detecting patterns in event streams
[21:32:52] with SQL TOO!
[21:33:00] the sql syntax is weird, but it's basically
[21:33:20] select from stream where there is a sequence of events like pattern X
[21:33:51] ottomata: this could be extremely useful for edit-wars, and possibly anti-harassment
[21:34:03] git st
[21:34:05] oops
[21:34:59] Analytics: Investigate lowering "per-article" resolution data in AQS - https://phabricator.wikimedia.org/T144837 (jeblad) Another pretty kewl thing to do is to calculate which articles are trending in the morning. Because the whole pageview-mix is pretty noisy, you must first try to create a model for how th...
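The quoted Flink sentence describes an as-of (point-in-time) lookup: each revision is matched with the page title that was valid at its `rev_timestamp`. This plain-Python sketch illustrates those semantics only; it is not Flink's temporal-table API, and the `history` data, the integer timestamps and the `title_as_of` helper are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TitleVersion:
    valid_from: int  # hypothetical integer timestamp from which this title applies
    title: str


# Hypothetical page title history, one list per page_id, sorted by valid_from.
history = {
    42: [TitleVersion(100, "Old_title"), TitleVersion(200, "New_title")],
}


def title_as_of(page_id: int, ts: int):
    """Return the title in effect at ts, or None if the page had no title yet."""
    versions = history.get(page_id, [])
    starts = [v.valid_from for v in versions]
    i = bisect_right(starts, ts)  # number of versions that started at or before ts
    return versions[i - 1].title if i > 0 else None


# Probe side: (rev_id, page_id, rev_timestamp), also hypothetical.
revisions = [(1, 42, 150), (2, 42, 200), (3, 42, 50)]
for rev_id, page_id, rev_ts in revisions:
    print(rev_id, title_as_of(page_id, rev_ts))
# 1 Old_title
# 2 New_title
# 3 None
```

Each version is treated as valid from its own `valid_from` until the next version starts, so a revision made exactly at a rename already sees the new title.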
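To make "select from stream where there is a sequence of events like pattern X" concrete, here is a batch sketch in plain Python, not Flink CEP or MATCH_RECOGNIZE: it partitions events by page, orders them by timestamp and reports runs of at least `min_run` consecutive reverts, a crude and hypothetical proxy for an edit war. The event shape, the threshold and the `revert_runs` helper are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (timestamp, page, user, is_revert).
Event = Tuple[int, str, str, bool]


def revert_runs(events: Iterable[Event], min_run: int = 3) -> List[Tuple[str, List[int]]]:
    """Return (page, timestamps) for each maximal run of >= min_run consecutive reverts.

    Events are ordered by timestamp and partitioned by page, loosely mirroring the
    ORDER BY / PARTITION BY / PATTERN clauses of a MATCH_RECOGNIZE query.
    """
    open_runs: Dict[str, List[int]] = defaultdict(list)  # page -> ongoing revert run
    matches: List[Tuple[str, List[int]]] = []
    for ts, page, _user, is_revert in sorted(events):
        if is_revert:
            open_runs[page].append(ts)
            continue
        # A non-revert edit on the page closes whatever run was open there.
        if len(open_runs[page]) >= min_run:
            matches.append((page, open_runs[page]))
        open_runs[page] = []
    # Batch sketch only: runs still open when the input ends are flushed here.
    for page, run in open_runs.items():
        if len(run) >= min_run:
            matches.append((page, run))
    return matches


events = [
    (1, "A", "u1", False),
    (2, "A", "u2", True),
    (3, "B", "u3", True),
    (4, "A", "u1", True),
    (5, "A", "u2", True),
    (6, "A", "u3", False),
    (7, "B", "u4", True),
]
print(revert_runs(events))  # [('A', [2, 4, 5])]
```

A streaming version would additionally need to bound runs in time and cope with late or out-of-order events; the sketch ignores both by sorting a finite list.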
[21:35:16] (PS2) Joal: Bump hadoop,hive and spark dependency versions [analytics/refinery/source] - https://gerrit.wikimedia.org/r/476866
[21:36:53] nuria --^
[21:37:31] joal: looking
[21:45:00] joal: running tests and mvn is downloading the whole world, one sec
[21:45:15] nuria: sure no prob :)
[21:48:45] joal: tests run fine and change looks good, now since it changes the java classpath (right?) we probably want to be looking at this when we deploy
[21:49:27] nuria: it changes the dependencies bundled in the jars
[21:49:33] nuria: no classpath change
[21:49:58] joal: yes, right, deps in fat jar totally my mistake
[21:50:04] no prob :)
[21:50:12] we should still watch :)
[21:50:29] (CR) Nuria: [C: 2] Bump hadoop,hive and spark dependency versions [analytics/refinery/source] - https://gerrit.wikimedia.org/r/476866 (owner: Joal)
[21:51:15] (CR) Nuria: [V: 2 C: 2] Bump hadoop,hive and spark dependency versions [analytics/refinery/source] - https://gerrit.wikimedia.org/r/476866 (owner: Joal)
[21:55:50] (Merged) jenkins-bot: Bump hadoop,hive and spark dependency versions [analytics/refinery/source] - https://gerrit.wikimedia.org/r/476866 (owner: Joal)
[22:05:10] milimetric: user and page successful, still an issue with revision
[22:05:31] just saw it finish, yeah
[22:06:05] ah, it's ok, I've got the 09 data now too, can run it myself and see
[22:10:44] heh, it really doesn't like two people working together: Permission denied: user=milimetric, access=WRITE, inode="/tmp/mediawiki/history/checkpoints":
[22:11:08] Ah !
[22:11:44] You can add this parameter: --temporary-path /....
[22:11:48] milimetric: --^
[22:11:52] sorry for that :(
[22:12:06] no! don't worry about this, was just mentioning 'cause I thought it was funny
[22:12:11] I would've chmoded it
[22:12:11] :)
[22:13:28] (PS7) Joal: Join to new actor and comment tables [analytics/refinery/source] - https://gerrit.wikimedia.org/r/476553 (https://phabricator.wikimedia.org/T210543) (owner: Milimetric)
[22:30:20] milimetric: The last patch allows the job to finish :) - Gone now, will check data on Monday ;)