[06:39:16] Hi a-team
[06:46:40] Analytics-Kanban: Productionize scala code for edit reconstruction - https://phabricator.wikimedia.org/T142552#2539061 (JAllemandou)
[06:51:22] o/
[06:52:19] elukey: No perf change so far on old-aqs (new user related thinking)
[07:00:09] yes sadly, I hoped for a bit more but the auth caching stole it all :)
[07:00:27] also, good news, nothing exploded so far
[07:00:39] :)
[07:00:59] elukey: That was expected, but yes, it's good :)
[07:01:11] elukey: Thanks mate for having made that change very fast
[07:01:17] and securely :)
[07:01:48] well thanks to both, it was a collaborative fix :)
[07:02:13] I am curious now about compaction with RAID10 arrays
[07:27:24] * elukey commutes to the office!
[08:29:26] Analytics-Cluster: Install SparkR on the cluster - https://phabricator.wikimedia.org/T102485#2539188 (elukey) Some links: Discussion about why SparkR is not included in Cloudera CDH 5.5 distro (we are using it atm): https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/SparkR-in-CDH-5-5/td-p/346...
[08:36:29] Analytics, Analytics-Cluster: Monitor cluster running out of HEAP space with Icinga - https://phabricator.wikimedia.org/T88640#2539194 (elukey) Note: we have created https://grafana.wikimedia.org/dashboard/db/analytics-hadoop to monitor all the GC/Heap metrics of the Hadoop cluster actors, but we are sti...
[08:46:17] Analytics, Operations, Traffic: Correct cache_status field on webrequest dataset - https://phabricator.wikimedia.org/T142410#2539220 (elukey) @BBlack thanks a lot! I checked https://grafana.wikimedia.org/dashboard/db/varnishkafka and everything looks good, plus the cache_status field looks sane from...
[11:20:39] * elukey lunch!
[11:23:27] /me is AFK for a while
[12:47:42] Analytics, Operations, Ops-Access-Requests: Add analytics team members to group aqs-admins to be able to deploy pageview APi - https://phabricator.wikimedia.org/T142101#2522728 (akosiaris) aqs-users was introduced in T117473.
Seems to have been created with something different in mind. I'll rather c...
[13:41:30] (PS3) Milimetric: [WIP] Oozify sqoop import of mediawiki tables [analytics/refinery] - https://gerrit.wikimedia.org/r/303339 (https://phabricator.wikimedia.org/T141476)
[13:51:44] elukey: It's the first month, so difficult to compare to previous runs, but when it kicked in, compaction went fast (almost done)
[13:52:52] joal: so overall ~2 days for a month?
[13:53:51] elukey: I'd say 2.5
[13:55:45] joal: the only thing left is to make sure that we run load tests
[13:56:14] we could do the first one this week and the rest when we'll be back from vacations?
[13:56:18] elukey: I suggest: 1 load test with 3 months loaded, then 6 months loaded, then 12 months loaded
[13:56:26] +1
[13:56:56] not sure if we'll be able to load test 3 months before going on vacation though
[13:57:46] elukey: pretty sure not actually
[14:02:01] maybe with two months it could be
[14:02:50] Yeah, we'll do with what we have
[14:21:13] Hi milimetric
[14:21:19] hi joal
[14:21:31] milimetric: I found that: http://stackoverflow.com/questions/19646645/how-can-i-provide-password-to-sqoop-through-oozie-to-connect-to-ms-sql
[14:21:48] milimetric: Would that be a correct answer to our problem?
[14:22:41] joal: my code already uses an options file for the import
[14:22:49] I tried a password file but that didn't work for some reason
[14:23:01] ok milimetric
[14:23:11] (it's supposed to look for it either locally or on a hadoop node and it refused to pick it up from either place)
[14:23:22] but either way we need a secret repo from which to pull the password file
[14:23:29] milimetric: There is a way to overcome that IIRC
[14:23:43] or some way to read it from the one where puppet already writes it to (we can just do it with shell I think)
[14:23:48] Analytics-Kanban, EventBus, Services, User-mobrovac: Improve schema update process on EventBus production instance - https://phabricator.wikimedia.org/T140870#2539875 (mobrovac)
[14:23:51] milimetric: The oozie action launching sqoop is a shell-action?
[14:24:16] yeah, because it's the python that runs the wikis in parallel
[14:24:44] milimetric: in that case you can provide a file to the script using the option
[14:24:45] I'm pretty lost in oozie world right now :) but it's ok
[14:24:56] right, but the file can't be checked into refinery
[14:25:16] milimetric: not in refinery, possibly on hdfs nonetheless?
[14:25:30] milimetric: The file just needs to be in hdfs I think
[14:26:08] mmm I don't know
[14:26:17] that's supposed to work for the passwords file, maybe I can keep trying
[14:26:36] hm ...
not following you
[14:26:37] it definitely seemed buggy last time I tried it, but maybe I was doing something wrong
[14:26:44] there are two files you can pass
[14:26:53] one is --options-file and that works, but I think the file has to be local
[14:26:59] milimetric: I managed to have a file passed to a shell script in oozie - let me know if you need that :)
[14:27:02] and one is --password-file and that didn't work for me
[14:27:14] no, this is a sqoop thing
[14:27:31] anyway, thx, I'll just keep working on it
[14:27:35] milimetric: ah, ok - the way oozie does it is providing a local file to the script it runs
[14:28:39] milimetric: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/util/druid/load/workflow.xml#L106
[14:33:30] milimetric: other question: Have you wiped milimetric.namespace_mapping data?
[14:33:45] no
[14:33:54] joal: I do have a fresh file with the dbnames
[14:34:14] and I have fresh data for over 600 wikis
[14:34:23] milimetric: awesome :)
[14:34:30] but some of the imports failed because of course the database schemas aren't all the same :/
[14:34:51] so I just got done double checking everything and removing the fields that aren't always present
[14:35:14] joal: wait but why, you don't see the namespace_mapping file?
[14:35:36] nah it's there
[14:35:36] milimetric: actually I do, I must have an issue in my query or something
[14:35:40] milimetric: will double check
[14:35:56] joal: want the new file instead since you're changing it anyway? I didn't wanna disrupt your flow by changing it
[14:36:05] milimetric: Oh yeah !
[14:39:24] joal: ok, milimetric.namespace_db_mapping has dbname as a column
[14:39:50] joal: it also only has dbs that are not private or closed, so it doesn't have all of them
[14:39:52] milimetric: I'd rather use a file (or a folder) directly, instead of a hive table
[14:40:02] (we should at some point chat about the closed wikis)
[14:40:11] milimetric: agreed !
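The `--options-file` / `--password-file` distinction discussed above can be sketched as follows. This is only an illustration: the JDBC connect string, username, table, and all paths are made up, and the script just builds and prints the sqoop invocation instead of running it. The key point is that `--options-file` is read from the local working directory of the launching process, while `--password-file` is resolved through the Hadoop FileSystem API, so from an Oozie shell action it would normally be an `hdfs://` URI readable only by the job user.

```shell
# Sketch only: hypothetical host, DB, table, and paths; nothing here touches
# a real cluster.

# --options-file keeps connection flags off the command line. Format is one
# flag or value per line; sqoop reads it as a local file.
cat > sqoop-options.txt <<'EOF'
--connect
jdbc:mysql://db-host.example/enwiki
--username
research
EOF

# --password-file goes through the Hadoop FileSystem API, hence the hdfs://
# URI; the file itself should be readable only by the submitting user.
PASSWORD_FILE='hdfs:///user/analytics/mysql-password.txt'

# Build and display the invocation rather than executing sqoop.
CMD="sqoop import --options-file sqoop-options.txt --password-file $PASSWORD_FILE --table revision --target-dir /wmf/data/raw/mediawiki/revision"
echo "$CMD"
```

The options file can live in the refinery repo since it holds no secret; only the password file needs a restricted location (HDFS, or the path puppet already writes to).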
[14:40:15] joal: oh, np, mediawiki/namespace_db_mapping/ is where the dictionary is
[14:40:22] uh one sec
[14:40:46] full: /user/milimetric/mediawiki/namespace_db_mapping/namespace.dictionary.csv
[14:41:23] milimetric: super thanks milimetric :)
[15:01:22] sorry team, joining
[15:02:59] milimetric: standup?
[15:03:01] milimetric: hiii standup
[15:03:04] aah!
[15:05:46] Analytics, Collaboration-Team-Triage, Community-Tech, Editing-Analysis, and 6 others: statistics about edit conflicts according to page type - https://phabricator.wikimedia.org/T139019#2539972 (Lea_WMDE) fyi: @addshore just updated https://grafana.wikimedia.org/dashboard/db/mediawiki-edit-conflic...
[15:19:04] elukey: ja and if you get a sec, a quick review of this would be useful: https://gerrit.wikimedia.org/r/#/c/304029/
[15:20:25] sure!
[15:41:19] !log Loading 2016-07 in new aqs
[15:41:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[15:54:54] GOGOGO! :)
[15:55:36] joal: tomorrow would you mind teaching me how to load? We could do something like the "how it's made" TV series
[15:57:16] elukey: I don't know the series, but I can definitely make something up :)
[15:57:33] Ahhhh, the internet TV series !
[15:57:39] elukey: I remember :)
[15:58:34] http://www.sciencechannel.com/tv-shows/how-its-made/
[15:58:35] :D
[16:13:03] mforns: I got deeper in my understanding of the maven issue :) Thanks again for having listened to me :)
[16:13:20] joal, rubber ducking :]
[16:13:26] ottomata: Do you mind a brainstorm on maven and shaded jars ?
[16:13:30] mforns: yop :)
[16:15:06] hola! this might be a long answer to a short question, but, is there a task about "the analytics cluster in beta cluster is hard to maintain and thus isn't maintained"? Or something else that explains the issues?
[16:15:13] I'm here because of https://phabricator.wikimedia.org/T129602#2539028
[16:16:18] joal: ja gimme a few..
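For the namespace dictionary file mentioned above: the mechanism joal points at (a `<file>` element in an Oozie shell action, as in the linked druid load workflow) makes Oozie copy the HDFS file into the action's working directory, so the script then reads it by its bare name with no `hdfs://` URI. A tiny simulation of that localization, with invented CSV rows standing in for the real contents of namespace.dictionary.csv:

```shell
# Simulate Oozie <file> localization: a file listed in the workflow's <file>
# element lands in the shell action's cwd under its bare name. The rows below
# are made-up stand-ins; the real namespace.dictionary.csv may differ.
printf 'enwiki,0,\nenwiki,1,Talk\nfrwiki,0,\n' > namespace.dictionary.csv

# The script consumes it as a plain local file, exactly as the oozified
# sqoop wrapper would.
count=$(grep -c '' namespace.dictionary.csv)
echo "rows: $count"
```

This is why the file only "needs to be in hdfs": Oozie handles the copy to the local filesystem before the script runs.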
[16:27:20] madhuvishy: ^^ since you were the one who said it :) Just curious as to why/what we can do to make it better.
[16:31:46] ottomata: LGTM, thanks for the explanations :)
[16:32:02] k cool thanks for the review!
[16:32:34] sorry for the extra questions, puppet is still a mystery to me :)
[16:34:35] joal: batcave?
[16:34:41] ottomata: OMW !
[16:36:12] Analytics, Cassandra, Discovery, Maps, and 2 others: Investigate and implement possible simplification of Cassandra Logstash filtering - https://phabricator.wikimedia.org/T130861#2540345 (Eevans) I apologize this has sat for so long, it keeps sliding down the list of priorities. I will try to ge...
[16:38:30] going afk team!
[16:38:32] byeeee o/
[16:49:43] oh man it's gonna pour, heading home fast!
[17:16:32] I'm done for today ! Have a good end of day a-team !
[17:16:39] nite joal
[17:16:44] bye joal!
[17:17:04] laters!
[19:38:42] bye a-team, have a good night :]
[19:38:50] have a good one mforns
[19:39:36] byyye
[19:42:48] (PS17) Mforns: [WIP] Refactor Mediawiki History scala code [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301837 (https://phabricator.wikimedia.org/T141548) (owner: Joal)
[20:47:19] (PS4) Milimetric: [WIP] Oozify sqoop import of mediawiki tables [analytics/refinery] - https://gerrit.wikimedia.org/r/303339 (https://phabricator.wikimedia.org/T141476)
[22:06:25] Analytics, Fundraising-Analysis: FR tech hadoop onboarding {flea} - https://phabricator.wikimedia.org/T118613#2541966 (atgo) a:atgo>None
[23:58:12] milimetric: hey does this still work ? https://www.npmjs.com/package/passport-mediawiki-oauth
[23:59:31] it should, jdlrobson, but we aren't using it actively so I'm not sure. Feel free to file bugs