[04:12:31] 10Analytics, 10Operations, 10netops: Replace eventlog1001's IP with eventlog1002's in analytics-in4 - https://phabricator.wikimedia.org/T189408#4041596 (10ayounsi) a:03ayounsi 1st change applied. Waiting for confirmation for the 2nd. [05:37:16] 10Analytics, 10Operations, 10Traffic, 10HTTPS: Update documentation for "https" field in X-Analytics - https://phabricator.wikimedia.org/T188807#4041656 (10Tbayer) [05:40:34] 10Analytics, 10Operations, 10Traffic, 10HTTPS: Update documentation for "https" field in X-Analytics - https://phabricator.wikimedia.org/T188807#4041659 (10Tbayer) @BBlack Thanks again! Back to the task at hand: I have tentatively updated the documentation based on my understanding of your remarks: https:/... [07:07:18] morning people! [07:07:19] https://grafana.wikimedia.org/dashboard/db/prometheus-druid?refresh=1m&orgId=1&from=now-12h&to=now&panelId=42&fullscreen&var-datasource=eqiad%20prometheus%2Fanalytics&var-cluster=druid_public&var-druid_datasource=All [07:08:02] we have been seeing a ton of reqs to druid public brokers from ~4:30 AM [07:20:32] from the logs it seems that somebody is using the aqs edit api for big chunks of history [07:51:19] added the "edited-pages" graph to aqs-elukey [07:51:21] https://grafana.wikimedia.org/dashboard/db/aqs-elukey?orgId=1&panelId=21&fullscreen [07:51:48] so we can see spikes in those metrics too [07:52:12] top by edits 2xx seems to be what changed at 4:30 am [07:58:11] now one thing that I've never done is tracking the external IP making requests [07:58:22] of course on druid it tells me aqs, and on aqs restbase [08:04:29] Hi elukey [08:04:57] elukey: We've not experienced this kind of traffic yet - this is interesting [08:05:08] hey joal [08:05:26] very nice to see how cache grows for brokers and then eviction kicks in [08:06:32] yup [08:06:48] elukey: no error from aqs not druid for tongiht traffic? [08:10:05] not that I can see [08:10:22] the last stream of aqs errors on icinga was yesterday at around midnight utc [08:12:01] elukey: I can't even see it [08:14:02] 10Analytics: Upload XML dumps to hdfs - https://phabricator.wikimedia.org/T186559#4041789 (10JAllemandou) @diego : Thanks :) I push for productionization to be one of our priorities, but there is a fight for spots in the prioritized items ;) [08:15:09] joal: should be https://grafana.wikimedia.org/dashboard/db/prometheus-druid?refresh=1m&orgId=1&from=1520719048809&to=1520736408635&panelId=42&fullscreen&var-datasource=eqiad%20prometheus%2Fanalytics&var-cluster=druid_public&var-druid_datasource=All [08:15:36] Oh right, it was on the 10th [08:15:56] sorry didn't get the date correct [08:17:02] elukey: from log size, the 10th have been a big day for druid [08:18:14] weird that I can't see it from the aqs graph [08:18:28] elukey: I have the same wonder [08:18:41] elukey: It think it might be related to reindex-day [08:19:47] so on the 10th ~22:40 there was a big spike for aqs [08:19:48] https://grafana.wikimedia.org/dashboard/db/aqs-elukey?orgId=1&from=1520717902632&to=1520725321361&panelId=21&fullscreen [08:20:45] I am surely missing some metrics in this graph [08:24:28] (there is also an issue with cache text in esams now, lovely morning) [08:24:52] Yay - aqs can wasit elukey [08:25:32] a lot of people are checking it, so I can watch both :) [08:25:54] atm I am trying to think if there is a way to see who is requesting data to restbase -> aqs -> druid [08:26:44] elukey: check webrequest ! 
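The exchange above is about finding which client is hammering the edited-pages endpoint through RESTBase -> AQS -> Druid. A hedged sketch of the kind of lookup joal is asked to run against wmf.webrequest, written here for a pyspark2 shell; the column and partition names are assumed from the standard webrequest schema and the hour range is illustrative, not the exact query that was run:
```
# Sketch: which client IPs are hitting the top-by-edits endpoint?
top_clients = spark.sql("""
    SELECT ip, COUNT(*) AS requests
    FROM wmf.webrequest
    WHERE webrequest_source = 'text'
      AND year = 2018 AND month = 3 AND day = 12
      AND hour BETWEEN 5 AND 8
      AND uri_path LIKE '/api/rest_v1/metrics/edited-pages/top-by-edits%'
    GROUP BY ip
    ORDER BY requests DESC
    LIMIT 20
""")
top_clients.show(20, truncate=False)
```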
[08:27:47] it might be our best bet yes [08:28:34] joal: can you run a query (you'll be faster and more precise than me) to figure out who's requesting top by edits ? [08:29:37] elukey: Yes sir- time interval? [08:30:10] basically anytime between 4:31 UTC to now [08:30:18] k [08:30:45] also on oxygen there should be sampled logs [08:33:21] joal: how does the path for these api reqs looks like in a webrequest? [08:34:25] elukey: hive seems to have an issue [08:35:22] elukey: sorry, usual PBCAK [08:35:31] elukey: will have data soon [08:35:57] ahhahaah [08:36:25] joal: whenever you have time let me know the path that your are looking for so I can also check the samples webreq logs [08:36:33] as they come in to oxygen [08:36:39] elukey: /api/rest_v1/metrics/edited-pages/top-by-edits [08:36:48] thanks [08:37:12] elukey: hive gives me no result for hour 6 [08:37:13] hm [08:37:44] tailing from oxygen doesn't return anything too [08:38:47] elukey: Given the sampling, I'm not too much surprised [08:39:05] elukey: However I did another PBCAK ... Maybe I'll have data at some point [08:39:25] * joal will have some more coffee [08:39:45] elukey: I haz dataz [08:39:51] Single IP [08:50:47] !log fixed evenglog1002's ipv6 (https://gerrit.wikimedia.org/r/#/c/418714/) [08:50:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:52:08] nice, now rsync eventlog1002 -> stat1005 seems to work [09:05:51] 10Analytics, 10Operations, 10netops: Replace eventlog1001's IP with eventlog1002's in analytics-in4 - https://phabricator.wikimedia.org/T189408#4041924 (10elukey) [09:12:38] 10Analytics, 10Operations, 10netops: Replace eventlog1001's IP with eventlog1002's in analytics-in4 - https://phabricator.wikimedia.org/T189408#4041928 (10elukey) Since we are doing some cleanups, I'd also like to review the following: ``` term mysql { from { destination-address { 10... [09:13:09] 10Analytics, 10Operations, 10netops: Review some IPs in the analytics-in4 filter - https://phabricator.wikimedia.org/T189408#4041932 (10elukey) [09:49:26] 10Analytics-Kanban: Correct mediawiki-reduced loading job - https://phabricator.wikimedia.org/T189448#4042079 (10JAllemandou) [09:49:45] 10Analytics-Kanban: Correct mediawiki-reduced loading job - https://phabricator.wikimedia.org/T189448#4042090 (10JAllemandou) [09:50:14] (03PS2) 10Joal: Correct oozie mediawiki-reduced job dependency [analytics/refinery] - 10https://gerrit.wikimedia.org/r/417994 (https://phabricator.wikimedia.org/T189448) [09:51:42] 10Analytics-Kanban: Improve mediwiki-history performan - https://phabricator.wikimedia.org/T189449#4042107 (10JAllemandou) a:03JAllemandou [09:51:54] 10Analytics-Kanban: Improve mediwiki-history performance - https://phabricator.wikimedia.org/T189449#4042096 (10JAllemandou) [09:55:11] (03CR) 10Joal: "Single comment inline, but not preventing from moving forward. Super good solution." 
(031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/418052 (https://phabricator.wikimedia.org/T189332) (owner: 10Ottomata) [10:21:04] joal: as fyi there was a problem with mirror maker main-eqiad->jumbo [10:21:12] I had to restart mirror maker on kafka1020 [10:21:26] apparently all consumers in the group where getting no partitions assigned [10:21:33] :( [10:21:37] so nobody was producing :( [10:21:44] https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?orgId=1&from=now-24h&to=now&var-instance=main-eqiad_to_jumbo-eqiad [10:22:07] Since yesterday 20:00 :( [10:22:20] no alarms on that - not super nice [10:22:36] Have we changed anything on kafka yesterday? [10:22:47] nothing that I know [10:23:13] I agree that we need alarming, I think that we were going to add them this week but this happened yesterday [10:34:12] joal: should we file a pull req to lower down the rate limit for druid to 50rps? [10:34:24] (iirc it is per ip) [10:34:37] elukey: it is per IP - I wonder if it is needed though [10:35:48] I think so, I'd also lower it down more, this is only one client and I don't love how latency metrics changed [10:36:57] GC time for historical is horrible now :D [11:32:16] 10Analytics, 10Analytics-Cluster, 10User-Elukey: Fix Mirror Maker erratic behavior when replicating from main-eqiad to jumbo - https://phabricator.wikimedia.org/T189464#4042527 (10elukey) p:05Triage>03High [11:33:21] 10Analytics, 10Analytics-Cluster, 10User-Elukey: Fix Mirror Maker erratic behavior when replicating from main-eqiad to jumbo - https://phabricator.wikimedia.org/T189464#4042527 (10elukey) @Gehel found the problem this morning while trying to switch wqds back to Jumbo. He the rolled back and opened https://ph... [11:34:39] ok so we have currently two pending issues: [11:35:22] 1) Druid's public endpoint seems to be consumed by a bot and we should lower down our rate limit to 50rps/ip or even lower [11:35:39] 2) Mirror Maker is not behaving/replicating, started from yesterday [11:35:44] I opened a task about 2) [11:36:34] going to be afk for ~2h for lunch + errand but ping me if I am needed [11:36:48] when I am back I am planning to work on both with Joseph/Andrew [12:19:09] (03PS27) 10Joal: Add EL and whitelist sanitization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) (owner: 10Mforns) [12:56:17] 10Analytics, 10ChangeProp, 10EventBus, 10MediaWiki-JobQueue, and 4 others: Create an LVS endpoint for jobrunners on videoscalers - https://phabricator.wikimedia.org/T188947#4042934 (10Joe) I'd take the chance we have to do this to do as follows: # Add `mediawiki::multimedia` to the jobrunners # Add a seco... [13:25:07] yoohooo [13:26:12] (03CR) 10Ottomata: Get smart and hacky about compatible type casting (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/418052 (https://phabricator.wikimedia.org/T189332) (owner: 10Ottomata) [13:26:50] (03PS2) 10Ottomata: Get smart and hacky about compatible type casting [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/418052 (https://phabricator.wikimedia.org/T189332) [13:29:29] joal: am testing a couple of things, you think we can do a refinery source release in a min? 
[13:33:42] Hey ottomata [13:33:56] ottomata: We an go for refinery-source when you wish [13:35:15] cooool i haven't been following the recent changes you and mforns have been doing to whitelist, but I say +1 to wahtever yall are doing [13:35:23] so merge that at will, maybe add an entry to changelog about it [13:35:23] :) [13:38:30] (03PS28) 10Joal: Add EL and whitelist sanitization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) (owner: 10Mforns) [13:38:37] ottomata: Just pushed a patch including changelog modif [13:38:47] ottomata: +1? [13:39:21] joal: +1 but derby.log :) [13:39:29] ottomata: Ah !! Crap [13:40:12] (03PS29) 10Joal: Add EL and whitelist sanitization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) (owner: 10Mforns) [13:40:24] I am back :) [13:40:29] ottomata: o/ [13:40:43] Hi again elukey [13:40:50] hiii [13:43:32] ottomata: did you make any change to mirror maker? (restart etc.. ?) [13:43:36] main-eqiad -> humbo [13:43:37] hahaha [13:43:39] jumbo [13:44:01] it works now but with that weird spiky pattern [13:44:20] elukey: not since i rolled back the mw monolog producer last week [13:44:55] ahhh okok so there is a task in your emails then :) [13:45:48] ok :) [13:45:53] hmm, joal we might have a problem: [13:46:03] ls -lh /mnt/hdfs/wmf/data/event/SearchSatisfaction/year=2018/month=3/day=5/hour=2 [13:46:08] vs [13:46:18] ls -lh /mnt/hdfs/wmf/data/event/SearchSatisfaction/year=2018/month=3/day=5/hour=1 [13:46:20] file sizes [13:46:26] hour=2 is with the new code [13:46:42] i wonder if this is beacuse it iterates over all the row rdd somehow? [13:47:04] (also no _REFINED flag, not sure why, checking) [13:48:07] ottomata: could be because of the number of files [13:48:18] I don't why, but hour=2 has 200 files [13:49:23] yeah, but i just refined hour=2 with the new code [13:49:26] hour=1 was done with the old code [13:49:38] so, the new code creates more smaller files [13:50:51] ottomata: the new code repartition the data, and therefore writes a lot more smaller files [13:51:12] I think 200 is the default number of partitions for spark-SQL related commands [13:52:16] ohhh [13:52:17] ok [13:52:22] so it this acceptable you think? [13:52:36] ottomata: Nope [13:53:10] ottomata: hour 1 has 3 small files, so 200 makes them too small (and too many) [13:54:04] ottomata: 150kb avg for hour 2, 1Mb, 5Mb and 5Mb for hour 1 [13:54:32] ottomata: Where do we use SQL-related commands in the code? [13:56:21] yes agree [13:56:45] joal: when writing the data [13:56:55] well [13:56:56] maybe? [13:56:57] insertInto? [13:56:58] outputDf.write.mode("overwrite") [13:56:58] .partitionBy(partitionNames:_*) [13:56:58] .insertInto(partition.tableName) [13:57:10] ottomata: This has not changed, has-it? [13:57:11] but, we did that before too [13:57:12] right [13:57:14] yeah [13:57:17] what has changed is convertDataFarme [13:57:21] hm [13:57:23] before, we re-read the json with the schema [13:57:26] and then wrote that dataframe [13:57:46] convertToSchema* [13:57:56] ok [13:58:00] * fdans DST is stupid [13:58:02] maybe convRdd has too many partitions? [13:58:40] ottomata: I think convertRdd makes use of DataFrame API, which by default set the number of partitions to 200 [13:59:04] ottomata: 2 approaches: remove DataFrame API calls - Or preset number of partitions for SQL [13:59:13] ottomata: Is the code merged [13:59:15] ? 
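The 200-small-files problem discussed above comes from Spark SQL's default shuffle partition count kicking in once the DataFrame API is used inside Refine. The real fix is the Scala patch in refinery-source; the sketch below only illustrates, in pyspark, the two options joal names — presetting the shuffle partition count, or repartitioning back to the input's partition count after deduplication. Paths, the dedup key column and the partition numbers are hypothetical:
```
# Illustration only -- the production job is the Scala Refine code in refinery-source.

# Option 1: preset the shuffle partition count used by SQL/DataFrame operations.
spark.conf.set("spark.sql.shuffle.partitions", "32")

# Option 2: remember the input partitioning and restore it after dropDuplicates,
# so the written files stay comparable in size to the source data.
df = spark.read.json("hdfs:///wmf/data/raw/some_dataset")       # hypothetical input path
orig_partitions = df.rdd.getNumPartitions()
deduped = df.dropDuplicates(["uuid"]).repartition(orig_partitions)  # "uuid" is an assumed key
deduped.write.mode("overwrite").parquet("hdfs:///tmp/refine_sketch")  # hypothetical output
```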
[13:59:20] no] [13:59:21] oh [13:59:25] convert is [13:59:26] the casting isn't [13:59:33] let's merge the casting, it works [13:59:36] i refined some data that failed before [13:59:40] awesome [13:59:45] so i'm going to merge that, i also have a fix for a string thing [13:59:47] and then lets fix this [13:59:50] repartition? [13:59:56] yessir [14:00:08] joal +1 on https://gerrit.wikimedia.org/r/#/c/418052/ [14:00:10] I think whitelisting is ready (I did not test, but Marcel did) [14:00:14] ok cool [14:00:15] let's merge it too [14:00:18] yes [14:00:29] i merge whitelist you merge mine [14:00:31] :) [14:00:33] :) [14:00:44] (03CR) 10Ottomata: [C: 032] Add EL and whitelist sanitization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) (owner: 10Mforns) [14:00:56] (03CR) 10Joal: [C: 032] Get smart and hacky about compatible type casting [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/418052 (https://phabricator.wikimedia.org/T189332) (owner: 10Ottomata) [14:01:19] Ok, now let's wait for jenkins o do its job [14:01:28] And let's debug hat repartition thing [14:02:54] hm ottomata - where is that patch with convert? [14:03:07] (03PS3) 10Ottomata: Get smart and hacky about compatible type casting [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/418052 (https://phabricator.wikimedia.org/T189332) [14:03:17] (03PS1) 10Ottomata: Fix hivePartitionPath to print proper partition Location [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/418916 [14:06:43] joal this one? https://gerrit.wikimedia.org/r/#/c/410942/ [14:07:59] this is code that added convert function https://gerrit.wikimedia.org/r/#/c/410241/6/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/jsonrefine/SparkSQLHiveExtensions.scala [14:10:47] Oh ! It's been merged [14:10:54] ok, reading agian [14:11:54] 10Analytics, 10Analytics-Cluster, 10User-Elukey: Fix Mirror Maker erratic behavior when replicating from main-eqiad to jumbo - https://phabricator.wikimedia.org/T189464#4043165 (10Gehel) [14:15:57] very nice that https://github.com/ben-manes/caffeine is suggested for recent versions of druid rather than the local cache [14:15:58] ottomata: I think the repartitioning occurs when applying transform functions [14:17:16] elukey: I didn't of caffeine ! [14:17:36] Oh joal intersting [14:17:37] trying [14:17:38] elukey: Now we have j8, I'll patch our LRUCache in refinery-source for a caffeine one :) [14:17:55] ottomata: those transform functions make use of DF api [14:19:36] what I currently don't like is the GC timings for the historical [14:19:44] because they spikes to seconds [14:20:10] elukey: Right [14:20:13] (young gen if I am seeing correctly) [14:20:38] In term of load, it's not that bad [14:20:41] also, we have set broker cache to ~2G of heap size but metrics show ~4 ? [14:20:44] elukey: --^ https://grafana-admin.wikimedia.org/dashboard/file/server-board.json?orgId=1&var-server=druid100%5B456%5D&var-network=eth0&from=now-12h&to=now&refresh=1m [14:20:56] elukey: have we restarted since? [14:21:03] joal: you are right, without transform funcs, 3 files. [14:21:05] o [14:21:06] ok [14:21:08] so, what can we do? [14:21:15] ottomata: I have a patch on the way :) [14:21:44] joal: restarted ? [14:21:56] since the patch for 4G [14:22:00] :) [14:22:32] joal: I am pretty sure we did [14:22:52] and we haven't patched to 4G [14:23:09] # Increase druid broker query cache size to 2G. 
[14:23:09] # TBD: Perhaps we should also try using memcached? [14:23:09] druid.cache.sizeInBytes: 2147483648 [14:23:41] joal [14:23:52] fyi, it is specifically the deduplicate function that does it [14:23:59] dropDuplicates [14:24:10] joal: i could get num partitions before [14:24:14] and then repartition after drop duplicates back to orig [14:24:21] not sure what you are gonna do :p [14:24:26] maybe that ^ ? [14:25:23] to the point ottomata :) [14:25:25] (03PS1) 10Joal: Update Refine partitioning strategy [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/418919 [14:25:28] ottomata: --^ [14:26:05] nice joal :) [14:26:21] ottomata: Since the point of transform functions is to provide extendable code, let's not enforce RDD only API :) [14:26:33] waiting for jenkins, then will merge :) [14:26:41] cool [14:27:49] (03CR) 10Ottomata: [C: 032] Fix hivePartitionPath to print proper partition Location [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/418916 (owner: 10Ottomata) [14:28:07] yargghhh joal i have meetings today from standup +4 hours! [14:28:14] crap [14:28:18] hm [14:28:22] 2018-03-10T22:38:52,481 WARN org.eclipse.jetty.server.ServerConnector: [14:28:25] java.io.IOException: Too many open files [14:28:25] ottomata: Let's merge and deploy now [14:28:28] lovely [14:28:37] elukey: druid?A [14:29:12] druid1004's broker [14:29:16] so the processs has Max open files 4096 4096 files [14:29:29] joal ok, but i kinda want to be ready when json refine changes,... the puppet cron uses the unversioned symlink [14:29:32] hm [14:29:38] i can make a patch to use versioned [14:29:43] everything else uses versioned, right? [14:30:06] (03CR) 10Ottomata: [C: 032] Update Refine partitioning strategy [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/418919 (owner: 10Joal) [14:30:11] ottomata: hm, I don't think so [14:30:25] ottomata: straming job was using simlink, I think camus is as well [14:30:43] joal: ah no wait for some reason the logs stopped at 2018-03-10T22:38:52 [14:31:03] joal: but those ones are unchnaed, right? [14:31:06] elukey: probably because of that specific error :) [14:31:15] unchaned? [14:32:28] !log restart druid-broker on druid1004 - no /var/log/druid/broker.log after 2018-03-10T22:38:52 (java.io.IOException: Too many open files_ [14:32:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:33:24] elukey: I don't understand that issue thoug [14:33:56] elukey: from what I read, druid uses files when doing group-by, which we normally don't do [14:34:15] elukey: maybe the request that broke it didn't come from AQS but from superset? [14:34:35] on druid1005 it does this [14:34:36] 2018-03-12T14:26:57,097 WARN org.eclipse.jetty.server.HttpChannel: Could not send response error 500: java.io.IOException: Stream closed [14:34:38] (03PS1) 10Cicalese: Added queries of PHP version by MediaWiki version. 
[analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/418922 [14:34:39] 2018-03-12T14:26:58,736 WARN org.eclipse.jetty.servlet.ServletHandler: /druid/v2/ [14:34:42] java.io.IOException: Stream closed [14:34:44] all different :D [14:34:57] yay, awesomeness of distributed system [14:35:05] joal: starting to do release [14:36:26] ottomata: let me know if you wish me to help [14:36:51] ottomata: also, I need to provide a patch for refinery before deploy (adding new jar version for mobile-apps session job) [14:36:52] https://integration.wikimedia.org/ci/job/analytics-refinery-release/95/ [14:36:53] joal: sockets are also counted under open files no? [14:37:00] ok [14:37:12] elukey: very possible [14:37:16] joal: perhaps I let you deploy refinery [14:37:25] ottomata: works for me [14:38:37] ottomata: we already have v0.0.58 [14:38:52] ottomata: have you forced a rebuild before feleasing? [14:39:59] Ah ottomata - My mistake, I read the first commit of the list, not the provided version [14:43:23] other interesting thing: [14:43:25] elukey@druid1006:/var/log/druid$ sudo du -hs /var/lib/druid/segment-cache/* [14:43:28] 2.3M /var/lib/druid/segment-cache/info_dir [14:43:31] 157G /var/lib/druid/segment-cache/mediawiki_history_reduced [14:43:37] max size of the segment cache is 2T [14:47:22] elukey: I think I don't understand :( [14:48:12] joal: release finished [14:48:32] (03PS1) 10Joal: Update mobile-app session job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/418925 (https://phabricator.wikimedia.org/T184768) [14:48:38] ottomata: --^ if you don' mind [14:49:43] joal: in theory we have on every druid host a separate partition that has ~2.4T of free space, that is mounted as /var/lib/druid [14:49:57] ok [14:50:08] joal: in there there should be our segment cache, that we set maximum at ~2.4T [14:50:31] buuut we use only 150G - is it because mw history is not that big [14:50:38] elukey: correct [14:50:41] or because for some reason we are not caching a lot? [14:50:52] The reduced data is no that big [14:51:06] do we know how much are all the segments ? [14:51:24] (03CR) 10Ottomata: [V: 032 C: 032] Update mobile-app session job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/418925 (https://phabricator.wikimedia.org/T184768) (owner: 10Joal) [14:51:25] well, 150Gb of very condensed data in columnar storage is actually pretty big, but not in size :) [14:51:26] merged joal [14:51:31] Thanks ottomata [14:53:01] ottomata: we need to add the jars to refinery - Do you launch jenkins? [14:54:54] oh [14:55:02] oh forgot that step [14:55:02] doing [14:58:05] 10Analytics: pyspark2 different versions in Driver and Workers - https://phabricator.wikimedia.org/T189497#4043377 (10diego) [14:59:11] ottomata: are you ready for me to deploy refinery (therefore the jars and symlinks) ? [14:59:39] https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars/66/ yes! [14:59:54] 1. Add refinery-source jars for v0.0.58 to artifacts (https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars/66/changes#detail0) [14:59:54] ? [15:00:36] ok it did it [15:00:38] yeah joal am ready [15:00:38] do it [15:00:43] ok ottomata [15:03:20] a-team standup! [15:03:33] goddammit [15:03:46] sorry [15:05:07] milimetric: ? [15:05:26] oh you are off? 
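On the broker's "Too many open files" error above: sockets count against the same per-process descriptor limit as segment files, so the 4096 cap covers both. A small stdlib sketch for comparing current usage with the soft limit; it has to run as the druid user or root, and the PID is hypothetical:
```
import os

def open_fd_count(pid):
    # Every entry in /proc/<pid>/fd is one descriptor: regular files,
    # memory-mapped segments and sockets alike.
    return len(os.listdir("/proc/{}/fd".format(pid)))

broker_pid = 12345  # hypothetical druid-broker PID
print("open fds:", open_fd_count(broker_pid))

# Compare with the "Max open files" soft limit (4096 in the case above).
with open("/proc/{}/limits".format(broker_pid)) as limits:
    print([line.strip() for line in limits if line.startswith("Max open files")][0])
```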
[15:07:22] !log Deploy refinery from scap [15:07:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:09:09] !log Deploy refinery onto hdfs [15:09:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:09:38] 10Analytics-Tech-community-metrics, 10Developer-Relations (Apr-Jun-2018): Explain decrease in number of patchset authors for same time span when accessed 3 months later - https://phabricator.wikimedia.org/T184427#4043440 (10Aklapper) [15:10:42] ottomata: refiner deployed - jars are new [15:11:44] (03PS3) 10Joal: Correct oozie mediawiki-reduced job dependency [analytics/refinery] - 10https://gerrit.wikimedia.org/r/417994 (https://phabricator.wikimedia.org/T189448) [15:12:08] gr8 [15:18:45] 10Analytics-Tech-community-metrics, 10Developer-Relations (Jan-Mar-2018): Check detached accounts in DB with same username for "mediawiki" and "phab" sources but different uuid's (and merge if connected) - https://phabricator.wikimedia.org/T170091#4043473 (10Aklapper) [15:25:32] ottomata: I messed up my patch with mobile_apps [15:25:36] fixing now [15:26:51] (03PS1) 10Joal: Fix bug in mobile_apps session job conf [analytics/refinery] - 10https://gerrit.wikimedia.org/r/418931 [15:26:55] ottomata: --^ [15:37:27] (03CR) 10Ottomata: [V: 032 C: 032] Fix bug in mobile_apps session job conf [analytics/refinery] - 10https://gerrit.wikimedia.org/r/418931 (owner: 10Joal) [15:46:29] heeey team :] sorry for being late, I missed the DST change and thought meetings were 1 hour later... [15:46:59] in spain only in 2 weeks... [15:48:56] joal: https://github.com/wikimedia/restbase/pull/965 [15:49:06] mforns too --^ [15:56:58] elukey, left a comment cause I think there's one limit that slipped, but makes sense and looks good! [15:59:19] checking! [16:21:47] (03PS4) 10Joal: Correct oozie mediawiki-reduced job dependency [analytics/refinery] - 10https://gerrit.wikimedia.org/r/417994 (https://phabricator.wikimedia.org/T189448) [16:22:06] (03CR) 10Joal: [V: 032 C: 032] "Merging before deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/417994 (https://phabricator.wikimedia.org/T189448) (owner: 10Joal) [16:26:11] !log Deploying refinery again to provide patch for mobile_apps_session_metric job [16:26:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:28:49] hi joal: can a I make you a quick question about the parquet dumps? [16:28:55] sure dsaez [16:29:39] I'm trying to get the timestamp for each revision. I'm doing a query on hive, and then try to join, but I get a memory error [16:30:39] makes sense given that there are around 700M revisions [16:30:56] dsaez: the question is more about: what do you do wih those? [16:31:44] I run a regex on all the revisions (amazingly fast in spark) [16:32:21] But there is no timestamp [16:32:42] I need the timestamp. Because I'm creating a temporal link graph [16:32:43] dsaez: Indeed I have removed timestamp from the dumps - that was a mistake! [16:33:29] dsaez: joining between your subset of revisions already processed and mediawiki-history is what I'd do [16:34:11] dsaez: In the meantime I'm gonna update the XMLConverter to add timestamp to it [16:34:15] I'm doing that, but getting the memory error java.lang.OutOfMemoryError: Java heap space [16:34:24] dsaez: spark? 
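The join joal suggests above — enriching the already-processed revisions with timestamps from mediawiki-history — could look roughly like this in pyspark. The wikitext path is the one quoted later in this log, and the column names (rev_id, event_timestamp, event_entity, wiki_db, snapshot, revision_id) are assumed from the wmf.mediawiki_history schema rather than verified here:
```
# Sketch: attach revision timestamps from wmf.mediawiki_history to the wikitext dump.
wikitext = spark.read.parquet(
    "hdfs:///user/joal/wmf/data/wmf/mediawiki/wikitext/snapshot=2018-01/cawiki")

rev_ts = spark.sql("""
    SELECT rev_id, event_timestamp
    FROM wmf.mediawiki_history
    WHERE snapshot = '2018-01'
      AND wiki_db = 'cawiki'
      AND event_entity = 'revision'
      AND event_type = 'create'
""")

# Assumes the wikitext parquet exposes a revision id column (called revision_id here).
joined = wikitext.join(rev_ts, wikitext.revision_id == rev_ts.rev_id, "left")
joined.select("revision_id", "event_timestamp").show(5)
```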
[16:34:30] joal: that sounds great [16:34:34] yes [16:34:38] pyspark [16:34:39] hm [16:35:21] ok a few hints here and there dsaez - We've done some experiments (well, the discovery team did ;) - For those kind of big stuff using scala-spark is really better [16:35:37] we've an improvement of 10x between pyspark and scala-spark depending of the work [16:36:02] Also, the question is: how much memory o you give to your workers? [16:36:05] and driver? [16:36:16] joal: I now that scala is much more efficient, but basically I'm doing one-time usage code, and also I should share this with the research community and they use python [16:36:44] joal: I'm not explicitly given the memory, because this: https://phabricator.wikimedia.org/T189497 [16:37:03] I'm just running pypsark --master yarn [16:38:13] dsaez: you run spark from a notebook? [16:38:31] joal: yes! [16:38:51] no good idea? [16:38:54] dsaez: And from the ticket, you run spark2 [16:39:03] joal: yes [16:39:14] dsaez: just making sure I'm up to date on settings [16:39:25] dsaez: I never use pyspark, so I can't really say [16:39:39] For OOM in spark, giving it more memory is the way to go [16:40:04] I see [16:40:18] And since the request is about a huge join, I'd say give more memory to your master, and some more memory-overhead [16:40:27] s/master/driver [16:41:05] joal: ok, but when declaring the workers, I got that version problem [16:41:51] dsaez: It's actually super bizarre that it doesn't fail without executor-memory setting if python versions are different [16:42:24] joal: maybe is all working the on the driver? [16:42:45] dsaez: I don't think so [16:42:56] dsaez: Or the data you're processing is small enough [16:43:30] joal: the full dump enwiki 2018-01 [16:43:35] and goes pretty fast [16:44:10] dsaez: do you have a spark instance currently running/ [16:44:11] ? [16:44:17] From the cluster, I'd say no [16:44:26] joal: I've just stop [16:44:38] dsaez: Can you start it again please? [16:44:40] sure [16:44:44] Not the job, the spark app [16:44:47] :) [16:45:07] joal: done, this is the way that I get no error [16:45:36] dsaez: ok, so you're running in non-distributed mode [16:45:43] dsaez: https://yarn.wikimedia.org/cluster/scheduler [16:45:47] ok, that's why [16:46:03] I'll try to install a python2.7 virtualenv [16:46:22] joal: ok, I'll run now In the way that I get the error [16:46:26] the version error [16:46:44] dsaez: try without executor-memory, but with --master yarn only [16:47:03] joal: ok, done [16:47:21] dsaez: no yarn job [16:47:31] dsaez: From which machine do you launch it? [16:47:44] 10Analytics: pyspark2 different versions in Driver and Workers - https://phabricator.wikimedia.org/T189497#4043377 (10Ottomata) Ah, python3 is installed! I think the issue is that pyspark2 from stat1005 is stretch with Python 3.5, but most workers are Jessie with Python 3.4. We're in the process of adding more... [16:47:47] stat1005 [16:48:13] joal: now is running [16:48:40] dsaez: no yarn job [16:48:56] ¿? [16:49:02] let's do it again [16:49:06] dsaez: How do you launch spark? [16:49:11] from a terminal? [16:49:18] the druid bot stopped in the meantime [16:49:26] but I'll push anyway for the rate limit [16:49:28] :) [16:49:38] joal> pyspark2 --master yarn [16:49:48] elukey: afer having downloaded the full history of top pages per day, it finally stopped :) [16:49:54] joal> what about now? 
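On joal's advice above to give the driver and executors more memory plus some overhead: when the session is created from a notebook or script (rather than the plain pyspark2 shell), the same settings can be passed through the session builder. The sizes below are illustrative only, roughly the values the conversation converges on later in this log:
```
from pyspark.sql import SparkSession

# Illustrative sizing, not a recommendation beyond what is said in this log.
spark = (SparkSession.builder
         .master("yarn")
         .appName("revision-join-sketch")
         .config("spark.executor.memory", "8g")
         .config("spark.executor.cores", "2")
         .config("spark.driver.memory", "8g")
         .config("spark.yarn.executor.memoryOverhead", "2048")   # MB, Spark 2.x on YARN
         .config("spark.dynamicAllocation.maxExecutors", "64")
         .getOrCreate())
```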
[16:50:24] nope dsaez - nothing [16:50:35] trying myself dsaez [16:51:22] dsaez: look at the yarn UI (link above) [16:52:01] I exectly did what you did dsaez: pyspark2 --master yarn [16:52:25] (joal i added to analytics-goals etherpad: Spark 2, deprecate and remove spark 1.6 from cluster.) [16:52:32] for next quarter :) [16:52:34] !log Deploying refinery on HDFS for mobile_apps patch [16:52:35] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:52:44] ottomata: Many thanks :) [16:52:45] joal: application_1520532368078_12129 dsaez [16:53:01] Here we go dsaez :) [16:53:10] dsaez: what have you changed? [16:53:26] I've just cleaned the cookies of my browser [16:53:49] apparently, when the jupyter notebook restart the kernel (when reconnect) kills the spark process [16:54:07] Ah [16:54:14] That's weird :) [16:54:38] joal: now, I'll execute a udf ...let's see if gives the error [16:54:55] However dsaez - If this is running that way (disributed etc) you should be able to provide some conf settings [16:55:25] joal: now I get the error Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set. [16:55:33] !log Restart mobile_apps_session_metrics [16:55:34] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:56:13] joal: i see to possible solutions, install python3 on the workers, or that I create a virtualenv with python2.7 on the stat1005 [16:56:59] joal: let me try the last one, that it would be the easiest [16:57:10] dsaez: see comment on your ticket [16:57:12] python 3 is installed [16:57:14] but python 3.4 [16:57:15] but [16:57:23] try running from stat1004 [16:57:27] which has python3.4 too [16:57:37] but, i think that is not the problem, since it says python version is 2.7 [16:57:43] i'm not sure how PYSPARK_PYTHON works [16:57:54] ottomata, let me check [16:58:04] but maybe by setting it to jupyter, you are causing workers to use python 2.7 [16:58:09] but your driver is using 3 [16:58:59] ottomata, I'll check the jupyter documentation ... but don't think that's possible [16:59:18] well Python in worker has different version 2.7 seems to indicate that worker is running python 2.7 [16:59:20] dunno why though [17:00:04] https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/ [17:00:31] (03CR) 10Mforns: [C: 032] "LGTM!" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/413689 (owner: 10Joal) [17:00:55] (03PS2) 10Mforns: Use an accumulator to count in spark Refine [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/413689 (owner: 10Joal) [17:01:01] (03CR) 10jerkins-bot: [V: 04-1] Use an accumulator to count in spark Refine [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/413689 (owner: 10Joal) [17:01:13] ottomata, joal: at least using python2.7 on my virtualenv doesn't help [17:02:25] 10Analytics, 10Cloud-VPS, 10EventBus, 10Services (watching): Set up a Cloud VPS Kafka Cluster with replicated eventbus production data - https://phabricator.wikimedia.org/T187225#4043848 (10Ottomata) Hm, possibly. Something like 3 nodes, 2TB+ storage each, 32G+ RAM each, 12ish CPUs. I haven't loved maint... [17:03:03] ottomata: mirror-maker graphs are a bit flat :( [17:04:34] dsaez: I tried to run a simple SQL query using pyspark2, it worked [17:04:40] what are you trying to do? 
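On the "Python in worker has different version 2.7 than that in driver 3.5" error being chased above: the interpreters used on the workers and on the driver are controlled by PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON, the same variables named in the error, and Jupyter kernels sometimes pre-set them. Pinning them explicitly before the session is created is a reasonable check — a sketch, assuming the chosen interpreter path exists on every worker node:
```
import os

# Must be set before the SparkContext/SparkSession is created, otherwise it has no effect.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"         # interpreter used on the YARN workers
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3"  # interpreter used by the driver/notebook

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("yarn").appName("pyver-check").getOrCreate()
print(spark.sparkContext.pythonVer)  # driver-side major.minor that the workers must match
```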
[17:04:54] queries work [17:05:11] joal: I get the error running a regular expression [17:06:37] dsaez: can you paste some code please? [17:06:52] from pyspark.sql.functions import udf [17:06:52] import re [17:06:52] def getWikilinks(wikitext): #UDF to get wikipedia pages titles [17:06:52] links = re.findall("\[\[(.*?)\]\]",wikitext) #get wikilinks [17:06:52] titles = [link.split('|')[0] for link in links] #get pages [17:06:53] return titles [17:06:55] udfGetWikilinks = udf(getWikilinks) [17:07:04] df = spark.read.parquet('hdfs:///user/joal/wmf/data/wmf/mediawiki/wikitext/snapshot=2018-01/cawiki') [17:07:09] df2 = df.withColumn('wikilinks',udfGetWikilinks(df.revision_text)) [17:07:13] df2.show() [17:12:01] dsaez: I have your exact function working well with rdds [17:12:11] I'm assuming the problem comes from udf [17:12:34] joal: will try, I can code this with lambda function [17:13:02] dsaez: https://gist.github.com/jobar/aa576e0ba736c1ed0a4b4621187afeda [17:15:08] joal: I get Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob. [17:15:33] ups, same error [17:15:34] Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set. [17:15:41] joal: are you using any virtualenv? [17:15:56] dsaez: I suggest running pyspark out of a notebook :) [17:16:23] joal: yes! I think that's the point! I'll try and let you know, thank you very much [17:17:14] dsaez: your exact code has worked fine for me out of a workbook [17:17:53] joal, are you using a interactive-shell or just spark-submit ? [17:18:04] dsaez: interactive sheel [17:18:09] s/sheel/shell [17:18:18] shouldn't mnake any difference dsaez [17:18:41] I know... weird notebooks [17:19:17] prety but crazy [17:19:46] joal: now I managed to make it work with the notebook using python2.7! [17:19:57] weird [17:22:17] elukey: (in meetign) HMMM, MM logs look pretty goood.. [17:22:32] 10Analytics, 10ChangeProp, 10EventBus, 10MediaWiki-JobQueue, and 4 others: Create an LVS endpoint for jobrunners on videoscalers - https://phabricator.wikimedia.org/T188947#4043978 (10mobrovac) These jobs are not high-traffic, so consolidating the job runners and spreading the load all over them sounds lik... [17:23:25] elukey: going to bounce one of the brokers.. [17:23:31] sorry [17:23:32] not brokers [17:23:34] mm instances :p [17:24:32] 10Analytics: pyspark2 different versions in Driver and Workers - https://phabricator.wikimedia.org/T189497#4043992 (10diego) @Ottomata and @JAllemandou I found a work-around by creating an python2.7 virtualenv on stat1005. I think that is the easiest solution right now. Updating python3 on the workers might b... [17:27:08] ottomata: ok :D [17:27:34] ottomata: just added a role::eventlogging::analytics::legacy to eventlog1001, with only the forwarder [17:27:41] enabled puppet and removed all the other configs [17:28:01] so there is no risk to accidentally have both running at the same time [17:31:01] gr8 [17:31:07] elukey: i just bounced mm instance on 1023 [17:31:17] it started producing, and also seems to be the only one doing so [17:31:23] even though others do 'own' partitions [17:32:22] elukey: i'm really not sure what the heckay is happening here. 
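Back on the UDF exchange above: joal's gist does the same wikilink extraction through the RDD API instead of a DataFrame UDF, which is what ended up working. Roughly, with column names assumed from the snippet dsaez pasted:
```
import re

def get_wikilinks(wikitext):
    # Same regex as the pasted UDF: grab [[...]] targets and drop any |label part.
    links = re.findall(r"\[\[(.*?)\]\]", wikitext or "")
    return [link.split("|")[0] for link in links]

df = spark.read.parquet(
    "hdfs:///user/joal/wmf/data/wmf/mediawiki/wikitext/snapshot=2018-01/cawiki")

# RDD route, as in joal's gist: map the Python function directly over the rows
# instead of registering it as a DataFrame UDF.
links = df.select("revision_text").rdd.map(lambda row: get_wikilinks(row.revision_text))
links.take(5)
```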
[17:32:31] i'm going to remove the mw job topics from mm replication to jumbo [17:32:50] ok ottomata - mobile_app job fixed - sorry for the mess [17:33:33] ottomata: ack [17:34:28] !log Restart mediawiki-history-reduced oozie job to add a dependency [17:34:28] i didn't even know it was broken :p! [17:34:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:35:36] or API version? i'll try that first. [17:36:31] oh elukeyhmm [17:36:33] actually [17:36:33] first [17:36:38] we should fix the webrequest api version [17:36:41] remember? i set that incorrectly [17:36:46] all vks are using the wrong version [17:36:48] right now [17:38:11] ottomata: yep but it shouldn't be a huge deal no? we should switch to no enforced version though (so enable auto-negotiation) [17:38:19] right [17:38:40] https://gerrit.wikimedia.org/r/#/c/418959/ [17:38:52] i'm going to manually apply on canaray and restart vk [17:39:54] should we just remove that bit since we are using jumbo only? [17:40:00] yes [17:40:02] well [17:40:06] oh we should yaeh [17:40:13] elukey: canary is still using analytics! [17:40:15] i thought it used text... [17:40:46] role text [17:40:49] ahahha lol well let's just switch it to jumbo too [17:41:11] AHH, because it uses role function on canary [17:41:18] we need to put it in hiera canary.yaml [17:41:42] https://gerrit.wikimedia.org/r/#/c/418959/2/modules/profile/manifests/cache/kafka/eventlogging.pp - eventlogging still needs this right? [17:42:09] (03PS6) 10Joal: Add XmlConverter spark job [analytics/wikihadoop] - 10https://gerrit.wikimedia.org/r/361440 [17:42:14] yes [17:42:38] OH sorry didn't mean to remove it... [17:44:28] elukey: looks ok on 1052 [17:44:35] (i modifeid it there too) [17:44:39] https://gerrit.wikimedia.org/r/#/c/418959/ [17:44:52] yar [17:44:53] still wrong [17:45:21] thaaar we go [17:45:50] elukey: +1? [17:46:39] lemme check [17:46:55] gogogog [17:47:03] ook [17:48:38] (03CR) 10Cicalese: "This is ready to be reviewed/merged. Thanks!" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/418922 (owner: 10Cicalese) [18:06:30] ottomata: afaics all vk instances restarted, and no issue reported/visible in our metrics [18:06:40] great [18:06:47] did you restart all of them at once? [18:07:11] ah no they seems a bit staggered [18:07:15] super good [18:07:16] i ran puppet via cumin [18:07:23] but staggered per cluster [18:07:30] yep my bad, I was just curious no concern [18:07:58] all right, I'd log off if nothing is outstanding [18:09:55] elukey: i'm really just trying things now, i'm wondering if local buffers are filling up too fast, and then blocking? going to increase producer buffer.memory and see [18:10:49] !log bouncing kafka mirrormaker for main-eqiad -> jumbo with buffer.memory=128M [18:10:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:15:06] (Peter is deploying restbase now, so edit-related endpoints will be rated limited to 25rps now) [18:17:38] !log bouncing kafka mm eqiad -> jumbo witih acks=1 [18:17:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:23:28] seems going well now right? [18:24:19] going off, will read laterz :) [18:24:20] * elukey off! 
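The two MirrorMaker bounces just logged are tuning the underlying Kafka producer: a larger local buffer (buffer.memory) so sends block less when the target cluster is slow to acknowledge, and acks=1 to stop waiting for the full ISR. MirrorMaker takes these through its producer.properties; the trade-off can be sketched with a standalone kafka-python producer, where the broker hostname and topic are placeholders:
```
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka-jumbo1001.eqiad.wmnet:9092",  # placeholder broker
    acks=1,                            # leader-only ack: faster, weaker durability than acks='all'
    buffer_memory=128 * 1024 * 1024,   # 128M of local buffering, as in the !log above
    max_block_ms=60000,                # how long send() may block once the buffer is full
)
producer.send("eqiad.test.topic", b"payload")
producer.flush()
```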
[18:26:41] inteesting elukey yeah, but by making acks=1 [18:26:43] which is not ideal [18:26:49] laters [18:31:20] nope, not fine, just took longer [18:43:47] 10Analytics, 10EventBus, 10MediaWiki-JobQueue, 10Goal, 10Services (doing): FY17/18 Q3 Program 8 Services Goal: Migrate two high-traffic jobs over to EventBus - https://phabricator.wikimedia.org/T183744#4044428 (10Pchelolo) [18:43:51] 10Analytics, 10ChangeProp, 10EventBus, 10MediaWiki-JobQueue, 10Services (done): Support claimTTL and rootClaimTTL in change-prop - https://phabricator.wikimedia.org/T189303#4044425 (10Pchelolo) 05Open>03Resolved [19:03:59] heya ottomata - Have switched refine to new code? [19:06:16] abou tto! [19:06:24] been dealing with mirror maker [19:07:56] (03CR) 10Mforns: [C: 032] "Looks good! Nice use of the explode_by feature :]" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/418922 (owner: 10Cicalese) [19:39:23] !log deployed new Refine jobs (eventlogging, eventbus, etc.) with deduplication and geocoding and casting [19:39:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:43:17] 10Analytics-Kanban, 10Google-Summer-of-Code (2018): [Analytics] Improvements to Wikistats2 front-end - https://phabricator.wikimedia.org/T189210#4044585 (10sahil505) @mforns - I would be interested to work on this project for GSoC 2018. I have explored some of the sub-tasks that have been listed above and have... [19:45:59] haha dsaez, yt? [19:46:11] you have somehow requested ALL of the memory on the cluster [19:46:13] almost all [19:46:21] ottomata yeah! [19:46:21] 2TB [19:46:28] you meant to do that? [19:46:37] I am the king [19:46:40] hahah [19:46:44] no, sorry, let me kill stuff [19:47:00] better now? [19:48:08] still says 2022400 allocated, its ok though (maybe) i'm going to make the jobs i'm trying to run production queue [19:48:15] hopefully htat wil pre-empt some of your containers [19:49:12] I'm joining two dataframes of 500M lines each... anyhow, seems to be too much memory ( I had another process runing but I've just killed) [19:49:50] I'm browsing in yarn.wikimedia.org, but where exactly can you see the memory usage per user? [19:50:46] not per user, but per job [19:50:48] https://yarn.wikimedia.org/cluster/app/application_1520532368078_12332 [19:50:51] https://yarn.wikimedia.org/cluster/scheduler [19:51:01] i'm looking at ^ 3rd column from right [19:51:06] Allocated Memory MB [19:51:19] I see [19:51:50] should I kill that process? not sure how long it will take... as I told it is just join [19:52:12] maybe I can write to hive and join there, I image that would be better [19:52:35] hm, i dunno, maybe, are you using spark dynamic allocation? [19:52:55] or just requesting 42 workers with 48G of ram each? [19:53:18] dsaez: 42Gb of ram per executor is really a lot [19:53:35] ok, i'll kill this, sorry, I was just doing some trial and error [19:53:55] i dunno if he actually requested that, i just see 43 exectors and ~2TB RAM used [19:54:10] maybe 42 executors, one driver? [19:54:13] dsaez: we usually look at the ratio RAM/CPU - The cluster is configured with 2x more RAM than core, leading to ~2Gb RAM per CPU core [19:54:30] ottomata: indeed, 42 exsec, 1 driver, 42Gb RAM each [19:54:47] killed. [19:54:50] Thanks :) [19:56:20] dsaez: I suggest you use 8Gb per executor, 2 cores per executor, and 8Gb for the driver as well. 
It would also be good, with those settings, to add: --conf spark.dynamicAllocation.maxExecutors=128, leading to max 1/2 the cluster used for the job [19:56:49] sorry guys, that was wrong debuging strategy [19:56:58] joal: ok, sounds reasonable [19:57:23] thank you dsaez! [19:57:37] i was just enjoying the power of the cloud :D [19:58:00] dsaez: I love that feeling of CPUs heating :) [19:58:40] ;) [20:04:52] ottomata: how's the thing with new-Refine? [20:12:05] running the first one now! crons for other start in a bit [20:12:14] its in cluster mode, so will see logs soon i guess... :) [20:18:58] joal: mostly good! [20:19:08] in this run i got a cast exception for something we don't catch, but should be workable [20:19:09] will add it [20:19:13] (double -> long) [20:19:34] ottomata: funny - Still the same schema? [20:19:39] no different one [20:20:08] hmm, actually [20:20:10] it got past our checks [20:20:59] which means hive type coercion should have worked [20:21:04] its failling at parquet level [20:21:17] maybe [20:21:18] java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Long [20:21:18] at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaLongObjectInspector.get(JavaLongObjectInspector.java:40) [20:21:18] at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$LongDataWriter.write(DataWritableWriter.java:398) [20:22:26] geocoded_data column added just fine though woohooo [20:26:50] coool, RefineMonitor should work too! [20:28:31] 10Analytics, 10Analytics-Cluster, 10Patch-For-Review, 10User-Elukey: Fix Mirror Maker erratic behavior when replicating from main-eqiad to jumbo - https://phabricator.wikimedia.org/T189464#4044688 (10Ottomata) Not sure what is happening still. I thought that I had reduced the Batch Expired error by sett... [20:30:45] ottomata: I got it - And actually that means we need to be good at casting ourselves [20:31:06] ottomata: When we were reading the data using a predefined (merged) schema, casting was done at read time [20:31:20] 10Analytics-Kanban, 10MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), 10Patch-For-Review: Record and aggregate page previews - https://phabricator.wikimedia.org/T186728#4044694 (10Ottomata) FYI, I just deployed the change that does geocoding (and deduplication) for EventLogging events in Hive.... [20:31:25] Now, casting is NOT done anymore (except for the specific cases already implemented [20:31:48] ohhhhh [20:31:55] hm [20:32:02] but, shouldn't the HiveTypeCoercion thing handle this? [20:32:04] makes sense [20:32:16] ottomata: the coercion is made at type level, not value level [20:32:43] So type coercion means the value should cast correctly, but it still needs to be done [20:33:14] hmm [20:33:35] not sure that makes sense to me, why would spark have this logic in it if it didn't use it? [20:33:40] shouldn't htis work if you do something like [20:34:06] sqlContext.sql("INSERT INTO ... VALUES(1.234, ...)"), where the first column is a Long? [20:34:11] i will try! [20:41:07] ottomata: thinking again about that type coercion thing [20:41:24] The function we use is on types - It doesn't affect values [20:41:44] However, it tells you that you should be able to safely cast from to another [20:41:58] Which means casting would still need to be done [20:42:33] joal [20:42:38] bc? [20:42:40] i just tested some stuff [20:43:49] tested inserting doubles into a long hive column, it worked [20:43:50] OMW [21:41:24] 10Analytics, 10Performance-Team: Possible statsv corruption? 
- https://phabricator.wikimedia.org/T189530#4044794 (10Krinkle) [22:01:48] haha joal we do not need to worry about this schema [22:01:54] https://meta.wikimedia.org/w/index.php?title=Schema:InputDeviceDynamics [22:02:14] "The data depends on the field name. [22:02:30] I'm goign to blacklist this schema [22:07:57] (03CR) 10Cicalese: "Yes, absolutely, that was my intention - one file per MediaWiki version including multiple PHP versions per file. Thanks!" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/418922 (owner: 10Cicalese) [22:08:32] (03CR) 10Cicalese: [V: 032] Added queries of PHP version by MediaWiki version. [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/418922 (owner: 10Cicalese) [22:47:25] (03CR) 10Mforns: [C: 032] "Hey joal, this code looks great! We'll have a *lot* of information about the reconstruction process. I left 2 typo comments, but +2 on my " (032 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415255 (https://phabricator.wikimedia.org/T155507) (owner: 10Joal) [22:49:48] mforns: Is there something else that will need to be done to deploy the new pingback queries, or will the .tsv files just automagically appear at some point soon? Thanks~ [22:50:11] hi CindyCicaleseWMF :] [22:50:29] Hi mforns :-) [22:51:32] whenever your change is merged, and puppet deploys it to stat1006 (20 mins later) your report will start to execute [22:51:54] have you seen my comment on the review? is that OK for you? [22:52:33] oh, OK, reading your comment now [22:52:46] Excellent! I was pretty sure you had some wonderful deployment magic. Yes, I responded to your comment on gerrit. I'm glad you liked the use of explode_by ;-) And, I did realize it would generate multiple files in a subdirectory. I tested it locally before creating the patch. [22:53:04] perfect [22:53:18] Thanks so much for the review! [22:53:38] so yes, you should hopefully see report files being written soon, probably at the next full hour when reportupdater is called by cron [22:54:02] if not, please let us know [22:54:09] Cool. I'll work on the graph configuration now. Will do. [22:54:47] great :] [23:02:37] Yay! The new files are starting to appear. [23:03:59] :]