[00:48:34] Analytics, DNS, Operations, Traffic: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2205680 (MZMcBride) >>! In T132407#2198891, @Milimetric wrote: > There's some debate about this. We haven't used data.wikimedia.org in the past because of the possible confusion with w...
[01:59:27] Analytics, Hovercards, Unplanned-Sprint-Work, Patch-For-Review, Reading-Web-Sprint-70-Lady-and-the-Trumps: Capture hovercards fetches as previews in analytics - https://phabricator.wikimedia.org/T129425#2205859 (dr0ptp4kt) >>! In T129425#2204276, @bmansurov wrote: > How should we treat the us...
[04:08:58] Analytics, Hovercards, Unplanned-Sprint-Work, Patch-For-Review, Reading-Web-Sprint-70-Lady-and-the-Trumps: Capture hovercards fetches as previews in analytics - https://phabricator.wikimedia.org/T129425#2205924 (Nuria) >How should we treat the user's DoNotTrack setting in this context? I supp...
[07:24:25] Analytics-Kanban, Operations, Traffic, Patch-For-Review: varnishkafka logrotate cronspam - https://phabricator.wikimedia.org/T129344#2206044 (elukey) Open>Resolved
[07:25:07] \o/
[07:29:25] Bravo elukey :P)
[08:19:47] Analytics, Hovercards, Unplanned-Sprint-Work, Patch-For-Review, Reading-Web-Sprint-70-Lady-and-the-Trumps: Capture hovercards fetches as previews in analytics - https://phabricator.wikimedia.org/T129425#2206161 (Jhernandez)
[10:36:14] * elukey lunch!
[10:48:38] (CR) Mforns: "LGTM" [analytics/dashiki] - https://gerrit.wikimedia.org/r/283336 (https://phabricator.wikimedia.org/T132579) (owner: Milimetric)
[11:05:02] (CR) Mforns: Change float to Decimal in dynamic_pivot.py (1 comment) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/282728 (owner: Mforns)
[11:05:15] (Abandoned) Mforns: Change float to Decimal in dynamic_pivot.py [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/282728 (owner: Mforns)
[11:20:43] (CR) Mforns: Fix bug in get_date_threshold (2 comments) [analytics/reportupdater] - https://gerrit.wikimedia.org/r/283101 (https://phabricator.wikimedia.org/T131849) (owner: Mforns)
[11:26:31] Analytics, Operations, Traffic: cronspam from cpXXXX hosts related to varnishkafka non existent processes - https://phabricator.wikimedia.org/T132346#2206419 (elukey) a:elukey>None
[11:37:14] joal: o/
[11:37:26] joal: any thoughts about https://phabricator.wikimedia.org/T76348#2205087 ?
[11:49:46] Analytics, Operations, Traffic: cronspam from cpXXXX hosts related to varnishkafka non existent processes - https://phabricator.wikimedia.org/T132346#2206447 (BBlack) Should be fixed now. I did: ``` root@neodymium:~# salt -v -t 10 -L `cat exmobile` cmd.run 'rm -f /etc/logrotate.d/varnishkafka-*' ```...
[11:57:22] elukey: Hi !
[11:57:45] elukey: what opinion are you asking for? Using hdfs as backup?
[11:59:41] yeppp
[12:00:23] elukey: 2Tb is reaso
[12:00:28] sorry, again
[12:00:45] elukey: 2TB is not big for hdfs, however the copy will take time
[12:01:57] elukey: depending on how organised the datasets are, we could parallelize the copy, making it a bit faster, but we'll quickly reach stat1001's IO limit (disk and network)
[12:02:57] yep yep..
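For reference, the parallel copy joal sketches here could be as simple as a bounded pool of `hdfs dfs -put` invocations. A minimal sketch, not a tested tool: the source directories, HDFS target, and pool size below are all placeholder assumptions, with the pool size acting as the knob that keeps stat1001's disk and network IO in check.

```
# Rough sketch of a parallel copy into HDFS: each 'hdfs dfs -put' copies
# sequentially, so parallelism comes from running a few side by side.
# Paths are hypothetical; the worker count bounds the IO load on the host.
import subprocess
from concurrent.futures import ThreadPoolExecutor

SOURCES = ["/srv/datasets/a", "/srv/datasets/b"]  # hypothetical dataset dirs
HDFS_DEST = "/wmf/data/backup/stat1001"           # hypothetical HDFS target

def put(path):
    return subprocess.call(["hdfs", "dfs", "-put", path, HDFS_DEST])

with ThreadPoolExecutor(max_workers=4) as pool:
    exit_codes = list(pool.map(put, SOURCES))
print(exit_codes)
```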
[12:07:19] Analytics, Hovercards, Unplanned-Sprint-Work, Patch-For-Review, Reading-Web-Sprint-70-Lady-and-the-Trumps: Capture hovercards fetches as previews in analytics - https://phabricator.wikimedia.org/T129425#2206492 (bmansurov) a:bmansurov>None
[13:07:36] Analytics, Operations, Traffic: cronspam from cpXXXX hosts related to varnishkafka non existent processes - https://phabricator.wikimedia.org/T132346#2206663 (elukey) Open>Resolved
[13:31:24] mmmm is there any particular procedure to get ssh access to an analytics labs instance ?
[13:31:49] I created hue.analytics.eqiad.wmflabs but haven't managed to ssh into it
[13:31:50] elukey: I think some project master has to add you as a participant
[13:32:07] joal: theoretically I am an admin
[13:32:09] oh, you created the instance ! You should have access then I think
[13:32:34] * joal apologizes to Admin
[13:32:42] :-P
[13:33:23] come ooooon don't be silly :)
[13:34:19] super weird, I can access dan-druid.analytics.eqiad.wmflabs
[13:34:41] elukey: maybe your instance is a bit lazy and takes time to boot ?
[13:35:12] * joal imagines the labs instance saying 'Doppppo'
[13:35:25] mmmm also can't access limn1.analytics.eqiad.wmflabs
[13:35:26] sigh
[13:36:00] and can access druid1.analytics.eqiad.wmflabs
[13:36:10] mmmmmmmmmm
[13:36:43] * elukey hopes that doing ottomata's 'mmmm' will solve the problem
[13:39:09] elukey: I think ottomata has special 'mmmmm' skills :S
[13:39:22] joal: I knoowwww
[14:03:12] elukey: limn1 is broken completely
[14:03:21] milimetric: o/
[14:03:23] you have to log in to it as root@limn1.eqiad.wmflabs
[14:04:18] dan-druid and druid are just kept around for config inspiration for the real druid, we can delete them after we stand up the real cluster
[14:04:36] ah yeah I was just using them to try to see why I can't log in
[14:04:38] elukey: but limn1 is fragile, take care around there :)
[14:04:56] hahahah no no I can't access it even with root@
[14:05:03] can you guys access hue.analytics.eqiad.wmflabs by any chance?
[14:05:27] elukey: I guess I'm the only one that got added to the login-as-root group
[14:05:32] (it's a special permission)
[14:06:03] mmm joal I took a look at the console output and maybe you are right, it is taking ages to come up
[14:06:03] I get denied public key when i ssh to hue.analytics.eqiad.wmflabs
[14:06:44] thanks dan!
[14:07:58] I have an idea for a metric: amount of access through the API. This is especially interesting for Wikidata, where I believe the number of times an item is accessed through the API is more useful information than the number of page views. Is this something the Analytics team is working on or will work on?
[14:13:18] (CR) Milimetric: [C: 2 V: 2] Fix bug in get_date_threshold [analytics/reportupdater] - https://gerrit.wikimedia.org/r/283101 (https://phabricator.wikimedia.org/T131849) (owner: Mforns)
[14:14:39] Analytics, Analytics-Cluster: Use MySQL as Hue data backend store - https://phabricator.wikimedia.org/T127990#2206923 (elukey) Source: http://www.cloudera.com/documentation/enterprise/5-5-x/topics/cdh_ig_hue_database.html#concept_id1_wkj_zj_unique_1__section_pmx_plj_zj_unique_1 From what I can see we s...
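Per the Cloudera doc elukey links above, moving Hue's backend store to MySQL boils down to pointing the database section of hue.ini at the MySQL server. A sketch of that stanza, with the host, credentials, and database name below as placeholder assumptions:

```
# Hypothetical hue.ini excerpt; values must match the actual MySQL
# user/schema created for Hue (cf. the puppetized Oozie/Hive models
# ottomata mentions below).
[desktop]
  [[database]]
    engine=mysql
    host=localhost        # hypothetical MySQL host
    port=3306
    user=hue
    password=hue_password # hypothetical credentials
    name=hue
```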
[14:19:16] Analytics: Clean up property passing in dashiki - https://phabricator.wikimedia.org/T132691#2206929 (Milimetric)
[14:21:58] (PS1) Milimetric: Always use https on datasets.wikimedia.org [analytics/dashiki] - https://gerrit.wikimedia.org/r/283446
[14:22:10] (CR) Milimetric: [C: 2 V: 2] Always use https on datasets.wikimedia.org [analytics/dashiki] - https://gerrit.wikimedia.org/r/283446 (owner: Milimetric)
[14:25:12] ottomata: helloo!
[14:25:13] https://puppet-compiler.wmflabs.org/2442/oxygen.eqiad.wmnet/
[14:25:34] related to https://gerrit.wikimedia.org/r/#/c/283411/
[14:25:36] * elukey dances
[14:25:48] Analytics, DNS, Operations, Traffic: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2206951 (Milimetric) the pageview API is running on wikimedia.org, that's the prod cluster. This task right now is about having a production domain for reports like this: https://brows...
[14:28:32] Analytics, Analytics-Cluster: Use MySQL as Hue data backend store - https://phabricator.wikimedia.org/T127990#2206957 (Ottomata) Ja sounds right! The creation of users and schemas is mostly puppetized for Oozie and Hive. You might want to copy that model and try to make this work with Hue too, although...
[14:28:41] Analytics, DNS, Operations, Traffic: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2206958 (Milimetric) > Potentially tangentially: I'm unclear what the distinction between Labs and production is when the Wikimedia Foundation is running/operating both. Are there other...
[14:28:48] elukey: cool!
[14:29:01] I haven't looked at that patch yet, but consider +1 given, proceed! :) I'm sure it is fine
[14:29:46] Analytics, Beta-Cluster-Infrastructure, Services, scap, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2206968 (Krenair)
[14:31:36] ottomata: I'll wait, the main point was that the puppet compiler now works with submodules :)
[14:31:43] OHHHHHHHH
[14:31:45] OHHHH
[14:31:53] awesome!
[14:32:01] nice
[14:32:52] now I need to test other use cases but this one seems to be working :)
[14:33:34] Analytics, DNS, Operations, Traffic: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2206983 (BBlack) >>! In T132407#2205680, @MZMcBride wrote: > Potentially tangentially: I'm unclear what the distinction between Labs and production is when the Wikimedia Foundation is r...
[14:43:48] Analytics, Hovercards, Unplanned-Sprint-Work, Reading-Web-Sprint-70-Lady-and-the-Trumps: Capture hovercards fetches as previews in analytics - https://phabricator.wikimedia.org/T129425#2207022 (Jdlrobson)
[14:43:56] Analytics, Hovercards, Unplanned-Sprint-Work, Reading-Web-Sprint-70-Lady-and-the-Trumps: Capture hovercards fetches as previews in analytics - https://phabricator.wikimedia.org/T129425#2105269 (Jdlrobson) @dr0ptp4kt over to you for sign off.
[14:49:56] joal, milimetric - deleted the hue instance, created a new one, huetest.analytics.eqiad.wmflabs, and I can ssh -.-
[14:53:12] elukey: weird :(
[14:58:44] joal: can you hear me on bluejeans?
[14:59:26] I do
[14:59:47] I had my usual issue, but it's solved now
[15:01:35] Analytics: geowiki data for Global Innovation Index - https://phabricator.wikimedia.org/T131889#2181884 (Rafaesrey) The data used in previous editions of the GII focused on measuring Wikipedia Page Edits per Country. As per the definition we showed in our technical notes, the count of monthly page edits data...
[15:08:09] joal: 17:05 PROBLEM - aqs endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:08:12] :(
[15:08:18] again a lot of timeouts
[15:08:39] elukey: request rate ....
[15:09:27] I am trying to see if it is from a single source IP
[15:09:34] it might be the same one hitting us daily
[15:09:54] ah yes it seems so
[15:16:01] so there is a single IP issuing requests like /analytics.wikimedia.org/v1/pageviews/per-article/en.wikipedia/all-access/user/Topps_Tiles/daily/2015070100/2016041400
[15:22:11] ok I need coffee, the IP is clearly some sort of bot..
[15:24:53] I am wondering if we have any throttling in place, or if we can set it at the restbase level
[15:37:57] hey folks, curious – looking at Nuria's slides, piwik reports unique visits, does that mean it uses persistent tokens?
[15:44:18] DarTar: it uses cookies yes: http://piwik.org/faq/general/#faq_43
[15:45:05] oops
[15:45:07] :D
[15:45:14] DarTar: that limits its usage to certain sites like wikipedia15 or the transparency report
[15:45:27] got it
[15:45:34] or internal tools
[15:46:15] makes sense, I anticipate people may be asking this question, in the context of our commitment not to use tokens
[15:52:21] sure, these are used on static sites, a very different context
[16:09:31] milimetric: I have some questions about https://phabricator.wikimedia.org/T132407
[16:09:42] I know we discussed it but I lost some pieces :(
[16:09:56] still finishing up the quarterly review
[16:10:01] I'll follow up in a bit
[16:10:04] yep yep sure
[16:10:14] (I just dropped)
[16:14:44] Analytics, DNS, Operations, Traffic: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2207372 (Nuria)
[16:15:13] Analytics-Cluster, Operations, hardware-requests, netops: setup/deploy server analytics1003/WMF4541 - https://phabricator.wikimedia.org/T130840#2207374 (faidon) OK, I debugged this some more, and this was caused by a Juniper bug and one that I vaguely recall experiencing before. The issue was th...
[16:19:13] a-team: I will be attending the next session of quarterly reviews
[16:19:59] ok, elukey shoot
[16:20:14] batcave?
[16:21:04] nuria_: can you add me to the invite?
[16:21:21] madhuvishy: o/
[16:22:03] elukey: \o
[16:22:16] elukey: omw to the batcave but a-team are we cancelling tasking then?
[16:22:26] I was going to ask that
[16:23:00] or move tasking to one hour later
[16:24:15] retro is after tasking
[16:24:19] and I have other conflicts
[16:24:29] cancel I think
[16:24:43] ok, cancel
[16:24:47] eeeee... I'm going to abstain since I'm going on vacation and to me tasking isn't all that important ;)
[16:24:53] :D
[16:24:53] as nuria_ would say: do we have work ??
[16:24:56] i do
[16:25:22] I do for a couple days
[16:26:03] good job folks
[16:27:48] uhhh - elukey do you have an invite to the ops quarterly review? add me if you can :)
[16:32:04] Analytics-Cluster, Operations, hardware-requests, netops: setup/deploy server analytics1003/WMF4541 - https://phabricator.wikimedia.org/T130840#2207399 (faidon) The network config was reverted and it still works, so it should be final now. However, the installer failed at the last step with: ```...
[16:40:27] Analytics, DNS, Operations, Traffic: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2207408 (Milimetric) @BBlack: just in case there's some concern about what the purpose of analytics.wikimedia.org is. We will never use it to proxy to services / dashboards on labs. W...
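Back on the AQS timeouts earlier in the hour: confirming elukey's single-IP hunch is a one-liner over a request log. A hedged sketch, assuming a hypothetical whitespace-separated log with the client IP in the first column (adjust the field index for whatever varnish/restbase log is actually at hand):

```
# Count requests per client IP and print the top offenders.
# Log format (IP in the first column) is an assumption.
import sys
from collections import Counter

counts = Counter()
for line in sys.stdin:
    fields = line.split()
    if fields:
        counts[fields[0]] += 1

for ip, n in counts.most_common(10):
    print("%8d %s" % (n, ip))
```

Piped a sampled log into it, a bot like the one described above would dominate the top of the output.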
[16:42:04] I have an idea for a metric: amount of access through the API. This is especially interesting for Wikidata, where I believe the number of times an item is accessed through the API is more useful information than the number of page views. Is this something the Analytics team is working on or will work on?
[16:42:22] (re-posting for the people who have since joined)
[16:42:30] let's cancel tasking. I think it's more important for us to learn about what other teams in our org are doing/have done
[16:43:08] cc a-team
[16:44:22] harej: the Analytics team does not have any plans to calculate such a metric, some api metrics have been calculated recently by the api team (cc bd808 ) but I do not think we are expanding those.
[16:44:49] which api metrics have been calculated?
[16:45:09] harej: https://wikitech.wikimedia.org/wiki/Analytics/Data/ApiAction
[16:46:11] hmm, that's for api.php, but wikidata has different API endpoints I believe
[16:46:19] plus there's the query service; are there metrics for those?
[16:47:31] Analytics, DNS, Operations, Traffic: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2207429 (BBlack) @Milimetric - I guess what I'm missing here is the disconnect between our public termination of analytics.wikimedia.org (on, say, cache_misc) and what "a subfolder on a...
[16:54:25] (CR) Nuria: "I think this will break usage in our desktops as CORS cannot go across http/https" [analytics/dashiki] - https://gerrit.wikimedia.org/r/283446 (owner: Milimetric)
[16:54:40] Analytics, DNS, Operations, Traffic: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2207462 (BBlack) (for services small enough to not need a cluster of their own hardware, I think we do have solutions where we virtualize smaller services on ganeti, too. The above is...
[16:54:43] milimetric: did you test this on your desktop? https://gerrit.wikimedia.org/r/#/c/283446/
[16:55:44] yeah, I thought so, maybe I didn't clear the cache
[16:55:59] But I thought the deploy of the https change was imminent
[16:56:46] The // route would work fine for all our reports anyway because they're all hosted on https. Maybe we should just leave it alone
[16:58:07] in any case – if I found you all a "volunteer" whom I could successfully steer through the NDA process to work on API metrics (namely Wikidata-related metrics), would you be interested?
[17:06:44] harej: We have not had any requests for those metrics but i imagine that those would go to the api team so I think you should first contact them to make sure the metrics are of value. cc bd808
[17:07:48] harej: we have a raw data stream now!
[17:08:07] I've started working on some aggregate tables based on that.
[17:08:24] a raw data stream... that anyone can access? does that include access via special:entitydata? or however else machines use wikidata?
[17:08:26] bd808: I pointed harej to https://wikitech.wikimedia.org/wiki/Analytics/Data/ApiAction
[17:09:27] harej: so we have that page that nuria_ linked. It's raw api.php request data (similar to wmf_raw.pageviews)
[17:09:57] it would log all of the wikidata api actions with full request details
[17:10:13] no exposure outside of the hadoop cluster yet
[17:10:42] that will be what the aggregate tables are used for at some point in the future.
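The kind of hourly aggregation bd808 describes could look roughly like this. This is a sketch only, not the team's actual job: the table name, the `wiki`/`params` fields, and the year/month/day/hour partition columns are assumptions based on the wikitech page linked above.

```
# Hypothetical hourly rollup over the raw ApiAction Hive table:
# requests per (wiki, action) for a single hour. All names assumed.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("api-action-hourly-sketch")
         .enableHiveSupport()
         .getOrCreate())

hourly = spark.sql("""
    SELECT wiki,
           params['action'] AS action,
           COUNT(*)         AS requests
    FROM wmf_raw.ApiAction
    WHERE year = 2016 AND month = 4 AND day = 14 AND hour = 17
    GROUP BY wiki, params['action']
""")
hourly.show(20, False)
```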
[17:11:09] The epic task is T102079
[17:11:19] T102079: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079
[17:12:13] I need to finish up T132283 so I can turn the pipelines that I'm running via cron into real jobs
[17:12:14] T132283: Determine which Action API parameters to whitelist/blacklist for action_param_hourly aggregate table - https://phabricator.wikimedia.org/T132283
[17:13:10] harej: addshore is interested in using the raw data to compile event volume data and push it into graphite
[17:13:28] *waves*
[17:14:00] bd808: can you remind me what the thing is that runs stuff after data is present in the analytics world? ;) (So I can do some reading)
[17:14:18] we have tried before to populate graphite metrics for Action API requests in real-time from MediaWiki, but the volume crashes the aggregators every time
[17:14:58] addshore: oozie is the "cron"
[17:15:05] that's the one! :)
[17:15:25] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Oozie
[17:16:40] bd808:
[17:16:41] Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).
[17:16:58] looks like we should be able to get it to spit stuff into graphite hourly for example then
[17:17:11] that's pretty awesome :D
[17:17:21] addshore: bd808 ja we do that for some job already... i think it's a spark job scheduled by oozie
[17:17:39] addshore: yeah we send metrics to graphite from spark
[17:17:48] madhuvishy: awesome! :)
[17:17:53] oozie schedules the job to run hourly
[17:17:58] where is the code for that for me to have a peek at? :)
[17:18:02] so it's not real time
[17:18:22] addshore: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/RESTBaseMetrics.scala
[17:19:09] addshore: would what you're working on be able to tell me about use of wikidata's data accessed by means other than visiting the website?
[17:19:13] addshore: thing to note is you have to send a timestamp with the data point for these jobs - since they are batch jobs and you have no idea when they'll be run/rerun etc - which means you can't use statsd
[17:19:41] harej: well, you should perhaps talk to lydia about that (that's not exactly what I'm working on)
[17:19:49] since statsd is for realtime. this job sends raw data to graphite - which means we lose some of the out-of-the-box aggregations
[17:20:09] madhuvishy: yup! I do a similar thing in some other terrible places pulling data out of random piles of stuff for wmde!
[17:20:19] cool
[17:20:57] madhuvishy: what periods does that RESTBase thing split the data into? does it only report hourly or run hourly but report per-minute numbers?
[17:20:58] addshore: https://github.com/wikimedia/analytics-refinery/tree/master/oozie/restbase has the oozie scheduling parts
[17:21:31] addshore: no it sends it hourly
[17:22:33] addshore: all it does is count requests to /rest/v1 or something like that in the hour and emit the count
[17:23:08] madhuvishy: what periods does that RESTBase thing split the data into? does it only report hourly or run hourly but report per-minute numbers?
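An aside before the thread picks up below: madhuvishy's point about timestamps is the key constraint here. statsd assumes every data point means "now", while graphite's plaintext carbon protocol takes one `path value timestamp` line per point, so a rerun of a batch job can land in the bucket it actually describes. A minimal sketch; the carbon endpoint and metric path are assumptions:

```
# Push one data point to graphite with an explicit timestamp, so a
# rerun of the batch job lands in the same hourly bucket.
# Endpoint and metric path are hypothetical.
import socket
import time

CARBON_HOST, CARBON_PORT = "graphite.example.org", 2003  # assumption

def send_metric(path, value, timestamp):
    line = "%s %s %d\n" % (path, value, timestamp)
    sock = socket.create_connection((CARBON_HOST, CARBON_PORT))
    try:
        sock.sendall(line.encode("utf-8"))
    finally:
        sock.close()

# Stamp the point with the hour bucket it describes, not "now".
bucket = int(time.time()) // 3600 * 3600
send_metric("restbase.requests.varnish_requests", 12345, bucket)
```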
[17:23:35] i got that message twice :/
[17:24:02] not sure if you got my reply - so
[17:24:07] https://www.irccloud.com/pastebin/jyiuzgia/
[17:24:18] It looks like it just emits a single metric of "varnish_requests" per hour
[17:24:44] for all hits to /api/rest_v1*
[17:25:14] cool
[17:25:27] so bd808 I guess we could do exactly the same for api actions?
[17:25:28] addshore: you could group by minute in an hour and emit multiple metrics though
[17:25:41] ottomata: indeed
[17:26:32] addshore: yeah. an hourly rate bucketed by the action param would be awesome. Bonus points for counting errors and successes separately
[17:26:48] this will be easier after I get the hourly aggregate tables built
[17:27:13] yeh, I'll hold off for now until they exist ;)
[17:27:15] there will be a table that aggregates (wiki, action, ip_origin) per hour
[17:27:32] I probably should add error state to that actually
[17:28:09] how about format? ;)
[17:28:33] format turns out to be not very interesting at the hourly resolution
[17:28:41] irc kicked me out
[17:28:52] it's mostly interesting for large time windows
[17:29:19] bd808: yes wikidata format data is rather dull https://grafana.wikimedia.org/dashboard/db/wikidata-api?panelId=2&fullscreen
[17:30:32] I'd love to see that format=php number drop to 0
[17:30:48] yeh :P
[17:31:31] *goes to look at the ticket about opentsdb*
[17:31:40] addshore: I've been thinking that format is really only interesting in one-off reports. Do you feel differently?
[17:32:14] one-off reports would probably be fine
[17:32:47] the only reason the wikidata formats thing is daily is because the /infrastructure/ for it was already there and it was thus a few lines of code and not much data
[17:32:58] a-team: I am going to log off in a bit, just reminding you that tomorrow I'll be off but I'll read my emails in the EU evening
[17:32:59] *nod*
[17:33:09] Bye elukey !
[17:33:25] bye elukey :) Congrats on the varnish4 stuff!
[17:34:04] :)
[17:34:19] bd808: it would be cool to have opentsdb on the analytics cluster (backed by hadoop) so we could more easily spit out more numbers without worrying about breaking graphite of course ;)
[17:35:03] addshore: I'm sure that patches are welcome, but you'll also get stuck maintaining it forever
[17:35:03] that would be COOOl
[17:35:12] * bd808 looks grumpily at logstash
[17:35:21] or bigger disks on graphite ;)
[17:35:23] http://ganglia.wikimedia.org/latest/graph.php?r=year&z=xlarge&c=Miscellaneous+eqiad&h=graphite1001.eqiad.wmnet&jr=&js=&event=hide&ts=0&v=138.220&m=disk_free&vl=GB&ti=Disk+Space+Available
[17:35:34] elukey: you missed mark's public kudos for your work
[17:35:57] ottomata: super cool ;)
[17:36:10] addshore: https://www.mediawiki.org/wiki/User:BDavis_%28WMF%29/Projects/Logging,_metrics_and_monitoring
[17:36:18] addshore: but why do you need to publish every single data point to make sense of the data though?
[17:36:40] addshore: is that a real need? (sometimes it is, but more often than not it is not)
[17:37:48] with reasonably granular time series data we can correlate changes with MediaWiki config and version changes
[17:37:50] nuria_: of course you don't need to publish every single data point.
[17:37:57] which allows for a form of health monitoring
[17:38:28] bd808: indeed, which is why we have this one for wikidata :P https://grafana.wikimedia.org/dashboard/db/wikibase-api-error-rate
[17:39:02] yeah. I think Brad would like to have something similar for the whole api.php event stream
[17:39:14] must dash for food!
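addshore's "group by minute in an hour and emit multiple metrics" suggestion would then look roughly like the sketch below, reusing the hypothetical `send_metric` helper from the earlier sketch. The input record shape and metric path are likewise assumptions:

```
# Bucket request timestamps by minute and emit one timestamped data
# point per bucket: 60 points per hourly job run instead of one.
from collections import Counter

def minute_counts(records):
    """records: iterable of (epoch_seconds, request_path) tuples."""
    counts = Counter()
    for ts, path in records:
        if path.startswith("/api/rest_v1"):
            counts[ts - ts % 60] += 1  # floor to the minute bucket
    return counts

# for minute, n in sorted(minute_counts(records).items()):
#     send_metric("restbase.requests.per_minute", n, minute)
```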
[17:39:15] we'll get there
[18:07:08] ottomata, meet time?
[18:07:16] batcave
[18:11:56] OH
[18:11:56] sorry
[18:11:57] yes
[19:01:00] logging off a-team, tomorrow !
[19:01:11] are we having the retrospective?
[19:01:26] see you joal !
[19:01:31] a-team: no retro
[19:01:35] ok ok
[19:01:47] a-team: as it is too late for europe and luca and joal have signed off
[19:02:04] ok thanks!
[19:02:07] a-team: i think we had enough meetings today with quarterlies
[19:02:15] :]
[19:02:40] yeah, we need to figure out where to stick the retro, maybe tomorrow post standup
[19:03:00] (we can figure that out tomorrow and not have the retro today)
[19:05:32] milimetric: tomorrow post standup sounds good
[19:05:38] milimetric: retros are real important
[19:05:55] milimetric: it is just that today we have had 3 hours of meetings already
[19:10:40] oh I wasn't complaining, I agree. Retros are important but we can skip one here and there.
[19:13:30] k, I am happy to reschedule for tomorrow
[20:11:23] ottomata, yt?
[20:12:18] Analytics-Cluster, Analytics-Kanban: Experiment with new Kafka versions and verify that they work with existing clients - https://phabricator.wikimedia.org/T132595#2208212 (Ottomata) Cooool! Tested in labs with a half upgraded cluster. It works as expected for both eventlogging and varnishkafka. The n...
[20:12:29] mforns: hiya
[20:13:01] hey ottomata, is eventlogging beta still receiving the events at bits.beta.wmflabs.org/event.gif?
[20:13:37] hm, not certain mforns, but i think it should be
[20:13:47] maybe the url has changed? not sure what bits in beta points to
[20:13:48] I'm executing the eventlogging-load-test pointing there and I don't see activity
[20:14:33] looking
[20:14:39] ottomata, anyway I'll try to execute the processor reading from a file
[20:15:22] ottomata, don't worry, it's probably something I've missed on my side, I can try the file
[20:16:50] ook
[20:16:51] mforns: let us know if it doesn't work, it should still receive events
[20:17:06] nuria_, ok
[20:17:34] Analytics, Analytics-Cluster: Debianize Kafka 0.9 - https://phabricator.wikimedia.org/T132631#2208227 (Ottomata) Ok, after some discussion on the ops list, it seems the best option is to use the confluent provided deb package instead of continuing to maintain our own. Changing ticket accordingly.
[20:17:37] nuria_, it does receive events (it received a couple today), but not from my test right now.
[20:18:08] mforns: are you sure the couple are not server side..or wait, no
[20:18:14] now server side goes through varnish too
[20:18:23] that's why I think something's wrong on my side
[20:18:34] let's see what beta points to
[20:18:41] no, I'm using the eventlogging-load-test script that generates client side events
[20:19:02] I'm using bits.beta.wmflabs.org
[20:19:09] Analytics-Cluster, Analytics-Kanban: Puppetize and make useable confluent kafka packages - https://phabricator.wikimedia.org/T132631#2208242 (Ottomata)
[20:19:34] mforns: can you run that script?
[20:19:39] sure
[20:19:51] ottomata, sending events..
[20:20:09] varnish is def getting them
[20:20:11] don't see data in kafka
[20:20:13] checking
[20:20:20] and they should be failing validation
[20:20:47] but I don't see the corresponding processor logs
[20:20:50] ah misconfiguration, i think i know why
[20:21:30] mforns: http://en.wikipedia.beta.wmflabs.org/beacon/event
[20:21:31] the last log in the log file dates from: 2016-04-14 08:33:11
[20:22:04] events are sent to this url
[20:23:00] nuria_, so varnish does not forward events to bits.beta.wmflabs.org any more?
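As the answers below make clear, the old bits-style `/event.gif` and the newer `/beacon/event` URL shapes are matched by the same pattern the eventlogging varnishkafka instance uses (ottomata quotes it further down). A quick check with that regex, verbatim from the log:

```
# The varnishkafka RxURL match pattern: both beacon styles hit it,
# so either endpoint should land in the eventlogging stream.
import re

EVENT_RE = re.compile(r"^/(beacon/)?event(\.gif)?\?")

for path in ("/event.gif?payload",     # old bits-style beacon
             "/beacon/event?payload",  # current beacon endpoint
             "/beacon/statsv?x=1"):    # unrelated, should not match
    print(path, bool(EVENT_RE.match(path)))
```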
[20:23:16] the varnish url must have changed
[20:23:35] the one i sent you is hitting varnish on beta
[20:23:40] I see
[20:23:44] mforns: bits will work
[20:23:45] fine
[20:24:08] ah
[20:24:21] ah ok, i just see events from n.m.wikipedia.beta.wmflabs.org sent to the other one
[20:24:35] we merged a change to the way kafka configs are looked up... i tested this in beta before buuuuut i think there was some intermediate change that broke things
[20:24:39] now, i am not sure if they are making it to the eventlogging box
[20:24:42] and, i hadn't rebased the beta puppetmaster properly
[20:24:52] nuria_: it doesn't matter
[20:24:58] bits maps to text, and all those are text too
[20:25:03] they are all served by the same varnish instance
[20:25:04] k
[20:25:14] deployment-cache-text04
[20:25:25] to check to see if mforns' requests were getting there
[20:25:27] i logged into that,
[20:25:29] and did
[20:25:41] varnishncsa -n frontend -m 'RxURL:^/(beacon/)?event(\.gif)?\?'
[20:25:42] thanks ottomata
[20:25:54] those are the same varnish args provided to the eventlogging varnishkafka instance
[20:25:56] but will print on stdout
[20:26:33] and .. were they getting there?
[20:26:42] i do not see them in the logs on /srv/
[20:26:59] Analytics-Cluster: Hadoop Yarn Nodemanagers occasionally die with OOM / GC errors - https://phabricator.wikimedia.org/T102954#2208264 (Ottomata)
[20:27:23] ^ ottomata
[20:27:26] Analytics-Cluster: Hadoop Yarn Nodemanagers occasionally die with OOM / GC errors - https://phabricator.wikimedia.org/T102954#1378495 (Ottomata) Removing this from analytics-kanban. We upped the JVM heap for these processes to 2048m. Will have to wait and see if this happens again.
[20:28:03] mforns: try it now
[20:28:07] ottomata, ok
[20:28:17] yes!
[20:28:19] thar they go
[20:28:20] :)
[20:28:21] \o/
[20:28:26] thanks!
[20:28:48] but wait.. what did you do? run puppet?
[20:29:01] i had cherry-picked an earlier change on beta puppetmaster
[20:29:12] that i think was keeping the newly merged one from being pulled in
[20:29:18] so i just un-cherry-picked it via rebase -i
[20:29:20] and ran puppet
[20:29:43] k
[20:30:02] I see
[20:30:06] ja was my fault, that is how it should usually work
[20:30:18] we just broke it in beta puppetmaster
[20:30:27] and forgot to fix it after we merged the final fix
[20:31:00] thank you puppet masterrrrr
[20:31:48] thanks all :]
[20:44:42] ottomata, :] I'm working on https://phabricator.wikimedia.org/T124799 but cannot reproduce the 404 error.
[20:44:57] When I try to send an event with a non-existent schema name, it responds 400 and says:
[20:45:01] https://meta.wikimedia.org/w/api.php?action=jsonschema&title=Edit&revid=13457859&formatversion=2
[20:45:21] when I try to send an event with a non-existent revision id, it also responds 400 and says:
[20:45:26] https://meta.wikimedia.org/w/api.php?action=jsonschema&title=Edit&revid=469803370&formatversion=2
[20:46:01] it seems something changed since you created that task, so that non-existent schemas no longer fail with a 404
[20:46:19] anyway, 400s do not break EL, they just log a validation error
[20:46:40] hm ja that sounds right mforns...
[20:46:53] oh jan 26th...?
[20:48:36] do you remember how it was breaking?
[20:49:36] mforns: no i don't
[20:49:44] what happens if nothing answers at the addy?
[20:49:58] like, if the url was nowaynotme.wikimedia.org
[20:50:00] instead of meta?
[20:50:03] afk for a bit...
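mforns's probe, as a small script: a sketch built on the api.php URLs quoted in the log. The `requests` dependency and the handling of ottomata's "nothing answers at the addy" case are assumptions, not part of the EL code itself.

```
# Probe action=jsonschema with a bogus revid. Per the discussion above,
# a bad schema/revid now yields a 400 (validation error) rather than a
# 404, while a host that never answers raises a connection error.
import requests

API = "https://meta.wikimedia.org/w/api.php"

def fetch_schema(title, revid):
    try:
        r = requests.get(API, params={
            "action": "jsonschema",
            "title": title,
            "revid": revid,
            "formatversion": 2,
        }, timeout=10)
    except requests.exceptions.ConnectionError as exc:
        return "nothing answered: %s" % exc  # the case ottomata asks about
    return r.status_code, r.text[:200]

print(fetch_schema("Edit", 469803370))  # the bogus revid from the log
```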
[20:50:10] ottomata, ok I'll test that
[21:16:21] milimetric: added comments to code in https://gerrit.wikimedia.org/r/#/c/278395/ I can remove those if they are too obvious
[21:16:39] milimetric: all tests passing, let me know if that is not the case for you
[21:18:06] looks good to me, the https link works fine
[21:18:24] oh wait, you're talking about this patch
[21:18:26] sorry, I'll review
[21:18:48] I rebased, just to be safe
[21:23:06] nuria_: one of the tests still fails, it looks like a race condition, don't worry about it for this patch
[21:23:37] but nuria_, the dash patterns are still not matching between the left side and the graph, and they change when different metrics are toggled off