Gunfish

Push Provider Server with HTTP/2

Yokohama\.(pm6?|go) #14

Takuya Yoshimura (takyoshi)

About me

  • Takuya Yoshimura
  • @takyoshi (moulin) (github, work)
  • @tkyshm (github, private)
  • KAYAC Inc.
  • Lobi, server side engineer

Lobi - Chat & Game Community

Push Notification

  1. A performs an action toward B
  2. Retrieve all device tokens bound to B from the DB
  3. Send notifications for those device tokens to APNS
  4. APNS delivers push notifications to those devices ("Got my push!")

B is logged in to the same user account on multiple devices.

iOS Push Notification on Lobi

  • iOS Push Notification using APNs
  • Legacy APNs Provider API
  • Delivers to all devices bound to the user account in our application

About APNs

  • Apple Push Notification service
  • The service that delivers push notifications to iOS devices
  • Developers can use the service with APNs Provider API

 

About APNs

  1. Send notifications to APNs in the background
    • Network latency
    • e.g. the average response time from Lobi to APNs is 0.15 sec
  2. Pool TCP connections (with APNs)
    • APNs may treat too many connections as a denial-of-service attack
    • Do not create too many TCP connections
  3. Screen invalid device tokens
    • Notifications sent after an invalid token may not be delivered
    • On Lobi, some notifications were not delivered after APNs responded with EOF
    • 2 screening patterns:
      • Error response = delete immediately
      • Feedback service = delete periodically

Implementing the Notification Provider Server

  • Implemented push notification with AnyEvent::APNS (Perl)
    • Persistent TCP connection
    • Receives HTTP POST requests in JSON array format
  • Deletes invalid tokens immediately on error response
  • Uses the Feedback service from cron
  • Retries sending push notifications that failed due to invalid tokens
    • Stores the sending history

Legacy APNs

  • Due to invalid tokens, some notifications cannot be delivered...
    • Cannot resend properly when the retry queue overflows
  • Cannot detect all invalid tokens
    • So not all invalid tokens can be screened out
    • Invalid tokens gradually accumulate in the DB
    • These tokens cause EOF

New APNs

  1. HTTP/2
  2. Error responses include what the Feedback service used to report
  3. Easier authentication with APNs
  4. Payload size extended to 4096 bytes


WWDC-2015 (8/26) "What's New in Notifications"

HTTP/2

  • Notifications after an invalid token are still delivered
    • Thanks to HTTP/2 multiplexing of requests and responses
  • Better performance
    • One HTTP client is used efficiently because HTTP/2 allows multiple concurrent requests and responses

Without Feedback Service cron

  • New APNs returns many error types
  • New APNs error responses carry the information that the Feedback service used to provide
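A minimal sketch of how a provider might use these error responses in place of the Feedback service cron. The HTTP/2 Provider API returns a JSON body with a `reason` field on failure; `shouldDeleteToken` is a hypothetical helper and not Gunfish's actual code, and the choice of which reasons trigger deletion is an assumption for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// errorResponse mirrors the JSON body that APNs returns with a non-200
// status, e.g. {"reason":"BadDeviceToken"}.
type errorResponse struct {
	Reason    string `json:"reason"`
	Timestamp int64  `json:"timestamp,omitempty"` // sent with 410 responses
}

// shouldDeleteToken is a hypothetical helper. It reports whether the device
// token should be removed from the DB, together with the APNs reason.
func shouldDeleteToken(status int, body []byte) (bool, string) {
	if status == 200 {
		return false, ""
	}
	var er errorResponse
	if err := json.Unmarshal(body, &er); err != nil {
		return false, "" // unreadable body: keep the token
	}
	switch er.Reason {
	case "BadDeviceToken", "Unregistered":
		return true, er.Reason
	}
	// Other reasons (e.g. MissingTopic) point at the request, not the token.
	return false, er.Reason
}

func main() {
	del, reason := shouldDeleteToken(410, []byte(`{"reason":"Unregistered","timestamp":1458276491}`))
	fmt.Println(del, reason)
	del, reason = shouldDeleteToken(400, []byte(`{"reason":"MissingTopic"}`))
	fmt.Println(del, reason)
}
```

Because every request gets its own response, the token can be deleted as soon as the response arrives, without a periodic job.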

Golang ...

HTTP/2 support was introduced in Go 1.6.

Go 1.6 will be released in February!

Gunfish

  • Goal is to deliver all push notifications
  • Notification provider server written in Go 1.6
    • Currently supports only the new APNs Provider API
  • Accepts notifications as a JSON array POSTed over HTTP/1.1
    • for backward compatibility
  • Command hook and custom error handler invoked on error responses
    • Can be used to delete invalid tokens
  • Graceful restart
  • Performance can be tuned

Gunfish Architecture

[Diagram: Lobi → Gunfish (server, supervisor, worker, sender) → APNS. Legend: chan, goroutine, write, read]

Worker & Sender

Worker

  • 1 worker, 1 HTTP/2 client
  • Receives APNs responses from its senders asynchronously

Sender

  • 1 worker, N senders
  • Sends requests to APNs
  • Passes each APNs response back to its own worker
  • The number of senders is the maximum multiplexing level of the HTTP/2 client
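The worker/sender relationship above can be sketched roughly as follows. This is an illustration under assumptions, not Gunfish's implementation: `notification`, `result`, `sendFunc`, `runWorker`, and `errBadToken` are hypothetical names, and `sendFunc` stands in for one request on the worker's shared HTTP/2 client.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type notification struct{ Token string }

type result struct {
	Token string
	Err   error
}

// sendFunc stands in for one HTTP/2 request to APNs (hypothetical).
type sendFunc func(notification) error

var errBadToken = errors.New("BadDeviceToken")

// runWorker starts senderNum sender goroutines that share one send function
// (one HTTP/2 client per worker) and collects their responses over a channel.
func runWorker(queue <-chan notification, senderNum int, send sendFunc) []result {
	respq := make(chan result)
	var wg sync.WaitGroup
	for i := 0; i < senderNum; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range queue {
				// Each sender hands its response back to the worker.
				respq <- result{Token: n.Token, Err: send(n)}
			}
		}()
	}
	// Close the response channel once every sender has finished.
	go func() {
		wg.Wait()
		close(respq)
	}()

	var results []result
	for r := range respq {
		results = append(results, r)
	}
	return results
}

func main() {
	queue := make(chan notification, 3)
	for _, t := range []string{"a", "b", "bad"} {
		queue <- notification{Token: t}
	}
	close(queue)

	fake := func(n notification) error {
		if n.Token == "bad" {
			return errBadToken
		}
		return nil
	}
	for _, r := range runWorker(queue, 2, fake) {
		fmt.Println(r.Token, r.Err)
	}
}
```

Since senders run concurrently, a failure for one token does not block the notifications queued behind it; results arrive in no particular order.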

Starts Gunfish

$ gunfish -c /etc/gunfish/config.toml

config.toml example:

[provider]
port = 38003
worker_num = 4 # http/2 client number
queue_size = 2000 # queue length for posted requests
max_request_size = 1000 # max JSON array size of posted request
max_connections = 1000 # max connection size between developer application and gunfish


[apns]
cert_file = "/path/to/cert_file.pem"
key_file = "/path/to/key_file.pem"
request_per_sec = 2000 # flow rate of notification per sec
sender_num = 30 # http/2 client multiplexity
error_hook = "your_hook_cmd.sh" # error response hook command

Gunfish Push Notification

$ curl -X POST -H "Content-Type: application/json" \
    -d '[{"token": "83fa0eb8a743c118fca80b0136bfee0", "payload": {"aps": {"alert": "hoge", "sound": "default", "badge": 1}}}]' \
    http://localhost:38103/apns/push

{"result": "ok"}

// POST JSON array format
[
    {
        "token": "apns published device token",
        "payload": "APNs Payload"
    }
]

// Payload structure
{
    "aps": {
        "alert": {
            "title": "foo",
            "body": "bar"
        },
        "sound": "default",
        "badge": 1
    },
    "option1": "custom properties for your application"
}
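For a Go client, the request body above could be built along these lines. This is a sketch only: the struct and function names (`pushRequest`, `buildBody`, and so on) are hypothetical, and application-specific properties such as `option1` are left out for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type alert struct {
	Title string `json:"title"`
	Body  string `json:"body"`
}

type aps struct {
	Alert alert  `json:"alert"`
	Sound string `json:"sound"`
	Badge int    `json:"badge"`
}

type payload struct {
	APS aps `json:"aps"`
}

// pushRequest is one element of the JSON array that is POSTed to Gunfish.
type pushRequest struct {
	Token   string  `json:"token"`
	Payload payload `json:"payload"`
}

// buildBody creates one array element per device token, all sharing the
// same alert. It is a hypothetical helper for illustration.
func buildBody(tokens []string, title, body string) ([]byte, error) {
	reqs := make([]pushRequest, 0, len(tokens))
	for _, t := range tokens {
		reqs = append(reqs, pushRequest{
			Token: t,
			Payload: payload{APS: aps{
				Alert: alert{Title: title, Body: body},
				Sound: "default",
				Badge: 1,
			}},
		})
	}
	return json.Marshal(reqs)
}

func main() {
	b, err := buildBody([]string{"token-a", "token-b"}, "foo", "bar")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

The resulting bytes can then be sent with `http.Post(url, "application/json", bytes.NewReader(b))` to the `/apns/push` endpoint shown in the curl example.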

Custom Error Response Handler

// ResponseHandler interface
type ResponseHandler interface {
    OnResponse(*Request, *Response, error)
    HookCmd() string
}
  • Handles error responses in the Go layer
  • Implement ResponseHandler and register your handler:
InitErrorResponseHandler(YourCustomErrorHandler{})
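As an illustration, a handler that collects invalid tokens might look like the sketch below. `Request` and `Response` here are simplified stand-ins with assumed fields (Gunfish's real types are not shown in these slides), `TokenCleaner` is a hypothetical name, and treating a non-nil error as a transport-level failure is also an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// Request and Response are simplified stand-ins, not Gunfish's real types.
type Request struct{ Token string }

type Response struct {
	StatusCode int
	Reason     string
}

// ResponseHandler is the interface shown on the slide.
type ResponseHandler interface {
	OnResponse(*Request, *Response, error)
	HookCmd() string
}

// TokenCleaner is a hypothetical handler that records tokens to delete.
type TokenCleaner struct {
	mu      sync.Mutex
	invalid []string
}

// OnResponse may be called from several goroutines, so it locks.
func (h *TokenCleaner) OnResponse(req *Request, res *Response, err error) {
	if err != nil || req == nil || res == nil {
		return // assumed to be a transport-level failure: nothing to screen
	}
	switch res.Reason {
	case "BadDeviceToken", "Unregistered":
		h.mu.Lock()
		h.invalid = append(h.invalid, req.Token)
		h.mu.Unlock()
	}
}

// HookCmd returns the external command to run on error responses.
func (h *TokenCleaner) HookCmd() string { return "your_hook_cmd.sh" }

// Invalid returns a copy of the tokens collected so far.
func (h *TokenCleaner) Invalid() []string {
	h.mu.Lock()
	defer h.mu.Unlock()
	return append([]string(nil), h.invalid...)
}

// Compile-time check that TokenCleaner satisfies the interface.
var _ ResponseHandler = (*TokenCleaner)(nil)

func main() {
	h := &TokenCleaner{}
	h.OnResponse(&Request{Token: "t1"}, &Response{StatusCode: 400, Reason: "BadDeviceToken"}, nil)
	h.OnResponse(&Request{Token: "t2"}, &Response{StatusCode: 400, Reason: "MissingTopic"}, nil)
	fmt.Println(h.Invalid())
}
```

A real handler would delete the collected tokens from the DB instead of keeping them in memory.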

Graceful Restart

  • Does not terminate until the data remaining in the queue has been sent
    • Timeout before forced shutdown is 2 min.
  • Uses Server::Starter
$ start_server --port 38003 --interval 5 -- gunfish -c /etc/gunfish/config.toml
  • About Server::Starter

 

How to do a graceful restart in Go (Go言語でGraceful Restartをする)
http://shogo82148.github.io/blog/2015/05/03/golang-graceful-restart/

Minimizing dropped requests during a graceful restart in Go (Go言語でGraceful Restartをするときに取りこぼしを少なくする)
http://shogo82148.github.io/blog/2015/11/23/golang-graceful-restart-2nd/

Performance tuning

Problem (example):

  • Goal is to deliver all notifications at a flow rate of 5000 notifications/sec

Setting parameters:

  • request_per_sec = 5000
  • worker_num
  • sender_num

Performance tuning

  • h2o APNs mock server (mruby)
$ h2o -c conf/h2o/h2o.conf
  • Benchmark with wrk2
    • 1 POST, 200 notifications
$ wrk2 -t2 -c20 -d10 -s bench/scripts/err_and_success.lua -L -R25 http://localhost:38103
  • Start gunfish for the benchmark
$ gunfish -c /etc/gunfish/config.toml -E test 2> gunfish.log

Performance tuning

# result of wrk2

#[Mean    =       23.171, StdDeviation   =       22.064]
#[Max     =      160.384, Total count    =          240]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  242 requests in 10.01s, 31.43KB read
Requests/sec:     24.18
Transfer/sec:      3.14KB


# log

  38542 msg:"Succeeded to send a notification", type:worker   # sent successfully
   9067 msg:"Response queue is full.", type:sender            # logged when there are too many notifications
    968 msg:"Catch error response.", type:"http/2-client"     # received an error response
    506 msg:MissingTopic, type:worker
    190 msg:BadDeviceToken, type:worker
     95 msg:Unregistered, type:worker
      8 msg:"Worker Queue size: 2083", type:
      8 msg:"Succeeded to establish new connection.", type:worker
      8 msg:"Response queue size: 2083", type:
      1 msg:, type:

242 requests * 200 notifications = 48400 (sent)

38542 + 9067 + 506 + 190 + 95 = 48400

Release

  • First, deployed to only one production server
  • Observed gunfish for a few days...
    • with Zabbix
  • Observed through 2 status APIs:
{
  "pid": 19184,
  "debug_port": 17889,
  "uptime": 14,
  "start_at": 1458276491,
  "su_at": 0,
  "period": 1,
  "retry_after": 10,
  "workers": 8,
  "senders": 400,
  "queue_size": 0,
  "retry_queue_size": 0,
  "workers_queue_size": 0,
  "cmdq_queue_size": 0,
  "retry_count": 0,
  "req_count": 0,
  "sent_count": 0,
  "err_count": 0
}
{
  "time": 1458276536967232300,
  "go_version": "go1.6",
  "go_os": "linux",
  "go_arch": "amd64",
  "cpu_num": 4,
  "goroutine_num": 424,
  "gomaxprocs": 4,
  "cgo_call_num": 5,
  "memory_alloc": 7276200,
  "memory_total_alloc": 9109144,
  "memory_sys": 13641976,
  "memory_lookups": 104,
  "memory_mallocs": 41548,
  "memory_frees": 24228,
  "memory_stack": 2228224,
  "heap_alloc": 7276200,
  "heap_sys": 8257536,
  "heap_idle": 352256,
  "heap_inuse": 7905280,
  "heap_released": 0,
  "heap_objects": 17320,
  "gc_next": 8999555,
  "gc_last": 1458276491942982000,
  "gc_num": 2,
  "gc_per_second": 0,
  "gc_pause_per_second": 0,
  "gc_pause": []
}

Memory Leak Problem

  1. The HTTP/2 client caused a memory leak (2016/2/2-2/3)
  2. Restarted Gunfish every 30 min by cron as an interim measure (2016/2/3)

Memory Leak Problem

  • Analyzed with pprof
$ go tool pprof gunfish http://localhost:2412/debug/pprof/profile

(pprof) > top
(pprof) > list
  • Investigate the problem and its cause with pprof
  • Check whether an issue has already been filed in the official repository

Memory Leak Problem

net/http: http2 Transport retains Request.Body after request is complete, not GCed #14084

Memory Leak Problem

The memory leak was resolved, so Gunfish was deployed to all production servers

Summary

  • APNs Provider Server built with the Go HTTP/2 client
  • Gunfish's design concept is 'delivering all notifications'
  • When using the latest technology, catch up on problems quickly
  • I wanted to build a benchmark to make tuning easier
  • I want to support GCM in the future