Elasticsearch Notes

Monday, September 26, 2016

I recently upgraded our project's ES to 2.4.0 and tuned the search results along the way, so here are some notes.


If the JDK is not installed yet, install it first

sudo add-apt-repository ppa:webupd8team/java -y
sudo apt-get update
sudo apt-get install oracle-java8-installer

Install ES

aria2c https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.4.0/elasticsearch-2.4.0.tar.gz
tar zxvf elasticsearch-2.4.0.tar.gz
sudo mv elasticsearch-2.4.0 /usr/local/
# Adjust the relevant settings (path.data, path.logs, etc.)
vim /usr/local/elasticsearch-2.4.0/config/elasticsearch.yml

Use supervisor to manage ES: create the config file /etc/supervisor/conf.d/es.conf


reload supervisor config and check status

sudo supervisorctl reload
sudo supervisorctl status


$ http localhost:9200
HTTP/1.1 200 OK
Content-Length: 311
Content-Type: application/json; charset=UTF-8

{
    "cluster_name": "es001",
    "name": "Weapon X",
    "tagline": "You Know, for Search",
    "version": {
        "build_hash": "ce9f0c7394dee074091dd1bc4e9469251181fc55",
        "build_snapshot": false,
        "build_timestamp": "2016-08-29T09:14:17Z",
        "lucene_version": "5.5.2",
        "number": "2.4.0"
    }
}





  1. Go to the releases page https://github.com/medcl/elasticsearch-analysis-ik/releases and download the prebuilt zip package,
  2. unzip it into /usr/local/elasticsearch-2.4.0/plugins/ik/,
  3. restart ES: sudo supervisorctl restart elasticsearch


$ http localhost:9200/posts/_analyze analyzer=ik_smart text=上海大学生

HTTP/1.1 200 OK
Content-Length: 177
Content-Type: application/json; charset=UTF-8

{
    "tokens": [
        {
            "end_offset": 2,
            "position": 0,
            "start_offset": 0,
            "token": "上海",
            "type": "CN_WORD"
        },
        {
            "end_offset": 5,
            "position": 1,
            "start_offset": 2,
            "token": "大学生",
            "type": "CN_WORD"
        }
    ]
}

Configuring ik analysis in a Rails project


# elasticsearch
gem "elasticsearch", git: "git://github.com/elasticsearch/elasticsearch-ruby.git"
gem "elasticsearch-model", git: "git://github.com/elasticsearch/elasticsearch-rails.git"
gem "elasticsearch-rails", git: "git://github.com/elasticsearch/elasticsearch-rails.git"

Configure the model so that CRUD operations are synced to ES

class Post < ActiveRecord::Base
  # index_name "posts-#{Rails.env}"
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
  # ...
end



  • Create a rake file: vi {Rails.root}/lib/tasks/elasticsearch.rake
require 'elasticsearch/rails/tasks/import'
  • To use the ik analyzer on a field, configure it in the model
settings index: { number_of_shards: 1, number_of_replicas: 0 } do
  mapping do
    indexes :title, type: 'string', analyzer: 'ik_smart'
    indexes :body, type: 'string', analyzer: 'ik_smart'
  end
end
  • Re-import the data
bundle exec rake environment elasticsearch:import:model CLASS='Post' BATCH=500 FORCE=y
  • Check that the mapping was updated: http localhost:9200/subjects/_mapping

  • done.

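Using function_score to improve search results

This feature in 2.4 is really great: we can write a small Groovy script that modifies the score to improve the search results, for example ranking the newest, highest-scoring matches first.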

def self.search(query)
  # Returns the function_score query body as a Hash.
  {
    query: {
      function_score: {
        functions: [
          { script_score: {
            script: "_score * log10(max(doc['like_count'].value, 1)) + ((doc['created_at'].value/1000) - 1412092800)/86400.0"
          } }
        ],
        query: {
          match: { title: query }
        },
        boost_mode: "replace"
      }
    }
  }
end
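
To get a feel for how the script above ranks documents, the expression can be mirrored offline. This is only a sketch in Python: the function name `adjusted_score` and the sample inputs are made up for illustration, and the arithmetic is copied from the Groovy script. Because `boost_mode` is "replace", the script's result becomes the final score.

```python
import math

EPOCH_OFFSET_S = 1412092800   # reference timestamp taken from the script above
SECONDS_PER_DAY = 86400.0


def adjusted_score(text_score, like_count, created_at_ms):
    """Mirror of the script_score expression, for offline experiments only."""
    popularity = math.log10(max(like_count, 1))
    age_days = (created_at_ms / 1000 - EPOCH_OFFSET_S) / SECONDS_PER_DAY
    return text_score * popularity + age_days


# A hypothetical post created exactly 10 days after the reference timestamp.
ten_days_ms = (EPOCH_OFFSET_S + 10 * 86400) * 1000

# 100 likes: 2.0 * log10(100) + 10 days
print(adjusted_score(2.0, 100, ten_days_ms))
# 0 or 1 likes: log10(1) is 0, so the text score drops out and only recency counts
print(adjusted_score(2.0, 1, ten_days_ms))
```

One thing the sketch makes visible: a document with at most one like gets no contribution from the text relevance at all, which may or may not be what you want.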

[TIL] Concert, a Let’s Encrypt certificate generation tool

Wednesday, September 7, 2016

set up the Go environment

~# aria2c https://storage.googleapis.com/golang/go1.7.linux-amd64.tar.gz
~# tar zxvf go1.7.linux-amd64.tar.gz
~# mv go /usr/local

~# vim .bashrc

    export GOPATH=~/gocode
    export PATH=$GOPATH/bin:/usr/local/go/bin:$PATH

~# source .bashrc

# test env
~# mkdir -p gocode/src/hello
~# vi gocode/src/hello/hello.go

    package main

    import "fmt"

    func main() {
        fmt.Printf("hello, world\n")
    }

~# go install hello
~# $GOPATH/bin/hello
hello, world

install concert

~# go get -u github.com/minio/concert

concert usage


~# sudo $GOPATH/bin/concert gen [email protected] lxneng.com
2016/09/07 11:14:51 Generated certificates for lxneng.com under certs will expire in 89 days.


~# sudo $GOPATH/bin/concert renew [email protected]
2016/09/07 11:16:52 Keys have not expired yet, please renew in 89 days.

auto renew once in every 45 days.

~# sudo $GOPATH/bin/concert server [email protected] lxneng.com
2016/09/07 11:18:06 Starting timer thread waiting for 45

config nginx

upstream lxneng {
    # server <backend address>;  (the backend address is missing from the source)
}

server {
    listen 443 ssl;
    ssl_certificate /root/certs/public.crt;
    ssl_certificate_key /root/certs/private.key;

    server_name lxneng.com;
    location / {
        root /var/www/lxneng.com/src/lxneng/static;
        try_files $uri $uri @wsgiapp;
    }

    location @wsgiapp {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass   http://lxneng;
    }
}

server {
    listen       80;
    server_name  www.lxneng.com lxneng.com;
    rewrite ^ https://lxneng.com$request_uri? permanent;
}


[TIL] Where Cloudera Manager stores its monitoring data

Wednesday, September 7, 2016

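By default, cm stores its monitoring data under /var/lib/. To avoid running out of space on the system disk, you can change cm's monitoring data settings.

Configuring Service Monitor data storage

Service Monitor stores time series and health data, Impala query metadata, and YARN application metadata. By default the data is stored under /var/lib/cloudera-service-monitor/; you can change this through the Service Monitor Storage Directory setting, firehose.storage.base.directory.

Configuring Host Monitor data storage

Host Monitor stores time series and health data. By default the data is stored under /var/lib/cloudera-host-monitor/; you can change this through the Host Monitor Storage Directory setting.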


[TIR] HyperLogLogs in Redis

Wednesday, August 17, 2016

A hyper-what-now?

A HyperLogLog is a probabilistic data structure used to count unique values — or as it’s referred to in mathematics: calculating the cardinality of a set.

These values can be anything: for example, IP addresses for the visitors of a website, search terms, or email addresses.

Counting unique values with exact precision requires an amount of memory proportional to the number of unique values. The reason for this is that there is no way of determining if a value has already been seen other than by comparing it to the previously seen values.

Since memory is a limited resource, doing this becomes problematic when working with large sets of values.
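
The trade-off can be made concrete with a toy comparison. The sketch below is an illustration only: it is a simplified HyperLogLog written for this note (the class name, the choice of SHA-1 as the hash, and the 12-bit precision are all assumptions), not the implementation Redis uses behind PFADD and PFCOUNT. The exact counter keeps every value in a set, while the estimator keeps a fixed array of 4096 small registers no matter how many values it sees.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog estimator (illustrative, not Redis's implementation)."""

    def __init__(self, precision=12):
        self.p = precision
        self.m = 1 << precision            # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash
        index = h >> (64 - self.p)                     # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        # Position of the leftmost 1-bit in the remaining bits (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


exact = set()
hll = HyperLogLog()
for i in range(10000):
    for _ in range(2):                  # every visitor shows up twice
        value = "visitor-%d" % i
        exact.add(value)                # memory grows with the number of unique values
        hll.add(value)                  # memory stays at 4096 registers

print(len(exact), hll.count())
```

The estimate is approximate by design; with 4096 registers the typical relative error is on the order of a couple of percent.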


[TIL] Converting a partitioned text-format table to ORC in Hive

Tuesday, August 9, 2016

One step in our data pipeline converts a partitioned text-format table to ORC with
SNAPPY compression; it runs in airflow as a T+1 job.

For example, we have an external table access_log_txt

CREATE EXTERNAL TABLE access_log_txt (
time string,app_id string,app_version string, ...more fields)
PARTITIONED BY (dt string)

and a table access_log_orc

CREATE TABLE access_log_orc (
time string,app_id string,app_version string, ...more fields)
PARTITIONED BY (dt string)
STORED AS ORC tblproperties ("orc.compress" = "SNAPPY");

If the table is not partitioned, a plain insert into xxx select * from yyy is enough

insert into access_log_orc select * from access_log_txt where foo=bar;

But when the table is partitioned, the partition column is included in select * from yyy,

hive> insert into access_log_orc PARTITION(dt='2016-08-09') select * from
access_log_txt where dt='2016-08-09';
FAILED: SemanticException [Error 10044]: Line 1:12 Cannot insert into target
table because column number/types are different ''2016-08-09'': Table
insclause-0 has 62 columns, but query has 63 columns.

Listing every column in the select is tedious when there are this many, so here is a way to exclude the partition column from the result set

hive > set hive.support.quoted.identifiers=none;
hive > insert into access_log_orc PARTITION(dt='2016-08-09') select `(dt)?+.+` from
access_log_txt where dt='2016-08-09';
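
For the T+1 job, the statement has to be generated for yesterday's partition each day. A minimal sketch of that, assuming the table names from this post; the helper names `build_orc_insert` and `yesterday` are made up here, and how the resulting HiveQL is handed to Hive or airflow is left out:

```python
from datetime import date, timedelta


def build_orc_insert(dt, source="access_log_txt", target="access_log_orc"):
    """Return the HiveQL that copies one dt partition from the text table to the ORC table."""
    return (
        "set hive.support.quoted.identifiers=none;\n"
        "insert into {target} PARTITION(dt='{dt}') "
        "select `(dt)?+.+` from {source} where dt='{dt}';"
    ).format(target=target, source=source, dt=dt)


def yesterday(today=None):
    """T+1: the partition to process is the day before the run date."""
    today = today or date.today()
    return (today - timedelta(days=1)).isoformat()


print(build_orc_insert(yesterday(date(2016, 8, 10))))
```

Keeping the partition value as a parameter also makes it easy to re-run the conversion for a past date.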

Commonly Used Third-Party Libraries for Python Web Development

Thursday, November 28, 2013



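Friends often ask me which framework to pick for web development in Python, or what combination of libraries to use when building on Pyramid. Here I introduce some third-party libraries commonly used in Python web development. They are generally suitable for web frameworks other than Django (Pyramid, Flask, Tornado, Web.py, Bottle, etc.).


  • SQLAlchemy: for ORM, SQLAlchemy is the first choice, bar none!
    It supports the mainstream relational databases: SQLite, PostgreSQL, MySQL, Oracle, MS-SQL, Firebird, Sybase, and others.
    For its main features, see Key Features of SQLAlchemy.

  • MongoEngine: if you use MongoDB, MongoEngine is recommended.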

Template Engine

There are plenty of template engines to choose from, such as Chameleon, Jinja2, and Mako. I have used Chameleon and Jinja2, and both perform very well.

Form Engine

Cache Engine & Session Store

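  • Beaker: the first choice for caching and session management, bar none! It can use files, dbm, memcached, memory, databases, NoSQL stores, and more as storage backends. If you use Pyramid as your web framework, you can use pyramid_beaker directly.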




  • Celery: a distributed asynchronous task queue, very powerful!
  • RQ: a lightweight task queue based on Redis, worth a try.



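  • Fabric: use it for automated deployment and routine operations work. See the slides "Fabric: Making Deployment Simple" (《Fabric-让部署变得简单》).
  • Supervisor: a powerful process management tool for managing services (such as Gunicorn and Celery); when a service dies, Supervisor restarts it automatically.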


  • Tablib: quite handy; it supports exporting data as Excel, JSON, YAML, HTML, TSV, and CSV. I wrote a Pyramid plugin, pyramid_tablib, to integrate it into Pyramid projects.
  • For exporting PDF there are reportlab and PyPDF2.


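  • velruse: supports authentication against many major sites. I have added providers for the Chinese sites Weibo, Douban, QQ, Taobao, and Renren, and they have been merged into the main repository. Feel free to use them!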


To Be Continued...

Upgrading PostgreSQL 9.2 -> 9.3

Thursday, November 14, 2013

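PostgreSQL 9.3 has been released. After upgrading with brew upgrade postgresql, the server would not start. The log showed that the 9.2 data format is incompatible, so the data has to be migrated. If you hit the same problem, read on :-)

Error log: incompatible data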

/usr/local(master ✔) tail -f /usr/local/var/postgres/server.log
FATAL:  database files are incompatible with server
DETAIL:  The data directory was initialized by PostgreSQL version 9.2, which is not compatible with this version 9.3.1.
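
The version the server complains about is recorded in a small text file named PG_VERSION inside the data directory, so the mismatch can be checked before trying to start the server. A rough sketch, where the helper names are invented for this note and a temporary directory stands in for /usr/local/var/postgres:

```python
import os
import tempfile


def data_dir_version(data_dir):
    """Read the version string recorded in <data_dir>/PG_VERSION."""
    with open(os.path.join(data_dir, "PG_VERSION")) as f:
        return f.read().strip()


def needs_upgrade(data_dir, server_major):
    """True when the data directory was initialized by a different major version."""
    return data_dir_version(data_dir) != server_major


# Demo against a throwaway directory that imitates a 9.2 data directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "PG_VERSION"), "w") as f:
    f.write("9.2\n")

print(needs_upgrade(demo_dir, "9.3"))
```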


PostgreSQL provides an upgrade tool, pg_upgrade, for migrating the data

pg_upgrade -b oldbindir -B newbindir -d olddatadir -D newdatadir [option...]

1. Create a new data directory for PostgreSQL 9.3

/usr/local/var(master ✔) mv postgres postgres9.2
/usr/local/var(master ✔) initdb /usr/local/var/postgres -E utf8

2. Migrate the data into the new directory

/usr/local/var(master ✔) pg_upgrade \
-b /usr/local/Cellar/postgresql/9.2.4/bin/ \
-B /usr/local/Cellar/postgresql/9.3.1/bin/ \
-d /usr/local/var/postgres9.2 \
-D /usr/local/var/postgres \


Creating script to analyze new cluster                      ok
Creating script to delete old cluster                       ok

Upgrade Complete
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
    analyze_new_cluster.sh

Running this script will delete the old cluster's data files:
    delete_old_cluster.sh

3. Start PostgreSQL 9.3


/usr/local/var(master ✔) run_postgresql
server starting
/usr/local/var(master ✔) psql postgres
psql (9.3.1)
Type "help" for help.

postgres=# \l

4. Remove the old version and its data


/usr/local/var(master ✔) rm -rf analyze_new_cluster.sh delete_old_cluster.sh postgres9.2


brew cleanup postgresql



Monday, November 4, 2013


Day 1

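The morning consisted of four English keynotes from big companies such as Twitter, LinkedIn, and GitHub. The content was fairly broad, and with my limited English I could not follow everything. I borrowed a simultaneous interpretation headset, but the translation was mediocre and many terms were rendered inaccurately, so it was hard going and I went back to listening to the original audio.

  • The first talk was by Raffi from Twitter, on Twitter's road to a service-oriented architecture. It covered the solutions Twitter adopted to sustain high concurrency in a system that is changing and growing fast, along with some of the design ideas used to manage system complexity. He mentioned their RPC framework Finagle, which is worth studying if you have high-concurrency needs; their timeline cache is also built on Redis.

  • The second keynote, from LinkedIn, was about turning data into products. As the world's largest professional SNS, they described how they think about building products from data and showed some of the algorithmic models involved. Once your data reaches a certain scale you should consider turning it into products; a recommendation system is a good way to raise conversion rates.

  • The third keynote was from GitHub: getting rid of product managers. Most companies appoint a product lead or a group of product managers to decide which features a product should have, but some companies are doing away with product managers and letting developers decide entirely what to build.
    Of course GitHub's team is very strong and the product is special: its engineers use it every day, which is why Microsoft's "Eat Your Own Dogfood" matters so much.

  • The last talk of the morning was on Mechanical Sympathy, roughly using examples from motor racing to illustrate some ideas in software development. It went over my head, but it sounded impressive.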


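The first session I attended was in the "Case Studies of Well-Known Websites" track: Alibaba's international trade team on the SEO and CDN problems they hit with cross-border sites and how they solved them. To reconcile SEO with performance, they use Google's AJAX crawling scheme: the page carries the meta tag <meta name="fragment" content="!">, and a crawler that sees the tag rewrites the URL to http://xxxx?_escaped_fragment_=. The application then returns a snapshot to the crawler based on the ?_escaped_fragment marker. This costs two requests, which they consider acceptable given that crawlers currently make up 10% of traffic. They also noted that detecting crawlers by User-Agent goes against Google's guidelines and risks a ranking penalty.
For sites whose users are spread across very different regions, images make up most of the downloaded resources, so the CDN architecture is critical. Because syncing every image costs too much bandwidth, they compromise by syncing only the key images (those on the first screen of a product page) to improve the user experience.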


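Douban shared its engineering culture through the story of its code platform. Engineers started the code project on their own, and it gradually grew into the platform Douban engineers rely on every day. One interesting detail: the project has no product owner, and for a year no engineer worked on it full time. Most requests were implemented voluntarily by some engineer within a few days of being raised. Had a full-time owner been put in charge, that person might have kept building impractical features just to justify the role, and the project might not have survived. Hahaha, get rid of product managers!!! :-P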


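  • The founder's culture is the company's culture
  • Having experts mentor juniors is the most efficient way to grow
  • Small teams suit high-leverage industries better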

Day 2



Day 3


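  • Liba (篱笆网) mainly shared how they handled performance tuning and architecture choices for the data access layer, which also made them a success story for the NoSQL product Cassandra in the Chinese internet industry
  • Vipshop (唯品会) shared how they prepare for big sales promotions, and how a system with a lot of legacy problems managed to support 5x
  • The last talk of the morning, from Sina Weibo, was about their experience with a unit-based architecture, which improves performance through parallel computation, data locality, and similar techniques.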

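The first afternoon session was a foreign speaker on enterprise innovation; he later also gave an introduction to the wearable computing ecosystem, with a Google Glass demo in between. Then Facebook front-end engineer Hedger Wang shared his thinking on unifying fragmented devices. This was my favorite talk of the afternoon: it covered the choice between web apps and native apps, how to design better across devices, and some web app solutions for cross-device support. On the question of web versus native: the web's advantage is reach, and once users spend more and more time in your app you should bring them over to native. I think you need both, heh; when resources are tight, do web first and native later. He also argued for using Web Components to solve the cross-device web problem: rather than building a product with the same features for each device, render a presentation suited to each device through Web Components. This looks good and I want to study it when I have time.

The last few were cross-disciplinary talks, more like Lightning Talks: 鬼脚七 on how he runs his self-published media, 蔡学镛 on his career path, and Roy历 on the hacker's self-cultivation. These Lightning Talks were lighter listening.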


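The conference focused on large-scale architecture at big companies, cloud computing, high concurrency, and the like, with little aimed at startups. Three days is a bit long and sitting through all of it was tiring, but I still learned a lot, caught up with many old friends, and met some new ones. The topics I liked most were: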


Friday, March 15, 2013

Original link: http://book.douban.com/review/2043761/































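“That's right, he was born in golden October, just like me. But unlike with the hockey players, there is no secret in that.” The master, warming to his subject, went on: “If you have read a little history of technology, you know that January 1975 was the most important moment for Silicon Valley. That was when the 8800 personal computer was born and became the cover story of that month's Popular Electronics. Plenty of people saw the huge business opportunity in the computer market, but who would take the plunge first? If you were old enough, you had probably already landed a job at an established company like IBM, and the opportunity cost of starting your own business was too high; if you were too young, you probably had not yet mastered the necessary IT skills. So your age had to be just right for you to show your mettle. Gates had just dropped out of Harvard at the time and was barely in his twenties. He knew how to write software and had the fearlessness of youth, so he seized the golden opportunity.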



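“Haha, I was born in 1950, but it is not as though I never had a chance to join the IT revolution. After a few years drifting as a hippie, I applied in 1972 to study electrical engineering at Reed College. Jobs moved in that same year, and we crossed paths on campus a few times. But he was more decisive than I was: he dropped out a year later and went on to make his mark in Silicon Valley. After graduating, I somehow became interested in sociology and earned a PhD at an Ivy League university on the East Coast. As a result, I missed every business opportunity of the personal computer era.”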







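“When I was an intern, I once ran into my boss, Flom, over a meal and asked him for the secret of his success. He quoted an old Chinese saying of yours: when the old man on the frontier lost his horse, who could say it was not a blessing? Wall Street in the 1950s still had a touch of aristocratic manners, and law firms of any standing were unwilling to take on dirty work like hostile takeovers. When they could not gracefully decline, they passed the dirty work on to Flom's firm. Then came the 1970s: financial regulation loosened, credit became plentiful, and investors grew aggressive. All of this drove a wave of corporate acquisitions. Now every law firm was willing to take merger cases, but as you can imagine, Flom's firm did it best, because they had accumulated nearly twenty years of experience.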







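[Afterword] Outliers (which I render as 《非同凡响》; the Chinese edition is titled 《异类》, which strikes me as ill-chosen) is a non-fiction bestseller published in the US at the end of last year. I hear the Chinese translation did not sell well after it came out in Taiwan, and an editor friend of mine guessed that this kind of book may not suit Chinese readers' tastes. So I thought of retelling four of the book's success stories through a fictional success guru. Some readers may have noticed that I drew mainly on two texts: Gu Long's 《七种武器》 and Jin Yong's 《雪山飞狐》.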



Tuesday, March 12, 2013


alt Buildout
(Remixed by Matt Hamilton, original from http://xkcd.com/303)

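Buildout is a Python-based build tool. From a single configuration file it can create, assemble, and deploy your application from multiple parts, even when the application includes non-Python components. Buildout can automatically download, install, and update dependencies like setuptools, and it can also build a closed, isolated development environment like virtualenv.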



~/Projects$ mkdir buildout
~/Projects$ cd buildout


~/Projects/buildout$ wget http://downloads.buildout.org/2/bootstrap.py


~/Projects/buildout$ touch buildout.cfg


~/Projects/buildout$ python bootstrap.py
Creating directory '/Users/Eric/Projects/buildout/bin'.
Creating directory '/Users/Eric/Projects/buildout/parts'.
Creating directory '/Users/Eric/Projects/buildout/eggs'.
Creating directory '/Users/Eric/Projects/buildout/develop-eggs'.
Generated script '/Users/Eric/Projects/buildout/bin/buildout'.


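  • bin holds the generated scripts
  • parts holds generated data; mostly unused
  • develop-eggs holds link files pointing to development directories; related to the develop option in buildout.cfg
  • eggs holds egg packages downloaded from the network; these are usually listed under the eggs option in buildout.cfg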



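~/Projects/buildout$ vim buildout.cfg
[buildout]
# Every buildout needs a parts list, which may be empty.
# parts specifies what to build. If a section listed in parts has its own parts, they are built recursively.
parts = tools

[tools]
# Every section must specify a recipe. A recipe contains the Python code that installs the section;
# zc.recipe.egg installs the eggs listed below into the eggs directory.
recipe = zc.recipe.egg
# Define the Python interpreter
interpreter = python
# Eggs to install
eggs =

Run the buildout command to build; this will pull in Pyramid: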

~/Projects/buildout$ bin/buildout



~/Projects/buildout$ bin/pcreate -t starter myproject


~/Projects/buildout$ vim buildout.cfg
parts =
develop = myproject

recipe = zc.recipe.egg
interpreter = python
eggs =

recipe = zc.recipe.egg
eggs = myproject


~/Projects/buildout$ bin/buildout


~/Projects/buildout$ bin/pserve myproject/development.ini
Starting server in PID 40619.
serving on


1. Pinning egg versions


extends = versions.cfg
versions = versions
show-picked-versions = true

The setting "show-picked-versions = true" makes buildout print all the picked versions when it runs; copy them into "versions.cfg" to pin them:

Chameleon = 2.11
Mako = 0.7.3
MarkupSafe = 0.15
PasteDeploy = 1.5.0
WebOb = 1.2.3
distribute = 0.6.35
repoze.lru = 0.6
translationstring = 1.1
venusian = 1.0a7
zc.buildout = 2.0.1
zc.recipe.egg = 2.0.0a3
zope.deprecation = 4.0.2
zope.interface = 4.0.5

# Required by:
# pyramid-debugtoolbar==1.0.4
Pygments = 1.6

# Required by:
# myproject==0.0
pyramid = 1.4

# Required by:
# myproject==0.0
pyramid-debugtoolbar = 1.0.4

# Required by:
# myproject==0.0
waitress = 0.8.2

2. Use the mr.developer extension to organize large projects and make development easier

extensions = mr.developer

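3. Development vs. production

We can create multiple configuration files. For example, use buildout.cfg as the production configuration, remove the develop setting from it, and create a development.cfg as the development configuration: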

extends = buildout.cfg
develop = myproject


Tuesday, March 12, 2013

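Buildout has been upgraded to 2.0. I just upgraded and found a few things to watch out for.

  • First replace the old bootstrap.py script with the new 2.0 bootstrap: http://downloads.buildout.org/2/bootstrap.py.

  • The new buildout no longer supports "buildout-versions" and "buildout.dumppickedversions"; their functionality is now built in, so just add show-picked-versions = true to the configuration file.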

show-picked-versions = true


Friday, November 9, 2012

pyVows, an asynchronous BDD testing framework


def test_sum_returns_42():
    result = add_two_numbers(41, 1)

    assert result
    assert int(result)
    assert result == 42

Even in this very simple scenario the test has three assertions, which is not ideal. We want one assertion per test, so we can write:

def test_sum_returns_result():
    result = add_two_numbers(41, 1)
    assert result

def test_sum_returns_a_number():
    result = add_two_numbers(41, 1)
    assert int(result)

def test_sum_returns_42():
    result = add_two_numbers(41, 1)
    assert result == 42

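Everything is fine, except that add_two_numbers now runs three times. In a test this simple, running a function several times does not matter, but in a real project we should reduce the number of calls so the tests run faster.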


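@Vows.batch
class SumContext(Vows.Context):

    def topic(self):
        return add_two_numbers(41, 1)

    def we_get_a_result(self, topic):
        expect(topic).Not.to_be_null()

    def we_get_a_number(self, topic):
        expect(topic).to_be_numeric()

    def we_get_42(self, topic):
        expect(topic).to_equal(42)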
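
The core idea, computing the value under test once and checking several independent claims against it, can be sketched without any framework. The code below is plain Python written for this note, not how pyVows is implemented; `run_context` and the call counter are illustrative names only.

```python
calls = {"count": 0}


def add_two_numbers(a, b):
    calls["count"] += 1          # track how often the code under test runs
    return a + b


def run_context(topic_fn, vows):
    """Evaluate the topic once, then check every vow against that one value."""
    topic = topic_fn()
    results = {}
    for name, vow in vows.items():
        try:
            vow(topic)
            results[name] = "honored"
        except AssertionError:
            results[name] = "broken"
    return results


def we_get_a_result(topic):
    assert topic is not None


def we_get_a_number(topic):
    assert isinstance(topic, int)


def we_get_42(topic):
    assert topic == 42


results = run_context(
    lambda: add_two_numbers(41, 1),
    {
        "we_get_a_result": we_get_a_result,
        "we_get_a_number": we_get_a_number,
        "we_get_42": we_get_42,
    },
)
print(results, calls["count"])
```

Each vow still makes a single assertion, but add_two_numbers runs only once.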

If that did not make sense, don't worry; let's look at the next example


# division_by_zero_vows.py

from pyvows import Vows, expect

# Create a Test Batch
@Vows.batch
class Divisions(Vows.Context):
    class WhenDividingANumberByZero(Vows.Context):
        def topic(self):
            return 42 / 0

        def we_get_division_by_zero_error(self, topic):
            expect(topic).to_be_an_error_like(ZeroDivisionError)

    class WhenDividingByOne(Vows.Context):
        def topic(self):
            return 42 / 1

        def we_get_the_same_number(self, topic):
            expect(topic).to_equal(42)


 $ pyvows division_by_zero_vows.py

 Vows Results

  ✓ OK » 2 honored • 0 broken (0.000756s)

Now a slightly more complex example. Suppose we have a fruit module called the_good_things:

class Strawberry(object):
    def __init__(self):
        self.color = '#ff0000';

    def isTasty(self):
        return True

class PeeledBanana(object): pass

class Banana(object):
    def __init__(self):
        self.color = '#fff333';

    def peel(self):
        return PeeledBanana()

Now let's write some tests in the_good_things_vows.py:

from pyvows import Vows, expect
from the_good_things import Strawberry, Banana, PeeledBanana

@Vows.batch
class TheGoodThings(Vows.Context):
    class AStrawberry(Vows.Context):
        def topic(self):
            return Strawberry()

        def is_red(self, topic):
            expect(topic.color).to_equal('#ff0000')

        def and_tasty(self, topic):
            expect(topic.isTasty()).to_be_true()

    class ABanana(Vows.Context):
        def topic(self):
            return Banana()

        class WhenPeeled(Vows.Context):
            def topic(self, banana):
                return banana.peel()

            def returns_a_peeled_banana(self, topic):
                expect(topic).to_be_instance_of(PeeledBanana)


$ pyvows the_good_things_vows.py

 Vows Results

  ✓ OK » 3 honored • 0 broken (0.000863s)



Friday, November 9, 2012


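Google Chrome, browser
Sparrow, the best mail client on the Mac, bar none
1Password, password management

Evernote (印象笔记 in Chinese), for taking notes; excellent

Skitch, screenshot tool

iPhoto, photo management

Preview, very handy; all you need for reading PDFs and similar documents
TotalFinder, a plugin that makes your Finder vastly more powerful
The Unarchiver, archive extraction tool
Alfred, Spotlight replacement
MPlayerX, video player
AppCleaner, for uninstalling apps
Adium, IM client with support for Gtalk, MSN, etc.
Twitter Client for Mac
Macbo, Weibo client
Mindnode Pro, mind mapping tool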
Microsoft Word
Microsoft Excel
Microsoft PowerPoint



Transmit, FTP client
Pencil, prototyping software
Mou, visual Markdown editor
Sublime Text 2, code editor
Fireworks, image editing
Sequel Pro, MySQL client


  • QQ Pinyin input method
  • oh-my-zsh, my shell environment
  • MacVim
  • Textmate, editor
  • Toast Titanium, disc burning
  • MesaSQLite, SQLite client
  • Magican, system cleanup


Thursday, November 8, 2012

pygal is a Python SVG charting library. It makes data visualization easy and is simple to integrate into a project.



>>> import pygal
>>> bar_chart = pygal.Bar()
>>> bar_chart.add('Fibonacci', [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55])
>>> bar_chart.add('Padovan', [1, 1, 1, 2, 2, 3, 4, 5, 7, 9, 12])
>>> # Save to a file
>>> bar_chart.render_to_file('bar_chart.svg')
>>> # There is also render_in_browser, which writes an html file and opens it in the browser
>>> bar_chart.render_in_browser()



Example (in a Pyramid-based web application):


from pyramid.response import Response
from pyramid.view import view_config
import pygal

def get_svg(request):
    bar_chart = pygal.Bar(width=600, height=400)
    bar_chart.add('Fibonacci', [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55])
    bar_chart.add('Padovan', [1, 1, 1, 2, 2, 3, 4, 5, 7, 9, 12])
    return Response(body=bar_chart.render(), content_type='image/svg+xml')

route config

config.add_route('svg', '/svg')

embed into html

<embed src="{{ req.route_url('svg')}}" type="image/svg+xml" width="600" height="400" />



Monday, November 5, 2012

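1. Create a link on the website and use the device browser's User-Agent to tell whether the device is iOS, Android, or something else.

from pyramid.httpexceptions import HTTPFound
from pyramid.view import view_config

@view_config(route_name='app', renderer='app.html')
def index(request):
    ua = request.user_agent or ''
    if ('iPhone' in ua) or ('iPod' in ua) or ('iPad' in ua):
        # Redirect to the App Store URL or an itms-services URL
        return HTTPFound('itms-services://?action=download-manifest&url=http://xxx.com/app/app.plist')
    elif ('Android' in ua):
        # Redirect to the app's page in the Android app store
        return HTTPFound('https://play.google.com/xxxxx')
    # Any other device: render app.html
    return {}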
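
The detection logic is easier to test when it is pulled out of the view into a small pure function. A minimal sketch, assuming the same substring checks as the view above; the function name `detect_platform` and the sample User-Agent strings are only illustrative:

```python
IOS_MARKERS = ("iPhone", "iPod", "iPad")


def detect_platform(user_agent):
    """Classify a User-Agent string as 'ios', 'android', or 'other'."""
    ua = user_agent or ""          # the header may be missing entirely
    if any(marker in ua for marker in IOS_MARKERS):
        return "ios"
    if "Android" in ua:
        return "android"
    return "other"


samples = [
    "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26",
    "Mozilla/5.0 (Linux; U; Android 4.0.3; en-us) AppleWebKit/534.30",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11",
    None,
]
for ua in samples:
    print(detect_platform(ua))
```

The view can then branch on the returned label, and the classification can be unit tested without building a request object.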




Monday, November 5, 2012



<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
               <string>com.xxxx.xxx (the app's id; must match the one in the ipa file)</string>


<a href="itms-services://?action=download-manifest&url=http://xxx.com/app/app.plist">Jailbroken iOS devices: tap here to install the latest version</a>


Guide To Tracking Multiple Subdomains In Google Analytics

Thursday, August 9, 2012

Copied From: http://www.ericmobley.net/guide-to-tracking-multiple-subdomains-in-google-analytics/

Tracking multiple subdomains is rather easy.

Viewing your traffic for each subdomain is a little trickier. If all you
do is set up the code and do not create the profiles and filters as
described here, you will have one Google Analytics account tracking all
of your subdomains, and absolutely no way to know which subdomain to
attribute the traffic to.

The Code

The analytics code that you place on each subdomain will be the same.
See the code below.

<script type="text/javascript">

  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-xxxxxxxxxxx-1']);
  _gaq.push(['_setDomainName', 'yoursite.com']);
  _gaq.push(['_trackPageview']);

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();

</script>


Notice there is only one new line of code in this example.

_gaq.push(['_setDomainName', 'yoursite.com']);

You can also get this code if, in Google Analytics, you go to Settings
-> Tracking Code -> One domain with multiple subdomains.

The Profiles

Do not create a new Google Analytics account for each subdomain. Sure,
it would technically work. There is nothing in the world to stop you.
But there is a better way.

Create one Google Analytics account, and then create a profile for each
subdomain plus another profile that tracks all subdomains collectively.
So if you have two subdomains (www and mobile) you will want three
profiles total. One for www, one for mobile, and one to track both
subdomains. This will take some time to set up, but will be well worth
it in the end.

Once you have created each profile, it’s time to apply the filters.

Filtering Subdomain Traffic In Profiles

We need to apply a filter to ensure that we track only traffic for the
profile’s designated subdomain. Go to Admin -> Profiles -> Filters ->
New Filter and refer to the screenshot below.

Google Analytics Profile

It’s that easy. Applying this filter to your profile will ensure that
this profile only tracks traffic for the specified subdomain. In this
case, mobile.yoursite.com.

Main Profile

When tracking multiple sub domains in one profile, you will not be able
to differentiate between your subdomains in your pages list in Google
Analytics unless you create a filter.

To illustrate this, go to your page list in analytics, Content -> Site
Content -> Pages. You can’t see the hostname at all! See screenshot below.


By default, Google Analytics does not show the hostname or subdomain in
your reports. You will not be able to see which home page the forward slash
above refers to (/). It could refer to www or it could refer to mobile,
there is no way of knowing. That’s why we need a filter to apply to this
profile.

Filtering Main Profile

In your main profile for google analytics, go to create filter. Refer to
the screenshot below to see how to apply filter.


After applying this filter, the subdomain should appear in your page
list, and you will be able to differentiate between traffic for each subdomain.
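
Since the screenshots are not reproduced here, the following sketch shows what the two kinds of filter are assumed to do, based on the descriptions in this guide: the per-subdomain profile keeps only hits for its hostname, and the main profile rewrites the page path so that the hostname is visible. It is written in Python purely as an illustration; the function names and the sample hits are made up, and Google Analytics itself is configured through its admin UI, not through code like this.

```python
def subdomain_profile_includes(hit_hostname, profile_hostname):
    """Per-subdomain profile: keep a hit only if it came from that hostname."""
    return hit_hostname == profile_hostname


def main_profile_page(hit_hostname, request_uri):
    """Main profile: show the hostname in front of the request URI."""
    return hit_hostname + request_uri


hits = [
    ("www.yoursite.com", "/"),
    ("mobile.yoursite.com", "/"),
    ("mobile.yoursite.com", "/pricing"),
]

# What the mobile profile would report.
mobile_only = [h for h in hits if subdomain_profile_includes(h[0], "mobile.yoursite.com")]
print(mobile_only)

# What the main profile's page list would look like after the rewrite.
print([main_profile_page(host, uri) for host, uri in hits])
```

Without the rewrite, the first two hits would both show up as "/" in the main profile's page list.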

Wrap Up

In this guide, you should have learned how to 1) install Google
Analytics code for subdomains 2) filter traffic to ensure that profiles
track the traffic for its designated subdomain and 3) filter traffic in
your main profile to display the subdomain.

9 Steps to a High-Converting Landing Page

Wednesday, August 1, 2012

Copied From: http://www.onboardly.com/customer-acquisition/9-steps-to-a-high-converting-landing-page/


Handing someone you just met at a networking event a piece of scrap
paper with your details scrawled on it won’t get you too far. Doing this
is much like promoting your ill-constructed landing page. More often
than not, your landing page is a visitor’s first impression of your
product, and your best chance to convert that visitor into a customer.

You don’t need to be a rockstar designer to create a beautiful,
high-converting landing page. Follow this checklist and you will be well
on your way to a rapidly growing customer base.

1. Keep It Simple

Every bit of information about your brand or product does not need to
appear on your landing page. A landing page has one goal: lead capture.
If you bombard your users with too many quotes, pictures and text, all
you’ll end up with is a higher bounce rate. Instead, creating simple and
easy-to-digest sections will have a much more positive impact on new

Suprpod keeps the text to a minimum on their
landing page. Offering up simple graphics to explain exactly what the
platform does at each stage, visitors can quickly absorb, understand and
evaluate the startup.


2. Use Smart Graphics

Graphics on your landing page are like attention-seekers at parties.
They are the first to get noticed, and people either love them or hate
them. Using a cheesy stock image for your landing page is a total party

We recommend using a screenshot of your app or a professional photo of
your product. Whatever you use, make sure it’s authentically you. If
you’re a startup with a brand new concept, it would help to turn your
product description into a simple graphic (e.g. the three graphics in
the Suprpod example above).

The Gijit landing page has one commanding image: the
product. It makes the page clean, simple and informative.


3. Be Credible

If you’re a new startup, visitors will love the fact that you’ve been
featured by TechCrunch, Mashable, CNN – whatever. If you’ve partnered
with Amazon, Dropbox, Twitter or any other well-established brand, shout
it from the rooftops. Adding these accomplishments to your landing page
footer will make your visitors feel comfortable signing up or
purchasing. It’s an easy way to convert early!

4. Use Fewer Input Fields

Less is more in your visitors’ eyes, especially regarding how much
information they have to give. The more “required” fields you include on
your registration page, the less likely visitors are to give you
anything at all. You want to remove every obstacle possible between the
initial visit and the conversion.

Instead of asking for first name, last name, date of birth, address,
phone number, email address and mother’s maiden name, start with just an
email address. You can collect the rest after conversion. Using email
notices, drip marketing campaigns and incentives, you can collect
everything you need later. Once you have the initial lead, you have a
method of later contact.

Imperva has asked for
everything and the kitchen sink on their landing page. In reality, all
they need is a name and an email address (industry or business name
might be nice too).


On the other hand, Zipongo only requires an
email address and zip code, the minimum amount of information they need
to provide valuable deals to customers.


5. Make Registering Irresistible

It’s too easy for new visitors to bounce from your page. If you don’t
have a great call to action, they will. If your page has so much real
estate that visitors have to scroll to view it all, include more than
one call. You need to give visitors that push to commit and enter the
customer acquisition funnel.

Use actionable words such as donate, download, create, call, buy,
register, request and subscribe to encourage conversions. Here’s a trick
to test just how good your call to action really is.

A. Stand six feet back from your screen and look at it. What do you
see? What element stands out the most? It should be your CTA.

B. Sitting at a normal distance from your screen, tilt your head
sideways and slightly squint your eyes. Again, your call to action
should stand out the most.

6. Offer Something

As awesome as your landing page and brand is, offering a little
something extra for new registrations will often seal the deal.
Something simple like giving the first hundred people a discount, an
eBook or early access will maximize those conversions.

7. A/B Test

Unless you’ve done a ton of research, you won’t know for sure what font,
colors or copy lead to the most conversions, but A/B testing will tell
you. Landing page specialists like Unbounce let
you cleanly test your page. A/B testing involves creating two versions
of your web page: an A and a B.

Avoid using multivariate testing. Multivariate involves testing many
different elements at the same time (i.e. B has an alternate color
palette, different graphics and different copy). By doing this, you’ll
be unable to isolate which specific elements are most effective.
Performing simple and clean A/B testing will help you create the best
possible landing page.

For example, Manpacks A/B tested their landing page to determine what
brand messaging results in the most conversions.



8. Be Social

Creating a network through social sharing is the easiest way to get
exponential leads. That said, having a button for every social network
available is overkill. KISSmetrics gives visitors the ability to share
through Twitter, Facebook and Google+. Mashable adds LinkedIn to that
lineup. Figure out what networks your audience is using the most and
focus on those – the rest is just noise.

Using a tool like ClickToTweet allows you to
create a link that shares predetermined text via social media. The rule
of thumb is to keep the message short and sweet, especially on Twitter
where you should leave extra characters for retweets.

9. Create Excitement Through Copy

None of the above will be worth anything if your copy reads like a
children’s book. Don’t ignore your copy because it is often what
visitors will base their opinions on. Keep it short and sweet: state the
problem, your solution and a call to action. All other information is

It’s worth it to consider hiring a professional writer for your landing
page. The cost will pale in comparison to a stronger conversion rate.
Plus, writers know how to turn your 1500 word article into 140
characters. Try services like Scripted and
Elance to find top quality writers.

Optimizing your landing page will ensure that it is not your last point
of contact with potential customers. This checklist will undoubtedly
help you create a high-converting landing page in no time. When in
doubt, always test your hypothesis. Landing page development is a

10 reasons why I switched to Spine.js

Wednesday, August 1, 2012

Copied From: http://destroytoday.com/blog/reasons-for-spinejs/

In the past year, I shifted interests from the desktop to the web. I’m
really drawn to apps that can be accessed from any device with a
browser. I have a history with HTML, CSS, Flash and PHP, so I’m familiar
with the space, but only in a presentation sense—I’ve made websites, but
not web apps. I dove head-first into Rails and instantly fell in love,
but the immediate response I knew with Flash was replaced with page
loads. Because of this, I turned to JavaScript.

Like the new kid at school, I didn’t know who was who in regard to
frameworks. I looked around and saw mentions of Backbone.js everywhere,
so I assumed it was the standard. After several months, however, I
realized it’s not for me: Backbone.js lacks a clear direction of use.
Every tutorial I read used a different structure, and it almost seemed
too easy to disregard proven design patterns.

Enter Spine.js. I spent a night just reading
through its guides and examining its demo apps. Everything I saw just
looked right. That night, I wore a big smile and even had trouble
sleeping because I couldn’t wait to start using it. What made me so
excited? These 10 things:

  1. A Clear Architecture

    Spine.js follows MVC (model-view-controller, for those who should
    take a moment to learn MVC). All the apps I’ve written follow the
    MVC architecture, so I immediately know how to structure my app
    using Spine.js. I also feel a sense of familiarity off the bat.
    There’s no question of which class does what or where each class

  2. Models are Models

    Backbone.js has models, but it’s awkward because there are also
    collections—essentially an array of models that can also query an
    API and populate itself with the results. Spine.js models are very
    similar to Rails models. A model can be instantiated to represent a
    record, but it also has class-level methods for retrieving records
    from the API. These methods return the results instead of
    populating an array, so we don’t have to contemplate where the class
    lives, as one would with collections. And because collections are
    instances, many of the examples I’ve seen treat them as singletons.
    As a result, those learning Backbone.js and following these examples
    are also learning how to write untestable code.

  3. Spine.app

    While using Backbone.js, I found myself copy/pasting code every time
    I created a new class. I missed the generators I grew accustomed to
    in Rails. With a single command, I could generate the new class
    along with its spec, based on a template—this adds years to a
    dev’s life. “Write Backbone.js generators” was on my todo list for
    weeks, but I never got around to it.

    Spine.app generates files. With a single line, I can create a class
    and its spec, just like in Rails. Hell, I can even generate a new
    app with one command.

  4. Dynamic Records

    This one is just crazy black magic, but it solves a problem I
    faced with Backbone.js. Let’s say you fetched a record in one view
    of the app. Then you fetch and update that same record in a
    different view. In Spine.js, both records will update. You don’t
    have to worry about keeping them in sync. The moment I read about
    this, a single tear rolled down my cheek.

  5. Elements Hash

    With Backbone.js, I constantly found myself manually assigning
    variables to nested elements in every view’s render method,
    repeating the same code for each element—that’s a lot of
    boilerplate. In Spine.js, there’s an ‘elements’ hash. The keys are
    selectors and the values are variable names. Just like the ‘events’
    hash in Backbone.js, all your elements are mapped—clearly and

  6. The Release Method

    In my Flash days, optimization was a key to survival. If ever I
    forgot to remove a single event listener, my app would leak memory
    like… a poorly maintained app. Because of this, every class I wrote
    included a method to nullify all references and remove all event
    listeners. Spine.js has this built in. Sold.

  7. Routing Lives in the Controller

    There is no Router class in Spine.js. This functionality is part of
    the Controller class where it belongs. In any controller, I can
    navigate to a new location and react to this new location. Other
    controllers can react to this new location as well. Now there’s no
    temptation to create a router singleton.

  8. Model Adapters

    By default, Spine.js saves models in memory, but there are two
    adapters that can be applied to any model class—Ajax and Local. By
    simply extending either of these adapters, your data can live in a
    remote database or even locally using HTML5’s local storage API. All
    this functionality is a matter of one line of code.

  9. Get a Model from its HTML Element

    This is another issue I faced with Backbone.js. I could instantiate
    a view and tie a model to it, but if I would ever need to reference
    that data without access to the view instance, I’d be out of luck.
    Spine.js provides access to an element’s model through a jQuery
    plugin. Just call the ‘data’ method on the element and you have your

  10. Logging

    Spine.js comes equipped with a nice little convenience module for
    logging. In any controller, you can call the log method and it will
    write to the console with a set prefix. You can then toggle whether
    or not to trace the logs without removing them.
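
Points 2 and 4 above (class-level finders that return results directly,
and records that stay in sync across views) can be pictured with a
small plain-JavaScript sketch. This is not Spine.js source code and does
not use its API: `TinyModel`, `create`, `find` and `updateAttributes`
are hypothetical names chosen for illustration, and prototype
inheritance is only one assumed way to get the "both records update"
behaviour described above.

```javascript
// Hypothetical TinyModel: illustrates class-level finders and records
// that stay in sync. This is an illustration, not Spine.js code.
class TinyModel {
  // Class-level store of canonical records, keyed by id.
  static records = new Map();

  // Store a canonical record, then hand back a linked copy of it.
  static create(attrs) {
    const canonical = Object.assign(Object.create(this.prototype), attrs);
    this.records.set(attrs.id, canonical);
    return this.find(attrs.id);
  }

  // Class-level finder: returns the result directly instead of filling
  // a separate collection object. The returned object inherits from the
  // canonical record, so reads always see the latest saved attributes.
  static find(id) {
    const canonical = this.records.get(id);
    if (!canonical) throw new Error(`Unknown record: ${id}`);
    return Object.create(canonical);
  }

  // Write changes to the canonical record so every linked copy sees them.
  updateAttributes(attrs) {
    const canonical = this.constructor.records.get(this.id);
    Object.assign(canonical, attrs);
    // Remove own properties that would shadow the canonical values.
    for (const key of Object.keys(attrs)) delete this[key];
    return this;
  }
}

// Two "views" fetch the same record independently.
const inListView = TinyModel.create({ id: 1, name: "Draft" });
const inDetailView = TinyModel.find(1);

// Updating through one copy is visible through the other.
inDetailView.updateAttributes({ name: "Published" });
console.log(inListView.name); // "Published"
```

Because each copy only reads through to one shared record, there is no
separate collection instance to keep around, which is the contrast with
Backbone.js collections drawn in point 2.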

In Conclusion

Now, this list is why I switched to Spine.js. Some apps might be better
suited for Backbone.js, or any other JS framework. If you’re researching
different frameworks, definitely take a look at all of them. Do your due
diligence. Make sure whichever framework you choose has a clear example
and is free from gotchas. You don’t want to find yourself halfway
through development and questioning the framework.


By default, Spine.app generates an app with Jasmine as its testing
framework. I much prefer Mocha.js, so I forked Spine.app to add Mocha.js
support. It also includes a HAML compiler and I have plans to include
SASS as well as other helpers.

Here are all my JavaScript from my recent high-dive into the language.
They consist of links to articles, libraries, and answered Stack
Overflow questions. Hopefully, they will get a few of you out of a
pickle.

I plan to write more about my Spine.js discoveries over the coming
months, so keep an eye out if you’re interested.

© 2009-2013 lxneng.com. All rights reserved. Powered by Pyramid
