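Elasticsearch Notes

Monday, September 26, 2016

I recently upgraded our project's ES to 2.4.0 and tuned the search results along the way. These are my notes.

Installation

If the JDK is not installed yet, install it first: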

sudo add-apt-repository ppa:webupd8team/java -y
sudo apt-get update
sudo apt-get install oracle-java8-installer

Install ES

aria2c https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.4.0/elasticsearch-2.4.0.tar.gz
tar zxvf elasticsearch-2.4.0.tar.gz
sudo mv elasticsearch-2.4.0 /usr/local/
# Adjust the relevant settings (path.data, path.logs, etc.)
vim /usr/local/elasticsearch-2.4.0/config/elasticsearch.yml
/usr/local/elasticsearch-2.4.0/bin/elasticsearch

Manage ES with supervisor by creating the config file /etc/supervisor/conf.d/es.conf:

[program:elasticsearch]
command=/usr/local/elasticsearch-2.4.0/bin/elasticsearch
directory=/usr/local/elasticsearch-2.4.0
user=lxneng
autostart=true
autorestart=true
redirect_stderr=true

Reload the supervisor config and check the status:

sudo supervisorctl reload
sudo supervisorctl status

Check that ES responds (the http command here is HTTPie):

$ http localhost:9200
HTTP/1.1 200 OK
Content-Length: 311
Content-Type: application/json; charset=UTF-8

{
    "cluster_name": "es001",
    "name": "Weapon X",
    "tagline": "You Know, for Search",
    "version": {
        "build_hash": "ce9f0c7394dee074091dd1bc4e9469251181fc55",
        "build_snapshot": false,
        "build_timestamp": "2016-08-29T09:14:17Z",
        "lucene_version": "5.5.2",
        "number": "2.4.0"
    }
}

done.
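
The same check can be scripted, for example in a deploy step. The sketch below is a minimal example that assumes the JSON body shown above and the default address localhost:9200; parse_es_version and es_version are hypothetical helper names, not part of any ES client.

```python
import json
from urllib.request import urlopen


def parse_es_version(body):
    """Extract the version number from the JSON body returned by GET /."""
    info = json.loads(body)
    return info["version"]["number"]


def es_version(url="http://localhost:9200"):
    """Fetch the root endpoint of a running node and return its version string."""
    with urlopen(url, timeout=5) as resp:
        return parse_es_version(resp.read().decode("utf-8"))

# Usage against a running node (assumed to listen on localhost:9200):
#   es_version()
```

Comparing the returned string with the version you meant to install is a cheap way to catch a supervisor entry that still points at an old directory.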

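Installing the ik Chinese analyzer

ES's default analyzer handles Chinese poorly. For example, 上海 (Shanghai) is split into two separate tokens for searching,
whereas we would expect it to be searched as a single word.

The plugin installed here is ik: elasticsearch-analysis-ik

  1. Go to the releases page https://github.com/medcl/elasticsearch-analysis-ik/releases and download the prebuilt zip package (pick the release built for your ES version),
  2. Unzip it into /usr/local/elasticsearch-2.4.0/plugins/ik/,
  3. Restart ES: sudo supervisorctl restart elasticsearch

Check the result: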

$ http localhost:9200/posts/_analyze analyzer=ik_smart text=上海大学生

HTTP/1.1 200 OK
Content-Length: 177
Content-Type: application/json; charset=UTF-8

{
    "tokens": [
        {
            "end_offset": 2,
            "position": 0,
            "start_offset": 0,
            "token": "上海",
            "type": "CN_WORD"
        },
        {
            "end_offset": 5,
            "position": 1,
            "start_offset": 2,
            "token": "大学生",
            "type": "CN_WORD"
        }
    ]
}

Configuring ik in a Rails project

Gemfile

# elasticsearch
gem "elasticsearch", git: "git://github.com/elasticsearch/elasticsearch-ruby.git"
gem "elasticsearch-model", git: "git://github.com/elasticsearch/elasticsearch-rails.git"
gem "elasticsearch-rails", git: "git://github.com/elasticsearch/elasticsearch-rails.git"

Configure the model so that CRUD operations are synced to ES:

class Post < ActiveRecord::Base
  # index_name "posts-#{Rails.env}"
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
  # ...

end

Re-importing existing data

  • Create the rake task file, vi {Rails.root}/lib/tasks/elasticsearch.rake
require 'elasticsearch/rails/tasks/import'
  • Apply the ik analyzer to specific fields by configuring the model
settings index: { number_of_shards: 1, number_of_replicas: 0 } do
  mapping do
   indexes :title, type: 'string', analyzer: 'ik_smart'
   indexes :body, type: 'string', analyzer: 'ik_smart'
  end
end
  • Re-import
bundle exec rake environment elasticsearch:import:model CLASS='Post' BATCH=500  FORCE=y
  • Check that the mapping was updated: http localhost:9200/posts/_mapping

  • done.

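Tuning search results with function_score

This feature in 2.4 is really great: we can write a small Groovy script that rewrites the score to tune the results, for example to rank recent, highly liked matches first.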

def self.search(query)
  __elasticsearch__.search(
    {query: {
      function_score: {
        functions: [
          { script_score: {
            script: "_score * log10(max(doc['like_count'].value, 1)) + ((doc['created_at'].value/1000) - 1412092800)/86400.0"
            }
          }],
        query: {
          match: { title: query }
        },
        boost_mode: "replace"
      }
    }})
end
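
To get a feel for what the script does to the ranking, the formula can be replayed outside ES. The following is a minimal Python sketch of the same arithmetic; custom_score and its parameter names are made up for illustration, and created_at is assumed to be in epoch milliseconds, as the division by 1000 in the script suggests.

```python
import math

# Constant from the script, in epoch seconds (2014-09-30T16:00:00Z).
EPOCH_OFFSET = 1412092800


def custom_score(score, like_count, created_at_ms):
    """Replay the script_score formula for one document."""
    # Popularity factor: log10 of the like count, with counts below 1 treated as 1.
    popularity = math.log10(max(like_count, 1))
    # Recency bonus: one point per day elapsed since EPOCH_OFFSET.
    age_days = (created_at_ms / 1000 - EPOCH_OFFSET) / 86400.0
    return score * popularity + age_days
```

Two things stand out when trying values: a document with 0 or 1 likes gets a popularity factor of 0, so its text relevance is ignored and only the recency bonus remains, and each additional day adds a full point, which can outweigh relevance for old but well-matching documents. Whether that balance is right depends on your data.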

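[TIL] Where Cloudera Manager stores its monitoring data

Wednesday, September 7, 2016

By default CM stores its monitoring data under /var/lib/. To avoid running out of space on the system disk, you can change the storage settings for the monitoring data.

Service Monitor data storage

Service Monitor stores time series and health data, Impala query metadata, and YARN application metadata. By default the data is stored under /var/lib/cloudera-service-monitor/; you can change this through the Service Monitor Storage Directory setting (firehose.storage.base.directory).

Host Monitor data storage

Host Monitor stores time series and health data. By default the data is stored under /var/lib/cloudera-host-monitor/; you can change this through the Host Monitor Storage Directory setting.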


[TIR] HyperLogLogs in Redis

Wednesday, August 17, 2016

A hyper-what-now?

A HyperLogLog is a probabilistic data structure used to count unique values — or as it’s referred to in mathematics: calculating the cardinality of a set.

These values can be anything: for example, IP addresses for the visitors of a website, search terms, or email addresses.

Counting unique values with exact precision requires an amount of memory proportional to the number of unique values. The reason for this is that there is no way of determining if a value has already been seen other than by comparing it to the previously seen values.

Since memory is a limited resource, doing this becomes problematic when working with large sets of values.
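
To make the trade-off concrete, here is a minimal pure-Python sketch of the HyperLogLog idea. It is an illustration under simplifying assumptions (SHA-1 as the hash, 4096 registers, only the small-range correction), not Redis's implementation; the class name HyperLogLog and its methods are made up for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counter using a fixed number of small registers."""

    def __init__(self, p=12):
        self.p = p                      # number of hash bits used as register index
        self.m = 1 << p                 # number of registers (4096 for p=12)
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # take a 64-bit hash
        idx = h >> (64 - self.p)                       # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)               # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw
```

Adding the same value twice leaves the registers unchanged, so duplicates do not affect the estimate, and memory stays fixed at m registers no matter how many values are added.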

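[TIL] Converting a partitioned text-format table to ORC in Hive

Tuesday, August 9, 2016

One step in our data pipeline converts a partitioned text-format table to ORC with
SNAPPY compression. It runs in airflow as a T+1 job.

Say we have an external table access_log_txt: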

CREATE EXTERNAL TABLE access_log_txt (
time string,app_id string,app_version string, ...more fields)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

and a table access_log_orc:

CREATE TABLE access_log_orc (
time string,app_id string,app_version string, ...more fields)
PARTITIONED BY (dt string)
STORED AS ORC tblproperties ("orc.compress" = "SNAPPY");

If the table is not partitioned, a plain insert into xxx select * from yyy is enough:

insert into access_log_orc select * from access_log_txt where foo=bar;

With partitions, however, select * from yyy includes the partition column,
so the column count no longer matches the target table:

hive> insert into access_log_orc PARTITION(dt='2016-08-09') select * from
access_log_txt where dt='2016-08-09';
FAILED: SemanticException [Error 10044]: Line 1:12 Cannot insert into target
table because column number/types are different ''2016-08-09'': Table
insclause-0 has 62 columns, but query has 63 columns.

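Listing every column in the select is tedious with this many fields, so I looked for a way to exclude the partition column from the result set. With hive.support.quoted.identifiers set to none, a backquoted name in the select list is treated as a regular expression, and `(dt)?+.+` matches every column except dt: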

hive > set hive.support.quoted.identifiers=none;
hive > insert into access_log_orc PARTITION(dt='2016-08-09') select `(dt)?+.+` from
access_log_txt where dt='2016-08-09';
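
Since this runs as a daily T+1 job, the two statements have to be generated per date. The sketch below is a minimal example of that step, assuming the table and column names above; build_orc_insert is a hypothetical helper, and how the statements are submitted (Hive CLI, an airflow operator, etc.) is left out.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"


def build_orc_insert(dt, source="access_log_txt", target="access_log_orc",
                     partition_col="dt"):
    """Return the HiveQL statements that copy one partition from source to target."""
    # Accept only a canonical YYYY-MM-DD string; this also keeps quotes and
    # other SQL fragments out of the generated statement.
    if datetime.strptime(dt, DATE_FORMAT).strftime(DATE_FORMAT) != dt:
        raise ValueError("dt must look like 2016-08-09, got %r" % dt)
    insert = (
        "insert into {target} PARTITION({col}='{dt}') "
        "select `({col})?+.+` from {source} where {col}='{dt}'"
    ).format(target=target, source=source, col=partition_col, dt=dt)
    return ["set hive.support.quoted.identifiers=none", insert]
```

For example, build_orc_insert("2016-08-09") yields the same two statements as the interactive session above.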

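Commonly used third-party libraries for Python web development

Thursday, November 28, 2013

TL;DR

Friends often ask me which framework to pick for web development in Python, or which combination of libraries to use when building on Pyramid. Here I introduce some third-party libraries commonly used in Python web development. They mostly apply to web frameworks other than Django (Pyramid, Flask, Tornado, Web.py, Bottle, etc.).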

ORM

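  • SQLAlchemy: for ORM, SQLAlchemy is the first choice, bar none!
    It supports the mainstream relational databases, including SQLite, PostgreSQL, MySQL, Oracle, MS-SQL, Firebird and Sybase.
    Supported Python environments are Python 2, Python 3, PyPy and Jython.
    For the main features, see Key Features of SQLAlchemy.
    It pairs well with the database migration tool Alembic.

  • MongoEngine: if you use MongoDB, MongoEngine is recommended.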

Template Engine

There are plenty of template engines to choose from, such as Chameleon, Jinja2 and Mako. I have used Chameleon and Jinja2, and both perform very well.

Form Engine

Cache Engine & Session Store

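  • Beaker: the first choice for caching and session management, bar none! It can use files, dbm, memcached, memory, databases or NoSQL stores as its storage backend. If you use Pyramid as your web framework, you can use pyramid_beaker directly.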

Others

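Environment setup

Task queues

  • Celery: a distributed asynchronous task queue, very powerful!
  • RQ: a lightweight task queue built on Redis; worth a try.

WebServer

Tools

  • Fabric: lets you automate deployment and routine operations work. Slides: 《Fabric-让部署变得简单》 (Fabric: Making Deployment Simple)
  • Supervisor: a powerful process manager for running all kinds of services (Gunicorn, Celery, etc.). When a service dies, Supervisor restarts it automatically.

Exporting report data

  • Tablib: quite handy; it can export data as Excel, JSON, YAML, HTML, TSV and CSV. I wrote a Pyramid plugin, pyramid_tablib, to integrate it into Pyramid projects.
  • For PDF export there are reportlab and PyPDF2.

Third-party authentication

  • velruse: supports authentication against many major sites. For Chinese providers I have added Weibo, Douban, QQ, Taobao and Renren, and these have been merged upstream. Give it a try!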

Helper

To Be Continued...

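Upgrading PostgreSQL 9.2 -> 9.3

Thursday, November 14, 2013

PostgreSQL 9.3 is out. After brew upgrade postgresql brought me to 9.3, the server refused to start. The log showed that the 9.2 data format is incompatible and the data has to be migrated. If you run into the same problem, read on :-)

Error log: incompatible data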

/usr/local(master ✔) tail -f /usr/local/var/postgres/server.log
FATAL:  database files are incompatible with server
DETAIL:  The data directory was initialized by PostgreSQL version 9.2, which is not compatible with this version 9.3.1.
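
The server makes this decision from the PG_VERSION file inside the data directory, so the mismatch can be detected before starting it. The sketch below is a minimal example; read_data_dir_version and needs_pg_upgrade are hypothetical helper names, and the version-splitting rule (two components before PostgreSQL 10, one from 10 on) is stated as an assumption of this sketch.

```python
import os


def read_data_dir_version(data_dir):
    """Return the major version recorded in <data_dir>/PG_VERSION, e.g. '9.2'."""
    with open(os.path.join(data_dir, "PG_VERSION")) as f:
        return f.read().strip()


def major_version(server_version):
    """Reduce a full server version such as '9.3.1' to its major version."""
    parts = server_version.split(".")
    # Assumption: up to 9.x the major version has two components (9.3);
    # from 10 on it is the first component only.
    if int(parts[0]) >= 10:
        return parts[0]
    return ".".join(parts[:2])


def needs_pg_upgrade(data_version, server_version):
    """True when the data directory was initialized by a different major version."""
    return data_version != major_version(server_version)

# Usage with the paths from this post (assumed layout):
#   needs_pg_upgrade(read_data_dir_version("/usr/local/var/postgres"), "9.3.1")
```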

Solution

PostgreSQL ships an upgrade tool, pg_upgrade, for migrating the data:

pg_upgrade -b oldbindir -B newbindir -d olddatadir -D newdatadir [option...]

1. Create a new PostgreSQL 9.3 data directory

/usr/local/var(master ✔) mv postgres postgres9.2
/usr/local/var(master ✔) initdb /usr/local/var/postgres -E utf8

2. Migrate the data into the new directory

/usr/local/var(master ✔) pg_upgrade \
-b /usr/local/Cellar/postgresql/9.2.4/bin/ \
-B /usr/local/Cellar/postgresql/9.3.1/bin/ \
-d /usr/local/var/postgres9.2 \
-D /usr/local/var/postgres \
-v

If the run ends with output like the following, the migration succeeded:

...
Creating script to analyze new cluster                      ok
Creating script to delete old cluster                       ok

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
    analyze_new_cluster.sh

Running this script will delete the old cluster's data files:
    delete_old_cluster.sh

3. Start PostgreSQL 9.3

Check the version and the data:

/usr/local/var(master ✔) run_postgresql
server starting
/usr/local/var(master ✔) psql postgres
psql (9.3.1)
Type "help" for help.

postgres=# \l

4. Remove the old version and its data

Delete the old data and the two scripts that pg_upgrade just generated:

/usr/local/var(master ✔) rm -rf analyze_new_cluster.sh delete_old_cluster.sh postgres9.2

Uninstall PostgreSQL 9.2.4:

brew cleanup postgresql

Done!

QCon Shanghai 2013: a running log

Monday, November 4, 2013

QCon2013Shanghai

Day 1


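The morning consisted of four English keynotes from big companies such as Twitter, LinkedIn and GitHub. The content was fairly broad, and with my weak English I could not follow everything. I borrowed a simultaneous-interpretation headset, but the translation was mediocre and got many technical terms wrong, which made it tiring to listen to, so I switched back to the original audio.

  • The first keynote was "Twitter's road to a service-oriented architecture" by Raffi from Twitter. He described the solutions Twitter adopted to sustain high concurrency in a system that changes and grows rapidly, and the design principles used to manage system complexity. He mentioned their RPC framework Finagle, which is worth studying if you have high-concurrency needs. Their timeline cache is also built on Redis.

  • The second keynote, from LinkedIn, was about turning data into products. As the world's largest professional SNS, they explained how they approach building products from data and showed some of the algorithms and models involved. Once your data reaches a certain scale you should consider turning it into products, and a recommender system is a good way to raise conversion.

  • The third keynote was from GitHub: getting rid of product managers. Most companies have a head of product or a group of product managers who decide which features the product should have, but some companies are dropping product managers and letting developers decide what to build.
    Of course, GitHub's team is very strong and the product is special: the engineers use it themselves every day, which is why Microsoft's "Eat Your Own Dogfood" matters so much.

  • The last talk of the morning was on Mechanical Sympathy, roughly using examples from motor racing to explain some ideas in software development. I did not fully follow it, but it sounded impressive.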

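The afternoon was track sessions.

In the first session, in the "Case studies of well-known websites" track, Alibaba's foreign-trade team talked about the SEO and CDN problems they hit with cross-border websites and how they solved them. To keep SEO working alongside their performance optimizations, they used Google's AJAX crawling scheme: the page carries the meta tag <meta name="fragment" content="!">, and a crawler that finds this tag rewrites the URL to http://xxxx?_escaped_fragment_=; the application then returns a snapshot to the crawler based on the ?_escaped_fragment_ parameter. This approach costs two requests, which they said is acceptable given that crawlers currently account for 10% of traffic. They also mentioned that detecting crawlers by User-Agent violates Google's guidelines and risks a ranking penalty.
For sites whose users are spread across very different regions, images make up most of the downloaded resources, so the CDN architecture is critical. Syncing every image incurs a high bandwidth cost, so as a workaround they sync only the key images (those on the first screen of a product page) to improve the user experience.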
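
The URL rewriting described in the talk can be sketched in a few lines. This is a minimal illustration of the case mentioned above (a page without a hash fragment that carries the meta tag), not Alibaba's code; crawler_url and wants_snapshot are hypothetical names.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def crawler_url(url):
    """URL a crawler would request for a page that carries the fragment meta tag."""
    parts = urlsplit(url)
    # Append an empty _escaped_fragment_ parameter to the existing query, if any.
    if parts.query:
        query = parts.query + "&_escaped_fragment_="
    else:
        query = "_escaped_fragment_="
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def wants_snapshot(query_string):
    """Server side: decide whether this request should get the static snapshot."""
    # keep_blank_values is needed because the parameter is usually empty.
    return "_escaped_fragment_" in parse_qs(query_string, keep_blank_values=True)
```

The second request mentioned in the talk is the one made to the rewritten URL; regular visitors never see it.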

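Later I went to the "Team culture" track, which was mostly about engineering culture and technical management. It had talks from GitHub, a game company and Douban, plus a panel discussion.

Douban shared its engineering culture through the story of its code platform. Engineers started the code project on their own initiative, and it gradually grew into the platform Douban engineers rely on every day. One interesting point: the project has no product owner, and for a year no engineer worked on it full time. Most requests were implemented voluntarily by some engineer within a few days of being raised. If a full-time owner had been assigned to the project, that person might have kept building impractical features just to justify the role, and the project might not have survived. Ha, get rid of product managers!!! :-P

Some good points from the panel:

  • The founders' culture is the company's culture
  • Having senior engineers mentor junior ones is the most efficient way to grow
  • Small teams are better suited to high-leverage industries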

Day 2


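In the morning I mostly stayed in the "Recommender systems" track. I was late and only caught the second half of the first talk, which was from Baidu and which I thought was the best one in this track.
It covered Baidu's practice in recommender systems, including related recommendations, personalized recommendations and tag browsing, built on strategies such as user modeling, item modeling, association and personalized recommendation. They keep adjusting the algorithms to fit the application's needs and the characteristics of the data.

In the afternoon, in the "Mobile app case studies" track, I heard Wandoujia talk about their technical research on Android, and Sohu News on the evolution of their client's backend architecture and their push system: how they made technology choices and shaped the architecture as they grew.
Nothing after that interested me much and I was very sleepy, so I went home.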

Day 3


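In the morning, in the "Scalability, availability and high performance" track, I heard about the technical evolution of Liba.com (篱笆网), how Vipshop (唯品会) built a system that supports 5x traffic in a very short time, and Sina Weibo's high-performance service practice based on a unit-based architecture.

  • Liba.com mainly shared how they approached performance optimization and architecture choices for the data access layer, which also made them a success story for the NoSQL product Cassandra in the Chinese internet industry.
  • Vipshop shared how they prepared for a big sale and how a system with many legacy problems was made to support 5x traffic.
  • The last talk of the morning was Sina Weibo's practice with unit-based architecture, improving performance through parallel computation, data locality and similar techniques.

The first afternoon session was a foreign speaker talking about enterprise innovation; he later also gave an introduction to the wearable-computing ecosystem, with a Google Glass demo in between. Facebook front-end engineer Hedger Wang then shared his thinking on unifying fragmented devices. This was my favorite talk of the afternoon: it covered the choice between web apps and native apps, how to design better across devices, and some web app solutions for cross-device support. On web versus native: the web's strength is reach, and once users spend more and more time in your app, you should bring them to native. I think you need both; when resources are tight, do web first and native second. He also argued for solving the cross-device web problem with Web Components, rendering a presentation suited to each device, rather than building one product with the same functionality per device. That sounds good and I want to look into it when I have time.

The last few were cross-domain talks, more like lightning talks: Guijiaoqi (鬼脚七) on how he runs his self-media, Cai Xueyong (蔡学镛) on his own growth, and Roy Li on the self-cultivation of a hacker. These were easier to listen to.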

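Summary


The conference focused on big companies sharing big architectures, cloud computing, high concurrency and the like, with little for startups. Three days is a bit long and sitting through all of it was tiring, but I still took away a lot, saw many old friends and made some new ones. The topics I liked most: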

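Building Python projects with Buildout

Tuesday, March 12, 2013

What is Buildout

[Image: Buildout]
(Remixed by Matt Hamilton, original from http://xkcd.com/303)

Buildout is a Python-based build tool. From a single configuration file it can create, assemble and deploy your application from multiple parts, even when the application includes non-Python components. Like setuptools, Buildout can automatically download, install and update dependencies, and like virtualenv it can build a closed, isolated development environment.

Initializing Buildout

First create a directory to hold the Buildout configuration and files: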

~/Projects$ mkdir buildout
~/Projects$ cd buildout

Download the 2.0 bootstrap.py script:

~/Projects/buildout$ wget http://downloads.buildout.org/2/bootstrap.py

Then create a Buildout configuration file:

~/Projects/buildout$ touch buildout.cfg

Run bootstrap.py to generate the Buildout files and directories:

~/Projects/buildout$ python bootstrap.py
Creating directory '/Users/Eric/Projects/buildout/bin'.
Creating directory '/Users/Eric/Projects/buildout/parts'.
Creating directory '/Users/Eric/Projects/buildout/eggs'.
Creating directory '/Users/Eric/Projects/buildout/develop-eggs'.
Generated script '/Users/Eric/Projects/buildout/bin/buildout'.

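As shown above, it created the bin, parts, eggs and develop-eggs directories and generated the buildout script in bin:

  • bin holds the generated scripts
  • parts holds generated data; you rarely need it
  • develop-eggs holds link files pointing at development directories; it relates to the develop option in buildout.cfg
  • eggs holds the egg packages downloaded from the network; these are usually listed in the eggs option in buildout.cfg

Bringing in Python and Pyramid

Configure Buildout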

~/Projects/buildout$ vim buildout.cfg
[buildout]
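# Every buildout needs a parts list; it may be empty.
# parts specifies what to build. If a section listed in parts has parts of its own, they are built recursively.
parts = tools

[tools]
# Every section must specify a recipe. A recipe contains the Python code used to install that section;
# zc.recipe.egg installs the eggs listed below into the eggs directory
recipe = zc.recipe.egg
# Define the python interpreter (generates bin/python)
interpreter = python
# Eggs to install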
eggs =
    pyramid

Run the buildout command to build; this brings in Pyramid:

~/Projects/buildout$ bin/buildout
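
The relationship between parts, sections and recipes can be checked mechanically before running bin/buildout. The sketch below reads a configuration like the one above with the standard library's configparser. This is only an approximation for simple files: Buildout has its own parser with features such as extends and ${...} substitution that are not handled here, and check_parts and SAMPLE are names made up for this example.

```python
from configparser import ConfigParser

SAMPLE = """
[buildout]
parts = tools

[tools]
recipe = zc.recipe.egg
interpreter = python
eggs =
    pyramid
"""


def check_parts(cfg_text):
    """Return {part: recipe}; raise ValueError if a part has no section or recipe."""
    parser = ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    parts = parser.get("buildout", "parts", fallback="").split()
    recipes = {}
    for part in parts:
        if not parser.has_section(part):
            raise ValueError("part %r has no [%s] section" % (part, part))
        if not parser.has_option(part, "recipe"):
            raise ValueError("section [%s] does not specify a recipe" % part)
        recipes[part] = parser.get(part, "recipe")
    return recipes
```

With the sample above, check_parts(SAMPLE) reports that the tools part is handled by zc.recipe.egg.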

Building a project with buildout

Now we can create a Pyramid application:

~/Projects/buildout$ bin/pcreate -t starter myproject

Update the Buildout configuration:

~/Projects/buildout$ vim buildout.cfg
[buildout]
parts =
    tools
    apps
develop = myproject

[tools]
recipe = zc.recipe.egg
interpreter = python
eggs =
    pyramid

[apps]
recipe = zc.recipe.egg
eggs = myproject

Run buildout again:

~/Projects/buildout$ bin/buildout

Now myproject can be started inside the buildout environment:

~/Projects/buildout$ bin/pserve myproject/development.ini
Starting server in PID 40619.
serving on http://0.0.0.0:6543

Best practices / tips

1. Pin egg versions

Put all the version information into [versions]:

extends = versions.cfg
versions = versions
show-picked-versions = true

The setting "show-picked-versions = true" makes buildout print all picked versions when it runs; copy them into "versions.cfg" to pin them:

[versions]
Chameleon = 2.11
Mako = 0.7.3
MarkupSafe = 0.15
PasteDeploy = 1.5.0
WebOb = 1.2.3
distribute = 0.6.35
repoze.lru = 0.6
translationstring = 1.1
venusian = 1.0a7
zc.buildout = 2.0.1
zc.recipe.egg = 2.0.0a3
zope.deprecation = 4.0.2
zope.interface = 4.0.5

# Required by:
# pyramid-debugtoolbar==1.0.4
Pygments = 1.6

# Required by:
# myproject==0.0
pyramid = 1.4

# Required by:
# myproject==0.0
pyramid-debugtoolbar = 1.0.4

# Required by:
# myproject==0.0
waitress = 0.8.2

2. Use the mr.developer extension to organize large projects and make development easier

[buildout]
...
extensions = mr.developer
…

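3. Development vs. production

We can create multiple configuration files. For example, use buildout.cfg as the production configuration, remove the develop setting from it, and create a development.cfg for development: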

[buildout]
extends = buildout.cfg
develop = myproject

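Upgrading Buildout to 2.0

Tuesday, March 12, 2013

Buildout has reached 2.0. I just upgraded and noticed a few things to watch out for.

  • First replace the old bootstrap.py script with the new 2.0 bootstrap: http://downloads.buildout.org/2/bootstrap.py.

  • The new buildout no longer supports "buildout-versions" and "buildout.dumppickedversions"; their functionality is now built in. Just add show-picked-versions = true to the configuration file.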

[buildout]
...
show-picked-versions = true
...

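Recommending pyVows, an asynchronous BDD framework for Python

Friday, November 9, 2012

pyVows is an asynchronous BDD testing framework.

Imagine we are testing an addition function: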

def test_sum_returns_42():
    result = add_two_numbers(41, 1)

    assert result
    assert int(result)
    assert result == 42

Even in a scenario this simple, the test contains three assertions, which is not ideal. We want one assertion per test, so we can write:

def test_sum_returns_result():
    result = add_two_numbers(41, 1)
    assert result

def test_sum_returns_a_number():
    result = add_two_numbers(41, 1)
    assert int(result)

def test_sum_returns_42():
    result = add_two_numbers(41, 1)
    assert result == 42

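Everything is fine except that add_two_numbers is executed three times. In a test this simple that does not matter, but in a real project we should reduce the number of calls so that the tests run faster.

With pyVows we can improve this as follows: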

class SumContext(Vows.Context):

    def topic(self):
        return add_two_numbers(41, 1)

    def we_get_a_result(self, topic):
        expect(topic).Not.to_be_null()

    def we_get_a_number(self, topic):
        expect(topic).to_be_numeric()

    def we_get_42(self, topic):
        expect(topic).to_equal(42)
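
For comparison, the same "compute the topic once, assert on it several times" idea can be sketched with the standard library's unittest instead of pyVows. This is a substitute technique shown only to make the call count visible; the CALLS counter and run_suite helper are made up for the example.

```python
import unittest

CALLS = {"count": 0}


def add_two_numbers(a, b):
    """Function under test; counts how often it is executed."""
    CALLS["count"] += 1
    return a + b


class SumTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Plays the role of the pyVows topic: computed once for the whole class.
        cls.topic = add_two_numbers(41, 1)

    def test_we_get_a_result(self):
        self.assertIsNotNone(self.topic)

    def test_we_get_a_number(self):
        self.assertIsInstance(self.topic, int)

    def test_we_get_42(self):
        self.assertEqual(self.topic, 42)


def run_suite():
    """Run the three tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SumTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

After run_suite(), three tests have run but CALLS["count"] has increased by only one, which is the saving the pyVows context gives you with less ceremony.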

If that was not clear, no problem; let's look at another example.

Let's test division by zero:

# division_by_zero_vows.py

from pyvows import Vows, expect

# Create a Test Batch
@Vows.batch
class Divisions(Vows.Context):
    class WhenDividingANumberByZero(Vows.Context):
        def topic(self):
            return 42 / 0

        def we_get_division_by_zero_error(self, topic):
            expect(topic).to_be_an_error_like(ZeroDivisionError)

    class WhenDividingByOne(Vows.Context):
        def topic(self):
            return 42 / 1

        def we_get_the_same_number(self, topic):
            expect(topic).to_equal(42)

Let's run it:

 $ pyvows division_by_zero_vows.py

 ============
 Vows Results
 ============

  ✓ OK » 2 honored • 0 broken (0.000756s)

Now a slightly more complex example. Suppose we have a module of fruit objects called the_good_things:

class Strawberry(object):
    def __init__(self):
        self.color = '#ff0000'

    def isTasty(self):
        return True

class PeeledBanana(object): pass

class Banana(object):
    def __init__(self):
        self.color = '#fff333'

    def peel(self):
        return PeeledBanana()

Now let's write some tests in the_good_things_vows.py:

from pyvows import Vows, expect
from the_good_things import Strawberry, Banana, PeeledBanana

@Vows.batch
class TheGoodThings(Vows.Context):
    class AStrawberry(Vows.Context):
        def topic(self):
            return Strawberry()

        def is_red(self, topic):
            expect(topic.color).to_equal('#ff0000')

        def and_tasty(self, topic):
            expect(topic.isTasty()).to_be_true()

    class ABanana(Vows.Context):
        def topic(self):
            return Banana()

        class WhenPeeled(Vows.Context):
            def topic(self, banana):
                return banana.peel()

            def returns_a_peeled_banana(self, topic):
                expect(topic).to_be_instance_of(PeeledBanana)

Let's run the tests:

$ pyvows the_good_things_vows.py

 ============
 Vows Results
 ============

  ✓ OK » 3 honored • 0 broken (0.000863s)

For more features and usage, see the official documentation at http://pyvows.org/

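Recommending pygal, a simple and handy SVG charting library

Thursday, November 8, 2012

pygal is a Python SVG charting library. It makes data visualization easy and is simple to integrate into a project.

Let's start with an example:

The code is just a few lines: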

>>> import pygal
>>> bar_chart = pygal.Bar()
>>> bar_chart.add('Fibonacci', [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55])
>>> bar_chart.add('Padovan', [1, 1, 1, 2, 2, 3, 4, 5, 7, 9, 12])
>>> # Save to a file
>>> bar_chart.render_to_file('bar_chart.svg')
>>> # There is also a render_in_browser method, which writes an HTML file and opens it in the browser
>>> bar_chart.render_in_browser()
file:///var/folders/47/zl40dfr57mddjn20xvwtz67m0000gn/T/tmpU9mNa7.html

Integrating into a project

We just need to embed the chart's SVG output in a web page.

Example (in a Pyramid-based web application):

view

from pyramid.response import Response
from pyramid.view import view_config
import pygal

@view_config(route_name='svg')
def get_svg(request):
    bar_chart = pygal.Bar(width=600, height=400)
    bar_chart.add('Fibonacci', [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55])
    bar_chart.add('Padovan', [1, 1, 1, 2, 2, 3, 4, 5, 7, 9, 12])
    return Response(body=bar_chart.render(), content_type='image/svg+xml')

route config

config.add_route('svg', '/svg')

embed into html

<embed src="{{ req.route_url('svg')}}" type="image/svg+xml" width="600" height="400" />

This way SVG charts can be rendered into the page dynamically.

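One-tap install of iOS or Android apps via a QR code

Monday, November 5, 2012

1, Create a link on your website that uses the browser's User-Agent to tell whether the device is iOS, Android or something else.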

from pyramid.httpexceptions import HTTPFound
from pyramid.view import view_config

@view_config(route_name='app', renderer='app.html')
def index(request):
    # user_agent is None when the header is missing
    ua = request.user_agent or ''
    if ('iPhone' in ua) or ('iPod' in ua) or ('iPad' in ua):
        # Redirect to the App Store URL or an itms-services URL
        return HTTPFound('itms-services://?action=download-manifest&url=http://xxx.com/app/app.plist')
    elif 'Android' in ua:
        # Redirect to the app's page in an Android app store
        return HTTPFound('https://play.google.com/xxxxx')
    else:
        return {}

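2, Generate a QR code from the link created above.

Scanning the QR code then downloads the iOS or Android app.

Distributing iOS apps from a website with the itms-services protocol

Monday, November 5, 2012

Apple allows the itms-services protocol to install applications directly on an iPhone/iPad. We can generate the files this protocol requires ourselves, so an app can be installed this way before it is released on the App Store. The prerequisite is that the device is jailbroken.

Contents of app.plist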

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
   <key>items</key>
   <array>
       <dict>
           <key>assets</key>
           <array>
               <dict>
                   <key>kind</key>
                   <string>software-package</string>
                   <key>url</key>
                   <string>http://xxx.com/.../xxx.ipa (URL of the .ipa file)</string>
               </dict>
               <dict>
                   <key>kind</key>
                   <string>display-image</string>
                   <key>needs-shine</key>
                   <true/>
                   <key>url</key>
                   <string>URL of the app icon</string>
               </dict>
               <dict>
                   <key>kind</key>
                   <string>full-size-image</string>
                   <key>needs-shine</key>
                   <true/>
                   <key>url</key>
                   <string>URL of the large app icon</string>
               </dict>
           </array>
           <key>metadata</key>
           <dict>
               <key>bundle-identifier</key>
               <string>com.xxxx.xxx (the app's bundle ID; must match the one in the .ipa)</string>
               <key>bundle-version</key>
               <string>1.0.0</string>
               <key>kind</key>
               <string>software</string>
               <key>subtitle</key>
               <string>App name</string>
               <key>title</key>
               <string>App name</string>
           </dict>
       </dict>
   </array>
</dict>
</plist>

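Create an HTML page

<a href="itms-services://?action=download-manifest&url=http://xxx.com/app/app.plist">Jailbroken iOS devices: tap here to install the latest version</a>

Open this page in the device's browser and tap the link to install.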

9 Steps to a High-Converting Landing Page

Wednesday, August 1, 2012

Copied From: http://www.onboardly.com/customer-acquisition/9-steps-to-a-high-converting-landing-page/

Handing someone you just met at a networking event a piece of scrap
paper with your details scrawled on it won’t get you too far. Doing this
is much like promoting your ill-constructed landing page. More often
than not, your landing page is a visitor’s first impression of your
product, and your best chance to convert that visitor into a customer.

You don’t need to be a rockstar designer to create a beautiful,
high-converting landing page. Follow this checklist and you will be well
on your way to a rapidly growing customer base.

1. Keep It Simple

Every bit of information about your brand or product does not need to
appear on your landing page. A landing page has one goal: lead capture.
If you bombard your users with too many quotes, pictures and text, all
you’ll end up with is a higher bounce rate. Instead, creating simple and
easy-to-digest sections will have a much more positive impact on new
visitors.

Suprpod keeps the text to a minimum on their
landing page. Offering up simple graphics to explain exactly what the
platform does at each stage, visitors can quickly absorb, understand and
evaluate the startup.

2. Use Smart Graphics

Graphics on your landing page are like attention-seekers at parties.
They are the first to get noticed, and people either love them or hate
them. Using a cheesy stock image for your landing page is a total party
foul.

We recommend using a screenshot of your app or a professional photo of
your product. Whatever you use, make sure it’s authentically you. If
you’re a startup with a brand new concept, it would help to turn your
product description into a simple graphic (e.g. the three graphics in
the Suprpod example above).

The Gijit landing page has one commanding image: the
product. It makes the page clean, simple and informative.

3. Be Credible

If you’re a new startup, visitors will love the fact that you’ve been
featured by TechCrunch, Mashable, CNN – whatever. If you’ve partnered
with Amazon, Dropbox, Twitter or any other well-established brand, shout
it from the rooftops. Adding these accomplishments to your landing page
footer will make your visitors feel comfortable signing up or
purchasing. It’s an easy way to convert early!

4. Use Fewer Input Fields

Less is more in your visitors’ eyes, especially regarding how much
information they have to give. The more “required” fields you include on
your registration page, the less likely visitors are to give you
anything at all. You want to remove every obstacle possible between the
initial visit and the conversion.

Instead of asking for first name, last name, date of birth, address,
phone number, email address and mother’s maiden name, start with just an
email address. You can collect the rest after conversion. Using email
notices, drip marketing campaigns and incentives, you can collect
everything you need later. Once you have the initial lead, you have a
method of later contact.

Imperva has asked for
everything and the kitchen sink on their landing page. In reality, all
they need is a name and an email address (industry or business name
might be nice too).

On the other hand, Zipongo only requires an
email address and zip code, the minimum amount of information they need
to provide valuable deals to customers.

5. Make Registering Irresistible

It’s too easy for new visitors to bounce from your page. If you don’t
have a great call to action, they will. If your page has so much real
estate that visitors have to scroll to view it all, include more than
one call. You need to give visitors that push to commit and enter the
customer acquisition funnel.

Use actionable words such as donate, download, create, call, buy,
register, request and subscribe to encourage conversions. Here’s a trick
to test just how good your call to action really is.

A. Stand six feet back from your screen and look at it. What do you
see? What element stands out the most? It should be your CTA.

B. Sitting at a normal distance from your screen, tilt your head
sideways and slightly squint your eyes. Again, your call to action
should stand out the most.

6. Offer Something

As awesome as your landing page and brand is, offering a little
something extra for new registrations will often seal the deal.
Something simple like giving the first hundred people a discount, an
eBook or early access will maximize those conversions.

7. A/B Test

Unless you’ve done a ton of research, you won’t know for sure what font,
colors or copy lead to the most conversions, but A/B testing will tell
you. Landing page specialists like Unbounce let
you cleanly test your page. A/B testing involves creating two versions
of your web page: an A and a B.

Avoid using multivariate testing. Multivariate involves testing many
different elements at the same time (i.e. B has an alternate color
palette, different graphics and different copy). By doing this, you’ll
be unable to isolate which specific elements are most effective.
Performing simple and clean A/B testing will help you create the best
possible landing page.

For example, Manpacks A/B tested their landing page to determine what
brand messaging results in the most conversions.

8. Be Social

Creating a network through social sharing is the easiest way to get
exponential leads. That said, having a button for every social network
available is overkill. KISSmetrics gives visitors the ability to share
through Twitter, Facebook and Google+. Mashable adds LinkedIn to that
lineup. Figure out what networks your audience is using the most and
focus on those – the rest is just noise.

Using a tool like ClickToTweet allows you to
create a link that shares predetermined text via social media. The rule
of thumb is to keep the message short and sweet, especially on Twitter
where you should leave extra characters for retweets.

9. Create Excitement Through Copy

None of the above will be worth anything if your copy reads like a
children’s book. Don’t ignore your copy because it is often what
visitors will base their opinions on. Keep it short and sweet: state the
problem, your solution and a call to action. All other information is
secondary.

It’s worth it to consider hiring a professional writer for your landing
page. The cost will pale in comparison to a stronger conversion rate.
Plus, writers know how to turn your 1500 word article into 140
characters. Try services like Scripted and
Elance to find top quality writers.

Optimizing your landing page will ensure that it is not your last point
of contact with potential customers. This checklist will undoubtedly
help you create a high-converting landing page in no time. When in
doubt, always test your hypothesis. Landing page development is a
science!

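How to avoid deleting important files on Linux

Sunday, February 12, 2012

A very handy tool, safe-rm, was presented at Linux.conf.au 2012.
It wraps /bin/rm and is very useful for Linux system administrators
who want to protect important files.

1, Install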

apt-get install safe-rm

2, From now on, some important system directories can no longer be deleted

$ rm -rf /usr
Skipping /usr

3, Add the paths or files you want to protect in /etc/safe-rm.conf or ~/.safe-rm

Hidden tips 001

Tuesday, November 22, 2011

hidden tips

1, install ruby-1.9.3 on Mac OSX Lion via rvm

 rvm install 1.9.3 --with-gcc=clang

2, a package name containing an underscore must be written with a dash in buildout, for example pyramid_jinja2:

 [versions]
 pyramid-jinja2 = 1.2

3, install proxychains on Mac OSX Lion via homebrew
download my Proxychains Formula, and run brew install proxychains

4, configure pyramid_debugtoolbar.
If the request's REMOTE_ADDR is not 127.0.0.1, you should add the debugtoolbar.hosts setting to your .ini file, for example:

debugtoolbar.hosts = 127.0.0.1 192.168.0.116

Upgrade postgresql-8.4 to postgresql-9.1

Tuesday, November 22, 2011

install postgresql 9.1 on ubuntu via apt-get

1, back up your databases

 ~ pg_dumpall > outputfile

2, add postgresql apt repository

~ sudo add-apt-repository ppa:pitti/postgresql

3, remove postgresql-8.4

~ sudo apt-get remove postgresql-8.4

4, update apt source index

~ sudo apt-get update

5, install postgresql-9.1

~ sudo apt-get install postgresql-9.1

6, create new user for postgres

~ sudo -u postgres sh
[sudo] password for eric: 
$ createuser -P eric
Enter password for new role: 
Enter it again: 
Shall the new role be a superuser? (y/n) y
$ exit

7, restore your data from backup

~ psql -d postgres -f outputfile

Done!~

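Connecting to an L2TP VPN without a shared secret on Mac OS X

Monday, November 7, 2011

I needed to connect to an L2TP VPN, but after filling in the details I got the error "IPSec shared secret is missing. Verify your settings and try reconnecting." This VPN does not need an IPSec shared secret, though. After some googling I found that a workaround is needed to bypass the check.

Create a file named options under /etc/ppp with the following content: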

plugin L2TP.ppp
l2tpnoipsec

Now you can connect without a secret. Finally, don't forget to check "Send all traffic over VPN connection" in the advanced settings.

In production Pyramid-based websites, return a 404 NotFound instead of a server error for `URLDecodeError`.

Sunday, October 30, 2011

Requests for an unknown URL,
http://lxneng.com/sms/ests%C3%CD%E1%EF2.asp, keep hitting my
Pyramid-based website, and each one raises a URLDecodeError.

Traceback:

Traceback (most recent call last):
File "/root/env/lib/python2.6/site-packages/repoze.tm2-1.0b2-py2.6.egg/repoze/tm/__init__.py", line 24, in __call__
    result = self.application(environ, save_status_and_headers)
File "/var/www/lxneng/src/lxneng/__init__.py", line 27, in __call__
    return self.application(environ, start_response)
File "/root/env/lib/python2.6/site-packages/pyramid/router.py", line 176, in __call__
    response = self.handle_request(request)
File "/root/env/lib/python2.6/site-packages/pyramid/tweens.py", line 17, in excview_tween
    response = handler(request)
File "/root/env/lib/python2.6/site-packages/pyramid/router.py", line 116, in handle_request
    tdict = traverser(request)
File "/root/env/lib/python2.6/site-packages/pyramid/traversal.py", line 610, in __call__
    vpath_tuple = traversal_path(vpath)
File "/root/env/lib/python2.6/site-packages/repoze/lru/__init__.py", line 96, in lru_cached
    val = f(*arg)
File "/root/env/lib/python2.6/site-packages/pyramid/traversal.py", line 486, in traversal_path
    raise URLDecodeError(e.encoding, e.object, e.start, e.end, e.reason)
URLDecodeError: 'utf8' codec can't decode bytes in position 4-5: invalid data

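This should generate a 404 instead of a 500 internal server error:

from pyramid.view import view_config

@view_config(context='pyramid.exceptions.URLDecodeError', renderer='404.html')
@view_config(context='pyramid.exceptions.NotFound', renderer='404.html')
def error_view(context, request):
    # A renderer-based view returns 200 by default, so set the status explicitly
    request.response.status_int = 404
    return {}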

© 2009-2013 lxneng.com. All rights reserved. Powered by Pyramid
