-User Network Behavior Profiling: Analyzing User Network Behavior Profiles and Applying Them to Content Recommendation in Big Data
        -Preface
        -Contents
        -Part I
        -Chapter 1 Overview of User Profiling
            -1.1 Data Sources for User Profiles
                -1.1.1 User Attributes
                -1.1.2 Users' Viewing Behavior
            -1.2 Characteristics of User Profiles
                -1.2.1 Dynamism
                -1.2.2 Spatiotemporal Locality
            -1.3 Application Areas of User Profiles
                -1.3.1 Search Engines
                -1.3.2 Recommender Systems
                -1.3.3 Other Business Customization and Optimization
            -1.4 Opportunities and Challenges Big Data Brings to User Profiling
        -Chapter 2 User Profile Modeling
            -2.1 Quantitative User Profiles
            -2.2 Qualitative User Profiles
                -2.2.1 Tags and Qualitative User Profiles
                -2.2.2 Knowledge-Based Qualitative Profile Analysis
                -2.2.3 Building Qualitative User Profiles
                    -1. Building a domain vocabulary
                    -2. Defining classes and the class hierarchy
                    -3. Defining properties
                    -4. Defining instances
                    -5. Defining constraint axioms and inference rules
                -2.2.4 Storing Qualitative Profile Knowledge
    -   ModelMaker maker = ModelFactory.createModelRDBMaker(conn);
    -   Model testModel = maker.createModel("testDBModel");
    -   OntModelSpec spec = new OntModelSpec(OntModelSpec.OWL_MEM);
    -   OntModel DBModel = ModelFactory.createOntologyModel(spec, testModel);
    -   IDBConnection conn = null;
    -   Class.forName("com.mysql.jdbc.Driver").newInstance();
    -   String DB_URL = "jdbc:mysql://localhost:3306/testDB";
    -   String DB_USER = "root";
    -   String DB_PASS = "mvp";
    -   String DB_TYPE = "MySQL";
    -   conn = new DBConnection(DB_URL, DB_USER, DB_PASS, DB_TYPE);
    -   tempModel = maker.openModel("testDBModel", true);
    -   InputStream inTest = FileManager.get().open(ont1);  // ont1 is the file path
    -   DBModel.read(inTest, testNamespace);  // testNamespace is the namespace declared in the ontology
    -   inTest.close();
                -2.2.5 Reasoning over Qualitative Profile Knowledge
    -   List rules = Rule.rulesFromURL(path);  // e.g. "file:./user/movie.rules"
    -   GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);
    -   reasoner.setOWLTranslation(true);
    -   reasoner.setDerivationLogging(true);
    -   reasoner.setTransitiveClosureCaching(true);
    -   Model model = ModelFactory.createDefaultModel();
    -   model.read(path);  // e.g. "file:./user/movie.owl"
    -   OntModel ont = ModelFactory.createOntologyModel(
    -                         OntModelSpec.OWL_DL_MEM_RDFS_INF, model);
    -   Resource configuration = ont.createResource();
    -   configuration.addProperty(ReasonerVocabulary.PROPruleMode, "hybrid");
    -   OntModel ontModel = ModelFactory.createOntologyModel(OntModelSpec.DAML_MEM);
    -   ModelFactory.createInfModel(getReasoner(rulePath), getOntModel(ontPath));
    -   Query query = QueryFactory.create(queryString);
    -   QueryExecution qe = QueryExecutionFactory.create(query, this.inf);
    -   ResultSet results = qe.execSelect();
    -   ResultSetFormatter.out(System.out, results, query);
    -   qe.close();
    -   String queryString = "PREFIX User: " +
    -           "SELECT ?user ?subject WHERE { ?user User:like ?subject }";
            -2.3 References
        -Chapter 3 Group User Profile Analysis
            -3.1 User Profile Similarity
                -3.1.1 Quantitative Similarity Computation
                -3.1.2 Qualitative Similarity Computation
                -3.1.3 Combined Similarity Computation
            -3.2 User Profile Clustering
        -Chapter 4 User Profile Management
            -4.1 Storage Mechanisms
                -4.1.1 Relational Databases
                -4.1.2 NoSQL Databases
                -4.1.3 Data Warehouses
            -4.2 Query Mechanisms
            -4.3 Scheduled Update Mechanisms
                -4.3.1 Acquiring Real-Time User Information
                    -1. Static information data
                    -2. Dynamic information data
                -4.3.2 Update Trigger Conditions
                -4.3.3 Update Mechanism
    -   CREATE TABLE user_log(
    -       userID varchar(11),
    -       viewTime datetime,
    -       actor varchar(11)
    -   );
    -   CREATE TABLE user_persona(
    -       userID varchar(11),
    -       actor varchar(11),
    -       totalCount int
    -   );
    -   CREATE TRIGGER log_insert_trigger
    -   AFTER INSERT ON user_log
    -   FOR EACH ROW
    -   BEGIN
    -      DECLARE c INT;
    -      SET c = (SELECT totalCount FROM user_persona
    -               WHERE userID = new.userID AND actor = new.actor);
    -      UPDATE user_persona SET totalCount = c + 1
    -               WHERE userID = new.userID AND actor = new.actor;
    -   END;
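The trigger above keeps the per-(user, actor) counter in `user_persona` in step with rows inserted into `user_log`. The same bookkeeping can be sketched in plain Java (an in-memory sketch for illustration only, not the book's code; the class and method names are invented here):

```java
import java.util.HashMap;
import java.util.Map;

public class PersonaCounter {
    // (userID, actor) -> totalCount, mirroring the user_persona table
    private final Map<String, Integer> totalCount = new HashMap<>();

    // Mirrors the AFTER INSERT trigger on user_log: each new log row
    // increments the matching user_persona counter.
    public void onLogInsert(String userID, String actor) {
        totalCount.merge(userID + "|" + actor, 1, Integer::sum);
    }

    public int get(String userID, String actor) {
        return totalCount.getOrDefault(userID + "|" + actor, 0);
    }
}
```

Unlike the SQL version, the sketch also creates the counter row on first sight of a (user, actor) pair, which the trigger above assumes already exists.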
        -Part II
        -Chapter 5 Overview of Video Recommendation
            -5.1 A Taxonomy of Mainstream Recommendation Methods
                -5.1.1 Collaborative Filtering Methods
                -5.1.2 Content-Based Methods
                -5.1.3 Knowledge-Based Methods
                -5.1.4 Hybrid Methods
            -5.2 Evaluation Methods for Recommender Systems
            -5.3 The Logical Relationship between Video Recommendation and User Profiles
        -Chapter 6 Collaborative Filtering Methods
            -6.1 Overview
                    -1. User-based CF vs. Item-based CF
                    -2. Item-based collaborative filtering vs. content-based recommendation
                    -3. Memory-based CF vs. Model-based CF
            -6.2 Relationship Matrices and Matrix Computation
                -6.2.1 The U-U Matrix
                    -1. Algorithm principle
                    -2. Algorithm workflow
                    -3. Discussion
                -6.2.2 The V-V Matrix
                    -1. Algorithm principle
                    -2. Algorithm workflow
                -6.2.3 The U-V Matrix
                    -1. SVD
                    -2. PCA
            -6.3 Memory-Based Collaborative Filtering
                -6.3.1 User-Based Collaborative Filtering
                    -1. Algorithm principle
                    -2. Algorithm workflow
                    -3. Worked example
                    -4. Applicable scenarios
                -6.3.2 Item-Based Collaborative Filtering
                    -1. Algorithm principle
                    -2. Algorithm workflow
                    -3. Worked example
                    -4. Applicable scenarios
            -6.4 Model-Based Collaborative Filtering
                -6.4.1 Latent-Factor-Model Recommendation
                    -1. Algorithm principle
                    -2. Algorithm workflow
                    -3. Applicable scenarios
                -6.4.2 Naive-Bayes-Classification Recommendation
                    -1. Algorithm principle
                    -2. Algorithm workflow
                    -3. Applicable scenarios
            -6.5 Summary
            -6.6 References
        -Chapter 7 Content-Based Recommendation Methods
            -7.1 Overview
            -7.2 Feature Vectors in CB Recommendation
                -7.2.1 Item Profiles in Video Recommendation
                -7.2.2 User Profiles in Video Recommendation
            -7.3 The Basic CB Recommendation Algorithm
                    -1. Algorithm principle
                    -2. Algorithm workflow
                    -3. Applicable scenarios
            -7.4 TF-IDF-Based CB Recommendation
                    -1. Background
                    -2. Algorithm principle
                    -3. Worked example
            -7.5 KNN-Based CB Recommendation
                    -1. Background
                    -2. Algorithm principle
            -7.6 Rocchio-Based CB Recommendation
                    -1. Background
                    -2. Algorithm principle
            -7.7 Decision-Tree-Based CB Recommendation
                    -1. Background
                    -2. Algorithm principle
            -7.8 Linear-Classifier-Based CB Recommendation
                    -1. Background
                    -2. Algorithm principle
            -7.9 Naive-Bayes-Based CB Recommendation
                    -1. Background
                    -2. Algorithm principle
                    -3. Worked example
            -7.10 Summary
            -7.11 References
        -Chapter 8 Knowledge-Based Recommendation Methods
            -8.1 Overview
            -8.2 Constraint Knowledge and Constraint-Based Recommendation
                -8.2.1 Examples of Constraint Knowledge
                -8.2.2 Constraint Satisfaction Problems
                    -Definition 8-1 Creating a recommendation task
                    -Definition 8-2 Solving a recommendation task
                    -Definition 8-3 Conflict set
                    -Definition 8-4 Diagnosis set
                -8.2.3 Constraint-Based Recommendation Workflow
            -8.3 Association Knowledge and Association-Based Recommendation
                -8.3.1 Describing Association Rules
                    -Definition 8-5 Items and itemsets
                    -Definition 8-6 Transaction
                    -Definition 8-7 Itemset support
                    -Definition 8-8 Minimum support and frequent itemsets
                    -Definition 8-9 Association rule
                    -Definition 8-10 Support of an association rule
                    -Definition 8-11 Confidence of an association rule
                    -Definition 8-12 Minimum support and minimum confidence of an association rule
                    -Definition 8-13 Strong association rules
                    -Definition 8-14 Lift of an association rule
                -8.3.2 Association Rule Mining
                -8.3.3 Association-Based Recommendation Workflow
            -8.4 Summary
            -8.5 References
        -Chapter 9 Hybrid Recommendation Methods
            -9.1 Overview
            -9.2 Hybridization at the Algorithm-Design Level
                -9.2.1 Parallelized Hybridization
                    -1. Weighted
                    -2. Switching
                    -3. Mixed
                -9.2.2 Monolithic Hybridization
                    -1. Feature combination
                    -2. Feature augmentation
                -9.2.3 Pipelined Hybridization
                    -1. Cascade
                    -2. Meta-level
                -9.2.4 Typical Hybrid Application Systems
                    -1. Analysis of parallelized hybrid systems
                    -2. Analysis of monolithic hybrid systems
                    -3. Analysis of pipelined hybrid systems
            -9.3 A Hybrid Video Recommendation Example
                -9.3.1 MoRe System Overview
                -9.3.2 MoRe Algorithms
                    -1. MoRe-CF recommendation algorithms
                    -2. MoRe-CB recommendation algorithms
                -9.3.3 MoRe Algorithm Hybridization
                -9.3.4 MoRe Experimental Analysis
            -9.4 Summary
            -9.5 References
        -Chapter 10 Video Recommendation Evaluation
            -10.1 Overview
            -10.2 Video Recommendation Experiment Methods
                -10.2.1 Online Evaluation
                    -1. Evaluation metrics
                    -2. Caveats
                -10.2.2 Offline Evaluation
                -10.2.3 User Studies
            -10.3 Offline Video Recommendation Metrics
                -10.3.1 Accuracy Metrics
                    -1. Rating accuracy
                    -2. Ranking accuracy
                    -3. Classification accuracy
                -10.3.2 Diversity Metrics
                    -1. Coverage
                    -2. Diversity and novelty
            -10.4 Summary
            -10.5 References
        -Part III
        -Chapter 11 Rapid Recommender Construction at the System Level
            -11.1 Overview
            -11.2 Main Contents of This Chapter
            -11.3 System Deployment
                -11.3.1 Hadoop 2.2.0 Deployment
                    -1. Software versions and downloads
                    -2. Configuring the system Java environment
    -   export JAVA_HOME=/usr/java/jdk
    -   export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    -   export PATH=$JAVA_HOME/bin:$PATH
    -   $ source /etc/profile
    -   $ java -version
    -   $ javac -version
                    -3. Maven setup
    -   # set Maven environment
    -   export M2_HOME=/usr/local/maven
    -   export M2=$M2_HOME/bin
    -   export MAVEN_OPTS="-Xms256m -Xmx512m"
    -   export PATH=$M2:$PATH
                    -4. Protocol Buffers setup
    -   $ tar -zxf protobuf-2.5.0.tar.gz
    -   $ cd protobuf-2.5.0
    -   $ ./configure --prefix=/usr/local/protobuf
    -   $ make check
    -   $ make install
    -   # set Protobuf environment
    -   export PATH=/usr/local/protobuf/bin:$PATH
                    -5. Hadoop environment settings
    -   # set hadoop environment
    -   export HADOOP_HOME=/Please/set/your/hadoop/directory/
    -   export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
    -   export HADOOP_COMMON_HOME=${HADOOP_HOME}
    -   export HADOOP_HDFS_HOME=${HADOOP_HOME}
    -   export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    -   export HADOOP_YARN_HOME=${HADOOP_HOME}
    -   export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    -   export HADOOP_VERSION=2.4.0
    -   export JAVA_LIBRARY_PATH=${HADOOP_HOME}/lib/native/
                    -6. Mahout environment settings
    -   # set Mahout environment
    -   export MAHOUT_HOME=/Please/set/your/Mahout/directory/
    -   export PATH=$MAHOUT_HOME/bin:$PATH
    -   export MAHOUT_CONF_DIR=$MAHOUT_HOME/src/conf
                -11.3.2 Hadoop Runtime Configuration
                    -1. Hadoop configuration files
                    -2. Configuration file directory layout
    -   <!-- core-site.xml -->
    -   <property>
    -       <name>fs.defaultFS</name>
    -       <value>hdfs://127.0.0.1:9000</value>
    -       <description>The name of the default file system.</description>
    -   </property>
    -   <property>
    -       <name>hadoop.tmp.dir</name>
    -       <value>/tmp</value>
    -       <description>A base for other temporary directories.</description>
    -   </property>
    -   <property>
    -       <name>io.file.buffer.size</name>
    -       <value>131072</value>
    -   </property>
    -   <property>
    -       <name>hadoop.proxyuser.hduser.hosts</name>
    -       <value>*</value>
    -   </property>
    -   <property>
    -       <name>hadoop.proxyuser.hduser.groups</name>
    -       <value>*</value>
    -   </property>
    -   <!-- hdfs-site.xml -->
    -   <property>
    -       <name>dfs.namenode.secondary.http-address</name>
    -       <value>127.0.0.1:9001</value>
    -   </property>
    -   <property>
    -       <name>dfs.namenode.name.dir</name>
    -       <value>file:/Users/yankai/Project/hadoop/dfs/name</value>
    -       <description>Where the namenode stores HDFS namespace metadata</description>
    -   </property>
    -   <property>
    -       <name>dfs.datanode.data.dir</name>
    -       <value>file:/Users/yankai/Project/dfs/data</value>
    -       <description>Physical storage location of data blocks on the datanode</description>
    -   </property>
    -   <property>
    -       <name>dfs.replication</name>
    -       <value>1</value>
    -   </property>
    -   <property>
    -       <name>dfs.webhdfs.enabled</name>
    -       <value>true</value>
    -   </property>
    -   <property>
    -       <name>dfs.datanode.max.xcievers</name>
    -       <value>4096</value>
    -   </property>
    -   <!-- yarn-site.xml -->
    -   <property>
    -       <name>yarn.nodemanager.aux-services</name>
    -       <value>mapreduce_shuffle</value>
    -   </property>
    -   <property>
    -       <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    -       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    -   </property>
    -   <property>
    -       <name>yarn.resourcemanager.address</name>
    -       <value>127.0.0.1:8032</value>
    -   </property>
    -   <property>
    -       <name>yarn.resourcemanager.scheduler.address</name>
    -       <value>127.0.0.1:8030</value>
    -   </property>
    -   <property>
    -       <name>yarn.resourcemanager.resource-tracker.address</name>
    -       <value>127.0.0.1:8031</value>
    -   </property>
    -   <property>
    -       <name>yarn.resourcemanager.admin.address</name>
    -       <value>127.0.0.1:8033</value>
    -   </property>
    -   <property>
    -       <name>yarn.resourcemanager.webapp.address</name>
    -       <value>127.0.0.1:8088</value>
    -   </property>
    -   <!-- mapred-site.xml -->
    -   <property>
    -       <name>mapreduce.framework.name</name>
    -       <value>yarn</value>
    -   </property>
    -   <property>
    -       <name>mapreduce.jobhistory.address</name>
    -       <value>127.0.0.1:10020</value>
    -   </property>
    -   <property>
    -       <name>mapreduce.jobhistory.webapp.address</name>
    -       <value>127.0.0.1:19888</value>
    -   </property>
                    -3. Testing the Hadoop installation
    -   $ hadoop namenode -format
    -   $ sh start-dfs.sh
    -   $ sh start-yarn.sh
    -   $ jps
    -   2929 ResourceManager
    -   2762 DataNode
    -   14989 Jps
    -   2649 NameNode
    -   3044 NodeManager
    -   $ sh stop-dfs.sh
    -   $ sh stop-yarn.sh
                    -4. Running a basic Hadoop example
    -   $ hdfs dfs -ls /
    -   $ hdfs dfs -mkdir -p /user/test/wordcount/in
    -   $ echo 'i am bad boy' > file01
    -   $ echo 'hello bad boy' > file02
    -   $ hdfs dfs -put file0* /user/test/wordcount/in
    -   $ hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \
    -               wordcount /user/test/wordcount/in /user/test/wordcount/out
    -   $ hdfs dfs -cat /user/test/wordcount/out/*
    -   am      1
    -   bad     2
    -   boy     2
    -   hello   1
    -   i       1
                    -5. A shell script for the Hadoop example
    -   #!/bin/bash
    -   wordcount="/user/hadoop/demo/wordcount"
    -   hdfs dfs -rm -r ${wordcount}
    -   hdfs dfs -mkdir -p ${wordcount}/input
    -   rm -rf file01
    -   rm -rf file02
    -   echo "Hello World I love u" > file01
    -   echo "this is your World I u" > file02
    -   hdfs dfs -put file0* ${wordcount}/input
    -   hdfs dfs -ls ${wordcount}/input/
    -   hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \
    -                     wordcount ${wordcount}/input ${wordcount}/output
    -   hdfs dfs -cat ${wordcount}/output/*
    -   echo "hadoop wordcount end!!\n\n"
                -11.3.3 Deploying Spark and Mahout
                    -1. Version notes
                    -2. Mahout deployment
    -   $ git clone https://github.com/apache/mahout mahout
    -   $ mvn -DskipTests clean install -Dhadoop2.version=2.2.0
    -   # set Mahout environment
    -   export MAHOUT_HOME=/Users/yankai/Project/mahout
    -   export PATH=$MAHOUT_HOME/bin:$PATH
    -   export MAHOUT_CONF_DIR=$MAHOUT_HOME/src/conf
    -   #!/bin/bash
    -   mahout_demo=/user/mahout/demo/item_based
    -   hdfs dfs -rm -r ${mahout_demo}
    -   hdfs dfs -mkdir -p ${mahout_demo}/input
    -   hdfs dfs -put demo.csv ${mahout_demo}/input
    -   hdfs dfs -ls ${mahout_demo}/input/
    -   mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD \
    -                   -i ${mahout_demo}/input/demo.csv -o ${mahout_demo}/output \
    -                   --numRecommendations 25
    -   hdfs dfs -cat ${mahout_demo}/output/*
    -   echo "mahout item_based end!!\n\n"
    -   1,101,5
    -   1,102,3
    -   1,103,2.5
    -   2,101,2
    -   2,102,2.5
    -   2,103,5
    -   2,104,2
    -   3,101,2.5
    -   3,104,4
    -   3,105,4.5
    -   3,107,5
    -   4,101,5
    -   4,103,3
    -   4,104,4.5
    -   4,106,4
    -   5,101,4
    -   5,102,3
    -   5,103,2
    -   5,104,4
    -   5,105,3.5
                    -3. Spark deployment
    -   $ sbt/sbt gen-idea
    -   $ SPARK_HADOOP_VERSION=1.0.1 SPARK_YARN=true ./sbt/sbt assembly
    -   $ ./spark-shell.sh
    -   scala> val data = Array(1, 2, 3, 4, 5)  // create data
    -   data: Array[Int] = Array(1, 2, 3, 4, 5)
    -   scala> val distData = sc.parallelize(data)  // turn data into an RDD
    -   // the displayed type is RDD
    -   distData: spark.RDD[Int] = spark.ParallelCollection@7a0ec850
    -   scala> distData.reduce(_ + _)  // operate on the RDD: sum its elements
    -   12/05/10 09:36:20 INFO spark.SparkContext: Starting job...
    -   12/05/10 09:36:20 INFO spark.SparkContext: Job finished in 0.076729174 s
    -   res2: Int = 15
    -   ${SPARK_HOME}/bin/run-example SparkPi
    -   #!/bin/bash
    -   mahout spark-shell <<EOF
    -           // create a new block with an additional column
    -           val blockWithBiasColumn = block.like(block.nrow, block.ncol + 1)
    -           // copy data from current block into the new block
    -           blockWithBiasColumn(::, 0 until block.ncol) := block
    -           // last column consists of ones
    -           blockWithBiasColumn(::, block.ncol) := 1
    -           keys -> blockWithBiasColumn
    -      }
    -       val betaWithBiasTerm = ols(drmXwithBiasColumn, y)
    -       goodnessOfFit(drmXwithBiasColumn, betaWithBiasTerm, y)
    -       val cachedDrmX = drmXwithBiasColumn.checkpoint()
    -       val betaWithBiasTerm = ols(cachedDrmX, y)
    -       val goodness = goodnessOfFit(cachedDrmX, betaWithBiasTerm, y)
    -       cachedDrmX.uncache()
    -       goodness
    -   }
    -   EOF
    -   #!/bin/bash
    -   hdfs dfs -rm -r /user/mahout/item
    -   hdfs dfs -mkdir -p /user/mahout/item/input
    -   hdfs dfs -put mahout-spark-item.csv /user/mahout/item/input
    -   mahout spark-itemsimilarity \
    -     --input /user/mahout/item/input/mahout-spark-item.csv \
    -     --output /user/mahout/item/output \
    -     --filter1 purchase \
    -     --filter2 view \
    -     --itemIDColumn 2 \
    -     --rowIDColumn 0 \
    -     --filterColumn 1
    -   u1,purchase,iphone
    -   u1,purchase,ipad
    -   u2,purchase,nexus
    -   u2,purchase,galaxy
    -   u3,purchase,surface
    -   u4,purchase,iphone
    -   u4,purchase,galaxy
    -   u1,view,iphone
    -   u1,view,ipad
    -   u1,view,nexus
    -   u1,view,galaxy
    -   u2,view,iphone
    -   u2,view,ipad
    -   u2,view,nexus
    -   u2,view,galaxy
    -   u3,view,surface
    -   u3,view,nexus
    -   u4,view,iphone
    -   u4,view,ipad
            -11.4 The Mahout Recommendation Engine
                -11.4.1 The Item-Based Algorithm
                    -1. Basic idea
                    -2. Configuration and usage
    -   1,101,5
    -   1,102,3
    -   1,103,2.5
    -   2,101,2
    -   2,102,2.5
    -   2,103,5
    -   2,104,2
    -   3,101,2.5
    -   3,104,4
    -   3,105,4.5
    -   3,107,5
    -   4,101,5
    -   4,103,3
    -   4,104,4.5
    -   4,106,4
    -   5,101,4
    -   5,102,3
    -   5,103,2
    -   5,104,4
    -   5,105,3.5
    -   $ hdfs dfs -mkdir -p /user/mtest/demo/input
    -   $ hdfs dfs -put demo.csv /user/mtest/demo/input
    -   $ hdfs dfs -ls -l /user/mtest/demo/input
    -   $ mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD -i \
    -       /user/mtest/demo/input -o /user/mtest/demo/output --numRecommendations 25
    -   1    [104:2.8088317,106:2.5915816,105:2.5748677]
    -   2    [105:3.5743618,106:3.3991857]
    -   3    [103:4.336442,102:4.0915813,106:4.0915813]
    -   4    [102:3.6903737,105:3.6903737]
    -   5    [107:3.663558]
                    -3. Shell script
    -   #!/bin/bash
    -   # mahout_item.sh
    -   mahout_demo=/user/mahout/demo/item_based
    -   hdfs dfs -rm -r ${mahout_demo}
    -   hdfs dfs -mkdir -p ${mahout_demo}/input
    -   hdfs dfs -put demo.csv ${mahout_demo}/input
    -   hdfs dfs -ls ${mahout_demo}/input/
    -   mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD \
    -      -i ${mahout_demo}/input/demo.csv -o ${mahout_demo}/output \
    -      --numRecommendations 25
    -   hdfs dfs -cat ${mahout_demo}/output/*
    -   echo "mahout item_based end!!\n\n"
                -11.4.2 Matrix Factorization
                    -1. Overview
                    -2. Algorithm explanation
                    -3. Matrix factorization in Mahout
    -   // init U and M with random values drawn from a standard Gaussian
    -   // distribution, scaled into [0.0, 1.0]
    -   for (iter = 0; iter < numIterations; iter++)
    -   {
    -       for (user u and item i with rating R[u,i])
    -       {
    -           // dot product of the feature vectors of user u and item i
    -           predicted_rating = U[u,] * M[i,]^t
    -           err = R[u,i] - predicted_rating
    -           // adjust U[u,] and M[i,]
    -           // p is the number of features
    -           for (f = 0; f < p; f++)
    -           {
    -               U[u,f] += learningRate * (err * M[i,f] - lambda * U[u,f])
    -               M[i,f] += learningRate * (err * U[u,f] - lambda * M[i,f])
    -           }
    -       }
    -   }
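The pseudocode above is stochastic gradient descent over the observed ratings. A minimal runnable sketch of the same idea follows; the learning rate, regularization constant, and iteration count are assumed demo values, not taken from the book or from Mahout:

```java
import java.util.Random;

public class SgdMf {
    // Factorize a small rating matrix R (0 = unobserved) into U * M^T
    // with p latent features, by SGD with L2 regularization.
    public static double[][][] factorize(double[][] R, int p, int iters,
                                         double lr, double lambda) {
        int users = R.length, items = R[0].length;
        Random rnd = new Random(42);
        double[][] U = new double[users][p], M = new double[items][p];
        for (double[] row : U) for (int f = 0; f < p; f++) row[f] = rnd.nextGaussian() * 0.1;
        for (double[] row : M) for (int f = 0; f < p; f++) row[f] = rnd.nextGaussian() * 0.1;
        for (int it = 0; it < iters; it++) {
            for (int u = 0; u < users; u++) {
                for (int i = 0; i < items; i++) {
                    if (R[u][i] == 0) continue;          // skip unobserved cells
                    double pred = 0;
                    for (int f = 0; f < p; f++) pred += U[u][f] * M[i][f];
                    double err = R[u][i] - pred;
                    for (int f = 0; f < p; f++) {        // gradient step on both factors
                        double uf = U[u][f];
                        U[u][f] += lr * (err * M[i][f] - lambda * uf);
                        M[i][f] += lr * (err * uf - lambda * M[i][f]);
                    }
                }
            }
        }
        return new double[][][]{U, M};
    }

    // Predicted rating: dot product of the two factor rows.
    public static double predict(double[][] U, double[][] M, int u, int i) {
        double s = 0;
        for (int f = 0; f < U[u].length; f++) s += U[u][f] * M[i][f];
        return s;
    }
}
```

After a few thousand passes the reconstruction sits close to each observed rating, while the regularization term keeps the factors from fitting them exactly.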
                -11.4.3 The ALS Algorithm
                    -1. Overview
                    -2. Detailed configuration parameters
                    -3. A simple example
    -   $ mahout parallelALS --input $als_input --output $als_output \
    -                       --lambda 0.1 --implicitFeedback true \
    -                       --alpha 0.8 --numFeatures 2 --numIterations 5 \
    -                       --numThreadsPerSolver 1 --tempDir tmp
    -   $ mahout recommendfactorized --input $als_input \
    -                               --userFeatures $als_output/U/ \
    -                               --itemFeatures $als_output/M/ --numRecommendations 1 \
    -                               --output recommendations --maxRating 1
                    -4. Shell script
    -   #!/bin/bash
    -   ALS=/user/mahout/als
    -   hdfs dfs -rm -r ${ALS}
    -   hdfs dfs -mkdir -p ${ALS}/input
    -   hdfs dfs -mkdir -p ${ALS}/middle
    -   hdfs dfs -mkdir -p ${ALS}/temp
    -   hdfs dfs -put mahout-als.csv ${ALS}/input
    -   mahout parallelALS            \
    -      --input ${ALS}/input       \
    -      --output ${ALS}/middle     \
    -      --lambda 0.1               \
    -      --implicitFeedback true    \
    -      --alpha 0.8                \
    -      --numFeatures 20           \
    -      --numIterations 5          \
    -      --numThreadsPerSolver 1    \
    -      --tempDir ${ALS}/temp
    -   mahout recommendfactorized             \
    -      --input ${ALS}/middle/userRatings   \
    -      --userFeatures ${ALS}/middle/U      \
    -      --itemFeatures ${ALS}/middle/M      \
    -      --numRecommendations 25             \
    -      --output ${ALS}/output              \
    -      --maxRating 1
                -11.4.4 Mahout's Spark Implementation
                    -1. Overview
                    -2. Using multiple kinds of user behavior
                    -3. Output
                    -4. Detailed configuration parameters
                    -5. A simple example
    -   #!/bin/bash
    -   hdfs dfs -rm -r /user/mahout/item
    -   hdfs dfs -mkdir -p /user/mahout/item/input
    -   hdfs dfs -put mahout-spark-item.csv /user/mahout/item/input
    -   mahout spark-itemsimilarity \
    -      --input /user/mahout/item/input/mahout-spark-item.csv \
    -      --output /user/mahout/item/output \
    -      --filter1 purchase \
    -      --filter2 view \
    -      --itemIDColumn 2 \
    -      --rowIDColumn 0 \
    -      --filterColumn 1
    -   u1,purchase,iphone
    -   u1,purchase,ipad
    -   u2,purchase,nexus
    -   u2,purchase,galaxy
    -   u3,purchase,surface
    -   u4,purchase,iphone
    -   u4,purchase,galaxy
    -   u1,view,iphone
    -   u1,view,ipad
    -   u1,view,nexus
    -   u1,view,galaxy
    -   u2,view,iphone
    -   u2,view,ipad
    -   u2,view,nexus
    -   u2,view,galaxy
    -   u3,view,surface
    -   u3,view,nexus
    -   u4,view,iphone
    -   u4,view,ipad
            -11.5 Quick Hands-On Practice
                -11.5.1 Overview
                -11.5.2 Log Data
    -   {"uid":"2dbe00e59df33c6589dcc7b2e4bf2639","mid":"20060","vid":"","type":"videoview","ctime":"1411619344","ip":"113.205.37.8","ForumNo":"","ForumSerial":""}
    -   {"uid":"2dbe00e59df33c6589dcc7b2e4bf2639","mid":"17638","vid":"","type":"videoview","ctime":"1411619387","ip":"113.205.37.8","ForumNo":"1","ForumSerial":"7"}
    -   {"uid":"2dbe00e59df33c6589dcc7b2e4bf2639","mid":"12464","vid":"1204514","type":"videoplay","ctime":"1411619402","ip":"113.205.37.8","ForumNo":"","ForumSerial":""}
    -   {"uid":"102fbd68fb80a17054760063d6a24bdf","mid":"368026","vid":"","type":"videoview","ctime":"1411619499","ip":"27.9.98.18","ForumNo":"","ForumSerial":""}
    -   {"uid":"102fbd68fb80a17054760063d6a24bdf","mid":"368026","vid":"9402180","type":"videoplay","ctime":"1411619506","ip":"27.9.98.18","ForumNo":"","ForumSerial":""}
    -   {"uid":"01d98c2a7509973a65fd361aea0b685e","mid":"125642","vid":"","type":"videoview","ctime":"1411619546","ip":"27.13.26.16","ForumNo":"","ForumSerial":""}
    -   {"uid":"593621058be49e8d4363e8496d1fd331","mid":"367670","vid":"","type":"videoview","ctime":"1411619594","ip":"27.13.188.127","ForumNo":"","ForumSerial":""}
    -   {"uid":"593621058be49e8d4363e8496d1fd331","mid":"367670","vid":"9436142","type":"videoplay","ctime":"1411619597","ip":"27.13.188.127","ForumNo":"","ForumSerial":""}
    -   UserID,ItemId,score
    -   55847,372032,1
    -   52222,368943,2
    -   56742,369037,1
    -   52222,368943,2
    -   56743,367670,1
    -   56743,367670,1
    -   56743,367670,2
    -   56744,13823,1
    -   56744,13823,2
    -   56745,255042,1
    -   38411,367670,1
    -   56744,13823,2
    -   38411,367670,2
    -   56744,13823,2
    -   56746,367670,1
    -   56746,367670,2
    -   56744,13823,2
    -   56744,13823,2
    -   56744,13823,2
    -   56747,121939,1
    -   56743,367670,1
    -   56743,367670,2
    -   56748,6030,1
    -   56749,16172,2
    -   56750,363943,1
    -   56749,16172,2
    -   56749,16172,2
                -11.5.3 Runtime Environments
                    -1. Standalone Mahout development environment
    -   <properties>
    -       <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    -       <mahout.version>0.9</mahout.version>
    -   </properties>
    -   <dependencies>
    -       <dependency>
    -           <groupId>org.apache.mahout</groupId>
    -           <artifactId>mahout-core</artifactId>
    -           <version>${mahout.version}</version>
    -       </dependency>
    -       <dependency>
    -           <groupId>org.apache.mahout</groupId>
    -           <artifactId>mahout-integration</artifactId>
    -           <version>${mahout.version}</version>
    -           <exclusions>
    -               <exclusion>
    -                   <groupId>org.mortbay.jetty</groupId>
    -                   <artifactId>jetty</artifactId>
    -               </exclusion>
    -               <exclusion>
    -                   <groupId>org.apache.cassandra</groupId>
    -                   <artifactId>cassandra-all</artifactId>
    -               </exclusion>
    -               <exclusion>
    -                   <groupId>me.prettyprint</groupId>
    -                   <artifactId>hector-core</artifactId>
    -               </exclusion>
    -           </exclusions>
    -       </dependency>
    -   </dependencies>
                    -2. Mahout-on-Hadoop runtime environment
    -   $ wget http://archive.apache.org/dist/mahout/0.9/mahout-distribution-0.9-src.tar.gz
    -   $ cp mahout-distribution-0.9/distribution/target/mahout-distribution-0.9.tar.gz /usr/lib/
    -   $ vi /etc/profile
    -   # mahout
    -   export MAHOUT_HOME=/usr/lib/mahout-distribution-0.9
    -   export HADOOP_CONF_DIR=$HADOOP_HOME/conf
    -   export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
    -   export PATH=$HADOOP_HOME/bin:$MAHOUT_HOME/bin:$PATH
    -   export CLASSPATH=.:$MAHOUT_HOME/lib:$HADOOP_CONF_DIR:$MAHOUT_CONF_DIR:$CLASSPATH
    -   $ source /etc/profile
    -   $ vi mahout-distribution-0.9/bin/mahout
    -   MAHOUT_JAVA_HOME=/usr/local/jdk1.7.0
    -   $ mahout -help
                -11.5.4 Practice with the Mahout Item-Based Algorithm
                    -1. Test data
    -   55847,372032,2
    -   55847,368943,2
    -   55847,369037,1
    -   52222,372032,1
    -   52222,368943,1
    -   52222,369037,2
    -   52222,13823,1
    -   56742,372032,2
    -   56742,13823,2
    -   56742,121939,2
    -   56742,363943,1
    -   56747,372032,2
    -   56747,369037,2
    -   56747,13823,2
    -   56747,6030,1
    -   38411,372032,1
    -   38411,368943,2
    -   38411,369037,1
    -   38411,13823,1
    -   38411,121939,2
    -   38411,6030,2
                    -2. Standalone Mahout practice
    -   import java.io.File;
    -   import java.io.IOException;
    -   import java.util.List;
    -   import org.apache.mahout.cf.taste.common.TasteException;
    -   import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
    -   import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    -   import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
    -   import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
    -   import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;
    -   import org.apache.mahout.cf.taste.model.DataModel;
    -   import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    -   import org.apache.mahout.cf.taste.recommender.Recommender;
    -   import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
    -   public class ItemBasedTest {
    -      /** Returns the data model.
    -       * @return
    -       */
    -      public static DataModel getDataModel(String fileName) {
    -         DataModel dataModel = null;
    -         try {
    -             dataModel = new FileDataModel(new File(fileName));
    -         } catch (IOException e) {
    -             e.printStackTrace();
    -         }
    -         return dataModel;
    -      }
    -      public static void ItemBasedCFTest(DataModel dataModel) {
    -         int RECOMMENDER_NUM = 3;  // number of recommendations
    -         ItemSimilarity itemSimilarity;
    -         try {
    -             // 1: compute similarities
    -             itemSimilarity = new EuclideanDistanceSimilarity(dataModel);
    -             // 2: build the item-based recommender
    -             Recommender itemBasedRecommend = new GenericItemBasedRecommender(
    -                                            dataModel, itemSimilarity);
    -             // 3: recommend for each user
    -             LongPrimitiveIterator iter = dataModel.getUserIDs();
    -             while (iter.hasNext()) {
    -                long uid = iter.nextLong();
    -                List<RecommendedItem> ritemlist = itemBasedRecommend.recommend(
    -                                         uid, RECOMMENDER_NUM);
    -                System.out.print("uid:" + uid);
    -                for (RecommendedItem ritem : ritemlist) {
    -                     System.out.printf("(%s,%f)", ritem.getItemID(),
    -                                                           ritem.getValue());
    -                }
    -                System.out.println();
    -             }
    -         } catch (TasteException e) {
    -             e.printStackTrace();
    -         }
    -      }
    -      public static void main(String[] args) {
    -         DataModel dataModel = ItemBasedTest.getDataModel("dataset.txt");
    -         ItemBasedTest.ItemBasedCFTest(dataModel);
    -      }
    -   }
    -   uid:38411(363943,1.333333)
    -   uid:52222(6030,1.200000)(121939,1.187156)(363943,1.000000)
    -   uid:55847(121939,1.760282)(6030,1.750000)(13823,1.714395)
    -   uid:56742(368943,2.000000)(369037,2.000000)(6030,2.000000)
    -   uid:56747(363943,2.000000)(368943,1.632321)(121939,1.625689)
                    -3. Mahout practice in a Hadoop environment
    -   [hp@nwj5 ~]$ hdfs dfs -copyFromLocal -f dataset.txt /guanyy/recommend/
    -   [hp@nwj5 ~]$ hdfs dfs -cat /guanyy/recommend/dataset.txt
    -   [hp@nwj5 root]$ mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD \
    -       -i /guanyy/recommend/dataset.txt -o /guanyy/recommend/output --numRecommendations 3
    -   [hp@nwj5 root]$ hdfs dfs -cat /guanyy/recommend/output/part-r-00000
    -   38411    [363943:1.6728837]
    -   52222    [121939:1.5098236,6030:1.4495928]
    -   55847    [13823:1.6176634,6030:1.1831632,121939:1.1497355]
    -   56742    [369037:2.0,6030:2.0,368943:2.0]
    -   56747    [368943:1.9102178,121939:1.9102178]
                -11.5.5 Practice with the Mahout ALS Algorithm
                    -1. Test data
                    -2. Standalone Mahout practice
    -   import java.io.File;
    -   import java.io.IOException;
    -   import java.util.List;
    -   import org.apache.mahout.cf.taste.common.TasteException;
    -   import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
    -   import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    -   import org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizer;
    -   import org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender;
    -   import org.apache.mahout.cf.taste.model.DataModel;
    -   import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    -   import org.apache.mahout.cf.taste.recommender.Recommender;
    -   public class ALSTest {
    -      /** Returns the data model.
    -       * @return
    -       */
    -      public static DataModel getDataModel(String fileName) {
    -         DataModel dataModel = null;
    -         try {
    -             dataModel = new FileDataModel(new File(fileName));
    -         } catch (IOException e) {
    -             e.printStackTrace();
    -         }
    -         return dataModel;
    -      }
    -      /** Matrix factorization.
    -       * @param dataModel
    -       */
    -      public static void ALSCFTest(DataModel dataModel) {
    -         int RECOMMENDER_NUM = 3;  // number of recommendations
    -         int numFeatures = 2;
    -         double lambda = 0.1;
    -         int numIterations = 5;
    -         boolean usesImplicitFeedback = true;
    -         double alpha = 0.8;
    -         try {
    -             // 1: build the ALS factorizer
    -             ALSWRFactorizer aLSWRFactorizer = new ALSWRFactorizer(
    -                          dataModel, numFeatures, lambda, numIterations,
    -                          usesImplicitFeedback, alpha);
    -             // 2: recommend based on the ALS factorization
    -             Recommender alsRecommend = new SVDRecommender(dataModel,
    -                           aLSWRFactorizer);
    -             LongPrimitiveIterator iter = dataModel.getUserIDs();
    -             while (iter.hasNext()) {
    -                long uid = iter.nextLong();
    -                List<RecommendedItem> ritemlist = alsRecommend.recommend(uid,
    -                                               RECOMMENDER_NUM);
    -                System.out.print("uid:" + uid);
    -                for (RecommendedItem ritem : ritemlist) {
    -                     System.out.printf("(%s,%f)", ritem.getItemID(),
    -                                                    ritem.getValue());
    -                }
    -                System.out.println();
    -             }
    -         } catch (TasteException e) {
    -             e.printStackTrace();
    -         }
    -      }
    -      public static void main(String[] args) {
    -         DataModel dataModel = ALSTest.getDataModel("dataset.txt");
    -         ALSTest.ALSCFTest(dataModel);
    -      }
    -   }
    -   uid:38411(363943,0.063672)
    -   uid:52222(121939,0.571006)(6030,0.515840)(363943,0.182902)
    -   uid:55847(13823,0.774462)(121939,0.517240)(6030,0.419183)
    -   uid:56742(368943,0.525633)(369037,0.247945)(6030,-0.195635)
    -   uid:56747(368943,0.769721)(121939,0.241241)(363943,-0.279181)
                    -3. Mahout practice in a Hadoop environment
    -   [hp@nwj5 ~]$ hdfs dfs -copyFromLocal -f dataset.txt /guanyy/recommend/
    -   [hp@nwj5 ~]$ hdfs dfs -cat /guanyy/recommend/dataset.txt
    -   [hp@nwj5 root]$ mahout parallelALS --input /guanyy/recommend/dataset.txt \
    -       --output /guanyy/recommend/als/als_output --lambda 0.1 --implicitFeedback true \
    -       --alpha 0.8 --numFeatures 2 --numIterations 5 --numThreadsPerSolver 1 --tempDir /tmp
    -   [hp@nwj5 root]$ mahout recommendfactorized --input /guanyy/recommend/als/als_output/userRatings \
    -       --userFeatures /guanyy/recommend/als/als_output/U/ \
    -       --itemFeatures /guanyy/recommend/als/als_output/M/ \
    -       --numRecommendations 3 --output /guanyy/recommend/als/recommendations --maxRating 1
    -   [hp@nwj5 root]$ hdfs dfs -cat /guanyy/recommend/als/recommendations/part-m-00000
    -   38411 [363943:0.2882562]
    -   52222 [6030:0.5579467,121939:0.2638349]
    -   55847 [13823:0.445848,6030:0.4313553,121939:0.004491507]
    -   56742 [6030:0.5346928,369037:0.101436764]
    -   56747 [368943:0.7234793,121939:0.49579513,363943:0.15744102]
            -11.6 Summary
            -11.7 References
        -Chapter 12 A Data-Level Analysis and Recommendation Case Study
            -12.1 Overview
            -12.2 Main Contents of This Chapter
            -12.3 Competition Content and Significance
                -12.3.1 Competition Overview
                -12.3.2 Competition Task and Significance
            -12.4 Customer-Merchant Data
                -12.4.1 Data Description
                    -1. Customer attribute file
                    -2. Customer behavior log file
                    -3. Training set index file
                    -4. Test set index file
                -12.4.2 Data Understanding and Analysis
            -12.5 Algorithm Workflow Design
                -12.5.1 Feature Extraction
                    -1. Customer attribute and behavior features to extract
                    -2. Interaction features
                    -3. Merchant attribute features
                -12.5.2 Classifier Design
                -12.5.3 Algorithm Workflow Summary
            -12.6 Summary
            -12.7 References
        -Anti-Piracy Statement