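Most features are single-valued, such as gender, age, occupation, geographic location, time, place, category, or click-through rate.
Some features, however, are lists of values, such as a list of recently clicked IDs, a list of IDs the user prefers, or the list of tokens from an item's text. How should features like these be handled?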
Reference: Tensorflow feature column for variable list of values, https://stackoverflow.com/questions/48697799/tensorflow-feature-column-for-variable-list-of-values
Here is a small test of my own:
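import tensorflow as tf

# Each example's "letter" feature is a list of values, not a single value
features = {'letter': [['A', 'A'], ['C', 'D'], ['E', 'F'], ['G', 'A'], ['X', 'R']]}

# Map each string to an integer ID via a fixed vocabulary
letter_feature = tf.feature_column.categorical_column_with_vocabulary_list(
    "letter", ["A", "B", "C", "D", "E", "F", "G", "X", "R"], dtype=tf.string)

# Wrap the categorical column so each example becomes a multi-hot vector
indicator = tf.feature_column.indicator_column(letter_feature)

# input_layer is the TF1-style API; in TF 2.x it is exposed under tf.compat.v1
tensor = tf.compat.v1.feature_column.input_layer(features, [indicator])
print(tensor)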
Output:
tf.Tensor(
[[2. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 1.]], shape=(5, 9), dtype=float32)
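In the output, each row has one column per vocabulary entry, and each cell counts how often that entry appears in the row's list (the ['A', 'A'] row gets a 2 in the "A" column). The plain-Python sketch below reproduces that encoding without TensorFlow, purely to make the idea concrete. It is not TensorFlow's implementation. The function name `multi_hot` is hypothetical, and skipping values that are not in the vocabulary is an assumption of this sketch only.

```python
def multi_hot(rows, vocabulary):
    """Count-encode each list of strings against a fixed vocabulary.

    Produces one vector per row with one slot per vocabulary entry; each slot
    holds how many times that entry appears in the row.
    Assumption (this sketch only): values not in the vocabulary are skipped.
    """
    index = {token: i for i, token in enumerate(vocabulary)}
    encoded = []
    for row in rows:
        vector = [0.0] * len(vocabulary)
        for value in row:
            position = index.get(value)
            if position is not None:  # hypothetical OOV handling: ignore
                vector[position] += 1.0
        encoded.append(vector)
    return encoded


features = {"letter": [["A", "A"], ["C", "D"], ["E", "F"], ["G", "A"], ["X", "R"]]}
vocabulary = ["A", "B", "C", "D", "E", "F", "G", "X", "R"]

for vector in multi_hot(features["letter"], vocabulary):
    print(vector)
```

In this sketch, each row is just a count over the vocabulary, so the lists do not have to share a length. For example, in `multi_hot([["A"], ["C", "C", "D"]], vocabulary)` the second row comes out as `[0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]`. Feeding ragged Python lists to TensorFlow itself is a separate matter and typically needs padding or a sparse input.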
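In short, wrapping the categorical column in tf.feature_column.indicator_column is all it takes. Each example's list becomes a multi-hot vector over the vocabulary, and a value that appears twice in a list is counted twice.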