小编典典

Converting a TensorFlow tutorial to use my own data

python

This is a follow-up to my last question, Converting from a Pandas dataframe to a TensorFlow tensor object.

I am now on the next step and need more help. I am trying to replace this line of code

batch = mnist.train.next_batch(100)

with my own data. I found this answer: Where does next_batch in the TensorFlow tutorial batch_xs, batch_ys = mnist.train.next_batch(100) come from? But I don't understand:

1) Why .next_batch() doesn't work on my tensor. Am I creating it incorrectly?

2) How to implement the pseudocode given in the answer to the .next_batch() question

I currently have two tensor objects, one with the parameters I want to use to train the model (dataVar_tensor) and one with the correct results (depth_tensor). I obviously need to keep the relationship between them, so that each set of parameters stays paired with its correct response.
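For reference, the pseudocode in the linked answer boils down to something like this: a minimal, stdlib-only sketch that samples the same random indices from the features and the labels, so each row keeps its label (plain Python lists are assumed here, not the dataVar_tensor/depth_tensor objects from the question; it also samples one independent batch per call rather than tracking epochs the way the MNIST DataSet helper does):

```python
import random

def next_batch(features, labels, batch_size, seed=None):
    """Return a random batch of (features, labels) pairs.

    The same index list is used for both sequences, which is what
    preserves the feature-to-label correspondence.
    """
    rng = random.Random(seed)
    idx = rng.sample(range(len(features)), batch_size)
    return [features[i] for i in idx], [labels[i] for i in idx]
```

This also answers question 1): next_batch() is a method of the tutorial's DataSet wrapper class, not of tf.Tensor, which is why it does not exist on a plain tensor.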

Could you please take some time to help me understand what is going on here and how to replace this line of code?

Thanks a lot


2020-12-20

1 Answer


I removed the irrelevant parts to keep the format and indentation. Hopefully it is clear now. The following code reads a CSV file in batches of N lines (N is specified in a constant at the top). Each line contains a date (the first cell), then a list of floats (480 cells), then a one-hot vector (3 cells). The code simply prints the batches of dates, floats, and one-hot vectors as it reads them. The place where they are printed is normally where you would actually run your model and feed them in place of the placeholder variables.

Keep in mind that here every line is read as a string, and specific cells in that line are then converted to floats, simply because the first cell is easier to read as a string. If all your data is numeric, just set the defaults to float/int instead of 'a' and remove the code that converts strings to floats. Otherwise it is not needed!
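For the all-numeric case mentioned above, the defaults line could look like this instead (a sketch, using the same TD/TS/TL sizes as the code below; with float defaults, tf.decode_csv yields floats directly and the tf.string_to_number calls become unnecessary):

```python
# Hypothetical all-numeric layout: one float default per cell,
# instead of the string default 'a' used in the answer's code.
TD = 1    # date column (would also need to be numeric here)
TS = 480  # feature columns
TL = 3    # one-hot label columns
rDefaults = [[0.0] for _ in range(TD + TS + TL)]
```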

I added some comments to clarify what it is doing. Let me know if anything is unclear.

import tensorflow as tf

fileName = 'YOUR_FILE.csv'

try_epochs = 1
batch_size = 3

TD = 1 # this is my date-label for each row, for internal purposes
TS = 480 # this is the list of features, 480 in this case
TL = 3 # this is one-hot vector of 3 representing the label

# set defaults to something (TF requires defaults for the number of cells you are going to read)
rDefaults = [['a'] for row in range((TD+TS+TL))]

# function that reads the input file, line-by-line
def read_from_csv(filename_queue):
    reader = tf.TextLineReader(skip_header_lines=False) # my file has no header row
    _, csv_row = reader.read(filename_queue) # read one line
    data = tf.decode_csv(csv_row, record_defaults=rDefaults) # use defaults for this line (in case of missing data)
    dateLbl = tf.slice(data, [0], [TD]) # first cell is my 'date-label', for internal purposes
    features = tf.string_to_number(tf.slice(data, [TD], [TS]), tf.float32) # the next 480 cells are the features
    label = tf.string_to_number(tf.slice(data, [TD+TS], [TL]), tf.float32) # the remaining 3 cells are the one-hot label
    return dateLbl, features, label

# function that packs each read line into batches of specified size
def input_pipeline(fName, batch_size, num_epochs=None):
    filename_queue = tf.train.string_input_producer(
        [fName],
        num_epochs=num_epochs,
        shuffle=True)  # this refers to multiple files, not line items within files
    dateLbl, features, label = read_from_csv(filename_queue)
    min_after_dequeue = 10000 # min of where to start loading into memory
    capacity = min_after_dequeue + 3 * batch_size # max of how much to load into memory
    # this packs the above lines into a batch of size you specify:
    dateLbl_batch, feature_batch, label_batch = tf.train.shuffle_batch(
        [dateLbl, features, label], 
        batch_size=batch_size,
        capacity=capacity,
        min_after_dequeue=min_after_dequeue)
    return dateLbl_batch, feature_batch, label_batch

# these are the date label, features, and label:
dateLbl, features, labels = input_pipeline(fileName, batch_size, try_epochs)

with tf.Session() as sess:

    tf.global_variables_initializer().run()
    tf.local_variables_initializer().run()

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    try:
        while not coord.should_stop():
            # load date-label, features, and label:
            dateLbl_batch, feature_batch, label_batch = sess.run([dateLbl, features, labels])

            print(dateLbl_batch)
            print(feature_batch)
            print(label_batch)
            print('----------')

    except tf.errors.OutOfRangeError:
        print("Done looping through the file")

    finally:
        coord.request_stop()

    coord.join(threads)
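The min_after_dequeue/capacity pair in input_pipeline controls a bounded shuffle buffer: a random element is only drawn while at least min_after_dequeue elements would remain, which is what gives the batches their shuffling quality. A stdlib-only toy model of that behavior (not TensorFlow code; single-threaded, and it drops a partial final batch, matching tf.train.shuffle_batch's default allow_smaller_final_batch=False):

```python
import random

def shuffle_batches(items, batch_size, capacity, min_after_dequeue, seed=None):
    """Toy, single-threaded model of tf.train.shuffle_batch.

    Elements sit in a bounded buffer of size `capacity`; a random element
    is drawn only while at least `min_after_dequeue` elements would remain
    (requires capacity > min_after_dequeue, as TF itself does).
    """
    rng = random.Random(seed)
    it = iter(items)
    buf, batch = [], []
    exhausted = False
    while True:
        # refill the buffer up to capacity
        while not exhausted and len(buf) < capacity:
            try:
                buf.append(next(it))
            except StopIteration:
                exhausted = True
        if not buf:
            break  # input consumed and buffer drained
        # draw randomly only while the minimum would still be respected
        # (once the input is exhausted the buffer is simply drained)
        if exhausted or len(buf) > min_after_dequeue:
            batch.append(buf.pop(rng.randrange(len(buf))))
            if len(batch) == batch_size:
                yield batch
                batch = []
    # like the TF default, a final partial batch is dropped
```

As a design note, this whole queue-runner pipeline (string_input_producer, Coordinator, start_queue_runners) is the TF 1.x idiom; later TensorFlow versions replace it with the tf.data API.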