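Please help!
I am converting a multi-line text file to Pig Latin.
Example: the Pig Latin translation of: This is an example. should be: Histay siay naay xampleeay.
I need punctuation to stay where it is (in most cases, at the end of the sentence). I also need any word that starts with a capital letter in the original to start with a capital letter in Pig Latin, with the rest of its letters lowercase.
Here is my code: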
def main():
    fileName= input('Please enter the file name: ')
    validate_file(fileName)
    newWords= convert_file(fileName)
    print(newWords)

def validate_file(fileName):
    try:
        inputFile= open(fileName, 'r')
        inputFile.close()
    except IOError:
        print('File not found.')

def convert_file(fileName):
    inputFile= open(fileName, 'r')
    line_string= [line.split() for line in inputFile]
    for line in line_string:
        for word in line:
            endString= str(word[1:])
            them=endString, str(word[0:1]), 'ay'
            newWords="".join(them)
    return newWords
My text file is:
This is an example. My name is Kara!
The program returns:
Please enter the file name: piglatin tester.py hisTay siay naay xample.eay yMay amenay siay ara!Kay None
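How do I get the words to print on the lines they came from? Also, how do I handle the punctuation and capitalization?
Here is my modification of your code. You should also consider using nltk, which has more robust word tokenization.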
def main():
    fileName = input('Please enter the file name: ')
    # Stop early if the file cannot be opened.
    if not validate_file(fileName):
        return
    new_lines = convert_file(fileName)
    # Print one converted line per input line.
    for line in new_lines:
        print(line)

def validate_file(fileName):
    try:
        inputFile = open(fileName, 'r')
        inputFile.close()
        return True
    except IOError:
        print('File not found.')
        return False

def strip_punctuation(line):
    # Split off a single sentence-ending mark so it can be re-attached later.
    punctuation = ''
    line = line.strip()
    if len(line) > 0:
        if line[-1] in ('.', '!', '?'):
            punctuation = line[-1]
            line = line[:-1]
    return line, punctuation

def convert_file(fileName):
    converted_lines = []
    with open(fileName, 'r') as inputFile:
        for line in inputFile:
            line, punctuation = strip_punctuation(line)
            new_words = []
            for word in line.split():
                # Move the first letter to the end and append 'ay'.
                new_words.append(word[1:] + word[0:1] + 'ay')
            # Lowercase the sentence, then capitalize only its first letter.
            new_sentence = ' '.join(new_words).lower()
            if len(new_sentence):
                new_sentence = new_sentence[0].upper() + new_sentence[1:]
            converted_lines.append(new_sentence + punctuation)
    return converted_lines
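Note that the code above capitalizes only the first letter of each line and keeps only one punctuation mark at the very end of the line, so a name like Kara in the middle of a sentence would come out lowercase. If you need per-word capitalization, one possible approach is to handle punctuation and case word by word. This is only a rough sketch: the helper names pig_latin_word, pig_latin_line and convert_lines are made up for illustration, and it assumes punctuation only appears at the edges of a whitespace-separated token. It keeps the same rule as the question (move the first letter to the end and add 'ay').

```python
import string

def pig_latin_word(token):
    """Convert one whitespace-delimited token, leaving edge punctuation in place."""
    # Assumption: punctuation only matters at the start or end of the token.
    core = token.strip(string.punctuation)
    if not core:
        return token  # the token is nothing but punctuation
    prefix_len = len(token) - len(token.lstrip(string.punctuation))
    prefix = token[:prefix_len]
    suffix = token[prefix_len + len(core):]
    # Same rule as in the question: first letter to the end, then 'ay'.
    converted = (core[1:] + core[0] + 'ay').lower()
    # A word that started with a capital keeps a leading capital.
    if core[0].isupper():
        converted = converted.capitalize()
    return prefix + converted + suffix

def pig_latin_line(line):
    """Convert every word on a single line."""
    return ' '.join(pig_latin_word(tok) for tok in line.split())

def convert_lines(lines):
    """Convert an iterable of lines (for example an open file), one result per line."""
    return [pig_latin_line(line) for line in lines]

if __name__ == '__main__':
    sample = ['This is an example.\n', 'My name is Kara!\n']
    for converted in convert_lines(sample):
        print(converted)
```

With the two sample lines this prints Histay siay naay xampleeay. and Ymay amenay siay Arakay! on separate lines. To use it with a file, pass the open file object to convert_lines instead of the sample list.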